
Differential Equations,

Dynamical Systems,
and Linear Algebra

MORRIS W. HIRSCH AND STEPHEN SMALE
University of California, Berkeley

ACADEMIC PRESS New York San Francisco London

A Subsidiary of Harcourt Brace Jovanovich, Publishers
COPYRIGHT © 1974, BY ACADEMIC PRESS, INC.
ALL RIGHTS RESERVED.
NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR
TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC
OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY
INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT
PERMISSION IN WRITING FROM THE PUBLISHER.

ACADEMIC PRESS, INC.
111 Fifth Avenue, New York, New York 10003

United Kingdom Edition published by
ACADEMIC PRESS, INC. (LONDON) LTD.
24/28 Oval Road, London NW1

Library of Congress Cataloging in Publication Data

Hirsch, Morris, Date
Differential equations, dynamical systems, and
linear algebra.

(Pure and applied mathematics; a series of monographs
and textbooks, v. )
1. Differential equations. 2. Algebras, Linear.
I. Smale, Stephen, Date joint author. II. Title.
III. Series.
QA3.P8 [QA372] 510'.8s [515'.35] 73-18951
ISBN 0-12-349550-4
AMS (MOS) 1970 Subject Classifications: 15-01, 34-01

PRINTED IN THE UNITED STATES OF AMERICA


Preface

This book is about dynamical aspects of ordinary differential equations and the
relations between dynamical systems arid certain fields outside pure mathematics.
A prominent role is played by the structure theory of linear operators on finite-
dimensional vector spaces; we have included a self-contained treatment of that
subject.
The background material needed to understand this book is differential calculus
of several variables. For example, Serge Lang’s Calculus of Several Variables, up to
the chapter on integration, contains more than is needed to understand much of our
text. On the other hand, after Chapter 7 we do use several results from elementary
analysis such as theorems on uniform convergence; these are stated but not proved.
This mathematics is contained in Lang's Analysis I, for instance. Our treatment of
linear algebra is systematic and self-contained, although the most elementary parts
have the character of a review; in any case, Lang’s Calculus of Several Variables
develops this elementary linear algebra at a leisurely pace.
While this book can be used as early as the sophomore year by students with a
strong first year of calculus, it is oriented mainly toward upper division mathematics
and science students. It can also be used for a graduate course, especially if the later
chapters are emphasized.
It has been said that the subject of ordinary differential equations is a collection
of tricks and hints for finding solutions, and that it is important because it can
solve problems in physics, engineering, etc. Our view is that the subject can be
developed with considerable unity and coherence; we have attempted such a de-
velopment with this book. The importance of ordinary differential equations
vis à vis other areas of science lies in its power to motivate, unify, and give force to
those areas. Our four chapters on “applications” have been written to do exactly
this, and not merely to provide examples. Moreover, an understanding of the ways
that differential equations relates to other subjects is a primary source of insight
and inspiration for the student and working mathematician alike.
Our goal in this book is to develop nonlinear ordinary differential equations in
open subsets of real Cartesian space, Rn, in such a way that the extension to
manifolds is simple and natural. We treat chiefly autonomous systems, emphasizing
qualitative behavior of solution curves. The related themes of stability and physical
significance pervade much of the material. Many topics have been omitted, such as
Laplace transforms, series solutions, Sturm theory, and special functions.
The level of rigor is high, and almost everything is proved. More important,
however, is that ad hoc methods have been rejected. We have tried to develop

proofs that add insight to the theorems and that are important methods in their
own right.
We have avoided the introduction of manifolds in order to make the book more
widely readable; but the main ideas can easily be transferred to dynamical systems
on manifolds.
The first six chapters, especially Chapters 3-6, give a rather intensive and com-
plete study of linear differential equations with constant coefficients. This subject
matter can almost be identified with linear algebra; hence those chapters constitute
a short course in linear algebra as well. The algebraic emphasis is on eigenvectors and
how to find them. We go far beyond this, however, to the "semisimple + nilpotent"
decomposition of an arbitrary operator, and then on to the Jordan form and its real
analogue. Those proofs that are far removed from our use of the theorems are
relegated to appendices. While complex spaces are used freely, our primary concern
is to obtain results for real spaces. This point of view, so important for differential
equations, is not commonly found in textbooks on linear algebra or on differential
equations.
Our approach to linear algebra is a fairly intrinsic one; we avoid coordinates
where feasible, while not hesitating to use them as a tool for computations or proofs.
On the other hand, instead of developing abstract vector spaces, we work with
linear subspaces of Rn or Cn, a small concession which perhaps makes the abstraction
more digestible.
Using our algebraic theory, we give explicit methods of writing down solutions
to arbitrary constant coefficient linear differential equations. Examples are included.
In particular, the S + N decomposition is used to compute the exponential of an
arbitrary square matrix.
Chapter 2 is independent of the others and includes an elementary account
of the Keplerian planetary orbits.
The fundamental theorems on existence, uniqueness, and continuity of solutions
of ordinary differential equations are developed in Chapters 8 and 16. Chapter 8 is
restricted to the autonomous case, in line with our basic orientation toward dynami-
cal systems.
Chapters 10, 12, and 14 are devoted to systematic introductions to mathematical
models of electrical circuits, population theory, and classical mechanics, respectively.
The Brayton-Moser circuit theory is presented as a special case of the more general
theory recently developed on manifolds. The Volterra-Lotka equations of competing
species are analyzed, along with some generalizations. In mechanics we develop
the Hamiltonian formalism for conservative systems whose configuration space is
an open subset of a vector space.
The remaining five chapters contain a substantial introduction to the phase
portrait analysis of nonlinear autonomous systems. They include a discussion of
"generic" properties of linear flows, Liapunov and structural stability, Poincaré-
Bendixson theory, periodic attractors, and perturbations. We conclude with an
Afterword which points the way toward manifolds.

The following remarks should help the reader decide on which chapters to read
and in what order.
Chapters 1 and 2 are elementary, but they present many ideas that recur through-
out the book.
Chapters 3-7 form a sequence that develops linear theory rather thoroughly.
Chapters 3, 4, and 5 make a good introduction to linear operators and linear differ-
ential equations. The canonical form theory of Chapter 6 is the basis of the stability
results proved in Chapters 7, 9, and 13; however, this heavy algebra might be post-
poned at a first exposure to this material and the results taken on faith.
The existence, uniqueness, and continuity of solutions, proved in Chapter 8, are
used (often implicitly) throughout the rest of the book. Depending on the reader’s
taste, proofs could be omitted.
A reader interested in the nonlinear material, who has some background in linear
theory, might start with the stability theory of Chapter 9. Chapters 12 (ecology),
13 (periodic attractors), and 16 (perturbations) depend strongly on Chapter 9, while
the section on dual vector spaces and gradients will make Chapters 10 (electrical
circuits) and 14 (mechanics) easier to understand.
Chapter 12 also depends on Chapter 11 (Poincaré-Bendixson); and the material
in Section 2 of Chapter 11 on local sections is used again in Chapters 13 and 16.
Chapter 15 (nonautonomous equations) is a continuation of Chapter 8 and is
used in Chapters 11, 13, and 16; however it can be omitted at a first reading.
The logical dependence of the later chapters is summarized in the following chart:

The book owes much to many people. We only mention four of them here. Ikuko
Workman and Ruth Suzuki did an excellent job of typing the manuscript. Dick
Palais made a number of useful comments. Special thanks are due to Jacob Palis,
who read the manuscript thoroughly, found many minor errors, and suggested
several substantial improvements. Professor Hirsch is grateful to the Miller Institute
for its support during part of the writing of the book.
Chapter 1
First Examples

The purpose of this short chapter is to develop some simple examples of differen-
tial equations. This development motivates the linear algebra treated subsequently
and moreover gives in an elementary context some of the basic ideas of ordinary
differential equations. Later these ideas will be put into a more systematic exposi-
tion. In particular, the examples themselves are special cases of the class of differen-
tial equations considered in Chapter 3. We regard this chapter as important since
some of the most basic ideas of differential equations are seen in simple form.

§1. The Simplest Examples

The differential equation

(1) dx/dt = ax

is the simplest differential equation. It is also one of the most important. First,
what does it mean? Here x = x(t) is an unknown real-valued function of a real
variable t and dx/dt is its derivative (we will also use x' or x'(t) for this derivative).
The equation tells us that for every value of t the equality

x'(t) = ax(t)

is true. Here a denotes a constant.
The solutions to (1) are obtained from calculus: if K is any constant (real number), the function f(t) = Ke^(at) is a solution since

f'(t) = aKe^(at) = af(t).

FIG. A (panels: a > 0, a = 0, a < 0)

Moreover, there are no other solutions. To see this, let u(t) be any solution and
compute the derivative of u(t)e^(-at):

d/dt (u(t)e^(-at)) = u'(t)e^(-at) + u(t)(-ae^(-at))
                   = au(t)e^(-at) - au(t)e^(-at) = 0.

Therefore u(t)e^(-at) is a constant K, so u(t) = Ke^(at). This proves our assertion.
The constant K appearing in the solution is completely determined if the value
u0 of the solution at a single point t0 is specified. Suppose that a function x(t) satisfying (1) is required such that x(t0) = u0; then K must satisfy Ke^(at0) = u0. Thus
equation (1) has a unique solution satisfying a specified initial condition x(t0) = u0.
For simplicity, we often take t0 = 0; then K = u0. There is no loss of generality
in taking t0 = 0, for if u(t) is a solution with u(0) = u0, then the function v(t) =
u(t - t0) is a solution with v(t0) = u0.
It is common to restate (1) in the form of an initial value problem:

(2) x' = ax, x(0) = K.

A solution x(t) to (2) must not only satisfy the first condition (1), but must also
take on the prescribed initial value K at t = 0. We have proved that the initial
value problem (2) has a unique solution.
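The uniqueness statement can be checked numerically. The following is a modern Python aside, not part of the original text; the constants a and K are arbitrary sample values. It compares a central-difference approximation of x'(t) with ax(t):

```python
import math

# Sample constants (arbitrary choices): the solution of x' = ax, x(0) = K
# should be x(t) = K*e^(at).
a, K = -0.7, 2.5

def x(t):
    return K * math.exp(a * t)

def x_prime(t, h=1e-6):
    # central-difference approximation of the derivative x'(t)
    return (x(t + h) - x(t - h)) / (2 * h)

assert abs(x(0) - K) < 1e-12                  # initial condition x(0) = K
for t in [0.0, 0.5, 1.0, 2.0]:
    assert abs(x_prime(t) - a * x(t)) < 1e-6  # x'(t) = a x(t)
```

Any other choice of a and K exhibits the same behavior, in line with the uniqueness proof above.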
The constant a in the equation x' = ax can be considered as a parameter. If a
changes, the equation changes and so do the solutions. Can we describe qualitatively the way the solutions change?
The sign of a is crucial here:

if a > 0, lim (t → ∞) Ke^(at) equals ∞ when K > 0, and equals -∞ when K < 0;
if a = 0, Ke^(at) = K, a constant;
if a < 0, lim (t → ∞) Ke^(at) = 0.

The qualitative behavior of solutions is vividly illustrated by sketching the graphs


of solutions (Fig. A). These graphs follow a typical practice in this book. The
figures are meant to illustrate qualitative features and may be imprecise in quanti-
tative detail.
The equation x' = ax is stable in a certain sense if a ≠ 0. More precisely, if a
is replaced by another constant b sufficiently close to a, the qualitative behavior
of the solutions does not change. If, for example, |b - a| < |a|, then b has the
same sign as a. But if a = 0, the slightest change in a leads to a radical change in
the behavior of solutions. We may also say that a = 0 is a bifurcation point in the
one-parameter family of equations x' = ax, a in R.
Consider next a system of two differential equations in two unknown functions:

(3) x1' = a1x1,
    x2' = a2x2.

This is a very simple system; however, many more complicated systems of two
equations can be reduced to this form as we shall see a little later.
Since there is no relation specified between the two unknown functions x1(t),
x2(t), they are "uncoupled"; we can immediately write down all solutions (as for
(1)):

x1(t) = K1 exp(a1t), K1 = constant,
x2(t) = K2 exp(a2t), K2 = constant.

Here K1 and K2 are determined if initial conditions x1(t0) = u1, x2(t0) = u2 are
specified. (We sometimes write exp a for e^a.)
Let us consider equation (3) from a more geometric point of view. We consider
the two functions x1(t), x2(t) as specifying an unknown curve x(t) = (x1(t), x2(t)) in
the (x1, x2) plane R2. That is to say, x is a map from the real numbers R into R2, x:
R → R2. The right-hand side of (3) expresses the tangent vector x'(t) = (x1'(t), x2'(t))
to the curve. Using vector notation,

(3') x' = Ax,

where Ax denotes the vector (a1x1, a2x2), which one should think of as being based
at x.

FIG. B (the vector Ax = (2, -1/2) based at x = (1, 1))

Initial conditions are of the form x(t0) = u where u = (u1, u2) is a given point
of R2. Geometrically, this means that when t = t0 the curve is required to pass
through the given point u.
The map (that is, function) A: R2 → R2 (or x → Ax) can be considered a vector
field on R2. This means that to each point x in the plane we assign the vector Ax.
For purposes of visualization, we picture Ax as a vector "based at x"; that is, we
assign to x the directed line segment from x to x + Ax. For example, if a1 = 2,
a2 = -1/2, and x = (1, 1), then at (1, 1) we picture an arrow pointing from (1, 1)
to (1, 1) + (2, -1/2) = (3, 1/2) (Fig. B). Thus if Ax = (2x1, -x2/2), we attach to
each point x in the plane an arrow with tail at x and head at x + Ax and obtain
the picture in Fig. C.
Solving the differential equation (3) or (3') with initial conditions (u1, u2) at
t = 0 means finding in the plane a curve x(t) that satisfies (3') and passes through
the point u = (u1, u2) when t = 0. A few solution curves are sketched in Fig. D.
The trivial solution (x1(t), x2(t)) = (0, 0) is also considered a "curve."
The family of all solution curves as subsets of R2 is called the "phase portrait"
of equation (3) (or (3')).
The one-dimensional equation x' = ax can also be interpreted geometrically: the
phase portrait is as in Fig. E, which should be compared with Fig. A. It is clearer
to picture the graphs of (1) and the solution curves for (3) since two-dimensional
pictures are better than either one- or three-dimensional pictures. The graphs of

FIG. D. Some solution curves to x' = Ax, A = [2 0; 0 -1]
solutions to (3) require a three-dimensional picture which the reader is invited to
sketch!
Let us consider equation (3) as a dynamical system. This means that the independent variable t is interpreted as time and the solution curve x(t) could be thought
of, for example, as the path of a particle moving in the plane R2. We can imagine
a particle placed at any point u = (u1, u2) in R2 at time t = 0. As time proceeds
the particle moves along the solution curve x(t) that satisfies the initial condition
x(0) = u. At any later time t > 0 the particle will be in another position x(t). And
at an earlier time t < 0, the particle was at a position x(t). To indicate the dependence of the position on t and u we denote it by φt(u). Thus

φt(u) = (u1 exp(a1t), u2 exp(a2t)).
We can imagine particles placed at each point of the plane and all moving simultaneously (for example, dust particles under a steady wind). The solution curves
are spoken of as trajectories or orbits in this context. For each fixed t in R, we have
a transformation assigning to each point u in the plane another point φt(u). This
transformation, denoted by φt: R2 → R2, is clearly a linear transformation, that is,

FIG. E (a > 0; a < 0)

φt(u + v) = φt(u) + φt(v) and φt(λu) = λφt(u), for all vectors u, v, and all
real numbers λ.
As time proceeds, every point of the plane moves simultaneously along the trajectory passing through it. In this way the collection of maps φt: R2 → R2, t ∈ R, is
a one-parameter family of transformations. This family is called the flow or dynamical system on R2 determined by the vector field x → Ax, which in turn is equivalent
to the system (3).
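The linearity just asserted, together with the composition rule φ_(t+s) = φ_t ∘ φ_s (which follows at once from the exponential formula), is easy to confirm numerically. A Python sketch, not part of the original text; a1, a2 and the sample vectors are arbitrary choices:

```python
import math

# The flow of the uncoupled system (3): phi_t(u) = (u1*e^(a1*t), u2*e^(a2*t)).
a1, a2 = 2.0, -0.5   # arbitrary sample coefficients

def phi(t, u):
    return (u[0] * math.exp(a1 * t), u[1] * math.exp(a2 * t))

u, v = (1.0, -2.0), (0.5, 3.0)
s, t, lam = 0.3, 1.1, 4.0

# composition rule: phi_{t+s} = phi_t o phi_s
for p, q in zip(phi(t + s, u), phi(t, phi(s, u))):
    assert abs(p - q) < 1e-12

# linearity: phi_t(u + v) = phi_t(u) + phi_t(v)
w = (u[0] + v[0], u[1] + v[1])
for p, q in zip(phi(t, w), (phi(t, u)[0] + phi(t, v)[0],
                            phi(t, u)[1] + phi(t, v)[1])):
    assert abs(p - q) < 1e-12

# homogeneity: phi_t(lam*u) = lam*phi_t(u)
for p, q in zip(phi(t, (lam * u[0], lam * u[1])),
                (lam * phi(t, u)[0], lam * phi(t, u)[1])):
    assert abs(p - q) < 1e-12
```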
The dynamical system on the real line R corresponding to equation (1) is particularly easy to describe: if a < 0, all points move toward 0 as time goes to ∞; if
a > 0, all points except 0 move away from 0 toward ±∞; if a = 0, all points stand
still.
We have started from a differential equation and have obtained the dynamical
system φt. This process is established through the fundamental theorem of ordinary
differential equations as we shall see in Chapter 8.
Later we shall also reverse this process: starting from a dynamical system φt, a
differential equation will be obtained (simply by differentiating φt(u) with respect
to t).
It is seldom that differential equations are given in the simple uncoupled form
(3). Consider, for example, the system:

(4) x1' = 5x1 + 3x2,
    x2' = -6x1 - 4x2

or in vector notation

(4') x' = (5x1 + 3x2, -6x1 - 4x2) = Bx.

Our approach is to find a linear change of coordinates that will transform equation
(4) into uncoupled or diagonal form. It turns out that new coordinates (y1, y2) do
the job where

y1 = 2x1 + x2,
y2 = x1 + x2.

(In Chapter 3 we explain how the new coordinates were found.)
Solving for x in terms of y, we have

x1 = y1 - y2,
x2 = -y1 + 2y2.

FIG. F (the lines y1 = 0 and y2 = 0 in the (x1, x2) plane)

To find y1', y2' differentiate the equations defining y1, y2 to obtain

y1' = 2x1' + x2',
y2' = x1' + x2'.

By substitution

y1' = 2(5x1 + 3x2) + (-6x1 - 4x2) = 4x1 + 2x2,
y2' = (5x1 + 3x2) + (-6x1 - 4x2) = -x1 - x2.

Another substitution yields

y1' = 4(y1 - y2) + 2(-y1 + 2y2),
y2' = -(y1 - y2) - (-y1 + 2y2),

or

(5) y1' = 2y1,
    y2' = -y2.

The last equations are in diagonal form and we have already solved this class of
systems. The solution (y1(t), y2(t)) such that (y1(0), y2(0)) = (v1, v2) is

y1(t) = e^(2t)v1,
y2(t) = e^(-t)v2.

The phase portrait of this system (5) is evidently given in Fig. D. We can find
the phase portrait of the original system (4) by simply plotting the new coordinate
axes y1 = 0, y2 = 0 in the (x1, x2) plane and sketching the trajectories y(t) in these
coordinates. Thus y1 = 0 is the line L1: x2 = -2x1 and y2 = 0 is the line L2: x2 =
-x1.
Thus we have the phase portrait of (4) as in Fig. F, which should be compared
with Fig. D.
Formulas for the solution to (4) can be obtained by substitution as follows.
Let (u1, u2) be the initial values (x1(0), x2(0)) of a solution (x1(t), x2(t)) to (4).
Corresponding to (u1, u2) is the initial value (v1, v2) of a solution (y1(t), y2(t)) to
(5) where

v1 = 2u1 + u2,
v2 = u1 + u2.

Thus

y1(t) = e^(2t)(2u1 + u2),
y2(t) = e^(-t)(u1 + u2),

and

x1(t) = e^(2t)(2u1 + u2) - e^(-t)(u1 + u2),
x2(t) = -e^(2t)(2u1 + u2) + 2e^(-t)(u1 + u2).

If we compare these formulas to Fig. F, we see that the diagram instantly gives us
the qualitative picture of the solutions, while the formulas convey little geometric
information. In fact, for many purposes, it is better to forget the original equation
(4) and the corresponding solutions and work entirely with the "diagonalized"
equations (5), their solutions, and phase portrait.
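The diagonalization can also be checked numerically: build (x1(t), x2(t)) from the solutions of (5) via the coordinate change and verify that the result satisfies (4). A Python sketch, not part of the original text; the initial values are arbitrary choices:

```python
import math

# Change of coordinates y1 = 2*x1 + x2, y2 = x1 + x2 that diagonalizes
# system (4), and the resulting explicit solution.
u1, u2 = 1.0, -0.5   # arbitrary initial values x1(0), x2(0)

def x(t):
    v1, v2 = 2*u1 + u2, u1 + u2                       # y1(0), y2(0)
    y1, y2 = math.exp(2*t) * v1, math.exp(-t) * v2    # solutions of (5)
    return (y1 - y2, -y1 + 2*y2)                      # back to x-coordinates

def check(t, h=1e-6):
    # compare central-difference derivatives with the right side of (4)
    x1, x2 = x(t)
    d1 = (x(t + h)[0] - x(t - h)[0]) / (2 * h)
    d2 = (x(t + h)[1] - x(t - h)[1]) / (2 * h)
    assert abs(d1 - (5*x1 + 3*x2)) < 1e-4
    assert abs(d2 - (-6*x1 - 4*x2)) < 1e-4

assert abs(x(0)[0] - u1) < 1e-12 and abs(x(0)[1] - u2) < 1e-12
for t in [0.0, 0.5, 1.0]:
    check(t)
```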

PROBLEMS

1. Each of the "matrices" given below defines a vector field on R2, assigning
to x = (x1, x2) ∈ R2 the vector Ax = (a11x1 + a12x2, a21x1 + a22x2) based
at x. For each matrix, draw enough of the vectors until you get a feeling
for what the vector field looks

like. Then sketch the phase portrait of the corresponding differential equation
x’ = Ax, guessing where necessary.

(a) [; 01 (b) [; 01 (c) [-; 01


(d) [; -3 (el [; -3 [; ;] (f)

(g) [-: I:] (h) [o4 $1 (i) [-: 3


2. Consider the one-parameter family of differential equations

   x1' = 2x1,
   x2' = ax2;  -∞ < a < ∞.

(a) Find all solutions (x1(t), x2(t)).
(b) Sketch the phase portrait for a equal to -1, 0, 1, 2, 3. Make some guesses
    about the stability of the phase portraits.

§2. Linear Systems with Constant Coefficients

This section is devoted to generalizing and abstracting the previous examples.
The general problem is stated, but solutions are postponed to Chapter 3.
Consider the following set or "system" of n differential equations:

(1) dx1/dt = a11x1 + ⋯ + a1nxn,
    ⋮
    dxn/dt = an1x1 + ⋯ + annxn.

Here the aij (i = 1, ..., n; j = 1, ..., n) are n² constants (real numbers), while
each xi denotes an unknown real-valued function of a real variable t. Thus (4) of
Section 1 is an example of the system (1) with n = 2, a11 = 5, a12 = 3, a21 = -6,
a22 = -4.

At this point we are not trying to solve (1); rather, we want to place it in a geometrical and algebraic setting in order to understand better what a solution means.
At the most primitive level, a solution of (1) is a set of n differentiable real-valued functions xi(t) that make (1) true.
In order to reach a more conceptual understanding of (1) we introduce real n-dimensional Cartesian space Rn. This is simply the set of all n-tuples of real numbers.
An element of Rn is a "point" x = (x1, ..., xn); the number xi is the ith coordinate
of the point x. Points x, y in Rn are added coordinatewise:

x + y = (x1, ..., xn) + (y1, ..., yn) = (x1 + y1, ..., xn + yn).

Also, if λ is a real number we define the product of λ and x to be

λx = (λx1, ..., λxn).

The distance between points x, y in Rn is defined to be

|x - y| = [(x1 - y1)² + ⋯ + (xn - yn)²]^(1/2).

The length of x is

|x| = (x1² + ⋯ + xn²)^(1/2).

A vector based at x ∈ Rn is an ordered pair of points x, y in Rn, denoted by xy.
We think of this as an arrow or line segment directed from x to y. We say xy is
based at x.
A vector based at the origin

0 = (0, ..., 0) ∈ Rn

is identified with the point x ∈ Rn.
To a vector xy based at x is associated the vector y - x based at the origin 0.
We call the vectors xy and y - x translates of each other.
From now on a vector based at 0 is called simply a vector. Thus an element of
Rn can be considered either as an n-tuple of real numbers or as an arrow issuing
from the origin.
It is only for purposes of visualization that we consider vectors based at points
other than 0. For computations, all vectors are based at 0 since such vectors can
be added and multiplied by real numbers.
We return to the system of differential equations (1). A candidate for a solution
is a curve in Rn:

(*) x(t) = (x1(t), ..., xn(t)).

By this we mean a map

x: R → Rn.

Such a map is described in terms of coordinates by (*). If each function xi(t) is

differentiable, then the map x is called differentiable; its derivative is defined to be

x'(t) = (x1'(t), ..., xn'(t)).

Thus the derivative, as a function of t, is again a map from R to Rn.
The derivative can also be expressed in the form

x'(t) = lim (h → 0) (1/h)(x(t + h) - x(t)).

It has a natural geometric interpretation as the vector based at x(t) which is a
translate of x'(t). This vector is called the tangent vector to the curve at t (or at
x(t)).
If we imagine t as denoting time, then the length |x'(t)| of the tangent vector is
interpreted physically as the speed of a particle describing the curve x(t).
To write (1) in an abbreviated form we call the doubly indexed set of numbers
aij an n × n matrix A, denoted thus:

    [a11 ⋯ a1n]
A = [ ⋮      ⋮ ]
    [an1 ⋯ ann]

Next, for each x ∈ Rn we define a vector Ax ∈ Rn whose ith coordinate is

ai1x1 + ⋯ + ainxn;

note that this is the ith row in the right-hand side of (1). In this way the matrix A
is interpreted as a map

A: Rn → Rn

which to x assigns Ax.
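The recipe "ith coordinate of Ax = ith row of A dotted with x" takes only a few lines of code. A Python sketch (not part of the original text), using the matrix of system (4), Section 1, as sample data:

```python
# The matrix of system (4), Section 1, used here as sample data.
A = [[5.0, 3.0],
     [-6.0, -4.0]]

def apply(A, x):
    # ith coordinate of Ax = a_i1*x1 + ... + a_in*xn
    return tuple(sum(row[j] * x[j] for j in range(len(x))) for row in A)

assert apply(A, (1.0, 1.0)) == (8.0, -10.0)   # (5+3, -6-4)
```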
With this notation (1) is rewritten

(2) x' = Ax.

Thus the system (1) can be considered as a single "vector differential equation"
(2). (The word equation is classically reserved for the case of just one variable; we
shall call (2) both a system and an equation.)
We think of the map A: Rn → Rn as a vector field on Rn: to each point x ∈ Rn
it assigns the vector based at x which is a translate of Ax. Then a solution of (2)
is a curve x: R → Rn whose tangent vector at any given t is the vector Ax(t) (translated to x(t)). See Fig. D of Section 1.
In Chapters 3 and 4 we shall give methods of explicitly solving (2), or equivalently (1). In subsequent chapters it will be shown that in fact (2) has a unique
solution x(t) satisfying any given initial condition x(0) = u0 ∈ Rn. This is the
fundamental theorem of linear differential equations with constant coefficients; in
Section 1 this was proved for the special case n = 1.
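The fundamental theorem can be previewed computationally. The sketch below is a modern Python aside, not the book's method (Chapters 3 and 4 develop the actual solution theory): it sums the power series for e^(tB), where B is the matrix of system (4) in Section 1, and compares e^(tB)u with the closed-form solution found there. The identity x(t) = e^(tA)x(0) itself is developed later in the book.

```python
import math

# Matrix B of system (4), Section 1.
B = [[5.0, 3.0], [-6.0, -4.0]]

def mat_mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(M, t, terms=60):
    # e^(tM) = sum over k >= 0 of (tM)^k / k!, truncated after `terms` terms
    tM = [[t * M[i][j] for j in range(2)] for i in range(2)]
    S = [[1.0, 0.0], [0.0, 1.0]]     # running sum; identity is the k = 0 term
    P = [[1.0, 0.0], [0.0, 1.0]]     # running power (tM)^k
    fact = 1.0
    for k in range(1, terms):
        P = mat_mul(P, tM)
        fact *= k
        S = [[S[i][j] + P[i][j] / fact for j in range(2)] for i in range(2)]
    return S

u1, u2 = 1.0, 2.0    # arbitrary initial values
t = 0.7
E = expm(B, t)
x1 = E[0][0] * u1 + E[0][1] * u2
x2 = E[1][0] * u1 + E[1][1] * u2

# compare with the closed-form solution derived in Section 1
x1_exact = math.exp(2*t) * (2*u1 + u2) - math.exp(-t) * (u1 + u2)
x2_exact = -math.exp(2*t) * (2*u1 + u2) + 2 * math.exp(-t) * (u1 + u2)
assert abs(x1 - x1_exact) < 1e-9
assert abs(x2 - x2_exact) < 1e-9
```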

PROBLEMS

1. For each of the following matrices A sketch the vector field x → Ax in R3.
(Missing matrix entries are 0.)

I L L

0 -1 -1
1 0 1 1
-3 1 1

2. For A as in (a), (b), (c) of Problem 1, solve the initial value problem

   x' = Ax, x(0) = (k1, k2, k3).
3. Let A be as in (e), Problem 1. Find constants a, b, c such that the curve
   t → (a cos t, b sin t, ce^(-t)) is a solution to x' = Ax with x(0) = (1, 0, 3).
4. Find two different matrices A, B such that the curve

   x(t) = (e^t, 2e^t, 4e^t)

   satisfies both the differential equations

   x' = Ax and x' = Bx.
5. Let A = [aij] be an n × n diagonal matrix, that is, aij = 0 if i ≠ j. Show that
   the differential equation

   x' = Ax

   has a unique solution for every initial condition.

6. Let A be an n × n diagonal matrix. Find conditions on A guaranteeing that

   lim (t → ∞) x(t) = 0

   for all solutions to x' = Ax.


7. Let A = [aij] be an n × n matrix. Denote by -A the matrix [-aij].
   (a) What is the relation between the vector fields x → Ax and x → (-A)x?
   (b) What is the geometric relation between solution curves of x' = Ax and
       of x' = -Ax?

8. (a) Let u(t), v(t) be solutions to x' = Ax. Show that the curve w(t) =
       αu(t) + βv(t) is a solution for all real numbers α, β.

   (b) Let A = [1 0; 0 -2]. Find solutions u(t), v(t) to x' = Ax such that every
       solution can be expressed in the form αu(t) + βv(t) for suitable
       constants α, β.

Notes

The background needed for a reader of Chapter 1 is a good first year of college
calculus. One good source is S. Lang's Second Course in Calculus [12, Chapters I,
II, and IX]. In this reference the material on derivatives, curves, and vectors in
Rn and matrices is discussed much more thoroughly than in our Section 2.
Chapter 2
Newton’s Equation and Kepler’s Law

We develop in this chapter the earliest important examples of differential equations, which in fact are connected with the origins of calculus. These equations were
used by Newton to derive and unify the three laws of Kepler. These laws were
found from the earlier astronomical observations of Tycho Brahe. Here we give a
brief derivation of two of Kepler’s laws, while at the same time setting forth some
general ideas about differential equations.
The equations of Newton, our starting point, have retained importance through-
out the history of modern physics and lie at the root of that part of physics called
classical mechanics.
The first chapter of this book dealt with linear equations, but Newton’s equa-
tions are nonlinear in general. In later chapters we shall pursue the subject of non-
linear differential equations somewhat systematically. The examples here provide
us with concrete examples of historical and scientific importance. Furthermore, the
case we consider most thoroughly here, that of a particle moving in a central force
gravitational field, is simple enough so that the differential equations can be solved
explicitly using exact, classical methods (just calculus!). This is due to the existence
of certain invariant functions called integrals (sometimes called "first integrals";
we do not mean the integrals of elementary calculus). Physically, an integral is a
conservation law; in the case of Newtonian mechanics the two integrals we find
correspond to conservation of energy and angular momentum. Mathematically
an integral reduces the number of dimensions.
We shall be working with a particle moving in a field of force F. Mathematically
F is a vector field on the (configuration) space of the particle, which in our case we
suppose to be Cartesian three space R3. Thus F is a map F: R3 → R3 that assigns
to a point x in R3 another point F(x) in R3. From the mathematical point of view,
F(x) is thought of as a vector based at x. From the physical point of view, F(x)
is the force exerted on a particle located at x.
The example of a force field we shall be most concerned with is the gravitational
field of the sun: F(x) is the force on a particle located at x attracting it to the sun.

We shall go into details of this field in Section 6. Other important examples of force
fields are derived from electrical forces, magnetic forces, and so on.
The connection between the physical concept of force field and the mathematical
concept of differential equation is Newton's second law: F = ma. This law asserts
that a particle in a force field moves in such a way that the force vector at the
location of the particle, at any instant, equals the acceleration vector of the particle
times the mass m. If x(t) denotes the position vector of the particle at time t, where
x: R → R3 is a sufficiently differentiable curve, then the acceleration vector is the
second derivative of x(t) with respect to time

a(t) = ẍ(t).

(We follow tradition and use dots for time derivatives in this chapter.) Newton's
second law states

F(x(t)) = mẍ(t).

Thus we obtain a second order differential equation:

ẍ = (1/m) F(x).

In Newtonian physics it is assumed that m is a positive constant. Newton's law of
gravitation is used to derive the exact form of the function F(x). While these
equations are the main goal of this chapter, we first discuss simple harmonic motion
and then basic background material.
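The reduction of F = ma to a first-order system by introducing the velocity v = ẋ is the standard way to treat such equations numerically. A Python sketch (a modern aside; the force law, mass, step size, and initial data below are arbitrary illustrative choices, not from the text):

```python
import math

# Newton's second law F(x(t)) = m*x''(t) recast as a first-order system:
# with v = x', we integrate (x, v)' = (v, F(x)/m).
def F(x):
    return -4.0 * x            # a sample linear restoring force

m = 2.0                        # mass

def f(x, v):
    # right-hand side of the first-order system (x', v')
    return (v, F(x) / m)

x, v = 1.0, 0.0                # initial position and velocity
dt, steps = 1e-4, 10000        # integrate from t = 0 to t = 1

for _ in range(steps):
    # one step of the classical fourth-order Runge-Kutta method
    k1 = f(x, v)
    k2 = f(x + dt/2 * k1[0], v + dt/2 * k1[1])
    k3 = f(x + dt/2 * k2[0], v + dt/2 * k2[1])
    k4 = f(x + dt * k3[0], v + dt * k3[1])
    x += dt/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
    v += dt/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])

# Here F(x)/m = -2x, the harmonic oscillator of Section 1 with p^2 = 2, so
# with these initial data the exact solution is x(t) = cos(sqrt(2) t).
assert abs(x - math.cos(math.sqrt(2.0))) < 1e-8
```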

§1. Harmonic Oscillators

We consider a particle of mass m moving in one dimension, its position at time
t given by a function t ↦ x(t), x: R → R. Suppose the force on the particle at a
point x ∈ R is given by −mp²x, where p is some real constant. Then according
to the laws of physics (compare Section 3) the motion of the particle satisfies

(1)    ẍ + p²x = 0.

This model is called the harmonic oscillator and (1) is the equation of the harmonic
oscillator (in one dimension).
An example of the harmonic oscillator is the simple pendulum moving in a plane,
when one makes an approximation of sin x by x (compare Chapter 9). Another
example is the case where the force on the particle is caused by a spring.
It is easy to check that for any constants A, B, the function

(2)    x(t) = A cos pt + B sin pt
is a solution of (1), with initial conditions x(0) = A, ẋ(0) = pB. In fact, as is proved
often in calculus courses, (2) is the only solution of (1) satisfying these initial
conditions. Later we will show in a systematic way that these facts are true.
Using basic trigonometric identities, (2) may be rewritten in the form

(3)    x(t) = a cos(pt + t₀),

where a = (A² + B²)^{1/2} is called the amplitude, and cos t₀ = A(A² + B²)^{−1/2}.
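The claims about solution (2) and its amplitude form (3) can be checked numerically; here is a small sketch (the constants A, B, p are arbitrary illustrative values, not taken from the text):

```python
import math

# Arbitrary illustrative constants (not from the text).
A, B, p = 3.0, -2.0, 1.7

def x(t):
    # Solution (2): x(t) = A cos pt + B sin pt.
    return A * math.cos(p * t) + B * math.sin(p * t)

def second_deriv(f, t, h=1e-4):
    # Central finite-difference approximation to f''(t).
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

# Check equation (1): x'' + p^2 x = 0, at several times.
for t in [0.0, 0.5, 1.3, 4.0]:
    assert abs(second_deriv(x, t) + p * p * x(t)) < 1e-5

# Amplitude form (3): x(t) = a cos(pt + t0) with a = (A^2 + B^2)^(1/2).
a = math.hypot(A, B)
t0 = math.atan2(-B, A)   # chosen so that a cos t0 = A and -a sin t0 = B
for t in [0.0, 0.5, 1.3, 4.0]:
    assert abs(a * math.cos(p * t + t0) - x(t)) < 1e-12
```

The phase t₀ is recovered with `atan2`, which picks the quadrant so both identities a cos t₀ = A and −a sin t₀ = B hold at once.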
In Section 6 we will consider equation (1) with a constant term added (representing
a constant disturbing force):

(4)    ẍ + p²x = K.

Then, similarly to (1), every solution of (4) has the form

(5)    x(t) = a cos(pt + t₀) + K/p².

The two-dimensional version of the harmonic oscillator concerns a map x: R → R²
and a force F(x) = −mk²x (where now, of course, x = (x₁, x₂) ∈ R²). Equation
(1) now has the same form

(1′)    ẍ + k²x = 0

with solutions given by

(2′)    x₁(t) = A cos kt + B sin kt,
        x₂(t) = C cos kt + D sin kt.

See Problem 1.
Planar motion will be considered more generally and in more detail in later
sections. But first we go over some mathematical preliminaries.

§2. Some Calculus Background

A path of a moving particle in Rⁿ (usually n ≤ 3) is given by a map f: I → Rⁿ,
where I might be the set R of all real numbers or an interval (a, b) of all real
numbers strictly between a and b. The derivative of f (provided f is differentiable at
each point of I) defines a map f′: I → Rⁿ. The map f is called C¹, or continuously
differentiable, if f′ is continuous (that is to say, the corresponding coordinate
functions fᵢ(t) are continuous, i = 1, . . . , n). If f′: I → Rⁿ is itself C¹, then f is said
to be C². Inductively, in this way, one defines a map f: I → Rⁿ to be Cʳ, where r =
3, 4, 5, and so on.
The inner product, or "dot product," of two vectors x, y in Rⁿ is denoted by
⟨x, y⟩ and defined by

    ⟨x, y⟩ = Σᵢ₌₁ⁿ xᵢyᵢ.

Thus ⟨x, x⟩ = |x|². If x, y: I → Rⁿ are C¹ functions, then a version of the Leibniz
product rule for derivatives is

    ⟨x, y⟩′ = ⟨x′, y⟩ + ⟨x, y′⟩,

as can be easily checked using coordinate functions.
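The product rule can also be spot-checked numerically; a sketch with two arbitrarily chosen C¹ curves in R³ and a finite-difference derivative:

```python
import math

def dot(x, y):
    # Inner product <x, y> = sum of x_i y_i.
    return sum(a * b for a, b in zip(x, y))

# Two illustrative C^1 curves x, y: R -> R^3 and their derivatives.
def x(t):    return [math.cos(t), t, t**2]
def y(t):    return [math.sin(t), 1.0, t]
def xdot(t): return [-math.sin(t), 1.0, 2 * t]
def ydot(t): return [math.cos(t), 0.0, 1.0]

def deriv(f, t, h=1e-6):
    # Central finite-difference derivative of a scalar function.
    return (f(t + h) - f(t - h)) / (2 * h)

t = 0.7
lhs = deriv(lambda s: dot(x(s), y(s)), t)
rhs = dot(xdot(t), y(t)) + dot(x(t), ydot(t))
assert abs(lhs - rhs) < 1e-6   # <x, y>' = <x', y> + <x, y'>
```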
We will have occasion to consider functions f: Rⁿ → R (which, for example,
could be given by temperature or density). Such a map f is called C¹ if the map
Rⁿ → R given by each partial derivative x ↦ ∂f/∂xᵢ(x) is defined and continuous
(in Chapter 5 we discuss continuity in more detail). In this case the gradient of
f, called grad f, is the map Rⁿ → Rⁿ that sends x into (∂f/∂x₁(x), . . . , ∂f/∂xₙ(x)).
Grad f is an example of a vector field on Rⁿ. (In Chapter 1 we considered only
linear vector fields, but grad f may be more general.)
Next, consider the composition of two C¹ maps as follows:

    I →ᶠ Rⁿ →ᵍ R.

The chain rule can be expressed in this context as

    d/dt g(f(t)) = Σᵢ₌₁ⁿ (∂g/∂xᵢ)(f(t)) fᵢ′(t);

using the definitions of gradient and inner product, the reader can prove that this
is equivalent to

    (g∘f)′(t) = ⟨grad g(f(t)), f′(t)⟩.

§3. Conservative Force Fields

A vector field F: R³ → R³ is called a force field if the vector F(x) assigned to the
point x is interpreted as a force acting on a particle placed at x.
Many force fields appearing in physics arise in the following way. There is a C¹
function

    V: R³ → R

such that

    F(x) = −grad V(x).

(The negative sign is traditional.) Such a force field is called conservative. The
function V is called the potential energy function. (More properly, V should be called
a potential energy, since adding a constant to it does not change the force field
−grad V(x).) Problem 4 relates potential energy to work.
The planar harmonic oscillation of Section 1 corresponds to the force field

    F: R² → R²,    F(x) = −mk²x.

This field is conservative, with potential energy

    V(x) = ½mk²|x|²,

as is easily verified.
For any moving particle x(t) of mass m, the kinetic energy is defined to be

    T = ½m|ẋ(t)|².

Here ẋ(t) is interpreted as the velocity vector at time t; its length |ẋ(t)| is the speed
at time t. If we consider the function x: R → R³ as describing a curve in R³, then
ẋ(t) is the tangent vector to the curve at x(t).
For a particle moving in a conservative force field F = −grad V, the potential
energy at x is defined to be V(x). Note that whereas the kinetic energy depends on
the velocity, the potential energy is a function of position.
The total energy (or sometimes simply energy) is

    E = T + V.

This has the following meaning. If x(t) is the trajectory of a particle moving in
the conservative force field, then E is a real-valued function of time:

    E(t) = ½m|ẋ(t)|² + V(x(t)).

Theorem (Conservation of Energy)  Let x(t) be the trajectory of a particle moving
in a conservative force field F = −grad V. Then the total energy E is independent of
time.

Proof. It needs to be shown that E(x(t)) is constant in t, or that

    d/dt (T + V) = 0,

or equivalently,

    d/dt (½m⟨ẋ, ẋ⟩) + d/dt V(x(t)) = 0.

It follows from calculus that

    d/dt (½m⟨ẋ, ẋ⟩) = m⟨ẍ, ẋ⟩

(a version of the Leibniz product formula); and also that

    d/dt V(x(t)) = ⟨grad V(x), ẋ⟩

(the chain rule).
These facts reduce the proof to showing that

    m⟨ẍ, ẋ⟩ + ⟨grad V(x), ẋ⟩ = 0,

or ⟨mẍ + grad V(x), ẋ⟩ = 0. But this is so since Newton's second law is mẍ +
grad V(x) = 0 in this instance.
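As a numerical illustration of the theorem, take the one-dimensional harmonic oscillator of Section 1 with F(x) = −mp²x and potential V(x) = ½mp²x²; along the exact solution (2) the total energy evaluates to the same number at every time (the constants below are arbitrary):

```python
import math

# Illustrative constants (not from the text): mass m, frequency p,
# and the constants A, B of solution (2).
m, p = 2.0, 1.5
A, B = 1.0, 0.5

def x(t):
    return A * math.cos(p * t) + B * math.sin(p * t)

def xdot(t):
    return p * (-A * math.sin(p * t) + B * math.cos(p * t))

def V(q):
    # Potential of the harmonic oscillator: -dV/dq = -m p^2 q = F(q).
    return 0.5 * m * p * p * q * q

def E(t):
    # Total energy T + V along the trajectory.
    return 0.5 * m * xdot(t) ** 2 + V(x(t))

E0 = E(0.0)
for t in [0.3, 1.0, 2.5, 10.0]:
    assert abs(E(t) - E0) < 1e-9   # E is constant along the solution
```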

§4. Central Force Fields

A force field F is called central if F(x) points in the direction of the line through
x and the origin, for every x. In other words, the vector F(x) is always a scalar
multiple of x, the coefficient depending on x:

    F(x) = λ(x)x.

We often tacitly exclude from consideration a particle at the origin; many central
force fields are not defined (or are "infinite") at the origin.

Lemma  Let F be a conservative force field. Then the following statements are
equivalent:

(a) F is central,
(b) F(x) = f(|x|)x,
(c) F(x) = −grad V(x) and V(x) = g(|x|).

Proof. Suppose (c) is true. To prove (b) we find, from the chain rule,

    F(x) = −grad V(x) = −g′(|x|) x/|x|;

this proves (b) with f(|x|) = −g′(|x|)/|x|. It is clear that (b) implies (a). To
show that (a) implies (c) we must prove that V is constant on each sphere

    S_a = {x ∈ R³ : |x| = a},    a > 0.

Since any two points in S_a can be connected by a curve in S_a, it suffices to show that
V is constant on any curve in S_a. Hence if J ⊂ R is an interval and u: J → S_a is
a C¹ map, we must show that the derivative of the composition V∘u,

    J →ᵘ S_a ⊂ R³ →ⱽ R,

is identically 0. This derivative is

    d/dt V(u(t)) = ⟨grad V(u(t)), u′(t)⟩

as in Section 2. Now grad V(x) = −F(x) = −λ(x)x since F is central; hence

    d/dt V(u(t)) = −λ(u(t))⟨u(t), u′(t)⟩ = −½λ(u(t)) d/dt |u(t)|² = 0

because |u(t)| = a.
In Section 6 we shall consider a special conservative central force field obtained
from Newton's law of gravitation.
Consider now a central force field, not necessarily conservative.
Suppose at some time t₀ that P ⊂ R³ denotes the plane containing the particle,
the velocity vector of the particle, and the origin. The force vector F(x) for any
point x in P also lies in P. This makes it plausible that the particle stays in the plane
P for all time. In fact, this is true: a particle moving in a central force field moves
in a fixed plane.
The proof depends on the cross product (or vector product) u × v of vectors u,
v in R³. We recall the definition

    u × v = (u₂v₃ − u₃v₂, u₃v₁ − u₁v₃, u₁v₂ − u₂v₁) ∈ R³

and that u × v = −v × u = |u| |v| N sin θ, where N is a unit vector perpendicular
to u and v, (u, v, N) oriented as the axes ("right-hand rule"), and θ is the angle
between u and v.
Then the vector u × v = 0 if and only if one vector is a scalar multiple of the
other; if u × v ≠ 0, then u × v is orthogonal to the plane containing u and v. If
u and v are functions of t in R, then a version of the Leibniz product rule asserts
(as one can check using Cartesian coordinates):

    d/dt (u × v) = u̇ × v + u × v̇.
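Both the algebraic identities and the Leibniz rule for the cross product can be verified directly; a small sketch (the vectors and curves are arbitrary illustrations):

```python
import math

def cross(u, v):
    # u x v = (u2 v3 - u3 v2, u3 v1 - u1 v3, u1 v2 - u2 v1).
    return [u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

u, v = [1.0, 2.0, -1.0], [0.5, 0.0, 3.0]
w = cross(u, v)

# u x v is orthogonal to both factors, and u x v = -(v x u).
assert abs(dot(w, u)) < 1e-12 and abs(dot(w, v)) < 1e-12
assert all(abs(a + b) < 1e-12 for a, b in zip(w, cross(v, u)))

# Leibniz rule d/dt (u x v) = u' x v + u x v', checked by central
# finite differences on two illustrative curves.
def uc(t):  return [math.cos(t), t, 1.0]
def vc(t):  return [t**2, 1.0, math.sin(t)]
def ucd(t): return [-math.sin(t), 1.0, 0.0]
def vcd(t): return [2 * t, 0.0, math.cos(t)]

t, h = 0.4, 1e-6
numeric = [(a - b) / (2 * h)
           for a, b in zip(cross(uc(t + h), vc(t + h)),
                           cross(uc(t - h), vc(t - h)))]
exact = [a + b for a, b in zip(cross(ucd(t), vc(t)),
                               cross(uc(t), vcd(t)))]
assert all(abs(a - b) < 1e-6 for a, b in zip(numeric, exact))
```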
Now let x(t) be the path of a particle moving under the influence of a central
force field. We have

    d/dt (x × ẋ) = ẋ × ẋ + x × ẍ
                 = x × ẍ
                 = 0

because ẍ is a scalar multiple of x. Therefore x(t) × ẋ(t) is a constant vector y.
If y ≠ 0, this means that x and ẋ always lie in the plane orthogonal to y, as asserted.
If y = 0, then ẋ(t) = g(t)x(t) for some scalar function g(t). This means that the
velocity vector of the moving particle is always directed along the line through the
origin and the particle, as is the force on the particle. This makes it plausible that
the particle always moves along the same line through the origin. To prove this let
(x₁(t), x₂(t), x₃(t)) be the coordinates of x(t). Then we have three differential
equations

    ẋₖ(t) = g(t)xₖ(t),    k = 1, 2, 3.

By integration we find

    xₖ(t) = e^{h(t)} xₖ(t₀),    h(t) = ∫ₜ₀ᵗ g(s) ds.

Therefore x(t) is always a scalar multiple of x(t₀) and so x(t) moves in a fixed line,
and hence in a fixed plane, as asserted.
We restrict attention to a conservative central force field in a plane, which we
take to be the Cartesian plane R². Thus x now denotes a point of R², the potential
energy V is defined on R², and

    F(x) = −grad V(x).

Introduce polar coordinates (r, θ), with r = |x|.
Define the angular momentum of the particle to be

    h = mr²θ̇,

where θ̇ is the time derivative of the angular coordinate of the particle.

Theorem (Conservation of Angular Momentum)  For a particle moving in a
central force field,

    dh/dt = 0,    where h = mr²θ̇.

Proof. Let i = i(t) be the unit vector in the direction x(t), so x = ri. Let j =
j(t) be the unit vector making a 90° angle from i to j. A computation shows that
di/dt = θ̇j, dj/dt = −θ̇i, and hence

    ẋ = ṙi + rθ̇j.

Differentiating again yields

    ẍ = (r̈ − rθ̇²)i + (1/r) d/dt (r²θ̇) j.

If the force is central, however, it has zero component perpendicular to x. Therefore,
since ẍ = m⁻¹F(x), the component of ẍ along j must be 0. Hence

    d/dt (r²θ̇) = 0,

proving the theorem.
We can now prove one of Kepler's laws. Let A(t) denote the area swept out by
the vector x(t) in the time from t₀ to t. In polar coordinates dA = ½r² dθ. We define
the areal velocity to be

    Ȧ = ½r²θ̇,

the rate at which the position vector sweeps out area. Kepler observed that the
line segment joining a planet to the sun sweeps out equal areas in equal times, which
we interpret to mean Ȧ = constant. We have proved more generally that this is
true for any particle moving in a conservative central force field; this is a consequence
of conservation of angular momentum.
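A numerical experiment supports this: integrating ẍ = F(x) for the inverse-square central field F(x) = −x/|x|³ of Section 6 (with m = 1) by the classical fourth-order Runge–Kutta method, the quantity x₁v₂ − x₂v₁ = r²θ̇ stays constant to high accuracy. This is only an illustrative sketch; the initial condition and step size are arbitrary:

```python
def deriv(state):
    # First-order form of x'' = -x/|x|^3: (x1, x2, v1, v2)' = (v1, v2, F1, F2).
    x1, x2, v1, v2 = state
    r3 = (x1 * x1 + x2 * x2) ** 1.5
    return (v1, v2, -x1 / r3, -x2 / r3)

def rk4_step(state, dt):
    # One classical 4th-order Runge-Kutta step.
    def add(s, k, c):
        return tuple(a + c * b for a, b in zip(s, k))
    k1 = deriv(state)
    k2 = deriv(add(state, k1, dt / 2))
    k3 = deriv(add(state, k2, dt / 2))
    k4 = deriv(add(state, k3, dt))
    return tuple(a + dt / 6 * (b + 2 * c + 2 * d + e)
                 for a, b, c, d, e in zip(state, k1, k2, k3, k4))

def h(state):
    # Angular momentum for m = 1: x1 v2 - x2 v1 = r^2 theta'.
    x1, x2, v1, v2 = state
    return x1 * v2 - x2 * v1

state = (1.0, 0.0, 0.0, 1.2)   # arbitrary initial position and velocity
h0 = h(state)
for _ in range(2000):
    state = rk4_step(state, 0.001)
assert abs(h(state) - h0) < 1e-9   # angular momentum is (numerically) conserved
```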

§5. States

We recast the Newtonian formulation of the preceding sections in such a way
that the differential equation becomes first order, the states of the system are made
explicit, and energy becomes a function on the space of states.
A state of a physical system is information characterizing it at a given time. In
particular, a state of the physical system of Section 1 is the position and velocity
of the particle. The space of states is the Cartesian product R³ × R³ of pairs (x, v),
x, v in R³; x is the position, v the velocity that a particle might have at a given
moment.
We may rewrite Newton's equation

(1)    mẍ = F(x)

as a first order equation in terms of x and v. (The order of a differential equation
is the order of the highest derivative that occurs explicitly in the equation.) Consider
the differential equation

(1′)    dx/dt = v,
        m dv/dt = F(x).

A solution to (1′) is a curve t ↦ (x(t), v(t)) in the state space R³ × R³ such that
ẋ(t) = v(t) and v̇(t) = m⁻¹F(x(t)) for all t.
It can be seen then that the solutions of (1) and (1′) correspond in a natural
fashion. Thus if x(t) is a solution of (1), we obtain a solution of (1′) by setting
v(t) = ẋ(t). The map R³ × R³ → R³ × R³ that sends (x, v) into (v, m⁻¹F(x)) is a
vector field on the space of states, and this vector field defines the differential equation
(1′).
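A sketch of this recasting in code, for the one-dimensional harmonic-oscillator force F(x) = −mp²x (an illustrative choice), stepping the state (x, v) forward with Euler's method:

```python
import math

m, p = 1.0, 2.0

def F(x):
    # Illustrative force: the one-dimensional harmonic oscillator.
    return -m * p * p * x

def vector_field(state):
    # The map (x, v) |-> (v, F(x)/m) on the state space, defining (1').
    x, v = state
    return (v, F(x) / m)

def euler_step(state, dt):
    x, v = state
    dx, dv = vector_field(state)
    return (x + dt * dx, v + dt * dv)

state = (1.0, 0.0)              # initial position and velocity
for _ in range(10000):          # integrate to time t = 1 with dt = 1e-4
    state = euler_step(state, 1e-4)

# Exact solution at t = 1: (cos 2, -2 sin 2); Euler is close for small dt.
assert abs(state[0] - math.cos(2.0)) < 1e-2
assert abs(state[1] + 2.0 * math.sin(2.0)) < 1e-2
```

The design point is that the second-order equation never appears in the code: only the first-order vector field on the state space is evaluated.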
A solution (x(t), v(t)) to (1′) gives the passage of the state of the system in time.
Now we may interpret energy as a function on the state space, R³ × R³ → R,
defined by E(x, v) = ½m|v|² + V(x). The statement that "the energy is an
integral" then means that the composite function

    t ↦ (x(t), v(t)) ↦ E(x(t), v(t))

is constant, or that on a solution curve in the state space, E is constant.
We abbreviate R³ × R³ by S. An integral (for (1′)) on S is then any function
that is constant on every solution curve of (1′). It was shown in Section 4 that in
addition to energy, angular momentum is also an integral for (1′). In the nineteenth
century, the idea of solving a differential equation was tied to the construction of a
sufficient number of integrals. However, it is realized now that integrals do not exist
for differential equations very generally; the problems of differential equations have
been considerably freed from the need for integrals.
Finally, we observe that the force field may not be defined on all of R³, but only
on some portion of it, for example, on an open subset U ⊂ R³. In this case the path
x(t) of the particle is assumed to lie in U. The force and velocity vectors, however,
are still allowed to be arbitrary vectors in R³. The force field is then a vector field
on U, denoted by F: U → R³. The state space is the Cartesian product U × R³, and
(1′) is a first order equation on U × R³.

§6. Elliptical Planetary Orbits

We now pass to consideration of Kepler's first law, that planets have elliptical
orbits. For this, a central force is not sufficient. We need the precise form of V as
given by the "inverse square law."
We shall show that in polar coordinates (r, θ), an orbit with nonzero angular
momentum h is the set of points satisfying

    r(1 + e cos θ) = l = constant,    e = constant,

which defines a conic, as can be seen by putting r cos θ = x, r² = x² + y².
Astronomical observations have shown the orbits of planets to be (approximately)
ellipses.
Newton's law of gravitation states that a body of mass m₁ exerts a force on a
body of mass m₂. The magnitude of the force is gm₁m₂/r², where r is the distance
between their centers of gravity and g is a constant. The direction of the force on
m₂ is from m₂ to m₁.
Thus if m₁ lies at the origin of R³ and m₂ lies at x ∈ R³, the force on m₂ is

    F(x) = −gm₁m₂ x/|x|³.

The force on m₁ is the negative of this.
We must now face the fact that both bodies will move. However, if m₁ is much
greater than m₂, its motion will be much less since acceleration is inversely
proportional to mass. We therefore make the simplifying assumption that one of the
bodies does not move; in the case of planetary motion, of course it is the sun that
is assumed at rest. (One might also proceed by taking the center of mass at the
origin, without making this simplifying assumption.)
We place the sun at the origin of R³ and consider the force field corresponding
to a planet of given mass m. This field is then

    F(x) = −C x/|x|³,

where C is a constant. We then change the units in which force is measured to obtain
the simpler formula

    F(x) = −x/|x|³.

It is clear this force field is central. Moreover, it is conservative, since

    −x/|x|³ = −grad V(x),

where

    V(x) = −1/|x|.
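That −x/|x|³ really is −grad V for V(x) = −1/|x| can be spot-checked with finite differences; a small sketch (the test point is arbitrary):

```python
import math

def V(x):
    # Potential energy V(x) = -1/|x| of the Newtonian field.
    return -1.0 / math.sqrt(sum(a * a for a in x))

def F(x):
    # The field F(x) = -x/|x|^3, claimed to equal -grad V(x).
    r3 = sum(a * a for a in x) ** 1.5
    return [-a / r3 for a in x]

def grad(f, x, h=1e-6):
    # Numerical gradient by central differences.
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

x = [1.0, -2.0, 0.5]
for Fi, dVi in zip(F(x), grad(V, x)):
    assert abs(Fi + dVi) < 1e-6   # F = -grad V, componentwise
```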
Observe that F(x) is not defined at 0.
As in the previous section we restrict attention to particles moving in the plane
R²; or, more properly, in R² − 0. The force field is the Newtonian gravitational field
in R², F(x) = −x/|x|³.
Consider a particular solution curve of our differential equation ẍ = m⁻¹F(x).
The angular momentum h and energy E are regarded as constants in time since
they are the same at all points of the curve. The case h = 0 is not so interesting; it
corresponds to motion along a straight line toward or away from the sun. Hence
we assume h ≠ 0.
Introduce polar coordinates (r, θ); along the solution curve they become functions
of time (r(t), θ(t)). Since r²θ̇ is constant and not 0, the sign of θ̇ is constant
along the curve. Thus θ is always increasing or always decreasing with time. Therefore
r is a function of θ along the curve.
Let u(t) = 1/r(t); then u is also a function of θ(t). Note that

    u = −V.

We have a convenient formula for kinetic energy T.
Lemma  The kinetic energy is given by

    T = (h²/2m) [(du/dθ)² + u²].

Proof. From the formula for ẋ in Section 4 and the definition of T we have

    T = ½m[ṙ² + (rθ̇)²].

Also,

    ṙ = −(1/u²) du/dt = −(h/m) du/dθ

by the chain rule and the definitions of u and h; and also

    rθ̇ = h/(mr) = hu/m.

Substitution in the formula for T proves the lemma.

Now we find a differential equation relating u and θ along the solution curve.
Observe that T = E − V = E + u. From the lemma we get

(1)    (h²/2m) [(du/dθ)² + u²] = E + u.

Differentiate both sides with respect to θ, divide by 2 du/dθ, and use dE/dθ = 0
(conservation of energy). We obtain another equation

(2)    d²u/dθ² + u = m/h²,

where m/h² is a constant.
We re-examine the meaning of just what we are doing and of (2). A particular
orbit of the planar central force problem is considered, the force being gravitational.
Along this orbit, the distance r from the origin (the source of the force) is a function
of θ, as is 1/r = u. We have shown that this function u = u(θ) satisfies (2), where
h is the constant angular momentum and m is the mass.
The solution of (2) (as was seen in Section 1) is

(3)    u = C cos(θ + θ₀) + m/h²,

where C and θ₀ are arbitrary constants.
To obtain a solution to (1), use (3) to compute du/dθ and d²u/dθ², substitute
the resulting expressions into (1) and solve for C. The result is

    C = ±(1/h²)(2mh²E + m²)^{1/2}.
Putting this into (3) we get

    u = (m/h²)[1 ± (1 + 2Eh²/m)^{1/2} cos(θ + q)],

where q is an arbitrary constant. There is no need to consider both signs in front
of the radical since cos(θ + q + π) = −cos(θ + q). Moreover, by changing the
variable θ to θ − q we can put any particular solution in the form

(4)    u = (m/h²)[1 + (1 + 2Eh²/m)^{1/2} cos θ].

We recall from analytic geometry that the equation of a conic in polar coordinates
is

(5)    u = (1/l)(1 + e cos θ),    u = 1/r.

Here l is the latus rectum and e ≥ 0 is the eccentricity. The origin is a focus and the
three cases e > 1, e = 1, e < 1 correspond respectively to a hyperbola, parabola,
and ellipse. The case e = 0 is a circle.
Since (4) is in the form (5) we have shown that the orbit of a particle moving under
the influence of a Newtonian gravitational force is a conic of eccentricity

    e = (1 + 2Eh²/m)^{1/2}.

Clearly, e ≥ 1 if and only if E ≥ 0. Therefore the orbit is a hyperbola, parabola, or
ellipse according to whether E > 0, E = 0, or E < 0.
The quantity u = 1/r is always positive. From (4) it follows that

    (1 + 2Eh²/m)^{1/2} cos θ > −1.

But if θ = ±π radians, cos θ = −1 and hence

    (1 + 2Eh²/m)^{1/2} < 1.

This is equivalent to E < 0. For some of the planets, including the earth, complete
revolutions have been observed; for these planets cos θ = −1 at least once a year.
Therefore their orbits are ellipses. In fact from a few observations of any planet it
can be shown that the orbit is in fact an ellipse.
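The classification by the sign of E can be packaged as a small sketch (in the units of this section, where the force is F(x) = −x/|x|³; the function names are illustrative):

```python
def eccentricity(E, h, m):
    # e = (1 + 2 E h^2 / m)^(1/2), from the formula above.
    return (1.0 + 2.0 * E * h * h / m) ** 0.5

def orbit_type(E):
    # Hyperbola, parabola, or ellipse according to the sign of E.
    if E > 0:
        return "hyperbola"
    if E == 0:
        return "parabola"
    return "ellipse"

m, h = 1.0, 1.0
assert eccentricity(0.0, h, m) == 1.0     # E = 0 gives e = 1: parabola
assert eccentricity(0.5, h, m) > 1.0      # E > 0: hyperbola
assert eccentricity(-0.25, h, m) < 1.0    # E < 0: ellipse
assert orbit_type(-0.25) == "ellipse"
```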
PROBLEMS

1. A particle of mass m moves in the plane R² under the influence of an elastic
   band tying it to the origin. The length of the band is negligible. Hooke's law
   states that the force on the particle is always directed toward the origin and
   is proportional to the distance from the origin. Write the force field and verify
   that it is conservative and central. Write the equation F = ma for this case
   and solve it. (Compare Section 1.) Verify that for "most" initial conditions the
   particle moves in an ellipse.

2. Which of the following force fields on R² are conservative?
   (a) F(x, y) = (−x², −y²)
   (b) F(x, y) = (x² − y², 2xy)
   (c) F(x, y) = (x, 0)

3. Consider the case of a particle in a gravitational field moving directly away
   from the origin at time t = 0. Discuss its motion. Under what initial conditions
   does it eventually reverse direction?

4. Let F(x) be a force field on R³. Let x₀, x₁ be points in R³ and let y(s) be a path
   in R³, s₀ ≤ s ≤ s₁, parametrized by arc length s, from x₀ to x₁. The work done
   in moving a particle along this path is defined to be the integral

       ∫ₛ₀^s₁ ⟨F(y(s)), y′(s)⟩ ds,

   where y′(s) is the (unit) tangent vector to the path. Prove that the force field
   is conservative if and only if the work is independent of the path. In fact if
   F = −grad V, then the work done is V(x₀) − V(x₁).

5. How can we determine whether the orbit of (a) Earth and (b) Pluto is an
   ellipse, parabola, or hyperbola?

6. Fill in the details of the proof of the theorem in Section 4.

7. Prove the angular momentum h, energy E, and mass m of a planet are related
   by the inequality

       E > −m/(2h²).

Notes

Lang's Second Course in Calculus [12] is a good background reference for the
mathematics in this chapter, especially his Chapters 3 and 4. The physics material
is covered extensively in a fairly elementary (and perhaps old-fashioned) way in
Principles of Mechanics by Synge and Griffith [23]. One can also find the mechanics
discussed in the book on advanced calculus by Loomis and Sternberg [15, Chapter
13].
The unsystematic ad hoc methods used in Section 6 are successful here because
of the relative simplicity of the equations. These methods do not extend very far
into mechanics. In general, there are not enough "integrals."
The model of planetary motion in this chapter is quite idealized; it ignores the
gravitational effect of the other planets.
Chapter 3
Linear Systems with Constant
Coefficients and Real Eigenvalues

The purpose of this chapter is to begin the study of the theory of linear operators,
which are basic to differential equations. Section 1 is an outline of the necessary
facts about vector spaces. Since it is long, it is divided into Parts A through F. A
reader familiar with some linear algebra should use Section 1 mainly as a reference.
In Section 2 we show how to diagonalize an operator having real, distinct eigenvalues.
This technique is used in Section 3 to solve the linear, constant coefficient
system x′ = Ax, where A is an operator having real distinct eigenvalues. The last
section is an introduction to complex eigenvalues. This subject will be studied
further in Chapter 4.

§1. Basic Linear Algebra

We emphasize that for many readers this section should be used only as a
reference or a review.

A. Matrices and operators

The setting for most of the differential equations in this book is Cartesian space
Rⁿ; this space was defined in Chapter 1, Section 2, as were the operations of addition
and scalar multiplication of vectors. The following familiar properties of these
operations are immediate consequences of the definitions:

VS1:  x + y = y + x,
      x + 0 = x,
      x + (−x) = 0,
      x + (y + z) = (x + y) + z.

Here x, y, z ∈ Rⁿ, −x = (−1)x, and 0 = (0, . . . , 0) ∈ Rⁿ.

VS2:  (λ + μ)x = λx + μx,
      λ(x + y) = λx + λy,
      1x = x,
      0x = 0  (the first 0 in R, the second in Rⁿ).

These operations satisfying VS1 and VS2 define the vector space structure on Rⁿ.
Frequently, our development relies only on the vector space structure and ignores
the Cartesian (that is, coordinate) structure of Rⁿ. To emphasize this idea, we may
write E for Rⁿ and call E a vector space.
The standard coordinates are often ill suited to the differential equation being
studied; we may seek new coordinates, as we did in Chapter 1, giving the equation
a simpler form. The goal of this and subsequent chapters on algebra is to explain
this process. It is very useful to be able to treat vectors (and later, operators) as
objects independent of any particular coordinate system.
The reader familiar with linear algebra will recognize VS1 and VS2 as the defining
axioms of an abstract vector space. With the additional axiom of finite
dimensionality, abstract vector spaces could be used in place of Rⁿ throughout most of
this book.
Let A = [aᵢⱼ] be an n × n matrix as in Section 2 of Chapter 1. Thus each
aᵢⱼ is a real number, where (i, j) ranges over all ordered pairs of integers with 1 ≤
i ≤ n, 1 ≤ j ≤ n. The matrix A can be considered as a map A: Rⁿ → Rⁿ where
the ith coordinate of Ax is Σⱼ₌₁ⁿ aᵢⱼxⱼ, for each x = (x₁, . . . , xₙ) in Rⁿ. It is easy
to check that this map satisfies, for x, y ∈ Rⁿ, λ ∈ R:

L1:  A(x + y) = Ax + Ay,
L2:  A(λx) = λAx.

These are called linearity properties. Any map A: Rⁿ → Rⁿ satisfying L1 and L2
is called a linear map. Even more generally, a map A: Rⁿ → Rᵐ (perhaps different
domain and range) that satisfies L1 and L2 is called linear. In the case where the
domain and range are the same, A is also called an operator. The set of all operators
on Rⁿ is denoted by L(Rⁿ).
Note that if eₖ ∈ Rⁿ is the vector

    eₖ = (0, . . . , 0, 1, 0, . . . , 0),
with a 1 in the kth place, zeros elsewhere, then

    Aeₖ = (a₁ₖ, . . . , aₙₖ).

Thus the image of eₖ is the kth column of the matrix A.
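A quick check of this fact in code, with an arbitrary 3 × 3 matrix:

```python
def apply(A, x):
    # (Ax)_i = sum over j of a_ij x_j, for a matrix A given as rows.
    return [sum(A[i][j] * x[j] for j in range(len(x)))
            for i in range(len(A))]

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

n = 3
for k in range(n):
    e_k = [1 if i == k else 0 for i in range(n)]
    column_k = [A[i][k] for i in range(n)]
    assert apply(A, e_k) == column_k   # A e_k is the kth column of A
```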


Let Mₙ be the set of all n × n matrices. Then from what we have just described,
there is a natural map

(2)    Mₙ → L(Rⁿ)

that associates to each matrix the corresponding linear map. There is an inverse
process that associates to every operator on Rⁿ a matrix. In fact let T: Rⁿ → Rⁿ
be any operator. Then define aᵢⱼ = the ith coordinate of Teⱼ. The matrix A = [aᵢⱼ]
obtained in this way has for its kth column the vector Teₖ. Thus

    Teₖ = Aeₖ;    k = 1, . . . , n.

It follows that the operator defined by A is exactly T. For let x ∈ Rⁿ be any
vector, x = (x₁, . . . , xₙ). Then

    x = x₁e₁ + · · · + xₙeₙ.

Hence

    Ax = A(Σ xₖeₖ) = Σ xₖ(Aeₖ)    (by L1 and L2)
       = Σ xₖ(Teₖ)
       = T(Σ xₖeₖ)
       = Tx.

In this way we obtain a natural correspondence between operators on Rⁿ and n × n
matrices.
More generally, to every linear map Rⁿ → Rᵐ corresponds an m × n matrix,
and conversely. In this book we shall usually be concerned with only operators and
n × n matrices.
Let S, T be operators on Rⁿ. The composite map TS, sending the vector x to
T(S(x)), is again an operator on Rⁿ. If S has the matrix [aᵢⱼ] = A and T has the
matrix [bᵢⱼ] = B, then TS has the matrix [cᵢⱼ] = C, where

    cᵢⱼ = Σₖ₌₁ⁿ bᵢₖaₖⱼ.

To see this we compute the image of eⱼ under TS:

    (TS)eⱼ = B(Aeⱼ) = B(Σₖ aₖⱼeₖ)
           = Σₖ aₖⱼ(Beₖ)
           = Σₖ aₖⱼ(Σᵢ bᵢₖeᵢ).

Therefore

    (TS)eⱼ = Σᵢ (Σₖ bᵢₖaₖⱼ)eᵢ.

This formula says that the ith coordinate of (TS)eⱼ is

    Σₖ bᵢₖaₖⱼ.

Since this ith coordinate is cᵢⱼ, our assertion follows.
We call the matrix C obtained in this way the product BA of B and A (in that
order).
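A sketch verifying that the product BA is indeed the matrix of the composition T∘S, with arbitrary 2 × 2 matrices:

```python
def apply(A, x):
    # Apply the matrix A (given as rows) to the vector x.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(B, A):
    # c_ij = sum over k of b_ik a_kj: the matrix of the composition
    # T o S, where A is the matrix of S and B the matrix of T.
    n = len(A)
    return [[sum(B[i][k] * A[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]    # matrix of S
B = [[0, 1], [5, -2]]   # matrix of T
C = matmul(B, A)        # matrix of TS, the product BA

x = [7, -3]
# T(S(x)) computed two ways: compose the maps, or apply the product BA.
assert apply(B, apply(A, x)) == apply(C, x)
```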
Since composition of mappings is associative, it follows that C(BA) = (CB)A
if A, B, C are n × n matrices.
The sum S + T of operators S, T ∈ L(Rⁿ) is defined to be the operator

    x ↦ Sx + Tx.

It is easy to see that if A and B are the respective matrices of S and T, then the
matrix of S + T is A + B = [aᵢⱼ + bᵢⱼ].
Operators and matrices obey the two distributive laws

    P(Q + R) = PQ + PR;    (Q + R)P = QP + RP.

Two special operators are 0: x ↦ 0 and I: x ↦ x. We also use 0 and I to denote
the corresponding matrices. All entries of 0 are 0 ∈ R while I = [δᵢⱼ] where δᵢⱼ
is the Kronecker function:

    δᵢⱼ = 0 if i ≠ j,
          1 if i = j.

Thus I has ones on the diagonal (from upper left to lower right) and zeros elsewhere.
It is clear that A + 0 = 0 + A = A, 0A = A0 = 0, and AI = IA = A, for
both operators and matrices.
If T is an operator and λ any real number, a new operator λT is defined by

    (λT)x = λ(Tx).

If A = [aᵢⱼ] is the matrix of T, then the matrix of λT is λA = [λaᵢⱼ], obtained by
multiplying each entry in A by λ. It is clear that

    0T = 0,    1T = T,

and similarly for matrices. Here 0 and 1 are real numbers.
The set L(Rⁿ) of all operators on Rⁿ, like the set Mₙ of all n × n matrices, satisfies
the vector space axioms VS1, VS2 with 0 as 0 and x, y, z as operators (or matrices).
If we consider an n × n matrix as a point in Rⁿ², the Cartesian space of
dimension n², then the vector space operations on L(Rⁿ) and Mₙ are the usual
ones.
An operator T is called invertible if there exists an operator S such that ST =
TS = I. We call S the inverse of T and write S = T⁻¹, T = S⁻¹. If A and B are
the matrices corresponding to S and T, then AB = BA = I. We also say A is
invertible, B = A⁻¹, A = B⁻¹.
It is not easy to find the inverse of a matrix (supposing it has one) in general;
we discuss this further in the appendix. The 2 × 2 case is quite simple, however.
The inverse of

    A = [a  b]
        [c  d]

is

    A⁻¹ = (1/D) [ d  −b]
                [−c   a],

provided the determinant D = ad − bc ≠ 0. If D = 0, A is not invertible. (Determinants
are considered in Part E.)
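The 2 × 2 formula can be written out as a short sketch; `inverse2x2` is a hypothetical helper name:

```python
def inverse2x2(A):
    # Inverse of [[a, b], [c, d]] via the determinant D = ad - bc;
    # returns None when D = 0 (the matrix is not invertible).
    (a, b), (c, d) = A
    D = a * d - b * c
    if D == 0:
        return None
    return [[d / D, -b / D], [-c / D, a / D]]

def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [5.0, 3.0]]   # D = 1, so A is invertible
Ainv = inverse2x2(A)
I = matmul(A, Ainv)
assert all(abs(I[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(2) for j in range(2))    # A A^{-1} = I
assert inverse2x2([[1.0, 2.0], [2.0, 4.0]]) is None   # D = 0: no inverse
```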

B. Subspaces, bases, and dimension

Let E = Rⁿ. A nonempty subset F ⊂ E is called a subspace (more properly, a
linear subspace) if F is closed under the operations of addition and scalar
multiplication in E; that is, for all x ∈ F, y ∈ F, λ ∈ R:

    x + y ∈ F,    λx ∈ F.

It follows that with these operations F satisfies VS1 and VS2 of Part A.
If F contains only 0, we write F = 0 and call F the trivial subspace. If F ≠ E, we
call F a proper subspace.
If F₁ and F₂ are subspaces and F₁ ⊂ F₂, we call F₁ a subspace of F₂.
Since a subspace satisfies VS1 and VS2, the concept of a linear map T: F₁ → F₂
between subspaces F₁ ⊂ Rⁿ, F₂ ⊂ Rᵐ makes sense: T is a map satisfying L1 and
L2 of Part A. In particular, if m = n and F₁ = F₂, T is an operator on a subspace.
Henceforth we shall use the term vector space to mean "subspace of a Cartesian
space." An element of a vector space will be called a vector (also a point). To
distinguish them from vectors, real numbers are called scalars.
Two important subspaces are determined by a linear map

    A: E₁ → E₂,

where E₁ and E₂ are vector spaces. The kernel of A is the set

    Ker A = {x ∈ E₁ | Ax = 0} = A⁻¹(0).
The image of A is the set

    Im A = {y ∈ E₂ | Ax = y for some x ∈ E₁}
         = A(E₁).

Let F be a vector space. A set S = {a₁, . . . , aₖ} of vectors in F is said to span F
if every vector in F is a linear combination of a₁, . . . , aₖ; that is, for every x ∈ F
there are scalars t₁, . . . , tₖ such that

    x = t₁a₁ + · · · + tₖaₖ.

The set S is called independent if whenever t₁, . . . , tₖ are scalars such that

    t₁a₁ + · · · + tₖaₖ = 0,

then t₁ = · · · = tₖ = 0.
A basis of F is an ordered set of vectors in F that is independent and which spans
F.
The following basic fact is proved in Appendix I.

Proposition 1  Every vector space F has a basis, and every basis of F has the same
number of elements. If {e₁, . . . , eₖ} ⊂ F is an independent subset that is not a basis,
by adjoining to it suitable vectors eₖ₊₁, . . . , eₘ one can form a basis {e₁, . . . , eₘ}.

The number of elements in a basis of F is called the dimension of F, denoted by
dim F. If {e₁, . . . , eₘ} is a basis of F, then every vector x ∈ F can be expressed

    x = Σᵢ₌₁ᵐ tᵢeᵢ,    tᵢ ∈ R,

since the eᵢ span F. Moreover, the numbers t₁, . . . , tₘ are unique. To see this,
suppose also that

    x = Σᵢ₌₁ᵐ sᵢeᵢ.

Then

    0 = x − x = Σᵢ (tᵢ − sᵢ)eᵢ;

by independence,

    tᵢ − sᵢ = 0,    i = 1, . . . , m.

These numbers t₁, . . . , tₘ are called the coordinates of x in the basis {e₁, . . . , eₘ}.
The standard basis e₁, . . . , eₙ of Rⁿ is defined by

    eᵢ = (0, . . . , 0, 1, 0, . . . , 0);    i = 1, . . . , n,

with 1 in the ith place and 0 elsewhere. This is in fact a basis; for Σ tᵢeᵢ =
(t₁, . . . , tₙ), so {e₁, . . . , eₙ} spans Rⁿ; independence is immediate.
I t is easy to check that Ker A and Im A are subspaces of El and E2,respectively.


A simple but important property of Ker A is this: A is one-to-one if and only if
Ker A = 0. For suppose A is one-to-one and x ∈ Ker A. Then Ax = 0 = A0;
hence x = 0, so 0 is the only element of Ker A. Conversely, suppose Ker A = 0
and Ax = Ay. Then A(x − y) = 0, so x − y ∈ Ker A. Thus x − y = 0, so x = y.
The kernel of a linear map Rn → Rm is connected with linear equations (algebraic,
not differential) as follows. Let A = [aij] be the m × n matrix of the map. Then
x = (x1, . . . , xn) is in Ker A if and only if

a11x1 + · · · + a1nxn = 0,
· · ·
am1x1 + · · · + amnxn = 0.

In other words, (x1, . . . , xn) is a solution to the above system of m linear homogeneous
equations in n unknowns. In this case Ker A is called the solution space of
the system. “Solving” the system means finding a basis for Ker A.
If a linear map T: E → F is both one-to-one and onto, then there is a unique
map S: F → E such that ST(x) = x and TS(y) = y for all x ∈ E, y ∈ F. The
map S is also linear. In this case we call T an isomorphism, and say that E and F
are isomorphic vector spaces.

Proposition 2 Two vector spaces are isomorphic if and only if they have the same
dimension. In particular, every n-dimensional vector space is isomorphic to Rn.

Proof. Suppose E and F are isomorphic, say by an isomorphism T. If {e1, . . . , en}
is a basis for E, it is easy to verify that Te1, . . . , Ten span F (since T is onto)
and are independent (since T is one-to-one). Therefore E and F have the same
dimension, n. Conversely, suppose {e1, . . . , en} and {f1, . . . , fn} are bases for E
and F, respectively. Define T: E → F to be the unique linear map such that
Tei = fi, i = 1, . . . , n: if x = Σ xiei ∈ E, then Tx = Σ xifi. Then T is onto since
the fi span F, and Ker T = 0 since the fi are independent.
The following important proposition is proved in Appendix I.

Proposition 3 Let T: E → F be a linear map. Then

dim(Im T) + dim(Ker T) = dim E.

In particular, suppose dim E = dim F. Then the following are equivalent statements:

(a) Ker T = 0;
(b) Im T = F;
(c) T is an isomorphism.
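Proposition 3 can be checked numerically for any particular matrix. In this sketch (the 3 × 3 matrix is invented and has rank 2), dim(Im T) is computed as the matrix rank and dim(Ker T) as the number of vanishing singular values.

```python
import numpy as np

T = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])    # third column = first + second, so rank 2

dim_E = T.shape[1]
dim_im = int(np.linalg.matrix_rank(T))    # dim(Im T)
s = np.linalg.svd(T, compute_uv=False)
dim_ker = int(np.sum(s < 1e-10))          # dim(Ker T)

rank_nullity_holds = (dim_im + dim_ker == dim_E)
```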
3. LINEAR SYSTEMS: CONSTANT COEFFICIENTS, REAL EIGENVALUES

C. Changes of bases and coordinates

To every basis {e1, . . . , en} of a vector space E we have associated a system of
coordinates as follows: to each vector z ∈ E we assign the unique n-tuple of real
numbers (x1, . . . , xn) such that z = Σ xiei. If we consider xi as a function of z,
we may define a map

φ: E → Rn,  φ(z) = (x1(z), . . . , xn(z)).

This is a linear map; it is in fact the unique linear map sending each basis vector
ei of E into the corresponding standard basis vector of Rn, which we denote here
by ēi.
It is easy to see that φ is an isomorphism (see Proposition 2 of Part B). The
isomorphism φ sends each vector z into its n-tuple of coordinates in the basis
{e1, . . . , en}.
Conversely, let φ: E → Rn be any isomorphism. If (ē1, . . . , ēn) is the standard
basis of Rn, then define ei = φ⁻¹(ēi), i = 1, . . . , n. Then {e1, . . . , en} is a basis of
E, and clearly,

φ(Σ xiei) = (x1, . . . , xn).

In this way we arrive at the following definition: A coordinate system on a vector
space E is an isomorphism φ: E → Rn. (Of course, n = dim E.) The coordinates
of z ∈ E are (x1, . . . , xn), where φ(z) = (x1, . . . , xn). Each coordinate xi is a
linear function xi: E → R.


We thus have three equivalent concepts: a basis of E, a coordinate system on E,
and an isomorphism E → Rn.
Readers familiar with the theory of dual vector spaces (see Chapter 9) will
recognize the coordinate functions xi as forming the basis of E* dual to {e1, . . . , en};
here E* is the "dual space" of E, that is, the vector space of linear maps E → R.
The coordinate functions xi are the unique linear functions E → R such that

xi(ej) = δij,  i = 1, . . . , n; j = 1, . . . , n,

where δij = 0 if i ≠ j and 1 if i = j.
Now we investigate the relations between two bases in E and the two corresponding
coordinate systems.
Let {e1, . . . , en} be a basis of E and (x1, . . . , xn) the corresponding coordinates.
Let φ: E → Rn be the corresponding isomorphism. Let {f1, . . . , fn} be a new basis,
with coordinates (y1, . . . , yn). Let ψ: E → Rn be the corresponding isomorphism.
Each vector fi is a linear combination of the ej; hence we define an n × n matrix:

(3)  P = [pij],  fi = Σj pijej.

Each of the new coordinates yi: E → R is a linear map, and so can be expressed
in terms of the old coordinates (x1, . . . , xn). In this way another n × n matrix is
defined:

(4)  Q = [qij],  yi = Σj qijxj.

In fact, Q is the matrix of the linear operator ψφ⁻¹: Rn → Rn.
How are the matrices P and Q related? To answer this we first relate the bases
with their corresponding coordinates:

(5)  xi(ej) = δij,  i, j = 1, . . . , n;
(6)  yk(fi) = δki,  k, i = 1, . . . , n.

Substituting (4) and (3) into (6):

δki = Σl qkl xl(Σj pijej).

Since xl is a linear function, we have

δki = Σl (Σj qkl pij xl(ej)) = Σl (Σj qkl pij δlj)

by (5). Each term of the internal sum on the right is 0 unless l = j, in which case
it is qkjpij. Thus

δki = Σj qkjpij.

To interpret this formula, introduce the matrix R which is the transpose Pt of
P:

R = [rij],  rij = pji.

Each row of R is the corresponding column of P. Then

δki = Σj qkjrji

tells us that the (k, i)th entry in the matrix QR is δki; in other words,

I = QR.

We finally obtain

I = QPt.

Thus

Q = (Pt)⁻¹ = (P⁻¹)t.

The last equality follows from the identities It = I and (AB)t = BtAt for any
n × n matrices A, B. Hence

I = (PP⁻¹)t = (P⁻¹)tPt,

so (Pt)⁻¹ = (P⁻¹)t.

We have proved:

Proposition 4 The matrix expressing the new coordinates in terms of the old is the
inverse transpose of the matrix expressing the new basis in terms of the old.
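Proposition 4 can be confirmed with a small numerical sketch (the new basis below is invented, and the old basis is the standard one): the rows of P hold the new basis vectors, and applying Q = (Pt)⁻¹ to old coordinates yields new coordinates.

```python
import numpy as np

# Row i of P holds the coordinates of the new basis vector f_i: f_i = sum_j p_ij e_j.
P = np.array([[1.0, 1.0],     # f1 = e1 + e2
              [0.0, 1.0]])    # f2 = e2

Q = np.linalg.inv(P.T)        # Proposition 4: Q = (P^t)^{-1}

x = np.array([3.0, 5.0])      # old (standard) coordinates of a vector z
y = Q @ x                     # its new coordinates

# Check: z is rebuilt from the new coordinates as y1*f1 + y2*f2.
z_rebuilt = y[0] * P[0] + y[1] * P[1]
```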

D. Operators, bases, and matrices

In Part A we associated to an operator T on Rn a matrix [aij] by the rule

(7)  Tej = Σi aijei;  j = 1, . . . , n,

where {e1, . . . , en} is the standard basis of Rn. Equivalently, the ith coordinate
of Tx, x = (x1, . . . , xn), is

(8)  Σj aijxj;  i = 1, . . . , n.

It is useful to represent (8) as the product of an n × n matrix and an n × 1 matrix:

[a11 · · · a1n] [x1]
[ ·         · ] [ · ]
[an1 · · · ann] [xn].
We carry out exactly the same procedure for an operator T: E → E, where E
is any vector space and {e1, . . . , en} is a given basis of E. Namely, (7) defines a
matrix [aij]. The coordinates of Tx for the basis {e1, . . . , en} are computed by (8).
It is helpful to use the following rules in constructing the matrix of an operator
in a given basis:
The jth column of the matrix gives the coordinates of the image of the jth basis
vector, as in (7).
The ith row of the matrix expresses the ith coordinate of the image of x as a linear
function of the coordinates of x, as in (8).
If we think of the coordinates as linear functions xi: E → R, then (8) is expressed
succinctly by

(9)  xiT = Σj aijxj;  i = 1, . . . , n.
This looks very pretty when placed next to (7)! The left side of (9) is the composition

  T    xi
E → E → R.

The right-hand side of (9) is a linear combination of the linear functions x1, . . . , xn.
The meaning of (9) is that the two linear functions on E, expressed by the left and
right sides of (9), are equal.

Now suppose a new system of coordinates (y1, . . . , yn) is introduced in E, corresponding
to a new basis {f1, . . . , fn}. Let B be the matrix of T in the new coordinates.
How is B related to A?
The new coordinates are related to the old ones by an invertible matrix Q = [qij],
as explained in Part C. If z ∈ E is any point, its two sets of coordinates
x = (x1, . . . , xn) and y = (y1, . . . , yn) are related by

y = Qx;  x = Q⁻¹y.

(Here we think of x and y as points in Rn.) The image Tz also has two sets of
coordinates, Ax and By, where B is the matrix of T in the new coordinates. Therefore

By = QAx.

Hence

By = QAQ⁻¹y

for all y ∈ Rn. It follows that

(10)  B = QAQ⁻¹.
This is a basic fact. It is worth restating in terms of the matrix P expressing the
new basis vectors fi in terms of the old basis {e1, . . . , en}:

P = [pij],  fi = Σj pijej.

In Part C we saw that Q is the inverse transpose of P. Therefore

(11)  B = (Pt)⁻¹APt.

The matrix Pt can be described as follows: the ith column of Pt consists of the
coordinates of the new basis vector fi in the old basis {e1, . . . , en}. Observe that in
(10) and (11) the inverse signs −1 appear in different places.
Two n X n matrices B and A related as in (10) by some invertible matrix Q are
called similar. This is a basic equivalence relation on matrices. Two matrices are
similar if and only if they represent the same operator in different bases. Any matrix
property that is preserved under similarity is a property of the underlying linear
transformation. One of the main goals of linear algebra is to discover criteria for
the similarity of matrices.
We also call two operators S, T ∈ L(E) similar if T = QSQ⁻¹ for some invertible
operator Q ∈ L(E). This is equivalent to similarity of their matrices. Similar
operators define differential equations that have the same dynamical properties.
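Formula (10) is easy to test numerically. In the sketch below (matrix and basis invented), B = QAQ⁻¹ carries the new coordinates of a point z to the new coordinates of Tz, which is precisely what representing the same operator in different bases means.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # matrix of T in the old basis
Pt = np.array([[1.0, 1.0],
               [1.0, 2.0]])         # columns: new basis vectors f1, f2
Q = np.linalg.inv(Pt)               # coordinate change, Q = (P^t)^{-1}

B = Q @ A @ np.linalg.inv(Q)        # formula (10); equals (P^t)^{-1} A P^t

# If x holds old coordinates of z, then Qx holds its new coordinates,
# and B(Qx) must equal the new coordinates Q(Ax) of Tz.
x = np.array([1.0, -2.0])
same_operator = np.allclose(B @ (Q @ x), Q @ (A @ x))
```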

E. Determinant, trace, and rank

We recall briefly the main properties of the determinant function

Det: Mn → R,

where Mn is the set of n × n matrices:


D1: Det(AB) = (Det A)(Det B),
D2: Det I = 1,
D3: Det A ≠ 0 if and only if A is invertible.

There is a unique function Det having these three properties; it is discussed in more
detail in the appendix. For a 1 × 1 matrix A = [a], Det A = a. For a 2 × 2 matrix

A = [a b]
    [c d],

Det A = ad − bc.
From D1 and D2 it follows that if A⁻¹ exists, then

Det(A⁻¹) = (Det A)⁻¹.

From D1 we then obtain

Det(QAQ⁻¹) = Det A.

In other words, similar matrices have the same determinant. We may therefore define
the determinant of an operator T: E → E to be the determinant of any matrix
representing T.
For n = 1, the determinant of T: R1 → R1 is the factor by which T multiplies
lengths, except possibly for sign; similarly for R2 and areas, R3 and volumes.
If A is a triangular matrix (aij = 0 for i > j, or aij = 0 for i < j), then Det A =
a11 · · · ann, the product of the diagonal elements.
From D3 we deduce:

Proposition 5 Let A be an operator. Then the following statements are equivalent:

(a) Det A ≠ 0;
(b) Ker A = 0;
(c) A is one-to-one;
(d) A is onto;
(e) A is invertible.

In particular, Det A = 0 if and only if Ax = 0 for some vector x ≠ 0.

Another important similarity invariant is the trace of a matrix A = [aij]:

Tr A = Σi aii,

the sum of the diagonal elements. A computation shows that

Tr(AB) = Tr(BA),

and hence

Tr(QAQ⁻¹) = Tr(Q⁻¹QA) = Tr(A).

Therefore we can define the trace of an operator to be the trace of any matrix representing
it. It is not easy to interpret the trace geometrically.
Note that

Tr(A + B) = Tr(A) + Tr(B).
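That Det and Tr are similarity invariants can be spot-checked numerically; below, a random matrix is conjugated by a (random, almost surely invertible) matrix Q.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)   # invertible in practice

B = Q @ A @ np.linalg.inv(Q)                        # a matrix similar to A

det_invariant = np.isclose(np.linalg.det(B), np.linalg.det(A))
trace_invariant = np.isclose(np.trace(B), np.trace(A))
```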
The rank of an operator is defined to be the dimension of its image. Since every
n × n matrix defines an operator on Rn, we can define the rank of a matrix A to be
the rank of the corresponding operator T. Rank is invariant under similarity.
The vector space Im T is spanned by the images under T of the standard basis
vectors e1, . . . , en. Since Tej is the n-tuple that is the jth column of A, it follows that
the rank of A equals the maximum number of independent columns of A.
This gives a practical method for computing the rank of an operator T. Let A
be an n × n matrix representing T in some basis. Denote the jth column of A by
cj, thought of as an n-tuple of numbers, that is, an element of Rn. The rank of T
equals the dimension of the subspace of Rn spanned by c1, . . . , cn. This subspace is
also spanned by c1, . . . , cj−1, cj + λck, cj+1, . . . , cn; λ ∈ R, k ≠ j. Thus we may
replace any column cj of A by cj + λck, for any λ ∈ R, k ≠ j. In addition, the order
of the columns can be changed without altering the rank. By repeatedly transforming
A in these two ways we can change A to the form

B = [D 0]
    [C 0],

where D is an r × r diagonal matrix whose diagonal entries are different from zero,
C has n − r rows and r columns, and all other entries are 0. It is easy to see
that the rank of B, and hence of A, is r.
From Proposition 3 (Part B) it follows that an operator on an n-dimensional
vector space is invertible if and only if it has rank n.

F. Direct sum decomposition

Let E1, . . . , Er be subspaces of E. We say E is the direct sum of them if every
vector x in E can be expressed uniquely:

x = x1 + · · · + xr,  xi ∈ Ei,  i = 1, . . . , r.

This is denoted

E = E1 ⊕ · · · ⊕ Er = ⊕_{i=1}^r Ei.

Let T: E → E and Ti: Ei → Ei, i = 1, . . . , r, be operators. We say that T is
the direct sum of the Ti if E = E1 ⊕ · · · ⊕ Er, each Ei is invariant under T, that
is, T(Ei) ⊂ Ei, and Tx = Tix if x ∈ Ei. We denote this situation by T =
T1 ⊕ · · · ⊕ Tr. If Tj has the matrix Aj in some basis for each Ej, then by taking
the union of the basis elements of the Ej to obtain a basis for E, T has the matrix

A = diag{A1, . . . , Ar}.

This means the matrices Aj are put together corner-to-corner diagonally as indicated,
all other entries in A being zero. (We adopt the convention that the blank
entries in a matrix are zeros.)
For direct sums of operators there is the useful formula

Det(T1 ⊕ · · · ⊕ Tr) = (Det T1) · · · (Det Tr),

and the equivalent matrix formula

Det diag{A1, . . . , Ar} = (Det A1) · · · (Det Ar).

Also,

Tr(T1 ⊕ · · · ⊕ Tr) = Tr(T1) + · · · + Tr(Tr),

and

Tr diag{A1, . . . , Ar} = Tr(A1) + · · · + Tr(Ar).

We identify the Cartesian product of Rm and Rn with Rm+n in the obvious way.
If E ⊂ Rm and F ⊂ Rn are subspaces, then E × F is a subspace of Rm+n under
this identification. Thus the Cartesian product of two vector spaces is a vector space.
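The determinant and trace formulas for direct sums can be confirmed on a small invented example; the blocks are placed corner-to-corner and every other entry is zero.

```python
import numpy as np

A1 = np.array([[1.0, 2.0],
               [3.0, 4.0]])
A2 = np.array([[5.0]])

n1, n2 = A1.shape[0], A2.shape[0]
A = np.zeros((n1 + n2, n1 + n2))    # diag{A1, A2}
A[:n1, :n1] = A1
A[n1:, n1:] = A2

det_formula = np.isclose(np.linalg.det(A), np.linalg.det(A1) * np.linalg.det(A2))
trace_formula = np.isclose(np.trace(A), np.trace(A1) + np.trace(A2))
```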

§2. Real Eigenvalues

Let T be an operator on a vector space E. A nonzero vector x ∈ E is called a
(real) eigenvector if Tx = αx for some real number α. This α is called a real eigenvalue;
we say x belongs to α.
Eigenvalues and eigenvectors are very important. Many problems in physics
and other sciences, as well as in mathematics, are equivalent to the problem of
finding eigenvectors of an operator. Moreover, eigenvectors can often be used to
find an especially simple matrix for an operator.
The condition that α is a real eigenvalue of T means that the kernel of the operator

T − αI: E → E

is nontrivial. This kernel is called the α-eigenspace of T; it consists of all eigenvectors
belonging to α together with the 0 vector.
To find the real eigenvalues of T we must find all real numbers λ such that

(1)  Det(T − λI) = 0.

(See Part E of the previous section.) To do this let A be a matrix representing T. Then
(1) is equivalent to

(2)  Det(A − λI) = 0.

We consider λ as an indeterminate (that is, an “unknown number”) and compute
the left-hand side of (2) (see Appendix I). The result is a polynomial p(λ) in λ,
called the characteristic polynomial of A. Thus the real eigenvalues of T are exactly
the real roots of p(λ). Actually, p(λ) is independent of the basis, for if B is
the matrix of T in another basis, then

B = QAQ⁻¹

for some invertible n × n matrix Q (Section 1, Part D). Hence

Det(B − λI) = Det(QAQ⁻¹ − λI) = Det(Q(A − λI)Q⁻¹) = Det(A − λI)

(Section 1, Part E). We therefore call p(λ) the characteristic polynomial of the
operator T. Note that the degree of p(λ) is the dimension of E.
A complex root of the characteristic polynomial is called a complex eigenvalue
of T. These will be considered in Section 4.
Once a real eigenvalue α has been found, the eigenvectors belonging to α are
found by solving the equation

(3)  (A − αI)x = 0.

By (2) there must exist a nonzero solution vector x. The solution space of (3) is
exactly the α-eigenspace.
Example. Consider the operator

A = [ 5  3]
    [−6 −4]

on R2, used to describe a differential equation (4) in Chapter 1. The characteristic
polynomial is

Det [5 − λ     3   ]  = (λ − 2)(λ + 1).
    [ −6    −4 − λ]

The eigenvalues are therefore 2 and −1. The eigenvectors belonging to 2 are solutions
of the equation (A − 2I)x = 0, or

[ 3  3] x = 0,
[−6 −6]

which, in coordinates, is

3x1 + 3x2 = 0,
−6x1 − 6x2 = 0.

The solutions are

x1 = t,  x2 = −t;  t ∈ R.

Thus the vector

f1 = (1, −1) ∈ R2

is a basis for the eigenspace belonging to the eigenvalue 2.
The −1 eigenspace comprises the solutions of

(A + I)x = 0,

or

[ 6  3] x = 0.
[−6 −3]

This matrix equation is equivalent to the pair of scalar equations

6x1 + 3x2 = 0,
−6x1 − 3x2 = 0.

It is clear that (−1, 2) is a basis for the solution space. Therefore the vector
f2 = (−1, 2) ∈ R2 is a basis for the (−1)-eigenspace of T.
The two vectors

f1 = (1, −1),  f2 = (−1, 2)

form a new basis {f1, f2} for R2. In this basis T has the diagonal matrix

[2  0]
[0 −1].

Note that any vector x = (x1, x2) in R2 can be written in the form y1f1 + y2f2;
then x = (y1 − y2, −y1 + 2y2), using the definition of the fi. Therefore (y1, y2) are
the coordinates of x in the new basis. Thus

x = By,  B = [ 1 −1]
             [−1  2].

This is how the diagonalizing change of coordinates was found in Section 1 of Chapter 1.
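The computations of this example are easily confirmed numerically:

```python
import numpy as np

A = np.array([[5.0, 3.0],
              [-6.0, -4.0]])        # the operator of the example

f1 = np.array([1.0, -1.0])          # eigenvector belonging to 2
f2 = np.array([-1.0, 2.0])          # eigenvector belonging to -1

ok1 = np.allclose(A @ f1, 2.0 * f1)
ok2 = np.allclose(A @ f2, -1.0 * f2)

# In the basis (f1, f2) the operator is diagonal: S^{-1} A S = diag(2, -1),
# where the columns of S are f1 and f2.
S = np.column_stack([f1, f2])
D = np.linalg.inv(S) @ A @ S
```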
Example. Let T have the matrix

[ 0 1]
[−1 0].

The characteristic polynomial is

λ² + 1,

which has no real roots. Hence T has no real eigenvalues.



If a real eigenvalue α is known, the eigenvectors belonging to α are found as
follows. Let A be the matrix of T in a basis ℬ. The matrix equation (A − αI)x = 0
is equivalent to the system of linear equations

(a11 − α)x1 + a12x2 + · · · + a1nxn = 0,
a21x1 + (a22 − α)x2 + · · · + a2nxn = 0,
· · ·
an1x1 + · · · + an,n−1xn−1 + (ann − α)xn = 0.

The vanishing of Det(A − αI) guarantees a nonzero solution x = (x1, . . . , xn).
Such a solution is an eigenvector for α, expressed in the basis ℬ.
A very fortunate situation occurs when E has a basis {f1, . . . , fn} such that each
fi is an eigenvector of T. For the matrix of T in this basis is just the diagonal matrix
D = diag{a1, . . . , an}, that is,

D = [a1         ]
    [   a2      ]
    [      ·    ]
    [        an ],

all other entries being 0. We say T is diagonalizable.
It is very easy to compute with D. For example, if x ∈ E has components
(x1, . . . , xn), that is, x = Σ xifi, then Tx = (a1x1, . . . , anxn). The kth power
Dk = D · · · D (k factors) is just diag{a1^k, . . . , an^k}.
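The rule for powers of a diagonal matrix can be confirmed in one line (the entries below are invented):

```python
import numpy as np

D = np.diag([2.0, -1.0, 3.0])

# D^k is again diagonal, with each diagonal entry raised to the kth power.
k = 5
Dk = np.linalg.matrix_power(D, k)
expected = np.diag([2.0**k, (-1.0)**k, 3.0**k])
```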

An important criterion for diagonalizability is the following.

Theorem 1 Let T be an operator on an n-dimensional vector space E. If the characteristic
polynomial of T has n distinct real roots, then T can be diagonalized.

Proof. Let e1, . . . , en be eigenvectors corresponding to distinct eigenvalues
a1, . . . , an. If e1, . . . , en do not form a basis for E, order them so that e1, . . . , em
is a maximal independent subset, m < n. Then en = Σ_{j=1}^m tjej; and

0 = (T − anI)en = Σ_{j=1}^m tj(T − anI)ej
                = Σ_{j=1}^m tj(Tej − anej)
                = Σ_{j=1}^m tj(aj − an)ej.

Since e1, . . . , em are independent,

tj(aj − an) = 0,  j = 1, . . . , m.

Since aj ≠ an by assumption, each tj = 0. Therefore en = 0, contradicting en
being an eigenvector. Hence {e1, . . . , en} is a basis, so T is diagonalizable.

The following theorem interprets Theorem 1 in the language of matrices.

Theorem 2 Let A be an n × n matrix having n distinct real eigenvalues λ1, . . . , λn.
Then there exists an invertible n × n matrix Q such that

QAQ⁻¹ = diag{λ1, . . . , λn}.

Proof. Let (e1, . . . , en) be the standard basis in Rn with corresponding coordinates
(x1, . . . , xn). Let T be the operator on Rn whose matrix in the standard
basis is A. Suppose {f1, . . . , fn} is a basis of eigenvectors of T, so that Afj =
λjfj, j = 1, . . . , n. Put fj = (fj1, . . . , fjn). If Q is the inverse of the matrix whose
jth column is fj, then QAQ⁻¹ is the matrix of T in the basis {f1, . . . , fn}, as shown
in Part D of Section 1. But this matrix is diag{λ1, . . . , λn}.

We will often use the expression “A has real distinct eigenvalues” for the hypothesis
of Theorems 1 and 2.
Another useful condition implying diagonalizability is that an operator have a
symmetric matrix (aij = aji) in some basis; see Chapter 9.
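Theorem 2 is essentially what a numerical eigenvalue routine delivers. In the sketch below (the matrix is an invented example with distinct real eigenvalues), np.linalg.eig returns the eigenvalues and a matrix V whose columns are eigenvectors; taking Q = V⁻¹ diagonalizes A as in the theorem.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # eigenvalues 1 and 3, distinct and real

lams, V = np.linalg.eig(A)           # columns of V are eigenvectors
Q = np.linalg.inv(V)

diagonalized = Q @ A @ np.linalg.inv(Q)
is_diagonal = np.allclose(diagonalized, np.diag(lams))
```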
Let us examine a general operator T on R2 for diagonalizability. Let the matrix be

[a b]
[c d];

the characteristic polynomial pT(λ) is

Det [a − λ    b  ]  = (a − λ)(d − λ) − bc
    [  c    d − λ]
                    = λ² − (a + d)λ + (ad − bc).

Notice that a + d is the trace Tr and ad − bc is the determinant Det. The roots
of pT(λ), and hence the eigenvalues of T, are therefore

½[Tr ± (Tr² − 4 Det)^{1/2}].

The roots are real and distinct if Tr² − 4 Det > 0; they are nonreal complex conjugates
if Tr² − 4 Det < 0; and there is only one root, necessarily real, if Tr² −
4 Det = 0. Therefore T is diagonalizable if Tr² − 4 Det > 0. The remaining case,
Tr² − 4 Det = 0, is ambiguous. If T is diagonalizable, the diagonal elements are
eigenvalues. If pT has only one root α, then T has the matrix

[α 0]
[0 α].

Hence T = αI. But this means any matrix for T is diagonal (not just diagonalizable)!
Therefore when Tr² − 4 Det = 0, either every matrix for T, or no matrix for T, is
diagonal. The operator represented by

[1 1]
[0 1]

cannot be diagonalized, for example.
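The classification by the sign of Tr² − 4 Det packages neatly into a small helper (the function name is ours, not the text's):

```python
def eigenvalue_type(a, b, c, d):
    """Classify the eigenvalues of the 2 x 2 matrix [[a, b], [c, d]]."""
    tr = a + d
    det = a * d - b * c
    disc = tr * tr - 4 * det          # Tr^2 - 4 Det
    if disc > 0:
        return "real and distinct"    # diagonalizable
    if disc < 0:
        return "complex conjugates"   # no real eigenvalues
    return "one real root"            # the ambiguous case
```

For instance, the example operator [5 3; −6 −4] above has Tr = 1 and Det = −2, so its eigenvalues are real and distinct.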

§3. Differential Equations with Real, Distinct Eigenvalues

We use the results of Section 2 to prove an important result.

Theorem 1 Let A be an operator on Rn having n distinct, real eigenvalues. Then
for all x0 ∈ Rn, the linear differential equation

(1)  x′ = Ax;  x(0) = x0,

has a unique solution.
Proof. Theorem 2 of Section 2 implies the existence of an invertible matrix Q
such that the matrix QAQ⁻¹ is diagonal:

QAQ⁻¹ = diag{λ1, . . . , λn} = B,

where λ1, . . . , λn are the eigenvalues of A. Introducing the new coordinates y = Qx
in Rn, with x = Q⁻¹y, we find

y′ = Qx′ = QAx = QA(Q⁻¹y),

so

(2)  y′ = By.

Since B is diagonal, this means

(2′)  yi′ = λiyi;  i = 1, . . . , n.

Thus (2) is an uncoupled form of (1). We know that (2′) has unique solutions for
every initial condition yi(0):

yi(t) = yi(0) exp(tλi).

To solve (1), put y(0) = Qx0. If y(t) is the corresponding solution of (2), then
the solution of (1) is

x(t) = Q⁻¹y(t).

More explicitly,

x(t) = Q⁻¹(y1(0) exp(tλ1), . . . , yn(0) exp(tλn)).

Differentiation shows that

x′ = Q⁻¹y′ = Q⁻¹By = Q⁻¹(QAQ⁻¹)y = AQ⁻¹y = Ax.

Moreover,

x(0) = Q⁻¹y(0) = Q⁻¹Qx0 = x0.

Thus x(t) really does solve (1).
To prove that there are no other solutions to (1), we note that x(t) is a solution
of (1) if and only if Qx(t) is a solution of

(3)  y′ = By,  y(0) = Qx0.

Hence two different solutions of (1) would lead to two different solutions of (3),
which is impossible since B is diagonal. This proves the theorem.
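The proof's recipe x(t) = Q⁻¹y(t) can be sketched directly in code (the function name is ours; np.linalg.eig supplies a basis of eigenvectors, so Q is taken to be the inverse of the eigenvector matrix). The result is checked against x′ = Ax with a centered finite difference.

```python
import numpy as np

def solve_linear_ode(A, x0, t):
    """x(t) = Q^{-1} (y_i(0) exp(t*lam_i)), assuming distinct real eigenvalues."""
    lams, V = np.linalg.eig(A)       # columns of V are eigenvectors of A
    Q = np.linalg.inv(V)             # y = Qx uncouples the system
    y0 = Q @ x0
    y_t = y0 * np.exp(t * lams)      # solves y_i' = lam_i * y_i
    return V @ y_t                   # back to the original coordinates

A = np.array([[5.0, 3.0],
              [-6.0, -4.0]])         # eigenvalues 2 and -1
x0 = np.array([1.0, 1.0])

# Centered finite-difference check that x' = A x at a sample time.
t, h = 0.3, 1e-6
deriv = (solve_linear_ode(A, x0, t + h) - solve_linear_ode(A, x0, t - h)) / (2 * h)
residual = np.abs(deriv - A @ solve_linear_ode(A, x0, t)).max()
```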

It is important to observe that the proof is constructive; it actually shows how
to find solutions in any specific case. For the proof of Theorem 1 of Section 2
shows how to find the diagonalizing coordinate change Q (or Q⁻¹). We review this
procedure.
First, find the eigenvalues of A by finding the roots of the characteristic polynomial
of A. (This, of course, may be very difficult.) For each eigenvalue λi find a
corresponding eigenvector fi by solving the system of linear equations corresponding
to the vector equation

(A − λiI)fi = 0.

(This is purely mechanical but may take a long time if n is large.) Write out each
eigenvector fi in coordinates:

fi = (pi1, . . . , pin),

obtaining a matrix P = [pij]. Then the yi are defined by the equation

(4)  xj = Σi pijyi;  j = 1, . . . , n,

or

x = Pty.

Note the order of the subscripts in (4)! The ith column of Pt consists of the coordinates
of fi. The matrix Q in the proof is the inverse of Pt. However, for some purposes
it is not necessary to compute Q.
In the new coordinates the original differential equation becomes

(5)  yi′ = λiyi,  i = 1, . . . , n,

so the general solution is

yi(t) = ai exp(tλi);  i = 1, . . . , n,

where a1, . . . , an are arbitrary constants, ai = yi(0). The general solution to the
original equation is found from (4):

(6)  xj(t) = Σi pijai exp(tλi);  j = 1, . . . , n.

This substitution is most easily done by the matrix multiplication

x(t) = Pty(t),

writing x(t) and y(t) as column vectors,

y(t) = (a1 exp(tλ1), . . . , an exp(tλn)).

To find a solution x(t) with a specified initial value

x(0) = u = (u1, . . . , un),

one substitutes t = 0 in (6), equates the right-hand side to u, and solves the resulting
system of linear algebraic equations for the unknowns (a1, . . . , an):

(7)  Σi pijai = uj;  j = 1, . . . , n.

This is equivalent to the matrix equation

Pta = u;  a = (a1, . . . , an).

Thus a = (Pt)⁻¹u. Another way of saying this is that the initial value x(0) = u
corresponds to the initial value y(0) = (Pt)⁻¹u of (5). If one is interested only in a
specific vector u, it is easier to solve (7) directly than to invert the matrix Pt.
Here is a simple example. Find the general solution to the system

(8)  x1′ = x1,
     x2′ = x1 + 2x2,
     x3′ = x1 − x3.

The corresponding matrix is

A = [1 0  0]
    [1 2  0]
    [1 0 −1].

Since A is triangular,

Det(A − λI) = (1 − λ)(2 − λ)(−1 − λ).

Hence the eigenvalues are 1, 2, −1. They are real and distinct, so the theorem
applies.
The matrix B is

B = diag{1, 2, −1}.

In the new coordinates the equivalent differential equation is

y1′ = y1,
y2′ = 2y2,
y3′ = −y3,

which has the solution

y1(t) = ae^t,
y2(t) = be^{2t},
y3(t) = ce^{−t};  a, b, c arbitrary constants.

To relate the old and new coordinates we must find three eigenvectors f1, f2, f3
of A belonging respectively to the eigenvalues 1, 2, −1. The second column of A
shows that we can take

f2 = (0, 1, 0),

and the third column shows that we may take

f3 = (0, 0, 1).

To find f1 = (u1, u2, u3) we must solve the vector equation

(A − I)f1 = 0,

or

[0 0  0] [u1]
[1 1  0] [u2] = 0;
[1 0 −2] [u3]

this leads to the numerical equations

u1 + u2 = 0,
u1 − 2u3 = 0.

Any nonzero solution will do; we take u1 = 2, u2 = −2, u3 = 1. Thus

f1 = (2, −2, 1).

The matrix Pt has for its columns the triples f1, f2, f3:

Pt = [ 2 0 0]
     [−2 1 0]
     [ 1 0 1].

From x = Pty we have

x1 = 2y1,
x2 = −2y1 + y2,
x3 = y1 + y3;

hence

(9)  x1(t) = 2ae^t,
     x2(t) = −2ae^t + be^{2t},
     x3(t) = ae^t + ce^{−t},

where a, b, c are arbitrary constants.
The reader should verify that (9) is indeed a solution to (8).
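That verification can also be carried out numerically; the constants a, b, c below are arbitrary.

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0],
              [1.0, 0.0, -1.0]])     # the matrix of system (8)

a, b, c = 1.3, -0.7, 2.0             # arbitrary constants

def x(t):                            # the general solution (9)
    return np.array([2 * a * np.exp(t),
                     -2 * a * np.exp(t) + b * np.exp(2 * t),
                     a * np.exp(t) + c * np.exp(-t)])

def x_prime(t):                      # its derivative, term by term
    return np.array([2 * a * np.exp(t),
                     -2 * a * np.exp(t) + 2 * b * np.exp(2 * t),
                     a * np.exp(t) - c * np.exp(-t)])

# x'(t) = A x(t) should hold identically; sample several times.
residual = max(np.abs(x_prime(t) - A @ x(t)).max() for t in (-1.0, 0.0, 0.5, 2.0))
```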
To solve an initial value problem for (8), with

xi(0) = ui;  i = 1, 2, 3,

we must select a, b, c appropriately.
From (9) we find

x1(0) = 2a,
x2(0) = −2a + b,
x3(0) = a + c.

Thus we must solve the linear system

(10)  2a = u1,
      −2a + b = u2,
      a + c = u3,

for the unknowns a, b, c. This amounts to inverting the matrix of coefficients of the
left-hand side of (10), which is exactly the matrix Pt. For particular values of u1, u2,
u3, it is easier to solve (10) directly.
This procedure can, of course, be used for more general initial values, x(t0) = u.
The following observation is an immediate consequence of the proof of Theorem 1.

Theorem 2 Let the n × n matrix A have n distinct real eigenvalues λ1, . . . , λn.
Then every solution to the differential equation

x′ = Ax,  x(0) = u,

is of the form

xi(t) = ci1 exp(tλ1) + · · · + cin exp(tλn);  i = 1, . . . , n,

for unique constants ci1, . . . , cin depending on u.

By using this theorem we get much information about the general character of
the solutions directly from the knowledge of the eigenvalues, without explicitly
solving the differential equation. For example, if all the eigenvalues are negative,
evidently

lim_{t→∞} x(t) = 0

for every solution x(t), and conversely. This aspect of linear equations will be
investigated in later chapters.
Theorem 2 leads to another method of solution of (1). Regard the coefficients cij
as unknowns; set

xi(t) = Σj cij exp(tλj);  i = 1, . . . , n,

and substitute this into

x′ = Ax,  x(0) = u.

Then equate coefficients of exp(tλj) and solve for the cij. There results a system
of linear algebraic equations for the cij which can always be satisfied provided
λ1, . . . , λn are real and distinct. This is the method of “undetermined coefficients.”
As an example we consider the same system as before,

x1′ = x1,
x2′ = x1 + 2x2,
x3′ = x1 − x3,

with the initial condition

x(0) = (1, 0, 0).

The eigenvalues are λ1 = 1, λ2 = 2, λ3 = −1. Our solution must be of the form

x1(t) = c11e^t + c12e^{2t} + c13e^{−t};
x2(t) = c21e^t + c22e^{2t} + c23e^{−t};
x3(t) = c31e^t + c32e^{2t} + c33e^{−t}.

Then from x1′(t) = x1 we obtain

c11e^t + 2c12e^{2t} − c13e^{−t} = c11e^t + c12e^{2t} + c13e^{−t}

for all values of t. This is possible only if

c12 = c13 = 0.

(Differentiate and set t = 0.) From x2′ = x1 + 2x2 we get

c21e^t + 2c22e^{2t} − c23e^{−t} = (c11 + 2c21)e^t + (c12 + 2c22)e^{2t} + (c13 + 2c23)e^{−t}.

Therefore

c21 = c11 + 2c21,
2c22 = c12 + 2c22,
−c23 = c13 + 2c23,

which reduces to

c21 = −c11,
c23 = 0.

From x3′ = x1 − x3 we obtain

c31e^t + 2c32e^{2t} − c33e^{−t} = (c11 − c31)e^t + (c12 − c32)e^{2t} + (c13 − c33)e^{−t}.

Therefore

c31 = c11 − c31,
2c32 = c12 − c32,
−c33 = c13 − c33,

which boils down to

c31 = ½c11,
c32 = 0.

Without using the initial condition yet, we have found

x1(t) = c11e^t,
x2(t) = −c11e^t + c22e^{2t},
x3(t) = ½c11e^t + c33e^{−t},

which is equivalent to (9). From (x1(0), x2(0), x3(0)) = (1, 0, 0) we find

c11 = 1,  c22 = 1,  c33 = −½.

The solution is therefore

x(t) = (e^t, −e^t + e^{2t}, ½e^t − ½e^{−t}).
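This particular solution is also easy to check numerically against the system and the initial condition:

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0],
              [1.0, 0.0, -1.0]])

def x(t):
    return np.array([np.exp(t),
                     -np.exp(t) + np.exp(2 * t),
                     0.5 * np.exp(t) - 0.5 * np.exp(-t)])

initial_ok = np.allclose(x(0.0), [1.0, 0.0, 0.0])

# Centered finite-difference check of x' = A x at a sample time.
t, h = 0.7, 1e-6
deriv = (x(t + h) - x(t - h)) / (2 * h)
residual = np.abs(deriv - A @ x(t)).max()
```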
We remark that the conclusion of Theorem 2 is definitely false for some operators
with real, repeated eigenvalues. Consider the operator A = [1 0; 1 1], whose only
eigenvalue is 1, and the system x′ = Ax:

(11)  x1′ = x1,
      x2′ = x1 + x2.

Obviously, x1(t) = ae^t, a = constant, but there is no constant b such that x2(t) =
be^t is a solution to (11). The reader can verify that in fact a solution is

x1(t) = ae^t,
x2(t) = e^t(at + b),

a and b being arbitrary constants. All solutions have this form; see Problem 3.

PROBLEMS

1. Solve the following initial value problems:

(a) x′ = −x, y′ = x + 2y;  x(0) = 0, y(0) = 3.
(b) x1′ = 2x1 + x2, x2′ = x1 + x2;  x1(1) = 1, x2(1) = 1.
(c) x′ = Ax;  x(0) = (3, 0).
(d) x′ = Ax;  x(0) = (0, −b, b).
2. Find a 2 × 2 matrix A such that one solution to x′ = Ax is

x(t) = (e^{2t} − e^{−t}, e^{2t} + 2e^{−t}).
3. Show that the only solution to

x1′ = x1,
x2′ = x1 + x2;  x1(0) = a, x2(0) = b,

is

x1(t) = ae^t,
x2(t) = e^t(b + at).

(Hint: If (y1(t), y2(t)) is another solution, consider the functions e^{−t}y1(t),
e^{−t}y2(t).)
4. Let an operator A have real, distinct eigenvalues. What condition on the eigenvalues
is equivalent to lim_{t→∞} |x(t)| = ∞ for every solution x(t) to x′ = Ax?
5. Suppose the n × n matrix A has real, distinct eigenvalues. Let t → φ(t, x0)
be the solution to x′ = Ax with initial value φ(0, x0) = x0.
(a) Show that for each fixed t,

lim_{y0→x0} φ(t, y0) = φ(t, x0).

This means solutions are continuous in initial conditions. (Hint: Suppose
A is diagonal.)
(b) Improve (a) by finding constants A ≥ 0, k ≥ 0 such that

|φ(t, y0) − φ(t, x0)| ≤ Ae^{kt}|y0 − x0|.

(Hint: Theorem 2.)

6. Consider a second order differential equation

(*)  x″ + bx′ + cx = 0;  b and c constant.

(a) By examining the equivalent first order system

x′ = y,
y′ = −cx − by,

show that if b² − 4c > 0, then (*) has a unique solution x(t) for every
initial condition of the form

x(0) = u,  x′(0) = v.

(b) If b² − 4c > 0, what assumption about b and c ensures that

lim_{t→∞} x(t) = 0

for every solution x(t)?
(c) Sketch the graphs of the three solutions of

x″ − 3x′ + 2x = 0

for the initial conditions

x(0) = 1,  x′(0) = −1, 0, 1.

7. Let a 2 × 2 matrix A have real, distinct eigenvalues λ, μ. Suppose an eigenvector
of λ is (1, 0) and an eigenvector of μ is (1, 1). Sketch the phase portraits
of x′ = Ax for the following cases:

(a) 0 < λ < μ;  (b) 0 < μ < λ;  (c) λ < μ < 0;
(d) λ < 0 < μ;  (e) λ = 0, μ > 0.

§4. Complex Eigenvalues

A class of operators that have no real eigenvalues consists of the planar operators
Ta,b: R2 → R2 represented by matrices of the form

Aa,b = [a −b]
       [b  a],  b ≠ 0.

The characteristic polynomial is

λ² − 2aλ + (a² + b²),

whose roots are

a + ib,  a − ib;  i = √−1.

We interpret Ta,b geometrically as follows. Introduce the numbers r, θ by

r = (a² + b²)^{1/2};  θ = arccos(a/r).

Then: provided b > 0, T_{a,b} is a counterclockwise rotation through θ radians followed by a stretching (or shrinking) of the length of each vector by a factor of r. That is, if R_θ denotes rotation through θ radians, then

    T_{a,b}(x) = rR_θ(x) = R_θ(rx).
To see this, first observe that

    a = r cos θ,  b = r sin θ.

In the standard basis, the matrix of R_θ is

    [ cos θ  -sin θ ]
    [ sin θ   cos θ ];

the matrix of scalar multiplication by r is rI = [ r 0; 0 r ]. The equality

    [ a  -b ]   [ r  0 ] [ cos θ  -sin θ ]
    [ b   a ] = [ 0  r ] [ sin θ   cos θ ]

yields our assertion.
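Both this decomposition and its effect on powers (used just below) are easy to confirm numerically. The following Python sketch checks T_{a,b} = rR_θ and the pth-power formula for arbitrary sample values of a, b, and p (these numbers are illustrative choices, not from the text):

```python
import math

# Check T_{a,b} = [[a, -b], [b, a]] = r * R_theta for sample a, b with b > 0,
# and the p-th power formula (T_{a,b})^p = r^p * R_{p*theta}.
a, b, p = 1.0, 2.0, 5
r = math.sqrt(a*a + b*b)
theta = math.acos(a / r)          # since b > 0, sin(theta) = b/r as well

def rot_stretch(rho, ang):
    # matrix of "rotate by ang, then stretch by rho"
    return [[rho*math.cos(ang), -rho*math.sin(ang)],
            [rho*math.sin(ang),  rho*math.cos(ang)]]

def matmul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[a, -b], [b, a]]
assert all(abs(A[i][j] - rot_stretch(r, theta)[i][j]) < 1e-12
           for i in range(2) for j in range(2))

Ap = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(p):                # A^p by repeated multiplication
    Ap = matmul(Ap, A)
predicted = rot_stretch(r**p, p*theta)
assert all(abs(Ap[i][j] - predicted[i][j]) < 1e-8
           for i in range(2) for j in range(2))
```

With a = 1, b = 2, the fifth power corresponds to (1 + 2i)^5 = 41 − 38i, which is what the assertions verify.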
There is another algebraic interpretation of T_{a,b}. Identify the plane R2 with the field of complex numbers under the identification

    (x, y) ↔ x + iy.

Then with this identification, the operator T_{a,b} corresponds to multiplication by a + ib:

    (x, y)               ↔  x + iy
    operate by T_{a,b}       multiply by a + ib
    (ax − by, bx + ay)   ↔  (ax − by) + i(bx + ay)

Notice also that r is the norm (absolute value) of a + bi and θ is its argument. Readers familiar with complex functions will recall the formula a + ib = re^{iθ} (see Appendix I).
The geometric interpretation of T_{a,b} makes it easy to compute with. For example, to compute the pth power of T_{a,b}:

    (T_{a,b})^p = (rI)^p (R_θ)^p = (r^p I)(R_{pθ})

                [ r^p cos pθ   -r^p sin pθ ]
              = [ r^p sin pθ    r^p cos pθ ].
Next, we consider the operator T on R2 whose matrix is

    [ 0  -2 ]
    [ 1   2 ].

The characteristic polynomial is λ² − 2λ + 2, whose roots are

    1 + i,  1 − i.
T does not correspond to multiplication by a complex number since its matrix is not of the form A_{a,b}. But it is possible to introduce new coordinates in R2 (that is, to find a new basis) giving T a matrix A_{a,b}.

Let (x1, x2) be the standard coordinates in R2. Make the substitution

    x1 = y1 + y2,
    x2 = −y1,

so that the new coordinates are given by

    y1 = −x2,
    y2 = x1 + x2.

The matrix of T in the y-coordinates is

    [ 1  -1 ]
    [ 1   1 ] = A_{1,1}.

For this matrix r = √2, θ = π/4. Therefore in the (y1, y2)-plane T is rotation through π/4 followed by stretching with √2. In the original coordinates (x1, x2), T is a kind of "elliptical rotation" followed by the √2-stretch. If vectors in R2 are identified with complex numbers via the y-coordinates (the vector whose y-coordinates are (y1, y2) becomes y1 + iy2), then T corresponds to multiplication by 1 + i.

This shows that although T is not diagonalizable, coordinates can be introduced in which T has a simple geometrical interpretation: a rotation followed by a uniform stretch. Moreover, the amount of the rotation and stretch can be deduced from the roots of the characteristic polynomial, since π/4 = arg(1 + i) and √2 = |1 + i|. We shall explain in Chapter 4, Section 3 how the new coordinates were found.
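The change of coordinates above can be confirmed in a few lines of Python. The sketch takes T to have the matrix [[0, -2], [1, 2]] in standard coordinates, consistent with the characteristic polynomial λ² − 2λ + 2:

```python
# Check that in the coordinates y1 = -x2, y2 = x1 + x2 the operator T
# with standard matrix [[0, -2], [1, 2]] becomes A_{1,1} = [[1, -1], [1, 1]].
A = [[0, -2], [1, 2]]
P = [[1, 1], [-1, 0]]        # x = Py; the columns of P are the new basis vectors
Pinv = [[0, -1], [1, 1]]     # y = P^{-1} x, i.e. y1 = -x2, y2 = x1 + x2

def matmul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# sanity check that Pinv really inverts P
assert matmul(P, Pinv) == [[1, 0], [0, 1]]

B = matmul(Pinv, matmul(A, P))
assert B == [[1, -1], [1, 1]]
```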
We show now how the complex structure on R2 (that is, the identification of
R2with C) may be used to solve a corresponding class of differential equations.
Consider the system

(1)  dx/dt = ax − by,
     dy/dt = bx + ay.

We use complex variables to formally find a solution, check that what we have found solves (1), and postpone the uniqueness proof (but see Problem 5). Thus replace (x, y) by x + iy = z, and [ a -b; b a ] by a + bi = μ. Then (1) becomes

(2)  z' = μz.

Following the lead from the beginning of Chapter 1, we write a solution for (2), z(t) = Ke^{tμ}. Let us interpret this in terms of complex and real numbers. Write the complex number K as u + iv and set z(t) = x(t) + iy(t), e^{tμ} = e^{ta}e^{itb}. A standard formula from complex numbers (see Appendix I) says that e^{itb} = cos tb + i sin tb. Putting this information together and taking real and imaginary parts we obtain

(3)  x(t) = ue^{ta} cos tb − ve^{ta} sin tb,
     y(t) = ue^{ta} sin tb + ve^{ta} cos tb.
The reader who is uneasy about the derivation of (3) can regard the preceding
paragraph simply as motivation for the formulas (3) ; it is easy to verify directly
by differentiation that (3) indeed provides a solution to ( 1 ) . On the other hand,
all the steps in the derivation of (3) are justifiable.
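One way to gain confidence in (3) without differentiating by hand is a numerical comparison against a direct integration of (1). In this sketch the values of a, b, u, v are arbitrary test choices:

```python
import math

# Compare the closed form (3) with a Runge-Kutta integration of
# x' = ax - by, y' = bx + ay, starting from (x(0), y(0)) = (u, v).
a, b = 0.5, 2.0
u, v = 1.0, -1.0

def exact(t):
    e = math.exp(t*a)
    return (u*e*math.cos(t*b) - v*e*math.sin(t*b),
            u*e*math.sin(t*b) + v*e*math.cos(t*b))

def f(x, y):
    return (a*x - b*y, b*x + a*y)

n = 1000
h = 1.0 / n
x, y = u, v
for _ in range(n):               # classical RK4 on [0, 1]
    k1 = f(x, y)
    k2 = f(x + h/2*k1[0], y + h/2*k1[1])
    k3 = f(x + h/2*k2[0], y + h/2*k2[1])
    k4 = f(x + h*k3[0], y + h*k3[1])
    x += h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
    y += h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])

xe, ye = exact(1.0)
assert abs(x - xe) < 1e-8 and abs(y - ye) < 1e-8
```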
We have just seen how introduction of complex variables can be an aid in solving
differential equations. Admittedly, this use was in a very special case. However,
many systems not in the form (1) can be brought to that form through a change
of coordinates (see Problem 5 ) . In Chapter 4 we shall pursue this idea systemati-
cally. At present we merely give an example which was treated before in the Kepler
problem of Chapter 2.
Consider the system

(4)  x' = y,
     y' = −b²x;  b > 0.

The corresponding matrix is

    A = [  0    1 ]
        [ -b²   0 ],

whose eigenvalues are ±bi. It is natural to ask whether A can be put in the form

    [ 0  -b ]
    [ b   0 ]

through a coordinate change. The answer is yes; without explaining how we discovered them (this will be done in Chapter 4), we introduce new coordinates (u, v) by setting x = v, y = bu. Then

    u' = (1/b)y' = −bv,
    v' = x' = bu.
We have already solved the system
U' = -bv,
V' = bu;
the solution with (u(0), v(0)) = (u0, v0) is

    u(t) = u0 cos tb − v0 sin tb,
    v(t) = u0 sin tb + v0 cos tb.
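This is the familiar rotation of the plane. Before converting back to (x, y), one can check the rotation solution numerically; the values of b, u0, v0 below are arbitrary sample choices:

```python
import math

# The solution u(t) = u0 cos tb - v0 sin tb, v(t) = u0 sin tb + v0 cos tb
# stays on the circle u^2 + v^2 = u0^2 + v0^2 and satisfies u' = -bv, v' = bu.
b = 2.5
u0, v0 = 0.6, -0.8

def sol(t):
    return (u0*math.cos(t*b) - v0*math.sin(t*b),
            u0*math.sin(t*b) + v0*math.cos(t*b))

h = 1e-5
for k in range(50):
    t = 0.2 * k
    u, v = sol(t)
    assert abs(u*u + v*v - (u0*u0 + v0*v0)) < 1e-12      # circle invariant
    up = (sol(t + h)[0] - sol(t - h)[0]) / (2*h)         # numerical u'(t)
    vp = (sol(t + h)[1] - sol(t - h)[1]) / (2*h)
    assert abs(up + b*v) < 1e-6 and abs(vp - b*u) < 1e-6
```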

Therefore the solution to (4) with initial condition

    (x(0), y(0)) = (x0, y0)

is

    x(t) = (y0/b) sin tb + x0 cos tb,
    y(t) = y0 cos tb − bx0 sin tb,

as can be verified by differentiation.
We can put this solution in a more perspicuous form as follows. Let C = [(y0/b)² + x0²]^{1/2} and write, assuming C ≠ 0,

    u = y0/(bC),  v = x0/C.

Then u² + v² = 1, and

    x(t) = C[v cos tb + u sin tb].

Let t0 = b⁻¹ arccos v, so that

    cos bt0 = v,  sin bt0 = u.

Then x(t) = C(cos bt cos bt0 + sin bt sin bt0), or

(5)  x(t) = C cos b(t − t0);

and

(6)  y(t) = −bC sin b(t − t0),

as the reader can verify; C and t0 are arbitrary constants.
From (5) and (6) we see that

    x²/C² + y²/(bC)² = 1.

Thus the solution curve (x(t), y(t)) goes round and round an ellipse.
Returning to the system (4) , the reader has probably recognized that it is equiva-
lent to the second order equation on R

(7)  x'' + b²x = 0,
obtained by differentiating the first equation of (4) and then substituting the
second. This is the famous equation of “simple harmonic motion,” whose general
solution is ( 5 ) .
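The two claims just made (the elliptical orbit, and that x solves the harmonic equation (7)) can be spot-checked numerically. In this sketch b, x0, y0 are arbitrary sample values:

```python
import math

# The solution x(t) = (y0/b) sin tb + x0 cos tb, y(t) = x'(t) stays on the
# ellipse x^2/C^2 + y^2/(bC)^2 = 1, and x satisfies x'' + b^2 x = 0.
b = 3.0
x0, y0 = 2.0, 1.5
C2 = (y0/b)**2 + x0**2            # this is C^2

def x(t):
    return (y0/b)*math.sin(t*b) + x0*math.cos(t*b)

def y(t):
    return y0*math.cos(t*b) - b*x0*math.sin(t*b)

h = 1e-4
for k in range(100):
    t = 0.1 * k
    assert abs(x(t)**2/C2 + y(t)**2/(b*b*C2) - 1.0) < 1e-9   # ellipse
    xpp = (x(t+h) - 2*x(t) + x(t-h)) / (h*h)                 # numerical x''
    assert abs(xpp + b*b*x(t)) < 1e-3                        # x'' = -b^2 x
```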

PROBLEMS

1. Solve the following initial value problems.
(a) x' = −y, y' = x; x(0) = 1, y(0) = 1.
(b) x1' = −2x2, x2' = 2x1; x1(0) = 0, x2(0) = 2.
(c) x' = 2y, y' = −x; x(0) = 1, y(0) = 1.
(d) x' = Ax, x(0) = (3, −9); A = [i -:].
2. Sketch the phase portraits of each of the differential equations in Problem 1.
3. Let A = [ a  -b; b  a ] and let x(t) be a solution to x' = Ax, not identically 0. The curve x(t) is of the following form:
(a) a circle if a = 0;
(b) a spiral inward toward (0, 0) if a < 0, b ≠ 0;
(c) a spiral outward away from (0, 0) if a > 0, b ≠ 0.
What effect has the sign of b on the spirals in (b) and (c)? What is the phase portrait if b = 0?

4. Sketch the phase portraits of:
(a) x' = −2z,
    y' = 2z,
    z' = −2y;
(b) x' = −x + z,
    y' = 3y,
    z' = −x − z.
Which solutions tend to 0 as t → ∞?

5. Let A be a 2 × 2 matrix whose eigenvalues are the complex numbers α ± βi, β ≠ 0. Let B = [ α  -β; β  α ]. Show there exists an invertible matrix Q with QAQ⁻¹ = B, as follows:
(a) Show that the determinant of the following 4 × 4 matrix is 0:

    [ A − αI     −βI   ]
    [   βI     A − αI  ],

where I = [ 1 0; 0 1 ].
(b) Show that there exists a 2 × 2 matrix Q such that AQ = QB. (Hint: Write out the above equation in the four entries of Q = [q_ij]. Show that the resulting system of four linear homogeneous equations in the four unknowns q_ij has the coefficient matrix of part (a).)
(c) Show that Q can be chosen invertible.
Therefore the system x' = Ax has unique solutions for given initial conditions.

6. Let A = [ a  -b; b  a ]. Show that the solutions of x' = Ax depend continuously on initial values. (See Problem 5, Section 3.)
7. Solve the initial value problem

    x' = −4y,
    y' = x;
    x(0) = 0,  y(0) = −7.
Chapter 4
Linear Systems with Constant
Coefficients and Complex Eigenvalues

As we saw in the last section of the preceding chapter, complex numbers enter
naturally in the study and solution of real ordinary differential equations. In gen-
eral the study of operators of complex vector spaces facilitates the solving of linear
differential equations. The first part of this chapter is devoted to the linear algebra
of complex vector spaces. Subsequently, methods are developed to study almost all
first order linear ordinary differential equations with constant coefficients, including
those whose associated operator has distinct, though perhaps nonreal, eigenvalues.
The meaning of “almost all” will be made precise in Chapter 7.

§1. Complex Vector Spaces

In order to gain a deeper understanding of linear operators (and hence of linear differential equations) we have to find the geometric significance of complex eigenvalues. This is done by extending an operator T on a (real) vector space E to an operator TC on a complex vector space EC. Complex eigenvalues of T are associated with complex eigenvectors of TC. We first develop complex vector spaces.

The definitions and elementary properties of Rn and (real) vector spaces go over directly to Cn and complex vector spaces by systematically replacing the real numbers R with the complex numbers C. We make this more precise now.
Complex Cartesian space Cn is the set of all n-tuples z = (z1, . . . , zn) of complex numbers (see Appendix I for the definition of complex numbers). We call z in Cn a complex vector or sometimes a point in Cn. Complex vectors are added exactly like vectors in Rn (see Chapter 1, Section 2). Also, if λ is a complex number and z = (z1, . . . , zn) is in Cn, then λz is the vector (λz1, . . . , λzn); this is scalar multiplication. Note that Rn is contained naturally in Cn as the set of all (z1, . . . , zn) where each zj is real.
The axioms VS1, VS2 of Section 1A of Chapter 3 are valid for the operations we have just defined for Cn. They define the complex vector space structure on Cn. As in Section 1B, Chapter 3, a nonempty subset F of Cn is called a subspace or a (complex) linear subspace if it is closed under the operations of addition and scalar multiplication in Cn. The notions of trivial subspace, proper subspace, and subspace of a (complex) subspace are defined as in the real case; the same is true for the concept of a linear map T: F1 → F2 between subspaces F1, F2 of Cn. One replaces real scalars by complex scalars (that is, complex numbers) everywhere. A complex vector space will mean a subspace of Cn.
The material on kernels and images of linear maps of complex vector spaces goes over directly from the real case, as do the facts about bases, dimension, and coordinates. Propositions 1, 2, and 3 of Section 1B, Chapter 3, are all valid for the complex case. In fact, all the algebraic properties of real vector spaces and their linear maps carry over to complex vector spaces and their linear maps. In particular, the determinant of a complex operator T, or a complex n × n matrix, is defined (in C). It is zero if and only if T has a nontrivial kernel.

Consider now an operator on Cn, or more generally, an operator T on a complex vector space F ⊂ Cn. Thus T: F → F is a linear map and we may proceed to study its eigenvalues and eigenvectors as in Section 2 of Chapter 3. An eigenvalue λ of T is a complex number such that Tv = λv has a nonzero solution v ∈ F. The vector v of F is called an eigenvector belonging to λ. This is exactly analogous to the real case. The methods for finding real eigenvalues and eigenvectors apply to this complex case.

Given a complex operator T as above, one associates to it a polynomial

    p(λ) = Det(T − λI)

(now with complex coefficients) such that the degree of p(λ) is the dimension of F and the roots of p are exactly the eigenvalues of T.
The proof of Theorem 1 of Section 2 in the previous chapter applies to yield:

Theorem Let T: F → F be an operator on an n-dimensional complex vector space F. If the characteristic polynomial has distinct roots, then T can be diagonalized.

This implies that when these roots are distinct, one may find a basis {e1, . . . , en} of eigenvectors for T so that if z = Σ_{j=1}^n zj ej is in F, then Tz = Σ_{j=1}^n λj zj ej; here ej is the eigenvector belonging to the (complex) eigenvalue λj.

Observe that the above theorem is stronger than the corresponding theorem in the real case. The latter demanded the further substantial condition that the roots of the characteristic polynomial be real.

Say that an operator T on a complex vector space is semisimple if it is diagonalizable. Thus by the theorem above T is semisimple if its characteristic polynomial has distinct roots (but not conversely, as we shall see in Chapter 6).
As we have noted, Rn ⊂ Cn. We consider now more generally the relations between vector spaces in Rn and complex vector spaces in Cn. Let F be a complex subspace of Cn. Then FR = F ∩ Rn is the set of all n-tuples (z1, . . . , zn) that are in F and are real. Clearly, FR is closed under the operations of addition as well as scalar multiplication by real numbers. Thus FR is a real vector space (a subspace of Rn).

Consider now the converse process. Let E ⊂ Rn be a subspace and let EC be the subset of Cn obtained by taking all linear combinations of vectors in E with complex coefficients. Thus

    EC = {Σ λj xj | λj ∈ C, xj ∈ E},

and EC is a complex subspace of Cn. Note that (EC)R = E. We call EC the complexification of E and FR the space of real vectors in F.
In defining EC, FR we used the fact that all the spaces considered were subsets of Cn. The essential element of structure here, besides the algebraic structure, is the operation of complex conjugation.

Recall that if z = x + iy is a complex number, then z̄ = x − iy. We often write z̄ = σ(z), so that σ: C → C is a map with the property σ² = σ ∘ σ = identity. The set of fixed points of σ, that is, the set of z such that σ(z) = z, is precisely the set of real numbers in C.

This operation σ, or conjugation, can be extended immediately to Cn by defining σ: Cn → Cn by conjugating each coordinate. That is,

    σ(z1, . . . , zn) = (z̄1, . . . , z̄n).

For this extension, the set of fixed points is Rn.


Note also that if F is a complex subspace of Cn such that σF = F, then the set of fixed points of σ on F is precisely FR. This map σ plays a crucial role in the relation between real and complex vector spaces.

Let F ⊂ Cn be a σ-invariant linear subspace of Cn. Then it follows that for v ∈ F and λ ∈ C, σ(λv) = σ(λ)σ(v); if we write σ(w) = w̄ for w ∈ F, this reads σ(λv) = λ̄v̄. Thus σ is not complex linear. However, σ(v + w) = σ(v) + σ(w).

It follows that for any subspace F ⊂ Cn,

    FR = {z ∈ F | σ(z) = z}.

In terms of σ it is easy to see when a subspace F ⊂ Cn can be decomplexified, that is, expressed in the form F = EC for some subspace E ⊂ Rn: F can be decomplexified if and only if σ(F) ⊂ F. For if σ(F) ⊂ F, then x − iy ∈ F whenever x + iy ∈ F with x, y ∈ Rn; so x ∈ F because

    x = ½[(x + iy) + (x − iy)].
$1. COMPLEX VECTOR SPACES 65

Similarly, y ∈ F. It follows easily that F = (FR)C, that is, F is the complexification of the space of real vectors in F. The converse is trivial.
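The criterion σ(F) ⊂ F is easy to experiment with. In the sketch below (the helper names are ad hoc, not from the text), a one-dimensional subspace of C2 is described by a single spanning vector; the span of (1, i) fails the test, while the span of (1, 1) passes, being the complexification of the real line spanned by (1, 1):

```python
# Decomplexification test sigma(F) ⊂ F for one-dimensional subspaces of C^2,
# each given by a spanning vector; sigma conjugates coordinates.
def conj(z):
    return tuple(w.conjugate() for w in z)

def in_span(w, v):
    # is w a complex multiple of the nonzero vector v?
    lam = next(wi / vi for wi, vi in zip(w, v) if vi != 0)
    return all(abs(wi - lam*vi) < 1e-12 for wi, vi in zip(w, v))

F1 = (1, 1j)      # span of (1, i): sigma(1, i) = (1, -i) is not in F1
F2 = (1, 1)       # span of (1, 1): sigma-invariant, so it is a complexification

assert not in_span(conj(F1), F1)
assert in_span(conj(F2), F2)
```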
Just as every subspace E ⊂ Rn has a complexification EC ⊂ Cn, every operator T: E → E has an extension to a complex linear operator

    TC: EC → EC,

called the complexification of T. To define TC z for z ∈ EC, let

(1)  z = Σ λj xj;  λj ∈ C,  xj ∈ E.

Then

    TC z = Σ λj T xj.

It is easy to see that this definition does not depend on the choice of the representation (1).

If {e1, . . . , ek} = B is a basis for E, it is also a basis for the complex vector space EC; and the B-matrix for TC is the same as the B-matrix for T. In particular, if T ∈ L(Rn) is represented by an n × n matrix A (in the usual way), then TC ∈ L(Cn) is also represented by A.
The question arises as to when an operator Q: EC → EC is the complexification of an operator T: E → E.

Proposition Let E ⊂ Rn be a real vector space and EC ⊂ Cn its complexification. If Q ∈ L(EC), then Q = TC for some T ∈ L(E) if and only if

    Qσ = σQ,

where σ: EC → EC is conjugation.

Proof. If Q = TC, we leave it to the reader to prove that Qσ = σQ. Conversely, assume Q commutes with σ. Then Q(E) ⊂ E; for if x ∈ E, then σx = x, hence

    σQx = Qσx = Qx,

so

    Qx ∈ {y ∈ EC | σy = y} = (EC)R = E.

Let Q | E = T ∈ L(E); it is clear from the definition of TC that TC = Q.
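The proposition is easy to illustrate: a real matrix, viewed as acting on complex vectors, commutes with coordinatewise conjugation, while a genuinely complex matrix need not. A small sketch with arbitrary sample data:

```python
# A real matrix (the complexification T_C acting on C^2) commutes with
# coordinatewise conjugation; a genuinely complex matrix need not.
def apply(M, z):
    return tuple(sum(M[i][j]*z[j] for j in range(2)) for i in range(2))

def conj(z):
    return tuple(w.conjugate() for w in z)

z = (1 + 2j, -3 + 0.5j)

T = [[2, -1], [4, 0]]              # real entries: commutes with sigma
assert apply(T, conj(z)) == conj(apply(T, z))

Q = [[1j, 0], [0, 1]]              # multiplies the first coordinate by i
assert apply(Q, conj(z)) != conj(apply(Q, z))
```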
We close this section with a property that will be very important in later chapters. An operator T on a real vector space E is semisimple if its complexification TC is a diagonalizable operator on EC. Then the theorem proved earlier implies that a sufficient (but not necessary) condition for semisimplicity is that the characteristic polynomial should have distinct roots.

PROBLEMS

1. Let F ⊂ C2 be the subspace spanned by the vector (1, i).
(a) Prove that F is not invariant under conjugation and hence is not the complexification of any subspace of R2.
(b) Find FR and (FR)C.
2. Let E ⊂ Rn and F ⊂ Cn be subspaces. What relations, if any, exist between dim E and dim EC? Between dim F and dim FR?
3. If F ⊂ Cn is any subspace, what relation is there between F and (FR)C?
4. Let E be a real vector space and T ∈ L(E). Show that (Ker T)C = Ker(TC), (Im T)C = Im(TC), and (T⁻¹)C = (TC)⁻¹ if T is invertible.

§2. Real Operators with Complex Eigenvalues

We move toward understanding the linear differential equation with constant coefficients

    dx/dt = Tx,

where T is an operator on Rn. For this purpose, we study further the eigenvalues and eigenvectors of T. This was done thoroughly in Chapter 3 assuming that all the eigenvalues were distinct and real. Now we drop the hypothesis that the eigenvalues must be real.

Proposition If T is an operator on a real vector space E, then the set of its eigenvalues is preserved under complex conjugation. Thus if λ is an eigenvalue, so is λ̄. Consequently, we may write the eigenvalues of T as

    λ1, . . . , λr, all real;
    μ1, μ̄1, . . . , μs, μ̄s, all nonreal.


Proof. First, observe that the eigenvalues of T coincide with the eigenvalues of its complexification TC, because both T and TC have the same characteristic polynomial. Let λ be an eigenvalue of TC and φ a corresponding eigenvector in EC, so TC φ = λφ. Applying the conjugation operation σ to both sides, we find

    σTC φ = σ(λφ) = λ̄φ̄.

But, by the proposition of Section 1,

    σTC = TC σ.

Hence

    TC φ̄ = λ̄φ̄.

In other words, λ̄ is an eigenvalue of TC with corresponding eigenvector φ̄. This proves the proposition. (Another proof is based on the fact that the characteristic polynomial of T has real coefficients, so the roots occur in conjugate pairs.)
The basic properties of real operators are contained in the following three
theorems.

Theorem 1 Let T: E → E be a real operator with distinct eigenvalues listed as in the previous proposition. Then E and T have a direct sum decomposition (see Section 1F of Chapter 3),

    E = Ea ⊕ Eb,  T = Ta ⊕ Tb,  Ta: Ea → Ea,  Tb: Eb → Eb,

where Ta has real eigenvalues and Tb nonreal eigenvalues.

For the proof we pass to the complexification TC and apply the theorem of the preceding section together with the above proposition. This yields a basis for EC

    {e1, . . . , er, f1, f̄1, . . . , fs, f̄s}

of eigenvectors of TC corresponding to the eigenvalues

    (λ1, . . . , λr, μ1, μ̄1, . . . , μs, μ̄s).

Now let Fa be the complex subspace of EC spanned by {e1, . . . , er} and Fb the subspace spanned by {f1, f̄1, . . . , fs, f̄s}. Thus Fa and Fb are invariant subspaces for TC on EC and form a direct sum decomposition for EC,

    EC = Fa ⊕ Fb.

Moreover Fa and Fb are invariant under complex conjugation. Set Ea = E ∩ Fa and Eb = E ∩ Fb; then Fa, Fb are the complexifications of Ea, Eb, and E = Ea ⊕ Eb. It is easy to see that Ea and Eb have the required properties.
Theorem 1 reduces the study of such T to Ta and Tb. The previous chapter analyzed Ta.

We remark that Theorem 1 provides an "uncoupling" of the differential equation

    dx/dt = Tx

mentioned at the beginning of the section. We may rewrite this equation as a pair of equations

    dxa/dt = Ta xa,
    dxb/dt = Tb xb,

where Ta, Tb are as above and xa ∈ Ea, xb ∈ Eb.

We proceed to the study of the operator Tb.

Theorem 2 Let T: E → E be an operator on a real vector space with distinct nonreal eigenvalues {μ1, μ̄1, . . . , μs, μ̄s}. Then there is an invariant direct sum decomposition for E and a corresponding direct sum decomposition for T,

    E = E1 ⊕ · · · ⊕ Es,
    T = T1 ⊕ · · · ⊕ Ts,

such that each Ei is two dimensional and Ti ∈ L(Ei) has eigenvalues μi, μ̄i.
For the proof of Theorem 2, simply let Fi be the complex subspace of EC spanned by the eigenvectors fi, f̄i corresponding to the eigenvalues μi, μ̄i. Then let Ei be Fi ∩ E. The rest follows.

Theorems 1 and 2 reduce in principle the study of an operator with distinct eigenvalues to the case of an operator on a real two-dimensional vector space with nonreal eigenvalues.

Theorem 3 Let T be an operator on a two-dimensional vector space E ⊂ Rn with nonreal eigenvalues μ, μ̄, μ = a + ib. Then there is a matrix representation A for T,

    A = [ a  -b ]
        [ b   a ].

The study of such a matrix A and the corresponding differential equation on R2, dx/dt = Ax, was the content of Chapter 3, Section 4.
We now give the proof of Theorem 3.

Let TC: EC → EC be the complexification of T. Since TC has the same eigenvalues as T, there are eigenvectors φ, φ̄ in EC belonging to μ, μ̄, respectively.

Let φ = u + iv with u, v ∈ Rn. Then φ̄ = u − iv. Note that u and v are in EC, for

    u = ½(φ + φ̄),  v = (1/2i)(φ − φ̄).

Hence u and v are in EC ∩ Rn = E. Moreover, it is easy to see that u and v are independent (use the independence of φ, φ̄). Therefore {v, u} is a basis for E. To compute the matrix of T in this basis we start from

    TC(u + iv) = (a + bi)(u + iv)
               = (au − bv) + i(bu + av).

Also

    TC(u + iv) = Tu + iTv.

Therefore

    Tu = au − bv,
    Tv = bu + av.

This means that the matrix of T in the basis {v, u} is [ a  -b; b  a ], completing the proof.
In the course of the proof we have found the following interpretation of a complex eigenvalue of a real operator T ∈ L(E), E ⊂ Rn:

Corollary Let φ ∈ EC be an eigenvector of TC belonging to a + ib, b ≠ 0. If φ = u + iv with u, v ∈ Rn, then {v, u} is a basis for E giving T the matrix [ a  -b; b  a ].

Note that u and v can be obtained directly from φ and σ (without reference to φ̄) by the formulas in the proof of Theorem 3.
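The corollary gives a concrete recipe. The following sketch applies it to the sample matrix A = [[1, -2], [1, 3]], an arbitrary illustrative choice (not from the text) with eigenvalues 2 ± i:

```python
# Recipe from the corollary: phi = u + iv is an eigenvector of A for a + ib;
# in the basis {v, u} the matrix of A becomes [[a, -b], [b, a]].
A = [[1, -2], [1, 3]]
a, b = 2, 1
phi = (-1 + 1j, 1)                 # eigenvector for 2 + i (checked below)

lam = complex(a, b)
Aphi = tuple(sum(A[i][j]*phi[j] for j in range(2)) for i in range(2))
assert Aphi == tuple(lam*c for c in phi)

u = tuple(c.real for c in phi)     # u = (-1, 1)
v = tuple(c.imag for c in phi)     # v = (1, 0)

P = [[v[0], u[0]], [v[1], u[1]]]   # columns v, u
det = P[0][0]*P[1][1] - P[0][1]*P[1][0]
Pinv = [[ P[1][1]/det, -P[0][1]/det],
        [-P[1][0]/det,  P[0][0]/det]]

def matmul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

B = matmul(Pinv, matmul(A, P))
assert all(abs(B[i][j] - [[a, -b], [b, a]][i][j]) < 1e-12
           for i in range(2) for j in range(2))
```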
PROBLEM

For each of the following operators T on R3 find an invariant two-dimensional subspace E ⊂ R3 and a basis for E giving T | E a matrix of the form [ a  -b; b  a ]:

§3. Application of Complex Linear Algebra to Differential Equations

Consider the linear differential equation on Rn

(1)  dx/dt = Tx,

where T is an operator on Rn (or equivalently, an n × n matrix). Suppose that T has n distinct eigenvalues. Then Theorems 1, 2, and 3 of the previous section apply to uncouple the equation and, after finding the new basis, one can obtain the solution. Letting E = Rn, we first apply Theorem 1 to obtain the following system, equivalent to (1):

(2a)  dxa/dt = Ta xa,
(2b)  dxb/dt = Tb xb;

Ta has real eigenvalues, and Tb nonreal eigenvalues.


Note that (2a) and (2b) are equations defined not on Rn, but on the subspaces Ea and Eb. But our definitions and discussion of differential equations apply just as well to subspaces of Rn. To find explicit solutions to the original equations, bases for these subspaces must be found. This is done by finding eigenvectors of the complexification of T, as will be explained below.

If we obtain solutions and properties of (2a) and (2b) separately, corresponding information is gained for (2) and (1). Furthermore, (2a) received a complete discussion in Chapter 3, Section 3. Thus in principle it is sufficient to give an analysis of (2b). To this end, Theorem 2 of Section 2 applies to give the following system, equivalent to (2b):

(3)  dyi/dt = Ti yi,  i = 1, . . . , s,

where T = T1 ⊕ · · · ⊕ Ts, y = (y1, . . . , ys) ∈ Eb = E1 ⊕ · · · ⊕ Es, and each Ei has two dimensions.
Thus (2b), and hence (2) and (1), are reduced to the study of the equation

(4)  dyi/dt = Ti yi  on two-dimensional Ei,

where each Ti has nonreal eigenvalues. Finally, Theorem 3 of Section 2 applies to put (4) in the form of the equation analyzed in Section 4 of Chapter 3.
Example 1 Consider the equation

    x1' = −2x2,
    x2' = x1 + 2x2,

or x' = Ax with

    A = [ 0  -2 ]
        [ 1   2 ].

This is the matrix considered in Chapter 3, Section 4. The eigenvalues of A are λ = 1 + i, λ̄ = 1 − i.
A complex eigenvector belonging to 1 + i is found by solving the equation

    (A − (1 + i)I)w = 0

for w ∈ C2,

    [ -1 - i    -2    ] [ w1 ]
    [   1      1 - i  ] [ w2 ] = 0;

or

    (−1 − i)w1 − 2w2 = 0,
    w1 + (1 − i)w2 = 0.

The first equation is equivalent to the second, as is seen by multiplying the second by (−1 − i). From the second equation we see that the solutions are all (complex) multiples of any nonzero complex vector w such that w1 = −(1 − i)w2; for example, w2 = −i, w1 = 1 + i. Thus

    w = (1 + i, −i) = (1, 0) + i(1, −1) = u + iv

is a complex eigenvector belonging to 1 + i.
We choose the new basis {v, u} for R2 ⊂ C2, with v = (1, −1), u = (1, 0). To find new coordinates y1, y2 corresponding to this new basis, note that any x can be written

    x = x1(1, 0) + x2(0, 1) = y1 v + y2 u = y1(1, −1) + y2(1, 0).

Thus

    x1 = y1 + y2,      or  x = Py,   P = [  1   1 ]
    x2 = −y1;                            [ -1   0 ].

The new coordinates are given by y = P⁻¹x; explicitly,

    y1 = −x2,
    y2 = x1 + x2.

The matrix of A in the y-coordinates is

    [ 1  -1 ]
    [ 1   1 ] = B,

or B = A_{1,1} in the notation of Section 4, Chapter 3.
Thus, as we saw in that section, our differential equation

    dx/dt = Ax

on R2, having the form

    dy/dt = By

in the y-coordinates, can be solved as

    y1(t) = ue^t cos t − ve^t sin t,
    y2(t) = ue^t sin t + ve^t cos t.

The original equation has as its general solution

    x1(t) = (u + v)e^t cos t + (u − v)e^t sin t,
    x2(t) = −ue^t cos t + ve^t sin t.
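A short numerical check (a sketch; u and v are arbitrary constants) that these formulas really do solve the original system:

```python
import math

# Verify numerically that x1(t) = (u+v)e^t cos t + (u-v)e^t sin t and
# x2(t) = -u e^t cos t + v e^t sin t satisfy x1' = -2 x2, x2' = x1 + 2 x2.
u, v = 0.7, -1.3

def x1(t):
    e = math.exp(t)
    return (u + v)*e*math.cos(t) + (u - v)*e*math.sin(t)

def x2(t):
    e = math.exp(t)
    return -u*e*math.cos(t) + v*e*math.sin(t)

h = 1e-6
for k in range(20):
    t = 0.3*k - 3.0
    d1 = (x1(t + h) - x1(t - h)) / (2*h)     # numerical derivatives
    d2 = (x2(t + h) - x2(t - h)) / (2*h)
    assert abs(d1 - (-2*x2(t))) < 1e-4
    assert abs(d2 - (x1(t) + 2*x2(t))) < 1e-4
```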
Example 2 Consider on R3 the differential equation x' = Ax, where

    A = [ 1   0   0 ]
        [ 0   2  -3 ]
        [ 1   3   2 ].

The characteristic equation Det(A − λI) = 0 is (1 − λ)((2 − λ)² + 9) = 0. Its solutions, the eigenvalues of A, are λ = 1, μ = 2 + 3i, μ̄ = 2 − 3i. Eigenvectors in C3 for the complexified operator are found by solving the homogeneous systems of three linear equations,

    (A − I)e = 0;

this yields e = (−10, 3, 1). Likewise

    (A − (2 + 3i)I)w = 0

yields w = (0, i, 1). A third eigenvector is w̄ = (0, −i, 1).


We now wish to find the matrix P that gives a change of coordinates x = P y ,
y = P-'x where x is in the original coordinate system on R3and y corresponds to the
basis of eigenvectors. Proposition 4 of Section lC, Chapter 3, applies.
Thus

P=[
-10
; ; ;].
0 0

Here the columns of P are ( e , v , u) where w = (0,0, 1) + i(0, 1, 0) = u + iv.


Then

    P⁻¹ = [ -1/10   0   0 ]
          [  3/10   1   0 ]
          [  1/10   0   1 ],

and now we have transformed our original equation x' = Ax, following the outline given in the beginning of this section, to obtain

    y' = By,   y = P⁻¹x,   B = P⁻¹AP = [ 1   0   0 ]
                                       [ 0   2  -3 ]
                                       [ 0   3   2 ].

This can be solved explicitly for y as in the previous example and from this solution
one obtains solutions in terms of the original x-coordinates by x = Py.
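A numerical recap of the example (a sketch; the matrix A below is the one consistent with the characteristic polynomial and the eigenvectors quoted above):

```python
# Check the eigenvectors and the change of basis P^{-1} A P = B for
# A = [[1, 0, 0], [0, 2, -3], [1, 3, 2]].
A = [[1, 0, 0], [0, 2, -3], [1, 3, 2]]

def apply(M, z):
    return tuple(sum(M[i][j]*z[j] for j in range(3)) for i in range(3))

e = (-10, 3, 1)
w = (0, 1j, 1)
assert apply(A, e) == e                              # eigenvalue 1
assert apply(A, w) == tuple((2 + 3j)*c for c in w)   # eigenvalue 2 + 3i

P = [[-10, 0, 0], [3, 1, 0], [1, 0, 1]]              # columns e, v, u
Pinv = [[-0.1, 0, 0], [0.3, 1, 0], [0.1, 0, 1]]

def matmul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

B = matmul(Pinv, matmul(A, P))
expected = [[1, 0, 0], [0, 2, -3], [0, 3, 2]]
assert all(abs(B[i][j] - expected[i][j]) < 1e-12
           for i in range(3) for j in range(3))
```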
$3. APPLICATION OF COMPLEX LINEAR ALGEBRA TO DIFFERENTIAL EQUATIONS 73

A related approach to the equation (1) is obtained by directly complexifying it,


extending ( 1 ) to a differential equation on Cn,

dz
- = TCZ, z E C".
dt

One can make sense of ( 1 ~ as) a differential equation either by making definitions
directly for derivatives of curves R + Cn or by considering Cn as R2n,that is,
R2n ---f Cn,

(21, . . , Zn, ~ 1 , . . yn) = ( 2 , v) + z iy = Z.


3 +
Application of the theorem of Section 1 diagonalizes TC, and one may correspondingly rewrite (1C) as the set of differential equations

    dzi/dt = λi zi,  i = 1, . . . , r,
    dwi/dt = μi wi,  i = 1, . . . , s.

(Sometimes z_{r+i} is written in place of wi.) Here zi, z_{r+i}, wi are all in one-dimensional complex vector spaces or can be regarded as complex numbers, and n = r + 2s.
These complex ordinary differential equations may be solved using properties of complex exponentials, as in Section 4 of the previous chapter, obtaining as the general solution:

    z(t) = (z1(t), . . . , zr(t), w1(t), . . . , ws(t))
         = (c1 exp(λ1 t), . . . , cr exp(λr t), c_{r+1} exp(μ1 t), . . . , c_{r+s} exp(μs t)).

Now it can be checked that if z(0) ∈ Rn, then z(t) ∈ Rn for all t, using formal properties of complex exponentials. This can be a useful approach to the study of (1).
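The realness of z(t) for real z(0) can be seen concretely for the 2 × 2 case of Example 1 (a sketch; the initial condition below is an arbitrary choice):

```python
import cmath

# Complexified approach for x' = Ax, A = [[0, -2], [1, 2]] (Example 1):
# expand a real initial value in the eigenbasis {w, conj(w)}, evolve each
# coordinate by exp(mu t), and check that the result stays real.
mu = 1 + 1j
w = (1 + 1j, -1j)                       # eigenvector for mu
wbar = tuple(c.conjugate() for c in w)  # eigenvector for conj(mu)

x0 = (2.0, -1.0)                        # a real initial condition
# Solve c*w + d*wbar = x0.  Second component: 1j*(d - c) = -1, so d - c = 1j;
# first component: (c + d) + 1j*(c - d) = 2, so c + d = 1.
c = (1 - 1j)/2
d = (1 + 1j)/2                          # note d = conj(c), as realness requires
assert all(abs(c*wi + d*wbi - xi) < 1e-12
           for wi, wbi, xi in zip(w, wbar, x0))

t = 0.8
z = tuple(c*cmath.exp(mu*t)*wi + d*cmath.exp(mu.conjugate()*t)*wbi
          for wi, wbi in zip(w, wbar))
assert all(abs(zi.imag) < 1e-12 for zi in z)   # the solution is real
```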

PROBLEM

Solve x' = Tx where T is the operator in (a) and (b) of Problem 1, Section 2.
Chapter 5
Linear Systems and Exponentials
of Operators

The object of this chapter is to solve the linear homogeneous system with constant coefficients

(1)  x' = Ax,

where A is an operator on Rn (or an n × n matrix). This is accomplished with exponentials of operators.

This method of solution is of great importance, although in this chapter we can compute solutions only for special cases. When combined with the operator theory of Chapter 6, the exponential method yields explicit solutions for every system (1).
For every operator A, another operator e^A, called the exponential of A, is defined in Section 4. The function A → e^A has formal properties similar to those of ordinary exponentials of real numbers; indeed, the latter is a special case of the former. Likewise the function t → e^{tA} (t ∈ R) resembles the familiar e^{ta}, where a ∈ R. In particular, it is shown that the solutions of (1) are exactly the maps x: R → Rn given by

    x(t) = e^{tA}K  (K ∈ Rn).

Thus we establish existence and uniqueness of solutions of (1); "uniqueness" means that there is only one solution x(t) satisfying a given initial condition of the form x(t0) = K0.
Exponentials of operators are defined in Section 3 by means of an infinite series in the operator space L(Rn); the series is formally the same as the usual series for e^a. Convergence is established by means of a special norm on L(Rn), the uniform norm. Norms in general are discussed in Section 2, while Section 1 briefly reviews some basic topology in Rn.

Sections 5 and 6 are devoted to two less-central types of differential equations. One is a simple inhomogeneous system and the other a higher order equation of one variable. We do not, however, follow the heavy emphasis on higher order equations of some texts. In geometry, physics, and other kinds of applied mathematics, one seldom encounters naturally any differential equation of order higher than two. Often even the second order equations are studied with more insight after reducing to a first order system (for example, in Hamilton's approach to mechanics).

§1. Review of Topology in Rn

The inner product ("dot product") of vectors x and y in Rn is

    ⟨x, y⟩ = x1y1 + · · · + xnyn.

The Euclidean norm of x is |x| = ⟨x, x⟩^{1/2} = (x1² + · · · + xn²)^{1/2}. Basic properties of the inner product are

    Symmetry: ⟨x, y⟩ = ⟨y, x⟩;
    Bilinearity: ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩,
                 ⟨ax, y⟩ = a⟨x, y⟩,  a ∈ R;
    Positive definiteness: ⟨x, x⟩ ≥ 0, and
                 ⟨x, x⟩ = 0 if and only if x = 0.

An important inequality is

    Cauchy's inequality: ⟨x, y⟩ ≤ |x| |y|.
To see this, first suppose x = 0 or y = 0; the inequality is obvious. Next, observe that for any λ

    ⟨x + λy, x + λy⟩ ≥ 0,

or

    ⟨x, x⟩ + λ²⟨y, y⟩ + 2λ⟨x, y⟩ ≥ 0.

Writing −⟨x, y⟩/⟨y, y⟩ for λ yields the inequality.
The basic properties of the norm are:

(1)  |x| ≥ 0, and |x| = 0 if and only if x = 0;
(2)  |x + y| ≤ |x| + |y|;
(3)  |ax| = |a| |x|;

where |a| is the ordinary absolute value of the scalar a. To prove the triangle inequality (2), it suffices to prove

    |x + y|² ≤ |x|² + |y|² + 2|x| |y|.

Since

    |x + y|² = ⟨x + y, x + y⟩ = |x|² + |y|² + 2⟨x, y⟩,

this follows from Cauchy's inequality.
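Both inequalities are easy to spot-check numerically on random vectors (a sketch; the dimension and ranges are arbitrary):

```python
import math, random

# Numerical spot-check of Cauchy's inequality and the triangle inequality
# for the Euclidean norm on R^4.
random.seed(1)

def dot(x, y):
    return sum(a*b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

for _ in range(200):
    x = [random.uniform(-5, 5) for _ in range(4)]
    y = [random.uniform(-5, 5) for _ in range(4)]
    assert dot(x, y) <= norm(x)*norm(y) + 1e-9
    assert norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y) + 1e-9
```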
5. LINEAR SYSTEMS AND EXPONENTIALS OF OPERATORS

Geometrically, | x | is the length of the vector x and

    (x, y) = | x | | y | cos θ,

where θ is the angle between x and y.
The distance between two points x, y ∈ Rn is defined to be | x − y | = d(x, y). It is easy to prove:

    (4) | x − y | ≥ 0, and | x − y | = 0 if and only if x = y;
    (5) | x − z | ≤ | x − y | + | y − z |.

The last inequality follows from the triangle inequality applied to

    x − z = (x − y) + (y − z).

If ε > 0, the ε-neighborhood of x ∈ Rn is

    B_ε(x) = { y ∈ Rn : | y − x | < ε }.

A neighborhood of x is any subset of Rn containing an ε-neighborhood of x.
A set X ⊂ Rn is open if it is a neighborhood of every x ∈ X. Explicitly, X is open if and only if for every x ∈ X there exists ε > 0, depending on x, such that

    B_ε(x) ⊂ X.
A sequence {x_k} = x_1, x_2, ... in Rn converges to the limit y ∈ Rn if

    lim_{k→∞} | x_k − y | = 0.

Equivalently, every neighborhood of y contains all but a finite number of the points of the sequence. We denote this by y = lim_{k→∞} x_k or x_k → y. If x_k = (x_{k1}, ..., x_{kn}) and y = (y_1, ..., y_n), then {x_k} converges to y if and only if lim_{k→∞} x_{kj} = y_j, j = 1, ..., n. A sequence that has a limit is called convergent.
A sequence {x_k} in Rn is a Cauchy sequence if for every ε > 0 there exists an integer k_0 such that

    | x_j − x_k | < ε  if  k ≥ k_0 and j ≥ k_0.
The following basic property of Rn is called metric completeness:
A sequence converges to a limit if and only if it is a Cauchy sequence.
A subset Y ⊂ Rn is closed if every sequence of points in Y that is convergent has its limit in Y. It is easy to see that this is equivalent to: Y is closed if and only if the complement Rn − Y is open.
Let X ⊂ Rn be any subset. A map f: X → Rm is continuous if it takes convergent sequences to convergent sequences. This means: for every sequence {x_k} in X with

    lim_{k→∞} x_k = y ∈ X,

it is true that

    lim_{k→∞} f(x_k) = f(y).

A subset X ⊂ Rn is bounded if there exists a > 0 such that X ⊂ B_a(0).
A subset X is compact if every sequence in X has a subsequence converging to a point in X. The basic theorem of Bolzano–Weierstrass says:
A subset of Rn is compact if and only if it is both closed and bounded.
Let K ⊂ Rn be compact and f: K → Rm be a continuous map. Then f(K) is compact.
A nonempty compact subset of R has a maximal element and a minimal element. Combining this with the preceding statement proves the familiar result:
Every continuous map f: K → R, defined on a compact set K, takes on a maximum value and a minimum value.
One may extend the notions of distance, open set, convergent sequence, and other topological ideas to vector subspaces of Rn. For example, if E is a subspace of Rn, the distance function d: Rn × Rn → R restricts to a function d_E: E × E → R that also satisfies (4) and (5). Then ε-neighborhoods in E may be defined via d_E, and thus open sets of E become defined.

§2. New Norms for Old

It is often convenient to use functions on Rn that are similar to the Euclidean norm, but not identical to it. We define a norm on Rn to be any function N: Rn → R that satisfies the analogues of (1), (2), and (3) of Section 1:

    (1) N(x) ≥ 0, and N(x) = 0 if and only if x = 0;
    (2) N(x + y) ≤ N(x) + N(y);
    (3) N(ax) = | a | N(x).

Here are some other norms on Rn:

    | x |_max = max{ | x1 |, ..., | xn | },
    | x |_sum = | x1 | + ··· + | xn |.

Let ℬ = {f1, ..., fn} be a basis for Rn and define the Euclidean ℬ-norm:

    | x |_ℬ = (t1² + ··· + tn²)^{1/2}  if  x = Σ_{j=1}^n t_j f_j.

In other words, | x |_ℬ is the Euclidean norm of x in ℬ-coordinates (t1, ..., tn).



The ℬ-max norm of x is

    | x |_{ℬ,max} = max{ | t1 |, ..., | tn | }.

The basic fact about norms is the equivalence of norms:

Proposition 1  Let N: Rn → R be any norm. There exist constants A > 0, B > 0 such that

    (4) A | x | ≤ N(x) ≤ B | x |

for all x, where | x | is the Euclidean norm.

Proof.  First, consider the max norm. Clearly,

    (max_j | x_j |)² ≤ Σ_j x_j² ≤ n (max_j | x_j |)²;

taking square roots we have

    | x |_max ≤ | x | ≤ √n | x |_max.

Thus for the max norm we can take A = 1/√n, B = 1, or, equivalently,

    (1/√n) | x | ≤ | x |_max ≤ | x |.

Now let N: Rn → R be any norm. We show that N is continuous. We have

    N(x) = N(Σ_j x_j e_j) ≤ Σ_j | x_j | N(e_j),

where e1, ..., en is the standard basis. If

    max{ N(e1), ..., N(en) } = M,

then

    N(x) ≤ M Σ_j | x_j | ≤ Mn | x |_max.

By the triangle inequality,

    | N(x) − N(y) | ≤ N(x − y) ≤ Mn | x − y |_max;

so lim N(x_k) = N(y) in R whenever x_k → y.
Since N is continuous, it attains a maximum value B and a minimum value A on the closed bounded set

    { x ∈ Rn : | x | = 1 }.

Now let x ∈ Rn. If x = 0, (4) is obvious. If | x | = α ≠ 0, then

    N(x) = α N(α⁻¹x).

Since | α⁻¹x | = 1 we have

    A ≤ N(α⁻¹x) ≤ B.

Hence

    A ≤ α⁻¹N(x) ≤ B,

which yields (4), since α = | x |.
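As an informal aside, the constants found in the proof for the max norm (A = 1/√n, B = 1) can be tested numerically; `eunorm` and `maxnorm` are hypothetical helper names, not from the book.

```python
import math
import random

def eunorm(x):
    # Euclidean norm |x|
    return math.sqrt(sum(t * t for t in x))

def maxnorm(x):
    # |x|max = max(|x1|, ..., |xn|)
    return max(abs(t) for t in x)

random.seed(1)
n = 6
A, B = 1 / math.sqrt(n), 1.0   # the constants from the proof, for the max norm
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    # equivalence of norms (4): A|x| <= |x|max <= B|x|
    assert A * eunorm(x) <= maxnorm(x) + 1e-12
    assert maxnorm(x) <= B * eunorm(x) + 1e-12
```

The bounds are attained: the left at x = (1, 1, ..., 1), the right at x = (1, 0, ..., 0).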
Let E ⊂ Rn be a subspace. We define a norm on E to be any function

    N: E → R

that satisfies (1), (2), and (3). In particular, every norm on Rn restricts to a norm on E. In fact, every norm on E is obtained from a norm on Rn by restriction. To see this, decompose Rn into a direct sum

    Rn = E ⊕ F.

(For example, let {e1, ..., en} be a basis for Rn such that {e1, ..., em} is a basis for E; then F is the subspace whose basis is {e_{m+1}, ..., en}.) Given a norm N on E, define a norm N′ on Rn by

    N′(x) = N(y) + | z |,

where

    x = y + z,  y ∈ E,  z ∈ F,

and | z | is the Euclidean norm of z. It is easy to verify that N′ is a norm on Rn and N′ | E = N.
From this the equivalence of norms on E follows. For let N be a norm on E. Then we may assume N is the restriction to E of a norm on Rn, also denoted by N. There exist A, B ∈ R such that (4) holds for all x in Rn, so it holds a fortiori for all x in E.
We now define a normed vector space (E, N) to be a vector space E (that is, a subspace of some Rn) together with a particular norm N on E.
We shall frequently use the following corollary of the equivalence of norms:

Proposition 2  Let (E, N) be any normed vector space. A sequence {x_k} in E converges to y if and only if

    (5) lim_{k→∞} N(x_k − y) = 0.

Proof.  Let A > 0, B > 0 be as in (4). Suppose (5) holds. Then the inequality

    0 ≤ | x_k − y | ≤ A⁻¹ N(x_k − y)

shows that lim_{k→∞} | x_k − y | = 0; hence x_k → y. The converse is proved similarly.
Another useful application of the equivalence of norms is:



Proposition 3  Let (E, N) be a normed vector space. Then the unit ball

    D = { x ∈ E : N(x) ≤ 1 }

is compact.

Proof.  Let A be as in (4). Then D is a bounded subset of Rn, for it is contained in

    { x ∈ Rn : | x | ≤ A⁻¹ }.

It follows from Proposition 2 that D is closed. Thus D is compact.

The Cauchy convergence criterion (of Section 1) can be rephrased in terms of arbitrary norms:

Proposition 4  Let (E, N) be a normed vector space. Then a sequence {x_k} in E converges to an element in E if and only if:

    (6) for every ε > 0, there exists an integer n_0 > 0 such that if p ≥ n ≥ n_0, then N(x_p − x_n) < ε.

Proof.  Suppose E ⊂ Rn, and consider {x_k} as a sequence in Rn. The condition (6) is equivalent to the Cauchy condition by the equivalence of norms. Therefore (6) is equivalent to convergence of the sequence to some y ∈ Rn. But y ∈ E because subspaces are closed sets.

A sequence in Rn (or in a subspace of Rn) is often denoted by an infinite series Σ_{k=0}^∞ x_k. This is merely a suggestive notation for the sequence of partial sums {s_k}, where

    s_k = x_0 + x_1 + ··· + x_k.

If lim_{k→∞} s_k = y, we write

    Σ_{k=0}^∞ x_k = y

and say the series Σ x_k converges to y. If all the x_k are in a subspace E ⊂ Rn, then also y ∈ E because E is a closed set.
A series Σ x_k in a normed vector space (E, N) is absolutely convergent if the series of real numbers Σ_{k=0}^∞ N(x_k) is convergent. This condition implies that Σ x_k is convergent in E. Moreover, it is independent of the norm on E, as follows easily from equivalence of norms. Therefore it is meaningful to speak of absolute convergence of a series in a vector space E, without reference to a norm.
A useful criterion for absolute convergence is the comparison test: a series Σ x_k in a normed vector space (E, N) converges absolutely provided there is a convergent series Σ a_k of nonnegative real numbers a_k such that

    N(x_k) ≤ a_k,  k = 0, 1, 2, ... .
For

    0 ≤ Σ_{k=n+1}^p N(x_k) ≤ Σ_{k=n+1}^p a_k;

hence Σ_{k=0}^∞ N(x_k) converges, by applying the Cauchy criterion to the partial sum sequences of Σ N(x_k) and Σ a_k.
PROBLEMS

1. Prove that the norms described in the beginning of Section 2 actually are norms.
2. | x |_p is a norm on Rn, where

    | x |_p = (Σ_{j=1}^n | x_j |^p)^{1/p},  1 ≤ p < ∞.

Sketch the unit balls in R² and R³ under the norm | x |_p for p = 1, 2, 3.
3. Find the largest A > 0 and smallest B > 0 such that

    A | x | ≤ | x |_sum ≤ B | x |

for all x ∈ Rn.
4. Compute the norm of the vector (1, 1) ∈ R² under each of the following norms:
(a) the Euclidean norm;
(b) the Euclidean ℬ-norm, where ℬ is the basis {(1, 2), (2, 2)};
(c) the max norm;
(d) the ℬ-max norm;
(e) the norm | x |_p of Problem 2, for all p.
5. An inner product on a vector space E is any map E × E → R, denoted by (x, y) → ⟨x, y⟩, that is symmetric, bilinear, and positive definite (see Section 1).
(a) Given any inner product, show that the function x → ⟨x, x⟩^{1/2} is a norm.
(b) Prove that a norm N on E comes from an inner product as in (a) if and only if it satisfies the "parallelogram law":

    N(x + y)² + N(x − y)² = 2(N(x)² + N(y)²).

(c) Let a1, ..., an be positive numbers. Find an inner product on Rn whose corresponding norm is

    N(x) = (Σ_k a_k x_k²)^{1/2}.

(d) Let {e1, ..., em} be a basis for E. Show that there is a unique inner product on E such that

    ⟨e_i, e_j⟩ = δ_ij  for all i, j.
6. Which of the following formulas define norms on R²? (Let (x, y) be the coordinates in R².)
(a) (x² + xy + y²)^{1/2};    (b) (x² − 3xy + y²)^{1/2};
(c) (| x | + | y |)²;        (d) | x | + | y | + (x² + y²)^{1/2}.
7. Let U ⊂ Rn be a bounded open set containing 0. Suppose U is convex: if x ∈ U and y ∈ U, then the line segment { tx + (1 − t)y : 0 ≤ t ≤ 1 } is in U. For each x ∈ Rn define

    σ(x) = least upper bound of { λ ≥ 0 : λx ∈ U }.

Then the function

    N(x) = σ(x)⁻¹  (x ≠ 0),  N(0) = 0,

is a norm on Rn.
8. Let M_n be the vector space of n × n matrices. Denote the transpose of A ∈ M_n by Aᵗ. Show that an inner product (see Problem 5) on M_n is defined by the formula

    ⟨A, B⟩ = Tr(AᵗB).

Express this inner product in terms of the entries of the matrices A and B.
9. Find the orthogonal complement in M_n (see Problem 8) of the subspace of diagonal matrices.
10. Find a basis for the subspace of M_n of matrices of trace 0. What is the orthogonal complement of this subspace?

§3. Exponentials of Operators

The set L(Rn) of operators on Rn is identified with the set M_n of n × n matrices. This in turn is the same as R^{n²}, since a matrix is nothing but a list of n² numbers. (One chooses an ordering for these numbers.) Therefore L(Rn) is a vector space under the usual addition and scalar multiplication of operators (or matrices). We may thus speak of norms on L(Rn), convergence of series of operators, and so on.
A frequently used norm on L(Rn) is the uniform norm. This norm is defined in terms of a given norm on Rn = E, which we shall write as | x |. If T: E → E is an operator, the uniform norm of T is defined to be

    ‖ T ‖ = max{ | Tx | : | x | ≤ 1 }.

In other words, ‖ T ‖ is the maximum value of | Tx | on the unit ball

    D = { x ∈ E : | x | ≤ 1 }.

The existence of this maximum value follows from the compactness of D (Proposition 3 of Section 2) and the continuity of T: Rn → Rn. (This continuity follows immediately from a matrix representation of T.)
The uniform norm on L(Rn) depends on the norm chosen for Rn. If no norm on Rn is specified, the standard Euclidean norm is intended.

Lemma 1  Let Rn be given a norm | x |. The corresponding uniform norm on L(Rn) has the following properties:
(a) If ‖ T ‖ = k, then | Tx | ≤ k | x | for all x in Rn.
(b) ‖ ST ‖ ≤ ‖ S ‖ ‖ T ‖.
(c) ‖ T^m ‖ ≤ ‖ T ‖^m for all m = 0, 1, 2, ... .

Proof.  (a) If x = 0, then | Tx | = 0 = k | x |. If x ≠ 0, then | x | ≠ 0. Let y = | x |⁻¹x; then | y | = 1. Hence

    k = ‖ T ‖ ≥ | Ty | = (1/| x |) | Tx |,

from which (a) follows.
(b) Let | x | ≤ 1. Then from (a) we have

    | S(Tx) | ≤ ‖ S ‖ | Tx |
             ≤ ‖ S ‖ ‖ T ‖ | x |
             ≤ ‖ S ‖ ‖ T ‖.

Since ‖ ST ‖ is the maximum value of | STx | on the unit ball, (b) follows.
Finally, (c) is an immediate consequence of (b).
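As an aside (not from the book), for a 2 × 2 matrix ‖ T ‖ can be approximated by sampling | Tx | over unit vectors x = (cos t, sin t); the helper names below are hypothetical. The sketch also spot-checks Lemma 1(b).

```python
import math

def apply(T, x):
    # matrix-vector product for a 2x2 matrix T = [[a, b], [c, d]]
    return [T[0][0] * x[0] + T[0][1] * x[1],
            T[1][0] * x[0] + T[1][1] * x[1]]

def matmul(S, T):
    return [[sum(S[i][k] * T[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def uniform_norm(T, samples=20000):
    # approximate max |Tx| over the unit circle
    best = 0.0
    for i in range(samples):
        t = 2 * math.pi * i / samples
        y = apply(T, [math.cos(t), math.sin(t)])
        best = max(best, math.hypot(y[0], y[1]))
    return best

S = [[1.0, 2.0], [0.0, 1.0]]
T = [[0.0, -1.0], [1.0, 0.0]]
# Lemma 1(b): ||ST|| <= ||S|| ||T||  (up to sampling error)
assert uniform_norm(matmul(S, T)) <= uniform_norm(S) * uniform_norm(T) + 1e-6
# T is a rotation, so ||T|| = 1
assert abs(uniform_norm(T) - 1.0) < 1e-6
```

Sampling the circle suffices here because on R² the maximum over the closed unit ball is attained on its boundary.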

We now define an important series generalizing the usual exponential series. For any operator T: Rn → Rn define

    exp(T) = e^T = Σ_{k=0}^∞ Tᵏ/k!.

(Here k! is k factorial, the product of the first k positive integers if k > 0, and 0! = 1 by definition.) This is a series in the vector space L(Rn).

Theorem  The exponential series Σ_{k=0}^∞ Tᵏ/k! is absolutely convergent for every operator T.

Proof.  Let ‖ T ‖ = a ≥ 0 be the uniform norm (for some norm on Rn). Then ‖ Tᵏ/k! ‖ ≤ aᵏ/k!, by Lemma 1, proved earlier. Now the real series Σ_{k=0}^∞ aᵏ/k! converges to eᵃ (where e is the base of natural logarithms). Therefore the exponential series for T converges absolutely, by the comparison test (Section 2).
We have also proved that

    ‖ e^A ‖ ≤ e^{‖A‖}.
We shall need the following result.

The next result is useful in computing with exponentials.

Proposition  Let P, S, T denote operators on Rn. Then:
(a) if Q = PTP⁻¹, then e^Q = P e^T P⁻¹;
(b) if ST = TS, then e^{S+T} = e^S e^T;
(c) e^{−S} = (e^S)⁻¹;
(d) if n = 2 and

    T = [ a  −b ]
        [ b   a ],

then

    e^T = e^a [ cos b  −sin b ]
              [ sin b   cos b ].

The proof of (a) follows from the identities P(A + B)P⁻¹ = PAP⁻¹ + PBP⁻¹ and (PTP⁻¹)ᵏ = PTᵏP⁻¹. Therefore

    P(Σ_{k=0}^n Tᵏ/k!)P⁻¹ = Σ_{k=0}^n (PTP⁻¹)ᵏ/k!,

and (a) follows by taking limits. To prove (b), observe that because ST = TS we have by the binomial theorem

    (S + T)ⁿ = n! Σ_{j+k=n} (Sʲ/j!)(Tᵏ/k!).

Therefore

    e^{S+T} = Σ_{n=0}^∞ Σ_{j+k=n} (Sʲ/j!)(Tᵏ/k!) = (Σ_{j=0}^∞ Sʲ/j!)(Σ_{k=0}^∞ Tᵏ/k!) = e^S e^T,

by Lemma 2, which proves (b). Putting T = −S in (b) gives (c).


The proof of (d) follows from the correspondence

    a + ib → [ a  −b ]
             [ b   a ]

of Chapter 3, which preserves sums, products, and real multiples. It is easy to see that it also preserves limits. Therefore

    e^T corresponds to e^{a+ib} = e^a e^{ib},

where e^{ib} is the complex number Σ_{k=0}^∞ (ib)ᵏ/k!. Using i² = −1, we find the real part of e^{ib} to be the sum of the Taylor series (at 0) for cos b; similarly, the imaginary part is sin b. This proves (d).
Observe that (c) implies that e^S is invertible for every operator S. This is analogous to the fact that eˢ ≠ 0 for every real number s.
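As an informal numerical check of (d) (an aside, not from the book), partial sums of the exponential series for T = [[a, −b], [b, a]] should approach e^a times the rotation matrix; `expm` and `matmul` below are hypothetical helper names.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(T, terms=60):
    # partial sum of the exponential series  I + T + T^2/2! + ...
    S = [[1.0, 0.0], [0.0, 1.0]]   # running sum, starts at I
    P = [[1.0, 0.0], [0.0, 1.0]]   # running term T^k / k!
    for k in range(1, terms):
        P = matmul(P, T)
        P = [[P[i][j] / k for j in range(2)] for i in range(2)]   # now T^k / k!
        S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return S

a, b = 0.5, 1.2
E = expm([[a, -b], [b, a]])
expected = [[math.exp(a) * math.cos(b), -math.exp(a) * math.sin(b)],
            [math.exp(a) * math.sin(b),  math.exp(a) * math.cos(b)]]
for i in range(2):
    for j in range(2):
        assert abs(E[i][j] - expected[i][j]) < 1e-9
```

Sixty terms is far more than needed for a matrix of small norm; the series converges faster than any geometric series.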
As an example we compute the exponential of

    T = [ a  0 ]
        [ b  a ].

We write

    T = aI + B,   B = [ 0  0 ]
                      [ b  0 ].

Note that aI commutes with B. Hence

    e^T = e^{aI} e^B = eᵃ e^B.

Now B² = 0; hence Bᵏ = 0 for all k > 1, and

    e^B = I + B.

Thus

    e^T = eᵃ [ 1  0 ]
             [ b  1 ].
We can now compute e^A for any 2 × 2 matrix A. We will see in Chapter 6 that we can find an invertible matrix P such that the matrix

    B = PAP⁻¹

has one of the following forms:

    (1) [ λ  0 ]     (2) [ a  −b ]     (3) [ λ  0 ]
        [ 0  μ ],        [ b   a ],        [ 1  λ ].

We then compute e^B. For (1),

    e^B = [ e^λ   0  ]
          [  0   e^μ ].

For (2),

    e^B = e^a [ cos b  −sin b ]
              [ sin b   cos b ],

as was shown in the proposition above. For (3),

    e^B = e^λ [ 1  0 ]
              [ 1  1 ],

as we have just seen. Therefore e^A can be computed from the formula

    e^A = P⁻¹ e^B P.
There is a very simple relationship between the eigenvectors of T and those of e^T:

If x ∈ Rn is an eigenvector of T belonging to the real eigenvalue α of T, then x is also an eigenvector of e^T belonging to e^α.

For, from Tx = αx, we obtain

    e^T x = Σ_{k=0}^∞ (Tᵏx)/k! = Σ_{k=0}^∞ (αᵏ/k!)x = e^α x.
We conclude this section with the observation that all that has been said for exponentials of operators on Rn also holds for operators on the complex vector space Cn. This is because Cn can be considered as the real vector space R^{2n} by simply ignoring nonreal scalars; every complex operator is a fortiori a real operator. In addition, the preceding statement about eigenvectors is equally valid when complex eigenvalues of an operator on Cn are considered; the proof is the same.

PROBLEMS

1. Let N be any norm on L(Rn). Prove that there is a constant K such that

    N(ST) ≤ K N(S) N(T)

for all operators S, T. Why must K ≥ 1?
2. Let T: Rn → Rm be a linear transformation. Show that T is uniformly continuous: for all ε > 0 there exists δ > 0 such that if | x − y | < δ, then | Tx − Ty | < ε.
3. Let T: Rn → Rn be an operator. Show that

    ‖ T ‖ = least upper bound of { | Tx | / | x | : x ≠ 0 }.
4. Find the uniform norm of each of the following operators on R2:

5. Let

(a) Show that

    lim_{n→∞} ‖ Tⁿ ‖^{1/n} = 3.

(b) Show that for every ε > 0 there is a basis ℬ of R² for which

    ‖ T ‖_ℬ < 3 + ε,

where ‖ T ‖_ℬ is the uniform norm of T corresponding to the Euclidean ℬ-norm on R².
(c) For any basis ℬ of R²,

    ‖ T ‖_ℬ > 3.
6. (a) Show that

    ‖ T ‖ ‖ T⁻¹ ‖ ≥ 1

for every invertible operator T.
(b) If T has two distinct real eigenvalues, then

    ‖ T ‖ ‖ T⁻¹ ‖ > 1.

(Hint: First consider operators on R².)
7. Prove that if T is an operator on Rn such that ‖ T − I ‖ < 1, then T is invertible and the series Σ_{k=0}^∞ (I − T)ᵏ converges absolutely to T⁻¹. Find an upper bound for ‖ T⁻¹ ‖.
8. Let A ∈ L(Rn) be invertible. Find ε > 0 such that if ‖ B − A ‖ < ε, then B is invertible. (Hint: First show A⁻¹B is invertible by applying Problem 7 to T = A⁻¹B.)
9. Compute the exponentials of the following matrices (i = √−1):

(e) 1A :I
0 0 0
(f) [:: :] [: : "1
0 1 3
(g)
0 1 x

(h) [i
0 -i
'1 (i) [' 2 l+i
1 0 0 0

10. For each matrix T in Problem 9, find the eigenvalues of e^T.

11. Find an example of two operators A, B on R² such that

    e^{A+B} ≠ e^A e^B.
84. HOMOGENEOUS LINEAR SYSTEMS 89

12. If AB = BA, then e^A e^B = e^B e^A and e^A B = B e^A.


13. Let an operator A: Rn → Rn leave invariant a subspace E ⊂ Rn (that is, Ax ∈ E for all x ∈ E). Show that e^A also leaves E invariant.
14. Show that if ‖ T − I ‖ is sufficiently small, then there is an operator S such that e^S = T. (Hint: Expand log(1 + x) in a Taylor series.) To what extent is S unique?
15. Show that there is no real 2 × 2 matrix S such that e^S = [ −1  0 ; 0  −4 ].

§4. Homogeneous Linear Systems

Let A be an operator on Rn. In this section we shall express solutions to the equation

    (1) x′ = Ax

in terms of exponentials of operators.
Consider the map R → L(Rn) which to t ∈ R assigns the operator e^{tA}. Since L(Rn) is identified with R^{n²}, it makes sense to speak of the derivative of this map.

Proposition

    (d/dt) e^{tA} = A e^{tA}.
In other words, the derivative of the operator-valued function e^{tA} is another operator-valued function, A e^{tA}. This means the composition of e^{tA} with A; the order of composition does not matter. One can think of A and e^{tA} as matrices, in which case A e^{tA} is their product.

Proof of the proposition.

    (d/dt) e^{tA} = lim_{h→0} (e^{(t+h)A} − e^{tA})/h
                  = e^{tA} lim_{h→0} (e^{hA} − I)/h
                  = e^{tA} A;

that the last limit equals A follows from the series definition of e^{hA}. Note that A commutes with each term of the series for e^{tA}, hence with e^{tA}. This proves the proposition.

We can now solve equation (1). We recall from Chapter 1 that the general solution of the scalar equation

    x′ = ax  (a ∈ R)

is

    x(t) = k e^{ta};  k = x(0).

The same is true where x, a, and k are allowed to be complex numbers (Chapter 3). These results are special cases of the following, which can be considered as the fundamental theorem of linear differential equations with constant coefficients.

Theorem  Let A be an operator on Rn. Then the solution of the initial value problem

    (1′) x′ = Ax,  x(0) = K ∈ Rn,

is

    (2) e^{tA}K,

and there are no other solutions.

Proof.  The preceding proposition shows that

    (d/dt)(e^{tA}K) = A e^{tA}K;

since e^{0·A}K = K, it follows that (2) is a solution of (1′). To see that there are no other solutions, let x(t) be any solution of (1′) and put

    y(t) = e^{−tA}x(t).

Then

    y′(t) = −A e^{−tA}x(t) + e^{−tA}x′(t)
          = −A e^{−tA}x(t) + e^{−tA}A x(t)
          = e^{−tA}(−A + A)x(t)
          = 0.

Therefore y(t) is a constant. Setting t = 0 shows y(t) = K, that is, x(t) = e^{tA}K. This completes the proof of the theorem.
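As an aside (not from the book), the closed-form solution e^{tA}K can be compared with a direct numerical integration of x′ = Ax. The sketch below uses A = [[0, −1], [1, 0]], for which part (d) of the proposition in Section 3 (with a = 0) gives e^{tA} as rotation by angle t; `euler` is a hypothetical helper name.

```python
import math

# x' = Ax with A = [[0, -1], [1, 0]]; the solution is e^{tA}K,
# where e^{tA} is rotation by angle t.
A = [[0.0, -1.0], [1.0, 0.0]]
K = [1.0, 2.0]

def euler(A, x0, t, steps=200000):
    # crude forward-Euler integration of x' = Ax
    h = t / steps
    x = list(x0)
    for _ in range(steps):
        x = [x[0] + h * (A[0][0] * x[0] + A[0][1] * x[1]),
             x[1] + h * (A[1][0] * x[0] + A[1][1] * x[1])]
    return x

t = 1.0
exact = [math.cos(t) * K[0] - math.sin(t) * K[1],
         math.sin(t) * K[0] + math.cos(t) * K[1]]
approx = euler(A, K, t)
assert all(abs(a - b) < 1e-4 for a, b in zip(approx, exact))
```

The agreement illustrates the uniqueness assertion of the theorem: any scheme converging to a solution must converge to e^{tA}K.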

As an example we compute the general solution of the two-dimensional system

    (3) x1′ = ax1,
        x2′ = bx1 + ax2,

where a, b are constants. In matrix notation this is

    x′ = Ax,    A = [ a  0 ]
                    [ b  a ].

The solution with initial value K = (K1, K2) ∈ R² is e^{tA}K.
In Section 3 we saw that

    e^{tA} = e^{ta} [ 1   0 ]
                    [ tb  1 ].

Thus

    e^{tA}K = (e^{ta}K1, e^{ta}(tbK1 + K2)).

Thus the solution to (3) satisfying

    x1(0) = K1,  x2(0) = K2

is

    x1(t) = e^{ta}K1,
    x2(t) = e^{ta}(tbK1 + K2).

FIG. A. Saddle: B = [ λ  0 ; 0  μ ],  λ < 0 < μ.



Since we know how to compute the exponential of any 2 × 2 matrix (Section 3), we can explicitly solve any two-dimensional system of the form x′ = Ax, A ∈ L(R²). Without finding explicit solutions, we can also obtain important qualitative information about the solutions from the eigenvalues of A. We consider the most important special cases.

Case I. A has real eigenvalues of opposite signs. In this case the origin (or sometimes the differential equation) is called a saddle. As we saw in Chapter 3, after a suitable change of coordinates x = Py, the equation becomes

    y′ = By,    B = PAP⁻¹ = [ λ  0 ]
                            [ 0  μ ],   λ < 0 < μ.

In the (y1, y2) plane the phase portrait looks like Fig. A.

Case II. All eigenvalues have negative real parts. This important case is called a sink. It has the characteristic property that

    lim_{t→∞} x(t) = 0

for every solution x(t). If A is diagonal, this is obvious, for the solutions are

    y(t) = (c1 e^{tλ}, c2 e^{tμ});  λ < 0,  μ < 0.

FIG. B. Focus: B = [ λ  0 ; 0  λ ],  λ < 0.

FIG. C. Node: B = [ λ  0 ; 0  μ ],  λ < μ < 0.

If A is diagonalizable, the solutions are of the form x(t) = Py(t), with y(t) as above and P ∈ L(R²); clearly, x(t) → 0 as t → ∞.
The phase portrait for these subcases looks like Fig. B if the eigenvalues are equal (a focus) and like Fig. C if they are unequal (a node).
If the eigenvalues are negative but A is not diagonalizable, there is a change of coordinates x = Py (see Chapter 6) giving the equivalent equation

    y′ = By,

where

    B = P⁻¹AP = [ λ  0 ]
                [ 1  λ ],   λ < 0.

We have already solved such an equation; the solutions are

    y1(t) = K1 e^{tλ},
    y2(t) = K2 e^{tλ} + K1 t e^{tλ},

which tend to 0 as t tends to ∞. The phase portrait looks like Fig. D (an improper node).
If the eigenvalues are a ± ib, a < 0, we can change coordinates as in Chapter 4

FIG. D. Improper node: B = [ λ  0 ; 1  λ ],  λ < 0.

to obtain the equivalent system

    y′ = By,    B = [ a  −b ]
                    [ b   a ].

From Section 3 we find

    e^{tB} = e^{ta} [ cos tb  −sin tb ]
                    [ sin tb   cos tb ].

FIG. E. Spiral sink: B = [ a  −b ; b  a ],  b > 0 > a.


Therefore the general solution is expressed in y-coordinates as

    y(t) = e^{ta}(K1 cos tb − K2 sin tb, K2 cos tb + K1 sin tb).

Since | cos tb | ≤ 1 and | sin tb | ≤ 1, and a < 0, it follows that

    lim_{t→∞} y(t) = 0.

If b > 0, the phase portrait consists of counterclockwise spirals tending to 0 (Fig. E), and clockwise spirals tending to 0 if b < 0.

Case III. All eigenvalues have positive real part. In this case, called a source, we have

    lim_{t→∞} | x(t) | = ∞  and  lim_{t→−∞} | x(t) | = 0.

A proof similar to that of Case II can be given; the details are left to the reader. The phase portraits are like Figs. B–E with the arrows reversed.

Case IV. The eigenvalues are pure imaginary. This is called a center. It is characterized by the property that all solutions are periodic with the same period. To see this, change coordinates to obtain the equivalent equation

    y′ = By,    B = [ 0  −b ]
                    [ b   0 ].

We know that

    e^{tB} = [ cos tb  −sin tb ]
             [ sin tb   cos tb ].

Therefore if y(t) is any solution,

    y(t + 2π/b) = y(t).

FIG. F. Center: B = [ 0  −b ; b  0 ],  b > 0.



FIG. G. Phase portraits summarized in the (Tr A, Det A)-plane: saddles where Det A < 0; nodes and spirals where Det A > 0 (sinks for Tr A < 0, sources for Tr A > 0).

The phase portrait in the y-coordinates consists of concentric circles. In the original x-coordinates the orbits may be ellipses as in Fig. F. (If b < 0, the arrows point clockwise.)
Figure G summarizes the geometric information about the phase portrait of x′ = Ax that can be deduced from the characteristic polynomial of A. We write this polynomial as

    λ² − (Tr A)λ + Det A.

The discriminant Δ is defined to be

    Δ = (Tr A)² − 4 Det A.

The eigenvalues are

    ½(Tr A ± √Δ).

Thus real eigenvalues correspond to the case Δ ≥ 0; the eigenvalues have negative real part when Tr A < 0 and Det A > 0; and so on.
The geometric interpretation of x′ = Ax is as follows (compare Chapter 1). The map Rn → Rn which sends x into Ax is a vector field on Rn. Given a point K of Rn, there is a unique curve t → e^{tA}K which starts at K at time zero and is a solution of (1). (We interpret t as time.) The tangent vector to this curve at a time t0 is the vector Ax(t0) of the vector field at the point x(t0) of the curve.
We may think of points of Rn flowing simultaneously along these solution curves. The position of a point x ∈ Rn at time t is denoted by

    φ_t(x) = e^{tA}x.

Thus for each t ∈ R we have a map

    φ_t: Rn → Rn  (t ∈ R)
given by

    φ_t(x) = e^{tA}x.

The collection of maps {φ_t}, t ∈ R, is called the flow corresponding to the differential equation (1). This flow has the basic property

    φ_{s+t} = φ_s ∘ φ_t,

which is just another way of writing

    e^{(s+t)A} = e^{sA} e^{tA};

this is proved in the proposition in Section 3. The flow is called linear because each map φ_t: Rn → Rn is a linear map. In Chapter 8 we shall define more general nonlinear flows.
The phase portraits discussed above give a good visualization of the corresponding flows. Imagine points of the plane all moving at once along the curves in the direction of the arrows. (The origin stays put.)

PROBLEMS

1. Find the general solution to each of the following systems:

    (a) x′ = 2x,        (b) x′ = 2x − y,
        y′ = −y;            y′ = x + 2y;

    (c) x′ = y − 2x,    (d) x′ = −2x,
        y′ = x;             y′ = x − 2y;

    (e) x′ = y + z,
        y′ = z,
        z′ = 0.
2. In (a), (b), and (c) of Problem 1, find the solutions satisfying each of the following initial conditions:
(a) x(0) = 1, y(0) = −2;  (b) x(0) = 0, y(0) = −2;  (c) x(0) = 0, y(0) = 0.
3. Let A: Rn → Rn be an operator that leaves a subspace E ⊂ Rn invariant. Let x: R → Rn be a solution of x′ = Ax. If x(t0) ∈ E for some t0 ∈ R, show that x(t) ∈ E for all t ∈ R.
4. Suppose A ∈ L(Rn) has a real eigenvalue λ < 0. Then the equation x′ = Ax

has at least one nontrivial solution x(t) such that

    lim_{t→∞} x(t) = 0.

5. Let A ∈ L(R²) and suppose x′ = Ax has a nontrivial periodic solution u(t): this means u(t + p) = u(t) for some p > 0. Prove that every solution is periodic, with the same period p.

6. If u: R → Rn is a nontrivial solution of x′ = Ax, then

    (d/dt) | u | = (u, Au) / | u |.
7. Supply the details of Case II in the text.

8. Classify and sketch the phase portraits of planar differential equations x′ = Ax, A ∈ L(R²), where A has zero as an eigenvalue.
9. For each of the following matrices A, consider the corresponding differential equation x′ = Ax. Decide whether the origin is a sink, source, saddle, or none of these. Identify in each case those vectors u such that lim_{t→∞} x(t) = 0, where x(t) is the solution with x(0) = u:
10. Which values (if any) of the parameter k in the following matrices make the origin a sink for the corresponding differential equation x′ = Ax?

11. Let φ_t: R² → R² be the flow corresponding to the equation x′ = Ax. (That is, t → φ_t(x) is the solution passing through x at t = 0.) Fix τ > 0, and show that φ_τ is a linear map of R² → R². Then show that φ_τ preserves area if and only

if Tr A = 0, and that in this case the origin is not a sink or a source. (Hint: An operator is area-preserving if and only if its determinant is ±1.)
12. Describe in words the phase portraits of x′ = Ax for

13. Suppose A is an n × n matrix with n distinct eigenvalues and the real part of every eigenvalue is less than some negative number a. Show that for every solution to x′ = Ax, there exists t0 > 0 such that

    | x(t) | < e^{ta}  if  t ≥ t0.

14. Let T be an invertible operator on Rn, n odd. Then x′ = Tx has a nonperiodic solution.
15. Let A = [ a  b ; c  d ] have nonreal eigenvalues. Then b ≠ 0. The nontrivial solution curves of x′ = Ax are spirals or ellipses, oriented clockwise if b > 0 and counterclockwise if b < 0. (Hint: Consider the sign of

    (d/dt) arctan(x2(t)/x1(t)).)

§5. A Nonhomogeneous Equation

We consider a nonhomogeneous nonautonomous linear differential equation

    (1) x′ = Ax + B(t).

Here A is an operator on Rn and B: R → Rn is a continuous map. This equation is called nonhomogeneous because of the term B(t), which prevents (1) from being strictly linear; the fact that the right side of (1) depends explicitly on t makes it nonautonomous. It is difficult to interpret solutions geometrically.
We look for a solution having the form

    (2) x(t) = e^{tA}f(t),

where f: R → Rn is some differentiable curve. (This method of solution is called "variation of constants," perhaps because if B(t) = 0, f(t) is a constant.) Every solution can in fact be written in this form since e^{tA} is invertible.
Differentiation of (2) using the Leibniz rule yields

    x′(t) = A e^{tA}f(t) + e^{tA}f′(t).

Since x is assumed to be a solution of (1),

    Ax(t) + B(t) = Ax(t) + e^{tA}f′(t),

or

    f′(t) = e^{−tA}B(t).

By integration,

    f(t) = ∫₀ᵗ e^{−sA}B(s) ds + K,

so as a candidate for a solution of (1) we have

    (3) x(t) = e^{tA} ( ∫₀ᵗ e^{−sA}B(s) ds + K ).

Let us examine (3) to see that it indeed makes sense. The integrand in (3) and the previous equation is the vector-valued function s → e^{−sA}B(s) mapping R into Rn. In fact, for any continuous map g of the reals into a vector space Rn, the integral can be defined as an element of Rn. Given a basis of Rn, this integral is a vector whose coordinates are the integrals of the coordinate functions of g.
The integral as a function of its upper limit t is a map from R into Rn. For each t the operator e^{tA} acts on the integral to give an element of Rn. So t → x(t) is a well-defined map from R into Rn.
To check that (3) is a solution of (1), we differentiate x(t) in (3):

    x′(t) = B(t) + A e^{tA} ( ∫₀ᵗ e^{−sA}B(s) ds + K )
          = B(t) + Ax(t).

Thus (3) is indeed a solution of (1).


"hat every mlution of (1) must be of the form (3) can be Been aa follows. Let
y:Rn+ E be a second solution of (1). Then
x' - y' = A ( z - y)
80 that from Section 1
z-y = etAKo for some KO in Rn.
This implies that y is of the form (3) (with perhaps a different constant K E Rn).
We remark that if B in (1) is only defined on some interval, instead of on all of
R,then by the above methods, we obtain a solution z ( t ) defined for t in that same
interval.
We obtain further insight into (1) by rewriting the general solution (3) in the
form
z(t) = u(t) + &'K,
u(t) = e-1 i ' e - A * B ( s ) d ~ .

Note that u(t) is also a solution to (1), while e^{tA}K is a solution to the homogeneous equation

    (4) y′ = Ay

obtained from (1) by replacing B(t) with 0. In fact, if u(t) is any solution to (1) and y(t) any solution to (4), then clearly x = u + y is another solution to (1). Hence the general solution to (1) is obtained from a particular solution by adding to it the general solution of the corresponding homogeneous equation. In summary:

Theorem  Let u(t) be a particular solution of the nonhomogeneous linear differential equation

    (1) x′ = Ax + B(t).

Then every solution of (1) has the form u(t) + v(t), where v(t) is a solution of the homogeneous equation

    (4′) x′ = Ax.

Conversely, the sum of a solution of (1) and a solution of (4′) is a solution of (1).

If the function B(t) is at all complicated, it will probably be impossible to replace the integral in (3) by a simple formula; sometimes, however, this can be done.
Example. Find the general solution to
(5) X ;= -Za,

z;= 21 + t.
Here

A = [y -3 B(t) = [3].
Hence

and the integral in (3) is


cos s sin s] I:[ ds = 1'["
8
sin s] ds
cos 8

sin t - t coa t
cost+tsint- 1
To compute (3) we set

8'= [sint
cos t -sin t
cost
102 5. LINEAR SYSTEMS AND EXPONENTIALS OF OPERATORS

hence the general solution is

x(t) = [cos t  -sin t; sin t  cos t]([sin t - t cos t; cos t + t sin t - 1] + [K₁; K₂]).

Performing the matrix multiplication and simplifying yields

x₁(t) = -t + K₁ cos t + (1 - K₂) sin t,
x₂(t) = 1 - (1 - K₂) cos t + K₁ sin t.

This is the solution whose value at t = 0 is

x₁(0) = K₁,  x₂(0) = K₂.
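The closed form just obtained is easy to check against a direct numerical integration of (5). The following is a minimal sketch, assuming Python with numpy is available; the Runge-Kutta integrator and the particular constants K₁, K₂ are illustrative, not from the text.

```python
import numpy as np

def rhs(t, x):
    # the system (5): x1' = -x2, x2' = x1 + t
    return np.array([-x[1], x[0] + t])

def rk4(f, x0, t1, steps=2000):
    # classical 4th-order Runge-Kutta from t = 0 to t = t1
    t, x, h = 0.0, np.array(x0, float), t1 / steps
    for _ in range(steps):
        k1 = f(t, x); k2 = f(t + h/2, x + h/2*k1)
        k3 = f(t + h/2, x + h/2*k2); k4 = f(t + h, x + h*k3)
        x = x + h/6*(k1 + 2*k2 + 2*k3 + k4); t += h
    return x

K1, K2 = 0.3, -1.7          # arbitrary initial values x1(0) = K1, x2(0) = K2
t1 = 2.0
closed = np.array([-t1 + K1*np.cos(t1) + (1 - K2)*np.sin(t1),
                   1 - (1 - K2)*np.cos(t1) + K1*np.sin(t1)])
numeric = rk4(rhs, [K1, K2], t1)
assert np.allclose(closed, numeric, atol=1e-8)
```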

PROBLEMS

1. Find all solutions to the following equations or systems:
   (a) x' - 4x - cos t = 0;   (b) x' - 4x - t = 0;   (c) x' = y,
                                                          y' = 2 - x;
   (d) x' = y,                (e) x' = x + y + z,
       y' = -4x + sin 2t;         y' = -2y + t,
                                  z' = 2z + sin t.
2. Suppose T: Rⁿ → Rⁿ is an invertible linear operator and c ∈ Rⁿ is a nonzero
   constant vector. Show there is a change of coordinates of the form

   x = Py + b,  b ∈ Rⁿ,

   transforming the nonhomogeneous equation x' = Tx + c into homogeneous
   form y' = Sy. Find P, b, and S. (Hint: Where is x' = 0?)
3. Solve Problem 1(c) using the change of coordinates of Problem 2.

§6. Higher Order Systems

Consider a linear differential equation with constant coefficients which involves
a derivative higher than the first; for example,

(1) s'' + as' + bs = 0.

By introducing new variables we are able to reduce (1) to a first order system
of two equations. Let x₁ = s and x₂ = x₁' = s'. Then (1) becomes equivalent to the

system:

(2) x₁' = x₂,
    x₂' = -bx₁ - ax₂.

Thus if x(t) = (x₁(t), x₂(t)) is a solution of (2), then s(t) = x₁(t) is a solution
of (1); if s(t) is a solution of (1), then x(t) = (s(t), s'(t)) is a solution of (2).
This procedure of introducing new variables works very generally to reduce
higher order equations to first order ones. Thus consider

(3) s⁽ⁿ⁾ + a₁s⁽ⁿ⁻¹⁾ + ··· + a_{n-1}s' + a_n s = 0.

Here s is a real function of t and s⁽ⁿ⁾ is the nth derivative of s, while a₁, . . . , a_n are
constants.
In this case the new variables are x₁ = s, x₂ = x₁', . . . , x_n = x_{n-1}', and the equation
(3) is equivalent to the system

(4) x₁' = x₂,
    x₂' = x₃,
    ⋮
    x_n' = -a_n x₁ - a_{n-1}x₂ - ··· - a₁x_n.

In vector notation (4) has the form x' = Ax, where A is the matrix

(4')  [ 0      1        0    ···   0 ]
      [ 0      0        1    ···   0 ]
      [ ⋮                         ⋮ ]
      [ 0      0        0    ···   1 ]
      [-a_n  -a_{n-1}   ···      -a₁].

Proposition  The characteristic polynomial of (4') is

p(λ) = λⁿ + a₁λⁿ⁻¹ + ··· + a_n.

Proof. One uses induction on n. For n = 2, this is easily checked. Assume the
truth of the proposition for n - 1, and let A_{n-1} be the (n - 1) × (n - 1) sub-
matrix of A consisting of the last (n - 1) rows and last (n - 1) columns. Then
Det(λI - A) is easily computed to be λ Det(λI - A_{n-1}) + a_n by expanding along
the first column. The induction hypothesis yields the desired characteristic
polynomial.

The point of the proposition is that it gives the characteristic polynomial directly
from the coefficients of the higher order differential equation (3).
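The proposition can be spot-checked numerically. A minimal sketch, assuming Python with numpy is available: build the matrix (4') for given coefficients a₁, . . . , a_n and compare its characteristic polynomial, obtained from its eigenvalues, with λⁿ + a₁λⁿ⁻¹ + ··· + a_n.

```python
import numpy as np

def companion(a):
    # a = [a1, ..., an]; the matrix (4'): 1's on the superdiagonal,
    # last row (-an, -a(n-1), ..., -a1)
    n = len(a)
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)
    A[-1, :] = [-c for c in reversed(a)]
    return A

a = [2.0, -3.0, 1.0, 5.0]        # arbitrary test coefficients
A = companion(a)
# np.poly(A) returns the monic characteristic polynomial coefficients
# of A, highest power first
assert np.allclose(np.poly(A), [1.0] + a)
```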

Let us now return to our first equation

(1) s'' + as' + bs = 0.

Denote the roots of the polynomial equation λ² + aλ + b = 0 by λ₁, λ₂. Suppose
at first that these roots are real and distinct. Then (1) reduces to the first order
equation (2); one can find a diagonalizing system of coordinates (y₁, y₂). Every
solution of (2) in these coordinates is then y₁(t) = K₁ exp(λ₁t), y₂(t) = K₂ exp(λ₂t),
with arbitrary constants K₁, K₂. Thus x₁(t) = s(t) is a certain linear combination
s(t) = p₁₁K₁ exp(λ₁t) + p₁₂K₂ exp(λ₂t). We conclude that if λ₁, λ₂ are real and
distinct then every solution of (1) is of the form

s(t) = C₁ exp(λ₁t) + C₂ exp(λ₂t)

for some (real) constants C₁, C₂. These constants can be found if initial values
s(t₀), s'(t₀) are given.
Next, suppose that λ₁ = λ₂ = λ and that these eigenvalues are real. In this case
the 2 × 2 matrix in (2) is similar to a matrix of the form

[λ 0; β λ],

as will be shown in Chapter 6. In the new coordinates the equivalent first order
system is

y₁' = λy₁,
y₂' = βy₁ + λy₂.

By the methods of Section 4 we find that the general solution to such a system is

y₁(t) = K₁e^{λt},
y₂(t) = K₁βte^{λt} + K₂e^{λt},

K₁ and K₂ being arbitrary constants. In the original coordinates the solutions to
the equivalent first order system are linear combinations of these. Thus we con-
clude that if the characteristic polynomial of (1) has only one root λ ∈ R, the
solutions have the form

s(t) = C₁e^{λt} + C₂te^{λt}.

The values of C₁ and C₂ can be determined from initial conditions.
Example. Solve the initial-value problem

(5) s'' + 2s' + s = 0,
    s(0) = 1,  s'(0) = 2.

The characteristic polynomial is λ² + 2λ + 1; the only root is λ = -1. Therefore
the general solution is

s(t) = C₁e^{-t} + C₂te^{-t}.

We find that

s'(t) = (-C₁ + C₂)e^{-t} - C₂te^{-t}.

From the initial conditions in (5) we get, setting t = 0 in the last two formulas,

C₁ = 1,
-C₁ + C₂ = 2.

Hence C₂ = 3 and the solution to (5) is

s(t) = e^{-t} + 3te^{-t}.

The reader may verify that this actually is a solution to (5)!
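One way to carry out that verification is numerically. A short sketch, assuming Python with numpy is available; the derivatives are computed by hand and the equation and initial conditions are checked on a grid of sample points.

```python
import numpy as np

# s(t) = e^{-t} + 3t e^{-t} should satisfy s'' + 2s' + s = 0,
# with s(0) = 1 and s'(0) = 2.
def s(t):   return np.exp(-t) + 3*t*np.exp(-t)
def sp(t):  return 2*np.exp(-t) - 3*t*np.exp(-t)    # s', by hand
def spp(t): return -5*np.exp(-t) + 3*t*np.exp(-t)   # s'', by hand

t = np.linspace(0.0, 5.0, 101)
assert np.allclose(spp(t) + 2*sp(t) + s(t), 0.0)
assert np.isclose(s(0.0), 1.0) and np.isclose(sp(0.0), 2.0)
```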
The final case to consider is that when λ₁, λ₂ are nonreal complex conjugate num-
bers. Suppose λ₁ = u + iv, λ₂ = u - iv. Then we get a solution (as in Chapter 3):

y₁(t) = e^{ut}(K₁ cos vt - K₂ sin vt),
y₂(t) = e^{ut}(K₁ sin vt + K₂ cos vt).

Thus we obtain s(t) as a linear combination of y₁(t) and y₂(t), so that finally,

s(t) = e^{ut}(C₁ cos vt + C₂ sin vt)

for some constants C₁, C₂.
A special case of the last equation is the "harmonic oscillator":

s'' + b²s = 0;

the eigenvalues are ±ib, and the general solution is

C₁ cos bt + C₂ sin bt.
We summarize what we have found.

Theorem  Let λ₁, λ₂ be the roots of the polynomial λ² + aλ + b. Then every solution
of the differential equation

(1) s'' + as' + bs = 0

is of the following type:

Case (a). λ₁, λ₂ are real distinct: s(t) = C₁ exp(λ₁t) + C₂ exp(λ₂t);
Case (b). λ₁ = λ₂ = λ is real: s(t) = C₁e^{λt} + C₂te^{λt};
Case (c). λ₁ = λ̄₂ = u + iv, v ≠ 0: s(t) = e^{ut}(C₁ cos vt + C₂ sin vt).

In each case C₁, C₂ are (real) constants determined by initial conditions of the
form

s(t₀) = α,  s'(t₀) = β.
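The case analysis in the theorem translates directly into a small solver: given a, b and initial conditions, pick the case from the roots of λ² + aλ + b and return s as a function. This is a sketch, not from the text, assuming Python with numpy; initial conditions are taken at t₀ = 0 for simplicity, and the exact comparison disc == 0 is for illustration only.

```python
import numpy as np

def solve_2nd_order(a, b, alpha, beta):
    """Solution of s'' + a s' + b s = 0 with s(0) = alpha, s'(0) = beta."""
    disc = a*a - 4*b
    if disc > 0:                       # case (a): real distinct roots
        l1 = (-a + np.sqrt(disc)) / 2
        l2 = (-a - np.sqrt(disc)) / 2
        C2 = (beta - l1*alpha) / (l2 - l1)
        C1 = alpha - C2
        return lambda t: C1*np.exp(l1*t) + C2*np.exp(l2*t)
    if disc == 0:                      # case (b): double real root
        l = -a / 2
        C1, C2 = alpha, beta - l*alpha
        return lambda t: (C1 + C2*t) * np.exp(l*t)
    u, v = -a/2, np.sqrt(-disc)/2      # case (c): roots u ± iv
    C1, C2 = alpha, (beta - u*alpha) / v
    return lambda t: np.exp(u*t) * (C1*np.cos(v*t) + C2*np.sin(v*t))

s = solve_2nd_order(2.0, 1.0, 1.0, 2.0)        # the example above
assert np.isclose(s(1.0), np.exp(-1) + 3*np.exp(-1))
```

For instance, solve_2nd_order(0, b², α, β) reproduces the harmonic oscillator solutions C₁ cos bt + C₂ sin bt.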
The nth order linear equation (3) can also be solved by changing it to an equiva-
lent first order system. First order systems that come from nth order equations
have special properties which enable them to be solved quite easily. To understand
the method of solution requires more linear algebra, however. We shall return to
higher order equations in the next chapter.
We make a simple but important observation about the linear homogeneous
equation (3):

If s(t) and q(t) are solutions to (3), so is the function s(t) + q(t); if k is any real
number, then ks(t) is a solution.

In other words, the set of all solutions is a vector space. And since n initial conditions
determine a solution uniquely (consider the corresponding first order system), the
dimension of the vector space of solutions equals the order of the differential
equation.
A higher order inhomogeneous linear equation

(6) s⁽ⁿ⁾ + a₁s⁽ⁿ⁻¹⁾ + ··· + a_n s = b(t)

can be solved (in principle) by reducing it to a first order inhomogeneous linear
system

x' = Ax + B(t)

and applying variation of constants (Section 5). Note that

B(t) = (0, . . . , 0, b(t)).

As in the case of first order systems, the general solution to (6) can be expressed
as the general solution to the corresponding homogeneous equation

s⁽ⁿ⁾ + a₁s⁽ⁿ⁻¹⁾ + ··· + a_n s = 0

plus a particular solution of (6). Consider, for example,

(7) s'' + s = t - 1.

The general solution of

s'' + s = 0

is

A cos t + B sin t;  A, B ∈ R.

A particular solution to (7) is

s(t) = t - 1.

Hence the general solution to (7) is

A cos t + B sin t + t - 1.
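A quick numerical confirmation of this general solution, as a sketch assuming Python with numpy: substitute A cos t + B sin t + t - 1 into (7) for a couple of arbitrary choices of A and B.

```python
import numpy as np

# Check that s(t) = A cos t + B sin t + t - 1 satisfies s'' + s = t - 1
# for arbitrary constants A, B (two sample choices below).
t = np.linspace(-3.0, 3.0, 61)
for A, B in [(1.0, -2.0), (0.5, 3.0)]:
    s   = A*np.cos(t) + B*np.sin(t) + t - 1
    spp = -A*np.cos(t) - B*np.sin(t)       # second derivative, by hand
    assert np.allclose(spp + s, t - 1)
```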
Finally, we point out that higher order systems can be reduced to first order
systems. For example, consider the system

x'' + x' + 2y' - 3x = 0,
y'' + 5x' - 4y = 0.

Here x(t) and y(t) are unknown real-valued functions of a real variable. Introduce
new functions u = x', v = y'. The system is equivalent to the four-dimensional
first order system

x' = u,
u' = 3x - u - 2v,
y' = v,
v' = -5u + 4y.

PROBLEMS

1. Which of the following functions satisfy an equation of the form s'' + as' +
   bs = 0?
   (a) te^t                  (b) t² - 1              (c) cos 2t + 3 sin 2t
   (d) cos 2t + 2 sin 3t     (e) e^{-t} cos 2t       (f) e^t + 4
   (g) 3t - 9
2. Find solutions to the following equations having the specified initial values.
   (a) s'' + 4s = 0;  s(0) = 1, s'(0) = 0.
   (b) s'' - 3s' + 2s = 0;  s(1) = 0, s'(1) = -1.

3. For each of the following equations find a basis for the solutions; that is, find
   two solutions s₁(t), s₂(t) such that every solution has the form αs₁(t) + βs₂(t)
   for suitable constants α, β:
   (a) s'' + 3s = 0        (b) s'' - 3s = 0
   (c) s'' - s' - 6s = 0   (d) s'' + s' + s = 0
4. Suppose the roots of the quadratic equation λ² + aλ + b = 0 have negative
   real parts. Prove every solution of the differential equation

   s'' + as' + bs = 0

   satisfies

   lim_{t→∞} s(t) = 0.

5. State and prove a generalization of Problem 4 for nth order differential
   equations

   s⁽ⁿ⁾ + a₁s⁽ⁿ⁻¹⁾ + ··· + a_n s = 0,

   where the polynomial

   λⁿ + a₁λⁿ⁻¹ + ··· + a_n

   has n distinct roots with negative real parts.

6. Under what conditions on the constants a, b is there a nontrivial solution
   to s'' + as' + bs = 0 such that the equation

   s(t) = 0

   has
   (a) no solution;
   (b) a positive finite number of solutions;
   (c) infinitely many solutions?

7. For each of the following equations sketch the phase portrait of the correspond-
   ing first order system. Then sketch the graphs of several solutions s(t) for
   different initial conditions:
   (a) s'' + s = 0      (b) s'' - s = 0      (c) s'' + s' + s = 0
   (d) s'' + 2s' = 0    (e) s'' - s' + s = 0.

8. Which equations s'' + as' + bs = 0 have a nontrivial periodic solution? What
   is the period?
9. Find all solutions to

   s''' - s'' + 4s' - 4s = 0.

10. Find a real-valued function s(t) such that

    s'' + 4s = cos 2t,
    s(0) = 0,  s'(0) = 1.

11. Find all pairs of functions x(t), y(t) that satisfy the system of differential
    equations

    x' = -y,
    y'' = -x - y + y'.

12. Let q(t) be a polynomial of degree m. Show that any equation

    s⁽ⁿ⁾ + a₁s⁽ⁿ⁻¹⁾ + ··· + a_n s = q(t)

    has a solution which is a polynomial of degree ≤ m.

Notes

A reference to some of the topological background in Section 1 is Bartle's The
Elements of Real Analysis [2]. Another is Lang's Analysis I [11].
Chapter 6
Linear Systems and Canonical
Forms of Operators

The aim of this chapter is to achieve deeper insight into the solutions of the
differential equation

(1) x' = Ax,  A ∈ L(E),  E = Rⁿ,

by decomposing the operator A into operators of particularly simple kinds. In
Sections 1 and 2 we decompose the vector space E into a direct sum

E = E₁ ⊕ ··· ⊕ E_r

and A into a direct sum

A = A₁ ⊕ ··· ⊕ A_r,  A_k ∈ L(E_k).

Each A_k can be expressed as a sum

A_k = S_k + N_k;  S_k, N_k ∈ L(E_k),

with S_k semisimple (that is, its complexification is diagonalizable) and N_k nil-
potent (that is, (N_k)^m = 0 for some m); moreover, S_k and N_k commute. This
reduces the series for e^{tA} to a finite sum which is easily computed. Thus solutions
to (1) can be found for any A.
Section 3 is devoted to nilpotent operators. The goal is a special, essentially
unique matrix representation of a nilpotent operator. This special matrix is applied
in Section 4 to the nilpotent part of any operator T to produce special matrices
for T called the Jordan form; and for operators on real vector spaces, the real canon-
ical form. These forms make the structure of the operator quite clear.
In Section 5 solutions of the differential equation x' = Ax are studied by means
of the real canonical form of A. It is found that all solutions are linear combinations
of certain simple functions. Important information about the nature of the solu-
tions can be obtained without explicitly solving the equation.

Section 6 applies the results of Section 5 to the higher order one-dimensional
linear homogeneous equation with constant coefficients

(2) s⁽ⁿ⁾ + a₁s⁽ⁿ⁻¹⁾ + ··· + a_n s = 0.

Solutions are easily found if the roots of the characteristic polynomial

λⁿ + a₁λⁿ⁻¹ + ··· + a_n

are known. A different approach to (2), via operators on function spaces, is very
briefly discussed in the last section.
The first four sections deal not with differential equations, but only with linear
algebra. This linear algebra, the eigenvector theory of a real operator, is, on the one
hand, rarely treated in texts, and, on the other hand, important for the study of
linear differential equations.

§1. The Primary Decomposition

In this section we state a basic decomposition theorem for operators; the proof
is given in Appendix III. It is not necessary to know the proof in order to use the
theorem, however.
In the rest of this section T denotes an operator on a vector space E, which may
be real or complex; but if E is real it is assumed that all eigenvalues of T are real.
Let the characteristic polynomial of T be given as the product

p(t) = ∏_{k=1}^{r} (t - λ_k)^{n_k}.

Here λ₁, . . . , λ_r are the distinct roots of p(t), and the integer n_k ≥ 1 is the multi-
plicity of λ_k; note that n₁ + ··· + n_r = dim E.
We recall that the eigenspace of T belonging to λ_k is the subspace

Ker(T - λ_k) ⊂ E

(we write λ_k for the operator λ_kI). Note that T is diagonalizable if and only if E
is the direct sum of the eigenspaces (for this means E has a basis of eigenvectors).
We define the generalized eigenspace of T belonging to λ_k to be the subspace

E(T, λ_k) = Ker(T - λ_k)^{n_k} ⊂ E.

Note that this subspace is invariant under T.
The following primary decomposition theorem is proved in Appendix III.

Theorem 1  Let T be an operator on E, where E is a complex vector space, or else E
is real and T has real eigenvalues. Then E is the direct sum of the generalized eigen-
spaces of T. The dimension of each generalized eigenspace equals the multiplicity of the
corresponding eigenvalue.

Let us see what this decomposition means. Suppose first that there is only one
eigenvalue λ, of multiplicity n = dim E. The theorem implies E = E(T, λ). Put

N = T - λI,  S = λI.

Then, clearly, T = S + N and SN = NS. Moreover, S is diagonal (in every basis)
and N is nilpotent, for E = E(T, λ) = Ker Nⁿ. We can therefore immediately
compute

e^T = e^S e^N = e^λ ∑_{k=0}^{n-1} N^k/k!;

there is no difficulty in finding it.
Example 1  Let T = [3 1; -1 1]. The characteristic polynomial is

p(t) = t² - 4t + 4 = (t - 2)².

There is only one eigenvalue, 2, of multiplicity 2. Hence

S = [2 0; 0 2],
N = T - S = [1 1; -1 -1].

We know without further computation that N commutes with S and is nilpotent
of order 2: N² = 0. (The reader can verify these statements.) Therefore

e^T = e^S e^N = e²(I + N).

More generally,

e^{tT} = e^{tS}e^{tN} = e^{2t}(I + tN)
       = e^{2t}[1 + t  t; -t  1 - t].

Thus the method applies directly to solving the differential equation x' = Tx
(see the previous chapter).
For comparison, try to compute e^{tT} directly as the limit of the partial sums of
its defining series.
In the general case put

T_k = T | E(λ_k, T).

Then T = T₁ ⊕ ··· ⊕ T_r. Since each T_k has only the one eigenvalue λ_k, we can

apply the previous result. Thus

T_k = S_k + N_k;  S_k, N_k ∈ L(E(λ_k, T)),

where S_k = λ_kI on E(λ_k, T), and N_k = T_k - S_k is nilpotent of order n_k. Then

T = S + N,

where

S = S₁ ⊕ ··· ⊕ S_r,
N = N₁ ⊕ ··· ⊕ N_r.

Clearly, SN = NS. Moreover, N is nilpotent and S is diagonalizable. For if m =
max(n₁, . . . , n_r), then

N^m = (N₁)^m ⊕ ··· ⊕ (N_r)^m = 0;

and S is diagonalized by a basis for E which is made up of bases for the generalized
eigenspaces.
We have proved:

Theorem 2  Let T ∈ L(E), where E is complex if T has a nonreal eigenvalue. Then
T = S + N, where SN = NS and S is diagonalizable and N is nilpotent.

In Appendix III we shall prove that S and N are uniquely determined by T.
Using Theorem 2 one can compute the exponential of any operator T: E → E
for which the eigenvalues are known. (Recall we are making the general assumption
that if E is real, all the eigenvalues of T must be real.) The method is made clear
by the following example.
Example 2  Let T ∈ L(R³) be the operator whose matrix in standard coordi-
nates is

T₀ = [-1 1 -2; 0 -1 4; 0 0 1].

We analyze T₀ with a view toward solving the differential equation

x' = T₀x.

The characteristic polynomial of T₀ can be read off from the diagonal because all
subdiagonal entries are 0; it is

p(t) = (t + 1)²(t - 1).

The eigenvalues are -1 with multiplicity 2, and 1 with multiplicity 1.
The two-dimensional generalized eigenspace of -1 is spanned by the basis

a₁ = (1, 0, 0),  a₂ = (0, 1, 0);

this can be read off directly from the first two columns of T₀.

The one-dimensional generalized eigenspace of +1 is the solution space of the
system of equations

(T₀ - 1)x = 0,

or

[-2 1 -2; 0 -2 4; 0 0 0] x = 0;

one can verify that the vector

a₃ = (0, 2, 1)

is a basis.
Let 𝔅 be the basis (a₁, a₂, a₃) of R³. Let T = S + N be as in Theorem 2. In
𝔅-coordinates, S has the matrix

S₁ = [-1 0 0; 0 -1 0; 0 0 1];

this follows from the eigenvalues of T being -1, -1, 1. Let S₀ be the matrix of S
in standard coordinates. Then

S₀ = P⁻¹S₁P,

where P is the inverse of the transpose of the matrix whose rows are a₁, a₂, a₃. Hence

P⁻¹ = [1 0 0; 0 1 2; 0 0 1],

and therefore

P = [1 0 0; 0 1 -2; 0 0 1].

Matrix multiplication gives

S₀ = [-1 0 0; 0 -1 4; 0 0 1].

We can now find the matrix N₀ of N in the standard basis for R³:

N₀ = T₀ - S₀ = [0 1 -2; 0 0 0; 0 0 0].

We have now computed the matrices of S and N. The reader might verify that
N₀² = 0 and S₀N₀ = N₀S₀.
We compute the matrix of e^S in standard coordinates not by computing e^{S₀}
directly from the definition (which involves an infinite series), but as follows:

exp(S₀) = exp(P⁻¹S₁P) = P⁻¹ exp(S₁)P

        = [1 0 0; 0 1 2; 0 0 1][e⁻¹ 0 0; 0 e⁻¹ 0; 0 0 e][1 0 0; 0 1 -2; 0 0 1],

which turns out to be

exp(S₀) = [e⁻¹ 0 0; 0 e⁻¹ -2e⁻¹ + 2e; 0 0 e].

It is easy to compute exp(N₀):

exp(N₀) = I + N₀ = [1 1 -2; 0 1 0; 0 0 1].

Finally, we obtain

exp(T₀) = exp(S₀) exp(N₀),

which gives

exp(T₀) = [e⁻¹ e⁻¹ -2e⁻¹; 0 e⁻¹ -2e⁻¹ + 2e; 0 0 e].
It is no more difficult to compute e^{tT₀}, t ∈ R. Replacing T₀ by tT₀ transforms S₀
to tS₀, N₀ to tN₀, and so on; the point is that the same matrix P is used for all values
of t. One obtains

exp(tT₀) = exp(tS₀) exp(tN₀)

         = [e⁻ᵗ 0 0; 0 e⁻ᵗ -2e⁻ᵗ + 2eᵗ; 0 0 eᵗ][1 t -2t; 0 1 0; 0 0 1].

The solution of x' = T₀x is given in terms of exp(tT₀).
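The whole computation of Example 2 can be replayed numerically. A sketch assuming Python with numpy, with the matrices as reconstructed above: it checks that N₀ is nilpotent, that S₀ and N₀ commute, and that the S + N formula for exp(tT₀) agrees with a well-converged partial sum of the exponential series.

```python
import numpy as np

# T0 = S0 + N0 with S0 N0 = N0 S0 and N0^2 = 0 (Example 2).
T0 = np.array([[-1.0, 1.0, -2.0], [0.0, -1.0, 4.0], [0.0, 0.0, 1.0]])
S0 = np.array([[-1.0, 0.0,  0.0], [0.0, -1.0, 4.0], [0.0, 0.0, 1.0]])
N0 = T0 - S0
assert np.allclose(N0 @ N0, 0) and np.allclose(S0 @ N0, N0 @ S0)

def expm_series(A, terms=80):
    # partial sums of the exponential series, for comparison
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

t = 0.5
P    = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]])
Pinv = np.array([[1.0, 0.0, 0.0], [0.0, 1.0,  2.0], [0.0, 0.0, 1.0]])
expS1 = np.diag([np.exp(-t), np.exp(-t), np.exp(t)])
# e^{tT0} = e^{tS0} e^{tN0} = (P^{-1} e^{tS1} P)(I + t N0)
closed = (Pinv @ expS1 @ P) @ (np.eye(3) + t*N0)
assert np.allclose(closed, expm_series(t*T0))
```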


The following consequence of the primary decomposition is called the Cayley-
Hamilton theorem.

Theorem 3  Let A be any operator on a real or complex vector space. Let its charac-
teristic polynomial be

p(t) = ∑_{k=0}^{n} a_k t^k.

Then p(A) = 0; that is,

∑_{k=0}^{n} a_k A^k(x) = 0

for all x ∈ E.

Proof. We may assume E = Rⁿ or Cⁿ; since an operator on Rⁿ and its complexi-
fication have the same characteristic polynomial, there is no loss of generality in
assuming E is a complex vector space.
It suffices to show that p(A)x = 0 for all x in an arbitrary generalized eigenspace
E(A, λ), where p(λ) = 0. Now

E(A, λ) = Ker(A - λ)^m,

where m is the multiplicity of λ. Since (t - λ)^m divides p(t) we can write

p(t) = q(t)(t - λ)^m.

Hence, for x ∈ E(A, λ):

p(A)x = q(A)(A - λ)^m x = q(A)(0) = 0.
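The Cayley-Hamilton theorem is easy to test on a concrete matrix. A sketch assuming Python with numpy: evaluate p(A) by Horner's scheme on a random 4 × 4 matrix and check that it vanishes (up to rounding).

```python
import numpy as np

# p(A) = 0 for the characteristic polynomial p of A (Cayley-Hamilton).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
coeffs = np.poly(A)                 # monic char. poly., highest power first
pA = np.zeros((4, 4))
for c in coeffs:                    # Horner: p(A) = (...((I·A + c1 I)A + c2 I)...)
    pA = pA @ A + c*np.eye(4)
assert np.allclose(pA, 0, atol=1e-8)
```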

§2. The S + N Decomposition


Let T be an operator on Rⁿ and T_C: Cⁿ → Cⁿ its complexification. If T_C is diagonal-
izable, we say T is semisimple.

Theorem 1  For any operator T ∈ L(Rⁿ) there are unique operators S, N on Rⁿ
such that T = S + N, SN = NS, S is semisimple, and N is nilpotent.

Proof. We have already seen a similar theorem for operators on complex vector
spaces; now we apply this result to prove the theorem for operators on Rⁿ. Let
σ: Cⁿ → Cⁿ be the operator of conjugation (Chapter 4); if z = x + iy ∈ Cⁿ, where
x, y ∈ Rⁿ, then σz = x - iy. An operator Q on Cⁿ is the complexification of an
operator on Rⁿ if and only if Qσ = σQ.
Given T ∈ L(Rⁿ), let T_C ∈ L(Cⁿ) be its complexification. By Theorem 2, Section
1, there are unique operators S₀, N₀ on Cⁿ such that

T_C = S₀ + N₀,

S₀N₀ = N₀S₀, S₀ diagonalizable, and N₀ nilpotent. We assert that S₀ and N₀ are
complexifications of operators on Rⁿ. This is proved by showing they commute
with σ, as follows. Put S₁ = σS₀σ⁻¹, N₁ = σN₀σ⁻¹. Then

T_C = σT_Cσ⁻¹ = S₁ + N₁.

It is easy to see that S₁ is diagonalizable, N₁ is nilpotent, and S₁N₁ = N₁S₁. There-
fore S₀ = S₁ and N₀ = N₁. This means that S₀ and N₀ commute with σ as asserted.
There are unique operators S, N in L(Rⁿ) such that

S₀ = S_C,  N₀ = N_C.

Since the map A → A_C is one-to-one, it follows that

SN = NS,

for

(SN - NS)_C = S₀N₀ - N₀S₀ = 0.

Similar reasoning shows that N is nilpotent, and also S + N = T. The unique-
ness of S and N follows from uniqueness of S₀ and N₀. This completes the proof.

Definition S is called the semisimple part of T and N the nilpotent part.

Let T = S + N as in Theorem 1. Since S_C is diagonalizable, it follows from
Chapter 4 that in a suitable basis 𝔅 of Rⁿ, described below, S has a matrix of the
form

L = diag{λ₁, . . . , λ_r, B₁, . . . , B_s},  B_k = [a_k -b_k; b_k a_k].

Here λ₁, . . . , λ_r are the real eigenvalues of T, with multiplicity; and the complex
numbers

a_k + ib_k;  k = 1, . . . , s,

are the complex eigenvalues with positive imaginary part, with multiplicity. Note
that T, T_C, S_C, and S have the same eigenvalues.
The exponential of the matrix tL, t ∈ R, is easy to calculate since

exp(t[a -b; b a]) = e^{ta}[cos tb  -sin tb; sin tb  cos tb].

The basis 𝔅 that gives S the matrix L is obtained as follows. The first r vectors
in 𝔅 are from bases for the generalized eigenspaces of T that belong to real eigen-
values. The remaining 2s vectors are the imaginary and real parts of bases of the
generalized eigenspaces of T_C that belong to eigenvalues a + ib, b > 0.
In this way e^{tT} can be computed for any operator T, provided the eigenvalues
of T are known.
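The displayed exponential of a 2 × 2 block is easy to verify numerically. A sketch assuming Python with numpy; the values of a, b, t are arbitrary.

```python
import numpy as np

# exp(t[[a, -b], [b, a]]) = e^{ta} [[cos tb, -sin tb], [sin tb, cos tb]]
def expm_series(A, terms=80):
    # partial sums of the exponential series, for comparison
    out, term = np.eye(2), np.eye(2)
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

a, b, t = 0.4, 1.3, 2.0
L = np.array([[a, -b], [b, a]])
closed = np.exp(t*a) * np.array([[np.cos(t*b), -np.sin(t*b)],
                                 [np.sin(t*b),  np.cos(t*b)]])
assert np.allclose(closed, expm_series(t*L))
```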
Example. Let T ∈ L(R⁴) be the operator whose matrix in standard coordinates
is

T₀ = [0 -1 0 0; 1 0 0 0; 0 0 0 -1; 2 0 1 0].
In C⁴ the generalized i-eigenspace is the solution space of

(T₀ - i)²z = 0,

or of

-2z₁ + 2iz₂ = 0,
-2iz₁ - 2z₂ = 0,
-2z₁ - 2z₃ + 2iz₄ = 0,
-4iz₁ - 2z₂ - 2iz₃ - 2z₄ = 0.

These are equivalent to

z₁ = iz₂,
-z₃ + iz₄ = iz₂.

As a basis for the solution space we pick the complex vectors

u = (i, 1, 0, 1),  v = (i, 1, -i, 0).

From these we take imaginary and real parts:

Im u = (1, 0, 0, 0) = e₁,  Im v = (1, 0, -1, 0) = e₃,
Re u = (0, 1, 0, 1) = e₂,  Re v = (0, 1, 0, 0) = e₄.

These four vectors, in order, form a basis 𝔅 of R⁴. This basis gives S the matrix

S₁ = [0 -1 0 0; 1 0 0 0; 0 0 0 -1; 0 0 1 0].

(We know this without further computation.)
The matrix of S in standard coordinates is

S₀ = P⁻¹S₁P,

where P⁻¹ is the transpose of the matrix of components of 𝔅; thus

P⁻¹ = [1 0 1 0; 0 1 0 1; 0 0 -1 0; 0 1 0 0],

and one finds that

P = [1 0 1 0; 0 0 0 1; 0 0 -1 0; 0 1 0 -1].

Hence

S₀ = [0 -1 0 0; 1 0 0 0; 0 1 0 -1; 1 0 1 0].

The matrix of N in standard coordinates is then

N₀ = T₀ - S₀ = [0 0 0 0; 0 0 0 0; 0 -1 0 0; 1 0 0 0],

which indeed is nilpotent of order 2.
The matrix of e^{tT} in standard coordinates is

exp(tT₀) = exp(tN₀ + tS₀) = exp(tN₀) exp(tS₀)
         = (I + tN₀)P⁻¹ exp(tS₁)P.

From

exp(tS₁) = [cos t  -sin t  0  0;
            sin t   cos t  0  0;
            0  0  cos t  -sin t;
            0  0  sin t   cos t]

the reader can complete the computation.
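Completing that computation by machine is a good check on the decomposition. A sketch assuming Python with numpy, using the matrices as reconstructed from the example above; it verifies the commuting S + N splitting and compares (I + tN₀)e^{tS₀} with a partial sum of the series for e^{tT₀}.

```python
import numpy as np

# T0 = S0 + N0 with S0 semisimple (eigenvalues ±i, each twice)
# and N0 nilpotent of order 2, as in the example.
T0 = np.array([[0.0, -1.0, 0.0,  0.0],
               [1.0,  0.0, 0.0,  0.0],
               [0.0,  0.0, 0.0, -1.0],
               [2.0,  0.0, 1.0,  0.0]])
S0 = np.array([[0.0, -1.0, 0.0,  0.0],
               [1.0,  0.0, 0.0,  0.0],
               [0.0,  1.0, 0.0, -1.0],
               [1.0,  0.0, 1.0,  0.0]])
N0 = T0 - S0
assert np.allclose(N0 @ N0, 0) and np.allclose(S0 @ N0, N0 @ S0)

def expm_series(A, terms=100):
    # partial sums of the exponential series, for comparison
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

t = 1.2
closed = (np.eye(4) + t*N0) @ expm_series(t*S0)   # e^{tN0} e^{tS0}
assert np.allclose(closed, expm_series(t*T0))
```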



PROBLEMS

1. For each of the following operators T find bases for the generalized eigenspaces;
give the matrices (for the standard basis) of the semisimple and nilpotent
parts of T.

-2 0 0 0 0 2 2 2 2
2 0 6 0 1 -2 3 3 3 3
4 4 4 4

2. A matrix [a_ij] such that a_ij = 0 for i ≤ j is nilpotent.


3. What are the eigenvalues of a nilpotent matrix?
4. For each of the following matrices A , compute elA, t E R:
(4
1 0 0 1
1 0 0 1
0 -1 1 0,

5. Prove that an operator is nilpotent if all its eigenvalues are zero.


6. The semisimple and nilpotent parts of T commute with A if T commutes
with A .
7. If A is nilpotent, what kind of functions are the coordinates of solutions to
   x' = Ax?

8. If N is a nilpotent operator on an n-dimensional vector space, then Nⁿ = 0.


9. What can be said about AB and A + B if AB = BA and
   (a) A and B are nilpotent?
   (b) A and B are semisimple?
   (c) A is nilpotent and B is semisimple?
FJ. THE s +N DECOMPOSITION 121

10. If A and B are commuting operators, find a formula for the semisimple and
    nilpotent parts of AB and A + B in terms of the corresponding parts of A
    and B. Show by example that the formula is not always valid if A and B do not
    commute.
11. Identify Rⁿ⁺¹ with the set P_n of polynomials of degree ≤ n, via the corre-
    spondence

    (a_n, . . . , a₀) ↔ a_n tⁿ + ··· + a₁t + a₀.

    Let D: P_n → P_n be the differentiation operator. Prove D is nilpotent.


12. Find the matrix of D in the standard basis in Problem 11.
13. A rotation around a line in R³ and reflection in a plane in R³ are semisimple
    operators.

14. Let S be semisimple and N nilpotent. If SN = NS = 0, then S = 0 or N = 0.
    (Hint: Consider generalized eigenspaces of S + N.)

15. If T² = T, then T is diagonalizable. (Hint: Do not use any results in this
    chapter!)

16. Find necessary and sufficient conditions on a, b, c, d in order that the operator
    [a b; c d] be
    (a) diagonalizable;  (b) semisimple;  (c) nilpotent.

17. Let F ⊂ E be invariant under T ∈ L(E). If T is nilpotent, or semisimple, or
    diagonalizable, so is T | F.

18. An operator T ∈ L(E) is semisimple if and only if for every invariant subspace
    F ⊂ E, there is another invariant subspace F' ⊂ E such that E = F ⊕ F'.

19. Suppose T is nilpotent and

    T^k = ∑_{j=0}^{k-1} a_j T^j,  a_j ∈ R.

    Then T^k = 0.
20. What values of a, b, c, d make the following operators semisimple?

21. What values of a, b, c, d make the operators in Problem 20 nilpotent?



§3. Nilpotent Canonical Forms

In the previous section we saw that any operator T can be decomposed uniquely
as

T = S + N,

with S semisimple, N nilpotent, and SN = NS. We also found a canonical form
for S, that is, a type of matrix representing S which is uniquely determined by
T, except for the ordering of diagonal blocks. In the complex case, for example,

S = diag{λ₁, . . . , λ_n},

where λ₁, . . . , λ_n are the roots of the characteristic polynomial of T, listed with
their proper multiplicities.
Although we showed how to find some matrix representation of N, we did not
give any special one. In this section we shall find for any nilpotent operator a matrix
that is uniquely determined by the operator (except for order of diagonal blocks).
From this we shall obtain a special matrix for any operator, called the Jordan
canonical form.
An elementary nilpotent block is a matrix of the form

[0  0  ···  0  0]
[1  0  ···  0  0]
[0  1  ···  0  0]
[⋮        ⋱   ⋮]
[0  0  ···  1  0],

with 1's just below the diagonal and 0's elsewhere. We include the one-by-one
matrix [0].
If N: E → E is an operator represented by such a matrix in a basis e₁, . . . , e_n,
then N behaves as follows on the basis elements:

N(e₁) = e₂,
N(e₂) = e₃,
⋮
N(e_{n-1}) = e_n,
N(e_n) = 0.

It is obvious that Nⁿ(e_k) = 0, k = 1, . . . , n; hence Nⁿ = 0. Thus N is nilpotent
of order n. Moreover, N^k ≠ 0 if 0 ≤ k < n, since N^k e₁ = e_{k+1} ≠ 0.
In Appendix III we shall prove

Theorem 1  Let N be a nilpotent operator on a real or complex vector space E. Then
E has a basis giving N a matrix of the form

A = diag{A₁, . . . , A_r},

where each A_k is an elementary nilpotent block, and the size of A_k is a nonincreasing
function of k. The matrices A₁, . . . , A_r are uniquely determined by the operator N.

We call the matrix in Theorem 1 the canonical form of N.
Let A be an n × n elementary nilpotent matrix. It is evident that the rank of A is
n - 1; hence

dim Ker A = 1.

This implies the following corollary of Theorem 1.

Theorem 2  In Theorem 1 the number r of blocks is equal to dim Ker N.

We define the canonical form of a nilpotent matrix to be the canonical form
of the corresponding operator; this is a matrix similar to the original one. Since
similar matrices correspond to the same operator, it follows that they have the
same canonical form. From this we conclude:

Theorem 3  Two nilpotent n × n matrices, or two nilpotent operators on the same
vector space, are similar if and only if they have the same canonical form.

The question arises: given a nilpotent operator, how is its canonical form found?
To answer this let us examine a nilpotent matrix which is already in canonical
form, say, the 10 × 10 matrix

N = diag{A₁, A₂, A₃, A₄, A₅, A₆},

where A₁ is the 3 × 3 elementary nilpotent block, A₂ and A₃ are 2 × 2 elementary
nilpotent blocks, and A₄ = A₅ = A₆ = [0].

We consider N as representing a nilpotent operator T on R¹⁰. Consider the relations
between the following sets of numbers:

δ_k = dim Ker T^k,  1 ≤ k ≤ 10,

and

ν_k = number of elementary nilpotent k × k blocks,  1 ≤ k ≤ 10.

Note that ν_k = 0 if k > 3. The numbers δ_k depend on the operator, and can be
computed from any matrix for T. On the other hand, if we know the ν_k we can
immediately write down the matrix N. The problem, then, is to compute the ν_k
in terms of the δ_k.
Consider

δ₁ = dim Ker T.

From Theorem 2 we find

δ₁ = total number of blocks = ν₁ + ν₂ + ν₃.

Next, consider δ₂ = dim Ker T². Each 1 × 1 block (that is, the blocks [0])
contributes one dimension to Ker T². Each 2 × 2 block contributes 2, while the
3 × 3 block also contributes 2. Thus

δ₂ = ν₁ + 2ν₂ + 2ν₃.

For δ₃ = dim Ker T³, we see that the 1 × 1 blocks each contribute 1; the 2 × 2
blocks each contribute 2; and the 3 × 3 block contributes 3. Hence

δ₃ = ν₁ + 2ν₂ + 3ν₃.

In this example N³ = 0, hence δ_k = δ₃ for k > 3.
For an arbitrary nilpotent operator T on a vector space of dimension n, let N
be the canonical form; define the numbers δ_k and ν_k, k = 1, . . . , n, as before. By
the same reasoning we obtain the equations

δ₁ = ν₁ + ν₂ + ν₃ + ··· + ν_n,
δ₂ = ν₁ + 2ν₂ + 2ν₃ + ··· + 2ν_n,
δ₃ = ν₁ + 2ν₂ + 3ν₃ + ··· + 3ν_n,
⋮
δ_n = ν₁ + 2ν₂ + 3ν₃ + ··· + nν_n.


We think of the δ_k as known and solve for the ν_k. Subtracting each equation from
the one below it gives the equivalent system:

δ₁ = ν₁ + ν₂ + ··· + ν_n,
δ₂ - δ₁ = ν₂ + ··· + ν_n,
δ₃ - δ₂ = ν₃ + ··· + ν_n,
⋮
δ_n - δ_{n-1} = ν_n.

Subtracting the second of these equations from the first gives

ν₁ = 2δ₁ - δ₂.

Subtracting the (k + 1)th from the kth gives

ν_k = -δ_{k-1} + 2δ_k - δ_{k+1},  1 < k < n;

and the last equation gives ν_n. Thus we have proved the following theorem, in which
part (b) allows us to compute the canonical form of any nilpotent operator:

Theorem  Let T be a nilpotent operator on an n-dimensional vector space; for
k = 1, . . . , n let δ_k = dim Ker T^k, and let ν_k be the number of elementary nilpotent
k × k blocks in the canonical form of T. Then
(a) the numbers δ₁, . . . , δ_n determine the canonical form of T;
(b) ν₁ = 2δ₁ - δ₂;  ν_k = -δ_{k-1} + 2δ_k - δ_{k+1} for 1 < k < n;  ν_n = δ_n - δ_{n-1}.

Note that the equations in (b) can be subsumed under the single equation

ν_k = -δ_{k-1} + 2δ_k - δ_{k+1},

valid for all integers k ≥ 1, if we note that δ₀ = 0 and δ_k = δ_n for k > n.
There is the more difficult problem of finding a basis that puts a given nilpotent
operator in canonical form. An algorithm is implicit in Appendix III. Our point of
view, however, is to obtain theoretical information from canonical forms. For
example, the equations in the preceding theorem immediately prove that two nil-
potent operators N, M on a vector space E are similar if and only if dim Ker N^k =
dim Ker M^k for 1 ≤ k ≤ dim E.
For computational purposes, the S + N decomposition is usually adequate. On
the other hand, the existence and uniqueness of the canonical forms is important
for theory.
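The equations relating the ν_k to the δ_k are directly computable. A sketch assuming Python with numpy: compute δ_k = dim Ker N^k from matrix ranks, apply ν_k = -δ_{k-1} + 2δ_k - δ_{k+1}, and check the result on a matrix already in canonical form.

```python
import numpy as np

# delta_k = dim Ker N^k determines the canonical form via
# nu_k = -delta_{k-1} + 2 delta_k - delta_{k+1}
# (with delta_0 = 0 and delta_k = delta_n for k > n).
def block_counts(N):
    n = N.shape[0]
    delta = [0]
    P = np.eye(n)
    for _ in range(n + 1):                 # delta_1, ..., delta_{n+1}
        P = P @ N
        delta.append(n - np.linalg.matrix_rank(P))   # dim Ker N^k
    return [int(-delta[k-1] + 2*delta[k] - delta[k+1])
            for k in range(1, n + 1)]

# A 6x6 nilpotent matrix in canonical form: blocks of sizes 3, 2, 1.
N = np.zeros((6, 6))
N[1, 0] = N[2, 1] = 1.0                    # 3x3 elementary block
N[4, 3] = 1.0                              # 2x2 elementary block
nu = block_counts(N)
assert nu == [1, 1, 1, 0, 0, 0]            # one block each of sizes 1, 2, 3
```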

PROBLEMS

1. Verify that each of the following operators is nilpotent and find its canonical
   form:

0 0 2 3 0
2. Let N be a matrix in nilpotent canonical form. Prove N is similar to
   (a) kN for all nonzero k ∈ R,
   (b) the transpose of N.

3. Let N be an n × n nilpotent matrix of rank r. If N^k = 0, then k ≥ n/(n - r).
4. Classify the following operators on R⁴ by similarity (missing entries are 0):

§4. Jordan and Real Canonical Forms

In this section canonical forms are constructed for arbitrary operators.
We start with an operator T on E that has only one eigenvalue λ; if λ is nonreal,
we suppose E complex. In Section 1 we saw that

T = λI + N

with N nilpotent. We apply Theorem 1 of Section 3 and give E a basis 𝔅 =
{e₁, . . . , e_n} that gives N a matrix A in nilpotent canonical form. Hence T has the
𝔅-matrix λI + A. Since A is composed of diagonal blocks, each of which is an
elementary nilpotent matrix, λI + A has the form

(1)  [λ            ]
     [*  λ         ]
     [   *  λ      ]
     [      ⋱  ⋱   ]
     [         *  λ],

where each subdiagonal entry * is 1 or 0. (Some of the diagonal blocks may be
1 × 1 matrices [λ].) That is, λI + A has λ's along the diagonal; below the diagonal
are 1's and 0's; all other entries are 0.
The blocks making up λI + A are called elementary Jordan matrices, or elementary
λ-blocks. A matrix of the form (1) is called a Jordan matrix belonging to λ, or briefly,
a Jordan λ-block.
Consider next an operator T: E → E whose distinct eigenvalues are λ₁, . . . , λ_m;
as usual E is complex if some eigenvalue is nonreal. Then E = E₁ ⊕ ··· ⊕ E_m,
where E_k is the generalized λ_k-eigenspace, k = 1, . . . , m. We know that T | E_k =
λ_kI + N_k with N_k nilpotent. We give E_k a basis 𝔅_k, which gives T | E_k a Jordan
matrix belonging to λ_k. The basis 𝔅 = 𝔅₁ ∪ ··· ∪ 𝔅_m of E gives T a matrix of the
form

C = diag{C₁, . . . , C_m},

where each C_k is a Jordan matrix belonging to λ_k. Thus C is composed of diagonal
blocks, each of which is an elementary Jordan matrix. The matrix C is called the
Jordan form (or Jordan matrix) of T.
We have constructed a particular Jordan matrix for T, by decomposing E as a
direct sum of the generalized eigenspaces of T. But it is easy to see that given any
Jordan matrix M representing T, each Jordan λ-block of M represents the restriction
of T to the generalized λ-eigenspace. Thus M must be the matrix we constructed,
perhaps with the λ-blocks rearranged.
It is easy to prove that similar operators have the same Jordan forms (perhaps
with rearranged λ-blocks). For if PT₀P⁻¹ = T₁, then P maps each generalized
λ-eigenspace of T₀ isomorphically onto the generalized λ-eigenspace of T₁; hence
the Jordan λ-blocks are the same for T₀ and T₁.
In summary:

Theorem 1 Let T ∈ L(E) be an operator; if E is real, assume all eigenvalues of
T are real. Then E has a basis giving T a matrix in Jordan form, that is, a matrix
made up of diagonal blocks of the form (1).

Except for the order of these blocks, the matrix is uniquely determined by T.
Any operator similar to T has the same Jordan form. The Jordan form can be
written A + B, where B is a diagonal matrix representing the semisimple part of
T, while A is a canonical nilpotent matrix representing the nilpotent part of
T; and AB = BA.
Note that each elementary λ-block contributes 1 to the dimension of Ker(T − λ).
Therefore:

Proposition In the Jordan form of an operator T, the number of elementary λ-blocks
is dim Ker(T − λ).
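To make the block count concrete (an illustration, not from the text): the proposition says the number of elementary λ-blocks equals dim Ker(T − λI) = n − rank(T − λI), which is easy to compute. A minimal pure-Python sketch using exact rational arithmetic (the helper names are our own):

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) via Gaussian elimination over Q."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def num_lambda_blocks(T, lam):
    """Number of elementary lambda-blocks = dim Ker(T - lam*I) = n - rank(T - lam*I)."""
    n = len(T)
    return n - rank([[T[i][j] - (lam if i == j else 0) for j in range(n)]
                     for i in range(n)])

# A Jordan matrix with one 2x2 and one 1x1 elementary block for eigenvalue 3:
T = [[3, 0, 0],
     [1, 3, 0],
     [0, 0, 3]]
print(num_lambda_blocks(T, 3))  # 2
```

The exact arithmetic avoids the rank being misjudged through floating-point roundoff.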

We turn now to an operator T on a real vector space E, allowing T to have nonreal
eigenvalues. Let T_C: E_C → E_C be the complexification of T. Then E_C has a
basis ℬ putting T_C into Jordan form. This basis ℬ is made up of bases for each
generalized eigenspace of T_C. We observed in Chapter 4, Section 2, that for a real
eigenvalue λ, the generalized eigenspace E_C(T_C, λ) is the complexification of a
subspace of E, and hence has a basis of vectors in E; the matrix of T_C | E(T_C, λ)
in this basis is thus a real matrix which represents T | E(T, λ). It is a Jordan λ-block.
Let μ = a + ib, b > 0, be a nonreal eigenvalue of T. Let

        {x₁ + iy₁, ..., x_p + iy_p}

be a basis for E(μ, T_C), giving T_C | E(μ, T_C) a Jordan matrix belonging to μ.
In Section 2 we saw that

        E(μ, T_C) ⊕ E(μ̄, T_C)

is the complexification of a subspace E_μ ⊂ E which is T-invariant; and the vectors

        {y₁, x₁, ..., y_p, x_p}

are a basis for E_μ. It is easy to see that in this basis, T | E_μ has a matrix composed

of diagonal blocks of the form

(2)     ⎡ D            ⎤
        ⎢ I₂  D        ⎥        or    D,
        ⎢     ⋱   ⋱    ⎥
        ⎣        I₂  D ⎦

where

        D = ⎡ a  −b ⎤ ,        I₂ = ⎡ 1  0 ⎤ .
            ⎣ b   a ⎦               ⎣ 0  1 ⎦

Thus T | E_μ has a matrix of the form

(3)     ⎡ D                 ⎤
        ⎢ I₂  D             ⎥
        ⎢      ⋱            ⎥
        ⎢         D         ⎥
        ⎢         I₂  D     ⎥
        ⎣              ⋱    ⎦ ,

each diagonal block being of the form (2).
Combining these bases, we obtain

Theorem 2 Let T: E → E be an operator on a real vector space. Then E has a basis
giving T a matrix composed of diagonal blocks of the forms (1) and (2). The diagonal
elements are the real eigenvalues, with multiplicity. Each block

        ⎡ a  −b ⎤ ,        b > 0,
        ⎣ b   a ⎦

appears as many times as the multiplicity of the eigenvalue a + bi. Such a matrix is
uniquely determined by the similarity class of T, except for the order of the blocks.

Definition The matrix described in the theorem is called the real canonical form
of T. If T has only real eigenvalues, it is the same as the Jordan form. If T is
nilpotent, it is the same as the canonical form discussed earlier for nilpotent operators.

The previous theory applies to T_C to show:

Proposition In the real canonical form of an operator T on a real vector space, the
number of blocks of the form

        ⎡ λ           ⎤
        ⎢ 1  λ        ⎥
        ⎢    ⋱  ⋱     ⎥
        ⎣       1  λ  ⎦

is dim Ker(T − λ). The number of blocks of the form (2) is dim Ker(T_C − (a + ib)).

The real canonical form of an operator T exhibits the eigenvalues as part of a
matrix for T. This ties them to T much more directly than their definition as roots
of the characteristic polynomial. For example, it is easy to prove:

Theorem 3 Let λ₁, ..., λₙ be the eigenvalues (with multiplicities) of an operator T.
Then
(a) Tr(T) = λ₁ + ··· + λₙ,
(b) Det(T) = λ₁ ··· λₙ.

Proof. We may replace a real operator by its complexification, without changing
its trace, determinant, or eigenvalues. Hence we may assume T operates on a complex
vector space. The trace is the sum of the diagonal elements of any matrix for
T; looking at the Jordan form proves (a). Since the Jordan form is a triangular
matrix, the determinant of T is the product of its diagonal elements. This proves
(b).

To compute the canonical form of an operator T we apply Theorem 4 of Section
3 to the nilpotent part of T − λ for each real eigenvalue λ, and to T_C − (a + bi)
for each complex eigenvalue a + bi, b > 0. For each real eigenvalue λ define νₖ(λ) =
number of k × k blocks of the form

        ⎡ λ           ⎤
        ⎢ 1  λ        ⎥
        ⎢    ⋱  ⋱     ⎥
        ⎣       1  λ  ⎦

in the real Jordan form of T; and

        δₖ(λ) = dim Ker(T − λ)ᵏ.

For each complex eigenvalue λ = a + bi, b > 0, define νₖ(λ) = number of 2k × 2k
blocks of the form

        ⎡ D            ⎤
        ⎢ I₂  D        ⎥
        ⎢     ⋱   ⋱    ⎥
        ⎣        I₂  D ⎦

in the real Jordan form of T; and

        δₖ(λ) = dim Ker(T_C − λ)ᵏ

as a complex vector space. One obtains:

Theorem 4 Let T be an operator on a real n-dimensional vector space. Then the
real Jordan form of T is determined by the following equations:

        νₖ(λ) = 2δₖ(λ) − δₖ₊₁(λ) − δₖ₋₁(λ),        1 ≤ k ≤ n,    δ₀(λ) = 0,

where λ runs through all real eigenvalues and all complex eigenvalues with positive
imaginary part.
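The block-counting recipe can be sketched in pure Python for a real eigenvalue. The relation used below, νₖ(λ) = 2δₖ(λ) − δₖ₊₁(λ) − δₖ₋₁(λ) with δ₀(λ) = 0, is the standard one (δₖ − δₖ₋₁ counts the blocks of size at least k); the helper names and the 4 × 4 nilpotent example are our own choices:

```python
from fractions import Fraction

def rank(M):
    """Rank via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def block_counts(T, lam):
    """nu_k = 2*delta_k - delta_{k+1} - delta_{k-1}, delta_k = dim Ker (T - lam)^k."""
    n = len(T)
    A = [[Fraction(T[i][j]) - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    P = [[Fraction(1 if i == j else 0) for j in range(n)] for i in range(n)]
    delta = [0]                      # delta_0 = 0
    for _ in range(n + 1):           # delta_1 ... delta_{n+1}
        P = matmul(P, A)
        delta.append(n - rank(P))
    return [2 * delta[k] - delta[k + 1] - delta[k - 1] for k in range(1, n + 1)]

# Nilpotent matrix with one 3x3 and one 1x1 elementary block (eigenvalue 0):
N = [[0, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
print(block_counts(N, 0))  # [1, 0, 1, 0]: one 1x1 block and one 3x3 block
```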

Example. Find the real canonical form of the operator

        T = ⎡ 0  0  0   −8 ⎤
            ⎢ 1  0  0   16 ⎥
            ⎢ 0  1  0  −14 ⎥
            ⎣ 0  0  1    6 ⎦ .

The characteristic polynomial is

        (t − (1 + i))(t − (1 − i))(t − 2)².

The eigenvalues are thus 1 + i, 1 − i, 2, 2. Since 1 + i has multiplicity 1, there
can be only one block

        ⎡ 1  −1 ⎤ .
        ⎣ 1   1 ⎦

A computation shows

        δ₁(2) = 1.

This is proved most easily by showing that rank(T − 2) = 3. Hence there is only
one elementary 2-block. The real canonical form is thus:

        ⎡ 1  −1        ⎤
        ⎢ 1   1        ⎥
        ⎢         2    ⎥
        ⎣         1  2 ⎦ .

There remains the problem of finding a basis that puts an operator in real canonical
form. An algorithm can be derived from the procedure in Appendix III for
putting nilpotent operators in canonical form. We shall have no need for it, however.
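The example's arithmetic can be checked mechanically. The matrix below is assumed to be the companion-type matrix of the stated characteristic polynomial t⁴ − 6t³ + 14t² − 16t + 8 = (t − (1 + i))(t − (1 − i))(t − 2)² (the scan of T is partly illegible, so this exact matrix is an assumption); the sketch verifies the roots and rank(T − 2I) = 3:

```python
from fractions import Fraction

def rank(M):
    """Rank via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Assumed companion-type matrix of p(t) = t^4 - 6t^3 + 14t^2 - 16t + 8:
T = [[0, 0, 0, -8],
     [1, 0, 0, 16],
     [0, 1, 0, -14],
     [0, 0, 1, 6]]

p = lambda z: z**4 - 6*z**3 + 14*z**2 - 16*z + 8

# Both stated eigenvalues are roots of the characteristic polynomial:
print(p(2), p(1 + 1j))   # 0 0j

# rank(T - 2I) = 3, so delta_1(2) = 4 - 3 = 1: a single elementary 2-block.
A = [[T[i][j] - (2 if i == j else 0) for j in range(4)] for i in range(4)]
print(rank(A))           # 3
```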

PROBLEMS

1. Find the Jordan forms of the following operators on Cⁿ:

(a> [- 10 '1
0
(b) [: 4 1 (c)
l+i
[' + i
0 "3
2. Find the real canonical forms of the operators in Problem 1, Section 2.
3. Find the real canonical forms of operators in Problem 4, Section 2.
4. What are the possible real canonical forms of an operator on Rⁿ for n ≤ 5?
5. Let A be a 3 × 3 real matrix which is not diagonal. If (A + I)² = 0, find the
real canonical form of A.
6. Let A be an operator. Suppose q(λ) is a polynomial (not identically 0) such
that q(A) = 0. Then the eigenvalues of A are roots of q.
7. Let A, B be commuting operators on Cⁿ (respectively, Rⁿ). There is a basis
putting both of them in Jordan (respectively, real) canonical form.
8. Every n × n matrix is similar to its transpose.
9. Let A be an operator on Rⁿ. An operator B on Rⁿ is called a real logarithm
of A if e^B = A. Show that A has a real logarithm if and only if A is an
isomorphism and the number of Jordan λ-blocks is even for each negative
eigenvalue λ.

10. Show that the number of real logarithms of an operator on Rⁿ is either 0, 1,
or countably infinite.

§5. Canonical Forms and Differential Equations

After a long algebraic digression we return to the differential equation

(1)     x' = Ax,        A ∈ L(Rⁿ).
Suppose A is a Jordan λ-block, λ ∈ R:

        A = ⎡ λ           ⎤
            ⎢ 1  λ        ⎥
            ⎢    ⋱  ⋱     ⎥
            ⎣       1  λ  ⎦ .

From the decomposition

        A = λI + N,

we find by the exponential method (Chapter 5) that the solution to (1) with initial
value x(0) = C ∈ Rⁿ is

        x(t) = e^{tA}C = e^{tλ}e^{tN}C

             = e^{tλ} ⎡ 1                         ⎤ ⎡ C₁ ⎤
                      ⎢ t            1            ⎥ ⎢ C₂ ⎥
                      ⎢ t²/2!        t     1      ⎥ ⎢ ⋮  ⎥
                      ⎢ ⋮                   ⋱     ⎥ ⎢    ⎥
                      ⎣ tⁿ⁻¹/(n−1)!  ⋯  t     1   ⎦ ⎣ Cₙ ⎦ .

In coordinates,

(2)     xⱼ(t) = e^{tλ} Σ_{k=0}^{j−1} (tᵏ/k!) C_{j−k},        j = 1, ..., n.

Note that the factorials can be absorbed into the constants.
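Formula (2) can be checked against a truncated power series for e^{tA}; a small sketch (the 3 × 3 block, initial value, and tolerance are illustrative choices):

```python
import math

# 3x3 Jordan block with eigenvalue lam (1's below the diagonal, as in the text)
lam, n, t = -0.5, 3, 1.7
A = [[lam if i == j else (1.0 if i == j + 1 else 0.0) for j in range(n)] for i in range(n)]
C = [2.0, -1.0, 0.5]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

# e^{tA} C by the truncated exponential series  sum_k (tA)^k C / k!
x_series = [0.0] * n
term = C[:]
for k in range(60):
    x_series = [s + c for s, c in zip(x_series, term)]
    term = [t * c / (k + 1) for c in matvec(A, term)]

# Formula (2), with 0-based j: x_j(t) = e^{t lam} sum_{k=0}^{j} t^k/k! C_{j-k}
x_formula = [math.exp(t * lam) * sum(t**k / math.factorial(k) * C[j - k]
                                     for k in range(j + 1)) for j in range(n)]

print(all(abs(a - b) < 1e-9 for a, b in zip(x_series, x_formula)))  # True
```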


+
Suppose instead that X = a bi, b # 0, and that A is a real A-block:

D = [; -:I, I = [0
1 0
1.
1

Let m be the number of blocks D so that n = 2m. The solution to (1) can be com-
puted using exponentials. It is easiest to consider the equation
(3) Z' = Bz,
where z : R -+ Cmis an unknown map and B is the complex m X m matrix

, p=a+ib.

We identify Cm with RZmby the correspondence


(21 + @/I, * * J zm + iym) = (21, YI, - * 9 k , ym).
The solution to (3) is formally the same as ( 2 ) with a change of notation:
i-1 t k
(4) z,(t) = eF' C -C+k; j = 1, . . . , m.
k!
k-0

Put Cₖ = Lₖ + iMₖ, k = 1, ..., m, and take real and imaginary parts of (4);
using the identity

        e^{(a+ib)t} = e^{at}(cos bt + i sin bt),

one obtains

(5)     xⱼ(t) = e^{at} Σ_{k=0}^{j−1} (tᵏ/k!) [L_{j−k} cos bt − M_{j−k} sin bt],

        yⱼ(t) = e^{at} Σ_{k=0}^{j−1} (tᵏ/k!) [L_{j−k} sin bt + M_{j−k} cos bt],

j = 1, ..., m. This is the solution to (1) with initial conditions

        xⱼ(0) = Lⱼ,        yⱼ(0) = Mⱼ.

The reader may verify directly that (5) is a solution to (1).
At this point we are not so much interested in the precise formulas (2) and (5)
as in the following observation:

(6) If λ is real, each coordinate xⱼ(t) of any solution to (1) is a linear combination
(with constant coefficients) of the functions

        tᵏe^{tλ},        k = 0, ..., n − 1.

(7) If λ = a + bi, b ≠ 0, and n = 2m, then each coordinate xⱼ(t) of any solution
to (1) is a linear combination of the functions

        tᵏe^{at} cos bt,    tᵏe^{at} sin bt;        0 ≤ k ≤ m − 1.

Consider now Eq. (1) where A is any real n × n matrix. By a suitable change
of coordinates x = Py we transform A into real canonical form B = P⁻¹AP. The
equation

(8)     y' = By

is equivalent to (1): every solution x(t) of (1) has the form

        x(t) = Py(t),

where y(t) solves (8).
Equation (8) breaks up into a set of uncoupled equations, each of the form

        u' = Bⱼu,

where Bⱼ is one of the blocks in the real canonical form B of A. Therefore the
coordinates of solutions to (8) are linear combinations of the functions described in (6)
and (7), where λ or a + bi is an eigenvalue of B (hence of A). The same therefore
is true of the original equation (1).

Theorem 1 Let A ∈ L(Rⁿ) and let x(t) be a solution of x' = Ax. Then each
coordinate xⱼ(t) is a linear combination of the functions

        tᵏe^{at} cos bt,        tˡe^{at} sin bt,

where a + bi runs through all the eigenvalues of A with b ≥ 0, and k and l run through
all the integers 0, ..., n − 1. Moreover, for each λ = a + bi, k and l are less than
the size of the largest λ-block in the real canonical form of A.

Notice that if A has real eigenvalues, then the functions displayed in Theorem 1
include those of the form tᵏe^{at}.
This result does not tell us what the solutions of (1) are, but it tells us what form
the solutions take. The following is a typical and very important application of
Theorem 1.

Theorem 2 Suppose every eigenvalue of A ∈ L(Rⁿ) has negative real part. Then

        lim_{t→∞} x(t) = 0

for every solution of x' = Ax.

Proof. This is an immediate consequence of Theorem 1, the inequalities

        |cos bt| ≤ 1,        |sin bt| ≤ 1,

and the fact that

        lim_{t→∞} tᵏe^{at} = 0        for all k if a < 0.

The converse of Theorem 2 is easy:

Theorem 3 If every solution of x' = Ax tends to 0 as t → ∞, then every eigenvalue
of A has negative real part.

Proof. Suppose μ = a + ib is an eigenvalue with a ≥ 0. From (5) we obtain
a solution (in suitable coordinates)

        x₁(t) = e^{at} cos bt,
        y₁(t) = e^{at} sin bt,
        xⱼ(t) = yⱼ(t) = 0,        j > 1,

which does not tend to zero as t → ∞.

An argument similar to the proof of Theorem 2 shows:

Theorem 4 If every eigenvalue of A ∈ L(Rⁿ) has positive real part, then

        lim_{t→∞} |x(t)| = ∞

for every solution of x' = Ax.

The following corollary of Theorem 1 is useful:

Theorem 5 If A ∈ L(Rⁿ), then the coordinates of every solution of x' = Ax are
infinitely differentiable functions (that is, Cᵐ for all m).

PROBLEMS

1. (a) Suppose that every eigenvalue of A E L(Rn)has real part less than
-a < 0. Prow that there exists a constant k > 0 such that if z(l) is a
65. CANONICAL FORMS AND DIFFERENTIAL EQUATIONS 137

solution to x' = A x , then

I 4t) I I --01 I x(0) I


for all t 2 0. Find such a k and a for each of the following operators A :

2. Let A ∈ L(Rⁿ). Suppose all solutions of x' = Ax are periodic with the same
period. Then A is semisimple and the characteristic polynomial is a power of
t² + a², a ∈ R.
3. Suppose at least one eigenvalue of A ∈ L(Rⁿ) has positive real part. Prove
that for any a ∈ Rⁿ, ε > 0 there is a solution x(t) to x' = Ax such that

        |x(0) − a| < ε        and        lim_{t→∞} |x(t)| = ∞.

4. Let A ∈ L(Rⁿ), and suppose all eigenvalues of A have nonpositive real parts.
(a) If A is semisimple, show that every solution of x' = Ax is bounded (that
is, there is a constant M, depending on x(0), such that |x(t)| ≤ M for
all t ∈ R).
(b) Show by example that if A is not semisimple, there may exist a solution
such that

        lim_{t→∞} |x(t)| = ∞.

5. For any solution to x' = Ax, A ∈ L(Rⁿ), show that exactly one of the following
alternatives holds:
(a) lim_{t→∞} x(t) = 0 and lim_{t→−∞} |x(t)| = ∞;
(b) lim_{t→∞} |x(t)| = ∞ and lim_{t→−∞} x(t) = 0;
(c) there exist constants M, N > 0 such that

        M < |x(t)| < N

for all t ∈ R.
6. Let A ∈ L(R⁴) be semisimple and suppose the eigenvalues of A are ±ai, ±bi;
a > 0, b > 0.
(a) If a/b is a rational number, every solution to x' = Ax is periodic.
(b) If a/b is irrational, there is a nonperiodic solution x(t) such that

        M < |x(t)| < N

for suitable constants M, N > 0.

§6. Higher Order Linear Equations

Consider the one-dimensional, nth order homogeneous linear differential equation
with constant coefficients

(1)     s⁽ⁿ⁾ + a₁s⁽ⁿ⁻¹⁾ + ··· + aₙs = 0.

Here s: R → R is an unknown function, a₁, ..., aₙ are constants, and s⁽ᵏ⁾ means the
kth derivative of s.

Proposition 1 (a) A linear combination of solutions of (1) is again a solution.
(b) The derivative of a solution of (1) is again a solution.

Proof. By a linear combination of functions f₁, ..., fₘ, having a common domain,
and whose values are in a common vector space, we mean a function of the
form

        f(x) = c₁f₁(x) + ··· + cₘfₘ(x),

where c₁, ..., cₘ are constants. Thus (a) means that if s₁(t), ..., sₘ(t) are solutions
of (1) and c₁, ..., cₘ are constants, then c₁s₁(t) + ··· + cₘsₘ(t) is also a solution;
this follows from linearity of derivatives.
Part (b) is immediate by differentiating both sides of (1), provided we know
that a solution is n + 1 times differentiable! This is in fact true. To prove it, consider
the equivalent linear system

(2)     x₁' = x₂,
        x₂' = x₃,
        ⋮
        x_{n−1}' = xₙ,
        xₙ' = −aₙx₁ − a_{n−1}x₂ − ··· − a₁xₙ.

If s is a solution to (1), then

        x = (s, s', ..., s⁽ⁿ⁻¹⁾)

is a solution to (2). From Theorem 4, Section 1 we know that every solution to
(2) has derivatives of all orders.
The matrix of coefficients of the linear system (2) is the n × n matrix

(3)     A = ⎡  0    1                   ⎤
            ⎢       0     1             ⎥
            ⎢             ⋱    ⋱        ⎥
            ⎢                  0    1   ⎥
            ⎣ −aₙ  −a_{n−1}  ⋯    −a₁  ⎦ .

A matrix of this form is called the companion matrix of the polynomial

(4)     p(λ) = λⁿ + a₁λⁿ⁻¹ + ··· + a_{n−1}λ + aₙ.

In Chapter 5 it was shown that this is the characteristic polynomial of A.
Companion matrices have special properties as operators. The key to solving (1)
is the following fact.

Proposition 2 Let λ ∈ C be a real or complex eigenvalue of a companion matrix
A. Then the real canonical form of A has only one λ-block.

Proof. We consider A as an operator on Cⁿ. The number of λ-blocks is

        dim Ker(A − λ),

considering Ker(A − λ) as a complex vector space.
The first n − 1 rows of A − λ form the (n − 1) × n matrix

        ⎡ −λ   1                ⎤
        ⎢      −λ   1           ⎥
        ⎢           ⋱    ⋱      ⎥
        ⎣                −λ   1 ⎦ ,

which has rank n − 1. Hence A − λ has rank n or n − 1, but rank n is ruled out
since λ is an eigenvalue. Hence A − λ has rank n − 1, so Ker(A − λ) has dimension
1. This proves Proposition 2.

Definition A basis of solutions to (1) is a set of solutions s₁, ..., sₙ such that
every solution is expressible as a linear combination of s₁, ..., sₙ in one and only
one way.

The following theorem is the basic result of this section.

Theorem The following n functions form a basis for the solutions of (1):
(a) the functions tᵏe^{λt}, where λ runs through the distinct real roots of the
characteristic polynomial (4), and k is a nonnegative integer in the range 0 ≤ k <
multiplicity of λ; together with
(b) the functions

        tᵏe^{at} cos bt    and    tᵏe^{at} sin bt,

where a + bi runs through the complex roots of (4) having b > 0 and k is a
nonnegative integer in the range 0 ≤ k < multiplicity of a + bi.

Proof. We call the functions listed in the theorem basic functions. It follows
from Theorem 1 of the previous section that every solution is a linear combination
of basic functions.
The proof that each basic function is in fact a solution is given in the next section.
By Proposition 1 it follows that the solutions to (1) are exactly the linear combinations
of basic functions.
It remains to prove that each solution is a unique linear combination of basic
functions. For this we first note that there are precisely n functions listed in (a)
and (b): the number of functions listed equals the sum of the multiplicities of the
real roots of p(λ), plus twice the sum of the multiplicities of the complex roots with
positive imaginary parts. Since nonreal roots come in conjugate pairs, this total
is the sum of the multiplicities of all the roots, which is n.
Define a map φ: Rⁿ → Rⁿ as follows. Let f₁, ..., fₙ be an ordering of the basic
functions. For each a = (a₁, ..., aₙ) ∈ Rⁿ let s_a(t) be the solution

        s_a = Σ_{j=1}^{n} aⱼfⱼ.

Define

        φ(a) = (s_a(0), s_a'(0), ..., s_a⁽ⁿ⁻¹⁾(0)) ∈ Rⁿ.

It is easy to see that φ is a linear map. Moreover, φ is surjective since for each
(a₀, ..., a_{n−1}) ∈ Rⁿ there is some solution s such that

(5)     s(0) = a₀, ..., s⁽ⁿ⁻¹⁾(0) = a_{n−1},

and s = s_a for some a. It follows that φ is injective.
From this we see at once that every solution s is a unique linear combination
of the basic functions, for if s_α = s_β, then φ(α) = φ(β) and hence α = β.
This completes the proof of the theorem.

The theorem reduces the solution of (1) to elementary linear algebra, provided
the roots and multiplicities of the characteristic polynomial are known. For example,
consider the equation

(6)     s⁽⁴⁾ + 4s⁽³⁾ + 5s⁽²⁾ + 4s' + 4s = 0.

The roots of the characteristic polynomial

        λ⁴ + 4λ³ + 5λ² + 4λ + 4

are

        −2, −2, i, −i.

Therefore the general solution is

(7)     s(t) = Ae⁻²ᵗ + Bte⁻²ᵗ + C cos t + D sin t,

where A, B, C, D are arbitrary constants.

To find a solution with given initial conditions, say,

(8)     s(0) = 0,    s'(0) = −1,    s⁽²⁾(0) = −4,    s⁽³⁾(0) = 14,

we compute the left-hand side of (8) from (7), to get:

(9)     s(0)    = A + C = 0,
        s'(0)   = −2A + B + D = −1,
        s⁽²⁾(0) = 4A − 4B − C = −4,
        s⁽³⁾(0) = −8A + 12B − D = 14.

The only solution to this system of equations is

        A = C = 0,    B = 1,    D = −2.

Therefore the solution to (6) and (8) is

        s(t) = te⁻²ᵗ − 2 sin t.
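The 4 × 4 linear system (9) can be solved mechanically; a pure-Python sketch using exact rational arithmetic (the Gauss-Jordan helper is written for this illustration):

```python
from fractions import Fraction

def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        piv = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[-1] for row in aug]

# System (9); unknowns ordered (A, B, C, D):
M = [[ 1,  0,  1,  0],   # s(0)    = A + C
     [-2,  1,  0,  1],   # s'(0)   = -2A + B + D
     [ 4, -4, -1,  0],   # s''(0)  = 4A - 4B - C
     [-8, 12,  0, -1]]   # s'''(0) = -8A + 12B - D
b = [0, -1, -4, 14]
print([int(x) for x in solve(M, b)])  # [0, 1, 0, -2], i.e. A = C = 0, B = 1, D = -2
```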

PROBLEMS

1. Find a map s: R → R such that

        s⁽³⁾ − s⁽²⁾ + 4s' − 4s = 0,
        s(0) = 1,    s'(0) = −1,    s''(0) = 1.

2. Consider equation (6) in the text. Find out for which initial conditions s(0),
s'(0), s''(0) there is a solution s(t) such that:
(a) s(t) is periodic;            (b) lim_{t→∞} s(t) = 0;
(c) lim_{t→∞} |s(t)| = ∞;        (d) |s(t)| is bounded for t ≥ 0;
(e) |s(t)| is bounded for all t ∈ R.
3. Find all periodic solutions to

        s⁽⁴⁾ + 2s⁽²⁾ + s = 0.

4. What is the smallest integer n > 0 for which there is a differential equation

        s⁽ⁿ⁾ + a₁s⁽ⁿ⁻¹⁾ + ··· + aₙs = 0,

having among its solutions the functions

        sin 2t,    4t³e⁵ᵗ,    −e⁻ᵗ?

Find the constants a₁, ..., aₙ ∈ R.

§7. Operators on Function Spaces

We discuss briefly another quite different approach to the equation

(1)     s⁽ⁿ⁾ + a₁s⁽ⁿ⁻¹⁾ + ··· + aₙs = 0;        s: R → R.

Let ℱ be the set of all infinitely differentiable functions R → R (one could also
use maps R → C). Under multiplication by constants and addition of functions, ℱ
satisfies the axioms VS1, VS2 for a vector space (Chapter 3, Section 1, Part A);
but ℱ is not finite dimensional.
Let D: ℱ → ℱ denote the differentiation operator; that is,

        Df = f'.

Then D is a linear operator. Some other operators on ℱ are:
multiplication of f by a constant λ, which we denote simply by λ; note that 1f = f
and 0f = 0;
multiplication of f by the function ι(t) = t, which we denote by t.
New operators can be built from these by composition, addition, and multiplication
by constants. For example,

        D²: ℱ → ℱ

assigns to f the function

        D(Df) = D(f') = f'';

similarly Dⁿf = f⁽ⁿ⁾, the nth derivative. The operator D − λ assigns to f the
function f' − λf.
More generally, let

        p(t) = tⁿ + a₁tⁿ⁻¹ + ··· + aₙ

be a polynomial. (There could also be a coefficient of tⁿ.) There is defined an
operator

        p(D) = Dⁿ + a₁Dⁿ⁻¹ + ··· + a_{n−1}D + aₙ,

which assigns to f the function

        f⁽ⁿ⁾ + a₁f⁽ⁿ⁻¹⁾ + ··· + a_{n−1}f' + aₙf.

We may then rephrase the problem of solving (1): find the kernel of the operator
p(D).
This way of looking at things suggests new ways of manipulating higher order
equations. For example, if p(λ) factors,

        p(λ) = q(λ)r(λ),

then clearly,

        Ker r(D) ⊂ Ker p(D).

Moreover,

        Ker q(D) ⊂ Ker p(D),

since qr = rq. In addition, if f ∈ Ker q(D) and g ∈ Ker r(D), then f + g ∈
Ker p(D).
We can now give a proof that if (t − λ)ᵐ divides p(t), then tᵏe^{λt} ∈ Ker p(D),
0 ≤ k ≤ m − 1. It suffices to prove

(2)     (D − λ)ᵏ⁺¹(tᵏe^{λt}) = 0,        k = 0, 1, ....

Note that D(e^{λt}) = λe^{λt}, or

        (D − λ)e^{λt} = 0.

Next, observe the following relation between operators:

        Dt − tD = 1

(this means D(tf) − tDf = f, which follows from the Leibniz formula). Hence also

        (D − λ)t − t(D − λ) = 1.

It follows easily by induction that

        (D − λ)tᵏ − tᵏ(D − λ) = ktᵏ⁻¹;        k = 1, 2, ....

Therefore

        (D − λ)ᵏ⁺¹(tᵏe^{λt}) = (D − λ)ᵏ(D − λ)(tᵏe^{λt})
                             = (D − λ)ᵏ([tᵏ(D − λ) + ktᵏ⁻¹]e^{λt})
                             = k(D − λ)ᵏ(tᵏ⁻¹e^{λt}).

Hence (2) is proved by induction on k.
If λ is interpreted as a complex number and p(D) has complex coefficients, the
proof goes through without change. If p(D) has real coefficients but λ = a + bi
is a nonreal root, it follows that both the real and imaginary parts of tᵏe^{λt} are
annihilated by p(D). This shows that

        tᵏe^{at} cos bt,        tᵏe^{at} sin bt

are in Ker p(D).
We have proved part of Theorem 1, Section 6 by easy "formal" (but rigorous)
methods. The main part, that every solution is a linear combination of basic
functions, can be proved by similar means. [See Linear Algebra by S. Lang, p. 213
(Reading, Massachusetts: Addison-Wesley, 1966).] Operators on function spaces
have many uses for both theoretical and practical work in differential equations.
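The computation above can be mechanized: on functions of the form q(t)e^{λt}, the operator D − λ acts as q ↦ q′, so (D − λ)ᵏ⁺¹ annihilates tᵏe^{λt} because the (k + 1)st derivative of a degree-k polynomial vanishes. A sketch (the coefficient-list representation is our own choice, not from the text):

```python
# Represent q(t) e^{lam t} by the coefficient list of q: [q0, q1, ...] = q0 + q1*t + ...
def d_minus_lam(q):
    """(D - lam)(q e^{lam t}) = q' e^{lam t}: differentiate the polynomial part."""
    return [k * q[k] for k in range(1, len(q))] or [0]

def apply_power(q, m):
    """Apply (D - lam)^m to q(t) e^{lam t}."""
    for _ in range(m):
        q = d_minus_lam(q)
    return q

# t^k e^{lam t} corresponds to q = [0,...,0,1] (k zeros); (D - lam)^{k+1} kills it:
for k in range(5):
    q = [0] * k + [1]
    print(apply_power(q, k + 1))  # [0] each time
```

Note that the value of λ never enters: the identity (D − λ)t − t(D − λ) = 1 reduces everything to polynomial differentiation.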
Chapter 7

Contractions and Generic Properties of Operators

In this chapter we study some important kinds of linear flows e^{tA}, particularly
contractions. A (linear) contraction is characterized by the property that every
trajectory tends to 0 as t → ∞; equivalently, the eigenvalues of A have negative
real parts. Such flows form the basis for the study of asymptotic stability in Chapter
9. Contractions and their extreme opposites, expansions, are studied in Section 1.
Section 2 is devoted to hyperbolic flows e^{tA}, characterized by the condition that
the eigenvalues of A have nonzero real parts. Such a flow is the direct sum of a
contraction and an expansion; thus their qualitative behavior is very simple.
In Section 3 we introduce the notion of a generic property of operators on Rⁿ;
this means that the set of operators which have that property contains a dense
open subset of L(Rⁿ). It is shown that "semisimple" is a generic property, and
also that "generating a hyperbolic flow" is a generic property for operators.
The concept of a generic property of operators is a mathematical way of making
precise the idea of "almost all" operators, or of a "typical" operator. This point is
discussed in Section 4.

§1. Sinks and Sources

Suppose that a state of some "physical" (or mechanical, biological, economic,
etc.) system is determined by the values of n parameters; the space of all states
is taken to be an open set U ⊂ Rⁿ. We suppose that the dynamic behavior of the
system is modeled mathematically by the solution curves of a differential equation
(or dynamical system)

(1)     x' = f(x),        f: U → Rⁿ.

We are interested in the long-run behavior of trajectories (that is, solution curves)
We are interested in the long-run behavior of trajectories (that is, solution curves)

of (1). Of especial interest are equilibrium states. Such a state x̄ ∈ U is one that
does not change with time. Mathematically, this means that the constant map
t ↦ x̄ is a solution to (1); equivalently, f(x̄) = 0. Hence we define an equilibrium
of (1) to be a point x̄ ∈ U such that f(x̄) = 0.
From a physical point of view only equilibria that are "stable" are of interest.
A pendulum balanced upright is in equilibrium, but this is very unlikely to occur;
moreover, the slightest disturbance will completely alter the pendulum's behavior.
Such an equilibrium is unstable. On the other hand, the downward rest position is
stable; if slightly perturbed from it, the pendulum will swing around it and (because
of friction) gradually approach it again.
Stability is studied in detail in Chapter 9. Here we restrict attention to linear
systems and concentrate on the simplest and most important type of stable
equilibrium.
Consider a linear equation

(2)     x' = Ax,        A ∈ L(Rⁿ).

The origin 0 ∈ Rⁿ is called a sink if all the eigenvalues of A have negative real
parts. We also say the linear flow e^{tA} is a contraction.
In Chapter 6, Theorems 2 and 3 of Section 5, it was shown that 0 is a sink if and
only if every trajectory tends to 0 as t → ∞. (This is called asymptotic stability.)
From Problem 1, Section 5 of that chapter, it follows that trajectories approach
a sink exponentially. The following result makes this more precise.

Theorem 1 Let A be an operator on a vector space E. The following statements are
equivalent:
(a) The origin is a sink for the dynamical system x' = Ax.
(b) For any norm on E there are constants k > 0, b > 0 such that

        |e^{tA}x| ≤ ke^{−tb}|x|

for all t ≥ 0, x ∈ E.
(c) There exists b > 0 and a basis ℬ of E whose corresponding norm satisfies

        |e^{tA}x|_ℬ ≤ e^{−tb}|x|_ℬ

for all t ≥ 0, x ∈ E.

Proof. Clearly, (c) implies (b) by equivalence of norms; and (b) implies (a)
by Theorem 3 of Chapter 6, Section 5. That (a) implies (b) follows easily from
Theorem 1 of that section; the details are left to the reader.
It remains to prove that (a) implies (c). For this we use the following purely
algebraic fact, whose proof is postponed.
Recall that ℜλ is the real part of λ.

Lemma Let A be an operator on a real vector space E and suppose

(3)     α < ℜλ < β

for every eigenvalue λ of A. Then E has a basis such that in the corresponding inner
product and norm,

(4)     α|x|² ≤ ⟨Ax, x⟩ ≤ β|x|²

for all x ∈ E.

Assuming the truth of the lemma, we derive an estimate for solutions of x' = Ax.
Let (x₁, ..., xₙ) be coordinates on E corresponding to a basis ℬ such that (4)
holds, and let

        x(t) = (x₁(t), ..., xₙ(t))

be a solution to x' = Ax. Then for the norm and inner product defined by ℬ we have

        d/dt |x(t)|² = 2⟨x'(t), x(t)⟩ = 2⟨Ax(t), x(t)⟩.

Hence

        d/dt log |x(t)| = ⟨Ax, x⟩ / |x|².

Therefore, from (4), we have

        α ≤ d/dt log |x(t)| ≤ β.

It follows by integration that

        αt ≤ log |x(t)| − log |x(0)| ≤ βt;

hence

        e^{αt} |x(0)| ≤ |x(t)| ≤ e^{βt} |x(0)|.

Theorem 1 is proved by letting β = −b < 0, where the eigenvalues of A have
real parts less than −b.
We now prove the lemma; for simplicity we prove only the second inequality
of (4).

Let c ∈ R be such that

        ℜλ < c < β

for every eigenvalue λ of A.
Suppose first that A is semisimple. Then Rⁿ has a direct sum decomposition

        Rⁿ = E₁ ⊕ ··· ⊕ E_r ⊕ F₁ ⊕ ··· ⊕ F_s,

where each Eⱼ is a one-dimensional subspace spanned by an eigenvector eⱼ of A
corresponding to a real eigenvalue λⱼ, and each Fₖ is a two-dimensional subspace
invariant under A, having a basis {fₖ, gₖ} giving A | Fₖ the matrix

        ⎡ aₖ  −bₖ ⎤ ,
        ⎣ bₖ   aₖ ⎦

where aₖ + ibₖ is an eigenvalue of A. By assumption

        λⱼ < c,        aₖ < c.

Give Rⁿ the inner product defined by

        ⟨eⱼ, eⱼ⟩ = ⟨fₖ, fₖ⟩ = ⟨gₖ, gₖ⟩ = 1,

all other inner products among the eⱼ, fₖ, and gₖ being 0. Then a computation
shows

        ⟨Aeⱼ, eⱼ⟩ = λⱼ < c,        ⟨Afₖ, fₖ⟩ = ⟨Agₖ, gₖ⟩ = aₖ < c;

it follows easily that

        ⟨Ax, x⟩ ≤ c |x|²

for all x ∈ Rⁿ, as required.
Now let A be any operator. We first give Rⁿ a basis so that A has a matrix in real
canonical form

        A = diag{A₁, ..., A_p},

where each Aⱼ has the form

(5)     ⎡ λⱼ           ⎤
        ⎢ 1   λⱼ       ⎥
        ⎢     ⋱   ⋱    ⎥
        ⎣        1  λⱼ ⎦

or

(6)     ⎡ Dⱼ            ⎤
        ⎢ I₂  Dⱼ        ⎥ ,        Dⱼ = ⎡ aⱼ  −bⱼ ⎤ ,        I₂ = ⎡ 1  0 ⎤ .
        ⎢     ⋱   ⋱     ⎥               ⎣ bⱼ   aⱼ ⎦               ⎣ 0  1 ⎦
        ⎣        I₂  Dⱼ ⎦

If we give a subspace Eⱼ of E, corresponding to a block Aⱼ, a basis satisfying the
lemma for Aⱼ, then all these bases together fulfill the lemma for A. Therefore we
may assume A is a single block.
For the first kind of block (5), we can write A = S + N, where S has the matrix
λI and N has the matrix

        ⎡ 0           ⎤
        ⎢ 1  0        ⎥
        ⎢    ⋱  ⋱     ⎥
        ⎣       1  0  ⎦ .

Thus the basis vectors (e₁, ..., eₙ) are eigenvectors of S, while

        Ne₁ = e₂,
        ⋮
        Ne_{n−1} = eₙ,
        Neₙ = 0.

Let ε > 0 be very small and consider the new basis

        ℬ_ε = {e₁, ε⁻¹e₂, ..., ε^{1−n}eₙ} = {ē₁, ..., ēₙ}.

ℬ_ε is again composed of eigenvectors of S, while now

        Nē₁ = εē₂,
        ⋮
        Nē_{n−1} = εēₙ,
        Nēₙ = 0.

Thus the ℬ_ε-matrix of A is

(7)     ⎡ λ            ⎤
        ⎢ ε   λ        ⎥
        ⎢     ⋱   ⋱    ⎥
        ⎣        ε  λ  ⎦ .

Let ⟨x, y⟩_ε denote the inner product corresponding to ℬ_ε. It is clear by considering
the matrix (7) that

        ⟨Ax, x⟩_ε / ⟨x, x⟩_ε → ⟨Sx, x⟩_ε / |x|²_ε        as ε → 0.

Therefore if ε is sufficiently small, the basis ℬ_ε satisfies the lemma for a block (5).
The case of a block (6) is similar and is left to the reader. This completes the proof
of the lemma.
The qualitative behavior of a flow near a sink has a simple geometrical interpretation.
Suppose 0 ∈ Rⁿ is a sink for the linear differential equation x' = Ax.
Consider the spheres

        S_α = {x ∈ Rⁿ : |x| = α},        α > 0,

where |x| is the norm derived from an inner product as in the theorem. Since |x(t)|
has negative derivative, the trajectories all point inside these spheres, as in Fig. A.

FIG. A

We emphasize that this is true for the spheres in a special norm; it may be false
for some other norm.
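The special-norm caveat is easy to see numerically: for a contraction whose matrix has a nilpotent part, the Euclidean norm of a trajectory can grow before it decays. A sketch with the illustrative matrix A = [[−1, 4], [0, −1]], for which e^{tA} = e^{−t}[[1, 4t], [0, 1]]:

```python
import math

def flow(t, x):
    """e^{tA} x for A = [[-1, 4], [0, -1]], using e^{tA} = e^{-t} [[1, 4t], [0, 1]]."""
    return [math.exp(-t) * (x[0] + 4 * t * x[1]), math.exp(-t) * x[1]]

norm = lambda v: math.hypot(*v)
x0 = [0.0, 1.0]
print(norm(flow(1.0, x0)) > norm(x0))   # True: |x(1)| > |x(0)| in the Euclidean norm
print(norm(flow(20.0, x0)) < 1e-6)      # True: yet the trajectory still tends to 0
```

In a norm adapted to A, as constructed in the lemma, the norm would instead decrease monotonically along every trajectory.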
The linear flow e^{tA} that has the extreme opposite character to a contraction is an
expansion, for which the origin is called a source: every eigenvalue of A has positive
real part. The following result is the analogue of Theorem 1 for expansions.

Theorem 2 If A ∈ L(E), the following are equivalent:
(a) The origin is a source for the dynamical system x' = Ax.
(b) For any norm on E, there are constants L > 0, a > 0 such that

        |e^{tA}x| ≥ Le^{ta}|x|

for all t ≥ 0, x ∈ E.
(c) There exists a > 0 and a basis ℬ of E whose corresponding norm satisfies

        |e^{tA}x|_ℬ ≥ e^{ta}|x|_ℬ

for all t ≥ 0, x ∈ E.

The proof is like that of Theorem 1, using the lemma and the first inequality of
(4).

PROBLEMS

1. (a) Show that the operator A = [-:


f] generates a contracting flow etA
(b) Sketch the phase portrait of x' = A x in standard coordinates.
( c ) Show that it is false that I erAzI 5 I x I for all t 2 0, x E R2,where I x 1
is the Euclidean norm.
2. Let e t A be a contraction in Rn.Show that for 7 > 0 sufficiently large, the norm
11 x 11 on R" defined by
11x11 = /'I@XIdS
0

satisfies, for some X > 0,


11 elAx 11 2 e-xt 11 x 11.
3. (a) If elB and e l A are both contractions on Rn,and B A = AB, then
is a contraction. Similarly for expansions.
(b) Show that (a) can be false if the assumption that A B = B A is dropped.
4. Consider a mass moving in a straight line under the influence of a spring. Assume there is a retarding frictional force proportional to the velocity but opposite in sign.
(a) Using Newton's second law, verify that the equation of motion has the form

mx'' = ax' + bx;  m > 0, a < 0, b < 0.

(b) Show that the corresponding first order system has a sink at (0, 0).
(c) What do you conclude about the long-run behavior of this physical system?
5. If e^{tA} is a contraction (expansion), show that e^{−tA} is an expansion (respectively, contraction). Therefore a contraction is characterized by every trajectory going to ∞ as t → −∞; and an expansion, by every trajectory going to 0 as t → −∞.

§2. Hyperbolic Flows

A type of linear flow e^{tA} that is more general than contractions and expansions is the hyperbolic flow: all eigenvalues of A have nonzero real part.

After contractions and expansions, hyperbolic linear flows have the simplest
types of phase portraits. Their importance stems from the fact that almost every
linear flow is hyperbolic. This will be made precise, and proved, in the next section.
The following theorem says that a hyperbolic flow is the direct sum of a contrac-
tion and an expansion.

Theorem Let e^{tA} be a hyperbolic linear flow, A ∈ L(E). Then E has a direct sum decomposition

E = E^s ⊕ E^u

invariant under A, such that the induced flow on E^s is a contraction and the induced flow on E^u is an expansion. This decomposition is unique.
Proof. We give E a basis putting A into real canonical form (Chapter 6). We order this basis so that the canonical form matrix first has blocks corresponding to eigenvalues with negative real parts, followed by blocks corresponding to eigenvalues with positive real parts. The first set of blocks represents the restriction of A to a subspace E^s ⊂ E, while the remaining blocks represent the restriction of A to E^u ⊂ E.
Since E^s is invariant under A, it is invariant under e^{tA}. Put A | E^s = A_s and A | E^u = A_u. Then e^{tA} | E^s = e^{tA_s}. By Theorem 1, Section 1, e^{tA} | E^s is a contraction. Similarly, Theorem 2, Section 1 implies that e^{tA} | E^u is an expansion.
Thus A = A_s ⊕ A_u is the desired decomposition.
To check uniqueness of the decomposition, suppose F^s ⊕ F^u is another decomposition of E invariant under the flow such that e^{tA} | F^s is a contraction and e^{tA} | F^u is an expansion. Let x ∈ F^s. We can write

x = y + z,  y ∈ E^s,  z ∈ E^u.

Since e^{tA}x → 0 as t → ∞, we have e^{tA}y → 0 and e^{tA}z → 0. But

| e^{tA}z | ≥ e^{ta} | z |,  a > 0,

for all t ≥ 0. Hence | z | = 0. This shows that F^s ⊂ E^s. The same argument shows that E^s ⊂ F^s; hence E^s = F^s. Similar reasoning about e^{−tA} shows that E^u = F^u.
This completes the proof.

A hyperbolic flow may be a contraction (E^u = 0) or an expansion (E^s = 0).

If neither E^u nor E^s is 0, the phase portrait may look like Fig. A in the two-dimensional case or like Fig. B in a three-dimensional case.
If, in addition, the eigenvalues of A | E^s have nonzero imaginary part, all trajectories will spiral toward E^u (Fig. C).
Other three-dimensional phase portraits are obtained by reversing the arrows in Figs. B and C.
The letters s and u stand for stable and unstable. E^s and E^u are sometimes called the stable and unstable subspaces of the hyperbolic flow.
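When A is diagonalizable, the decomposition E = E^s ⊕ E^u can be computed numerically from eigenvectors. A small sketch (the matrix, test vectors, time, and tolerances are arbitrary choices for illustration):

```python
import numpy as np
from scipy.linalg import expm

# A hyperbolic operator on R^3: eigenvalues -1, -2 (stable) and 3 (unstable).
A = np.diag([-1.0, -2.0, 3.0])

vals, vecs = np.linalg.eig(A)
Es = vecs[:, vals.real < 0]    # columns span the stable subspace E^s
Eu = vecs[:, vals.real > 0]    # columns span the unstable subspace E^u

xs = Es @ np.array([1.0, 1.0])     # a vector in E^s
xu = Eu @ np.array([1.0])          # a vector in E^u

t = 5.0
# the induced flow contracts E^s and expands E^u
assert np.linalg.norm(expm(t * A) @ xs) < 1e-2 * np.linalg.norm(xs)
assert np.linalg.norm(expm(t * A) @ xu) > 1e2 * np.linalg.norm(xu)
```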

FIG. B

FIG. C

PROBLEMS

1. Let the eigenvalues of A ∈ L(R³) be λ, μ, ν. Notice that e^{tA} is a hyperbolic flow and sketch its phase portrait if:
(a) λ < μ < ν < 0;
(b) λ < 0, μ = a + bi, a < 0, b > 0;
(c) λ < 0, μ = a + bi, a > 0, b > 0;
(d) λ < 0 < μ = ν and A is semisimple;
(e) λ < μ < 0 < ν.

2. e^{tA} is hyperbolic if and only if for each x ≠ 0 either

| e^{tA}x | → ∞ as t → +∞

or

| e^{tA}x | → ∞ as t → −∞.
3. Show that a hyperbolic flow has no nontrivial periodic solutions.

§3. Generic Properties of Operators

Let F be a normed vector space (Chapter 5). Recall that a set X ⊂ F is open if whenever x ∈ X there is an open ball about x contained in X; that is, for some a > 0 (depending on x) the open ball about x of radius a,

{ y ∈ F | | y − x | < a },

is contained in X. From the equivalence of norms it follows that this definition is independent of the norm; any other norm would have the same property (for perhaps a different radius a).
Using geometrical language we say that if x belongs to an open set X, any point sufficiently near to x also belongs to X.
Another kind of subset of F is a dense set: X ⊂ F is dense if every point of F is arbitrarily close to points of X. More precisely, if x ∈ F, then for every ε > 0 there exists some y ∈ X with | x − y | < ε. Equivalently, U ∩ X is nonempty for every nonempty open set U ⊂ F.

An interesting kind of subset of F is a set X ⊂ F which is both open and dense. It is characterized by the following properties: every point in the complement of X can be approximated arbitrarily closely by points of X (because X is dense); but no point in X can be approximated arbitrarily closely by points in the complement (because X is open).
Here is a simple example of a dense open set in R²:

X = { (x, y) ∈ R² | xy ≠ 1 }.

This, of course, is the complement of the hyperbola defined by xy = 1. If x₀y₀ ≠ 1 and | x − x₀ |, | y − y₀ | are small enough, then xy ≠ 1; this proves X open. Given any (x₀, y₀) ∈ R², we can find (x, y) as close as we like to (x₀, y₀) with xy ≠ 1; this proves X dense.
More generally, one can show that the complement of any algebraic curve in R² is dense and open.
A dense open set is a very fat set, as the following proposition shows:

Proposition Let X₁, ..., X_m be dense open sets in F. Then

X = X₁ ∩ ··· ∩ X_m

is also dense and open.

Proof. It can easily be shown that the intersection of a finite number of open sets is open, so X is open. To prove X dense, let U ⊂ F be a nonempty open set. Then U ∩ X₁ is nonempty since X₁ is dense. Because U and X₁ are open, U ∩ X₁ is open. Since U ∩ X₁ is open and nonempty, (U ∩ X₁) ∩ X₂ is nonempty because X₂ is dense. Since X₁ and X₂ are open, U ∩ X₁ ∩ X₂ is open. Thus (U ∩ X₁ ∩ X₂) ∩ X₃ is nonempty, and so on. So U ∩ X is nonempty, which proves that X is dense in F.
Now consider a subset X of the vector space L(R^n). It makes sense to call X dense, or open. In trying to prove this for a given X we may use any convenient norm on L(R^n). One such norm is the ℬ-max norm, where ℬ is a basis of R^n:

|| T ||_{ℬ-max} = max{ | a_ij | }, where [a_ij] is the ℬ-matrix of T.

A property 𝒫 that refers to operators on R^n is a generic property if the set of operators having property 𝒫 contains a dense open set. Thus a property is generic if it is shared by some dense open set of operators (and perhaps other operators as well). Intuitively speaking, a generic property is one which "almost all" operators have.

Theorem 1 The set S₁ of operators on R^n that have n distinct eigenvalues is dense and open in L(R^n).

Proof. We first prove S₁ dense. Let T be an operator on R^n. Fix a basis ℬ putting T in real canonical form.
The real canonical form of T can be written as the sum of two matrices

T = S + N,

where S is the block diagonal matrix

S = diag(λ₁, ..., λ_r, D₁, ..., D_s),  D_k = [ a_k  b_k ; −b_k  a_k ],  b_k > 0,  k = 1, ..., s,

and N is the nilpotent part of the canonical form: its only nonzero entries are the 1's below the diagonal entries λ_j and the 2 × 2 identity blocks

I₂ = [ 1  0 ; 0  1 ]

below the diagonal blocks D_k.
The eigenvalues of T (with multiplicities) are λ₁, ..., λ_r and a₁ ± ib₁, ..., a_s ± ib_s.
Given ε > 0, let

λ'₁, ..., λ'_r, a'₁, ..., a'_s

be distinct real numbers such that

| λ'_j − λ_j | < ε and | a'_k − a_k | < ε.

Put

S' = diag(λ'₁, ..., λ'_r, D'₁, ..., D'_s),  D'_k = [ a'_k  b_k ; −b_k  a'_k ],

and T' = S' + N. Then the ℬ-max norm of T − T' is less than ε, and the eigenvalues of T' are the n distinct numbers

λ'₁, ..., λ'_r, a'₁ ± ib₁, ..., a'_s ± ib_s.

This proves that S₁ is dense.


To prove that S₁ is open we argue by contradiction. If it is not open, then there is a sequence A₁, A₂, ... of operators on R^n that are not in S₁ but which converges to an operator A in S₁. There is an upper bound for the norms of the A_k and hence for their eigenvalues. By assumption each A_k has an eigenvalue λ_k of multiplicity at least two.
Suppose at first that all λ_k are real. Passing to a subsequence we may assume that λ_k → λ. For each k, there are two independent eigenvectors x_k, y_k for A_k belonging to the eigenvalue λ_k. We may clearly suppose | x_k | = | y_k | = 1. Moreover we may assume x_k and y_k orthogonal, otherwise replacing y_k by

(y_k − ⟨y_k, x_k⟩x_k) / | y_k − ⟨y_k, x_k⟩x_k |.

Passing again to subsequences we may assume x_k → x and y_k → y. Then x and y are independent vectors. From the relations A_k x_k = λ_k x_k and A_k y_k = λ_k y_k we find in the limit that Ax = λx and Ay = λy. But this contradicts A ∈ S₁.
If some of the λ_k are nonreal, the same contradiction is reached by considering the complexifications of the A_k; now x_k and y_k are vectors in C^n. In place of the Euclidean inner product on R^n we use the Hermitian inner product on C^n defined by ⟨z, w⟩ = Σ_j z_j w̄_j, and the corresponding norm | z | = ⟨z, z⟩^{1/2}. The rest of the argument is formally the same as before.
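The density argument can be imitated numerically: perturbing the diagonal of an operator with a repeated eigenvalue by small distinct numbers produces a nearby operator with n distinct eigenvalues. A sketch (the matrix T and the size of ε are arbitrary examples):

```python
import numpy as np

# T has the single eigenvalue 2 with multiplicity three, so T is not in S1.
T = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])

# Perturb the diagonal entries by distinct numbers smaller than eps,
# as in the proof: T' = S' + N with nearby distinct diagonal part.
eps = 1e-6
Tp = T + np.diag([0.0, eps / 3, 2 * eps / 3])

vals = np.sort(np.linalg.eigvals(Tp).real)
assert np.all(np.diff(vals) > 0)        # T' has 3 distinct eigenvalues
assert np.max(np.abs(Tp - T)) < eps     # and T' is within eps of T
```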

Note that the operators in S₁ are all semisimple, by Chapter 4. Therefore an immediate consequence of Theorem 1 is

Theorem 2 Semisimplicity is a generic property in L(R^n).

The set of semisimple operators is not open. For example, every neighborhood of the semisimple operator I ∈ L(R²) contains a nonsemisimple operator of the form [ 1  ε ; 0  1 ].
We also have

Theorem 3 The set

{ T ∈ L(R^n) | e^{tT} is a hyperbolic flow }

is open and dense in L(R^n).
Proof. In the proof of density of S₁ in Theorem 1, we can take the numbers λ'₁, ..., λ'_r, a'₁, ..., a'_s (the real parts of the eigenvalues of T') to be nonzero; this proves density. Openness is proved by a convergence argument similar to (and easier than) the one given in the proof of Theorem 1.
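Genericity of hyperbolicity is easy to observe experimentally: operators with random entries essentially always have all eigenvalues off the imaginary axis, and a non-hyperbolic operator becomes hyperbolic under an arbitrarily small perturbation. A sketch (sample size, random seed, and tolerance are arbitrary choices):

```python
import numpy as np

def is_hyperbolic(T, tol=1e-12):
    # e^{tT} is hyperbolic iff every eigenvalue of T has nonzero real part
    return bool(np.all(np.abs(np.linalg.eigvals(T).real) > tol))

rng = np.random.default_rng(0)
samples = [rng.standard_normal((4, 4)) for _ in range(100)]
assert all(is_hyperbolic(T) for T in samples)   # "almost all" operators

J = np.array([[0.0, -1.0],
              [1.0, 0.0]])                      # eigenvalues +-i: not hyperbolic
assert not is_hyperbolic(J)
assert is_hyperbolic(J + 1e-9 * np.eye(2))      # a tiny perturbation fixes it
```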

PROBLEMS

1. Each of the following properties defines a set of real n × n matrices. Find out which sets are dense, and which are open in the space L(R^n) of all linear operators on R^n:
(a) determinant ≠ 0;
(b) trace is rational;
(c) entries are not integers;
(d) 3 ≤ determinant < 4;
(e) −1 < | λ | < 1 for every eigenvalue λ;
(f) no real eigenvalues;
(g) each real eigenvalue has multiplicity one.
2. Which of the following properties of operators on R^n are generic?
(a) | λ | ≠ 1 for every eigenvalue λ;
(b) n = 2; some eigenvalue is not real;
(c) n = 3; some eigenvalue is not real;
(d) no solution of x' = Ax is periodic (except the zero solution);
(e) there are n distinct eigenvalues, with distinct imaginary parts;
(f) Ax ≠ x and Ax ≠ −x for all x ≠ 0.
3. The set of operators on R^n that generate contractions is open, but not dense, in L(R^n). Likewise for expansions.

4. A subset X of a vector space E is residual if there are dense open sets A_k ⊂ E, k = 1, 2, ..., such that ∩_k A_k ⊂ X.
(a) Prove the theorem of Baire: a residual set is dense.
(b) Show that if X_k is residual, k = 1, 2, ..., then ∩_k X_k is residual.
(c) If the set Q ⊂ C is residual, show that the set of operators on R^n whose eigenvalues are in Q is residual in L(R^n).

§4. The Significance of Genericity

If an operator A ∈ L(R^n) is semisimple, the differential equation x' = Ax breaks down into a number of simple uncoupled equations in one or two dimensions. If the eigenvalues of A have nonzero real parts, the differential equation might be complicated from the analytic point of view, but the geometric structure of the hyperbolic flow e^{tA} is very simple: it is the direct sum of a contraction and an expansion.
In Section 3 we showed that such nice operators A are in a sense typical. Precisely, operators that generate hyperbolic flows form a dense open set in L(R^n), while the set of semisimple operators contains a dense open set. Thus if A generates a hyperbolic flow, so does any operator sufficiently near to A. If A does not, we can approximate A arbitrarily closely by operators that do.
The significance of this for linear differential equations is the following. If there is uncertainty as to the entries in a matrix A, and no reason to assume the contrary, we might as well assume that the flow e^{tA} is hyperbolic. For example, A might be obtained from physical observations; there is a limit to the accuracy of the measuring instruments. Or the differential equation x' = Ax may be used as an abstract model of some general physical (or biological, chemical, etc.) phenomenon; indeed, differential equations are very popular as models. In this case it makes little sense to insist on exact values for the entries in A.
It is often helpful in such situations to assume that A is as simple as possible, until compelled by logic, theory, or observation to change that assumption. It is reasonable, then, to ascribe to A any convenient generic property. Thus it is comforting to assume that A is semisimple, for then the operator A (and the flow e^{tA}) are direct sums of very simple, easily analyzed one- and two-dimensional types.
There may, of course, be good reasons for not assuming a particular generic property. If it is suspected that the differential equation x' = Ax has natural symmetry properties, for example, or that the flow must conserve some quantity (for example, energy), then assumption of a generic property could be a mistake.
Chapter 8
Fundamental Theory

This chapter is more difficult than the preceding ones; it is also central to the
study of ordinary differential equations. We suggest that the reader browse through
the chapter, omitting the proofs until the purpose of the theorems begins to fit
into place.

§1. Dynamical Systems and Vector Fields

A dynamical system is a way of describing the passage in time of all points of a given space S. The space S could be thought of, for example, as the space of states of some physical system. Mathematically, S might be a Euclidean space or an open subset of Euclidean space. In the Kepler problem of Chapter 2, S was the set of possible positions and velocities; for the planar gravitational central force problem,

S = (R² − 0) × R² = { (x, v) ∈ R² × R² | x ≠ 0 }.

A dynamical system on S tells us, for x in S, where x is 1 unit of time later, 2 units of time later, and so on. We denote these new positions of x by x₁, x₂, respectively. At time zero, x is at x or x₀. One unit before time zero, x was at x₋₁. If one extrapolates to fill up the real numbers, one obtains the trajectory x_t for all time t. The map R → S, which sends t into x_t, is a curve in S that represents the life history of x as time runs from −∞ to ∞.
It is assumed that the map from R × S → S defined by (t, x) → x_t is continuously differentiable, or at least continuous and continuously differentiable in t. The map φ_t: S → S that takes x into x_t is defined for each t, and from our interpretation as states moving in time it is reasonable to expect φ_t to have as an inverse φ₋ₜ. Also, φ₀ should be the identity, and φ_t(φ_s(x)) = φ_{t+s}(x) is a natural condition (remember φ_t(x) = x_t).

We formalize the above in the following definition:

A dynamical system is a C¹ map φ: R × S → S, where S is an open set of Euclidean space and, writing φ(t, x) = φ_t(x), the map φ_t: S → S satisfies
(a) φ₀: S → S is the identity;
(b) The composition φ_t ∘ φ_s = φ_{t+s} for each t, s in R.

Note that the definition implies that the map φ_t: S → S is C¹ for each t and has a C¹ inverse φ₋ₜ (take s = −t in (b)).
An example of a dynamical system is implicitly and approximately defined by the differential equations in the Newton-Kepler chapter. However, we give a precise example as follows.
Let A be an operator on a vector space E; let E = S and let φ: R × S → S be defined by φ(t, x) = e^{tA}x. Thus φ_t: S → S can be represented by φ_t = e^{tA}. Clearly, φ₀ = e⁰ = the identity operator, and since e^{(t+s)A} = e^{tA}e^{sA}, we have defined a dynamical system on E (see Chapter 5).
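The dynamical-system axioms for φ_t = e^{tA} can be confirmed numerically with the matrix exponential; the operator A, the point x, and the times t, s below are arbitrary examples:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])

def phi(t, x):
    # the flow phi_t(x) = e^{tA} x
    return expm(t * A) @ x

x = np.array([1.0, 0.5])
t, s = 0.7, -1.3

assert np.allclose(phi(0.0, x), x)                    # phi_0 is the identity
assert np.allclose(phi(t, phi(s, x)), phi(t + s, x))  # phi_t . phi_s = phi_{t+s}
assert np.allclose(phi(-t, phi(t, x)), x)             # phi_{-t} inverts phi_t
```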
This example of a dynamical system is related to the differential equation dx/dt = Ax on E. A dynamical system on S in general gives rise to a differential equation on S, that is, a vector field on S, f: S → E. Here S is supposed to be an open set in the vector space E. Given φ_t, define f by

(1) f(x) = (d/dt) φ_t(x) |_{t=0};

thus for x in S, f(x) is a vector in E which we think of as the tangent vector to the curve t → φ_t(x) at t = 0. Thus every dynamical system gives rise to a differential equation.
We may rewrite this in more conventional terms. If φ_t: S → S is a dynamical system and x ∈ S, let x(t) = φ_t(x), and let f: S → E be defined as in (1). Then we may rewrite (1) as

(1') x' = f(x).

Thus x(t) or φ_t(x) is the solution curve of (1') satisfying the initial condition x(0) = x. There is a converse process to the above; given a differential equation one has associated to it an object that would be a dynamical system if it were defined for all t. This process is the fundamental theory of differential equations and the rest of this chapter is devoted to it.
The equation (1') we are talking about is called an autonomous equation. This means that the function f does not depend on time. One can also consider a C¹ map f: I × W → E, where I is an interval and W is an open set in the vector space. The equation in that case is

(2) x' = f(t, x)

and is called nonautonomous. The existence and uniqueness theory for (1') will
be developed in this chapter; (2) will be treated in Chapter 15. Our emphasis in
this book is completely on the autonomous case.

§2. The Fundamental Theorem

Throughout the rest of this chapter, E will denote a vector space with a norm; W ⊂ E, an open set in E; and f: W → E a continuous map. By a solution of the differential equation

(1) x' = f(x)

we mean a differentiable function

u: J → W

defined on some interval J ⊂ R such that for all t ∈ J

u'(t) = f(u(t)).

Here J could be an interval of real numbers which is open, closed, or half open, half closed. That is,

(a, b) = { t ∈ R | a < t < b },

or

[a, b] = { t ∈ R | a ≤ t ≤ b },

or

(a, b] = { t ∈ R | a < t ≤ b },

and so on. Also, a or b could be infinite, but intervals like (a, ∞] are not allowed.
Geometrically, u is a curve in E whose tangent vector u'(t) equals f(u(t)); we think of this vector as based at u(t). The map f: W → E is a vector field on W. A solution u may be thought of as the path of a particle that moves in E so that at time t, its tangent vector or velocity is given by the value of the vector field at the position of the particle. Imagine a dust particle in a steady wind, for example, or an electron moving through a constant magnetic field. See also Fig. A, where u(t₀) = x, u'(t₀) = f(x).

FIG. A

FIG. B

An initial condition for a solution u: J → W is a condition of the form u(t₀) = x₀ where t₀ ∈ J, x₀ ∈ W. For simplicity, we usually take t₀ = 0.
A differential equation might have several solutions with a given initial condition. For example, consider the equation in R,

x' = 3x^{2/3}.

Here W = R = E, and f: R → R is given by f(x) = 3x^{2/3}.
The identically zero function u₀: R → R given by u₀(t) = 0 for all t is evidently a solution with initial condition u(0) = 0. But so is the function defined by u(t) = t³. The graphs of some solution curves are shown in Fig. B.
Thus it is clear that to ensure unique solutions, extra conditions must be imposed on the function f. That f be continuously differentiable turns out to be sufficient, as we shall see. Thus the phenomenon of nonuniqueness of solutions with given initial conditions is quite exceptional and rarely arises in practice.
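Both solutions can be checked directly. The sketch below verifies at sample times that u₀(t) = 0 and u(t) = t³ each satisfy u' = 3u^{2/3} (np.cbrt is used so that the 2/3 power is also defined for negative arguments):

```python
import numpy as np

f = lambda x: 3.0 * np.cbrt(x) ** 2     # f(x) = 3 x^(2/3)

ts = np.linspace(-2.0, 2.0, 9)

# u0(t) = 0:  u0'(t) = 0 and f(0) = 0
assert f(0.0) == 0.0

# u(t) = t^3:  u'(t) = 3 t^2 and f(t^3) = 3 (t^3)^(2/3) = 3 t^2
assert np.allclose(3.0 * ts ** 2, f(ts ** 3))
```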
In addition to uniqueness of solutions there is the question of existence. Up to this point, we have been able to compute solutions explicitly. Often, however, this is not possible, and in fact it is not a priori obvious that an arbitrary differential equation has any solutions at all.
We do not give an example of a differential equation without a solution because in fact (1) has a solution for all initial conditions provided f is continuous. We shall not prove this; instead we give an easier proof under hypotheses that also guarantee uniqueness.
The following is the fundamental local theorem of ordinary differential equations. It is called a "local" theorem because it deals with the nature of the vector field f: W → E near some point x₀ of W.

Theorem 1 Let W ⊂ E be an open subset of a normed vector space, f: W → E a C¹ (continuously differentiable) map, and x₀ ∈ W. Then there is some a > 0 and a unique solution

x: (−a, a) → W

of the differential equation

x' = f(x)

satisfying the initial condition

x(0) = x₀.

We will prove Theorem 1 in the next section.

§3. Existence and Uniqueness

A function f: W → E, W an open set of the normed vector space E, is said to be Lipschitz on W if there exists a constant K such that

| f(y) − f(x) | ≤ K | y − x |

for all x, y in W. We call K a Lipschitz constant for f.
We have assumed a norm for E. In a different norm f will still be Lipschitz because of the equivalence of norms (Chapter 5); the constant K may change, however. More generally, we call f locally Lipschitz if each point of W (the domain of f) has a neighborhood W₀ in W such that the restriction f | W₀ is Lipschitz. The Lipschitz constant of f | W₀ may vary with W₀.

Lemma Let the function f: W → E be C¹. Then f is locally Lipschitz.

Before giving the proof we recall the meaning of the derivative Df(x) for x ∈ W. This is a linear operator on E; it assigns to a vector u ∈ E the vector

Df(x)u = lim_{s→0} (1/s)(f(x + su) − f(x)),  s ∈ R,

which will exist if Df(x) is defined.
In coordinates (x₁, ..., x_n) on E, let f(x) = (f₁(x₁, ..., x_n), ..., f_n(x₁, ..., x_n)); then Df(x) is represented by the n × n matrix of partial derivatives

[ ∂f_i/∂x_j (x₁, ..., x_n) ].

Conversely, if all the partial derivatives exist and are continuous, then f is C¹. For each x ∈ W, there is defined the operator norm || Df(x) || of the linear operator Df(x) ∈ L(E) (see Chapter 5). If u ∈ E, then

(1) | Df(x)u | ≤ || Df(x) || | u |.

That f: W → E is C¹ implies that the map W → L(E) which sends x → Df(x) is a continuous map (see, for example, the notes at the end of this chapter).

Proof of the lemma. Suppose that f: W → E is C¹ and x₀ ∈ W. Let b > 0 be so small that the ball B_b(x₀) is contained in W, where

B_b(x₀) = { x ∈ W | | x − x₀ | ≤ b }.

Denote by W₀ this ball B_b(x₀). Let K be an upper bound for || Df(x) || on W₀; this exists because Df is continuous and W₀ is compact. The set W₀ is convex; that is, if y, z ∈ W₀, then the line segment going from y to z is in W₀. (In fact, any compact convex neighborhood of x₀ would work here.) Let y and z be in W₀ and put u = z − y. Then y + su ∈ W₀ for 0 ≤ s ≤ 1. Let φ(s) = f(y + su); thus φ: [0, 1] → E is the composition [0, 1] → W₀ → E, where the first map sends s into y + su. By the chain rule

(2) φ'(s) = Df(y + su)u.

Therefore

f(z) − f(y) = φ(1) − φ(0) = ∫₀¹ φ'(s) ds = ∫₀¹ Df(y + su)u ds.

Hence, by (1),

| f(z) − f(y) | ≤ ∫₀¹ K | u | ds = K | z − y |.

This proves the lemma.

The following remark is implicit in the proof of the lemma:

If W₀ is convex, and if || Df(x) || ≤ K for all x ∈ W₀, then K is a Lipschitz constant for f | W₀.
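The remark gives a practical way to obtain Lipschitz constants: bound || Df || on a convex set. For f(x) = sin x on the convex set R (a one-dimensional illustration of the remark, with arbitrary random test points), | Df(x) | = | cos x | ≤ 1, so K = 1 works:

```python
import numpy as np

rng = np.random.default_rng(1)
xs = rng.uniform(-10.0, 10.0, 1000)
ys = rng.uniform(-10.0, 10.0, 1000)

K = 1.0   # K = sup |cos x| bounds the derivative of sin on all of R
# |sin y - sin x| <= K |y - x| for every sampled pair
assert np.all(np.abs(np.sin(ys) - np.sin(xs)) <= K * np.abs(ys - xs) + 1e-15)
```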
We proceed now to the proof of the existence part of Theorem 1 of Section 2.
Let x₀ ∈ W and let W₀ be as in the proof of the previous lemma. Suppose J is an open interval containing zero and x: J → W satisfies

(3) x'(t) = f(x(t))

and x(0) = x₀. Then by integration we have

(4) x(t) = x₀ + ∫₀ᵗ f(x(s)) ds.

Conversely, if x: J → W satisfies (4), then x(0) = x₀ and x satisfies (3), as is seen by differentiation.
Thus (4) is equivalent to (3) as an equation for x: J → W.
By our choice of W₀, we have a Lipschitz constant K for f on W₀. Furthermore, | f(x) | is bounded on W₀, say, by the constant M. Let a > 0 satisfy
a < min{ b/M, 1/K }, and define J = [−a, a]. Recall that b is the radius of the ball W₀. We shall define a sequence of functions u₀, u₁, ... from J to W₀. We shall prove they converge uniformly to a function satisfying (4), and later that there are no other solutions of (4). The lemma that is used to obtain the convergence of the u_k: J → W₀ is the following:

Lemma from analysis Suppose u_k: J → E, k = 0, 1, 2, ... is a sequence of continuous functions from a closed interval J to a normed vector space E which satisfy: Given ε > 0, there is some N > 0 such that for every p, q > N,

max_{t∈J} | u_p(t) − u_q(t) | < ε.

Then there is a continuous function u: J → E such that

max_{t∈J} | u_k(t) − u(t) | → 0 as k → ∞.

This is called uniform convergence of the functions u_k. This lemma is proved in elementary analysis books and will not be proved here.
The sequence of functions u_k is defined as follows:
Let

u₀(t) = x₀.

Let

u₁(t) = x₀ + ∫₀ᵗ f(u₀(s)) ds.

Assuming that u_k(t) has been defined and that

| u_k(t) − x₀ | ≤ b for all t ∈ J,

let

u_{k+1}(t) = x₀ + ∫₀ᵗ f(u_k(s)) ds.

This makes sense since u_k(s) ∈ W₀ so the integrand is defined. We show that

| u_{k+1}(t) − x₀ | ≤ b, or u_{k+1}(t) ∈ W₀ for t ∈ J;

this will imply that the sequence can be continued to u_{k+2}, u_{k+3}, and so on. We have

| u_{k+1}(t) − x₀ | ≤ ∫₀ᵗ | f(u_k(s)) | ds ≤ Ma ≤ b.
Next, we prove that there is a constant L ≥ 0 such that for all k ≥ 0:

| u_{k+1}(t) − u_k(t) | ≤ (Ka)^k L.

Put L = max{ | u₁(t) − u₀(t) | : | t | ≤ a }. Proceeding by induction,

| u_{k+1}(t) − u_k(t) | ≤ ∫₀ᵗ | f(u_k(s)) − f(u_{k−1}(s)) | ds ≤ K ∫₀ᵗ | u_k(s) − u_{k−1}(s) | ds ≤ (Ka)(Ka)^{k−1}L = (Ka)^k L.

Since Ka < 1, it follows that for p, q > N,

max_{t∈J} | u_p(t) − u_q(t) | ≤ Σ_{k=N}^∞ (Ka)^k L ≤ ε

for any prescribed ε > 0 provided N is large enough.


By the lemma from analysis, this shows that the sequence of functions u₀, u₁, ... converges uniformly to a continuous function x: J → E. From the identity

u_{k+1}(t) = x₀ + ∫₀ᵗ f(u_k(s)) ds,

we find by taking limits of both sides that

x(t) = x₀ + lim_{k→∞} ∫₀ᵗ f(u_k(s)) ds
     = x₀ + ∫₀ᵗ ( lim_{k→∞} f(u_k(s)) ) ds  (by uniform convergence)
     = x₀ + ∫₀ᵗ f(x(s)) ds  (by continuity of f).

Therefore x: J → W₀ satisfies (4) and hence is a solution of (3). In particular, x: J → W₀ is C¹.
This takes care of the existence part of Theorem 1 of Section 2, and we now prove the uniqueness part.
Let x, y: J → W be two solutions of (1) satisfying x(0) = y(0) = x₀, where we may suppose that J is the closed interval [−a, a]. We will show that x(t) = y(t) for all t ∈ J. Let Q = max_{t∈J} | x(t) − y(t) |. This maximum is attained at some point t₁ ∈ J. Then

Q = | x(t₁) − y(t₁) | = | ∫₀^{t₁} (x'(s) − y'(s)) ds | ≤ ∫₀^{t₁} | f(x(s)) − f(y(s)) | ds ≤ K ∫₀^{t₁} | x(s) − y(s) | ds ≤ aKQ.

Since aK < 1, this is impossible unless Q = 0. Thus

x(t) = y(t).
Another proof of uniqueness follows from the lemma of the next section.
We have proved Theorem 1 of Section 2. Note that in the course of the proof the following was shown: Given any ball W₀ ⊂ W of radius b about x₀, with max_{x∈W₀} | f(x) | ≤ M, where f on W₀ has Lipschitz constant K and 0 < a < min{ b/M, 1/K }, then there is a unique solution x: (−a, a) → W of (3) such that x(0) = x₀.
Some remarks are in order.
Consider the situation in Theorem 1 with a C¹ map f: W → E, W open in E. Two solution curves of x' = f(x) cannot cross. This is an immediate consequence of uniqueness but is worth emphasizing geometrically. Suppose φ: J → W, ψ: J₁ → W are two solutions of x' = f(x) such that φ(t₁) = ψ(t₂). Then φ(t₁) is not a crossing, because if we let ψ₁(t) = ψ(t₂ − t₁ + t), then ψ₁ is also a solution. Since ψ₁(t₁) = ψ(t₂) = φ(t₁), it follows that ψ₁ and φ agree near t₁ by the uniqueness statement of
Theorem 1. Thus the situation of Fig. A is prevented. Similarly, a solution curve cannot cross itself as in Fig. B.

FIG. A FIG. B

If, in fact, a solution curve φ: J → W of x' = f(x) satisfies φ(t₁) = φ(t₁ + w) for some t₁ and w > 0, then that solution curve must close up as in Fig. C.

FIG. C

Let us see how the "iteration scheme" used in the proof in this section applies to a very simple differential equation. Consider W = R and f(x) = x, and search for a solution of x' = x in R (we know already that the solution x(t) satisfying x(0) = x₀ is given by x(t) = x₀eᵗ).
Set

u₀(t) = x₀,

u₁(t) = x₀ + ∫₀ᵗ u₀(s) ds = x₀(1 + t),

u₂(t) = x₀ + ∫₀ᵗ u₁(s) ds = x₀(1 + t + t²/2),

and so

u_k(t) = x₀(1 + t + t²/2! + ··· + tᵏ/k!).

As k → ∞, u_k(t) converges to x₀eᵗ, which is, of course, the solution of our original equation.
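The same iteration can be carried out numerically, approximating each integral by the trapezoidal rule on a grid; the iterates for x' = x, x(0) = 1 then converge uniformly on [0, 1] to eᵗ (grid size, interval, and iteration count are arbitrary choices):

```python
import numpy as np

x0 = 1.0
t = np.linspace(0.0, 1.0, 2001)
dt = t[1] - t[0]

u = np.full_like(t, x0)                  # u_0(t) = x0
for _ in range(20):
    # u_{k+1}(t) = x0 + integral_0^t u_k(s) ds  (cumulative trapezoid rule)
    integral = np.concatenate(([0.0], np.cumsum((u[1:] + u[:-1]) / 2.0 * dt)))
    u = x0 + integral

assert np.max(np.abs(u - x0 * np.exp(t))) < 1e-6   # uniformly close to e^t
```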

§4. Continuity of Solutions in Initial Conditions

For Theorem 1 of Section 2 to be at all interesting in any physical sense (or even mathematically) it needs to be complemented by the property that the solution x(t) depends continuously on the initial condition x(0). The next theorem gives a precise statement of this property.

Theorem Let W ⊂ E be open and suppose f: W → E has Lipschitz constant K. Let y(t), z(t) be solutions to

x' = f(x)

on the closed interval [t₀, t₁]. Then, for all t ∈ [t₀, t₁]:

| y(t) − z(t) | ≤ | y(t₀) − z(t₀) | exp(K(t − t₀)).

The proof depends on a useful inequality (Gronwall's) which we prove first.

Lemma Let u: [0, α] → R be continuous and nonnegative. Suppose C ≥ 0, K ≥ 0 are such that

u(t) ≤ C + ∫₀ᵗ K u(s) ds

for all t ∈ [0, α]. Then

u(t) ≤ C e^{Kt}

for all t ∈ [0, α].



Proof. First, suppose C > 0. Let

U(t) = C + ∫₀ᵗ K u(s) ds > 0;

then

u(t) ≤ U(t).

By differentiation of U we find

U'(t) = K u(t);

hence

U'(t)/U(t) = K u(t)/U(t) ≤ K.

Hence

(d/dt) log U(t) ≤ K,

so

log U(t) ≤ log U(0) + Kt

by integration. Since U(0) = C, we have by exponentiation

U(t) ≤ C e^{Kt},

and so

u(t) ≤ C e^{Kt}.

If C = 0, then apply the above argument for a sequence of positive C_i that tend to 0 as i → ∞. This proves the lemma.

We turn to the proof of the theorem.

Define

v(t) = | y(t) − z(t) |.

Since

y(t) − z(t) = y(t₀) − z(t₀) + ∫_{t₀}^{t} (f(y(s)) − f(z(s))) ds,

we have

v(t) ≤ v(t₀) + ∫_{t₀}^{t} K v(s) ds.

Now apply the lemma to the function u(t) = v(t₀ + t) to get

v(t) ≤ v(t₀) exp(K(t − t₀)),

which is just the conclusion of the theorem.
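The theorem can also be observed numerically: integrate two solutions from nearby initial values and compare their separation with the exponential bound. The sketch below uses f(x) = sin 2x, which has Lipschitz constant K = 2 on R, and a standard fourth-order Runge-Kutta integrator (the step size and the initial points are arbitrary choices):

```python
import numpy as np

def f(x):
    return np.sin(2.0 * x)     # |f'(x)| = |2 cos 2x| <= 2, so K = 2

def rk4(x, dt, steps):
    # classical Runge-Kutta integration of x' = f(x)
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + dt / 2 * k1)
        k3 = f(x + dt / 2 * k2)
        k4 = f(x + dt * k3)
        x = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

y0, z0, K = 0.30, 0.31, 2.0
y1 = rk4(y0, 1e-3, 1000)       # y(1)
z1 = rk4(z0, 1e-3, 1000)       # z(1)

# |y(t) - z(t)| <= |y(0) - z(0)| exp(K t) at t = 1
assert abs(y1 - z1) <= abs(y0 - z0) * np.exp(K * 1.0)
```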

§5. On Extending Solutions

Lemma Let a C¹ map f: W → E be given. Suppose two solutions u(t), v(t) of x' = f(x) are defined on the same open interval J containing t₀ and satisfy u(t₀) = v(t₀). Then u(t) = v(t) for all t ∈ J.

Proof. We know from Theorem 1 of Section 2 that u(t) = v(t) in some open interval around t₀. The union of all such open intervals is the largest open interval J* in J around t₀ on which u = v. But J* must equal J. For, if not, J* has an end point t₁ ∈ J; we suppose t₁ is the right-hand end point, the other case being similar. By continuity, u(t₁) = v(t₁). But, by Theorem 1 of Section 2, u = v in some J', an interval around t₁. Then u = v in J* ∪ J', which is larger than J*. This contradiction proves the lemma.

There is no guarantee that a solution x(t) to a differential equation can be defined for all t. For example, the equation in R,

x' = 1 + x²,

has as solutions the functions

x = tan(t − c),  c = constant.

Such a function cannot be extended over an interval larger than

c − π/2 < t < c + π/2

since x(t) → ±∞ as t → c ± π/2.
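The blow-up is easy to see numerically: Euler-integrating x' = 1 + x² from x(0) = 0 (so that the solution is tan t), the computed solution explodes just before t = π/2 (the step size and the blow-up threshold 10⁶ are arbitrary choices):

```python
import numpy as np

x, t, dt = 0.0, 0.0, 1e-5
while x < 1e6 and t < 2.0:     # stop once the solution exceeds 10^6
    x += dt * (1.0 + x * x)    # Euler step for x' = 1 + x^2
    t += dt

assert x >= 1e6                      # the solution blew up ...
assert abs(t - np.pi / 2) < 0.01     # ... at a time close to pi/2
```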
Now consider a general equation (1) x' = f(x), where the C¹ function f is defined on an open set W ⊂ E. For each x₀ ∈ W there is a maximum open interval (α, β) containing 0 on which there is a solution x(t) with x(0) = x₀. There is some such interval by Theorem 1 of Section 2; let (α, β) be the union of all open intervals containing 0 on which there is a solution with x(0) = x₀. (Possibly, α = −∞ or β = +∞, or both.) By the lemma the solutions on any two intervals in the union agree on the intersection of the two intervals. Hence there is a solution on all of (α, β).
Next, we investigate what happens to a solution as the limits of its domain are approached. We state the result only for the right-hand limit; the other case is similar.

Theorem Let W ⊂ E be open, let f: W → E be a C¹ map. Let y(t) be a solution on a maximal open interval J = (α, β) ⊂ R with β < ∞. Then given any compact set K ⊂ W, there is some t ∈ (α, β) with y(t) ∉ K.

This theorem says that if a solution y(t) cannot be extended to a larger interval, then it leaves any compact set. This implies that as t → β either y(t) tends to the boundary of W or | y(t) | tends to ∞ (or both).

Proof of the theorem. Suppose y(t) ∈ K for all t ∈ (α, β). Since f is continuous, there exists M > 0 such that | f(x) | ≤ M if x ∈ K.
Let γ ∈ (α, β). Now we prove that y extends to a continuous map [γ, β] → E. By a lemma from analysis it suffices to prove y: J → E uniformly continuous. For t₀ < t₁ in J we have

| y(t₁) − y(t₀) | = | ∫_{t₀}^{t₁} y'(s) ds | ≤ ∫_{t₀}^{t₁} | f(y(s)) | ds ≤ M(t₁ − t₀).

Now the extended curve y: [γ, β] → E is differentiable at β. For

y(β) = y(γ) + ∫_γ^β f(y(s)) ds;

hence

y(t) = y(γ) + ∫_γ^t f(y(s)) ds

for all t between γ and β. Hence y is differentiable at β, and in fact y'(β) = f(y(β)). Therefore y is a solution on [γ, β]. Since there is a solution on an interval [β, δ), δ > β, we can extend y to the interval (α, δ). Hence (α, β) could not be a maximal domain of a solution. This completes the proof of the theorem.

The following important fact follows immediately from the theorem.

Corollary Let A be a compact subset of the open set W ⊂ E and let f: W → E be C1. Suppose every solution curve y: [0, β] → W with y(0) = y0 lies entirely in A. Then there is a solution y: [0, ∞) → W with y(0) = y0 and y(t) ∈ A for all t ≥ 0.

Proof. Let [0, β) be the maximal half-open interval on which there is a solution y as above. Then y([0, β)) ⊂ A, and so β cannot be finite by the theorem.

§6. Global Solutions

We give here a stronger theorem on the continuity of solutions in terms of initial conditions.

In the theorem of Section 4 we assumed that both solutions were defined on the same interval. In the next theorem it is not necessary to assume this. The theorem shows that solutions starting at nearby points will be defined on the same closed interval and remain near to each other on this interval.

Theorem Let f(x) be C1. Let y(t) be a solution to x' = f(x) defined on the closed interval [t0, t1], with y(t0) = y0. There is a neighborhood U ⊂ E of y0 and a constant K such that if z0 ∈ U, then there is a unique solution z(t) also defined on [t0, t1] with z(t0) = z0; and z satisfies

|y(t) − z(t)| ≤ K |y0 − z0| exp(K(t − t0))

for all t ∈ [t0, t1].

For the proof we will use the following lemma.

Lemma If f: W → E is locally Lipschitz and A ⊂ W is a compact (closed and bounded) set, then f | A is Lipschitz.

Proof. Suppose not. Then for every K > 0, no matter how large, we can find x and y in A with

|f(x) − f(y)| > K |x − y|.

In particular, we can find xn, yn such that

(1) |f(xn) − f(yn)| ≥ n |xn − yn|   for n = 1, 2, ....

Since A is compact, we can choose convergent subsequences of the xn and yn. Relabeling, we may assume xn → x* and yn → y* with x* and y* in A. We observe that x* = y*, since we have, for all n,

|x* − y*| = lim_{n→∞} |xn − yn| ≤ lim_{n→∞} n⁻¹ |f(xn) − f(yn)| ≤ lim_{n→∞} n⁻¹ · 2M = 0,

where M is the maximum value of |f| on A. There is a neighborhood W0 of x* for which f | W0 has a Lipschitz constant K. There is an n0 such that xn and yn belong to W0 if n ≥ n0. Therefore, for n ≥ n0:

|f(xn) − f(yn)| ≤ K |xn − yn|,

which contradicts (1) for n > K. This proves the lemma.
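As an illustration of the lemma (our example, not the text's): f(x) = x² is C1, hence locally Lipschitz, and on the compact set A = [−1, 1] the difference quotients |f(x) − f(y)|/|x − y| = |x + y| are bounded by 2, so f | A is Lipschitz with constant 2. A numerical check:

```python
# Sketch: f(x) = x^2 on A = [-1, 1].  Since |x^2 - y^2| = |x + y||x - y|
# and |x + y| <= 2 on A, a Lipschitz constant for f | A is 2.
def f(x):
    return x * x

pts = [i / 100.0 for i in range(-100, 101)]   # a grid on A = [-1, 1]
ratios = [abs(f(x) - f(y)) / abs(x - y)
          for x in pts for y in pts if x != y]
K = max(ratios)
assert K <= 2.0 + 1e-12
print(K)   # close to 2
```

Note that no single constant works on all of R, since |x + y| is unbounded; compactness is what makes the lemma true.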

The proof of the theorem now goes as follows.

By compactness of [t0, t1], there exists ε > 0 such that x ∈ W if |x − y(t)| ≤ ε for some t ∈ [t0, t1]. The set of all such points is a compact subset A of W. The C1 map f is locally Lipschitz (Section 3). By the lemma, it follows that f | A has a Lipschitz constant k.

Let δ > 0 be so small that δ ≤ ε and δ exp(k |t1 − t0|) ≤ ε. We assert that if |z0 − y0| < δ, then there is a unique solution through z0 defined on all of [t0, t1]. First of all, z0 ∈ W since |z0 − y(t0)| < ε, so there is a solution z(t) through z0 on a maximal interval [t0, β). We prove β > t1. For suppose β ≤ t1. Then by the exponential estimate in Section 4, for all t ∈ [t0, β), we have

|z(t) − y(t)| ≤ |z0 − y0| exp(k |t − t0|)
            ≤ δ exp(k |t1 − t0|)
            ≤ ε.

Thus z(t) lies in the compact set A; by the theorem of Section 5, [t0, β) could not be a maximal solution domain. Therefore z(t) is defined on [t0, t1]. The exponential estimate follows from Section 4, and the uniqueness from the lemma of Section 5.

We interpret the theorem in another way. Given f(x) as in the theorem and a solution y(t) defined on [t0, t1], we see that for all z0 sufficiently close to y0 = y(t0), there is a unique solution on [t0, t1] starting at z0 at time t0. Let us denote this solution by t → u(t, z0); thus u(t0, z0) = z0, and u(t, y0) = y(t).

Then the theorem implies:

lim_{z0 → y0} u(t, z0) = u(t, y0),

uniformly on [t0, t1]. In other words, the solution through z0 depends continuously on z0.
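The exponential estimate can be checked numerically. In this sketch (the equation x' = sin x and all constants are our choices, not the text's) the right-hand side has Lipschitz constant k = 1, so two solutions with |y0 − z0| = 0.01 should stay within 0.01·e^t of each other:

```python
import math

# Sketch: x' = sin x has |d/dx sin x| <= 1, so k = 1 is a Lipschitz
# constant; two solutions should satisfy |y(t) - z(t)| <= |y0 - z0| e^t.
def f(x):
    return math.sin(x)

def rk4(x, h):
    # one classical fourth-order Runge-Kutta step
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

y, z, h, t = 1.0, 1.01, 0.001, 0.0
while t < 2.0:
    y, z = rk4(y, h), rk4(z, h)
    t += h
    # Gronwall-type bound from the theorem, with k = 1:
    assert abs(y - z) <= 0.01 * math.exp(t) + 1e-9
```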

§7. The Flow of a Differential Equation

In this section we consider an equation

(1) x' = f(x)

defined by a C1 function f: W → E, W ⊂ E open.

For each y ∈ W there is a unique solution φ(t) with φ(0) = y defined on a maximal open interval J(y) ⊂ R. To indicate the dependence of φ(t) on y, we write

φ(t) = φ(t, y);   thus   φ(0, y) = y.

Let Ω ⊂ R × W be the following set:

Ω = {(t, y) ∈ R × W | t ∈ J(y)}.

The map (t, y) → φ(t, y) is then a function

φ: Ω → W.

We call φ the flow of equation (1). We shall often write

φ(t, x) = φ_t(x).

Example. Let f(x) = Ax, A ∈ L(E). Then φ_t(x) = e^{tA} x.
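This example can be tested numerically. The sketch below (with the illustrative choice A = [[0, −1], [1, 0]], for which e^{tA} is rotation by angle t; the choice is ours, not the text's) sums the series e^{tA} = Σ (tA)ⁿ/n! and compares it with the rotation matrix:

```python
import math

# Sketch: for A = [[0, -1], [1, 0]], e^{tA} is rotation by angle t.
def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(A, t, terms=30):
    # partial sums of e^{tA} = sum over n of (tA)^n / n!
    result = [[1.0, 0.0], [0.0, 1.0]]   # identity = n = 0 term
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        tA_over_n = [[t * a / n for a in row] for row in A]
        term = mat_mul(term, tA_over_n)        # now (tA)^n / n!
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result

A = [[0.0, -1.0], [1.0, 0.0]]
t = 0.7
E = mat_exp(A, t)
R = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
assert all(abs(E[i][j] - R[i][j]) < 1e-9 for i in range(2) for j in range(2))
```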

Theorem 1 The map φ has the following property:

(2) φ_{s+t}(x) = φ_s(φ_t(x))

in the sense that if one side of (2) is defined, so is the other, and they are equal.

Proof. First, suppose s and t are positive and φ_s(φ_t(x)) is defined. This means t ∈ J(x) and s ∈ J(φ_t(x)). Suppose J(x) = (α, β). Then α < t < β; we shall show β > s + t. Define

y: (α, s + t] → W

by

y(r) = φ(r, x)              if α < r ≤ t;
y(r) = φ(r − t, φ_t(x))     if t ≤ r ≤ t + s.

Then y is a solution and y(0) = x. Hence s + t ∈ J(x). Moreover,

φ_{s+t}(x) = y(s + t) = φ_s(φ_t(x)).

The rest of the proof of Theorem 1 uses the same ideas and is left to the reader.
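Property (2) is easy to check numerically for a linear flow. In this sketch (the rotation field x' = (−y, x) is our illustrative choice) φ_t is rotation by angle t, and rotating by t and then by s agrees with rotating by s + t:

```python
import math

# Sketch: for x' = -y, y' = x the flow phi_t is rotation by angle t,
# and property (2) says phi_s(phi_t(x)) = phi_{s+t}(x).
def phi(t, p):
    c, s = math.cos(t), math.sin(t)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

p = (1.0, 2.0)
s, t = 0.4, 1.1
a = phi(s, phi(t, p))    # flow for time t, then time s
b = phi(s + t, p)        # flow for time s + t
assert max(abs(a[i] - b[i]) for i in range(2)) < 1e-12
```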

Theorem 2 Ω is an open set in R × W and φ: Ω → W is a continuous map.

Proof. To prove Ω open, let (t0, x0) ∈ Ω. We suppose t0 ≥ 0, the other case being similar. Then the solution curve t → φ(t, x0) is defined on [0, t0], and hence on an interval [−ε, t0 + ε], ε > 0. By the theorem of Section 6, there is a neighborhood U ⊂ W of x0 such that the solution t → φ(t, x) is defined on [−ε, t0 + ε] for all x in U. Thus (−ε, t0 + ε) × U ⊂ Ω, which proves Ω open.

To prove φ: Ω → W continuous at (t0, x0), let U and ε be as above. We may suppose that U has compact closure Ū ⊂ W. Since f is locally Lipschitz and the set A = φ([−ε, t0 + ε] × Ū) is compact, there is a Lipschitz constant K for f | A. Let M = max{|f(x)| : x ∈ A}. Let δ > 0 satisfy δ ≤ ε, and be such that if |x1 − x0| < δ, then x1 ∈ U. Suppose

|t1 − t0| < δ,   |x1 − x0| < δ.

Then

|φ(t1, x1) − φ(t0, x0)| ≤ |φ(t1, x1) − φ(t1, x0)| + |φ(t1, x0) − φ(t0, x0)|.

The second term on the right goes to 0 with δ because the solution through x0 is continuous (even differentiable) in t. The first term on the right, by the estimate in Section 6, is bounded by δ exp(K(t0 + ε)), which also goes to 0 with δ. This proves Theorem 2.

In Chapter 16 we shall show that in fact φ is C1.


Now suppose (t, x0) ∈ Ω; then x0 has a neighborhood U ⊂ W with {t} × U ⊂ Ω, since we know Ω is open in R × W. The function x → φ_t(x) defines a map

φ_t: U → W.

Theorem 3 The map φ_t sends U onto an open set V, and φ_{−t} is defined on V and sends V onto U. The composition φ_{−t} φ_t is the identity map of U; the composition φ_t φ_{−t} is the identity map of V.

Proof. If y = φ_t(x), then t ∈ J(x). It is easy to see that then −t ∈ J(y), for the function

s → φ(s + t, x)

is a solution on [−t, 0] sending 0 to y. Thus φ_{−t} is defined on φ_t(U) = V; the statement about compositions is obvious. It remains to prove V is open. Let V* ⊃ V be the maximal subset of W on which φ_{−t} is defined. V* is open because Ω is open, and φ_{−t}: V* → W is continuous because φ is continuous. Therefore the inverse image of the open set U under φ_{−t} is open. But this inverse image is exactly V.

We summarize the results of this section:

Corresponding to the autonomous equation x' = f(x), with locally Lipschitz f: W → E, there is a map φ: Ω → W, where (t, x) ∈ Ω if and only if there is a solution on [0, t] (or [t, 0] if t < 0) sending 0 to x. The set Ω is open. φ is defined by letting t → φ_t(x) = φ(t, x) be the maximal solution curve taking 0 to x. There is an open set U_t ⊂ W on which the map φ_t: U_t → W is defined. The maps satisfy φ_s φ_t(x) = φ_{s+t}(x) as in Theorem 1. Each map φ_t is a homeomorphism; that is, φ_t is one-to-one and has a continuous inverse; the inverse is φ_{−t}.

If

f(x) = Ax,   A ∈ L(E),

then

φ_t(x) = e^{tA} x.

In this case Ω = R × E and each φ_t is defined on all of E.

PROBLEMS

1. Write out the first few terms of the Picard iteration scheme (Section 3) for each of the following initial value problems. Where possible, use any method to find explicit solutions. Discuss the domain of the solution.
(a) x' = x + 2; x(0) = 2.
(b) x' = x^{4/3}; x(0) = 0.
(c) x' = x^{4/3}; x(0) = 1.
(d) x' = sin x; x(0) = 0.
(e) x' = 1/(2x); x(1) = 1.
2. Let A be an n × n matrix. Show that the Picard method for solving x' = Ax, x(0) = u gives the solution e^{tA} u.
3. Derive the Taylor series for sin t by applying the Picard method to the first order system corresponding to the second order initial value problem

x'' = −x;   x(0) = 0,   x'(0) = 1.

4. For each of the following functions, find a Lipschitz constant on the region indicated, or prove there is none:
(a) f(x) = |x|, −∞ < x < ∞.
(b) f(x) = x^{1/3}, −1 ≤ x ≤ 1.
(c) f(x) = 1/x, 1 ≤ x < ∞.
(d) f(x, y) = (x + 2y, −y), (x, y) ∈ R².

5. Consider the differential equation

x' = 3x^{2/3}.

(a) There are infinitely many solutions satisfying x(0) = 0 on every interval [0, β].
(b) For what values of α are there infinitely many solutions on [0, α] satisfying x(0) = −1?
6. Let f: E → E be continuous; suppose |f(x)| ≤ M. For each n = 1, 2, ..., let xn: [0, 1] → E be a solution to x' = f(x). If xn(0) converges, show that a subsequence of {xn} converges uniformly to a solution. (Hint: Look up Ascoli's theorem in a book on analysis.)
7. Use Problem 6 to show that continuity of solutions in initial conditions follows
from uniqueness and existence of solutions.

8. Prove the following general fact (see also Section 4): if C ≥ 0 and u, v: [0, β] → R are continuous and nonnegative, and

u(t) ≤ C + ∫_0^t u(s) v(s) ds   for all t ∈ [0, β],

then

u(t) ≤ C e^{V(t)},   V(t) = ∫_0^t v(s) ds.

9. Define f: R → R by

f(x) = 1 if x ≤ 1;   f(x) = 2 if x > 1.

There is no solution to x' = f(x), x(0) = 0, on any open interval around t = 1.
10. Let g: R → R be Lipschitz and f: R → R continuous. Show that the system

x' = g(x),
y' = f(x) y,

has at most one solution on any interval, for a given initial value. (Hint: Use Gronwall's inequality.)
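Problems 1–3 involve the Picard scheme of Section 3: u_{k+1}(t) = x0 + ∫_0^t f(u_k(s)) ds. The following sketch (our own discretization, using the illustrative problem f(x) = x, x(0) = 1, whose Picard iterates are the partial sums of e^t) carries out the iteration on a grid with the trapezoid rule:

```python
import math

# Picard iteration u_{k+1}(t) = x0 + integral_0^t f(u_k(s)) ds,
# discretized on a uniform grid over [0, 1] with the trapezoid rule.
# For f(x) = x and x0 = 1 the iterates converge to e^t.
N = 1000
h = 1.0 / N

def picard_step(u, f):
    vals = [f(x) for x in u]
    out = [1.0]           # x0 = 1 at t = 0
    acc = 0.0
    for i in range(1, len(u)):
        acc += 0.5 * h * (vals[i - 1] + vals[i])   # running integral
        out.append(1.0 + acc)
    return out

u = [1.0] * (N + 1)       # u_0(t) = 1, the constant initial guess
for _ in range(20):
    u = picard_step(u, lambda x: x)

assert abs(u[-1] - math.e) < 1e-4   # u(1) is close to e
```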

Notes

Our treatment of calculus tends to be from the modern point of view. The derivative is viewed as a linear transformation.

Suppose that U is an open set of a vector space E and that g: U → F is some map, F a second vector space. What is the derivative of g at x0 ∈ U? We say that this derivative exists and is denoted by Dg(x0) ∈ L(E, F) if

lim_{|u|→0, u ∈ E, u ≠ 0} |g(x0 + u) − g(x0) − Dg(x0)u| / |u| = 0.

Then, if, for each x ∈ U, the derivative Dg(x) exists, this derivative defines a map

U → L(E, F),   x → Dg(x).

If this map is continuous, then g is said to be C1. If this map is C1 itself, then g is said to be C2.
Now suppose F, G, H are three vector spaces and U, V are open sets of F, G, respectively. Consider C1 maps f: U → V and g: V → H. The chain rule of calculus can be stated as: the derivative of the composition is the composition of the derivatives. In other words, if x ∈ U, then

D(g ∘ f)(x) = Dg(f(x)) ∘ Df(x) ∈ L(F, H).

Consider the case where F = R and U is an interval; writing t ∈ U, f'(t) = Df(t), the chain rule reads

(g ∘ f)'(t) = Dg(f(t))(f'(t)).

In case H also equals R, the formula becomes

(g ∘ f)'(t) = ⟨grad g(f(t)), f'(t)⟩.

For more details on this and a further development of calculus along these lines, see S. Lang's Second Course in Calculus [12]. S. Lang's Analysis I [11] also covers these questions as well as the lemma from analysis used in Section 3 and the uniform continuity statement used in the proof of the theorem of Section 5.
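The chain rule in the form (g ∘ f)'(t) = ⟨grad g(f(t)), f'(t)⟩ can be checked by finite differences; the particular maps f and g below are arbitrary illustrative choices of ours:

```python
import math

# Sketch: f: R -> R^2, g: R^2 -> R; check numerically that
# (g o f)'(t) = <grad g(f(t)), f'(t)>.
def f(t):
    return (math.cos(t), t * t)

def g(p):
    return p[0] * p[1] + p[1]      # g(x, y) = xy + y

def comp(t):
    return g(f(t))

t0, eps = 0.3, 1e-6
# numeric derivative of the composition by central differences
num = (comp(t0 + eps) - comp(t0 - eps)) / (2 * eps)
# chain rule: grad g at f(t0), dotted with f'(t0)
p = f(t0)
grad = (p[1], p[0] + 1.0)          # (dg/dx, dg/dy) = (y, x + 1)
fprime = (-math.sin(t0), 2 * t0)
chain = grad[0] * fprime[0] + grad[1] * fprime[1]
assert abs(num - chain) < 1e-6
```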
Chapter 9
Stability of Equilibria

In this chapter we introduce the important idea of stability of an equilibrium point of a dynamical system. In later chapters other kinds of stability will be discussed, such as stability of periodic solutions and structural stability.

An equilibrium x̄ is stable if all nearby solutions stay nearby. It is asymptotically stable if all nearby solutions not only stay nearby, but also tend to x̄. Of course, precise definitions are required; these are given in Section 2. In Section 1 a special kind of asymptotically stable equilibrium is studied first: the sink. This is characterized by exponential approach to x̄ of all nearby solutions. In Chapter 7 the special case of linear sinks was considered. Sinks are useful because they can be detected by the eigenvalues of the linear part of the system (that is, the derivative of the vector field at x̄).

In Section 3 the famous stability theorems of Liapunov are proved. This section also contains a rather refined theorem (Theorem 2) which is not essential for the rest of the book, except in Chapter 10.

Sections 4 and 5 treat the important special case of gradient flows. These have special properties that make their analysis fairly simple; moreover, they are of frequent occurrence.

§1. Nonlinear Sinks

Consider a differential equation

(1) x' = f(x);   f: W → Rⁿ;   W ⊂ Rⁿ open.

We suppose f is C1. A point x̄ ∈ W is called an equilibrium point of (1) if f(x̄) = 0. Clearly, the constant function x(t) ≡ x̄ is a solution of (1). By uniqueness of solutions, no other solution curve can pass through x̄. If W is the state space of some physical (or biological, economic, or the like) system described by (1), then x̄ is an "equilibrium state": if the system is at x̄ it always will be (and always was) at x̄.

Let φ: Ω → W be the flow associated with (1); Ω ⊂ R × W is an open set, and for each x ∈ W the map t → φ(t, x) = φ_t(x) is the solution passing through x when t = 0; it is defined for t in some open interval. If x̄ is an equilibrium, then φ_t(x̄) = x̄ for all t ∈ R. For this reason, x̄ is also called a stationary point, or fixed point, of the flow. Another name for x̄ is a zero or singular point of the vector field f.

Suppose f is linear: W = Rⁿ and f(x) = Ax where A is a linear operator on Rⁿ. Then the origin 0 ∈ Rⁿ is an equilibrium of (1). In Chapter 7 we saw that when λ < 0 is greater than the real parts of the eigenvalues of A, then solutions φ_t(x) approach 0 exponentially:

|φ_t(x)| ≤ C e^{λt}

for some C > 0.

Now suppose f is a C1 vector field (not necessarily linear) with equilibrium point 0 ∈ Rⁿ. We think of the derivative Df(0) = A of f at 0 as a linear vector field which approximates f near 0. We call it the linear part of f at 0. If all eigenvalues of Df(0) have negative real parts, we call 0 a sink. More generally, an equilibrium x̄ of (1) is a sink if all eigenvalues of Df(x̄) have negative real parts.

The following theorem says that a nonlinear sink x̄ behaves locally like a linear sink: nearby solutions approach x̄ exponentially.

Theorem Let x̄ ∈ W be a sink of equation (1). Suppose every eigenvalue of Df(x̄) has real part less than −c, c > 0. Then there is a neighborhood U ⊂ W of x̄ such that

(a) φ_t(x) is defined and in U for all x ∈ U, t > 0.
(b) There is a Euclidean norm on Rⁿ such that

|φ_t(x) − x̄| ≤ e^{−tc} |x − x̄|

for all x ∈ U, t ≥ 0.

(c) For any norm on Rⁿ, there is a constant B > 0 such that

|φ_t(x) − x̄| ≤ B e^{−tc} |x − x̄|

for all x ∈ U, t ≥ 0.

In particular, φ_t(x) → x̄ as t → ∞ for all x ∈ U.

Proof. For convenience we assume x̄ = 0. (If not, give Rⁿ new coordinates y = x − x̄; in y-coordinates f has an equilibrium at 0; etc.)

Put A = Df(0). Choose b > 0 so that the real parts of eigenvalues of A are less than −b < −c. The lemma in Chapter 7, Section 1 shows that Rⁿ has a basis whose corresponding norm and inner product satisfy

⟨Ax, x⟩ ≤ −b |x|²

for all x ∈ Rⁿ.

Since A = Df(0) and f(0) = 0, by the definition of derivative,

lim_{x→0} |f(x) − Ax| / |x| = 0.

Therefore, by Cauchy's inequality,

lim_{x→0} ⟨f(x) − Ax, x⟩ / |x|² = 0.

It follows that there exists δ > 0 so small that if |x| ≤ δ, then x ∈ W and

⟨f(x), x⟩ ≤ −c |x|².

Put U = {x ∈ Rⁿ | |x| ≤ δ}. Let x(t), 0 ≤ t ≤ t0, be a solution curve in U, x(t) ≠ 0. Then

d/dt |x| = (1/|x|) ⟨x', x⟩.

Hence, since x' = f(x):

(2) d/dt |x(t)| ≤ −c |x(t)|.

This shows, first, that |x(t)| is decreasing; hence x(t) ∈ U for all t ∈ [0, t0]. Since U is compact, it follows from Section 5, Chapter 8 that the trajectory x(t) is defined and in U for all t ≥ 0. Secondly, (2) implies that

|x(t)| ≤ e^{−tc} |x(0)|

for all t ≥ 0. Thus (a) and (b) are proved, and (c) follows from the equivalence of norms.

The phase portrait at a nonlinear sink x̄ looks like that of the linear part of the vector field: in a suitable norm the trajectories point inside all sufficiently small spheres about x̄ (Fig. A).

FIG. A. Nonlinear sink.


$1. NONLINEAR SINKS 183

Remember that the spheres are not necessarily “round” spheres; they are spheres
in a special norm. In standard coordinates they may be ellipsoids.
A simple physical example of a nonlinear sink is given by a pendulum moving in a vertical plane (Fig. B). We assume a constant downward gravitational force equal to the mass m of the bob; we neglect the mass of the rod supporting the bob. We assume there is a frictional (or viscous) force resisting the motion, proportional to the speed of the bob.

Let l be the (constant) length of the rod. The bob of the pendulum moves along a circle of radius l. If θ(t) is the counterclockwise angle from the vertical to the rod at time t, then the angular velocity of the bob is dθ/dt and the velocity is l dθ/dt. Therefore the frictional force is −kl dθ/dt, k a nonnegative constant; this force is tangent to the circle.

The downward gravitational force m has component −m sin θ(t) tangent to the circle; this is the force on the bob that produces motion. Therefore the total force tangent to the circle at time t is

F = −kl dθ/dt − m sin θ.

The acceleration of the bob tangent to the circle is

a = l d²θ/dt²;

hence, from Newton's law a = F/m, we have

l d²θ/dt² = −(kl/m) dθ/dt − sin θ,

or

θ'' = −(k/m) θ' − (1/l) sin θ.

Introducing a new variable

ω = θ'

FIG. B. Pendulum.

(interpreted as angular velocity), we obtain the equivalent first order system

(3) θ' = ω,
    ω' = −(1/l) sin θ − (k/m) ω.

This nonlinear, autonomous equation in R² has equilibria at the points

(θ, ω) = (nπ, 0);   n = 0, ±1, ±2, ....

We concentrate on the equilibrium (0, 0).

The vector field defining (3) is

f(θ, ω) = (ω, −(1/l) sin θ − (k/m) ω).

Its derivative at (θ, ω) is

Df(θ, ω) = [ 0             1
             −(1/l) cos θ  −k/m ].

Hence

Df(0, 0) = [ 0     1
             −1/l  −k/m ],

whose eigenvalues are ½(−k/m ± ((k/m)² − 4/l)^{1/2}). The real part −k/2m is negative as long as the coefficient of friction k is positive and the mass is positive. Therefore the equilibrium θ = ω = 0 is a sink. We conclude: for all sufficiently small initial angles and velocities, the pendulum tends toward the equilibrium position (0, 0).

This, of course, is not surprising. In fact, from experience it seems obvious that from any initial position and velocity the pendulum will tend toward the downward equilibrium state, except for a few starting states which tend toward the vertically balanced position. To verify this physical conclusion mathematically takes more work, however. We return to this question in Section 3.
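The conclusion can be checked numerically. In the sketch below we take the illustrative constants l = 1 and k/m = 0.5 (our choices, not the text's): the discriminant (k/m)² − 4/l is negative, the eigenvalues have real part −k/2m < 0, and a simulated trajectory spirals into (0, 0):

```python
import math

# Sketch of system (3) with illustrative constants l = 1, k/m = 0.5.
l, km = 1.0, 0.5

def field(theta, omega):
    return omega, -math.sin(theta) / l - km * omega

# Eigenvalues of Df(0,0) are roots of s^2 + (k/m)s + 1/l = 0;
# here the discriminant is negative, so the real part is -k/2m < 0.
disc = km * km - 4.0 / l
assert disc < 0
assert -km / 2.0 < 0

# Integrate from a small initial displacement with Euler steps;
# the state should spiral in toward the equilibrium (0, 0).
theta, omega, h = 0.5, 0.0, 0.001
for _ in range(40000):            # up to t = 40
    d_theta, d_omega = field(theta, omega)
    theta += h * d_theta
    omega += h * d_omega
assert math.hypot(theta, omega) < 1e-3
```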
Before leaving the pendulum we point out a paradox: the pendulum cannot come to rest. That is, once it is in motion (not in equilibrium) it cannot reach an equilibrium state, but only approach one arbitrarily closely. This follows from uniqueness of solutions of differential equations! Of course, one knows that pendulums actually do come to rest. One can argue that the pendulum is not "really" at rest, but its motion is too small to observe. A better explanation is that the mathematical model (3) of its motion is only an approximation to reality.

PROBLEMS

1. (a) State and prove a converse to the theorem of Section 1.
(b) Define "sources" for nonlinear vector fields and prove an interesting theorem about them.

2. Show by example that if f is a nonlinear C1 vector field and f(0) = 0, it is possible that lim_{t→∞} x(t) = 0 for all solutions to x' = f(x), without the eigenvalues of Df(0) having negative real parts.

3. Assume f is a C1 vector field on Rⁿ and f(0) = 0. Suppose some eigenvalue of Df(0) has positive real part. Show that in every neighborhood of 0 there is a solution x(t) for which |x(t)| is increasing on some interval [0, t0], t0 > 0.

4. If x̄ is a sink of a dynamical system, it has a neighborhood containing no other equilibrium.

§2. Stability

The study of equilibria plays a central role in ordinary differential equations and their applications. An equilibrium point, however, must satisfy a certain stability criterion in order to be very significant physically. (Here, as in several other places in this book, we use the word physical in a broad sense; thus, in some contexts, physical could be replaced by biological, chemical, or even ecological.)

The notion of stability most often considered is that usually attributed to Liapunov. An equilibrium is stable if nearby solutions stay nearby for all future time. Since in applications of dynamical systems one cannot pinpoint a state exactly, but only approximately, an equilibrium must be stable to be physically meaningful.

The mathematical definition is:

Definition 1 Suppose x̄ ∈ W is an equilibrium of the differential equation

(1) x' = f(x),

where f: W → E is a C1 map from an open set W of the vector space E into E. Then x̄ is a stable equilibrium if for every neighborhood U of x̄ in W there is a neighborhood U1 of x̄ in U such that every solution x(t) with x(0) in U1 is defined and in U for all t > 0. (See Fig. A.)

FIG. A. Stability.

Definition 2 If U1 can be chosen so that in addition to the properties described in Definition 1, lim_{t→∞} x(t) = x̄, then x̄ is asymptotically stable. (See Fig. B.)

FIG. B. Asymptotic stability.

Definition 3 An equilibrium x̄ that is not stable is called unstable. This means there is a neighborhood U of x̄ such that for every neighborhood U1 of x̄ in U, there is at least one solution x(t) starting at x(0) ∈ U1 which does not lie entirely in U. (See Fig. C.)

FIG. C. Instability.

A sink is asymptotically stable and therefore stable. An example of an equilibrium that is stable but not asymptotically stable is the origin in R² for a linear equation

(2) x' = Ax,

where A has pure imaginary eigenvalues. The orbits are all ellipses (Fig. D).

FIG. D. Stable, but not asymptotically stable.
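This can be seen concretely for A = [[0, 1], [−1, 0]] (eigenvalues ±i; our illustrative choice): the solutions of x' = y, y' = −x are rotations, so x(t)² + y(t)² is constant; nearby solutions stay nearby forever but never tend to the origin. A numerical check:

```python
import math

# Sketch: x' = y, y' = -x (A has eigenvalues +-i).  The explicit
# solution rotates the initial point, so |solution|^2 is constant:
# the origin is stable but not asymptotically stable.
def solution(t, x0, y0):
    c, s = math.cos(t), math.sin(t)
    return (x0 * c + y0 * s, -x0 * s + y0 * c)

x0, y0 = 0.3, 0.4
for t in [0.0, 1.0, 5.0, 50.0]:
    x, y = solution(t, x0, y0)
    # orbits lie on the circle x^2 + y^2 = x0^2 + y0^2
    assert abs(x * x + y * y - (x0 * x0 + y0 * y0)) < 1e-12
    # check it really solves x' = y, by a central difference in t
    eps = 1e-6
    xp = (solution(t + eps, x0, y0)[0] - solution(t - eps, x0, y0)[0]) / (2 * eps)
    assert abs(xp - y) < 1e-5
```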

The importance of this example in applications is limited (despite the famed harmonic oscillator) because the slightest nonlinear perturbation will destroy its character. Even a small linear perturbation can make it into a sink or a source, since "hyperbolicity" is a generic property for linear flows (see Chapter 7). A source is an example of an unstable equilibrium.

To complement the main theorem of Section 1 we have the following instability theorem. The proof is not essential to the rest of the book.

Theorem Let W ⊂ E be open and f: W → E continuously differentiable. Suppose f(x̄) = 0 and x̄ is a stable equilibrium point of the equation

x' = f(x).

Then no eigenvalue of Df(x̄) has positive real part.

We say that an equilibrium x̄ is hyperbolic if the derivative Df(x̄) has no eigenvalue with real part zero.

Corollary A hyperbolic equilibrium point is either unstable or asymptotically stable.

Proof of the theorem. Suppose some eigenvalue has positive real part; we shall prove x̄ is not stable. We may assume x̄ = 0, replacing f(x) by f(x − x̄) otherwise. By the canonical form theorem (Chapter 2), E has a splitting E1 ⊕ E2 invariant under Df(0), such that the eigenvalues of A = Df(0) | E1 all have positive real part, while those of B = Df(0) | E2 all have negative or 0 real part.

Let a > 0 be such that every eigenvalue of A has real part > a. Then there is a Euclidean norm on E1 such that

(3) ⟨Ax, x⟩ ≥ a |x|²,   all x ∈ E1.

Similarly, for any b > 0 there exists a Euclidean norm on E2 such that

(4) ⟨By, y⟩ ≤ b |y|²,   all y ∈ E2.

We choose b so that

0 < b < a.

We take the inner product on E = E1 ⊕ E2 to be the direct sum of these inner products on E1 and E2; we also use the norms associated to these inner products on E1, E2, E. If z = (x, y) ∈ E1 ⊕ E2, then |z| = (|x|² + |y|²)^{1/2}.

We shall use the Taylor expansion of f around 0:

f(x, y) = (Ax + R(x, y), By + S(x, y)) = (f1(x, y), f2(x, y)),

where Q(z) = (R(x, y), S(x, y)) satisfies Q(0) = 0 and DQ(0) = 0. Thus, given any ε > 0, there exists δ > 0 such that if U = B_δ(0) (the ball of radius δ about 0), then

(5) |Q(z)| ≤ ε |z|   for z ∈ U.

We define the cone

C = {(x, y) ∈ E1 ⊕ E2 | |x| ≥ |y|}.

Lemma If δ > 0 is sufficiently small, then for every z = (x, y) in C ∩ U with z ≠ 0:
(a) ⟨f1(z), x⟩ − ⟨f2(z), y⟩ > 0 if |x| = |y|;
(b) ⟨f(z), z⟩ ≥ α |z|² for some constant α > 0.

FIG. E. The cone C is shaded.

This lemma yields our instability theorem as follows. We interpret first condition (a). Let g: E1 × E2 → R be defined by g(x, y) = ½(|x|² − |y|²). Then g is C1, g⁻¹[0, ∞) = C, and g⁻¹(0) is the boundary of C.

Furthermore, if z = (x, y) ∈ U, then Dg(z)(f(z)) = Dg(x, y)(f1(x, y), f2(x, y)) = ⟨x, f1(x, y)⟩ − ⟨y, f2(x, y)⟩, which will be positive if z ∈ g⁻¹(0) by (a). This implies that on a solution z(t) in U passing through the boundary of C, g is increasing, since by the chain rule (d/dt) g(z(t)) = Dg(z(t)) f(z(t)). Therefore no solution which starts in C can leave C before it leaves U. Figure E gives the idea.

Geometrically, (b) implies that each vector f(z) at z ∈ C points outward from the sphere about 0 passing through z. See Fig. F.

FIG. F

Condition (b) has the following quantitative implication. If z = z(t) is a solution curve in C ∩ U, then

⟨f(z), z⟩ = ⟨z', z⟩ = ½ (d/dt) |z|²,

so (b) implies

½ (d/dt) |z|² ≥ α |z|²;

thus

|z(t)| ≥ e^{αt} |z(0)|.

Thus each nontrivial solution z(t) starting in C ∩ U moves away from 0 at an exponential rate as long as it is defined and in C ∩ U.

If z(t) is not defined for all t ≥ 0, then, by Chapter 8, Section 5, it must leave the compact set C ∩ U; as we have seen above, it must therefore leave U. On the other hand, if z(t) is defined for all t, it must also leave U, since U is the ball of radius δ and e^{αt} |z(0)| > δ for large t. Therefore there are solutions starting arbitrarily close to 0 and leaving U. Thus (assuming the truth of the lemma), the vector field f does not have 0 as a point of stable equilibrium.
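The escape argument can be illustrated with the simplest linear saddle (our example, not the f of the theorem): for x' = x, y' = −y, a solution starting in the cone |x| ≥ |y| stays in the cone and leaves any ball about 0 at an exponential rate:

```python
import math

# Sketch: linear saddle x' = x, y' = -y with explicit solution
# (x0 e^t, y0 e^{-t}).  Starting in the cone |x| >= |y|, the solution
# stays in the cone and exits the ball of radius delta = 1.
x0, y0, delta = 0.01, 0.005, 1.0
t = 0.0
x, y = x0, y0
while math.hypot(x, y) <= delta:
    t += 0.01
    x = x0 * math.exp(t)
    y = y0 * math.exp(-t)

assert abs(x) >= abs(y)                          # still inside the cone
assert math.hypot(x, y) >= x0 * math.exp(t) / 2  # exponential growth rate
print(round(t, 2))   # time at which the solution escapes the unit ball
```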
We now give the proof of the lemma. First, part (b): if z = (x, y) ∈ C ∩ U,

⟨f(z), z⟩ = ⟨Ax, x⟩ + ⟨By, y⟩ + ⟨Q(z), z⟩;

by (3), (4), (5):

⟨f(z), z⟩ ≥ a |x|² − b |y|² − ε |z|².

In C, |x| ≥ |y| and |x|² ≥ ½(|x|² + |y|²) = ½ |z|². Thus

⟨f(z), z⟩ ≥ (a/2 − b/2 − ε) |z|².

We choose ε > 0 and then δ > 0 so that α = a/2 − b/2 − ε > 0. This proves (b).

To check (a), note that the left-hand side of (a) is

⟨Ax, x⟩ − ⟨By, y⟩ + ⟨x, R(x, y)⟩ − ⟨y, S(x, y)⟩;

but

|⟨x, R(x, y)⟩ − ⟨y, S(x, y)⟩| ≤ 2 |z| |Q(z)|.

We may proceed just as in the previous part; finally, δ > 0 is chosen so that a/2 − b/2 − 2ε > 0. This yields the lemma.

In Chapter 7 we introduced hyperbolic linear flows. The nonlinear analogue is a hyperbolic equilibrium point x̄ of a dynamical system x' = f(x); to repeat, this means that the eigenvalues of Df(x̄) have nonzero real parts. If these real parts are all negative, x̄ is, of course, a sink; if they are all positive, x̄ is called a source. If both signs occur, x̄ is a saddle point. From the preceding theorem we see that a saddle point is unstable.

If x̄ is an asymptotically stable equilibrium of a dynamical system, by definition there is a neighborhood N of x̄ such that any solution curve starting in N tends toward x̄. The union of all solution curves that tend toward x̄ (as t → ∞) is called the basin of x̄, denoted by B(x̄).

It is clear that any solution curve which meets N is in B(x̄); and, conversely, any solution curve in B(x̄) must meet N. It follows that B(x̄) is an open set; for, by continuity of the flow, if the trajectory of x meets N, the trajectory of any nearby point also meets N.

Notice that B(x̄) and B(ȳ) are disjoint if x̄ and ȳ are different asymptotically stable equilibria. For if a trajectory tends toward x̄, it cannot also tend toward ȳ.

If a dynamical system represents a physical system, one can practically identify the states in B(x̄) with x̄. For every state in B(x̄) will, after a period of transition, stay so close to x̄ as to be indistinguishable from it. For some frequently occurring

types of dynamical systems (the gradient systems of Section 4), almost every state is in the basin of some sink; other states are "improbable" (they constitute a set of measure 0). For such a system, the sinks represent the different types of long term behavior.

It is often a matter of practical importance to determine the basin of a sink x̄. For example, suppose x̄ represents some desired equilibrium state of a physical system. The extent of the basin tells us how large a perturbation from equilibrium we can allow and still be sure that the system will return to equilibrium.
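A one-dimensional sketch of basins (our example, not the text's): x' = x − x³ has sinks at ±1 with B(1) = (0, ∞) and B(−1) = (−∞, 0), which a crude simulation confirms:

```python
# Sketch: x' = x - x^3 has sinks at +-1 and an unstable equilibrium
# at 0.  The basin of +1 is (0, oo); the basin of -1 is (-oo, 0).
def f(x):
    return x - x ** 3

def limit_point(x, h=0.01, steps=5000):
    # crude Euler integration up to t = steps * h
    for _ in range(steps):
        x += h * f(x)
    return round(x, 3)

for x0 in [0.05, 0.5, 2.0]:
    assert limit_point(x0) == 1.0      # basin of the sink at +1
for x0 in [-0.05, -0.5, -2.0]:
    assert limit_point(x0) == -1.0     # basin of the sink at -1
```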
We conclude this section by remarking that James Clerk Maxwell applied
stability theory to the study of the rings of the planet Saturn. He decided that
they must be composed of many small separate bodies, rather than being solid or
fluid, for only in the former case are there stable solutions of the equations of mo-
tion. He discovered that while solid or fluid rings were mathematically possible,
the slightest perturbation would destroy their configuration.

PROBLEMS

1. (a) Let x̄ be a stable equilibrium of a dynamical system corresponding to a C1 vector field on an open set W ⊂ E. Show that for every neighborhood U of x̄ in W, there is a neighborhood U' of x̄ in U such that every solution curve x(t) with x(0) ∈ U' is defined and in U' for all t > 0.
(b) If x̄ is asymptotically stable, the neighborhood U' in (a) can be chosen to have the additional property that lim_{t→∞} x(t) = x̄ if x(0) ∈ U'.
(Hint: Consider the set of all points of U whose trajectories for t ≥ 0 enter the set U1 in Definition 1 or 2.)

2. For which of the following linear operators A on Rⁿ is 0 ∈ Rⁿ a stable equilibrium of x' = Ax?

(a) A = [ 0  0
          0  0 ]

(b) A = [  0  1
          −1  0 ]

(c) A = [ 0  −1  0   0
          1   0  0   0
          0   0  0  −1
          0   0  1   0 ]
3. Let A be a linear operator on Rⁿ all of whose eigenvalues have real part 0. Then 0 ∈ Rⁿ is a stable equilibrium of x' = Ax if and only if A is semisimple; and 0 is never asymptotically stable.
192 9. STABILITY OF EQUILIBRIA

4. Show that the dynamical system in R², whose equations in polar coordinates are

θ' = 1,   r' = r² sin(1/r) for r > 0,   r' = 0 for r = 0,

has a stable equilibrium at the origin. (Hint: Every neighborhood of the origin contains a solution curve encircling the origin.)
5. Let f: Rⁿ → Rⁿ be C1 and suppose f(0) = 0. If some eigenvalue of Df(0) has positive real part, there is a nonzero solution x(t), −∞ < t ≤ 0, to x' = f(x), such that lim_{t→−∞} x(t) = 0. (Hint: Use the instability theorem of Section 3 to find a sequence of solutions xn(t), tn ≤ t ≤ 0, in B_δ(0) with |xn(0)| = δ and lim_{n→∞} xn(tn) = 0.)
6. Let g: Rⁿ → Rⁿ be C1 and suppose g(0) = 0. If some eigenvalue of Dg(0) has negative real part, there is a nonzero solution x(t), 0 ≤ t < ∞, to x' = g(x), such that lim_{t→∞} x(t) = 0. (Hint: Compare the previous problem.)

§3. Liapunov Functions

In Section 2 we defined stability and aaymptotic Stability of an equilibrium 2


of a dynamicd system
(1) ' = f(z>J
2

where f: W + Rn is a C1map on an open set W C R". If $? is a sink, stability can


be detected by examining the eigenvaluea of the linear part Of(3).Other than that,
however, as yet we have no way of determining stability except by actually finding
all solutions to (1) , which may be difficult if not impossible.
The Russian mathematician and engineer A. M. Liapunov, in his 1892 doctoral
thesis, found a very useful criterion for stability. It is a generalization of the idea
that for a sink there is a norm on R" such that, I z ( t ) - 3 I decreasea for solutions
z ( t ) near f. Liapunov showed that certain other functions could be used instead
of the norm to guarantee stability.
Let V: U → R be a differentiable function defined in a neighborhood U ⊂ W of x̄. We denote by V̇: U → R the function defined by

V̇(x) = DV(x)(f(x)).

Here the right-hand side is simply the operator DV(x) applied to the vector f(x). Then if φ_t(x) is the solution to (1) passing through x when t = 0,

V̇(x) = (d/dt) V(φ_t(x)) evaluated at t = 0,

by the chain rule. Consequently, if V̇(x) is negative, then V decreases along the solution of (1) through x.
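In numerical work the quantity V̇(x) = DV(x)(f(x)) can be approximated without solving the differential equation, by taking finite differences of V in the direction of the vector field. The Python sketch below is an editorial illustration (the field f and function V are our own choices, not from the text); it checks the linear sink x' = −x, y' = −y with V = x² + y², where V̇ = −2(x² + y²) exactly:

```python
def vdot(V, f, x, h=1e-6):
    """Approximate Vdot(x) = DV(x)(f(x)): the directional derivative of V
    along the vector field f, computed with central differences."""
    fx = f(x)
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        total += (V(xp) - V(xm)) / (2 * h) * fx[i]
    return total

# Illustrative check: at (0.3, -0.4), Vdot = -2(0.09 + 0.16) = -0.5.
V = lambda p: p[0] ** 2 + p[1] ** 2
f = lambda p: [-p[0], -p[1]]
val = vdot(V, f, [0.3, -0.4])
```

Since V̇ < 0 away from the origin here, V is a strict Liapunov function for this sink.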
We can now state Liapunov's stability theorem:

Theorem 1  Let x̄ ∈ W be an equilibrium for (1). Let V: U → R be a continuous function defined on a neighborhood U ⊂ W of x̄, differentiable on U − x̄, such that
(a) V(x̄) = 0 and V(x) > 0 if x ≠ x̄;
(b) V̇ ≤ 0 in U − x̄.
Then x̄ is stable. Furthermore, if also
(c) V̇ < 0 in U − x̄,
then x̄ is asymptotically stable.

A function V satisfying (a) and (b) is called a Liapunov function for x̄. If (c) also holds, we call V a strict Liapunov function.
We emphasize that Liapunov's theorem can be applied without solving the differential equation. On the other hand, there is no cut-and-dried method of finding Liapunov functions; it is a matter of ingenuity and trial and error in each case. Sometimes there are natural functions to try. In the case of mechanical or electrical systems, energy is often a Liapunov function.
Example 1  Consider the dynamical system on R³ described by the system of differential equations

x' = 2y(z − 1),
y' = −x(z − 1),
z' = −z³.

The z-axis (= {(x, y, z) | x = y = 0}) consists entirely of equilibrium points. Let us investigate the origin for stability.
The linear part of the system at (0, 0, 0) is the matrix

[ 0 −2  0 ]
[ 1  0  0 ]
[ 0  0  0 ]

There are two imaginary eigenvalues and one zero eigenvalue. All we can conclude from this is that the origin is not a sink.
Let us look for a Liapunov function for (0, 0, 0) of the form V(x, y, z) = ax² + by² + cz², with a, b, c > 0. For such a V,

V̇ = 2(axx' + byy' + czz');

so

V̇ = 2[2axy(z − 1) − bxy(z − 1) − cz⁴].

We want V̇ ≤ 0; this can be accomplished by setting c = 1 and 2a = b. We conclude that x² + 2y² + z² is a Liapunov function; therefore the origin is a stable equilibrium. The origin is not asymptotically stable, however: this V is not strict, since V̇ = −2z⁴ vanishes on the whole plane z = 0, so (c) of Theorem 1 does not apply.
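The cancellation in V̇ can be machine-checked. The sketch below takes the equations of Example 1 as reconstructed above (an assumption where the scan is unclear) and verifies at a few arbitrary sample points that the xy-terms cancel and only −2z⁴ survives:

```python
def field(x, y, z):
    # Example 1 as reconstructed: x' = 2y(z-1), y' = -x(z-1), z' = -z**3
    return 2*y*(z - 1), -x*(z - 1), -z**3

def Vdot(x, y, z):
    # V(x, y, z) = x^2 + 2y^2 + z^2, so Vdot = 2(x x' + 2 y y' + z z')
    dx, dy, dz = field(x, y, z)
    return 2*(x*dx + 2*y*dy + z*dz)

# The terms 2xy(z-1) and -2xy(z-1) cancel, leaving Vdot = -2 z^4:
samples = [(0.5, -1.2, 0.7), (2.0, 3.0, -1.5), (0.1, 0.1, 0.0)]
checks = [abs(Vdot(x, y, z) + 2*z**4) < 1e-12 for (x, y, z) in samples]
```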

Example 2  Consider a (constant) mass m moving under the influence of a conservative force field −grad Φ(x) defined by a potential function Φ: W₀ → R on an open set W₀ ⊂ R³. (See Chapter 2.) The corresponding dynamical system on the state space W = W₀ × R³ ⊂ R³ × R³ is, for (x, v) ∈ W₀ × R³:

dx/dt = v,
dv/dt = −grad Φ(x).

Let (x̄, v̄) ∈ W₀ × R³ be an equilibrium point. Then v̄ = 0 and grad Φ(x̄) = 0. To investigate stability at (x̄, 0), we try to use the total energy

E(x, v) = ½m|v|² + mΦ(x)

to construct a Liapunov function. Since a Liapunov function must vanish at (x̄, 0), we subtract from E(x, v) the energy of the state (x̄, 0), which is mΦ(x̄), and define V: W₀ × R³ → R by

V(x, v) = E(x, v) − E(x̄, 0)
        = ½m|v|² + mΦ(x) − mΦ(x̄).

By conservation of energy, V̇ = 0. Since ½m|v|² ≥ 0, we assume Φ(x) > Φ(x̄) for x near x̄, x ≠ x̄, in order to make V a Liapunov function. Therefore we have proved the well-known theorem of Lagrange: an equilibrium (x̄, 0) of a conservative force field is stable if the potential energy has a local absolute minimum at x̄.
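Conservation of energy (V̇ = 0) is easy to observe numerically. The sketch below is an illustration, not from the text: it uses a one-dimensional potential Φ(x) = x²/2 with m = 1 and a classical fourth-order Runge–Kutta step to integrate x' = v, v' = −grad Φ(x), then measures the drift in E = ½m|v|² + mΦ(x):

```python
def rk4_step(state, dt):
    # x' = v, v' = -grad Phi(x) with Phi(x) = x**2 / 2 (illustrative choice)
    def deriv(s):
        x, v = s
        return (v, -x)
    k1 = deriv(state)
    k2 = deriv((state[0] + dt/2*k1[0], state[1] + dt/2*k1[1]))
    k3 = deriv((state[0] + dt/2*k2[0], state[1] + dt/2*k2[1]))
    k4 = deriv((state[0] + dt*k3[0], state[1] + dt*k3[1]))
    return (state[0] + dt/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            state[1] + dt/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

def energy(state, m=1.0):
    # E = (1/2) m |v|^2 + m Phi(x)
    x, v = state
    return 0.5*m*v*v + m*(0.5*x*x)

s = (1.0, 0.0)
E0 = energy(s)
for _ in range(1000):          # integrate to t = 10
    s = rk4_step(s, 0.01)
drift = abs(energy(s) - E0)
```

The drift stays at the level of the integrator's truncation error, as Lagrange's theorem leads one to expect for a conservative system.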

Proof of Liapunov's theorem. Let δ > 0 be so small that the closed ball B_δ(x̄) around x̄ of radius δ lies entirely in U. Let α be the minimum value of V on the boundary of B_δ(x̄), that is, on the sphere S_δ(x̄) of radius δ and center x̄. Then α > 0 by (a). Let U₁ = {x ∈ B_δ(x̄) | V(x) < α}. Then no solution starting in U₁ can meet S_δ(x̄), since V is nonincreasing on solution curves. Hence every solution starting in U₁ never leaves B_δ(x̄). This proves x̄ is stable. Now assume (c) holds as well, so that V is strictly decreasing on orbits in U − x̄. Let x(t) be a solution starting in U₁ − x̄ and suppose x(t_n) → z₀ ∈ B_δ(x̄) for some sequence t_n → ∞; such a sequence exists by compactness of B_δ(x̄). We assert z₀ = x̄. To see this, observe that V(x(t)) > V(z₀) for all t ≥ 0 since V(x(t)) decreases and V(x(t_n)) → V(z₀) by continuity of V. If z₀ ≠ x̄, let z(t) be the solution starting at z₀. For any s > 0, we have V(z(s)) < V(z₀). Hence for any solution y(s) starting sufficiently near z₀ we have

V(y(s)) < V(z₀);

putting y(0) = x(t_n) for sufficiently large n yields the contradiction

V(x(t_n + s)) < V(z₀).

Therefore z₀ = x̄. This proves that x̄ is the only possible limit point of the set {x(t) | t ≥ 0}. This completes the proof of Liapunov's theorem.

FIG. A. Level surfaces of a Liapunov function.

Figure A makes the theorem intuitively obvious. The condition V̇ ≤ 0 means that when a trajectory crosses a "level surface" V⁻¹(c), it moves inside the set where V ≤ c and can never come out again. Unfortunately, it is difficult to justify the diagram; why should the sets V⁻¹(c) shrink down to x̄? Of course, in many cases, Fig. A is indeed correct; for example, if V is a positive definite quadratic form, such as x² + 2y². But what if the level surfaces look like Fig. B? It is hard to imagine such a V that fulfills all the requirements of a Liapunov function; but rather than trying to rule out that possibility, it is simpler to give the analytic proof as above.
Liapunov functions not only detect stable equilibria; they can be used to estimate the extent of the basin of an asymptotically stable equilibrium, as the following theorem shows. In order to state it, we make two definitions. A set P is positively invariant for a dynamical system if for each x in P, φ_t(x) is defined and in P for all t ≥ 0 (where φ_t denotes the flow of the system). An entire orbit of the system is a set of the form

{φ_t(x) | t ∈ R},

where φ_t(x) is defined for all t ∈ R.

Theorem 2  Let x̄ ∈ W be an equilibrium of the dynamical system (1) and let V: U → R be a Liapunov function for x̄, U a neighborhood of x̄. Let P ⊂ U be a neighborhood of x̄ which is closed in W. Suppose that P is positively invariant, and that there is no entire orbit in P − x̄ on which V is constant. Then x̄ is asymptotically stable, and P ⊂ B(x̄).

Before proving Theorem 2 we apply it to the equilibrium x̄ = (0, 0) of the pendulum discussed in Section 1. For a Liapunov function we try the total energy E, which we expect to decrease along trajectories because of friction. Now

E = kinetic energy + potential energy;
kinetic energy = ½mv² = ½m(lω)² = ½ml²ω².

For potential energy we take mass times height above the lowest point of the circle:

potential energy = m(l − l cos θ).

Thus

E = ½ml²ω² + ml(1 − cos θ) = ml(½lω² + 1 − cos θ).

Then

Ė = ml(lωω' + ω sin θ);

using (3) of Section 1 this simplifies to

Ė = −kl²ω².

Thus Ė ≤ 0 and E(0, 0) = 0, so that E is indeed a Liapunov function.
To estimate the basin of (0, 0), fix a number c, 0 < c < 2ml, and define

P_c = {(θ, ω) | E(θ, ω) ≤ c and |θ| < π}.
Clearly, (0, 0) ∈ P_c. We shall prove P_c ⊂ B(0, 0).
P_c is positively invariant. For suppose

(θ(t), ω(t)),  0 ≤ t ≤ a,  a > 0

is a trajectory with (θ(0), ω(0)) ∈ P_c. To see that (θ(a), ω(a)) ∈ P_c, observe that E(θ(a), ω(a)) ≤ c since Ė ≤ 0. If |θ(a)| ≥ π, there must exist a smallest t₀ ∈ [0, a] such that θ(t₀) = ±π. Then

E(θ(t₀), ω(t₀)) = E(±π, ω(t₀))
              = ml[½lω(t₀)² + 2]
              ≥ 2ml.

But

E(θ(t₀), ω(t₀)) ≤ c < 2ml.

This contradiction shows that |θ(a)| < π, and so P_c is positively invariant.
We assert that P_c fulfills the second condition of Theorem 2. For suppose E is constant on a trajectory. Then, along that trajectory, Ė = 0 and so ω = 0. Hence, from (3) of Section 1, θ' = 0, so θ is constant on the orbit and also sin θ = 0. Since |θ| < π, it follows that θ = 0. Thus the only entire orbit in P_c on which E is constant is the equilibrium orbit (0, 0).
Finally, P_c is a closed set. For if (θ₀, ω₀) is a limit point of P_c, then |θ₀| ≤ π, and E(θ₀, ω₀) ≤ c by continuity of E. But |θ₀| = π implies E(θ₀, ω₀) > c. Hence |θ₀| < π and so (θ₀, ω₀) ∈ P_c.
From Theorem 2 we conclude that each P_c ⊂ B(0, 0); hence the set

P = ∪{P_c | 0 < c < 2ml}

is contained in B(0, 0). Note that

P = {(θ, ω) | E(θ, ω) < 2ml and |θ| < π}.

This result is quite natural on physical grounds. For 2ml is the total energy of the state (π, 0) where the bob of the pendulum is balanced above the pivot. Thus if the pendulum is not pointing straight up, and the total energy is less than the total energy of the balanced upward state, then the pendulum will gradually approach the state (0, 0).
There will be other states in the basin of (0, 0) that are not in the set P. Consider a state (π, u), where u is very small but not zero. Then (π, u) ∉ P, but the pendulum moves immediately into a state in P, and therefore approaches (0, 0). Hence (π, u) ∈ B(0, 0). See Exercises 5 and 6 for other examples.
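The basin estimate can be watched numerically. In the sketch below the pendulum equations are taken in the normalized form θ' = ω, ω' = −(1/l) sin θ − (k/m)ω, which is one common form of equation (3) of Section 1 (the constants m = l = 1, k = 0.5 and the starting state are illustrative choices of ours); the energy E decreases and the state drifts to (0, 0):

```python
import math

m, l, k = 1.0, 1.0, 0.5   # illustrative constants, g normalized to 1

def E(theta, omega):
    # Total energy as in the text: E = (1/2) m l^2 w^2 + m l (1 - cos theta)
    return 0.5*m*l*l*omega*omega + m*l*(1 - math.cos(theta))

theta, omega, dt = 2.0, 0.0, 0.001   # a state inside P_c (E < 2ml, |theta| < pi)
energies = [E(theta, omega)]
for _ in range(20000):               # integrate to t = 20 with Euler steps
    theta, omega = (theta + dt*omega,
                    omega + dt*(-(1.0/l)*math.sin(theta) - (k/m)*omega))
    energies.append(E(theta, omega))
```

Because of friction the total energy falls by several orders of magnitude and the trajectory ends near the equilibrium, as Theorem 2 predicts for states in P_c.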

Proof of Theorem 2. Imagine a trajectory x(t), 0 ≤ t < ∞, in the positively invariant set P. Suppose x(t) does not tend to x̄ as t → ∞. Then there must be a point a ≠ x̄ in P and a sequence t_n → ∞ such that

lim_{n→∞} x(t_n) = a.

If α = V(a), then α is the greatest lower bound of {V(x(t)) | t ≥ 0}; this follows from continuity of V and the fact that V decreases along trajectories.
Let L be the set of all such points a in W:

L = {a ∈ W | there exist t_n → ∞ with x(t_n) → a},

where x(t) is the trajectory postulated above. Since every point of L is a limit of points in P, and P is closed in W, it follows that L ⊂ P. Moreover, if a ∈ L, then the entire orbit of a is in L; that is, φ_t(a) is defined and in L for all t ∈ R. For φ_t(a) is defined for all t ≥ 0 since P is positively invariant. On the other hand, each point φ_t(x(t_n)) is defined for all t in the interval [−t_n, 0]; since x(t_n) → a, and we may assume t₁ < t₂ < · · ·, it follows from Chapter 8 that φ_t(a) is defined for all t ∈ [−t_n, 0], n = 1, 2, . . . . Since −t_n → −∞, φ_t(a) is defined for all t ≤ 0. To see that φ_s(a) ∈ L, for any particular s ∈ R, note that if x(t_n) → a, then x(t_n + s) → φ_s(a).
We reach a contradiction, for V(a) = α for all a ∈ L; hence V is constant on an entire orbit in P. This is impossible; hence lim_{t→∞} x(t) = x̄ for all trajectories in P. This proves that x̄ is asymptotically stable, and also that P ⊂ B(x̄). This completes the proof of Theorem 2.

The set L defined above is called the set of ω-limit points, or the ω-limit set, of the trajectory x(t) (or of any point on the trajectory). Similarly, we define the set of α-limit points, or the α-limit set, of a trajectory y(t) to be the set of all points b such that lim_{n→∞} y(t_n) = b for some sequence t_n → −∞. (The reason, such as it is, for this terminology is that α is the first letter and ω the last letter of the Greek alphabet.) We will make extensive use of these concepts in Chapter 11.
A set A in the domain W of a dynamical system is invariant if for every x ∈ A, φ_t(x) is defined and in A for all t ∈ R. The following facts, essentially proved in the proof of Theorem 2, will be used in Chapter 11.

Proposition  The α-limit set and the ω-limit set of a trajectory which is defined for all t ∈ R are closed invariant sets.

PROBLEMS

1. Find a strict Liapunov function for the equilibrium (0, 0) of

x' = −2x − y²,
y' = −y − x².

Find δ > 0 as large as you can such that the open disk of radius δ and center (0, 0) is contained in the basin of (0, 0).
2. Discuss the stability and basins of the equilibria of Example 1 in the text.

3. A particle moves on the straight line R under the influence of a Newtonian force depending only upon the position of the particle. If the force is always directed toward 0 ∈ R, and vanishes at 0, then 0 is a stable equilibrium. (Hint: The total energy E is a Liapunov function for the corresponding first order system

x' = y,
y' = −g(x);

E is kinetic energy plus potential energy, and the potential energy at x ∈ R is the work required to move the mass from 0 to x.)
4. In Problem 3 suppose also that there is a frictional force opposing the motion, of the form −f(x)v, f(x) ≥ 0, where v is the velocity, and x the position of the particle. If f⁻¹(0) = 0, then (0, 0) is asymptotically stable, and in fact every trajectory tends toward (0, 0).
5. Sketch the phase portraits of
(a) the pendulum with friction (see also Problem 6) ;
(b) the pendulum without friction.
6. (a) For the frictional pendulum, show that for every integer n and every angle θ₀ there is an initial state (θ₀, ω₀) whose trajectory tends toward (0, 0), and which travels n times, but not n + 1 times, around the circle.
(b) Discuss the set of trajectories tending toward the equilibrium (π, 0).
7. Prove the following instability theorem: Let V be a C¹ real-valued function defined on a neighborhood U of an equilibrium x̄ of a dynamical system. Suppose V(x̄) = 0 and V̇ > 0 in U − x̄. If V(x_n) > 0 for some sequence x_n → x̄, then x̄ is unstable.
8. Let V be a strict Liapunov function for an equilibrium x̄ of a dynamical system. Let c > 0 be such that V⁻¹[0, c] is compact and contains no other equilibrium. Then V⁻¹[0, c] ⊂ B(x̄).

§4. Gradient Systems

A gradient system on an open set U ⊂ Rⁿ is a dynamical system of the form

(1) x' = −grad V(x),

where

V: U → R

is a C² function, and

grad V = (∂V/∂x₁, . . . , ∂V/∂xₙ)

is the gradient vector field

grad V: U → Rⁿ

of V. (The negative sign in (1) is traditional. Note that −grad V(x) = grad(−V(x)).)
Gradient systems have special properties that make their flows rather simple. The following equality is fundamental:

(2) DV(x)y = ⟨grad V(x), y⟩.

This says that the derivative of V at x (which is a linear map Rⁿ → R), evaluated on y ∈ Rⁿ, gives the inner product of the vectors grad V(x) and y. To prove (2), we observe that

DV(x)y = (∂V/∂x₁)(x)y₁ + · · · + (∂V/∂xₙ)(x)yₙ,

which is exactly the inner product of grad V(x) and y = (y₁, . . . , yₙ).


Let V̇: U → R be the derivative of V along trajectories of (1); that is,

V̇(x) = (d/dt) V(φ_t(x)) evaluated at t = 0,

where φ_t is the flow of (1).

Theorem 1  V̇(x) ≤ 0 for all x ∈ U; and V̇(x) = 0 if and only if x is an equilibrium of (1).

Proof. By the chain rule

V̇(x) = DV(x)x' = ⟨grad V(x), −grad V(x)⟩

by (2); hence

V̇(x) = −|grad V(x)|².

This proves the theorem.
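The identity V̇ = −|grad V|² can be seen numerically: one small Euler step of the gradient flow decreases V at very nearly that rate. (The function V(x, y) = x² + 3y² and the base point below are illustrative choices of ours.)

```python
def V(x, y):
    # Illustrative choice: V(x, y) = x**2 + 3*y**2
    return x*x + 3*y*y

def grad_V(x, y):
    # grad V = (2x, 6y)
    return (2*x, 6*y)

# One Euler step of x' = -grad V(x) of size dt changes V by about
# -|grad V|^2 * dt, matching Theorem 1.
x, y, dt = 1.0, -0.5, 1e-5
gx, gy = grad_V(x, y)
drop = (V(x - dt*gx, y - dt*gy) - V(x, y)) / dt
predicted = -(gx*gx + gy*gy)
```

At (1, −0.5) the gradient is (2, −3), so the predicted rate is −13; the one-step estimate agrees to within O(dt).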

Corollary  Let x̄ be an isolated minimum of V. Then x̄ is an asymptotically stable equilibrium of the gradient system x' = −grad V(x).

Proof. It is easy to verify that the function x ↦ V(x) − V(x̄) is a strict Liapunov function for x̄, in some neighborhood of x̄.
To understand a gradient flow geometrically one looks at the level surfaces of the function V: U → R. These are the subsets V⁻¹(c), c ∈ R. If u ∈ V⁻¹(c) is a regular point, that is, grad V(u) ≠ 0, then V⁻¹(c) looks like a "surface" of dimension n − 1 near u. To see this, assume (by renumbering the coordinates) that ∂V/∂xₙ(u) ≠ 0. Using the implicit function theorem, we find a function g: Rⁿ⁻¹ → R such that for x near u we have identically

V(x₁, . . . , xₙ₋₁, g(x₁, . . . , xₙ₋₁)) = c;

hence near u, V⁻¹(c) looks like the graph of the function g.
The tangent plane to this graph is exactly the kernel of DV(u). But, by (2), this kernel is the (n − 1)-dimensional subspace of vectors perpendicular to grad V(u) (translated parallelly to u). Therefore we have shown:

Theorem 2  At regular points, the vector field −grad V(x) is perpendicular to the level surfaces of V.

Note by (2) that the nonregular or critical points of V are precisely the equilibrium points of the system (1).
Since the trajectories of the gradient system (1) are tangent to −grad V(x), we have the following geometric description of the flow of a gradient system:

Theorem 3  Let

x' = −grad V(x)

be a gradient system. At regular points the trajectories cross level surfaces orthogonally. Nonregular points are equilibria of the system. Isolated minima are asymptotically stable.

Example. Let V: R² → R be the function V(x, y) = x²(x − 1)² + y². Then we have, putting z = (x, y):

f(z) = −grad V(z) = (−∂V/∂x, −∂V/∂y) = (−2x(x − 1)(2x − 1), −2y),

or

dx/dt = −2x(x − 1)(2x − 1),
dy/dt = −2y.
The study of this differential equation starts with the equilibria. These are found by setting the right-hand sides equal to 0:

−2x(x − 1)(2x − 1) = 0,    −2y = 0.

We obtain precisely three equilibria: z_I = (0, 0), z_II = (½, 0), z_III = (1, 0). To check their stability properties, we compute the derivative Df(z), which in coordinates is

[ (d/dx)(−2x(x − 1)(2x − 1))       0      ]
[            0               (d/dy)(−2y) ]
FIG. A. Graph of V = x²(x − 1)² + y².


or

[ −2(6x² − 6x + 1)    0 ]
[        0           −2 ].

Evaluating this at the three equilibria gives:

Df(z_I) = [ −2  0 ]    Df(z_II) = [ 1  0 ]    Df(z_III) = [ −2  0 ]
          [  0 −2 ],              [ 0 −2 ],              [  0 −2 ].

We conclude from the main result on nonlinear sinks that z_I, z_III are sinks while z_II is a saddle. By the theorem of Section 2, z_II is not a stable equilibrium.
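The diagonal entries above are easy to verify; a brief Python check (the helper name is ours) of d/dx[−2x(x − 1)(2x − 1)] = −2(6x² − 6x + 1) at the three equilibria:

```python
def df_xx(x):
    # Expanding -2x(x - 1)(2x - 1) = -2(2x**3 - 3x**2 + x) and
    # differentiating gives -2(6x**2 - 6x + 1).
    return -2*(6*x*x - 6*x + 1)

# (x, x)-entries of Df at x = 0, 1/2, 1; the (y, y)-entry is always -2.
vals = [df_xx(0.0), df_xx(0.5), df_xx(1.0)]
```

The values −2, 1, −2, paired with the (y, y)-entry −2, reproduce the sink, saddle, sink pattern.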
The graph of V looks like that in Fig. A. The curves on the graph represent intersections with horizontal planes. The level "surfaces" (curves, in this case) look like those in Fig. B.

FIG. B. Level curves of V(x, y).

Level curves of V(x, y) = x²(x − 1)² + y² and the phase portrait of (x', y') = −grad V(x, y), superimposed on Fig. B, look like Fig. C. The level curve shaped like a reclining figure eight is V⁻¹(1/16).

FIG. C. Level curves of V(x, y) and gradient lines of (x', y') = −grad V(x, y).
More information about a gradient flow is given by:

Theorem 4  Let z be an α-limit point or an ω-limit point (Section 3) of a trajectory of a gradient flow. Then z is an equilibrium.

Proof. Suppose z is an ω-limit point. As in the proof of Theorem 2, Section 3, one shows that V is constant along the trajectory of z. Thus V̇(z) = 0; by Theorem 1, z is an equilibrium. The case of α-limit points is similar. In fact, an α-limit point z of x' = −grad V(x) is an ω-limit point of x' = grad V(x), whence grad V(z) = 0.
In the case of isolated equilibria this result implies that an orbit must either run off to infinity or else tend to an equilibrium. In the example above we see that the sets

V⁻¹([−c, c]), c ∈ R,

are compact and positively invariant under the gradient flow. Therefore each trajectory entering such a set is defined for all t ≥ 0, and tends to one of the three equilibria (0, 0), (1, 0), or (½, 0). And the trajectory of every point does enter such a set, since the trajectory through (x, y) enters the set

V⁻¹([−c, c]), c = V(x, y).

The geometrical analysis of this flow is completed by observing that the line x = ½ is made up of the equilibrium (½, 0) and two trajectories which approach it, while no other trajectory tends to (½, 0). This is because the derivative with respect to t of |x − ½| is positive if 0 < x < ½ or ½ < x < 1, as a computation shows.
We have shown: trajectories to the left of the line x = ½ tend toward (0, 0) (as t → +∞); and trajectories to the right tend toward (1, 0). Trajectories on the line x = ½ tend toward (½, 0). This gives a description of the basins of the equilibria (0, 0) and (1, 0). They are the two half planes

B(0, 0) = {(x, y) ∈ R² | x < ½},
B(1, 0) = {(x, y) ∈ R² | x > ½}.
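The basin description can be confirmed by integrating the gradient flow numerically (the starting points, step size, and step count below are illustrative choices):

```python
def neg_grad(x, y):
    # -grad V for V(x, y) = x**2 * (x - 1)**2 + y**2
    return (-2*x*(x - 1)*(2*x - 1), -2*y)

def flow_to_equilibrium(x, y, dt=0.01, steps=5000):
    # Plain Euler integration of x' = -grad V(x), run out to t = 50.
    for _ in range(steps):
        dx, dy = neg_grad(x, y)
        x, y = x + dt*dx, y + dt*dy
    return x, y

left = flow_to_equilibrium(0.3, 1.0)    # starts left of the line x = 1/2
right = flow_to_equilibrium(0.7, -1.0)  # starts right of the line x = 1/2
```

The first trajectory ends at (0, 0) and the second at (1, 0), matching the two half-plane basins.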

PROBLEMS

1. For each of the following functions V(u), sketch the phase portrait of the gradient flow u' = −grad V(u). Identify the equilibria and classify them as to stability or instability. Sketch the level surfaces of V on the same diagram.
(a) x² + 2y²
(b) x² − y² − 2x + 4y + 5
(c) y sin x
(d) 2x² − 2xy + 5y² + 4x + 4y + 4
(e) x² + y² − z
(f) x²(x − 1) + y²(y − 2) + z²
2. Suppose a dynamical system is given. A trajectory x(t), 0 ≤ t < ∞, is called recurrent if x(t_n) → x(0) for some sequence t_n → ∞. Prove that a gradient dynamical system has no nonconstant recurrent trajectories.
3. Let V: E → R be C² and suppose V⁻¹(−∞, c] is compact for every c ∈ R. Suppose also DV(x) ≠ 0 except for a finite number of points p₁, . . . , pᵣ. Prove:
(a) Every solution x(t) of x' = −grad V(x) is defined for all t ≥ 0;
(b) lim_{t→∞} x(t) exists and equals one of the equilibrium points p₁, . . . , pᵣ, for every solution x(t).

§5. Gradients and Inner Products

Here we treat the gradient of a real-valued function V on a vector space E equipped with an inner product ⟨ , ⟩. Even if E is Rⁿ, the inner product might not be the standard one. Even if it is, the new definition, while equivalent to the old, has the advantage of being coordinate free. As an application we study further the equilibria of a gradient flow.
We define the dual of a (real) vector space E to be the vector space

E* = L(E, R)

of all linear maps E → R.

Theorem 1  E* is isomorphic to E and thus has the same dimension.

Proof. Let {e₁, . . . , eₙ} be a basis for E and ⟨ , ⟩ the induced inner product. Then define u: E → E* by x ↦ u_x, where u_x(y) = ⟨x, y⟩. Clearly, u is a linear map. Also, u_x ≠ 0 if x ≠ 0 since u_x(x) = ⟨x, x⟩ ≠ 0. It remains to show that u is surjective. Let v ∈ E* and v(eᵢ) = λᵢ. Define x = Σ λᵢeᵢ; then u_x(eᵢ) = ⟨eᵢ, Σ λⱼeⱼ⟩ = λᵢ, and u_x = v. This proves the theorem.

Since E and E* have the same dimension, say n, E* has a basis of n elements. If B = {e₁, . . . , eₙ} is a basis for E, it determines a basis B* = {e₁*, . . . , eₙ*} for E* by defining

eⱼ*: E → R,
eⱼ*(Σᵢ tᵢeᵢ) = tⱼ,

for j = 1, . . . , n. Thus eⱼ* is characterized by

eⱼ*(eᵢ) = δᵢⱼ.

(B* is called the basis dual to B.)


Now suppose E is given an arbitrary inner product ⟨ , ⟩. We define an associated map Φ: E → E* (as in Theorem 1) by Φ(x)(y) = ⟨x, y⟩. Clearly, Φ is an isomorphism by Theorem 1, since its kernel is 0.
Next, let V: W → R be a continuously differentiable map defined on an open set W ⊂ E. The derivative of V is a continuous map

DV: W → L(E, R) = E*.

A map W → E* is called a 1-form on W. An ordinary differential equation is the same as a vector field on W, that is, a map W → E. We use Φ⁻¹: E* → E to convert the 1-form DV: W → E* into a vector field grad V: W → E:

Definition  grad V(x) = Φ⁻¹(DV(x)), x ∈ W.

From the definition of Φ we obtain the equivalent formulation

(1) DV(x)y = ⟨grad V(x), y⟩ for all y ∈ E.

The reader can verify that if E = Rⁿ with the usual inner product, then this definition of grad V(x) is the same as

grad V(x) = (∂V/∂x₁(x), . . . , ∂V/∂xₙ(x)).

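For a concrete feel for the coordinate-free gradient: with the inner product ⟨x, y⟩_A = xᵀAy on R² (A symmetric positive definite), the definition gives grad V(x) = A⁻¹DV(x), and identity (1) can be checked directly. The matrix A, the function V, and the test vectors below are illustrative choices of ours:

```python
# Inner product <u, v>_A = u^T A v with A symmetric positive definite.
A = [[2.0, 1.0], [1.0, 3.0]]

def dV(x):
    # V(x1, x2) = x1**2 + x1*x2, so DV = (2*x1 + x2, x1)
    return (2*x[0] + x[1], x[0])

def grad_A(x):
    # grad V(x) = A^{-1} DV(x); the 2x2 inverse is applied via Cramer's rule.
    d = dV(x)
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    return ((A[1][1]*d[0] - A[0][1]*d[1]) / det,
            (-A[1][0]*d[0] + A[0][0]*d[1]) / det)

def inner_A(u, v):
    return sum(A[i][j]*u[i]*v[j] for i in range(2) for j in range(2))

x, y = (0.7, -0.2), (1.3, 0.4)
lhs = dV(x)[0]*y[0] + dV(x)[1]*y[1]   # DV(x)y
rhs = inner_A(grad_A(x), y)           # <grad V(x), y>_A
```

The two sides agree, as (1) requires, even though A⁻¹DV(x) is not the coordinate gradient.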
We now prove some results of the preceding section concerning the differential equation

(2) x' = −grad V(x),

using our new definition of grad V.

Theorem 2  Let V: W → R be a C² function (that is, DV: W → E* is C¹; so V has continuous second partial derivatives) on an open set W in a vector space E with an inner product.
(a) x̄ is an equilibrium point of the differential equation (2) if and only if DV(x̄) = 0.

(b) If x(t) is a solution of (2), then

(d/dt) V(x(t)) = −|grad V(x(t))|².

(c) If x(t) is not constant, then V(x(t)) is a decreasing function of t.

Proof. Since V is C², the right side of (2) is a C¹ function of x; therefore the basic uniqueness and existence theory of Chapter 8 applies to (2).
By the definitions −grad V(x̄) = 0 if and only if DV(x̄) = 0, since Φ: E → E* is a linear isomorphism; this proves (a). To prove (b) we use the chain rule:

(d/dt) V(x(t)) = DV(x(t))x'(t) = DV(x(t))(−grad V(x(t)));

by (1) this equals

⟨grad V(x(t)), −grad V(x(t))⟩ = −|grad V(x(t))|².

If x(t) is not constant, then by (a), grad V(x(t)) ≠ 0; so (b) implies

(d/dt) V(x(t)) < 0.

This proves (c).

The dual vector space is also used to study linear operators. We define the adjoint of an operator

T: E → E

(where E has some fixed inner product) to be the operator

T*: E → E

defined by the equality

⟨Tx, y⟩ = ⟨x, T*y⟩

for all x, y in E. To make sense of this, first keep y fixed and note that the map x ↦ ⟨Tx, y⟩ is a linear map E → R; hence it defines an element λ(y) ∈ E*. We define

T*y = Φ⁻¹λ(y),

where

Φ: E → E*

is the isomorphism defined earlier. It is easy to see that T* is linear.
If B is an orthonormal basis for E, that is, B = {e₁, . . . , eₙ} and

⟨eᵢ, eⱼ⟩ = δᵢⱼ,

then the B-matrix of T* turns out to be the transpose of the B-matrix for T, as is easily verified.
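The transpose identity is easy to spot-check in R² with the standard inner product (the operator T and the vectors below are arbitrary illustrative choices):

```python
T = [[1.0, 4.0], [-2.0, 3.0]]   # an arbitrary, deliberately non-symmetric T

def mat_vec(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

def transpose(M):
    return [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]

def dot(u, v):
    return u[0]*v[0] + u[1]*v[1]

# In an orthonormal basis the adjoint is the transpose:
# <Tx, y> = <x, T^T y> for all x, y.
x, y = [0.5, -1.0], [2.0, 0.25]
lhs = dot(mat_vec(T, x), y)
rhs = dot(x, mat_vec(transpose(T), y))
```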

An operator T ∈ L(E) is self-adjoint if T* = T, that is,

⟨Tx, y⟩ = ⟨x, Ty⟩  for all x, y ∈ E.

In an orthonormal basis this means the matrix [aᵢⱼ] of T is symmetric, that is, aᵢⱼ = aⱼᵢ.

Theorem 3  Let E be a real vector space with an inner product and let T be a self-adjoint operator on E. Then the eigenvalues of T are real.

Proof. Let E_C be the complexification of E. We extend ⟨ , ⟩ to a function E_C × E_C → C as follows. If x + iy and u + iv are in E_C, define

⟨x + iy, u + iv⟩ = ⟨x, u⟩ + i(⟨y, u⟩ − ⟨x, v⟩) + ⟨y, v⟩.

It is easy to verify the following for all a, b ∈ E_C, λ ∈ C:

(3) ⟨a, a⟩ > 0 if a ≠ 0,
(4) λ⟨a, b⟩ = ⟨λa, b⟩ = ⟨a, λ̄b⟩,

where λ̄ denotes the complex conjugate.
Let T_C: E_C → E_C be the complexification of T; thus T_C(x + iy) = Tx + i(Ty). Let (T*)_C be the complexification of T*. It is easy to verify that

(5) ⟨T_C a, b⟩ = ⟨a, (T*)_C b⟩.

(This is true even if T is not self-adjoint.)
Suppose λ ∈ C is an eigenvalue for T and a ∈ E_C an eigenvector for λ; then

T_C a = λa.

By (5),

⟨T_C a, a⟩ = ⟨a, (T*)_C a⟩ = ⟨a, T_C a⟩,

since T* = T. Hence

⟨λa, a⟩ = ⟨a, λa⟩.

But, by (4),

λ⟨a, a⟩ = ⟨λa, a⟩,

while

⟨a, λa⟩ = λ̄⟨a, a⟩;

so, by (3), λ = λ̄ and λ is real.

Corollary  A symmetric real n × n matrix has real eigenvalues.
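For 2 × 2 symmetric matrices the corollary is visible in the quadratic formula: the characteristic polynomial of [[a, b], [b, d]] has discriminant (a − d)² + 4b² ≥ 0, so both roots are real. A small sketch (the sample matrix is an illustrative choice):

```python
import math

def sym_eigvals(a, b, d):
    # Eigenvalues of [[a, b], [b, d]]: roots of t**2 - (a + d)t + (ad - b**2).
    # The discriminant (a - d)**2 + 4*b**2 is never negative.
    disc = (a - d)**2 + 4*b*b
    r = math.sqrt(disc)
    return ((a + d - r) / 2, (a + d + r) / 2)

# For [[2, -3], [-3, 2]] the eigenvalues are 2 -/+ 3, i.e. -1 and 5.
lo_val, hi_val = sym_eigvals(2.0, -3.0, 2.0)
```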

Consider again a gradient vector field

F(x) = −grad V(x).

For simplicity we assume the vector space is Rⁿ, equipped with the usual inner product. Let x̄ be an equilibrium of the system

x' = −grad V(x).

The operator DF(x̄) has the matrix

[ −(∂²V/∂xᵢ∂xⱼ)(x̄) ]

in the standard basis. Since this matrix is symmetric, we conclude:

Theorem 4  At an equilibrium of a gradient system, the eigenvalues are real.

This theorem is also true for gradients defined by arbitrary inner products. For example, a gradient system in the plane cannot have spirals or centers at equilibria. In fact, neither can it have improper nodes because of:

Theorem 5  Let E be a real vector space with an inner product. Then any self-adjoint operator on E can be diagonalized.

Proof. Let T: E → E be self-adjoint. Since the eigenvalues of T are real, there is a nonzero vector e₁ ∈ E such that Te₁ = λ₁e₁, λ₁ ∈ R. Let

E₁ = {x ∈ E | ⟨x, e₁⟩ = 0},

the orthogonal complement of e₁. If x ∈ E₁, then Tx ∈ E₁, for

⟨Tx, e₁⟩ = ⟨x, Te₁⟩ = ⟨x, λ₁e₁⟩ = λ₁⟨x, e₁⟩ = 0.

Hence T leaves E₁ invariant. Give E₁ the same inner product as E; then the operator

T₁ = T | E₁ ∈ L(E₁)

is self-adjoint. In the same way we find a nonzero vector e₂ ∈ E₁ such that Te₂ = λ₂e₂, λ₂ ∈ R. Note that e₁ and e₂ are independent, since ⟨e₁, e₂⟩ = 0. Continuing in this way, we find a maximal independent set B = {e₁, . . . , eₙ} of eigenvectors of T. These must span E, otherwise we could enlarge the set by looking at the restriction of T to the subspace orthogonal to e₁, . . . , eₙ. In this basis B, T is diagonal.

We have actually proved more. Note that e₁, . . . , eₙ are mutually orthogonal; and we can take them to have norm 1. Therefore a self-adjoint operator (or a symmetric matrix) can be diagonalized by an orthonormal basis.
For gradient systems we have proved:

Theorem 6  At an equilibrium of a gradient flow the linear part of the vector field is diagonalizable by an orthonormal basis.
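In the 2 × 2 case the orthonormal diagonalization is explicit: a rotation through θ with tan 2θ = 2b/(a − d) does it. A Python sketch (the sample matrix is an illustrative choice):

```python
import math

def diagonalize_sym2(a, b, d):
    # Rotation R = [[c, -s], [s, c]] with tan(2 theta) = 2b / (a - d);
    # its columns are an orthonormal eigenbasis of [[a, b], [b, d]],
    # so R^T A R is diagonal. Returns the entries of R^T A R.
    theta = 0.5 * math.atan2(2*b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    m11 = c*(a*c + b*s) + s*(b*c + d*s)
    m12 = c*(-a*s + b*c) + s*(-b*s + d*c)
    m22 = -s*(-a*s + b*c) + c*(-b*s + d*c)
    return m11, m12, m22

# For [[2, -3], [-3, 2]]: off-diagonal entry vanishes, diagonal is 5 and -1.
m11, m12, m22 = diagonalize_sym2(2.0, -3.0, 2.0)
```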

PROBLEMS

1. Find an orthonormal diagonalizing basis for each of the following operators:

(a) [ 1  1 ]      (b) [  1 −2 ]      (c) [ 2  1  0 ]
    [ 1  1 ]          [ −2  0 ]          [ 1  2 −1 ]
                                         [ 0 −1  2 ]

2. Let A be a self-adjoint operator. If x and y are eigenvectors belonging to different eigenvalues, then ⟨x, y⟩ = 0.
3. Show that for each operator A of Problem 1, the vector field x ↦ Ax is the gradient of some function.
4. If A is a symmetric operator, show that the vector field x ↦ Ax is the gradient of some function.

Notes

A statement and proof of the implicit function theorem used in Section 4 is given in Appendix 4. See P. Halmos' Finite Dimensional Vector Spaces [8] for a more extended treatment of self-adjoint linear operators. One can find more on Liapunov theory in LaSalle and Lefschetz's Stability by Liapunov's Direct Method with Applications [14]. Pontryagin's text [10] on ordinary differential equations is recommended; in particular, he has an interesting application of Liapunov theory to the study of the governor of a steam engine.
Chapter 10
Differential Equations
for Electrical Circuits

First a simple but very basic circuit example is described and the differential
equations governing the circuit are derived. Our derivation is done in such a way
that the ideas extend to general circuit equations. That is why we are so careful
to make the maps explicit and to describe precisely the sets of states obeying
physical laws. This is in contrast to the more typical ad hoc approach to nonlinear
circuit theory.
The equations for this example are analyzed from the purely mathematical
point of view in the next three sections; these are the classical equations of Lienard
and Van der Pol. In particular Van der Pol’s equation could perhaps be regarded
as the fundamental example of a nonlinear ordinary differential equation. It
possesses an oscillation or periodic solution that is a periodic attractor. Every
nontrivial solution tends to this periodic solution; no linear flow can have this
property. On the other hand, for a periodic solution to be viable in applied mathe-
matics, this or some related stability property must be satisfied.
The construction of the phase portrait of Van der Pol in Section 3 involves
some nontrivial mathematical arguments and many readers may wish to skip or
postpone this part of the book. On the other hand, the methods have some wider
use in studying phase portraits.
Asymptotically stable equilibria connote death in a system, while attracting oscillators connote life. We give an example in Section 4 of a continuous transition from one to the other.
In Section 5 we give an introduction to the mathematical foundations of electrical circuit theory, especially oriented toward the analysis of nonlinear circuits.

§1. An RLC Circuit

We give an example of an electrical circuit and derive from it a differential


equation that shows how the state of the circuit varies in time. The differential
equation is analyzed in the following section. Later we shall describe in greater
generality elements of the mathematical theory of electrical circuits.
Our discussion of the example here is done in a way that extends to the more
general case.
The circuit of our example is the simple but fundamental series RLC circuit in
Fig. A. We will try to communicate what this means, especially in mathematical
terms. The circuit has three branches, one resistor marked by R, one inductor
marked by L, and one capacitor marked by C . One can think of a branch as being
a certain electrical device with two terminals. In the circuit, branch R has terminals
α, β for example and these terminals are wired together to form the points or nodes α, β, γ.
The electrical devices we consider in this book are of the three types: resistors,
inductors, and capacitors, which we will characterize mathematically shortly.
In the circuit one has flowing through each branch a current which is measured by a real number. More precisely the currents in the circuit are given by the three numbers i_R, i_L, i_C; i_R measures the current through the resistor, and so on. Current in a branch is analogous to water flowing in a pipe; the corresponding measure for water would be the amount flowing in unit time, or better, the rate at which water passes by a fixed point in the pipe. The arrows in the diagram that orient the branches tell us which way the current (read water!) is flowing; if for example i_R is positive, then according to the arrow current flows through the resistor from β to α (the choice of the arrows is made once and for all at the start).
The state of the currents a t a given time in the circuit is thus represented by a
, ic) E Ra. But Kirchhofs current law (KCL) says that in reality
point i = ( i ~iL,
there is a strong restriction on what i can occur. KCL asserts that the total current


FIG. A
212 10. DIFFERENTIAL EQUATIONS FOR ELECTRICAL CIRCUITS

flowing into a node is equal to the total current flowing out of that node. (Think of the water analogy to make this plausible.) For our circuit this is equivalent to

KCL: i_R = i_L = −i_C.

This defines a one-dimensional subspace K₁ of R³ of physical current states. Our


choice of orientation of the capacitor branch may seem unnatural. In fact the
orientations are arbitrary; in the example they were chosen so that the equations
eventually obtained relate most directly to the history of the subject.
The state of the circuit is characterized by the current i together with the voltage (or better, voltage drop) across each branch. These voltages are denoted by v_R, v_L, v_C for the resistor branch, inductor branch, and capacitor branch, respectively. In the water analogy one thinks of the voltage drop as the difference in pressures at the two ends of a pipe. To measure voltage one places a voltmeter (imagine a water pressure meter) at each of the nodes α, β, γ, which reads V(α) at α, and so on. Then v_R is the difference in the readings at α and β:

V(β) − V(α) = v_R.

The orientation or arrow tells us that v_R = V(β) − V(α) rather than V(α) − V(β).


An unrestricted voltage state of the circuit is then a point v = (v_R, v_L, v_C) in R³. Again a Kirchhoff law puts a physical restriction on v:

KVL: v_R + v_L − v_C = 0.

This defines a two-dimensional linear subspace K₂ of R³. From our explanation of the v_R, v_L, v_C in terms of voltmeters, KVL is clear; that is,

v_R + v_L − v_C = (V(β) − V(α)) + (V(α) − V(γ)) − (V(β) − V(γ)) = 0.

In a general circuit, one version of KVL asserts that the voltages can be derived from a "voltage potential" function V on the nodes as above.
We summarize: in the product space R³ × R³ = S, those states (i, v) satisfying Kirchhoff's laws form a three-dimensional subspace K of the form K = K₁ × K₂ ⊂ R³ × R³.

FIG. B
Next, we give a mathematical definition of the three kinds of electrical devices
of the circuit.
First consider the resistor element. A resistor in the R branch imposes a "functional relationship" on i_R, v_R. We take in our example this relationship to be defined by a C¹ real function f of a real variable, so that v_R = f(i_R). If R denotes a conventional linear resistor, then f is linear and v_R = f(i_R) is a statement of Ohm's law. The graph of f in the (i_R, v_R) plane is called the characteristic of the resistor. A couple of examples of characteristics are given in Figs. B and C. (A characteristic like that in Fig. C occurs in the "tunnel diode.")
A physical state (i, v) ∈ R³ × R³ = S will be one which satisfies KCL and KVL, or (i, v) ∈ K, and also f(i_R) = v_R. These conditions define a subset Σ ⊂ K ⊂ S. Thus the set of physical states Σ is the set of points (i_R, i_L, i_C, v_R, v_L, v_C) in R³ × R³ satisfying:

i_R = i_L = −i_C (KCL),
v_R + v_L − v_C = 0 (KVL),
f(i_R) = v_R (generalized Ohm's law).
Next we concern ourselves with the passage in time of a state; this defines a
curve in the state space S:
t → (i(t), v(t)) = (i_R(t), i_L(t), i_C(t), v_R(t), v_L(t), v_C(t)).
The inductor (which one may think of as a coil; it is hard to find a water analogy) specifies that

L (di_L/dt) = v_L (Faraday's law),

where L is a positive constant called the inductance.

FIG. C

On the other hand, the capacitor (which may be thought of as two metal plates separated by some insulator; in the water model it is a tank) imposes the condition

C (dv_C/dt) = i_C,

where C is a positive constant called the capacitance.


We summarize our development so far: a state of our circuit is given by the six numbers (i_R, i_L, i_C, v_R, v_L, v_C), that is, an element of R³ × R³. These numbers are subject to three restrictions: Kirchhoff's current law, Kirchhoff's voltage law, and the resistor characteristic or "generalized Ohm's law." Therefore the space of physical states is a certain subset Σ ⊂ R³ × R³. The way a state changes in time is determined by two differential equations.
Next, we simplify the state space Σ by observing that i_L and v_C determine the other four coordinates, since i_R = i_L and i_C = −i_L by KCL, v_R = f(i_R) = f(i_L) by the generalized Ohm's law, and v_L = v_C − v_R = v_C − f(i_L) by KVL. Therefore we can use R² as the state space, interpreting the coordinates as (i_L, v_C). Formally, we define a map π: R³ × R³ → R², sending (i, v) ∈ R³ × R³ to (i_L, v_C). Then we set π₀ = π | Σ, the restriction of π to Σ; this map π₀: Σ → R² is one-to-one and onto, and its inverse is given by the map φ: R² → Σ,

φ(i_L, v_C) = (i_L, i_L, −i_L, f(i_L), v_C − f(i_L), v_C).

It is easy to check that φ(i_L, v_C) satisfies KCL, KVL, and the generalized Ohm's law, so φ does map R² into Σ; it is also easy to see that π₀ and φ are inverse to each other.
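A quick machine check of this reduction may be helpful. The following sketch (my own illustrative code; the names f, phi, and pi_map are not from the text) verifies that φ(i_L, v_C) satisfies the three defining conditions of Σ and that π₀ inverts φ, using the characteristic f(x) = x³ − x of Van der Pol's equation:

```python
# Sketch: phi(i_L, v_C) = (i_L, i_L, -i_L, f(i_L), v_C - f(i_L), v_C)
# should satisfy KCL, KVL, and the generalized Ohm's law, and pi_map
# (the projection onto the (i_L, v_C) coordinates) should invert it.

def f(x):
    return x**3 - x          # the characteristic used for Van der Pol's equation

def phi(iL, vC):
    return (iL, iL, -iL, f(iL), vC - f(iL), vC)

def pi_map(state):
    iR, iL, iC, vR, vL, vC = state
    return (iL, vC)

iL, vC = 0.3, -1.1
iR, iL2, iC, vR, vL, vC2 = phi(iL, vC)
assert iR == iL2 == -iC                  # KCL: i_R = i_L = -i_C
assert abs(vR + vL - vC2) < 1e-12        # KVL: v_R + v_L - v_C = 0
assert f(iR) == vR                       # generalized Ohm's law
assert pi_map(phi(iL, vC)) == (iL, vC)   # pi_0 and phi are inverse
```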
We therefore adopt R² as our state space. The differential equations governing the change of state must be rewritten in terms of our new coordinates (i_L, v_C):

L (di_L/dt) = v_L = v_C − f(i_L),
C (dv_C/dt) = i_C = −i_L.

For simplicity, since this is only an example, we set L = 1, C = 1. If we write x = i_L, y = v_C, we have as differential equations on the (x, y) Cartesian space:

dx/dt = y − f(x),
dy/dt = −x.
These equations are analyzed in the following section.
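Before turning to that analysis, the equations are easy to explore numerically. A minimal sketch (my own code, with a hand-rolled fourth-order Runge-Kutta step; not from the text) integrates the system for the linear characteristic f(x) = x; the computed state decays toward the origin, reflecting the dissipation in the resistor:

```python
# Sketch: integrate dx/dt = y - f(x), dy/dt = -x (the reduced circuit
# equations with L = C = 1) by a fixed-step fourth-order Runge-Kutta method.
# The linear characteristic f(x) = x (an ordinary resistor) is my
# illustrative choice; x plays the role of i_L and y of v_C.

def f(x):
    return x

def field(x, y):
    return y - f(x), -x

def rk4(x, y, h, steps):
    for _ in range(steps):
        k1 = field(x, y)
        k2 = field(x + h/2*k1[0], y + h/2*k1[1])
        k3 = field(x + h/2*k2[0], y + h/2*k2[1])
        k4 = field(x + h*k3[0], y + h*k3[1])
        x += h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        y += h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
    return x, y

x0, y0 = 1.0, 1.0                     # initial current and voltage
xT, yT = rk4(x0, y0, 0.01, 2000)      # integrate up to t = 20
assert xT**2 + yT**2 < x0**2 + y0**2  # the resistor dissipates energy
```

For this choice of f the origin turns out to be a sink, in agreement with the analysis of the following section.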

PROBLEMS

1. Find the differential equations for the network in Fig. D, where the resistor is voltage controlled, that is, the resistor characteristic is the graph of a C¹ function g: R → R, g(v_R) = i_R.

FIG. D

2. Show that the LC circuit consisting of one inductor and one capacitor wired
in a closed loop oscillates.

§2. Analysis of the Circuit Equations

Here we begin a study of the phase portrait of the planar differential equation
derived from the circuit of the previous section, namely:

(1)
dx/dt = y − f(x),
dy/dt = −x.

This is one form of Liénard's equation. If f(x) = x³ − x, then (1) is a form of Van der Pol's equation.
First consider the simplest case of linear f (or ordinary resistor of Section 1). Let f(x) = Kx, K > 0. Then (1) takes the form

dx/dt = −Kx + y,
dy/dt = −x;

that is, z′ = Az, where A is the matrix with rows (−K, 1) and (−1, 0).

The eigenvalues of A are given by λ = ½[−K ± (K² − 4)^{1/2}]. Since λ always has negative real part, the zero state (0, 0) is an asymptotically stable equilibrium,

in fact a sink. Every state tends to zero; physically this is the dissipative effect of
the resistor. Furthermore, one can see that (0,0) will be a spiral sink precisely
when K < 2.
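This eigenvalue computation is easy to confirm numerically; the following sketch (illustrative code, not from the text) checks both claims for a few sampled values of K: the real parts are negative, and the eigenvalues form a complex pair exactly when K < 2:

```python
# Check the two claims numerically: for K > 0 the eigenvalues of
# A = [[-K, 1], [-1, 0]] have negative real part, and they are a
# complex pair (spiral sink) exactly when K < 2.
import numpy as np

for K in (0.5, 1.0, 3.0, 10.0):
    A = np.array([[-K, 1.0], [-1.0, 0.0]])
    lams = np.linalg.eigvals(A)
    assert all(lam.real < 0 for lam in lams)          # always a sink
    assert (abs(lams[0].imag) > 1e-12) == (K < 2.0)   # spiral iff K < 2
```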
Next we consider the equilibria of (1) for a general C¹ function f.
There is in fact a unique equilibrium z̄ of (1), obtained by setting

y − f(x) = 0,
−x = 0;

thus z̄ = (0, f(0)). The matrix of first partial derivatives of (1) at z̄ is

[−f′(0)  1]
[−1      0],

whose eigenvalues are given by

λ = ½[−f′(0) ± (f′(0)² − 4)^{1/2}].

We conclude that this equilibrium satisfies:

z̄ is a sink if f′(0) > 0,

and

z̄ is a source if f′(0) < 0
(see Chapter 9).
In particular, for Van der Pol's equation (f(x) = x³ − x) the unique equilibrium is a source.
To analyze (1) further we define a function W: R² → R by W(x, y) = ½(x² + y²); thus W is half of the norm squared. The following proposition is simple but important in the study of (1).

Proposition Let z(t) = (x(t), y(t)) be a solution curve of Liénard's equation (1). Then

(d/dt) W(z(t)) = −x(t) f(x(t)).
Proof. Apply the chain rule to the composition of z: J → R² and W: R² → R to obtain

(d/dt) W(z(t)) = x(t)x′(t) + y(t)y′(t);

suppressing t, this is equal to

x(y − f(x)) − yx = −x f(x)

by (1). Here J could be any interval of real numbers in the domain of z.
The statement of the proposition has an interpretation for the electric circuit that gave rise to (1), which we will pursue later: energy decreases along the solution curves according to the power dissipated in the resistor.
In circuit theory, a resistor whose characteristic is the graph of f: R → R is called passive if its characteristic is contained in the set consisting of (0, 0) and the interior of the first and third quadrants (Fig. A for example). Thus in the case of a passive resistor, −x f(x) is negative except when x = 0.

FIG. A

From Theorem 2 of Chapter 9, Section 3, it follows that the origin is asymptotically stable and its basin of attraction is the whole plane. Thus the word passive correctly describes the dynamics of such a circuit.
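The passivity argument can be illustrated numerically. In this sketch (my own code, not from the text) the characteristic f(x) = x³ is passive, and W = ½(x² + y²) computed along an Euler-integrated trajectory decreases, as the proposition predicts:

```python
# Sketch: for the passive characteristic f(x) = x**3 (graph in the first
# and third quadrants), W = (x^2 + y^2)/2 decreases along solutions of
# dx/dt = y - f(x), dy/dt = -x, since dW/dt = -x*f(x) <= 0.

def f(x):
    return x**3

def field(x, y):
    return y - f(x), -x

def euler(x, y, h, steps):
    W = [(x*x + y*y)/2]
    for _ in range(steps):
        dx, dy = field(x, y)
        x, y = x + h*dx, y + h*dy
        W.append((x*x + y*y)/2)
    return W

W = euler(1.0, 0.0, 1e-3, 5000)   # integrate up to t = 5
assert W[-1] < W[0]               # energy has been dissipated
```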

§3. Van der Pol's Equation

The goal here is to continue the study of Liénard's equation for a certain function f:

(1)
dx/dt = y − f(x), f(x) = x³ − x,
dy/dt = −x.


FIG. A

This is called Van der Pol's equation; equivalently,

(2)
dx/dt = y − x³ + x,
dy/dt = −x.

In this case we can give a fairly complete phase portrait analysis.

Theorem There is one nontrivial periodic solution of (1) and every nonequilibrium solution tends to this periodic solution. "The system oscillates."

We know from the previous section that (2) has a unique equilibrium at (0, 0), and it is a source. The next step is to show that every nonequilibrium solution "rotates" in a certain sense around the equilibrium in a clockwise direction. To this end we divide the (x, y) plane into four disjoint regions (open sets) A, B, C, D as in Fig. A. These regions make up the complement of the curves

(3)
y − f(x) = 0,
−x = 0.

These curves (3) thus form the boundaries of the four regions. Let us make this more precise. Define four curves

v⁺ = {(x, y) | y > 0, x = 0},
g⁺ = {(x, y) | x > 0, y = x³ − x},
v⁻ = {(x, y) | y < 0, x = 0},
g⁻ = {(x, y) | x < 0, y = x³ − x}.

These curves are disjoint; together with the origin they form the boundaries of the four regions.
Next we see how the vector field (x′, y′) of (1) behaves on the boundary curves. It is clear that y′ = 0 at (0, 0) and on v⁺ ∪ v⁻, and nowhere else; and x′ = 0 exactly on g⁺ ∪ g⁻ ∪ (0, 0). Furthermore the vector (x′, y′) is horizontal on v⁺ ∪ v⁻, pointing right on v⁺ and left on v⁻ (Fig. B). And (x′, y′) is vertical on g⁺ ∪ g⁻, pointing downward on g⁺ and upward on g⁻. In each region A, B, C, D the signs of x′ and y′ are constant. Thus in A, for example, we have x′ > 0, y′ < 0, and so the vector field always points into the fourth quadrant.
The next part of our analysis concerns the nature of the flow in the interior of the regions. Figure B suggests that trajectories spiral around the origin clockwise. The next two propositions make this precise.


FIG. B

Proposition 1 Any trajectory starting on v⁺ enters A. Any trajectory starting in A meets g⁺; furthermore, it meets g⁺ before it meets v⁻, g⁻, or v⁺.

Proof. See Fig. B. Let (x(t), y(t)) be a solution curve to (1). If (x(0), y(0)) ∈ v⁺, then x(0) = 0 and y(0) > 0. Since x′(0) > 0, x(t) increases for small t; hence x(t) > 0, which implies that y(t) decreases for small t. Hence the curve enters A. Before the curve leaves A (if it does), x′ must become 0 again, so the curve must cross g⁺ before it meets v⁻, g⁻, or v⁺. Thus the first and last statements of the proposition are proved.
It remains to show that if (x(0), y(0)) ∈ A then (x(t), y(t)) ∈ g⁺ for some t > 0. Suppose not.
Let P ⊂ R² be the compact set bounded by (0, 0) and v⁺, g⁺, and the line y = y(0), as in Fig. C. Since (x(t), y(t)) does not meet g⁺, the solution curve stays in P; from Chapter 8 it follows that it is defined for all t > 0.
Put a = x(0) > 0. Since x′ > 0 in A, x(t) ≥ a for t > 0. Hence from (1), y′(t) ≤ −a for t > 0.

FIG. C

For these values of t, then,

y(t) = y(0) + ∫₀ᵗ y′(s) ds ≤ y(0) − at.

This is impossible (the trajectory would have to leave P), unless our trajectory meets g⁺, proving Proposition 1.

Similar arguments prove (see Fig. D):


FIG. D. Trajectories spiral clockwise.

Proposition 2 Every trajectory is defined for (at least) all t ≥ 0. Except for (0, 0), each trajectory repeatedly crosses the curves v⁺, g⁺, v⁻, g⁻, in clockwise order, passing among the regions A, B, C, D in clockwise order.

To analyze further the flow of the Van der Pol oscillator we define a map

σ: v⁺ → v⁺

as follows. Let p ∈ v⁺; the solution curve t → φ_t(p) through p is defined for all t ≥ 0. There will be a smallest t₁(p) = t₁ > 0 such that φ_{t₁}(p) ∈ v⁺. We put σ(p) = φ_{t₁}(p). Thus σ(p) is the first point after p on the trajectory of p (for t > 0) which is again on v⁺ (Fig. E). The map p → t₁(p) is continuous; while this should be intuitively clear, it follows rigorously from Chapter 11. Hence σ is also continuous. Note that σ is one-to-one by uniqueness of solutions.
The importance of this section map σ: v⁺ → v⁺ comes from its intimate relationship to the phase portrait of the flow. For example:

Proposition 3 Let p ∈ v⁺. Then p is a fixed point of σ (that is, σ(p) = p) if and only if p is on a periodic solution of (1) (that is, φ_t(p) = p for some t ≠ 0). Moreover, every periodic solution curve meets v⁺.

Proof. If σ(p) = p, then φ_{t₁}(p) = p, where t₁ = t₁(p) is as in the definition of σ. Suppose on the other hand that σ(p) ≠ p. Let v* = v⁺ ∪ {(0, 0)}. We observe first that σ extends to a map v* → v* which is again continuous and one-to-one, sending (0, 0) to itself. Next we identify v* with {y ∈ R | y ≥ 0} by assigning to each point its y-coordinate. Hence there is a natural order on v*: (0, y) < (0, z) if y < z. It follows from the intermediate value theorem that σ: v* → v* is order preserving. If σ(p) > p, then σ²(p) > σ(p) > p, and by induction σ^n(p) > p, n = 1, 2, . . . . This means that the trajectory of p never crosses v⁺ again at p. Hence φ_t(p) ≠ p for all t ≠ 0. A similar argument applies if σ(p) < p. Therefore if σ(p) ≠ p, p is not on a periodic trajectory. The last statement of Proposition 3 follows from Proposition 2, which implies that every trajectory (except (0, 0)) meets v⁺.
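The section map σ can also be computed numerically. The sketch below (my own code; the Euler step, crossing detection, and tolerances are ad hoc choices) approximates σ for Van der Pol's equation (2) by integrating from a point (0, y) on v⁺ until the trajectory first returns to the positive y-axis; the iterates of σ move monotonely toward the fixed point on the periodic solution:

```python
# Sketch: approximate the section map sigma on v+ (the positive y-axis)
# for dx/dt = y - (x^3 - x), dy/dt = -x, by Euler integration with
# detection of the return crossing of the positive y-axis.
def field(x, y):
    return y - (x**3 - x), -x

def sigma(y0, h=0.001, max_steps=200000):
    x, y = 0.0, y0
    for _ in range(max_steps):
        dx, dy = field(x, y)
        x2, y2 = x + h*dx, y + h*dy
        # the trajectory re-enters v+ where x crosses 0 from below with y > 0
        if x < 0.0 <= x2 and y2 > 0:
            s = -x / (x2 - x)          # linear interpolation of the crossing
            return y + s*(y2 - y)
        x, y = x2, y2
    raise RuntimeError("no return to v+ found")

y = 4.0
for _ in range(5):                     # iterate sigma; the iterates move
    y = sigma(y)                       # monotonely toward the fixed point
assert abs(sigma(y) - y) < 0.05        # essentially a fixed point of sigma
```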
For every point p ∈ v⁺ let t₂(p) = t₂ be the smallest t > 0 such that φ_t(p) ∈ v⁻. Define a continuous map

α: v⁺ → v⁻,
α(p) = φ_{t₂}(p).

FIG. E. The map σ: v⁺ → v⁺.



FIG. F. The map α: v⁺ → v⁻.

See Fig. F. The map α is also one-to-one by uniqueness of solutions, and thus monotone.
Using the methods in the proof of Proposition 1 it can be shown that there is a unique point p₀ ∈ v⁺ such that the solution curve

{φ_t(p₀) | 0 ≤ t ≤ t₂(p₀)}

intersects the curve g⁺ at the point (1, 0) where g⁺ meets the x-axis. Let r = |p₀|. Define a continuous map

δ: v⁺ → R,
δ(p) = ½(|α(p)|² − |p|²),

where |p| means the usual Euclidean norm of the vector p. Further analysis of the flow of (1) is based on the following rather delicate result:

Proposition 4 (a) δ(p) > 0 if 0 < |p| < r;
(b) δ(p) decreases monotonely to −∞ as |p| → ∞, |p| ≥ r.

Part of the graph of δ(p) as a function of |p| is shown schematically in Fig. G. The intermediate value theorem and Proposition 4 imply that there is a unique q₀ ∈ v⁺ with δ(q₀) = 0.
We will prove Proposition 4 shortly; first we use it to complete the proof of the main theorem of this section. We exploit the skew symmetry of the vector field

g(x, y) = (y − x³ + x, −x)

given by the right-hand side of (2); namely,

g(−x, −y) = −g(x, y).

This means that if t → (x(t), y(t)) is a solution curve, so is t → (−x(t), −y(t)).
Consider the trajectory of the unique point q₀ ∈ v⁺ such that δ(q₀) = 0. This point has the property that |α(q₀)| = |q₀|, hence that

φ_{t₂}(q₀) = −q₀.

From skew symmetry we have also

φ_{t₂}(−q₀) = −(−q₀) = q₀;

hence, putting λ = 2t₂ > 0, we have

φ_λ(q₀) = q₀.

Thus q₀ lies on a nontrivial periodic trajectory γ.
Since δ is monotone, similar reasoning shows that the trajectory through q₀ is the unique nontrivial periodic solution.
To investigate other trajectories we define a map β: v⁻ → v⁺, sending each point of v⁻ to the first intersection of its trajectory (for t > 0) with v⁺. By symmetry

β(p) = −α(−p).

Note that σ = βα.
We identify the y-axis with the real numbers in the y-coordinate. Thus if p, q ∈ v⁺ ∪ v⁻ we write p > q if p is above q. Note that α and β reverse this ordering while σ preserves it.
Now let p ∈ v⁺, p > q₀. Since α(q₀) = −q₀ we have α(p) < −q₀ and σ(p) > q₀. On the other hand, δ(p) < 0, which means the same thing as α(p) > −p. Therefore σ(p) = β(α(p)) < p. We have shown that p > q₀ implies p > σ(p) > q₀. Similarly σ(p) > σ²(p) > q₀, and by induction σ^n(p) > σ^{n+1}(p) > q₀, n = 1, 2, . . . . The sequence σ^n(p) has a limit q₁ ≥ q₀ in v⁺. Note that q₁ is a fixed point of σ, for by continuity of σ we have

σ(q₁) − q₁ = lim_{n→∞} [σ(σ^n(p)) − σ^n(p)] = q₁ − q₁ = 0.

FIG. G

Since σ has only one fixed point, q₁ = q₀. This shows that the trajectory of p spirals toward γ as t → ∞. The same thing is true if p < q₀; the details are left to the reader. Since every trajectory except (0, 0) meets v⁺, the proof of the main theorem is complete.
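A numerical experiment matching the theorem: in the sketch below (my own code; the step sizes, run lengths, and amplitude tolerance are ad hoc choices), trajectories started near the source and far outside both settle onto oscillations of the same amplitude, as uniqueness of the periodic solution requires:

```python
# Sketch: two trajectories of dx/dt = y - (x^3 - x), dy/dt = -x, one
# started near the source at the origin and one far outside, approach
# periodic oscillations of the same amplitude (the unique closed orbit).
def field(x, y):
    return y - (x**3 - x), -x

def attractor_amplitude(x, y, h=0.005, transient=40000, observe=20000):
    for _ in range(transient):          # discard the transient motion
        dx, dy = field(x, y)
        x, y = x + h*dx, y + h*dy
    amp = 0.0
    for _ in range(observe):            # then record max |x| over many periods
        dx, dy = field(x, y)
        x, y = x + h*dx, y + h*dy
        amp = max(amp, abs(x))
    return amp

a_inner = attractor_amplitude(0.1, 0.0)
a_outer = attractor_amplitude(4.0, 0.0)
assert abs(a_inner - a_outer) < 0.1     # both settle on the same cycle
```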
It remains to prove Proposition 4.
We adopt the following notation. Let γ: [a, b] → R² be a C¹ curve in the plane, written γ(t) = (x(t), y(t)). If F: R² → R is C¹, define

∫_γ F = ∫_a^b F(x(t), y(t)) dt.

It may happen that x′(t) ≠ 0 for a ≤ t ≤ b, so that along γ, y is a function of x, y = y(x). In this case we can change variables:

∫_γ F = ∫_{x(a)}^{x(b)} F(x, y(x)) (dt/dx) dx = ∫_{x(a)}^{x(b)} [F(x, y(x)) / x′] dx,

where x′ is evaluated at t = t(x). Similarly if y′(t) ≠ 0.
Recall the function

W(x, y) = ½(x² + y²).

Let γ(t) = (x(t), y(t)), 0 ≤ t ≤ t₂ = t₂(p), be the solution curve joining p ∈ v⁺ to α(p) ∈ v⁻. By definition δ(p) = W(x(t₂), y(t₂)) − W(x(0), y(0)). Thus

δ(p) = ∫₀^{t₂} (d/dt) W(x(t), y(t)) dt.

By the proposition of Section 2 we have

δ(p) = ∫₀^{t₂} −x(t)(x(t)³ − x(t)) dt;

hence

δ(p) = ∫₀^{t₂} x(t)²(1 − x(t)²) dt.

This immediately proves (a) of Proposition 4, because the integrand is positive for 0 < x(t) < 1.
We may rewrite the last equality as

δ(p) = ∫_γ x²(1 − x²).

FIG. H

We restrict attention to points p ∈ v⁺ with |p| > r. We divide the corresponding solution curve γ into three curves γ₁, γ₂, γ₃ as in Fig. H. Then

δ(p) = δ₁(p) + δ₂(p) + δ₃(p),

where

δ_i(p) = ∫_{γ_i} x²(1 − x²), i = 1, 2, 3.

Notice that along γ₁, y(t) is a function of x(t). Hence

δ₁(p) = ∫₀¹ [x²(1 − x²) / (y(x) − f(x))] dx,

where f(x) = x³ − x. As p moves up the y-axis, y − f(x) increases (for (x, y) on γ₁). Hence δ₁(p) decreases as |p| → ∞. Similarly δ₃(p) decreases as |p| → ∞.
On γ₂, x is a function of y, and x ≥ 1. Therefore, since dy/dt = −x,

δ₂(p) = ∫_{y₂}^{y₁} x(y)(1 − x(y)²) dy < 0,

where y₂ < y₁ are the y-coordinates of the end points of γ₂. As |p| increases, the domain of integration becomes steadily larger. The function y → x(y) depends on p; we write it x_p(y). As |p| increases, the curves γ₂ move to the right; hence x_p(y) increases and so x_p(y)(1 − x_p(y)²) decreases. It follows that δ₂(p) decreases as |p| increases; and evidently lim_{|p|→∞} δ₂(p) = −∞. This completes the proof of Proposition 4.

PROBLEMS

1. Find the phase portrait for the differential equation

x′ = y − f(x), f(x) = x²,
y′ = −x.
2. Give a proof of Proposition 2.
3. (Hartman [9, Chapter 7, Theorem 10.2]) Find the phase portrait of the following differential equation, and in particular show there is a unique nontrivial periodic solution:

x′ = y − f(x),
y′ = −g(x),

where all of the following are assumed:
(i) f, g are C¹;
(ii) g(−x) = −g(x) and xg(x) > 0 for all x ≠ 0;
(iii) f(−x) = −f(x) and f(x) < 0 for 0 < x < a;
(iv) for x > a, f(x) is positive and increasing;
(v) f(x) → ∞ as x → ∞.

(Hint: Imitate the proof of the theorem in Section 3.)
4. (Hard!) Consider the equation

x′ = y − f(x), f: R → R, C¹,
y′ = −x.

Given f, how many periodic solutions does this system have? This would be interesting to know for many broad classes of functions f. Good results on this would probably make an interesting research article.

FIG. I

5. Consider the equation

It has a unique nontrivial periodic solution γ_μ, by Problem 3. Show that as μ → ∞, γ_μ tends to the closed curve consisting of two horizontal line segments and two arcs on y = x³ − x, as in Fig. I.

§4. Hopf Bifurcation

Often one encounters a differential equation with parameter. Precisely, one is given a C¹ map g_μ: W → E, where W is an open set of the vector space E and μ is allowed to vary over some parameter space, say μ ∈ J = [−1, 1]. Furthermore it is convenient to suppose that g_μ is differentiable in μ, or that the map

J × W → E, (μ, x) → g_μ(x)

is C¹.
Then one considers the differential equation

(1) x′ = g_μ(x) on W.
One is especially concerned with how the phase portrait of (1) changes as μ varies. A value μ₀ where there is a basic structural change in this phase portrait is called a bifurcation point. Rather than try to develop any sort of systematic bifurcation theory here, we will give one fundamental example, a realization of what is called Hopf bifurcation.
Return to the circuit example of Section 1, where we now suppose that the resistor characteristic depends on a parameter μ and is denoted by f_μ: R → R, −1 ≤ μ ≤ 1. (Maybe μ is the temperature of the resistor.) The physical behavior of the circuit is then described by the differential equation on R²:

(2)
dx/dt = y − f_μ(x),
dy/dt = −x.

Consider as an example the special case where f_μ is described by

(2a) f_μ(x) = x³ − μx.
Then we apply the results of Sections 2 and 3 to see what happens as μ is varied from −1 to 1.
For each μ, −1 ≤ μ ≤ 0, the resistor is passive and the proposition of Section 2 implies that all solutions tend asymptotically to zero as t → ∞. Physically the circuit is dead, in that after a period of transition all the currents and voltages

FIG. A. Bifurcation. On the left, −1 ≤ μ ≤ 0; on the right, 0 < μ ≤ 1.

stay at 0 (or as close to 0 as we want). But note that as μ crosses 0, the circuit becomes alive. It will begin to oscillate. This follows from the fact that the analysis of Section 3 applies to (2) when 0 < μ ≤ 1; in this case (2) will have a unique periodic solution γ_μ and the origin becomes a source. In fact every nontrivial solution tends to γ_μ as t → ∞. Further elaboration of the ideas in Section 3 can be used to show that γ_μ → 0 as μ → 0, μ > 0.
For (2), μ = 0 is the bifurcation value of the parameter. The basic structure of the phase portrait changes as μ passes through the value 0. See Fig. A.
The mathematician E. Hopf proved that for fairly general one-parameter families of equations x′ = f_μ(x), there must be a closed orbit for μ > μ₀ if the eigenvalue character of an equilibrium changes suddenly at μ₀ from a sink to a source.
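The sink-to-source transition at μ = 0 is visible directly in the linearization. For f_μ(x) = x³ − μx the matrix of first partial derivatives of (2) at the origin is the matrix with rows (μ, 1) and (−1, 0), and the following sketch (illustrative code, not from the text) shows its eigenvalues crossing the imaginary axis as μ passes through 0:

```python
# The linearization of (2) at the origin, with f_mu(x) = x^3 - mu*x,
# is [[mu, 1], [-1, 0]]; its eigenvalues have real part mu/2 for
# |mu| < 2, so they cross the imaginary axis exactly at mu = 0.
import numpy as np

def max_real_part(mu):
    J = np.array([[mu, 1.0], [-1.0, 0.0]])
    return max(np.linalg.eigvals(J).real)

assert max_real_part(-0.5) < 0            # sink: the circuit is dead
assert abs(max_real_part(0.0)) < 1e-12    # purely imaginary pair at mu = 0
assert max_real_part(0.5) > 0             # source: the circuit oscillates
```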

PROBLEMS

1. Find all values of μ₀ which are the bifurcation points for the linear differential equation:

dx/dt = μx + y,
dy/dt = x − 2y.
2. Prove the statement in the text that γ_μ → 0 as μ → 0, μ > 0.

§5. More General Circuit Equations

We give here a way of finding the ordinary differential equations for a class of electrical networks or circuits. We consider networks made up of resistors, capacitors, and inductors. Later we discuss briefly the nature of these objects, called the branches of the circuit; at present it suffices to consider them as devices with two terminals. The circuit is formed by connecting together various terminals. The connection points are called nodes.
Toward giving a mathematical description of the network, we define in R³ a linear graph which corresponds to the network. This linear graph consists of the following data:
(a) A finite set A of points (called nodes) in R³. The number of nodes is denoted by a, a typical node by α.
(b) A finite set B of line segments in R³ (called branches). The end points of a branch must be nodes. Distinct branches can meet only at a node. The number of branches is b; a typical branch is denoted by β.
We assume that each branch β is oriented, in the sense that one is given a direction from one terminal to the other, say from a (−) terminal β⁻ to a (+) terminal β⁺. The boundary of β ∈ B is the set ∂β = β⁺ ∪ β⁻.
For the moment we ignore the exact nature of a branch, whether it is a resistor,
capacitor, or inductor.
We suppose also that the set of nodes and the set of branches are ordered, so that it makes sense to speak of the kth branch, and so on.
A current state of the network will be some point i = (i₁, . . . , i_b) ∈ R^b, where i_k represents the current flowing through the kth branch at a certain moment. In this case we will often write 𝒢 for R^b.
The Kirchhoff current law or KCL states that the amount of current flowing into a node at a given moment is equal to the amount flowing out. The water analogy of Section 1 makes this plausible. We want to express this condition in a mathematical way which will be especially convenient for our development.
Toward this end we construct a linear map d: 𝒢 → 𝒟, where 𝒟 is the Cartesian space R^a (recall a is the number of nodes).
If i ∈ 𝒢 is a current state and α is a node, we define the αth coordinate of di ∈ 𝒟 to be

(di)_α = Σ_{β∈B} ε_{αβ} i_β,

where

ε_{αβ} = +1 if β⁺ = α,
ε_{αβ} = −1 if β⁻ = α,
ε_{αβ} = 0 otherwise.

One may interpret (di)_α as the net current flow into node α when the circuit is in the current state i.

Theorem 1 A current state i ∈ 𝒢 satisfies KCL if and only if di = 0.


Proof. It is sufficient to check the condition at each node α ∈ A. Thus (di)_α = 0 if and only if

Σ_{β∈B} ε_{αβ} i_β = 0,

or, from the definition of ε_{αβ},

Σ_{β⁺=α} i_β = Σ_{β⁻=α} i_β.

This last is just the expression of KCL at the node α. This proves the theorem.
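For the series RLC circuit of Section 1 the map d is a 3 × 3 matrix, and Theorem 1 can be checked directly. In the sketch below (my own code; the branch orientations are my choice), Ker d is one dimensional and consists exactly of the states with i_R = i_L = −i_C:

```python
# Sketch: the incidence map d for a series RLC loop, with nodes
# (alpha, beta, gamma) and branches (R, L, C).  Each column carries
# +1 at the branch's (+) terminal and -1 at its (-) terminal; the
# orientations are my own choice, made so that Ker d comes out as
# i_R = i_L = -i_C.
import numpy as np

d = np.array([[ 1,  0,  1],    # node alpha
              [-1,  1,  0],    # node beta
              [ 0, -1, -1]])   # node gamma

i = np.array([2.0, 2.0, -2.0])          # a state with i_R = i_L = -i_C
assert np.allclose(d @ i, 0)            # it satisfies KCL: di = 0

# rank d = 2, so Ker d is a one-dimensional subspace, as in Section 1
assert np.linalg.matrix_rank(d) == 2
```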

Next, a voltage state of our network is defined to be a point v = (v₁, . . . , v_b) ∈ R^b, where in this context we denote R^b by 𝒱. The kth coordinate v_k represents the voltage drop across the kth branch. The Kirchhoff voltage law (KVL) may be stated as asserting that there is a real function on the set of nodes, a voltage potential (given, for example, by voltmeter readings), V: A → R, so that v_β = V(β⁺) − V(β⁻) for each β ∈ B.
To relate KCL to KVL and to prove what is called Tellegen's theorem in network theory, we make a short excursion into linear algebra. Let E, F be vector spaces whose dual vector spaces (Chapter 9) are denoted by E*, F*, respectively. If u: E → F is a linear transformation, then its adjoint or dual is a linear map u*: F* → E* defined by u*(x)(y) = x(u(y)), where x ∈ F*, y ∈ E. (Here u*(x) is an element of E* and maps E → R.)
Now let φ be the natural bilinear map defined on the Cartesian product vector space E × E* with values in R: if (e, e*) ∈ E × E*, then φ(e, e*) = e*(e).

Proposition Let u: E → F be a linear map and let K = (Ker u) × (Im u*) ⊂ E × E*. Then φ is zero on K.

Proof. Let (e, e*) ∈ K, so that u(e) = 0 and e* = u*y for some y ∈ F*. Then

φ(e, e*) = φ(e, u*y) = (u*y)(e) = y(u(e)) = 0.

This proves the proposition.

Remark. A further argument shows that dim K = dim E.
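The proposition is easy to test numerically for a concrete u. In this sketch (illustrative code; the dimensions and random data are arbitrary choices of mine), the kernel of u is computed from the singular value decomposition, and the pairing of any kernel vector with any element of Im u* vanishes:

```python
# Numerical sketch of the proposition: for a linear map u, the pairing
# phi(e, e*) = e*(e) vanishes whenever u(e) = 0 and e* = u*(y).
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=(2, 4))            # a linear map u: R^4 -> R^2

# a basis of Ker u via the SVD: the rows of Vt beyond the rank span
# the kernel (u has rank 2 for generic data like this)
_, s, Vt = np.linalg.svd(u)
kernel = Vt[2:]
assert np.allclose(u @ kernel.T, 0, atol=1e-10)

y = rng.normal(size=2)
e_star = u.T @ y                       # an element of Im u* (u* acts as u^T)
for e in kernel:
    assert abs(e_star @ e) < 1e-10     # phi is zero on Ker u x Im u*
```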

We return to the analysis of the voltage and current states of a network. It turns out to be useful, as we shall see presently, to identify the space 𝒱 with the dual space 𝒢* of 𝒢. Mathematically this is no problem since both 𝒱 and 𝒢* are naturally isomorphic to R^b. With this identification, the voltage which a voltage state v ∈ 𝒢* assigns to the kth branch β is just v(i_β), where i_β ∈ 𝒢 is the vector whose kth coordinate is 1 and whose other coordinates are 0.
We can now express KVL more elegantly:

Theorem 2 A voltage state v ∈ 𝒢* satisfies KVL if and only if it is in the image of the adjoint d*: 𝒟* → 𝒢* of d: 𝒢 → 𝒟.

Proof. Suppose v satisfies Kirchhoff's voltage law. Then there is a voltage potential V mapping the set of nodes to the real numbers, with v(β) = V(β⁺) − V(β⁻) for each branch β. Recalling that 𝒟 = R^a, a = number of nodes, we define a linear map V̄: 𝒟 → R by

V̄(z₁, . . . , z_a) = Σ_{i=1}^a z_i V(α_i).

Thus V̄ ∈ 𝒟*.
To see that d*V̄ = v, consider first the current state i_β ∈ 𝒢 defined just before Theorem 2. Then

(d*V̄)(i_β) = V̄(d i_β) = V(β⁺) − V(β⁻) = v(β).

Since the states i_β, β ∈ B, form a basis for 𝒢, this shows that v = d*V̄. Hence v is in the image of d*.
Conversely, assume that v = d*W, W ∈ 𝒟*. For the kth node α define V(α) = W(f_α), where f_α ∈ 𝒟 has kth coordinate 1 and all other coordinates 0. Then V is a voltage potential for v, since the voltage which v assigns to the branch β is

v(i_β) = (d*W)(i_β) = W(d i_β) = W(f_{β⁺}) − W(f_{β⁻}) = V(β⁺) − V(β⁻).

This completes the proof of Theorem 2.

The space of unrestricted states of the circuit is the Cartesian space 𝒢 × 𝒢*. Those states which satisfy KCL and KVL constitute a linear subspace K ⊂ 𝒢 × 𝒢*. By Theorems 1 and 2,

K = Ker d × Im d* ⊂ 𝒢 × 𝒢*.

An actual or physical state of the network must lie in K.
The power φ in a network is a real function defined on the big state space 𝒢 × 𝒢* and in fact is just the natural pairing discussed earlier. Thus if (i, v) ∈ 𝒢 × 𝒢*, the power φ(i, v) = v(i), or in terms of Cartesian coordinates,

φ(i, v) = Σ_β i_β v_β, i = (i₁, . . . , i_b), v = (v₁, . . . , v_b).


The previous proposition gives us

Theorem 3 (Tellegen's theorem) The power is zero on states satisfying Kirchhoff's laws.

Mathematically this is the same thing as saying that φ: 𝒢 × 𝒢* → R restricted to K is zero.
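Tellegen's theorem can be checked concretely on a series RLC loop. In the sketch below (my own code; the incidence matrix and its orientations are my choice), any current state in Ker d and any voltage state in Im d* pair to zero power:

```python
# Sketch of Tellegen's theorem on a series RLC graph: a current state
# in Ker d and a voltage state in Im d* always pair to zero power.
import numpy as np

# incidence matrix of the loop; rows = nodes, columns = branches (R, L, C),
# +1 at each branch's (+) terminal and -1 at its (-) terminal (my choice)
d = np.array([[ 1,  0,  1],
              [-1,  1,  0],
              [ 0, -1, -1]])

i = 3.0 * np.array([1.0, 1.0, -1.0])   # i_R = i_L = -i_C, so d @ i = 0
assert np.allclose(d @ i, 0)           # KCL holds

V = np.array([0.7, -1.2, 0.4])         # an arbitrary voltage potential
v = d.T @ V                            # v_beta = V(beta+) - V(beta-): KVL holds
assert abs(i @ v) < 1e-12              # the power phi(i, v) = sum i_b v_b is 0
```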

Now we describe in mathematical terms the three different types of devices in the network: the resistor, inductor, and capacitor. These devices impose conditions on the state, or on how the state changes in time, in the corresponding branch.
Each resistor ρ imposes a relation on the current and voltage in its branch. This relation might be an equation of the form F_ρ(i_ρ, v_ρ) = 0; but for simplicity we will assume that (i_ρ, v_ρ) satisfy f_ρ(i_ρ) = v_ρ for some real-valued C¹ function f_ρ of a real variable. Thus f_ρ is a "generalized Ohm's law." The graph of f_ρ in the (i_ρ, v_ρ) plane is called the characteristic of the ρth resistor and is determined by the physical properties of the resistor. (Compare Section 1.) For example, a battery is a resistor in this context, and its characteristic is of the form {(i_ρ, v_ρ) ∈ R² | v_ρ = constant}.
An inductor or capacitor does not impose conditions directly on the state, but only on how the state in that branch changes in time. In particular let λ be an inductor branch with current, voltage in that branch denoted by i_λ, v_λ. Then the λth inductor imposes the condition

L_λ(i_λ) (di_λ/dt) = v_λ.

Here L_λ is determined by the inductor and is called the inductance. It is assumed to be a C¹ positive function of i_λ.
Similarly, a capacitor in the τth branch defines a C¹ positive function v_τ → C_τ(v_τ), called the capacitance; and the current, voltage in the τth branch satisfy

C_τ(v_τ) (dv_τ/dt) = i_τ.
We now examine the resistor conditions more carefully. These are conditions on the states themselves, and they have an effect similar to Kirchhoff's laws in that they place physical restrictions on the space of all states, 𝒢 × 𝒢*. We define Σ to be the subset of 𝒢 × 𝒢* consisting of states that satisfy the two Kirchhoff laws and the resistor conditions. This space Σ is called the space of physical states and is described by

Σ = {(i, v) ∈ 𝒢 × 𝒢* | (i, v) ∈ K, f_ρ(i_ρ) = v_ρ, ρ = 1, . . . , r}.

Here (i_ρ, v_ρ) denotes the components of i, v in the ρth branch, and ρ varies over the resistor branches, r in number.
Under rather generic conditions, Σ will be a manifold, that is, the higher dimensional analog of a surface. Differential equations can be defined on manifolds; the capacitors and inductors in our circuit will determine differential equations on Σ whose corresponding flow φ_t: Σ → Σ describes how a state changes with time.
Because we do not have at our disposal the notions of differentiable manifolds, we will make a simplifying assumption before proceeding to the differential equations of the circuit. This is the assumption that the space of currents in the inductors and voltages in the capacitors may be used to give coordinates to Σ. We make this more precise.
§5. MORE GENERAL CIRCUIT EQUATIONS

Let ℒ be the space of all currents in the inductor branches, so that ℒ is naturally
isomorphic to R^l, where l is the number of inductors. A point i of ℒ will be denoted
by i = (i₁, . . . , i_l), where i_λ is the current in the λth branch. There is a natural
map (a projection) i_ℒ: 𝒥 → ℒ which just sends a current state into its components
in the inductors.
Similarly we let 𝒞* be the space of all voltages in the capacitor branches, so
that 𝒞* is isomorphic to R^c, where c is the number of capacitors. Also v_𝒞: 𝒥* → 𝒞*
will denote the corresponding projection.
Consider the map i_ℒ × v_𝒞: 𝒥 × 𝒥* → ℒ × 𝒞* restricted to Σ ⊂ 𝒥 × 𝒥*. Call
this map π: Σ → ℒ × 𝒞*. (It will help in following this rather abstract presentation
to follow it along with the example in Section 1.)

Hypothesis The map π: Σ → ℒ × 𝒞* has an inverse which is a C¹ map

    π⁻¹: ℒ × 𝒞* → Σ ⊂ 𝒥 × 𝒥*.

Under this hypothesis, we may identify the space of physical states of the
network with the space ℒ × 𝒞*. This is convenient because, as we shall see, the
differential equations of the circuit have a simple formulation on ℒ × 𝒞*. In words
the hypothesis may be stated: the currents in the inductors and the voltages in
the capacitors, via Kirchhoff’s laws and the laws of the resistor characteristics,
determine the currents and voltages in all the branches.
Although this hypothesis is strong, it makes some sense when one realizes that
the “dimension” of Σ should be expected to be the same as the dimension of
ℒ × 𝒞*. This follows from the remark after the proposition on dim K, and the fact
that Σ is defined by r additional equations.
To state the equations in this case we define a function P: 𝒥 × 𝒥* → R called
the mixed potential. We will follow the convention that indices ρ refer to resistor
branches and sums over such ρ mean summation over the resistor branches.
Similarly λ is used for inductor branches and γ for capacitor branches. Then
P: 𝒥 × 𝒥* → R is defined by

    P(i, v) = ∑_γ i_γ v_γ + ∑_ρ ∫ f_ρ(i_ρ) di_ρ.

Here the integral refers to the indefinite integral, so that P is defined only up to an
arbitrary constant. Now P by restriction may be considered as a map P: Σ → R
and finally by our hypothesis may even be considered as a map

    P: ℒ × 𝒞* → R.

(By an “abuse of language” we use the same letter P for all three maps.)
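The mixed potential is easy to compute in a concrete case. The sketch below assumes a hypothetical one-loop series circuit of the kind treated in Section 1 (one inductor, one capacitor, one resistor, so the capacitor carries the inductor current i), with resistor characteristic f(i) = i³ − i chosen purely for illustration; none of these specific choices are fixed by the text.

```python
# Mixed potential P(i, v) = sum_gamma i_gamma*v_gamma + sum_rho INT f_rho di_rho
# for a hypothetical series RLC loop.  With one capacitor carrying the loop
# current i, the first sum reduces to i*v.  The characteristic f(i) = i^3 - i
# is an assumption made here for illustration only.

def f(i):
    # generalized Ohm's law for the single resistor: v_rho = f(i_rho)
    return i**3 - i

def mixed_potential(i, v, const=0.0):
    # indefinite integral of f is i^4/4 - i^2/2; `const` is the arbitrary
    # constant that the text says P is only defined up to
    return i * v + i**4 / 4 - i**2 / 2 + const

print(mixed_potential(1.0, 2.0))
# Shifting the arbitrary constant never changes differences of P:
print(mixed_potential(1.0, 2.0, 5.0) - mixed_potential(0.0, 0.0, 5.0))
```

Only differences (and derivatives) of P enter the circuit equations below, which is why the arbitrary constant is harmless.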
Now assume we have a particular circuit of the type we have been considering.
At a given instant t₀ the circuit is in a particular current-voltage state. The states
will change as time goes on. In this way a curve in 𝒥 × 𝒥* is obtained, depending
on the initial state of the circuit.
10. DIFFERENTIAL EQUATIONS FOR ELECTRICAL CIRCUITS

The components i_β(t), v_β(t), β ∈ B of this curve must satisfy the conditions
imposed by Kirchhoff’s laws and the resistor characteristics; that is, they must
be in Σ. In addition at each instant of time the components di_λ/dt and dv_γ/dt of
the tangent vectors of the curve must satisfy the relations imposed by (1a) and
(1b). A curve satisfying these conditions we call a physical trajectory.
If the circuit satisfies our special hypothesis, each physical trajectory is
identified with a curve in ℒ × 𝒞*. The following theorem says that the curves so obtained
are exactly the solution curves of a certain system of differential equations in
ℒ × 𝒞*:

Theorem 4 (Brayton-Moser) Each physical trajectory of an electrical circuit
satisfying the special hypothesis is a solution curve of the system

    L_λ(i_λ) di_λ/dt = −∂P/∂i_λ,

    C_γ(v_γ) dv_γ/dt = ∂P/∂v_γ,

where λ and γ run through all inductors and capacitors of the circuit respectively.
Conversely, every solution curve to these equations is a physical trajectory.

Here P is the map ℒ × 𝒞* → R defined above. The right-hand sides of the
differential equations are thus functions of all the i_λ, v_γ.

Proof. Consider an arbitrary C¹ curve in ℒ × 𝒞*. Because of our hypothesis
we identify ℒ × 𝒞* with Σ ⊂ 𝒥 × 𝒥*; hence we write the curve

    t → (i(t), v(t)) ∈ 𝒥 × 𝒥*.

By Kirchhoff’s law (Theorem 1), i(t) ∈ Ker d. Hence i′(t) ∈ Ker d. By Theorem 2,
v(t) ∈ Im d*. By Tellegen’s theorem, for all t

    ∑_{β∈B} v_β(t) i_β′(t) = 0.

We rewrite this as

    ∑_ρ v_ρ i_ρ′ + ∑_λ v_λ i_λ′ + ∑_γ v_γ i_γ′ = 0.

From Leibniz’ rule we get

    ∑_γ v_γ i_γ′ = (∑_γ v_γ i_γ)′ − ∑_γ i_γ v_γ′.

Substituting this into the preceding equation gives

    dP/dt = −∑_λ v_λ i_λ′ + ∑_γ i_γ v_γ′

from the definition of P and the generalized Ohm’s laws. By the chain rule

    dP/dt = ∑_λ (∂P/∂i_λ) i_λ′ + ∑_γ (∂P/∂v_γ) v_γ′.

From the last two equations we find

    ∑_λ (∂P/∂i_λ + v_λ) i_λ′ + ∑_γ (∂P/∂v_γ − i_γ) v_γ′ = 0.

Since i_λ′ and v_γ′ can take any values,

    ∂P/∂i_λ = −v_λ,    ∂P/∂v_γ = i_γ.

The theorem now follows from (1a) and (1b).

Some remarks on this theorem are in order. First, one can follow this develop-
ment for the example of Section 1 to bring the generality of the above down to
earth. Secondly, note that if there are either no inductors or no capacitors, the
Brayton-Moser equations have many features of gradient equations and much of
the material of Chapter 9 can be applied; see Problem 9. In the more general case
the equations have the character of a gradient with respect to an indefinite metric.
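The Brayton-Moser equations can be integrated numerically. The sketch below assumes a hypothetical one-inductor, one-capacitor series circuit with L = C = 1 and resistor characteristic f(i) = i³ − i (all of these choices are illustrative assumptions, not taken from the text). For that circuit P(i, v) = iv + i⁴/4 − i²/2, so the system reads L di/dt = −∂P/∂i = −(v + f(i)), C dv/dt = ∂P/∂v = i; the code checks the right-hand sides against finite-difference gradients of P and integrates the resulting flow.

```python
# Numerical sketch of the Brayton-Moser equations for a hypothetical series
# circuit with L = C = 1 and resistor characteristic f(i) = i^3 - i.

def P(i, v):
    # mixed potential for this circuit: i*v + indefinite integral of f
    return i * v + i**4 / 4 - i**2 / 2

def rhs(i, v):
    # Brayton-Moser right-hand sides:  i' = -dP/di,  v' = dP/dv
    return -(v + i**3 - i), i

def step_rk4(i, v, h):
    # one classical Runge-Kutta step for the system above
    k1 = rhs(i, v)
    k2 = rhs(i + h/2*k1[0], v + h/2*k1[1])
    k3 = rhs(i + h/2*k2[0], v + h/2*k2[1])
    k4 = rhs(i + h*k3[0], v + h*k3[1])
    return (i + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            v + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

# sanity check: the right-hand sides agree with finite-difference gradients of P
eps = 1e-6
i0, v0 = 0.7, -0.3
dPdi = (P(i0 + eps, v0) - P(i0 - eps, v0)) / (2 * eps)
dPdv = (P(i0, v0 + eps) - P(i0, v0 - eps)) / (2 * eps)
print(abs(rhs(i0, v0)[0] + dPdi) < 1e-6, abs(rhs(i0, v0)[1] - dPdv) < 1e-6)

# integrate from a small initial state; the trajectory stays bounded
i, v = 0.1, 0.0
for _ in range(20000):
    i, v = step_rk4(i, v, 0.01)
print(abs(i) < 10 and abs(v) < 10)
```

For this particular choice of f the system is a Liénard-type equation of the kind studied earlier in the chapter, which is why the trajectory remains bounded rather than running off to infinity.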
We add some final remarks on an energy theorem. Suppose for simplicity that
all the L_λ and C_γ are constant and let

    W: ℒ × 𝒞* → R

be the function W(i, v) = ½ ∑_λ L_λ i_λ² + ½ ∑_γ C_γ v_γ². Thus W has the form of a
norm squared and its level surfaces are generalized ellipsoids; W may be interpreted
as the energy in the inductor and capacitor branches. Define P_R: ℒ × 𝒞* → R
(power in the resistors) to be the composition

    ℒ × 𝒞* → Σ ⊂ 𝒥 × 𝒥* → R,

where P_R(i, v) = ∑_ρ i_ρ v_ρ (summed over resistor branches). We state without proof:

Theorem 5 Let φ(t) be any solution of the equations of the previous theorem. Then

    (d/dt) W(φ(t)) = −P_R(φ(t)).

Theorem 5 may be interpreted as asserting that in a circuit the energy in the
inductors and capacitors varies according to the power dissipated in the resistors.
See the early sections where W appeared and was used in the analysis of Liénard’s
equation. Theorem 5 provides criteria for asymptotic stability in circuits.
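The energy relation can be verified directly in a concrete case. The sketch below again assumes a hypothetical series circuit with L = C = 1 and resistor characteristic f(i) = i³ − i (illustrative assumptions, as before); the resistor then carries the loop current i, so W(i, v) = (i² + v²)/2 and P_R(i, v) = i·f(i), and the chain rule gives dW/dt = −i·f(i) identically.

```python
# Check of the energy identity dW/dt = -P_R for a hypothetical series circuit
# with L = C = 1 and resistor characteristic f(i) = i^3 - i.

def f(i):
    return i**3 - i

def rhs(i, v):
    # circuit equations for this example:  i' = -(v + f(i)),  v' = i
    return -(v + f(i)), i

def dW_dt(i, v):
    # chain rule applied to W(i, v) = (i^2 + v^2)/2:  dW/dt = i*i' + v*v'
    di, dv = rhs(i, v)
    return i * di + v * dv

# dW/dt + i*f(i) should vanish at every state: energy decreases exactly when
# the resistor dissipates power (i*f(i) > 0) and grows where the "active"
# part of the characteristic feeds energy in (i*f(i) < 0).
for (i, v) in [(0.5, 1.0), (2.0, -1.0), (-1.5, 0.3)]:
    print(abs(dW_dt(i, v) + i * f(i)) < 1e-9)
```

Note that for this characteristic i·f(i) is negative for small |i|, so W can grow near the origin; this is the mechanism that sustains the oscillation in Liénard-type circuits.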

PROBLEMS

1. Let N be a finite set and P ⊂ N × N a symmetric binary relation on N (that is,
(x, y) ∈ P if (y, x) ∈ P). Suppose x ≠ y for all (x, y) ∈ P. Show that there
is a linear graph in R³ whose nodes are in one-to-one correspondence with N,
such that the two nodes corresponding to x, y are joined by a branch if and
only if (x, y) ∈ P.
2. Show that Kirchhoff’s voltage law as stated in the text is equivalent to the
following condition (“the voltage drop around a loop is zero”): Let a₀, a₁, . . . ,
a_k = a₀ be nodes such that a_m and a_{m−1} are end points of a branch B_m, m =
1, . . . , k. Then

    ∑_{m=1}^{k} ε_m v(B_m) = 0,

where ε_m = ±1 according as (B_m)⁺ = a_m or a_{m−1}.


3. Prove that dim K = dim E (see the proposition in the text and the remark
after it).

4. Prove Theorem 5.

5. Consider resistors whose characteristic is of the form F(i_ρ, v_ρ) = 0, where F is
a real-valued C¹ function. Show that an RLC circuit (Fig. A) with this kind of
resistor satisfies the special hypothesis if and only if the resistor is current
controlled, that is, F has the form

FIG. A

FIG. B

6. Show that the differential equations for the circuit in Fig. B are given by:

    L di_λ/dt = −(v_γ + f(i_λ)),

    C dv_γ/dt = i_λ.

FIG. C

linear resistor, and the box is a resistor with characteristic given by i = f(v).
Find the mixed potential and the phase portrait for some choice of f. See
Problem 7.
9. We refer to the Brayton-Moser equations. Suppose there are no capacitors.
(a) Show that the function P: d: --* R decreases along nonequilibrium tra-
jectories of the Brayton-Moser equations.
(b) Let n be the number of inductors. If each function L, is a constant,
find an inner product on Rn = d: which makes the vector

the gradient of P in the sense of Chapter 9, Section 5.

Notes

This chapter follows to a large extent “Mathematical foundations of electrical
circuits” by Smale in the Journal of Differential Geometry [22]. The text on
electrical circuit theory by Desoer and Kuh [5] is excellent for a treatment
of many related subjects. Hartman’s book [9], mentioned also in Chapter 11,
goes extensively into the material of our Sections 2 and 3 with many historical
references. Lefschetz’s book Differential Equations: Geometric Theory [14] also
discusses these nonlinear planar equations. Van der Pol himself related his equation
to heartbeat and recently E. C. Zeeman has done very interesting work on this
subject. For some physical background of circuit theory, one can see The Feynman
Lectures on Physics [8].
Chapter 11
The Poincaré-Bendixson Theorem

We have already seen how periodic solutions in planar dynamical systems play
an important role in electrical circuit theory. In fact the periodic solution in Van
der Pol’s equation, coming from the simple circuit equation in the previous chapter,
has features that go well beyond circuit theory. This periodic solution is a “limit
cycle,” a concept we make precise in this chapter.
The Poincaré-Bendixson theorem gives a criterion for the detection of limit
cycles in the plane; this criterion could have been used to find the Van der Pol
oscillator. On the other hand, this approach would have missed the uniqueness.
Poincaré-Bendixson is a basic tool for understanding planar dynamical systems,
but for differential equations in higher dimensions it has no generalization or
counterpart. Thus after the first two rather basic sections, we restrict ourselves to
planar dynamical systems. The first section gives some properties of the limiting
behavior of orbits on the level of abstract topological dynamics, while in the next
section we analyze the flow near nonequilibrium points of a dynamical system.
Throughout this chapter we consider a dynamical system on an open set W in a
vector space E, that is, the flow φ_t defined by a C¹ vector field f: W → E.

§1. Limit Sets

We recall from Chapter 9, Section 3 that y ∈ W is an ω-limit point of x ∈ W
if there is a sequence t_n → ∞ such that lim_{n→∞} φ_{t_n}(x) = y. The set of all
ω-limit points of x is the ω-limit set L_ω(x). We define α-limit points and the α-limit
set L_α(x) by replacing t_n → ∞ with t_n → −∞ in the above definition. By a limit
set we mean a set of the form L_ω(x) or L_α(x).


Here are some examples of limit sets. If z is an asymptotically stable
equilibrium, it is the ω-limit set of every point in its basin (see Chapter 9, Section 2).
Any equilibrium is its own α-limit set and ω-limit set. A closed orbit is the α-limit
and ω-limit set of every point on it. In the Van der Pol oscillator there is a unique
closed orbit γ; it is the ω-limit set of every point except the origin (Fig. A). The
origin is the α-limit set of every point inside γ. If y is outside γ, then L_α(y) is empty.

FIG. A

FIG. B

There are examples of limit sets that are neither closed orbits nor equilibria, for
example the figure 8 in the flow suggested by Fig. B. There are three equilibria: two
sources and one saddle. The figure 8 is the ω-limit set of all points outside it. The
right half of the 8 is the ω-limit set of all points inside it except the equilibrium, and
similarly for the left half.
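The Van der Pol example can be explored numerically. The sketch below assumes the Liénard form x′ = y − (x³ − x), y′ = −x for Van der Pol's equation (a common normalization, not fixed by the text) and integrates two trajectories, one started near the origin and one started far outside; both settle onto oscillations of the same amplitude, consistent with a single closed orbit being the ω-limit set of both.

```python
# Numerical illustration (not a proof): for Van der Pol's equation in the
# Lienard form x' = y - (x^3 - x), y' = -x, trajectories started inside and
# outside the closed orbit approach the same cycle, their common omega-limit set.

def rhs(x, y):
    return y - (x**3 - x), -x

def rk4(x, y, h, n):
    # n classical Runge-Kutta steps of size h
    for _ in range(n):
        k1 = rhs(x, y)
        k2 = rhs(x + h/2*k1[0], y + h/2*k1[1])
        k3 = rhs(x + h/2*k2[0], y + h/2*k2[1])
        k4 = rhs(x + h*k3[0], y + h*k3[1])
        x += h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        y += h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
    return x, y

def late_amplitude(x0, y0):
    # run past the transient (t = 200), then record max |x| over t = 50 more
    x, y = rk4(x0, y0, 0.01, 20000)
    amp = 0.0
    for _ in range(5000):
        x, y = rk4(x, y, 0.01, 1)
        amp = max(amp, abs(x))
    return amp

inner = late_amplitude(0.05, 0.0)   # starts inside the closed orbit
outer = late_amplitude(3.0, 3.0)    # starts outside it
print(abs(inner - outer) < 0.05)    # both have settled onto the same cycle
```

That the two late-time amplitudes agree is exactly the "limit cycle" behavior the chapter goes on to make precise.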
In three dimensions there are extremely complicated examples of limit sets,
although they are not easy to describe. In the plane, however, limit sets are fairly
simple. In fact Fig. B is typical, in that one can show that a limit set other than a
closed orbit or equilibrium is made up of equilibria and trajectories joining them.
The Poincaré-Bendixson theorem says that if a compact limit set in the plane
contains no equilibria it is a closed orbit.
We recall from Chapter 9 that a limit set is closed in W, and is invariant under
the flow. We shall also need the following result:

Proposition (a) If x and z are on the same trajectory, then L_ω(x) = L_ω(z);
similarly for α-limits.
(b) If D is a closed positively invariant set and z ∈ D, then L_ω(z) ⊂ D; similarly
for negatively invariant sets and α-limits.
(c) A closed invariant set, in particular a limit set, contains the α-limit and ω-limit
sets of every point in it.

Proof. (a) Suppose y ∈ L_ω(x), and φ_s(x) = z. If φ_{t_n}(x) → y, then φ_{t_n−s}(z) → y.
Hence y ∈ L_ω(z).
(b) If t_n → ∞ and φ_{t_n}(z) → y ∈ L_ω(z), then t_n ≥ 0 for sufficiently large n
so that φ_{t_n}(z) ∈ D. Hence y ∈ D̄ = D.
(c) Follows from (b).

PROBLEMS

1. Show that a compact limit set is connected (that is, not the union of two disjoint
nonempty closed sets).
2. Identify R⁴ with C² having two complex coordinates (w, z), and consider the
linear system

    (*)    w′ = 2πi w,
           z′ = 2πθi z,

where θ is an irrational real number.
(a) Put α = e^{2πθi} and show that the set {αⁿ | n = 1, 2, . . .} is dense in the
unit circle C = {z ∈ C | |z| = 1}.
(b) Let φ_t be the flow of (*). Show that for n an integer,
    φ_n(w, z) = (w, αⁿz).
(c) Let (w₀, z₀) belong to the torus C × C ⊂ C². Use (a), (b) to show that
    L_ω(w₀, z₀) = L_α(w₀, z₀) = C × C.
(d) Find L_ω and L_α of an arbitrary point of C².
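Part (a) of Problem 2 can be illustrated (though of course not proved) numerically: for irrational θ the angles nθ (mod 1), i.e. the powers αⁿ on the unit circle, appear to fill the circle densely. The sketch below takes θ = √2 as an example and checks that the first few thousand multiples leave no gap of length 0.01 on the circle.

```python
import math

# Numerical illustration of the density of {alpha^n} for alpha = e^{2*pi*theta*i}
# with theta irrational; theta = sqrt(2) is chosen here as an example.
theta = math.sqrt(2)
angles = sorted((n * theta) % 1.0 for n in range(1, 5000))

# largest gap between consecutive visited angles, wrapping around the circle
gaps = [b - a for a, b in zip(angles, angles[1:])]
gaps.append(angles[0] + 1.0 - angles[-1])
print(max(gaps) < 0.01)   # every arc of length 0.01 has been visited
```

By contrast, for rational θ (Problem 4) the same experiment produces only finitely many distinct angles, however many multiples are taken.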
3. Find a linear system on R^{2k} = C^k such that if a belongs to the k-torus
C × ⋯ × C ⊂ C^k, then

    L_ω(a) = L_α(a) = C × ⋯ × C.

4. In Problem 2, suppose instead that θ is rational. Identify L_ω and L_α of every
point.
5. Let X be a nonempty compact invariant set for a C¹ dynamical system. Suppose
that X is minimal, that is, X contains no compact invariant nonempty proper
subset. Prove the following:
(a) Every trajectory in X is dense in X;
(b) L_α(x) = L_ω(x) = X for each x ∈ X;
(c) For any (relatively) open set U ⊂ X, there is a number P > 0 such that
for any x ∈ X, t₀ ∈ R, there exists t such that φ_t(x) ∈ U and |t − t₀| ≤ P;

(d) For any x, y in X there are sequences t_n → ∞, s_n → −∞ such that

    |t_n − t_{n+1}| < 2P,    |s_n − s_{n+1}| < 2P,

and

    φ_{t_n}(x) → y,    φ_{s_n}(x) → y.

6. Let X be a closed invariant set for a C¹ dynamical system on Rⁿ, such that
φ_t(x) is defined for all t ∈ R, x ∈ X. Suppose that L_ω(x) = L_α(x) = X for
all x ∈ X. Prove that X is compact.

§2. Local Sections and Flow Boxes

We consider again the flow φ_t of the C¹ vector field f: W → E. Suppose the origin
0 ∈ E belongs to W.
A local section at 0 of f is an open set S containing 0 in a hyperplane H ⊂ E which
is transverse to f. By a hyperplane we mean a linear subspace whose dimension
is one less than dim E. To say that S ⊂ H is transverse to f means that f(x) ∉ H
for all x ∈ S. In particular f(x) ≠ 0 for x ∈ S.
Our first use of a local section at 0 will be to construct a “flow box” in a
neighborhood of 0. A flow box gives a complete description of a flow in a neighborhood of
any nonequilibrium point of any flow, by means of special (nonlinear) coordinates.
The description is simple: points move in parallel straight lines at constant speed.
We make this precise as follows. A diffeomorphism Ψ: U → V is a differentiable
map from one open set of a vector space to another with a differentiable inverse.
A flow box is a diffeomorphism

    Ψ: N → W,    N ⊂ R × H,

of a neighborhood N of (0, 0) onto a neighborhood of 0 in W, which transforms
the vector field f: W → E into the constant vector field (1, 0) on R × H. The flow
of f is thereby converted to a simple flow on R × H:

    ψ_s(t, y) = (t + s, y).

The map Ψ is defined by

    Ψ(t, y) = φ_t(y),

for (t, y) in a sufficiently small neighborhood of (0, 0) in R × H. One appeals to
Chapter 15 to see that Ψ is a C¹ map. The derivative of Ψ at (0, 0) is easily computed
to be the linear map which is the identity on 0 × H, and on R = R × 0 it sends
1 to f(0). Since f(0) is transverse to H, it follows that DΨ(0, 0) is an isomorphism.
Hence by the inverse function theorem Ψ maps an open neighborhood N of (0, 0)
diffeomorphically onto a neighborhood V of 0 in W. We take N of the form


FIG. A. The flow box.

S × (−σ, σ), where S ⊂ H is a section at 0 and σ > 0. In this case we sometimes
write V_σ = Ψ(N) and call V_σ a flow box at (or about) 0 in E. See Fig. A. An
important property of a flow box is that if x ∈ V_σ, then φ_t(x) ∈ S for a unique
t ∈ (−σ, σ).
From the definition of Ψ it follows that if Ψ⁻¹(p) = (s, y), then Ψ⁻¹(φ_t(p)) =
(s + t, y) for sufficiently small |s|, |t|.
We remark that a flow box can be defined about any nonequilibrium point x₀.
The assumption that x₀ = 0 is no real restriction since if x₀ is any point, one can
replace f(x) by f(x + x₀) to convert the point to 0.
If S is a local section, the trajectory through a point z₀ (perhaps far from S) may
reach 0 ∈ S in a certain time t₀; see Fig. B. We show that in a certain local sense, t₀
is a continuous function of z₀. More precisely:

FIG. B

Proposition Let S be a local section at 0 as above, and suppose φ_{t₀}(z₀) = 0. There
is an open set U ⊂ W containing z₀ and a unique C¹ map τ: U → R such that
τ(z₀) = t₀ and

    φ_{τ(x)}(x) ∈ S

for all x ∈ U.

Proof. Let h: E → R be a linear map whose kernel H is the hyperplane
containing S. Then h(f(0)) ≠ 0. The function

    G(x, t) = h(φ_t(x))

is C¹, and G(z₀, t₀) = 0, while

    ∂G/∂t (z₀, t₀) = h(f(0)) ≠ 0.

By the implicit function theorem there is a unique C¹ map x → τ(x) ∈ R defined
on a neighborhood U₁ of z₀ in W such that τ(z₀) = t₀ and G(x, τ(x)) = 0. Hence
φ_{τ(x)}(x) ∈ H; if U ⊂ U₁ is a sufficiently small neighborhood of z₀ then φ_{τ(x)}(x) ∈ S.
This proves the proposition.
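The crossing time τ(x) of the proposition is easy to compute numerically. As an illustration (with all specific choices assumed here, not taken from the text), take the rotation flow x′ = −y, y′ = x in the plane, the section S = positive x-axis inside the hyperplane H = ker h with h(x, y) = y, and solve G(z, t) = h(φ_t(z)) = 0 by bisection; nearby starting points are seen to have nearby crossing times, as the proposition asserts.

```python
import math

# Crossing time for the rotation flow phi_t(x, y) = (x cos t - y sin t,
# x sin t + y cos t), section S = positive x-axis, h(x, y) = y.

def flow(z, t):
    x, y = z
    return (x*math.cos(t) - y*math.sin(t), x*math.sin(t) + y*math.cos(t))

def h(p):
    return p[1]

def tau(z, t_lo=0.0, t_hi=math.pi):
    # bisection on G(z, t) = h(flow(z, t)); assumes a sign change on [t_lo, t_hi]
    g_lo = h(flow(z, t_lo))
    for _ in range(60):
        mid = (t_lo + t_hi) / 2
        if (h(flow(z, mid)) > 0) == (g_lo > 0):
            t_lo = mid
        else:
            t_hi = mid
    return (t_lo + t_hi) / 2

t1 = tau((0.0, -1.0))    # exact crossing time is pi/2
t2 = tau((0.1, -1.0))    # a nearby start crosses at a nearby time
print(abs(t1 - math.pi/2) < 1e-9, abs(t2 - t1) < 0.2)
```

For the second starting point the crossing time solves tan t = 10, i.e. t = arctan 10 ≈ 1.471, close to π/2 ≈ 1.571, a small numerical instance of the continuity of τ.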

For later reference note that

§3. Monotone Sequences in Planar Dynamical Systems

We now restrict our discussion to planar dynamical systems.
Let x₀, x₁, . . . be a finite or infinite sequence of distinct points on the solution
curve C = {φ_t(x₀) | 0 ≤ t ≤ α}. We say the sequence is monotone along the
trajectory if φ_{t_n}(x₀) = x_n with 0 ≤ t₁ < t₂ < ⋯ ≤ α.
Let y₀, y₁, . . . be a finite or infinite sequence of points on a line segment I in R².
We say the sequence is monotone along I if the vector y_n − y₀ is a scalar multiple
λ_n(y₁ − y₀) with 1 < λ₂ < λ₃ < ⋯, n = 2, 3, . . . . Another way to say this is that
y_n is between y_{n−1} and y_{n+1} in the natural order along I, n = 1, 2, . . . .

FIG. A

FIG. B

A sequence of points may be on the intersection of a solution curve and a segment
I; they may be monotone along the solution curve but not along the segment, or
vice versa; see Fig. A. However, this is impossible if the segment is a local section.
Figure B shows an example; we suggest the reader experiment with paper and
pencil!

Proposition 1 Let S be a local section of a C¹ planar dynamical system and y₀, y₁,
y₂, . . . a sequence of distinct points of S that are on the same solution curve C. If the
sequence is monotone along C, it is also monotone along S.

Proof. It suffices to consider three points y₀, y₁, y₂. Let Z be the simple closed
curve made up of the part B of C between y₀ and y₁ and the segment T ⊂ S between
y₀ and y₁. Let D be the closed bounded region bounded by Z. We suppose that the
trajectory of y₁ leaves D at y₁ (Fig. C); if it enters, the argument is similar.
We assert that at any point of T the trajectory leaves D. For it either leaves or
enters because, T being transverse to the flow, it crosses the boundary of D. The
set of points in T whose trajectory leaves D is a nonempty open subset T₋ ⊂ T, by

FIG. C

continuity of the flow; the set T₊ ⊂ T where trajectories enter D is also open in
T. Since T₋ and T₊ are disjoint and T = T₋ ∪ T₊, it follows from connectedness
of the interval that T₊ must be empty.
It follows that the complement of D is positively invariant. For no trajectory
can enter D at a point of T; nor can it cross B, by uniqueness of solutions.
Therefore φ_t(y₁) ∈ R² − D for all t > 0. In particular, y₂ ∈ S − T.
The set S − T is the union of two half-open intervals I₀ and I₁ with y₀ an
endpoint of I₀ and y₁ an endpoint of I₁. One can draw an arc from a point φ_t(y₁) (with
t > 0 very small) to a point of I₁, without crossing Z. Therefore I₁ is outside D.
Similarly I₀ is inside D. It follows that y₂ ∈ I₁ since it must be outside D. This
shows that y₁ is between y₀ and y₂ in I, proving Proposition 1.

We come to an important property of limit points.

Proposition 2 Let y ∈ L_ω(x) ∪ L_α(x). Then the trajectory of y crosses any local
section at not more than one point.

Proof. Suppose y₁ and y₂ are distinct points on the trajectory of y and S is a
local section containing y₁ and y₂. Suppose y ∈ L_ω(x) (the argument for L_α(x) is
similar). Then y_k ∈ L_ω(x), k = 1, 2. Let V(k) be flow boxes at y_k defined by some
intervals J_k ⊂ S; we assume J₁ and J₂ disjoint (Fig. D). The trajectory of x enters
V(k) infinitely often; hence it crosses J_k infinitely often. Hence there is a sequence

    a₁, b₁, a₂, b₂, a₃, b₃, . . . ,

FIG. D

which is monotone along the trajectory of x, with a_n ∈ J₁, b_n ∈ J₂, n = 1, 2, . . . .
But such a sequence cannot be monotone along S since J₁ and J₂ are disjoint,
contradicting Proposition 1.

PROBLEMS

1. Let A ⊂ R² be the annulus

    A = {z ∈ R² | 1 ≤ |z| ≤ 2}.

Let f be a C¹ vector field on a neighborhood of A which points inward along
the two boundary circles of A. Suppose also that every radial segment of A
is a local section (Fig. E). Prove there is a periodic trajectory in A.

FIG. E

(Hint: Let S be a radial segment. Show that if z ∈ S then φ_t(z) ∈ S for a
smallest t = t(z) > 0. Consider the map S → S given by z → φ_{t(z)}(z).)
2. Show that a closed orbit of a planar C¹ dynamical system meets a local section
in at most one point.
3. Let W ⊂ R² be open and let f: W → R² be a C¹ vector field with no equilibria.
Let J ⊂ W be an open line segment whose end points are in the boundary of
W. Suppose J is a global section in the sense that f is transverse to J, and for
any x ∈ W there exist s < 0 and t > 0 such that φ_s(x) ∈ J and φ_t(x) ∈ J.
Prove the following statements.
(a) For any x ∈ J let τ(x) ∈ R be the smallest positive number such that
F(x) = φ_{τ(x)}(x) ∈ J; this map F: J → J is C¹ and has a C¹ inverse.
(b) A point x ∈ J lies on a closed orbit if and only if F(x) = x.
(c) Every limit set is a closed orbit.

4. Let z be a recurrent point of a C¹ planar dynamical system, that is, there is a
sequence t_n → ±∞ such that

    φ_{t_n}(z) → z.

(a) Prove that either z is an equilibrium or z lies on a closed orbit.
(b) Show by example that there can be a recurrent point for higher
dimensional systems that is not an equilibrium and does not lie on a closed orbit.

§4. The Poincaré-Bendixson Theorem

By a closed orbit of a dynamical system we mean the image of a nontrivial periodic
solution. Thus a trajectory γ is a closed orbit if γ is not an equilibrium and
φ_p(x) = x for some x ∈ γ, p ≠ 0. It follows that φ_{np}(y) = y for all y ∈ γ,
n = 0, ±1, ±2, . . . .
In this section we complete the proof of a celebrated result:

Theorem (Poincaré-Bendixson) A nonempty compact limit set of a C¹ planar
dynamical system, which contains no equilibrium point, is a closed orbit.

Proof. Assume L_ω(x) is compact and y ∈ L_ω(x). (The case of α-limit sets is
similar.) We show first that the trajectory of y is a closed orbit.
Since y belongs to the compact invariant set L_ω(x) we know that L_ω(y) is a
nonempty subset of L_ω(x). Let z ∈ L_ω(y); let S be a local section at z, and N a
flow box neighborhood of z about some open interval J, z ∈ J ⊂ S. By Proposition
2 of the previous section, the trajectory of y meets S at exactly one point. On the
other hand, there is a sequence t_n → ∞ such that φ_{t_n}(y) → z; hence infinitely many
φ_{t_n}(y) belong to N. Therefore we can find r, s ∈ R such that r > s and

    φ_r(y) ∈ J,    φ_s(y) ∈ J.

It follows that φ_r(y) = φ_s(y); hence φ_{r−s}(y) = y, r − s > 0. Since L_ω(x) contains
no equilibrium, y belongs to a closed orbit.
It remains to prove that if γ is a closed orbit in L_ω(x) then γ = L_ω(x). It is
enough to show that

    lim_{t→∞} d(φ_t(x), γ) = 0,

where d(φ_t(x), γ) is the distance from φ_t(x) to the compact set γ (that is, the
distance from φ_t(x) to the nearest point of γ).

Let S be a local section at z ∈ γ, so small that S ∩ γ = {z}. By looking at a flow
box V_σ near z we see that there is a sequence t₀ < t₁ < ⋯ such that

    φ_{t_n}(x) ∈ S,
    φ_{t_n}(x) → z,
    φ_t(x) ∉ S    for t_{n−1} < t < t_n, n = 1, 2, . . . .

Put x_n = φ_{t_n}(x). By Proposition 1, Section 3, x_n → z monotonically in S.
There exists an upper bound for the set of positive numbers t_{n+1} − t_n. For
suppose φ_λ(z) = z, λ > 0. Then for x_n sufficiently near z, φ_λ(x_n) ∈ V_σ and hence

    φ_{λ+t}(x_n) ∈ S

for some t ∈ [−σ, σ]. Thus

    t_{n+1} − t_n ≤ λ + σ.

Let ε > 0. From Chapter 8, there exists δ > 0 such that if |x_n − u| < δ and
|t| ≤ λ + σ then |φ_t(x_n) − φ_t(u)| < ε.
Let n₀ be so large that |x_n − z| < δ for all n ≥ n₀. Then

    |φ_t(x_n) − φ_t(z)| < ε

if |t| ≤ λ + σ and n ≥ n₀. Now let t ≥ t_{n₀}. Let n ≥ n₀ be such that

    t_n ≤ t ≤ t_{n+1}.

Then

    d(φ_t(x), γ) ≤ |φ_t(x) − φ_{t−t_n}(z)|
                = |φ_{t−t_n}(x_n) − φ_{t−t_n}(z)|
                < ε

since |t − t_n| ≤ λ + σ. The proof of the Poincaré-Bendixson theorem is complete.

PROBLEMS

1. Consider a C¹ dynamical system in R² having only a finite number of equilibria.
(a) Show that every limit set is either a closed orbit or the union of equilibria
and trajectories φ_t(x) such that lim_{t→∞} φ_t(x) and lim_{t→−∞} φ_t(x) are
equilibria.
(b) Show by example (draw a picture) that the number of distinct trajectories
in L_ω(x) may be infinite.
2. Let γ be a closed orbit of a C¹ dynamical system on an open set in R². Let λ
be the period of γ. Let {γ_n} be a sequence of closed orbits; suppose the period
of γ_n is λ_n. If there are points x_n ∈ γ_n such that x_n → x ∈ γ, prove that λ_n → λ.
(This result can be false for higher dimensional systems. It is true, however, that
if λ_n → μ, then μ is an integer multiple of λ.)

§5. Applications of the Poincaré-Bendixson Theorem

We continue to suppose given a planar dynamical system.

Definition A limit cycle is a closed orbit γ such that γ ⊂ L_ω(x) or γ ⊂ L_α(x)
for some x ∉ γ. In the first case γ is called an ω-limit cycle; in the second case, an
α-limit cycle.

In the proof of the Poincaré-Bendixson theorem it was shown that limit cycles
enjoy a certain property not shared by other closed orbits: if γ is an ω-limit cycle,
there exists x ∉ γ such that

    lim_{t→∞} d(φ_t(x), γ) = 0.

For an α-limit cycle replace ∞ by −∞. Geometrically this means that some
trajectory spirals toward γ as t → ∞ (for ω-limit cycles) or as t → −∞ (for α-limit
cycles). See Fig. A.

FIG. A. γ is an ω-limit cycle.

Limit cycles possess a kind of one-sided stability. Suppose γ is an ω-limit cycle
and let φ_t(x) spiral toward γ as t → ∞. Let S be a local section at z ∈ γ. Then there
will be an interval T ⊂ S disjoint from γ bounded by φ_{t₀}(x), φ_{t₁}(x), with t₀ < t₁
and not meeting the trajectory of x for t₀ < t < t₁ (Fig. B). The region A bounded
by γ, T and the curve

    {φ_t(x) | t₀ ≤ t ≤ t₁}
FIG. B

is positively invariant, as is the set B = A − γ. It is easy to see that φ_t(y) spirals
toward γ for all y ∈ B. A useful consequence of this is

Proposition 1 Let γ be an ω-limit cycle. If γ = L_ω(x), x ∉ γ, then x has a
neighborhood V such that γ = L_ω(y) for all y ∈ V. In other words, the set

    A = {y | γ = L_ω(y)} − γ

is open.

Proof. For sufficiently large t > 0, φ_t(x) is in the interior of the set A described
above. Hence φ_t(y) ∈ A for y sufficiently close to x. This implies the proposition.

A similar result holds for α-limit cycles.

Theorem 1 A nonempty compact set K that is positively or negatively invariant
contains either a limit cycle or an equilibrium.

Proof. Suppose for example that K is positively invariant. If x ∈ K, then L_ω(x)
is a nonempty subset of K; apply Poincaré-Bendixson.

The next result exploits the spiraling property of limit cycles.

Proposition 2 Let γ be a closed orbit and suppose that the domain W of the dynamical
system includes the whole open region U enclosed by γ. Then U contains either an
equilibrium or a limit cycle.

Proof. Let D be the compact set U ∪ γ. Then D is invariant since no trajectory
from U can cross γ. If U contains no limit cycle and no equilibrium, then, for any
x ∈ U,

    L_ω(x) = L_α(x) = γ

by Poincaré-Bendixson. If S is a local section at a point z ∈ γ, there are sequences
t_n → ∞, s_n → −∞ such that

    φ_{t_n}(x) ∈ S,    φ_{t_n}(x) → z

and

    φ_{s_n}(x) ∈ S,    φ_{s_n}(x) → z.

But this leads to a contradiction of the proposition in Section 3 on monotone
sequences.

Actually this last result can be considerably sharpened:

Theorem 2 Let γ be a closed orbit enclosing an open set U contained in the domain
W of the dynamical system. Then U contains an equilibrium.

Proof. Suppose U contains no equilibrium. If x_n → x in U and each x_n lies
on a closed orbit, then x must lie on a closed orbit. For otherwise the trajectory of
x would spiral toward a limit cycle, and by Proposition 1 so would the trajectory
of some x_n.
Let A ≥ 0 be the greatest lower bound of the areas of regions enclosed by closed
orbits in U. Let {γ_n} be a sequence of closed orbits enclosing regions of areas A_n
such that lim_{n→∞} A_n = A. Let x_n ∈ γ_n. Since γ ∪ U is compact we may assume
x_n → x ∈ U. Then if U contains no equilibrium, x lies on a closed orbit β of area
A(β). The usual section argument shows that as n → ∞, γ_n gets arbitrarily close
to β and hence the area A_n − A(β) of the region between γ_n and β goes to 0. Thus
A(β) = A.
We have shown that if U contains no equilibrium, it contains a closed orbit β
enclosing a region of minimal area. Then the region enclosed by β contains neither
an equilibrium nor a closed orbit, contradicting Proposition 2.

The following result uses the spiraling properties of limit cycles in a subtle way.

Theorem 3 Let H be a first integral of a planar C¹ dynamical system (that is, H
is a real-valued function that is constant on trajectories). If H is not constant on any
open set, then there are no limit cycles.

Proof. Suppose there is a limit cycle γ; let c ∈ R be the constant value of H
on γ. If x(t) is a trajectory that spirals toward γ, then H(x(t)) ≡ c by continuity
of H. In Proposition 1 we found an open set whose trajectories spiral toward γ; thus
H is constant on an open set.
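Theorem 3 can be illustrated numerically with the harmonic oscillator x′ = y, y′ = −x, an example chosen here rather than taken from the text. It has the first integral H(x, y) = x² + y², which is nonconstant on every open set; accordingly every trajectory stays on its own level circle, and no trajectory can spiral toward a limit cycle.

```python
# The system x' = y, y' = -x has first integral H(x, y) = x^2 + y^2, so by
# Theorem 3 it has no limit cycles: each trajectory lies on a level circle.
# We integrate one trajectory and watch H stay (numerically) constant.

def H(x, y):
    return x * x + y * y

def step_rk4(x, y, h):
    # one Runge-Kutta step for x' = y, y' = -x
    f = lambda x, y: (y, -x)
    k1 = f(x, y)
    k2 = f(x + h/2*k1[0], y + h/2*k1[1])
    k3 = f(x + h/2*k2[0], y + h/2*k2[1])
    k4 = f(x + h*k3[0], y + h*k3[1])
    return (x + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            y + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

x, y = 1.0, 0.0
h0 = H(x, y)
drift = 0.0
for _ in range(10000):          # integrate up to time 100
    x, y = step_rk4(x, y, 0.01)
    drift = max(drift, abs(H(x, y) - h0))
print(drift < 1e-6)             # H is constant along the trajectory
```

Contrast this with the Van der Pol system of Section 1 of the chapter, which has a limit cycle and therefore, by Theorem 3, can admit no such nonconstant first integral.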

PROBLEMS

1. The celebrated Brouwer fixed point theorem states that any continuous map f
of the closed unit ball

    Dⁿ = {x ∈ Rⁿ | |x| ≤ 1}

into itself has a fixed point (that is, f(x) = x for some x).
(a) Prove this for n = 2, assuming that f is C¹, by finding an equilibrium for
the vector field g(x) = f(x) − x.
(b) Prove Brouwer’s theorem for n = 2 using the fact that any continuous
map is the uniform limit of C¹ maps.
2. Let f be a C¹ vector field on a neighborhood of the annulus

    A = {z ∈ R² | 1 ≤ |z| ≤ 2}.

Suppose that f has no zeros and that f is transverse to the boundary, pointing
inward.
(a) Prove there is a closed orbit. (Notice that the hypothesis is weaker than
in Problem 1, Section 3.)
(b) If there are exactly seven closed orbits, show that one of them has orbits
spiraling toward it from both sides.
3. Let f: R² → R² be a C1 vector field with no zeros. Suppose the flow φ_t generated by f preserves area (that is, if S is any open set, the area of φ_t(S) is independent of t). Show that every trajectory is a closed set.
4. Let f be a C1 vector field on a neighborhood of the annulus A of Problem 2.
Suppose that for every boundary point z, f(z) is a nonzero vector tangent to
the boundary.
(a) Sketch the possible phase portraits in A under the further assumption
that there are no equilibria and no closed orbits besides the boundary
circles. Include the case where the boundary trajectories have opposite
orientations.
(b) Suppose the boundary trajectories are oppositely oriented and that the
flow preserves area. Show that A contains an equilibrium.
5. Let f and g be C1 vector fields on R² such that ⟨f(z), g(z)⟩ = 0 for all z. If f has a closed orbit, prove that g has a zero.
6. Let f be a C1 vector field on an open set W ⊂ R² and H: W → R a C1 function such that

DH(z)f(z) = 0

for all z. Prove that:
(a) H is constant on solution curves of z' = f(z);

(b) D H ( z ) = 0 if z belongs to a limit cycle;


(c) If z belongs to a compact invariant set on which DH is never 0, then z
lies on a closed orbit.

Notes

P. Hartman's Ordinary Differential Equations [9], a good but advanced book, covers extensively the material in this chapter.

It should be noted that our discussion implicitly used the fact that a closed curve in R² which does not intersect itself must separate R² into two connected regions, a bounded one and an unbounded one. This theorem, the Jordan curve theorem, while naively obvious, needs mathematical proof. One can be found in Newman's Topology of Plane Sets [17].
Chapter 12
Ecology

In this chapter we examine some nonlinear two dimensional systems that have
been used as mathematical models of the growth of two species sharing a common
environment. In the first section, which treats only a single species, various mathe-
matical assumptions on the growth rate are discussed. These are intended to capture
mathematically, in the simplest way, the dependence of the growth rate on food
supply and the negative effects of overcrowding.
In Section 2, the simplest types of equations that model a predator-prey ecology
are investigated: the object is to find out the long-run qualitative behavior of tra-
jectories. A more sophisticated approach is used in Section 3 to study two competing
species. Instead of explicit formulas for the equations, certain qualitative assump-
tions are made about the form of the equations. (A similar approach to predator
and prey is outlined in one of the problems.) Such assumptions are more plausible
than any set of particular equations can be; one has correspondingly more confidence
in the conclusions reached.
An interesting phenomenon observed in Section 3 is bifurcation of behavior.
Mathematically this means that a slight quantitative change in initial conditions
leads to a large qualitative difference in long-term behavior (because of a change of
w-limit sets). Such bifurcations, also called “catastrophes,” occur in many applica-
tions of nonlinear systems; several recent theories in mathematical biology have
been based on bifurcation theory.

§1. One Species

The birth rate of a human population is usually given in terms of the number of births per thousand in one year. The number one thousand is used merely to avoid decimal places; instead of a birth rate of 17 per thousand one could just as well speak of 0.017 per individual (although this is harder to visualize). Similarly, the period of one year is also only a convention; the birth rate could just as well be given in terms of a week, a second, or any other unit of time. Similar remarks apply to the death rate and to the growth rate, or birth rate minus death rate. The growth rate is thus the net change in population per unit of time divided by the total population at the beginning of the time period.
Suppose the population y(t) at time t changes to y + Δy in the time interval [t, t + Δt]. Then the (average) growth rate is

Δy / (y(t) Δt).

In practice y(t) is found only at certain times t₁, t₂, . . . when the population is counted, and its value is a nonnegative integer. We assume that y is extended (by interpolation or some other method) to a nonnegative real-valued function of a real variable. We assume that y has a continuous derivative.

Giving in to an irresistible mathematical urge, we form the limit

lim_{Δt→0} Δy / (y(t) Δt) = y'(t) / y(t).

This function of t is the growth rate of the population at time t.


The simplest assumption is that of a constant growth rate a. This is the case if the numbers of births and deaths in a small time period Δt have fixed ratios to the total population. These ratios will be linear functions of Δt, but independent of the size of the population. Thus the net change will be ay Δt where a is a constant; hence

dy/dt = ay;

integrating we obtain the familiar formula for unlimited growth:

y(t) = e^{at} y(0).
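The unlimited growth law is easy to check numerically. The sketch below is an illustration only, not from the text; the rate a = 0.017 and the initial population are made-up values. It integrates y' = ay by Euler steps and compares the result with e^{at} y(0).

```python
import math

# Euler integration of the unlimited growth equation y' = a*y, compared with
# the exact solution y(t) = exp(a*t) * y(0).  The constants are illustrative,
# not taken from the text.
a = 0.017          # growth rate per individual per unit time
y0 = 1000.0        # initial population
t_end = 10.0
n = 100_000
dt = t_end / n

y = y0
for _ in range(n):
    y += a * y * dt            # Euler step: dy = a*y*dt

exact = math.exp(a * t_end) * y0
print(y, exact)                # the two values agree closely for small dt
```

As dt shrinks, the Euler value converges to the exact exponential, which is the numerical face of the integration performed above.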
The growth rate can depend on many things. Let us assume for the moment that it depends only on the per capita food supply u, and that u ≥ 0 is constant. There will be a minimum u₀ necessary to sustain the population. For u > u₀, the growth rate is positive; for u < u₀, it is negative; while for u = u₀, the growth rate is 0. The simplest way to ensure this is to make the growth rate a linear function of u − u₀:

a(u − u₀),  a > 0.

Then

(1) dy/dt = a(u − u₀)y(t).

Here a and u₀ are constants, dependent only on the species, and u is a parameter, dependent on the particular environment but constant for a given ecology. (In the next section u will be another species satisfying a second differential equation.) The preceding equation is readily solved:

y(t) = e^{a(u−u₀)t} y(0).

Thus the population must increase without limit, remain constant, or approach 0 as a limit, depending on whether u > u₀, u = u₀, or u < u₀. If we recall that actually fractional values of y(t) are meaningless, we see that for all practical purposes "y(t) → 0" really means that the population dies out in a finite time.
In reality, a population cannot increase without limit; at least, this has never been observed! It is more realistic to assume that when the population level exceeds a certain value η, the growth rate is negative. We call this value η the limiting population. Note that η is not necessarily an upper bound for the population. Reasons for the negative growth rate might be insanity, decreased food supply, overcrowding, smog, and so on. We refer to these various unspecified causes as social phenomena. (There may be positive social phenomena; for example, a medium-size population may be better organized to resist predators and obtain food than a small one. But we ignore this for the moment.)

Again making the simplest mathematical assumptions, we suppose the growth rate is proportional to η − y:

a = c(η − y),  c > 0 a constant.

Thus we obtain the equation of limited growth:

(2) dy/dt = c(η − y)y;  c > 0, η > 0.
Note that this suggests

Δy/Δt = cηy − cy².

This means that during the period Δt the population change is cy² Δt less than it would be without social phenomena. We can interpret cy² as a number proportional to the average number of encounters between y individuals. Hence cy² is a kind of social friction.

The equilibria of (2) occur at y = 0 and y = η. The equilibrium at η is asymptotically stable (if c > 0) since the derivative of c(η − y)y at η is −cη, which is negative. The basin of η is {y | y > 0} since y(t) will increase to η as a limit if 0 < y(0) < η, and decrease to η as a limit if η < y(0). (This can be seen by considering the sign of dy/dt.)
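The approach to the limiting population η can be watched in a short simulation. A minimal sketch, with made-up constants c = 0.5 and η = 2 (not from the text): Euler integration of the limited growth equation from initial values below and above η converges to η in both cases, matching the basin {y | y > 0}.

```python
# Euler integration of the limited growth equation y' = c*(eta - y)*y.
# The constants c and eta are illustrative, made-up values.
def simulate(y0, c=0.5, eta=2.0, dt=1e-3, t_end=40.0):
    y = y0
    for _ in range(int(t_end / dt)):
        y += c * (eta - y) * y * dt
    return y

low = simulate(0.1)    # start below the limiting population: increases
high = simulate(5.0)   # start above it: decreases
print(low, high)       # both approach eta = 2.0
```

Note that y = η is an exact fixed point of the Euler update as well, so the discrete scheme settles on the same limiting population as the differential equation.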
A more realistic model of a single species is

y' = M(y)y.

Here the variable growth rate M is assumed to depend only on the total population y.

It is plausible to assume as before that there is a limiting population η such that M(η) = 0 and M(y) < 0 for y > η. If very small populations behave like the unlimited growth model, we assume M(0) > 0.

PROBLEMS

1. A population y(t) is governed by an equation

y' = M(y)y.

Prove that:
(a) equilibria occur at y = 0 and whenever M(y) = 0;
(b) the equilibrium at y = 0 is unstable;
(c) an equilibrium ξ > 0 is asymptotically stable if and only if there exists ε > 0 such that M > 0 on the interval [ξ − ε, ξ) and M < 0 on (ξ, ξ + ε].
2. Suppose the population of the United States obeys limited growth. Compute the limiting population and the population in the year 2000, using the following data:
Year Population
1950 150,697,361
1960 179,323,175
1970 203,184,772
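One standard route to this problem (a sketch of one approach, not the book's prescribed method) uses the explicit solution y(t) = η/(1 + Ke^{−cηt}) of the limited growth equation: the reciprocal u = 1/y differs from 1/η by a geometric factor over equal time steps, so three equally spaced census values determine η and a prediction for the year 2000.

```python
# Three-point fit for limited growth.  For y(t) = eta / (1 + K*exp(-c*eta*t)),
# u = 1/y satisfies u - 1/eta = geometric in t, so for three equally spaced
# samples (u2 - a)**2 = (u1 - a)*(u3 - a), where a = 1/eta.
y1, y2, y3 = 150_697_361, 179_323_175, 203_184_772   # 1950, 1960, 1970
u1, u2, u3 = 1 / y1, 1 / y2, 1 / y3

a = (u1 * u3 - u2 ** 2) / (u1 + u3 - 2 * u2)   # a = 1/eta
eta = 1 / a                                    # limiting population

rho = (u2 - u3) / (u1 - u2)                    # decay factor per decade
u2000 = a + (u3 - a) * rho ** 3                # three decades after 1970
y2000 = 1 / u2000                              # predicted year-2000 population

print(round(eta), round(y2000))                # roughly 2.6e8 and 2.4e8
```

The fitted limiting population comes out near 260 million, with a year-2000 prediction somewhat below it; the model of course ignores immigration and other effects outside the limited growth assumption.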

§2. Predator and Prey

We consider a predator species y and its prey x. The prey population is the total food supply for the predators at any given moment. The total food consumed by the predators (in a unit of time) is proportional to the number of predator-prey encounters, which we assume proportional to xy. Hence the per capita food supply for the predators at time t is proportional to x(t). Ignoring social phenomena for the moment, we obtain from equation (1) of the preceding section:

y' = a(x − u₀)y,

where a > 0 and u₀ > 0 are constants. We rewrite this as

y' = (Cx − D)y;  C > 0, D > 0.
Consider next the growth rate of the prey. In each small time period Δt, a certain number of prey are eaten. This number is assumed to depend only on the two populations, and is proportional to Δt; we write it as f(x, y) Δt. What should we postulate about f(x, y)?

It is reasonable that f(x, y) be proportional to y: twice as many cats will eat twice as many mice in a small time period. We also assume f(x, y) is proportional to x: if the mouse population is doubled, a cat will come across a mouse twice as often. Thus we put f(x, y) = Bxy, B a positive constant. (This assumption is less plausible if the ratio of prey to predators is very large. If a cat is placed among a sufficiently large mouse population, after a while it will ignore the mice.)
The prey species is assumed to have a constant per capita food supply available, sufficient to increase its population in the absence of predators. Therefore the prey is subject to a differential equation of the form

x' = Ax − Bxy.

In this way we arrive at the predator-prey equations of Volterra and Lotka:

(1) x' = (A − By)x,
    y' = (Cx − D)y;  A, B, C, D > 0.

This system has equilibria at (0, 0) and z = (D/C, A/B). It is easy to see that (0, 0) is a saddle, hence unstable. The eigenvalues at (D/C, A/B) are pure imaginary, however, which gives no information about stability.
We investigate the phase portrait of (1) by drawing the two lines

x' = 0:  y = A/B;   y' = 0:  x = D/C.

These divide the region x > 0, y > 0 into four quadrants (Fig. A). In each quadrant the signs of x' and y' are constant as indicated.

The positive x-axis and the positive y-axis are each trajectories, as indicated in Fig. A. The reader can make the appropriate conclusion about the behavior of the populations.
Otherwise each solution curve (x(t), y(t)) moves counterclockwise around z from one quadrant to the next. Consider for example a trajectory (x(t), y(t)) starting at a point

x(0) = u > D/C > 0,
y(0) = v > A/B > 0

in quadrant I. There is a maximal interval [0, τ) = J such that (x(t), y(t)) ∈ quadrant I for 0 ≤ t < τ (perhaps τ = ∞). Put


A − Bv = −r < 0,
Cu − D = s > 0.

As long as t ∈ J, x(t) is decreasing and y(t) is increasing. Hence

(2) d/dt log x(t) = x'(t)/x(t) = A − By ≤ −r,  D/C < x(t) ≤ u e^{−rt},

and similarly d/dt log y(t) = Cx − D ≤ s. Therefore

(3) A/B < y(t) ≤ v e^{st}

for 0 ≤ t < τ. From the second inequality of (2) we see that τ is finite. From (2) and (3) we see that for t ∈ J, (x(t), y(t)) is confined to the compact region

D/C ≤ x(t) ≤ u,   A/B ≤ y(t) ≤ v e^{sτ}.

Therefore (Chapter 8) (x(τ), y(τ)) is defined and in the boundary of that region; since x(t) is decreasing, x(τ) = D/C. Thus the trajectory enters quadrant II. Similarly for other quadrants.
We cannot yet tell whether trajectories spiral in toward z, spiral toward a limit cycle, or spiral out toward "infinity" and the coordinate axes. Let us try to find a Liapunov function H.

Borrowing the trick of separation of variables from partial differential equations, we look for a function of the form

H(x, y) = F(x) + G(y).

We want Ḣ ≤ 0, where

Ḣ = (dF/dx) x' + (dG/dy) y'.
Hence

Ḣ(x, y) = x (dF/dx)(A − By) + y (dG/dy)(Cx − D).

We obtain Ḣ ≡ 0 provided

x (dF/dx) / (Cx − D) = y (dG/dy) / (By − A).

Since x and y are independent variables, this is possible if and only if

x (dF/dx) / (Cx − D) = y (dG/dy) / (By − A) = constant.

Putting the constant equal to 1 we get

dF/dx = C − D/x,   dG/dy = B − A/y;

integrating we find

F(x) = Cx − D log x,
G(y) = By − A log y.

Thus the function

H(x, y) = Cx − D log x + By − A log y,

defined for x > 0, y > 0, is constant on solution curves of (1).

By considering the signs of ∂H/∂x and ∂H/∂y it is easy to see that the equilibrium z = (D/C, A/B) is an absolute minimum for H. It follows that H (more precisely, H − H(z)) is a Liapunov function (Chapter 9). Therefore z is a stable equilibrium.

We note next that there are no limit cycles; this follows from Chapter 11 because H is not constant on any open set.
We now prove

Theorem 1 Every trajectory of the Volterra-Lotka equations (1) is a closed orbit (except the equilibrium z and the coordinate axes).
Proof. Consider a point w = (u, v), u > 0, v > 0; w ≠ z. Then there is a doubly infinite sequence · · · < t₋₁ < t₀ < t₁ < · · · such that φ_{t_n}(w) is on the line x = D/C, and

t_n → ∞ as n → ∞,
t_n → −∞ as n → −∞.

If w is not in a closed orbit, the points φ_{t_n}(w) are monotone along the line x = D/C (Chapter 11). Since there are no limit cycles, either

φ_{t_n}(w) → z as n → ∞,
or
φ_{t_n}(w) → z as n → −∞.

Since H is constant on the trajectory of w, this implies that H(w) = H(z). But this contradicts minimality of H(z).

FIG. B. Phase portrait of (1).

We now have the following (schematic) phase portrait (Fig. B). Therefore, for any given initial populations (x(0), y(0)) with x(0) ≠ 0 and y(0) ≠ 0, other than z, the populations of predator and prey will oscillate cyclically.

No matter what the numbers of prey and predator are, neither species will die
out, nor will it grow indefinitely. On the other hand, except for the state z, which
is improbable, the populations will not remain constant.
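Both facts, the cyclic oscillation and the constancy of H, are easy to observe numerically. A minimal sketch, with illustrative constants A = B = C = D = 1 (so z = (1, 1)): a Runge-Kutta integration of (1) keeps H = Cx − D log x + By − A log y constant to high accuracy, consistent with the trajectory being a closed orbit.

```python
import math

A = B = C = D = 1.0    # illustrative constants; the equilibrium z is (1, 1)

def field(x, y):
    """Right-hand side of the Volterra-Lotka equations (1)."""
    return (A - B * y) * x, (C * x - D) * y

def rk4_step(x, y, h):
    """One classical fourth-order Runge-Kutta step."""
    k1x, k1y = field(x, y)
    k2x, k2y = field(x + h / 2 * k1x, y + h / 2 * k1y)
    k3x, k3y = field(x + h / 2 * k2x, y + h / 2 * k2y)
    k4x, k4y = field(x + h * k3x, y + h * k3y)
    return (x + h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x),
            y + h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y))

def H(x, y):
    return C * x - D * math.log(x) + B * y - A * math.log(y)

x, y = 2.0, 1.0        # initial populations away from z
h0 = H(x, y)
drift = 0.0
for _ in range(100_000):               # integrate over 0 <= t <= 100
    x, y = rk4_step(x, y, 1e-3)
    drift = max(drift, abs(H(x, y) - h0))

print(drift)           # very small: H is numerically constant along the orbit
```

Since the level sets of H near its minimum are closed curves, numerical conservation of H is exactly what Theorem 1's closed orbits look like in a computation.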
Let us introduce the social phenomena of Section 1 into the equations (1). We obtain the following predator-prey equations of species with limited growth:

(2) x' = (A − By − λx)x,
    y' = (Cx − D − μy)y.

The constants A, B, C, D, λ, μ are all positive.

We divide the upper right quadrant Q (x > 0, y > 0) into sectors by the two lines

L: A − By − λx = 0;
M: Cx − D − μy = 0.

Along these lines x' = 0 and y' = 0, respectively. There are two possibilities, according to whether these lines intersect in Q or not. If not (Fig. C), the predators die out and the prey population approaches its limiting value A/λ (where L meets the x-axis).

FIG. C. Predators → 0; prey → A/λ.

This is because it is impossible for both prey and predators to increase at the same time. If the prey is above its limiting population it must decrease, and after a while the predator population also starts to decrease (when the trajectory crosses M). After that point the prey can never increase past A/λ, and so the predators continue to decrease. If the trajectory crosses L, the prey increases again (but not past A/λ), while the predators continue to die off. In the limit the predators disappear and the prey population stabilizes at A/λ.

FIG. D

Suppose now that L and M cross at a point z = (x̄, ȳ) in the quadrant Q (Fig. D); of course z is an equilibrium. The linear part of the vector field (2) at z is

[−λx̄   −Bx̄]
[ Cȳ   −μȳ].

The characteristic polynomial has positive coefficients. Both roots of such a polynomial have negative real parts. Therefore z is asymptotically stable.
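For concreteness one can confirm this sign pattern numerically. In the sketch below the constants A = 2, B = C = D = λ = μ = 1 are invented for illustration; they make L and M cross in Q, and the eigenvalues of the linear part at z come out with negative real parts.

```python
import cmath

# Illustrative constants, invented so that the lines L and M cross in Q.
A, B, C, D, lam, mu = 2.0, 1.0, 1.0, 1.0, 1.0, 1.0

# Equilibrium z = (xbar, ybar): solve A - B*y - lam*x = 0, C*x - D - mu*y = 0.
xbar = (A * mu + B * D) / (lam * mu + B * C)
ybar = (A * C - lam * D) / (lam * mu + B * C)
assert xbar > 0 and ybar > 0                  # z really lies in Q

# Linear part at z is [[-lam*xbar, -B*xbar], [C*ybar, -mu*ybar]]; use
# trace and determinant to get the eigenvalues.
tr = -lam * xbar - mu * ybar                  # trace, always negative
det = (lam * mu + B * C) * xbar * ybar        # determinant, always positive
disc = cmath.sqrt(tr * tr - 4 * det)
eig1, eig2 = (tr + disc) / 2, (tr - disc) / 2

print(eig1, eig2)      # both real parts negative: z is asymptotically stable
```

With these particular constants the discriminant is negative, so z is in fact a spiral sink; other choices give a node (compare the Problem below).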
Note that in addition to the equilibria at z and (0, 0), there is also an equilibrium, a saddle, at the intersection of the line L with the x-axis.

It is not easy to determine the basin of z; nor do we know whether there are any limit cycles. Nevertheless we can obtain some information.
Let L meet the x-axis at (p, 0) and the y-axis at (0, q). Let Γ be a rectangle whose corners are

(0, 0), (P, 0), (0, Q), (P, Q)

with P > p, Q > q, and (P, Q) ∈ M (Fig. E). Every trajectory at a boundary point of Γ either enters Γ or is part of the boundary. Therefore Γ is positively invariant. Every point in Q is contained in such a rectangle.

By the Poincaré-Bendixson theorem the ω-limit set of any point (x, y) in Γ, with x > 0, y > 0, must be a limit cycle or one of the three equilibria (0, 0), z, or (p, 0). We rule out (0, 0) and (p, 0) by noting that x' is increasing near (0, 0), and y' is increasing near (p, 0). Therefore the ω-limit set is either z or a limit cycle in Γ. By a consequence of the Poincaré-Bendixson theorem any limit cycle must surround z.

We observe further that any such rectangle Γ contains all limit cycles. For a limit cycle (like any trajectory) must enter Γ, and Γ is positively invariant.

Fixing (P, Q) as above, it follows that for any initial values (x(0), y(0)), there exists t₀ > 0 such that

x(t) < P,   y(t) < Q   if t ≥ t₀.
One can also find eventual lower bounds for x(t) and y(t).

We also see that in the long run, a trajectory either approaches z or else spirals down to a limit cycle.

From a practical standpoint a trajectory that tends toward z is indistinguishable from z after a certain time. Likewise a trajectory that approaches a limit cycle γ can be identified with γ after it is sufficiently close.

The conclusion is that any ecology of predators and prey which obeys equations (2) eventually settles down to either a constant or periodic population. There are absolute upper bounds that no population can exceed in the long run, no matter what the initial populations are.

PROBLEM

Show by examples that the equilibrium in Fig. D can be either a spiral sink or a
node. Draw diagrams.

§3. Competing Species

We consider now two species x, y which compete for a common food supply.
Instead of analyzing specific equations we follow a different procedure: we consider
a large class of equations about which we assume only a few qualitative features. In
this way considerable generality is gained, and little is lost because specific
equations can be very difficult to analyze.

The equations of growth of the two species are written in the form

(1) x' = M(x, y)x,
    y' = N(x, y)y,

where the growth rates M and N are C1 functions of nonnegative variables x, y. The following assumptions are made:

(a) If either species increases, the growth rate of the other goes down. Hence

∂M/∂y < 0 and ∂N/∂x < 0.

(b) If either population is very large, neither species can multiply. Hence there exists K > 0 such that

M(x, y) ≤ 0 and N(x, y) ≤ 0 if x ≥ K or y ≥ K.
(c) In the absence of either species, the other has a positive growth rate up to a certain population and a negative growth rate beyond it. Therefore there are constants a > 0, b > 0 such that

M(x, 0) > 0 for x < a and M(x, 0) < 0 for x > a,
N(0, y) > 0 for y < b and N(0, y) < 0 for y > b.

By (a) and (c) each vertical line {x} × R meets the set μ = M⁻¹(0) exactly once if 0 ≤ x ≤ a and not at all if x > a. By (a) and the implicit function theorem, μ is the graph of a nonnegative C1 map f: [0, a] → R such that f⁻¹(0) = a. Below the curve μ, M > 0 and above it M < 0 (Fig. A).

FIG. A

In the same way the set ν = N⁻¹(0) is a smooth curve of the form

{(x, y) | x = g(y)},

where g: [0, b] → R is a nonnegative C1 map with g⁻¹(0) = b. The function N is positive to the left of ν and negative to the right.
Suppose μ and ν do not intersect and that μ is below ν. Then a phase portrait can be found in a straightforward way following methods of the previous section. The equilibria are (0, 0), (a, 0), and (0, b). All orbits tend to one of the three equilibria, but most to the asymptotically stable equilibrium (0, b). See Fig. B.

FIG. B

Suppose now that μ and ν intersect. We make the assumption that μ ∩ ν is a finite set, and at each intersection point μ and ν cross transversely, that is, they have distinct tangent lines. This assumption could be dispensed with, but it simplifies the topology of the curves. Moreover, M and N can be approximated arbitrarily closely by functions whose zero sets have this property. In a sense which can be made precise, this is a "generic" property.
The curves μ and ν and the coordinate axes bound a finite number of connected open sets in the upper right quadrant: these are the sets where x' ≠ 0 and y' ≠ 0. We call these open sets basic regions (Fig. C). They are of four types:

I: x' > 0, y' > 0;
II: x' < 0, y' > 0;
III: x' < 0, y' < 0;
IV: x' > 0, y' < 0.

FIG. C

The boundary ∂B of a basic region B is made up of points of the following types: points of μ ∩ ν, called vertices; points on μ or ν but not on both nor on the coordinate axes, called ordinary boundary points; and points on the axes.

A vertex is an equilibrium; the other equilibria are at (0, 0), (a, 0), and (0, b). At an ordinary boundary point w ∈ ∂B, the vector (x', y') is either vertical (if w ∈ μ) or horizontal (if w ∈ ν). It points either into or out of B, since μ has no vertical tangents and ν has no horizontal tangents. We call w an inward or outward point of ∂B, accordingly.
The following technical result is the key to analyzing equation (1):

Lemma Let B be a basic region. Then the ordinary boundary points of B are either all inward or all outward.

Proof. If the lemma holds for B, we call B good.

Let p be a vertex of B where μ and ν cross. Then p is on the boundary of four basic regions, one of each type. Types II and IV, and types I and III, are diagonally opposite pairs.
63. COMPETING SPECIES 269

Let μ₀ ⊂ μ and ν₀ ⊂ ν be the open arcs of ordinary boundary points having p as a common end point. If μ₀ ∪ ν₀ consists entirely of inward or entirely of outward points of ∂B, we call p good for B; otherwise p is bad for B. It is easy to see that if p is good for B, it is good for the other three basic regions adjacent to p, and similarly for bad (Fig. D). This is because (x', y') reverses direction as one proceeds along μ or ν past a crossing point. Hence it makes sense to call a vertex simply good or bad.

FIG. D. Bad and good vertices.

Consider first of all the region B₀ whose boundary contains (0, 0). This is of type I (x' > 0, y' > 0). If q is an ordinary point of μ ∩ ∂B₀, we can connect q to a point inside B₀ by a path which avoids ν. Along such a path y' > 0. Hence (x', y') points upward out of B₀ at q, since μ is the graph of a function. Similarly, at an ordinary point r of ν ∩ ∂B₀, (x', y') points to the right, out of B₀ at r. Hence B₀ is good, and so every vertex of B₀ is good.
Next we show that if B is a basic region and ∂B contains one good vertex p of μ ∩ ν, then B is good. We assume that near p, the vector field along ∂B points into B; we also assume that in B, x' < 0 and y' > 0. (The other cases are similar.) Let μ₀ ⊂ μ, ν₀ ⊂ ν be arcs of ordinary boundary points of B adjacent to p (Fig. E). For example, let r be any ordinary point of ∂B ∩ μ and q any ordinary point of μ₀. Then y' > 0 at q. As we move along μ from q to r, the sign of y' changes each time we cross ν. The number of such crossings is even because r and q are on the same side of ν. Hence y' > 0 at r. This means that (x', y') points up at r. Similarly, x' < 0 at every ordinary point of ν ∩ ∂B. Therefore along μ the vector (x', y') points up; along ν it points left. But B lies above μ and to the left of ν. Thus B is good.
This proves the lemma, for we can pass from any vertex to any other along μ, starting from a good vertex. Since successive vertices belong to the boundary of a common basic region, each vertex in turn is proved good. Hence all are good.

As a consequence of the lemma, each basic region, and its closure, is either positively or negatively invariant.

FIG. E

What are the possible ω-limit points of the flow (1)? There are no closed orbits. For a closed orbit must be contained in a basic region, but this is impossible since x(t) and y(t) are monotone along any solution curve in a basic region. Therefore all ω-limit points are equilibria.

We note also that each trajectory is defined for all t ≥ 0, because any point lies in a large rectangle Γ spanned by (0, 0), (x₀, 0), (0, y₀), (x₀, y₀) with x₀ > a, y₀ > b; such a rectangle is compact and positively invariant (Fig. F). Thus we have shown:

Theorem The flow φ_t of (1) has the following property: for all p = (x, y), x ≥ 0, y ≥ 0, the limit

lim_{t→∞} φ_t(p)

exists and is one of a finite number of equilibria.

We conclude that the populations of two competing species always tend to one of a
finite number of limiting populations.

FIG. F

Examining the equilibria for stability, one finds the following results. A vertex where μ and ν each have negative slope, but μ is steeper, is asymptotically stable (Fig. G). One sees this by drawing a small rectangle with sides parallel to the axes around the equilibrium, putting one corner in each of the four adjacent regions. Such a rectangle is positively invariant; since it can be arbitrarily small, the equilibrium is asymptotically stable. Analytically this is expressed by

slope of μ = −M_x/M_y < slope of ν = −N_x/N_y < 0,

where M_x = ∂M/∂x, M_y = ∂M/∂y, and so on, at the equilibrium, from which a computation yields eigenvalues with negative real parts. Hence we have a sink.
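The computation mentioned here is short: at a vertex M = N = 0, so the Jacobian of (1) reduces to [[xM_x, xM_y], [yN_x, yN_y]], and the slope condition forces its trace to be negative and its determinant positive. A numeric sketch, with made-up partial derivatives satisfying −M_x/M_y < −N_x/N_y < 0:

```python
import cmath

# Made-up partials at a vertex satisfying the slope condition
# -Mx/My < -Nx/Ny < 0 (slopes -2 < -1), with My < 0 and Nx < 0 as in (a).
Mx, My = -2.0, -1.0
Nx, Ny = -1.0, -1.0
x, y = 1.0, 1.0                    # any positive populations at the vertex

# At a vertex M = N = 0, so the Jacobian of (1) reduces to:
J = [[x * Mx, x * My],
     [y * Nx, y * Ny]]

tr = J[0][0] + J[1][1]
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
disc = cmath.sqrt(tr * tr - 4 * det)
eigs = ((tr + disc) / 2, (tr - disc) / 2)

print(tr, det, eigs)   # trace < 0 and det > 0, so both eigenvalues have Re < 0
```

The determinant is xy(M_x N_y − M_y N_x), which is positive exactly when μ is steeper than ν, so the numeric check mirrors the slope inequality above.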

FIG. G

A case by case study of the different ways μ and ν can cross shows that the only other asymptotically stable equilibria are (0, b), when (0, b) is above μ, and (a, 0), when (a, 0) is to the right of ν. All other equilibria are unstable. For example, q in Fig. H is unstable because arbitrarily near it, to the left, is a trajectory with x decreasing; such a trajectory tends toward (0, b). Thus in Fig. H, (0, b) and p are asymptotically stable, while q, r, s, and (a, 0) are unstable. Note that r is a source.

There must be at least one asymptotically stable equilibrium. If (0, b) is not one, then it lies under μ; and if (a, 0) is not one, it lies to the left of ν. In that case μ and ν cross, and the first crossing to the left of (a, 0) is asymptotically stable.
Every trajectory tends to an equilibrium; it is instructive to see how these ω-limits change as the initial state changes. Let us suppose that q is a saddle. Then it can be shown that exactly two trajectories α, α' approach q, the so-called stable manifolds of q, or sometimes separatrices of q. We concentrate on the one in the unbounded basic region, labeled α in Fig. H.

FIG. H. Bifurcation of behavior.

All points of this region to the left of α end up at (0, b), while points to the right go to p. As we move across α this limiting behavior changes radically. Let us consider this bifurcation of behavior in biological terms.

Let v₀, v₁ be states in this unbounded region, very near each other but separated by α; suppose the trajectory of v₀ goes to p while that of v₁ goes to (0, b). The point v₀ = (x₀, y₀) represents an ecology of competing species which will eventually stabilize at p.

Note that both populations are positive at p. Suppose that some unusual event occurs, not accounted for by our model, and the state of the ecology changes suddenly from v₀ to v₁. Such an event might be introduction of a new pesticide, importation of additional members of one of the species, a forest fire, or the like. Mathematically the event is a jump from the basin of p to that of (0, b).

Such a change, even though quite small, is an ecological catastrophe. For the trajectory of v₁ has quite a different fate: it goes to (0, b) and the x species is wiped out!

Of course in practical ecology one rarely has Fig. H to work with. Without it, the change from v₀ to v₁ does not seem very different from the insignificant change from v₀ to a nearby state v₂, which also goes to p. The moral is clear: in the absence of comprehensive knowledge, a deliberate change in the ecology, even an apparently minor one, is a very risky proposition.

PROBLEMS

1. The equations

x' = x(2 − x − y),
y' = y(3 − 2x − y)

satisfy conditions (a) through (d) for competing species. Explain why these equations make it mathematically possible, but extremely unlikely, for both species to survive.
2. Two species x, y are in symbiosis if an increase of either population leads to an increase in the growth rate of the other. Thus we assume

x' = M(x, y)x,
y' = N(x, y)y

with

∂M/∂y > 0 and ∂N/∂x > 0.

We also suppose that the total food supply is limited; hence for some A > 0, B > 0 we have

M(x, y) < 0 if x > A,
N(x, y) < 0 if y > B.

If both populations are very small, they both increase; hence

M(0, 0) > 0 and N(0, 0) > 0.

Assuming that the intersections of the curves M⁻¹(0), N⁻¹(0) are finite, and all are transverse, show that:
(a) every trajectory tends to an equilibrium in the region 0 < x < A, 0 < y < B;
(b) there are no sources;
(c) there is at least one sink;
(d) if ∂M/∂x < 0 and ∂N/∂y < 0, there is a unique sink z, and z = L_ω(x, y) for all x > 0, y > 0.
3. Prove that under plausible hypotheses, two mutually destructive species can-
not coexist in the long run.
4. Let y and x denote predator and prey populations. Let

x' = M(x, y)x,
y' = N(x, y)y,

where M and N satisfy the following conditions.
(i) If there are not enough prey, the predators decrease. Hence for some b > 0

N(x, y) < 0 if x < b.

(ii) An increase in the prey improves the predator growth rate; hence ∂N/∂x > 0.
(iii) In the absence of predators a small prey population will increase; hence M(0, 0) > 0.
(iv) Beyond a certain size, the prey population must decrease; hence there exists A > 0 with M(x, y) < 0 if x > A.
(v) Any increase in predators decreases the rate of growth of prey; hence ∂M/∂y < 0.
(vi) The two curves M⁻¹(0), N⁻¹(0) intersect transversely, and at only a finite number of points.

Show that if there is some (u, v) with M(u, v) > 0 and N(u, v) > 0, then there is either an asymptotically stable equilibrium or an ω-limit cycle. Moreover, if the number of limit cycles is finite and positive, one of them must have orbits spiraling toward it from both sides.
5. Show that the analysis of equation (1) is essentially the same if (c) is replaced by the more natural assumptions: M(0, 0) > 0, N(0, 0) > 0, and M(x, 0) < 0 for x > A, N(0, y) < 0 for y > B.

Notes

There is a good deal of experimental and observational evidence in support of the general conclusions of this chapter: that predator-prey ecologies oscillate while competitor ecologies reach an equilibrium. In fact Volterra's original study was inspired by observation of fish populations in the Upper Adriatic. A discussion of some of this material is found in a paper by E. W. Montroll et al., "On the Volterra and other nonlinear models" [18]. See also the book The Struggle for Existence by U. D'Ancona [4].

A very readable summary of some recent work is in "The struggle for life, I" by A. Rescigno and I. Richardson [21]. Much of the material of this chapter was adapted from their paper.
A recent book by René Thom [24] on morphogenesis uses very advanced theories of stability and bifurcation in constructing mathematical models of biological processes.
Chapter 13
Periodic Attractors

Here we define asymptotic stability for closed orbits of a dynamical system, and
an especially important kind called a periodic attractor. Just as sinks are of major
importance among equilibria in models of “physical” systems, so periodic attractors
are the most important kind of oscillations in such models. As we shall show in
Chapter 16, such oscillations persist even if the vector field is perturbed.
The main result is that a certain eigenvalue condition on the derivative of the
flow implies asymptotic stability. This is proved by the same method of local sec-
tions used earlier in the Poincaré–Bendixson theorem. This leads to the study of
"discrete dynamical systems" in Section 2, a topic which is interesting by itself.

§1. Asymptotic Stability of Closed Orbits

Let f: W → Rⁿ be a C¹ vector field on an open set W ⊂ Rⁿ; the flow of the differential equation

(1) x′ = f(x)

is denoted by φ_t.
Let γ ⊂ W be a closed orbit of the flow, that is, a nontrivial periodic solution curve. We call γ asymptotically stable if for every open set U₁ ⊂ W with γ ⊂ U₁ there is an open set U₂, γ ⊂ U₂ ⊂ U₁, such that φ_t(U₂) ⊂ U₁ for all t > 0 and

lim_{t→∞} d(φ_t(x), γ) = 0.

Here d(x, γ) means the minimum distance from x to a point of γ.


The closed orbit in the Van der Pol oscillator was shown to be asymptotically
stable. On the other hand, the closed orbits of the harmonic oscillator are not, since an asymptotically stable closed orbit is evidently isolated from other closed orbits.

We say a point x ∈ W has asymptotic period λ ∈ R if

lim_{t→∞} | φ_{t+λ}(x) − φ_t(x) | = 0.

Theorem 1 Let γ be an asymptotically stable closed orbit of period λ. Then γ has a neighborhood U ⊂ W such that every point of U has asymptotic period λ.

Proof. Let U be the open set U₂ in the definition of asymptotic stability, taking U₁ = W. Let x ∈ U and fix ε > 0. There exists δ, 0 < δ < ε, such that if z ∈ γ and | y − z | < δ, then | φ_λ(y) − φ_λ(z) | < ε (by continuity of the flow). Of course φ_λ(z) = z. Since d(φ_t(x), γ) → 0 as t → ∞, there exists t₀ ≥ 0 such that for each t ≥ t₀ there is a point z_t ∈ γ with | φ_t(x) − z_t | < δ. Keeping in mind φ_λ(z_t) = z_t, we have for t ≥ t₀:

| φ_{λ+t}(x) − φ_t(x) | ≤ | φ_{λ+t}(x) − φ_λ(z_t) | + | φ_λ(z_t) − φ_t(x) |
≤ ε + δ ≤ 2ε.

This proves the theorem.

The significance of Theorem 1 is that after a certain time, trajectories near an


asymptotically stable closed orbit behave as if they themselves had the same period
as the closed orbit.
The only example we have seen of an asymptotically stable closed orbit occurs in a two-dimensional system. This is no accident; planar systems are comparatively easy to analyze, essentially because solution curves locally separate the plane.
The theorem below is analogous to the fact that an equilibrium x̄ is asymptotically stable if the eigenvalues of Df(x̄) have negative real part. It is not as convenient
to use since it requires information about the solutions of the equation, not merely
about the vector field. Nevertheless it is of great importance.

Theorem 2 Let γ be a closed orbit of period λ of the dynamical system (1). Let p ∈ γ. Suppose that n − 1 of the eigenvalues of the linear map Dφ_λ(p): E → E are less than 1 in absolute value. Then γ is asymptotically stable.

Some remarks on this theorem are in order. First, it assumes that φ_λ is differentiable. In fact, φ_t(x) is a C¹ function of (t, x); this is proved in Chapter 16. Second, the condition on Dφ_λ(p) is independent of p ∈ γ. For if q ∈ γ is a different point, let r ∈ R be such that φ_r(p) = q. Then

Dφ_λ(p) = D(φ_{−r} ∘ φ_λ ∘ φ_r)(p) = Dφ_r(p)⁻¹ Dφ_λ(q) Dφ_r(p),

which shows that Dφ_λ(p) is similar to Dφ_λ(q). Third, note that 1 is always an eigenvalue of Dφ_λ(p) since

Dφ_λ(p)f(p) = f(p).

The eigenvalue condition in Theorem 2 is stronger than asymptotic stability. If it holds, we call γ a periodic attractor. Not only do trajectories near a periodic attractor γ have the same asymptotic period as γ, but they are asymptotically "in phase" with γ. This is stated precisely in the following theorem.

Theorem 3 Let γ be a periodic attractor. If lim_{t→∞} d(φ_t(x), γ) = 0, then there is a unique point z ∈ γ such that lim_{t→∞} | φ_t(x) − φ_t(z) | = 0.

This means that any point sufficiently near to γ has the same fate as a definite point of γ.
It can be proved (not easily) that the closed orbit in the Van der Pol oscillator is a periodic attractor (see the Problems).
The proofs of Theorems 2 and 3 occupy the rest of this chapter. The proof of Theorem 2 depends on a local section S to the flow at p, analogous to those in Chapter 10 for planar flows: S is an open subset of an (n − 1)-dimensional subspace transverse to the vector field at p. Following trajectories from one point of S to another defines a C¹ map h: S₀ → S, where S₀ is open in S and contains p. We call h the Poincaré map. The following section studies the "discrete dynamical system" h: S₀ → S. In particular p ∈ S₀ is shown to be an asymptotically stable fixed point of h, and this easily implies Theorem 2.

PROBLEM

Let γ be a closed orbit of period λ > 0 in a planar dynamical system x′ = f(x). Let p ∈ γ. If

| Det Dφ_λ(p) | < 1,

then γ is a periodic attractor, and conversely. Using the methods of Chapter 10, Section 3, and Liouville's formula (a proof of Liouville's formula may be found in Hartman's book [9])

Det Dφ_λ(p) = exp( ∫₀^λ Tr Df(φ_t(p)) dt ),

show that the closed orbit in the Van der Pol oscillator is a periodic attractor.
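Liouville's formula can also be checked numerically. The sketch below (an editorial illustration, not part of the text) uses a planar field for which everything is explicit rather than the Van der Pol system itself: the unit circle is a closed orbit of period 2π, Tr Df is identically −2 on it, and the formula gives Det Dφ_λ(p) = e^{−4π} < 1, so the orbit is a periodic attractor.

```python
import math

# Editorial sketch: the planar field
#   x' = (1 - x^2 - y^2)x - y,   y' = x + (1 - x^2 - y^2)y
# has the unit circle as a closed orbit of period 2*pi.
def trace_Df(x, y):
    # d/dx of the first component plus d/dy of the second:
    # (1 - 3x^2 - y^2) + (1 - x^2 - 3y^2) = 2 - 4(x^2 + y^2)
    return 2.0 - 4.0 * (x * x + y * y)

# Integrate Tr Df along the known solution (cos t, sin t) over one period
# (a left-endpoint Riemann sum; the integrand is constant on the orbit).
N = 10_000
lam = 2.0 * math.pi
integral = sum(trace_Df(math.cos(lam * k / N), math.sin(lam * k / N))
               for k in range(N)) * lam / N          # equals -4*pi
det = math.exp(integral)                             # Liouville's formula
assert det < 1.0                                     # periodic attractor
```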

§2. Discrete Dynamical Systems

An important example of a discrete dynamical system (precise definition later) is a C¹ map g: W → W on an open set W of a vector space which has a C¹ inverse g⁻¹: W → W. Such a map is called a diffeomorphism of W. If W represents a "state space" of some sort, then g(x) is the state of the system 1 unit of time after it is in state x. After 2 units of time it will be in state g²(x) = g(g(x)); after n units, in state gⁿ(x). Thus instead of a continuous family of states {φ_t(x) | t ∈ R} we have the discrete family {gⁿ(x) | n ∈ Z}, where Z is the set of integers.
The diffeomorphism might be a linear operator T: E → E. Such systems are studied in linear algebra. We get rather complete information about their structure from the canonical form theorems of Chapter 6.
Suppose T = e^A, A ∈ L(E). Then T is the "time one map" of the linear flow e^{tA}. If this continuous flow e^{tA} represents some natural dynamical process, the discrete flow Tⁿ = e^{nA} is like a series of photographs of the process taken at regular time intervals. If these intervals are very small, the discrete flow is a good approximation to the continuous one. A motion picture, for example, is a discrete flow that is hard to distinguish from a continuous one.
The analogue of an equilibrium for a discrete system g: E → E is a fixed point x̄ = g(x̄). For a linear operator T, the origin is a fixed point. If there are other fixed points, they are eigenvectors belonging to the eigenvalue 1.
We shall be interested in stability properties of fixed points. The key example is a linear contraction: an operator T ∈ L(E) such that

(1) lim_{n→∞} Tⁿx = 0

for all x ∈ E. The time one map of a contracting flow is a linear contraction.

Proposition The following statements are equivalent:

(a) T is a linear contraction;
(b) the eigenvalues of T have absolute values less than 1;
(c) there is a norm on E, and μ < 1, such that

| Tx | ≤ μ | x |

for all x ∈ E.

Proof. If some real eigenvalue λ has absolute value | λ | ≥ 1, then (1) is not true if x is an eigenvector for λ. If | λ | ≥ 1 and λ is complex, a similar argument about the complexification of T shows that T is not a contraction. Hence (a) implies (b). That (c) implies (a) is obvious; it remains to prove (b) implies (c).
We embed E in its complexification E_C, extending T to a complex linear operator T_C on E_C (Chapter 4). It suffices to find a norm on E_C as in (c) (regarding E_C as a real vector space), for then (c) follows by restricting this norm to E.
Recall that E_C is the direct sum of the generalized eigenspaces V_λ of T_C, which are invariant under T_C. It suffices to norm each of these subspaces; if x = Σ x_λ, x_λ ∈ V_λ, then we define | x | = max{| x_λ |}. Thus we may replace E_C by V_λ, or, what is the same thing, assume that T_C has only one eigenvalue λ.
A similar argument reduces us to the case where the Jordan form of T_C has only one elementary Jordan block, with λ on the diagonal and 1's on the superdiagonal.
For any ε > 0 there is another basis {e₁, . . . , e_m} giving T_C the "ε-Jordan form," with the 1's above the diagonal replaced by ε's. This was proved in Chapter 7. Give E_C the max norm for this basis:

| Σ a_j e_j | = max{| a_j |},

where a₁, . . . , a_m are arbitrary complex numbers. Then if | λ | < 1 and ε is sufficiently small, (c) is satisfied. This completes the proof of the proposition.
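The proposition can be observed numerically. In the editorial sketch below (not part of the text), T has the single eigenvalue 1/2 and hence is a linear contraction, although in the standard Euclidean norm |Tx| grows at first; the adapted norm of part (c) exists but is not needed to watch Tⁿx → 0.

```python
# Editorial sketch: a Jordan-type operator with the single eigenvalue 0.5.
def apply(T, x):
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]

def norm(x):
    return sum(c * c for c in x) ** 0.5

T = [[0.5, 10.0],
     [0.0, 0.5]]             # spectral radius 0.5 < 1
x0 = [0.0, 1.0]
first = norm(apply(T, x0))   # about 10.01 > |x0| = 1: growth at first ...
x = x0
for _ in range(200):
    x = apply(T, x)
final = norm(x)              # ... yet T^n x -> 0, as (a) <=> (b) predicts
assert first > norm(x0)
assert final < 1e-40
```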

We now define a discrete dynamical system to be a C¹ map g: W → E, where W is an open set in a vector space E. If W ≠ E, it is possible that g² is not defined at all points of W, or even at any points of W. (This last case is of course uninteresting as a dynamical system.)
A fixed point x̄ = g(x̄) of such a system is asymptotically stable if every neighborhood U ⊂ W of x̄ contains a neighborhood U₁ of x̄ such that gⁿ(U₁) ⊂ U for all n ≥ 0 and

lim_{n→∞} gⁿ(x) = x̄

for all x ∈ U₁.

for all x E U,. It follows the Proposition that 0 IS ~asymptoticnllystahle for ;i


linear contraction.
In analogy with continuous flows we define a sink of a discrete dynamical system
g to mean an equilibrium (that is, fixed point) at which the eigenvalues of Dg have
absolute value less than 1.
The main result of this section is:

Theorem Let x̄ be a fixed point of a discrete dynamical system g: W → E. If the eigenvalues of Dg(x̄) are less than 1 in absolute value, then x̄ is asymptotically stable.

Proof. We may assume x̄ = 0 ∈ E. Give E a norm such that for some μ < 1,

| Dg(0)x | ≤ μ | x |

for all x ∈ E. Let 0 < ε < 1 − μ. By Taylor's theorem there is a neighborhood V ⊂ W of 0 so small that if x ∈ V, then

| g(x) − Dg(0)x | ≤ ε | x |.

Hence

| g(x) | ≤ | Dg(0)x | + ε | x | ≤ μ | x | + ε | x |.

Putting ν = μ + ε < 1 we have | g(x) | ≤ ν | x | for x ∈ V. Given a neighborhood U of 0, choose r > 0 so small that the ball U_r of radius r about 0 lies in U ∩ V. Then | gⁿx | ≤ νⁿ | x | for x ∈ U_r; hence gⁿx ∈ U_r, and gⁿx → 0 as n → ∞. This completes the proof.

The preceding argument can be slightly modified to show that in the specified norm,

| g(x) − g(y) | ≤ μ | x − y |, μ < 1,

for all x, y in some neighborhood of 0 in W.
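A one-dimensional illustration of the theorem (an editorial sketch; the map is hypothetical, not from the text): g(x) = x/2 + x² fixes 0 with Dg(0) = 1/2 < 1, so 0 is asymptotically stable and iterates near 0 contract roughly by a factor 1/2 per step.

```python
# Editorial sketch of the theorem with a hypothetical one-dimensional map.
def g(x):
    return 0.5 * x + x * x   # fixed point 0, Dg(0) = 0.5 < 1

x = 0.1                      # seed near the fixed point
for _ in range(100):
    x = g(x)
# geometric contraction toward 0, roughly like (1/2)**n
assert abs(x) < 1e-20
```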

§3. Stability and Closed Orbits

We consider again the flow φ_t of a C¹ vector field f: W → E. Let γ ⊂ W be a closed orbit and suppose 0 ∈ γ.
Suppose S is a section at 0. If λ > 0 is the period of γ, then as t increases past λ, the solution curve φ_t(0) crosses S at 0. If x is sufficiently near 0, there will be a time τ(x) near λ when φ_{τ(x)}(x) crosses S. In this way a map

g: U → S,  g(x) = φ_{τ(x)}(x)

is obtained, U being a neighborhood of 0. In fact, by Section 2 of Chapter 11, there is such a U and a unique C¹ map τ: U → R such that φ_{τ(x)}(x) ∈ S for all x in U and τ(0) = λ.
Now let U, τ be as above and put S₀ = S ∩ U. Define a C¹ map

g: S₀ → S,  g(x) = φ_{τ(x)}(x).

Then g is a discrete dynamical system with a fixed point at 0. See Fig. A. We call g a Poincaré map. Note that the Poincaré map may not be definable at all points of S (Fig. B).
There is an intimate connection between the dynamical properties of the flow near γ and those of the Poincaré map near 0. For example:

FIG. A. A Poincaré map g: S₀ → S.

Proposition 1 Let g: S₀ → S be a Poincaré map for γ as above. Let x ∈ S₀ be such that lim_{n→∞} gⁿ(x) = 0. Then

lim_{t→∞} d(φ_t(x), γ) = 0.

Proof. Let gⁿ(x) = xₙ ∈ S. Since gⁿ⁺¹(x) is defined, xₙ ∈ S₀. Put τ(xₙ) = λₙ. Since xₙ → 0, λₙ → λ (the period of γ). Thus there is an upper bound r for {| λₙ | | n ≥ 0}. By continuity of the flow, as n → ∞,

| φ_s(xₙ) − φ_s(0) | → 0

uniformly in s ∈ [0, r]. For any t > 0 there exist s(t) ∈ [0, r] and an integer n(t) such that φ_t(x) = φ_{s(t)}(x_{n(t)}), with n(t) → ∞ as t → ∞. Since φ_{s(t)}(0) ∈ γ,

d(φ_t(x), γ) ≤ | φ_{s(t)}(x_{n(t)}) − φ_{s(t)}(0) | → 0

as t → ∞. This proves the proposition.

Keeping the same notation, we also have:

Proposition 2 If 0 is a sink for g, then γ is asymptotically stable.

Proof. Let U be any neighborhood of γ in W; we must find U₁, a neighborhood of γ in U, such that φ_t(U₁) ⊂ U for all t ≥ 0 and

lim_{t→∞} d(φ_t(x), γ) = 0

for all x ∈ U₁.
Let N ⊂ U be a neighborhood of γ so small that if x ∈ N and | t | ≤ 2λ, then φ_t(x) ∈ U (where λ is the period of γ).
Let H ⊂ E be the hyperplane containing the local section S. Since 0 is a sink, the main result of Section 2 says that H has a norm such that for some μ < 1, and some neighborhood V of 0 in S₀, it is true that

| g(x) | ≤ μ | x |

for all x ∈ V. Let ρ > 0 be so small that the ball B_ρ in H around 0 of radius ρ is contained in V ∩ N, and such that τ(x) < 2λ if x ∈ B_ρ.
Define

U₁ = {φ_t(x) | x ∈ B_ρ, t ≥ 0}.

See Fig. C. Then U₁ is a neighborhood of γ which is positively invariant. Moreover U₁ ⊂ U. For let y ∈ U₁. Then y = φ_t(x) for some x ∈ B_ρ, t ≥ 0. We assert that (t, x) can be chosen so that 0 ≤ t ≤ τ(x). For put gⁿ(x) = xₙ. Then xₙ ∈ V for all n ≥ 0. There exists n such that y is between xₙ and xₙ₊₁ on the trajectory of x; since xₙ ∈ V, τ(xₙ) < 2λ; and y = φ_t(x) = φ_s(xₙ) for some 0 ≤ s < 2λ. Then y ∈ U because xₙ ∈ N.
Finally, d(φ_t(y), γ) → 0 as t → ∞ for all y ∈ U₁. For we can write, for given y,

y = φ_t(x), x ∈ V.

Since gⁿ(x) → 0, the result follows from Proposition 1.

The following result links the derivative of the Poincaré map to that of the flow. We keep the same notation.

Proposition 3 Let the hyperplane H ⊂ E be invariant under Dφ_λ(0). Then

Dg(0) = Dφ_λ(0) | H.

FIG. C. U₁ is positively invariant.

Proof. Let τ: S₀ → R be the C¹ map such that τ(0) = λ and g(x) = φ(τ(x), x). By the remark at the end of Section 2, Chapter 11, we have

Dg(0) = Dφ_λ(0) | H + f(0) Dτ(0).

Since Dφ_λ(0)(H) = H while Dg(0) takes values in H and f(0) is transverse to H, it follows that Dτ(0) = 0. Hence by the chain rule

Dg(0) = Dφ_λ(0) | H.

It is easy to see that the derivatives of any two Poincaré maps, for different sections at 0, are similar.
We now have all the ingredients for the proof of Theorem 2 of the first section. Suppose γ is a closed orbit of period λ as in that theorem. We may assume 0 ∈ γ.
We choose an (n − 1)-dimensional subspace H of E as follows. H is like an eigenspace corresponding to the eigenvalues of Dφ_λ(0) with absolute value less than 1. Precisely, let B ⊂ E_C be the direct sum of the generalized eigenspaces belonging to these eigenvalues for the complexification (Dφ_λ(0))_C: E_C → E_C, and let H = B ∩ E. Then H is an (n − 1)-dimensional subspace of E invariant under Dφ_λ(0), and the restriction Dφ_λ(0) | H is a linear contraction.
Let S ⊂ H be a section at 0 and g: S₀ → S a Poincaré map. The previous proposition implies that the fixed point 0 ∈ S₀ is a sink for g. By Proposition 2, γ is asymptotically stable.
To prove Theorem 3, it suffices to consider a point x ∈ S₀, where g: S₀ → S is the Poincaré map of a local section at 0 ∈ γ (since every trajectory starting near γ intersects S₀).

If gⁿ(x) is defined and sufficiently near 0 for n = 1, . . . , k, then

gⁿ(x) = φ_{t_n}(x),

where

t_n = τ(x) + τ(g(x)) + ⋯ + τ(gⁿ⁻¹(x)).

For some ν < 1 and some norm on E we have

| gⁿx | ≤ ν | gⁿ⁻¹x |;

and using Dτ(0) = 0, we know that for any ε > 0,

| t_n − t_{n−1} − λ | ≤ ε | gⁿ⁻¹x | ≤ ε νⁿ⁻¹ | x |

if | x | is sufficiently small. Thus

| t_n − nλ | ≤ ε | x | (1 + ν + ν² + ⋯) = ε | x | / (1 − ν).

Hence if ε is sufficiently small, the sequence gⁿ(x) = φ_{t_n}(x) stays near 0 and can be continued for all positive integers n, and the above inequalities are valid for all n. It follows that the sequence {t_n − nλ} is Cauchy and converges to some s ∈ R. Thus φ_{nλ}(x) = φ_{nλ − t_n}(gⁿ(x)) converges to φ_{−s}(0) = z ∈ γ. This implies Theorem 3 of Section 1.
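Before the problems, here is a numerical illustration of a Poincaré map (an editorial sketch, not from the text). For the planar system of Problem 1 below, polar coordinates give r′ = r(1 − r²), θ′ = 1, so trajectories return to the ray θ = 0 after time 2π; solving r′ = r(1 − r²) explicitly yields a closed form for the Poincaré map on that ray, whose fixed point r = 1 is a sink with derivative e^{−4π}.

```python
import math

# Editorial sketch: Poincare map on the ray theta = 0 for
#   r' = r(1 - r^2), theta' = 1  (return time 2*pi).
def P(r):
    c = math.exp(-4.0 * math.pi)
    return (1.0 + c * (1.0 / (r * r) - 1.0)) ** -0.5

assert abs(P(1.0) - 1.0) < 1e-15      # the unit circle is the fixed point
# DP(1) = exp(-4*pi) < 1: the fixed point is a sink (periodic attractor)
h = 1e-6
deriv = (P(1.0 + h) - P(1.0 - h)) / (2.0 * h)
assert abs(deriv - math.exp(-4.0 * math.pi)) < 1e-4
r = 0.2
for _ in range(3):
    r = P(r)                          # iterates rush toward r = 1
assert abs(r - 1.0) < 1e-3
```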

PROBLEMS

1. Show that the planar system

x′ = (1 − x² − y²)x − y,
y′ = x + (1 − x² − y²)y

has a unique closed orbit γ and compute its Poincaré map. Show that γ is a periodic attractor. (Hint: Use polar coordinates.)
2. Let X denote either a closed orbit or an equilibrium. If X is asymptotically stable, show that for every λ > 0 there is a neighborhood U of X such that if p ∈ U − X, then φ_t(p) ≠ p for all t ∈ [0, λ].
3. Show that a linear flow cannot have an asymptotically stable closed orbit.
4. Define the concepts of stable closed orbit of a flow, and stable fixed point of a discrete dynamical system. Prove the following:
(a) A closed orbit is stable if and only if its Poincaré map has a stable fixed point at 0.
(b) If a closed orbit γ of period λ is stable, then no eigenvalue of Dφ_λ(p), p ∈ γ, has absolute value more than one; but the converse can be false.

5. (a) Let p be an asymptotically stable fixed point of a discrete dynamical system g: W → E. Show that p has arbitrarily small compact neighborhoods V ⊂ W such that g(V) ⊂ int V and ⋂_{n≥0} gⁿ(V) = p.
(b) State and prove the analogue of (a) for closed orbits.
6. Let g: R → R be the map

g(x) = ax + bx² + cx³, a ≠ 0.

Investigate the fixed point 0 for stability and asymptotic stability (see Problem 4). Consider separately the cases | a | < 1, | a | = 1, | a | > 1.
7. (The Contracting Map Theorem) Let X ⊂ Rⁿ be a nonempty closed set and f: X → X a continuous map. Suppose f has a Lipschitz constant α < 1. Prove that f has a unique fixed point p, and lim_{n→∞} fⁿ(x) = p for all x ∈ X. (Hint: Consider the sequence fⁿ(x).)
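Problem 7 can be illustrated numerically (an editorial sketch, not part of the text): f(x) = cos x maps the closed set [0, 1] into itself with Lipschitz constant sin 1 < 1, so iteration from any seed converges to the unique fixed point, approximately 0.739.

```python
import math

# Editorial sketch of the contracting map theorem: iterate f(x) = cos(x)
# on [0, 1], where |f'(x)| = sin(x) <= sin(1) < 1.
x = 0.0
for _ in range(200):
    x = math.cos(x)
# x is now (numerically) the unique fixed point of cos on [0, 1]
assert abs(math.cos(x) - x) < 1e-12
```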
Chapter 14
Classical Mechanics

The goal of this very short chapter is to do two things: (1) to give a statement
of the famous n-body problem of celestial mechanics and (2) to give a brief intro-
duction to Hamiltonian mechanics. We give a more abstract treatment of Hamil-
tonian theory than is given in physics texts; but our method exhibits invariant
notions more clearly and has the virtue of passing easily to the case where the
configuration space is a manifold.

§1. The n-Body Problem

We give a description of the n-body “problem” of celestial mechanics; this


extends the Kepler problem of Chapter 2. The basic example of this mechanical
system is the solar system with the sun and planets representing the n bodies.
Another example is the system consisting of the earth, moon, and sun. We are
concerned here with Newtonian gravitational forces; on the other hand, the New-
tonian n-body problem is the prototype of other n-body problems, with forces
other than gravitational.
The data, or parameters, of this system are n positive numbers representing the masses of the n bodies. We denote these numbers by m₁, . . . , mₙ.
The first goal in understanding a mechanical system is to define the configuration space, or space of generalized positions. In this case a configuration will consist precisely of the positions of each of the n bodies. We will write xᵢ for the position of the ith body, so that xᵢ is a point in Euclidean three space (the space in which we live) denoted by E. Now E is isomorphic to Cartesian space R³, but by no natural isomorphism. However, E does have a natural notion of inner product and associated norm; the notions of length and perpendicularity make sense in the space in which we live, while any system of coordinate axes is arbitrary.

Thus Euclidean three space, the configuration space of one body, is a three-dimensional vector space together with an inner product.
The configuration space M for the n-body problem is the Cartesian product of E with itself n times; thus M = (E)ⁿ and x = (x₁, . . . , xₙ), where xᵢ ∈ E is the position of the ith body. Note that xᵢ denotes a point in E, not a number.
One may deduce the space of states from the configuration space as the space TM of all tangent vectors to all possible curves in M. One may think of TM as the product M × M and represent a state as (x, v) ∈ M × M, where x is a configuration as before and v = (v₁, . . . , vₙ), vᵢ ∈ E being the velocity of the ith body. A state of the system gives complete information about the system at a given moment and (at least in classical mechanics) determines the complete life history of the state.
The determination of this life history goes via the ordinary differential equations of motion, Newton's equations in this instance. Good insights into these equations can be obtained by introducing kinetic and potential energy.
The kinetic energy is a function K: M × M → R on the space of states which is given by

K(x, v) = ½ Σᵢ₌₁ⁿ mᵢ | vᵢ |².

Here the norm of vᵢ is the Euclidean norm on E. One may also consider K to be given directly by an inner product B on M by

B(v, w) = ½ Σᵢ₌₁ⁿ mᵢ⟨vᵢ, wᵢ⟩,

K(x, v) = B(v, v).

It is clear that B defines an inner product on M, where ⟨vᵢ, wᵢ⟩ means the original inner product on E.
The potential energy V is a function on M defined by

V(x) = − Σ_{i<j} mᵢmⱼ / | xᵢ − xⱼ |.

We suppose that the gravitational constant is 1 for simplicity. Note that this function is not defined at any "collision" (where xᵢ = xⱼ). Let Aᵢⱼ be the subspace of collisions of the ith and jth bodies, so that

Aᵢⱼ = {x ∈ M | xᵢ = xⱼ}, i < j.

Thus Aᵢⱼ is a linear subspace of the vector space M. Denote the space of all collisions by A ⊂ M, so that A = ⋃ Aᵢⱼ. Then, properly speaking, the domain of the potential energy is M − A:

V: M − A → R.

We deal then with the space of noncollision states, which is (M − A) × M. Newton's equations are second order equations on M − A which may be written

mᵢ ẍᵢ = −gradᵢ V(x) for i = 1, . . . , n.

Here the partial derivative DᵢV of V with respect to xᵢ is a map from M − A to L(E, R); the inner product on E then converts DᵢV(x) to a vector which we call gradᵢ V(x). The process is similar to the definition of gradient in Chapter 9. Thus the equations make sense as written.
One may rewrite Newton's equations in such a way that they become a first order system on the space of states (M − A) × M:

xᵢ′ = vᵢ,
mᵢvᵢ′ = −gradᵢ V(x), for i = 1, . . . , n.

The flow obtained from this differential equation then determines how a state moves in time, or the life history of the n bodies once their positions and velocities are given. Although there is a vast literature of several centuries on these equations, no clear picture has emerged. In fact it is still not even clear what the basic questions are for this "problem."
Some of the questions that have been studied include: Is it true that almost all states do not lead to collisions? To what extent are periodic solutions stable? How to show the existence of periodic solutions? How to relate the theory of the n-body problem to the orbits in the solar system?
Our present goal is simply to put Newton’s equations into the framework of
this book and to see how they fit into the more abstract framework of Hamiltonian
mechanics.
We put the n-body problem into a little more general setting. The key ingredients are:

(1) Configuration space Q, an open set in a vector space E (in the above case Q = M − A and E = M).
(2) A C² function K: Q × E → R, kinetic energy, such that K(x, v) has the form K(x, v) = K_x(v, v), where K_x is an inner product on E (in the above case K_x was independent of x, but in problems with constraints, K_x depends on x).
(3) A C² function V: Q → R, potential energy.

The triple (Q, K, V) is called a simple mechanical system, and Q × E the state space of the system. Given a simple mechanical system (Q, K, V), the energy or total energy is the function e: Q × E → R defined by e(x, v) = K(x, v) + V(x).
For a simple mechanical system, one can canonically define a vector field on Q × E which gives the equations of motion for the states (points of Q × E). We will see how this can be done in the next section.
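To make the equations of motion concrete, here is a minimal numerical sketch (editorial, not from the text) of the first order system above for n = 2 equal masses with gravitational constant 1, integrated by a classical fourth-order Runge–Kutta step. The total energy e = K + V should stay nearly constant along the computed solution, anticipating the conservation theorem of the next section.

```python
import math

# Editorial sketch: Newton's equations for n = 2 bodies in the plane,
#   x_i' = v_i,   m_i v_i' = -grad_i V(x),  with G = 1.
m = [1.0, 1.0]

def accel(x):
    # x = [x1, y1, x2, y2]; accelerations from the inverse-square attraction
    dx, dy = x[2] - x[0], x[3] - x[1]
    r3 = (dx * dx + dy * dy) ** 1.5
    ax, ay = dx / r3, dy / r3
    return [m[1] * ax, m[1] * ay, -m[0] * ax, -m[0] * ay]

def energy(x, v):
    K = 0.5 * sum(m[i] * (v[2 * i] ** 2 + v[2 * i + 1] ** 2) for i in range(2))
    V = -m[0] * m[1] / math.hypot(x[2] - x[0], x[3] - x[1])
    return K + V

s = math.sqrt(0.5)                  # speed giving a circular orbit here
x = [-0.5, 0.0, 0.5, 0.0]
v = [0.0, -s, 0.0, s]
e0 = energy(x, v)
h = 1e-3
for _ in range(2000):               # one RK4 step per iteration
    k1x, k1v = v, accel(x)
    k2x = [v[i] + 0.5 * h * k1v[i] for i in range(4)]
    k2v = accel([x[i] + 0.5 * h * k1x[i] for i in range(4)])
    k3x = [v[i] + 0.5 * h * k2v[i] for i in range(4)]
    k3v = accel([x[i] + 0.5 * h * k2x[i] for i in range(4)])
    k4x = [v[i] + h * k3v[i] for i in range(4)]
    k4v = accel([x[i] + h * k3x[i] for i in range(4)])
    x = [x[i] + h / 6 * (k1x[i] + 2 * k2x[i] + 2 * k3x[i] + k4x[i]) for i in range(4)]
    v = [v[i] + h / 6 * (k1v[i] + 2 * k2v[i] + 2 * k3v[i] + k4v[i]) for i in range(4)]
# total energy K + V is (nearly) conserved along the computed solution
assert abs(energy(x, v) - e0) < 1e-8
```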

Examples of simple mechanical systems beside the n-body problem include a


particle moving in a conservative central force field, a harmonic oscillator, and a
frictionless pendulum. If one extends the definition of simple mechanical systems
to permit Q to be a manifold, then a large part of classical mechanics may be
analyzed in this framework.

§2. Hamiltonian Mechanics

We shall introduce Hamiltonian mechanics from a rather abstract point of view, and then relate it to the Newtonian point of view. This abstract development proceeds quite analogously to the modern treatment of gradients using inner products; now, however, the inner product is replaced by a "symplectic form." So we begin our discussion by defining this kind of form.
If F is a vector space, a symplectic form on F is a real-valued bilinear form that is antisymmetric and nondegenerate. Thus

Ω: F × F → R

is a bilinear map that is antisymmetric: Ω(u, v) = −Ω(v, u); and nondegenerate, which means that the map

Φ = Φ_Ω: F → F*

is an isomorphism. Here Φ is the linear map from F to F* defined by

Φ(u)(v) = Ω(u, v), u, v ∈ F.

It turns out that the existence of a symplectic form on F implies that the dimension of F is even (see the Problems).
We give an example of such a form Ω₀ on every even dimensional vector space. If F is an even dimensional vector space, we may write F in the form F = E × E*, the Cartesian product of a vector space E and its dual E*. Then an element f of F is of the form (v, w), where v, w are vectors of E, E*, respectively. Now if f = (v, w), f⁰ = (v⁰, w⁰) are two vectors of F, we define

Ω₀(f, f⁰) = w⁰(v) − w(v⁰).

Then it is easy to check that Ω₀ is a symplectic form on F. The nondegeneracy is obtained by showing that if α ≠ 0, then one may find β such that Ω₀(α, β) ≠ 0. Note that Ω₀ does not depend on a choice of coordinate structure on E, so that it is natural on E × E*.
If one chooses a basis for E, and uses the induced basis on E*, Ω₀ is expressed in coordinates by

Ω₀((v, w), (v⁰, w⁰)) = Σ wᵢ⁰vᵢ − Σ wᵢvᵢ⁰.

It can be shown that every symplectic form is of this type for some representation of F as E × E*.
Now let U be an open subset of a vector space F provided with a symplectic form Ω. There is a prescription for assigning to any C¹ function H: U → R a C¹ vector field X_H on U, called the Hamiltonian vector field of H. In this context H is called a Hamiltonian or a Hamiltonian function. To obtain X_H, let DH: U → F* be the derivative of H and simply write

(1) X_H(x) = Φ⁻¹DH(x), x ∈ U,

where Φ⁻¹ is the inverse of the isomorphism Φ: F → F* defined by Ω above. (1) is equivalent to saying that Ω(X_H(x), y) = DH(x)(y) for all y ∈ F. Thus X_H: U → F is a C¹ vector field on U; the differential equations defined by this vector field are called Hamilton's equations. By using coordinates we can compare these with what are called Hamilton's equations in physics books.
Let Ω₀ be the symplectic form on F = E × E* defined above, and let x = (x₁, . . . , xₙ) represent points of E and y = (y₁, . . . , yₙ) points of E* for the dual coordinate structures on E and E*. Let Φ₀: F → F* be the associated isomorphism.
For any C¹ function H: U → R, the derivative DH(x, y) has the components (∂H/∂x₁, . . . , ∂H/∂xₙ, ∂H/∂y₁, . . . , ∂H/∂yₙ). From this, one has that Φ₀⁻¹DH(x, y) is the vector with components

(∂H/∂y₁, . . . , ∂H/∂yₙ, −∂H/∂x₁, . . . , −∂H/∂xₙ).

This is seen as follows. Observe that (suppressing (x, y))

Φ₀(X_H) = DH,

or

Ω₀(X_H, w) = DH(w) for all w ∈ F.

By letting w range over the standard basis elements of R²ⁿ, one confirms the expression for X_H. The differential equation defined by the vector field X_H is then:

xᵢ′ = ∂H/∂yᵢ,
yᵢ′ = −∂H/∂xᵢ, i = 1, . . . , n.

These are the usual expressions for Hamilton's equations.
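The coordinate computation above can be verified directly. The editorial sketch below (with a hypothetical Hamiltonian, n = 1) checks that the field X_H = (∂H/∂y, −∂H/∂x) really satisfies Ω₀(X_H, w) = DH(w) for every w.

```python
# Editorial check of the coordinate formula for X_H with the hypothetical
# Hamiltonian H(x, y) = x^2 y + y^3 (n = 1).
def DH(x, y):
    return (2.0 * x * y, x * x + 3.0 * y * y)     # (dH/dx, dH/dy)

def X_H(x, y):
    dHx, dHy = DH(x, y)
    return (dHy, -dHx)                            # (dH/dy, -dH/dx)

def omega0(f, f0):
    (v, w), (v0, w0) = f, f0
    return w0 * v - w * v0                        # Omega_0 in coordinates

x, y = 0.7, -1.3
for wvec in [(1.0, 0.0), (0.0, 1.0), (2.5, -4.0)]:
    a, b = wvec
    lhs = omega0(X_H(x, y), wvec)
    rhs = DH(x, y)[0] * a + DH(x, y)[1] * b       # DH applied to w
    assert abs(lhs - rhs) < 1e-12
```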


Continuing on the abstract level we obtain the “conservation of energy” theorem.
The reason for calling it by this name is that in the mechanical models described
292 14. CLASSICAL MECHANICS

in this setting, H plays the role of energy, and the solution curves represent the
motions of states of the system.

Theorem (Conservation of Energy) Let U be an open set of a vector space F ,


H : U + R any c2 junction and 52 a symplectic form on F . Then H i s constant on the
solution curves defined by the vector $eld X H .
Proof. If + t ( z ) is a solution curve of the vector field X H , then it has to be shown
that
d
- H ( t $ , ( z ) ) = 0, all z, t .
dt
This expression by the chain rule is

DH(t$f(z)) (a +f(z,) = DH(XH).

But DH(XH) is simply, by the definition of XH, 52(XH, XH)which is 0 since 52 is


antisymmetric. This ends the proof.

It is instructive to compare this development with that of a gradient dynamical


system. The two are the same except for the character of the basic bilinear form involved; for one system it is an inner product and for the other it is a symplectic form. The defining function is constant on solution curves in the Hamiltonian case, but except at equilibria it is increasing in the gradient case.
From the point of view of mechanics, the Hamiltonian formulation has the
advantage that the equations of motion are expressed simply and without need
of coordinates, starting just from the energy H . Furthermore, conservation laws
follow easily and naturally, the one we proved being the simplest example. Rather
than pursue this direction however, we turn to the question of relating abstract
Hamiltonian mechanics to the more classical approach to mechanics. We shall see
how the energy of a simple mechanical system can be viewed as a Hamiltonian H ;
the differential equations of motion of the system are then given by the vector
field XH.
Thus to a given simple mechanical system (Q, K, V) we will associate, in a natural way, a Hamiltonian system H: U → R, U ⊂ F, with Ω a symplectic form on F.
Recall that the configuration space Q is an open set in a vector space E and that the state space of the simple mechanical system is Q × E. The space of generalized momenta or phase space of the system is Q × E*, where E* is the dual vector space of E.
The relation between the state space and the phase space of the system is given by the Legendre transformation λ: Q × E → Q × E*. To define λ, first define a linear isomorphism λ_q: E → E*, for each q ∈ Q, by

λ_q(v)w = 2K_q(v, w); v ∈ E, w ∈ E.

Then set

λ(q, v) = (q, λ_q(v)).

Consider the example of a simple mechanical system of a particle with mass m moving in Euclidean three space E under a conservative force field given by potential energy V. In this case the state space is E × E and K: E × E → R is given by K(q, v) = ½m | v |². Then λ: E × E → E × E* is given by λ_q(v) = p ∈ E*, where p(w) = 2K_q(v, w); or

p(w) = m⟨v, w⟩,

where ⟨ , ⟩ is the inner product on E. In a Cartesian coordinate system on E, p = mv, so that the image p of v under λ is indeed the classical momentum, "conjugate" to v.
Returning to our simple mechanical system in general, note that the Legendre transformation has an inverse, so that λ is a diffeomorphism from the state space to the phase space. This permits one to transfer the energy function e on state space to a function H on phase space, called the Hamiltonian of a simple mechanical system. Thus we have

H = e ∘ λ⁻¹.

The final step in converting a simple mechanical system to a Hamiltonian system is to put a symplectic form on F = E × E* ⊃ Q × E* = U. But we have already constructed such a form, Ω₀, in the early part of this section. Using (q, p) for variables on Q × E*, Hamilton's equations take the coordinate form

qᵢ′ = ∂H/∂pᵢ,
pᵢ′ = −∂H/∂qᵢ, i = 1, . . . , n.

Since for a given mechanical system H (interpreted as total energy) is a known function of pᵢ, qᵢ, these are ordinary differential equations. The basic assertion of Hamiltonian mechanics is that they describe the motion of the system.
The justification for this assertion is twofold. On one hand, there are many cases where Hamilton's equations are equivalent to Newton's; we discuss one below. On the other hand, there are common physical systems to which Newton's laws do not directly apply (such as a spinning top), but which fit into the framework of "simple mechanical systems," especially if the configuration space is allowed to be a surface or higher dimensional manifold. For many such systems, Hamilton's equations have been verified experimentally.

It is meaningless, however, to try to deduce Hamilton's equations from Newton's
on the abstract level of a simple mechanical system (Q, K, V). For there is no
identification of the elements of the "configuration space" Q with any particular
physical or geometrical parameters.
Consider as an example the special case above where K(q, v) = ½ Σ m_i v_i² in
Cartesian coordinates. Then m_i v_i = p_i and H(p, q) = Σ (p_i²/2m_i) + V(q); Hamilton's
equations become

    dq_i/dt = p_i/m_i,
    dp_i/dt = −∂V/∂q_i.

Differentiating the first and combining these equations yields

    m_i (d²q_i/dt²) = −∂V/∂q_i.

These are the familiar Newton's equations, again. Conversely, Newton's equations
imply Hamilton's in this case.
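As a numerical aside (not part of the text's argument), this equivalence is easy to check for the one-dimensional harmonic oscillator with H = p²/2m + ½kq²: integrating Hamilton's equations should conserve H and reproduce the Newtonian solution q(t) = cos t when m = k = 1 and (q, p) = (1, 0) at t = 0. A sketch in Python, with a hand-rolled Runge–Kutta integrator:

```python
import math

def hamilton_flow(q, p, m=1.0, k=1.0, dt=1e-3, steps=4000):
    # Integrate Hamilton's equations q' = dH/dp = p/m, p' = -dH/dq = -k q
    # for H = p^2/(2m) + (k/2) q^2, using the classical 4th-order Runge-Kutta step.
    def rhs(q, p):
        return p / m, -k * q
    for _ in range(steps):
        k1q, k1p = rhs(q, p)
        k2q, k2p = rhs(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
        k3q, k3p = rhs(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
        k4q, k4p = rhs(q + dt * k3q, p + dt * k3p)
        q += dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
        p += dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q, p

def H(q, p, m=1.0, k=1.0):
    # total energy (the Hamiltonian)
    return p * p / (2 * m) + 0.5 * k * q * q

q1, p1 = hamilton_flow(1.0, 0.0)   # flow for time t = 4 with m = k = 1
```

In this run the computed q(4) agrees with cos 4 to well under 10⁻⁶, and H is conserved to roundoff.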

PROBLEMS

1. Show that if the vector space F has a symplectic form Ω on it, then F has even
dimension. (Hint: Give F an inner product ⟨ , ⟩ and let A: F → F be the operator
defined by ⟨Ax, y⟩ = Ω(x, y). Consider the eigenvectors of A.)
2. (Lagrange) Let (Q, K, V) be a simple mechanical system and X_H the associated
Hamiltonian vector field on phase space. Show that (q̄, 0) is an equilibrium
for X_H if and only if DV(q̄) = 0; and (q̄, 0) is a stable equilibrium if
q̄ is an isolated minimum of V. (Hint: Use conservation of energy.)
3. Consider the second order differential equation in one variable
    x″ + f(x) = 0,
where f: R → R is C¹ and if f(x) = 0, then f′(x) ≠ 0. Describe the orbit structure
of the associated system in the plane
    x′ = v,
    v′ = −f(x)
when f(x) = x − x³. Discuss this phase portrait in general. (Hint: Consider

    H(x, v) = ½v² + ∫₀ˣ f(s) ds

and show that H is constant on orbits. The critical points of H are at v = 0,
f(x) = 0; use H_xx = f′(x), H_vv = 1.)
4. Consider the equation
    x″ + g(x)x′ + f(x) = 0,
where g(x) > 0, and f is as in Problem 3. Describe the phase portrait (the
function g may be interpreted as coming from friction in a mechanical problem).

Notes

One modern approach to mechanics is Abraham's book, Foundations of Mechanics
[1]. Wintner's Analytical Foundations of Celestial Mechanics [25] has a very extensive
treatment of the n-body problem.
Chapter 15
Nonautonomous Equations
and Differentiability of Flows

This is a short technical chapter which takes care of some unfinished business
left over from Chapter 8 on fundamental theory. We develop existence, uniqueness,
and continuity of solutions of nonautonomous equations x′ = f(t, x). Even though
our main emphasis is on autonomous equations, the theory of nonautonomous
linear equations x′ = A(t)x is needed as a technical device in establishing differentiability
of flows. The variational equation along a solution of an autonomous equation
is an equation of this type.

§1. Existence, Uniqueness, and Continuity for Nonautonomous Differential Equations

Let E be a normed vector space, W ⊂ R × E an open set, and f: W → E a continuous
map. Let (t₀, u₀) ∈ W. A solution to the initial value problem

(1)    x′(t) = f(t, x),    x(t₀) = u₀

is a differentiable curve x(t) in E defined for t in some interval J having the following
properties:

    t₀ ∈ J and x(t₀) = u₀,
    (t, x(t)) ∈ W and x′(t) = f(t, x(t))

for all t ∈ J.

We call the function f(t, x) Lipschitz in x if there is a constant K ≥ 0 such that

    |f(t, x₁) − f(t, x₂)| ≤ K|x₁ − x₂|

for all (t, x₁) and (t, x₂) in W.
The fundamental local theorem for nonautonomous equations is:

Theorem 1 Let W ⊂ R × E be open and f: W → E a continuous map that is
Lipschitz in x. If (t₀, u₀) ∈ W, there is an open interval J containing t₀ and a unique
solution to (1) defined on J.

The proof is the same as that of the fundamental theorem for autonomous equations
(Chapter 8), the extra variable t being inserted where appropriate.
The theorem applies in particular to functions f(t, x) that are C¹, or even continuously
differentiable only in x; for such an f is locally Lipschitz in x (in the
obvious sense). In particular we can prove:

Theorem 2 Let A: J → L(E) be a continuous map from an open interval J to the
space of linear operators on E. Let (t₀, u₀) ∈ J × E. Then the initial value problem

    x′ = A(t)x,    x(t₀) = u₀

has a unique solution on all of J.

Proof. It suffices to find a solution on every compact interval; by uniqueness
such solutions can be continued over J. If J₀ ⊂ J is compact, there is an upper
bound K to the norms of the operators A(t), t ∈ J₀. Such an upper bound is a
Lipschitz constant in x for f | J₀ × E, and Theorem 1 can be used to prove Theorem 2.
As in the autonomous case, solutions of (1) are continuous with respect to initial
conditions if f(t, x) is locally Lipschitz in x. We leave the precise formulation and
proof of this fact to the reader.

A different kind of continuity is continuity of solutions as functions of the data
f(t, x). That is, if f: W → E and g: W → E are both Lipschitz in x, and |f − g|
is uniformly small, we expect solutions to x′ = f(t, x) and y′ = g(t, y), having the
same initial values, to be close. This is true; in fact we have the following more
precise result.

Theorem 3 Let W ⊂ R × E be open and f, g: W → E continuous. Suppose that
for all (t, x) ∈ W,

    |f(t, x) − g(t, x)| < ε.

Let K be a Lipschitz constant in x for f(t, x). If x(t), y(t) are solutions to

    x′ = f(t, x),
    y′ = g(t, y),

respectively, on some interval J, and x(t₀) = y(t₀), then

    |x(t) − y(t)| ≤ (ε/K)(e^{K|t − t₀|} − 1)    for all t ∈ J.

Proof. Writing x(t) and y(t) as integrals of (1) and subtracting gives, for t ∈ J,

    |x(t) − y(t)| ≤ ∫ from t₀ to t of K|x(s) − y(s)| ds + ε|t − t₀|.

It follows from Gronwall's inequality (Chapter 8) that

    |x(t) − y(t)| ≤ (ε/K)(e^{K|t − t₀|} − 1),

which yields the theorem.
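The estimate |x(t) − y(t)| ≤ (ε/K)(e^{K|t − t₀|} − 1) is easy to probe numerically (a modern aside). A sketch in Python; the specific fields f(t, x) = −x and g = f + ε cos t are hypothetical choices with |f − g| ≤ ε and Lipschitz constant K = 1:

```python
import math

def rk4_step(F, t, x, dt):
    # one classical 4th-order Runge-Kutta step for x' = F(t, x)
    k1 = F(t, x)
    k2 = F(t + dt / 2, x + dt / 2 * k1)
    k3 = F(t + dt / 2, x + dt / 2 * k2)
    k4 = F(t + dt, x + dt * k3)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

eps, K = 0.01, 1.0                       # |f - g| <= eps; K = Lipschitz constant of f in x
f = lambda t, x: -x                      # x' = f(t, x)
g = lambda t, y: -y + eps * math.cos(t)  # y' = g(t, y)

x = y = 1.0                              # equal initial values at t0 = 0
t, dt, T = 0.0, 1e-3, 2.0
for _ in range(2000):                    # integrate both equations up to T = 2
    x = rk4_step(f, t, x, dt)
    y = rk4_step(g, t, y, dt)
    t += dt

gap = abs(x - y)
bound = (eps / K) * (math.exp(K * T) - 1.0)   # the Gronwall bound of the theorem
```

Here the actual gap (about ε/5 at T = 2) sits comfortably inside the bound, which is roughly 6ε.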

§2. Differentiability of the Flow of Autonomous Equations

Consider an autonomous differential equation

(1)    x′ = f(x),    f: W → E, W open in E,

where f is assumed C¹. Our aim is to show that the flow

    (t, x) → φ(t, x) = φ_t(x)

defined by (1) is a C¹ function of two variables, and to identify ∂φ/∂x.


To this end let y(t) be a particular solution of (1) for t in some open interval
J. Fix t₀ ∈ J and put y(t₀) = y₀. For each t ∈ J put

    A(t) = Df(y(t));

thus A: J → L(E) is continuous. We define a nonautonomous linear equation

(2)    u′ = A(t)u.

This is the variational equation of (1) along the solution y(t).
From Section 1 we know that (2) has a solution on all of J for every initial condition
u(t₀) = u₀.
The significance of (2) is that if u₀ is small, then the map

    t → y(t) + u(t)

is a good approximation to the solution x(t) of (1) with initial value x(t₀) = y₀ + u₀.
To make this precise we introduce the following notation. If ξ ∈ E, let the map

    t → u(t, ξ)

be the solution to (2) which sends t₀ to ξ. If ξ ∈ E and y₀ + ξ ∈ W, let the map

    t → y(t, ξ)

be the solution to (1) which sends t₀ to y₀ + ξ. (Thus y(t, ξ) = φ_{t−t₀}(y₀ + ξ).)

Proposition Let J₀ ⊂ J be a compact interval containing t₀. Then

    lim_{ξ→0} |y(t, ξ) − y(t) − u(t, ξ)| / |ξ| = 0

uniformly in t ∈ J₀.

This means that for every ε > 0, there exists δ > 0 such that if |ξ| ≤ δ, then

(3)    |y(t, ξ) − (y(t) + u(t, ξ))| ≤ ε|ξ|

for all t ∈ J₀. Thus as ξ → 0, the curve t → y(t) + u(t, ξ) is a better and better
approximation to y(t, ξ). In many applications y(t) + u(t, ξ) is used in place of
y(t, ξ); this is convenient because u(t, ξ) is linear in ξ.
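The approximation y(t, ξ) ≈ y(t) + u(t, ξ) can be illustrated numerically (a modern aside, with the pendulum field f(x, v) = (v, −sin x) as a hypothetical test case). The base solution, the perturbed solution, and the variational equation are integrated side by side, and the error |y(t, ξ) − y(t) − u(t, ξ)| is seen to be of second order in |ξ|:

```python
import math

def rk4(F, s, dt, steps):
    # Classical 4th-order Runge-Kutta for an autonomous field F on R^n.
    for _ in range(steps):
        k1 = F(s)
        k2 = F([a + 0.5 * dt * b for a, b in zip(s, k1)])
        k3 = F([a + 0.5 * dt * b for a, b in zip(s, k2)])
        k4 = F([a + dt * b for a, b in zip(s, k3)])
        s = [a + dt * (p + 2 * q + 2 * r + w) / 6
             for a, p, q, r, w in zip(s, k1, k2, k3, k4)]
    return s

def pend(s):                         # the pendulum field f(x, v) = (v, -sin x)
    x, v = s
    return [v, -math.sin(x)]

def pend_var(s):                     # base solution coupled with u' = Df(y(t))u
    x, v, ux, uv = s
    return [v, -math.sin(x), uv, -math.cos(x) * ux]

y0, xi = [1.0, 0.0], [1e-3, 0.0]
dt, steps = 1e-3, 2000               # integrate up to t = 2

y_t = rk4(pend, y0, dt, steps)                                   # y(t)
y_xi = rk4(pend, [y0[0] + xi[0], y0[1] + xi[1]], dt, steps)      # y(t, xi)
u_t = rk4(pend_var, y0 + xi, dt, steps)[2:]                      # u(t, xi)

err = math.hypot(y_xi[0] - y_t[0] - u_t[0], y_xi[1] - y_t[1] - u_t[1])
```

Here err is orders of magnitude smaller than |u(t, ξ)|, which is itself of order |ξ|.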
We will prove the proposition presently. First we use (3) to prove:

Theorem 1 The flow φ(t, x) of (1) is C¹; that is, ∂φ/∂t and ∂φ/∂x exist and are
continuous in (t, x).

Proof. Of course ∂φ(t, x)/∂t is just f(φ_t(x)), which is continuous. To compute
∂φ/∂x we have, for small ξ,

    φ(t, y₀ + ξ) − φ(t, y₀) = y(t, ξ) − y(t).

The proposition now implies that ∂φ(t, y₀)/∂x ∈ L(E) is the linear map ξ → u(t, ξ).
The continuity of ∂φ/∂x is a consequence of the continuity in initial conditions and
data of solutions to the variational equation (2).
Denoting the flow again by φ_t, we note that for each t the derivative Dφ_t(x)
of the map φ_t at x ∈ W is the same as ∂φ(t, x)/∂x. We call this the space derivative
of the flow, as opposed to the time derivative ∂φ(t, x)/∂t.
The proof of the preceding theorem actually computes Dφ_t(x) as the solution
to an initial value problem in the vector space L(E): for each x₀ ∈ W the space
derivative of the flow satisfies

    d/dt (Dφ_t(x₀)) = Df(φ_t(x₀)) Dφ_t(x₀),
    Dφ₀(x₀) = I.

Here we regard x₀ as a parameter. An important special case is that of an equilibrium
x̄, so that φ_t(x̄) = x̄. Putting Df(x̄) = A ∈ L(E), we get

    d/dt (Dφ_t(x̄)) = A Dφ_t(x̄),
    Dφ₀(x̄) = I.

The solution to this is

    Dφ_t(x̄) = e^{tA}.

This means that in a neighborhood of an equilibrium the flow is approximately linear.
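A numerical illustration (not part of the proof): finite differences of the flow at an equilibrium should reproduce e^{tA}. A Python sketch with the hypothetical field f(x, y) = (−x + y², −2y + x²), for which the equilibrium is 0 and A = Df(0) = diag(−1, −2), so e^{tA} = diag(e^{−t}, e^{−2t}):

```python
import math

def rk4(F, s, dt, steps):
    # Classical 4th-order Runge-Kutta for an autonomous field F on R^2.
    for _ in range(steps):
        k1 = F(s)
        k2 = F([a + 0.5 * dt * b for a, b in zip(s, k1)])
        k3 = F([a + 0.5 * dt * b for a, b in zip(s, k2)])
        k4 = F([a + dt * b for a, b in zip(s, k3)])
        s = [a + dt * (p + 2 * q + 2 * r + w) / 6
             for a, p, q, r, w in zip(s, k1, k2, k3, k4)]
    return s

def f(s):
    # Equilibrium at 0 with Df(0) = diag(-1, -2); the quadratic terms are an
    # arbitrary (hypothetical) nonlinearity.
    x, y = s
    return [-x + y * y, -2 * y + x * x]

h = 1e-4                                                # finite-difference step
col1 = [c / h for c in rk4(f, [h, 0.0], 1e-3, 1000)]    # ~ Dphi_1(0) e1
col2 = [c / h for c in rk4(f, [0.0, h], 1e-3, 1000)]    # ~ Dphi_1(0) e2
# compare with e^{tA} = diag(e^{-1}, e^{-2}) at t = 1
```

The two numerically computed columns match the columns of e^{A} to within the O(h) finite-difference error.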
We now prove the proposition. For simplicity we take t₀ = 0. The integral equations
satisfied by y(t, ξ), y(t), and u(t, ξ) are

    y(t, ξ) = y₀ + ξ + ∫₀ᵗ f(y(s, ξ)) ds,
    y(t) = y₀ + ∫₀ᵗ f(y(s)) ds,
    u(t, ξ) = ξ + ∫₀ᵗ Df(y(s)) u(s, ξ) ds.

From these we get

(4)    y(t, ξ) − y(t) − u(t, ξ) = ∫₀ᵗ [f(y(s, ξ)) − f(y(s)) − Df(y(s)) u(s, ξ)] ds.

The Taylor estimate for f says

    f(y) = f(x) + Df(x)(y − x) + R(x, y − x),

where

    lim_{y→x} R(x, y − x)/|y − x| = 0

uniformly in y for y in a given compact set. We apply this to y = y(s, ξ), x = y(s);
from linearity of Df(y(s)) and (4) we get

(5)    y(t, ξ) − y(t) − u(t, ξ) = ∫₀ᵗ {Df(y(s))[y(s, ξ) − y(s) − u(s, ξ)] + R(y(s), y(s, ξ) − y(s))} ds.

Denote the norm of the left side of (5) by g(t) and put

    N = max{||Df(y(s))|| : s ∈ J₀}.

Then from (5) we get

(6)    g(t) ≤ N ∫₀ᵗ g(s) ds + ∫₀ᵗ |R(y(s), y(s, ξ) − y(s))| ds.

Fix ε > 0 and pick δ₀ > 0 so small that

(7)    |R(y(s), y(s, ξ) − y(s))| ≤ ε|y(s, ξ) − y(s)|

if |y(s, ξ) − y(s)| ≤ δ₀ and s ∈ J₀.
From Chapter 8, Section 4 there are constants K ≥ 0 and δ₁ > 0 such that

(8)    |y(s, ξ) − y(s)| ≤ |ξ|e^{Ks} ≤ δ₀

if |ξ| ≤ δ₁ and s ∈ J₀.
Assume now that |ξ| ≤ δ₁. From (6), (7), and (8) we find, for t ∈ J₀,

    g(t) ≤ N ∫₀ᵗ g(s) ds + ε ∫₀ᵗ |ξ|e^{Ks} ds,

whence

    g(t) ≤ N ∫₀ᵗ g(s) ds + Cε|ξ|

for some constant C depending only on K and the length of J₀. Applying Gronwall's
inequality we obtain

    g(t) ≤ Cεe^{Nt}|ξ|

if t ∈ J₀ and |ξ| ≤ δ₁. (Recall that δ₁ depends on ε.) Since ε is any positive number,
this shows that g(t)/|ξ| → 0 uniformly in t ∈ J₀, which proves the proposition.
We show next that the flow enjoys the same degree of differentiability as does
the data.
A function f: W → E is called C^r, 1 ≤ r < ∞, if it has r continuous derivatives.
For r ≥ 2 this is equivalent to: f is C¹ and Df: W → L(E) is C^{r−1}. If f is C^r for all
r ≥ 1, we say f is C^∞. We let C⁰ mean "continuous."

Theorem 2 Let W ⊂ E be open and let f: W → E be C^r, 1 ≤ r ≤ ∞. Then the
flow φ: Ω → E of the differential equation

    x′ = f(x)

is also C^r.
Proof. We induct on r, the case r = 1 having been proved in Theorem 1.
We may suppose r < ∞ for the proof.
Suppose, as the inductive hypothesis, that r ≥ 2 and that the flow of every
differential equation

    ξ′ = F(ξ),

with C^{r−1} data F, is C^{r−1}.
Consider the differential equation on E × E defined by the vector field

    F: W × E → E × E,    F(x, u) = (f(x), Df(x)u),

or equivalently,

(9)    x′ = f(x),    u′ = Df(x)u.

Since F is C^{r−1}, the flow Φ of (9) is C^{r−1}. But this flow is just

    Φ_t((x, u)) = (φ_t(x), Dφ_t(x)u),

since the second equation in (9) is the variational equation of the first equation.
Therefore ∂φ/∂x is a C^{r−1} function of (t, x), since ∂φ/∂x = Dφ_t(x). Moreover,
∂φ/∂t is C^{r−1} (in fact, C^r in t) since

    ∂φ(t, x)/∂t = f(φ(t, x)).

It follows that φ is C^r since its first partial derivatives are C^{r−1}.



PROBLEMS

1. Let A: R → L(E) be continuous and let P: R → L(E) be the solution to the
initial value problem

    P′ = A(t)P,    P(0) = P₀ ∈ L(E).

Show that

    Det P(t) = (Det P₀) exp(∫₀ᵗ Tr A(s) ds).


2. Show that if f is C^r, some r with 0 ≤ r ≤ ∞, and x(t) is a solution to x′ = f(x),
then x is a C^{r+1} function.
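The identity in Problem 1 (Liouville's formula) can be checked numerically; a Python sketch for a hypothetical 2 × 2 case with Tr A(t) = t/10, comparing Det P(1) with exp(∫₀¹ Tr A(s) ds) = e^{1/20}:

```python
import math

def A(t):
    # a sample continuous A: R -> L(R^2), with Tr A(t) = t/10
    return [[0.0, 1.0], [-1.0, t / 10.0]]

def mul(M, P):
    # 2x2 matrix product M P
    return [[sum(M[i][k] * P[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(P, Q, c=1.0):
    # matrix combination P + c Q
    return [[P[i][j] + c * Q[i][j] for j in range(2)] for i in range(2)]

P = [[1.0, 0.0], [0.0, 1.0]]            # P(0) = I, so Det P(0) = 1
t, dt = 0.0, 1e-3
for _ in range(1000):                   # RK4 for P' = A(t) P up to t = 1
    k1 = mul(A(t), P)
    k2 = mul(A(t + dt / 2), add(P, k1, dt / 2))
    k3 = mul(A(t + dt / 2), add(P, k2, dt / 2))
    k4 = mul(A(t + dt), add(P, k3, dt))
    P = add(P, add(add(k1, k4), add(k2, k3), 2.0), dt / 6)
    t += dt

det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
predicted = math.exp(0.05)              # exp( int_0^1 (s/10) ds ) = exp(1/20)
```

The integrated determinant agrees with the predicted value to roundoff-level accuracy.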
Chapter 16
Perturbation Theory
and Structural Stability

This chapter is an introduction to the problem: What effect does changing the
differential equation itself have on the solution? In particular, we find general condi-
tions for equilibria to persist under small perturbations of the vector field. Similar
results are found for periodic orbits. Finally, we discuss briefly more global problems
of the same type. That is to say, we consider the question: When does the phase
portrait itself persist under perturbations of the vector field? This is the problem of
structural stability.

§1. Persistence of Equilibria

Let W be an open set in a vector space E and f: W → E a C¹ vector field. By a
perturbation of f we simply mean another C¹ vector field on W which we think of as
being "C¹ close to f," that is,

    |f(x) − g(x)| and ||Df(x) − Dg(x)||

are small for all x ∈ W.
To make this more precise, let 𝒱(W) be the set of all C¹ vector fields on W. If
E has a norm, we define the C¹-norm ||h||₁ of a vector field h ∈ 𝒱(W) to be the
least upper bound of all the numbers

    |h(x)|, ||Dh(x)||;    x ∈ W.

We allow the possibility ||h||₁ = ∞ if these numbers are unbounded.

A neighborhood of f ∈ 𝒱(W) is any subset 𝒩 ⊂ 𝒱(W) that contains a set of the
form

    {g ∈ 𝒱(W) | ||g − f||₁ < ε}

for some ε > 0 and some norm on E.
The set 𝒱(W) has the formal properties of a vector space under the usual operations
of addition and scalar multiplication of vector-valued functions. The C¹ norm
has many of the same formal properties as the norms for vector spaces defined
earlier, namely,

    ||h||₁ ≥ 0,
    ||h||₁ = 0 if and only if h = 0,
    ||h + g||₁ ≤ ||h||₁ + ||g||₁,

where if ||h||₁ or ||g||₁ is infinite, the obvious interpretation is made.
We can now state our first perturbation theorem.

Theorem 1 Let f: W → E be a C¹ vector field and x̄ ∈ W an equilibrium of x′ =
f(x) such that Df(x̄) ∈ L(E) is invertible. Then there exists a neighborhood U ⊂ W
of x̄ and a neighborhood 𝒩 ⊂ 𝒱(W) of f such that for any g ∈ 𝒩 there is a unique
equilibrium ȳ ∈ U of y′ = g(y). Moreover, if E is normed, for any ε > 0 we can
choose 𝒩 so that |ȳ − x̄| < ε.

Theorem 1 applies to the special case where x̄ is a hyperbolic equilibrium, that
is, the eigenvalues of Df(x̄) have nonzero real parts. In this case, the index ind(x̄)
of x̄ is the number of eigenvalues (counting multiplicities) of Df(x̄) having negative
real parts. If dim E = n, then ind(x̄) = n means x̄ is a sink, while ind(x̄) = 0
means it is a source. We can sharpen Theorem 1 as follows:

Theorem 2 Suppose that x̄ is a hyperbolic equilibrium. In Theorem 1, then, 𝒩,
U can be chosen so that if g ∈ 𝒩, the unique equilibrium ȳ ∈ U of y′ = g(y) is hyperbolic
and has the same index as x̄.

Proof. This follows from a theorem in Chapter 7 and Theorem 1.
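The perturbed equilibrium can be located concretely by Newton's method (a numerical device, not the argument used in the text). A Python sketch with a hypothetical sink f(x, y) = (−x + y², −2y + x²) and the constant perturbation g = f + (0.01, 0.01); the Jacobian stays invertible, Newton's iteration converges to the nearby equilibrium, and the trace/determinant test confirms it is still a sink of the same index:

```python
def f(x, y):
    # hypothetical field with a hyperbolic sink at the origin
    return (-x + y * y, -2 * y + x * x)

def g(x, y):
    # a small constant perturbation of f
    fx, fy = f(x, y)
    return (fx + 0.01, fy + 0.01)

def Dg(x, y):
    # Jacobian (the same as Df, since the perturbation is constant)
    return ((-1.0, 2 * y), (2 * x, -2.0))

x = y = 0.0                        # start Newton's method at the old equilibrium
for _ in range(20):
    (a, b), (c, d) = Dg(x, y)
    det = a * d - b * c
    gx, gy = g(x, y)
    # z <- z - Dg(z)^{-1} g(z), with the 2x2 inverse written out
    x -= (d * gx - b * gy) / det
    y -= (-c * gx + a * gy) / det

(a, b), (c, d) = Dg(x, y)
tr, det = a + d, a * d - b * c     # tr < 0 and det > 0: still a sink
```

The zero found lies within a distance comparable to the size of the perturbation from the old equilibrium.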

The proof of Theorem 1 has nothing to do with differential equations; rather, it
depends on the following result about C¹ maps:

Proposition Let f: W → E be C¹ and suppose x₀ ∈ W is such that the linear operator
Df(x₀): E → E is invertible. Then there is a neighborhood 𝒩 ⊂ 𝒱(W) of f and an
open set U ⊂ W containing x₀ such that if g ∈ 𝒩, then

(a) g | U is one-to-one, and
(b) f(x₀) ∈ g(U).

Theorem 1 follows by taking x₀ = x̄ and f(x̄) = 0, for then g(ȳ) = 0 for a unique
ȳ ∈ U. To make |ȳ − x̄| < ε (assuming E is normed now) we simply replace W
by W₀ = {x ∈ W | |x − x̄| < ε}. The proposition guarantees that 𝒩 can be
chosen so that U, and hence ȳ, is in W₀ for any g ∈ 𝒩.
It remains to prove the proposition. In the following lemmas we keep the same
notation.

Lemma 1 Assume E is normed. Let V ⊂ W be a ball such that, for some ν > 0 and
all y, z in V, Df(y) is invertible with

    ||Df(y)⁻¹|| ≤ ν,

and

(3)    ||Df(y) − Df(z)|| < 1/ν.

Then f | V is one-to-one.

Proof. Let y, z be distinct points of V with z = y + u. Note that since V is a ball,
y + tu ∈ V for all t ∈ [0, 1]. Define a C¹ map φ: [0, 1] → E by

    φ(t) = f(y + tu).

Then

    φ(0) = f(y),    φ(1) = f(z).

By the chain rule,

    φ′(t) = Df(y + tu)u.

Hence

    f(z) − f(y) = ∫₀¹ Df(y + tu)u dt
                = Df(y)u + ∫₀¹ [Df(y + tu) − Df(y)]u dt.

Now |Df(y)u| ≥ |u|/ν, while by (3) the second term has norm less than |u|/ν.
Thus f(y) ≠ f(z). This proves Lemma 1.

Lemma 2 Suppose E is a normed vector space with norm defined by an inner product.
Let B ⊂ W be a closed ball around x₀ with boundary ∂B, and f: W → E a C¹ map.
Suppose Df(y) is invertible for all y ∈ B. Let

    min{|f(y) − f(x₀)| : y ∈ ∂B} > 2δ > 0.

Then w ∈ f(B) if |w − f(x₀)| < δ.

Proof. Since B is compact, there exists y₀ ∈ B at which the function

    H: B → R,
    H(y) = ½|f(y) − w|²

takes a minimal value. Note that y₀ cannot be in ∂B, for if y ∈ ∂B, then

    |f(y) − w| ≥ |f(y) − f(x₀)| − |f(x₀) − w|
             > 2δ − δ = δ
             > |f(x₀) − w|,

showing that |f(y) − w| is not minimal if y ∈ ∂B.
Since the norm on E comes from an inner product, ½|x|² is differentiable; its
derivative at x is the linear map z → ⟨x, z⟩. By the chain rule, H is differentiable
and its derivative at y₀ is the linear map

    z → DH(y₀)z = ⟨f(y₀) − w, Df(y₀)z⟩.

Since y₀ is a critical point of H and y₀ is an interior point of B, DH(y₀) = 0.
Since Df(y₀) is invertible, there exists v ∈ E with

    Df(y₀)v = f(y₀) − w.

Then

    0 = DH(y₀)v = ⟨f(y₀) − w, Df(y₀)v⟩ = |f(y₀) − w|².

Therefore f(y₀) = w, proving Lemma 2.



Note that the proof actually shows that

    w ∈ f(B − ∂B).
To prove the proposition we give E a norm coming from an inner product. The
subset of invertible operators in the vector space L(E) is open. Therefore there
exists α > 0 such that A ∈ L(E) is invertible if

    ||A − Df(x₀)|| < α.

Since the map x → Df(x) is continuous, there is a neighborhood N₁ ⊂ W of x₀
such that if x ∈ N₁, then

    ||Df(x) − Df(x₀)|| < ½α.

It follows that if g ∈ 𝒱(W) is such that

    ||Dg(x) − Df(x)|| < ½α

for all x ∈ N₁, then Dg(x) is invertible for all x ∈ N₁. The set of such g is a neighborhood
𝒩₁ of f.
Let ν > ||Df(x₀)⁻¹||. The map A → A⁻¹, from invertible operators to L(E), is
continuous (use the formula in Appendix I for the inverse of a matrix). It follows
that f has a neighborhood 𝒩₂ ⊂ 𝒩₁ and x₀ has a neighborhood N₂ ⊂ N₁ such that
if g ∈ 𝒩₂ and y ∈ N₂, then

    ||Dg(y)⁻¹|| < ν.

We can find still smaller neighborhoods, 𝒩₃ ⊂ 𝒩₂ of f and N₃ ⊂ N₂ of x₀, such that
if g ∈ 𝒩₃ and y, z ∈ N₃, then

    ||Dg(y) − Dg(z)|| < 1/ν.

It now follows from Lemma 1 that for any ball V ⊂ N₃ and any g ∈ 𝒩₃, g | V is one-to-one.
Fix a ball V ⊂ N₃ around x₀. Let B ⊂ V be a closed ball around x₀ and choose
δ > 0 as in Lemma 2. There is a neighborhood 𝒩 ⊂ 𝒩₃ of f such that if g ∈ 𝒩, then

    min{|g(y) − g(x₀)| : y ∈ ∂B} > 2δ > 0.

It follows that if |w − g(x₀)| < δ and g ∈ 𝒩, then w ∈ g(B). The proposition is
now proved using this 𝒩 and taking U = V.

We have not discussed the important topic of nonautonomous perturbations.


Problem 2 shows that in a certain sense the basin of attraction of a sink persists
under small nonautonomous perturbations.

PROBLEMS

1. Show that the stable and unstable manifolds of a hyperbolic equilibrium of a
linear differential equation x′ = Ax vary continuously with linear perturbations
of A ∈ L(E). That is, suppose E^u ⊕ E^s is the invariant splitting of E such
that e^{tA}: E^u → E^u is an expanding linear flow and e^{tA}: E^s → E^s is contracting.
Given ε > 0, there exists δ > 0 such that if ||B − A|| < δ, then B leaves
invariant a splitting F^u ⊕ F^s of E such that e^{tB} | F^u is expanding, e^{tB} | F^s is
contracting, and there is a linear isomorphism T: E → E such that T(E^u) =
F^u, T(E^s) = F^s, and ||T − I|| < ε.
2. Let W ⊂ Rⁿ be an open set and 0 ∈ W an asymptotically stable equilibrium
of a C¹ vector field f: W → Rⁿ. Assume that 0 has a strict Liapunov function.
Then 0 has a neighborhood W₀ ⊂ W with the following property. For any
ε > 0 there exists δ > 0 such that if g: R × W → Rⁿ is C¹ and |g(t, x) −
f(x)| < δ for all (t, x), then every solution x(t) to x′ = g(t, x) with x(t₀) ∈ W₀
satisfies x(t) ∈ W₀ for all t ≥ t₀ and |x(t)| < ε for all t greater than some t₁.
(Hint: If V is a strict Liapunov function for 0, then (d/dt)V(x(t)) is close
to (d/dt)V(y(t)), where y′ = f(y). Hence (d/dt)V(x(t)) < 0 if |x(t)| is
not too small. Imitate the proof of Liapunov's theorem.)

§2. Persistence of Closed Orbits

In this section we consider a dynamical system φ_t defined by a C¹ vector field
f: W → E where W ⊂ E is an open set. We suppose that there is a closed orbit
γ ⊂ W of period λ > 0. For convenience we assume the origin 0 ∈ E is in γ. The
main result is:

Theorem 1 Let σ: S₀ → S be a Poincaré map for a local section S at 0. Let U ⊂ W
be a neighborhood of γ. Suppose that 1 is not an eigenvalue of Dσ(0). Then there exists
a neighborhood 𝒩 ⊂ 𝒱(W) of f such that every vector field g ∈ 𝒩 has a closed orbit
β ⊂ U.

The condition on the Poincaré map in Theorem 1 is equivalent to the condition
that the eigenvalue 1 of Dφ_λ(0) has multiplicity 1. Unfortunately, no equivalent
condition on the vector field f is known.

Proof of the theorem. Let τ: S₀ → R be the C¹ map such that τ(0) = λ and

    σ(x) = φ_{τ(x)}(x).

We may assume that the closure of S₀ is a compact subset of S. Let α > 0. There
exists δ₀ > 0 such that if g ∈ 𝒱(W) and |g(x) − f(x)| < δ₀ for all x ∈ S₀, then,
first, S will be a local section at 0 for g, and second, there is a C¹ map ω: S₀ → R
such that

    |ω(x) − τ(x)| < α,

and

    ψ_{ω(x)}(x) ∈ S,

where ψ_t is the flow of g. Put

    v(x) = ψ_{ω(x)}(x).

Then

    v: S₀ → S

is a C¹ map which is a kind of Poincaré map for the flow ψ_t.
Given any t₀ > 0, any compact set K ⊂ W, and any ν > 0, we can ensure that

    ||Dψ_t(x) − Dφ_t(x)|| < ν

for all t ∈ [−t₀, t₀], x ∈ K, provided we make ||g − f||₁ small enough. This follows
from continuity of solutions of differential equations as functions of the original
data and initial conditions, and from the expression of ∂ψ_t(x)/∂x as solutions of the
nonautonomous equation in L(E),

    u′ = Dg(y(t))u,

where y′ = g(y). (See Chapter 15.)
From this one can show that, provided ||g − f||₁ is small enough, one can make
|v(x) − σ(x)| and ||Dv(x) − Dσ(x)|| as small as desired for all x ∈ S₀.
A fixed point x = v(x) of v lies on a closed orbit of the flow ψ_t. We view such a
fixed point as a zero of the C¹ map

    η: S₀ → H,    η(x) = v(x) − x,

where H is the hyperplane containing S.
Let ξ: S₀ → H be the C¹ map

    ξ(x) = σ(x) − x,

so that ξ(0) = 0. Now

    Dξ(0) = Dσ(0) − I,

where I: H → H is the identity. Since 1 is not an eigenvalue of Dσ(0), we know
that 0 is not an eigenvalue of Dξ(0), that is, Dξ(0) is invertible. From the proposition
in the preceding section we can find a neighborhood ℳ ⊂ 𝒱(S₀) of ξ such that
any map in ℳ has a unique zero y ∈ S₀. If ||g − f||₁ is sufficiently small, η ∈ ℳ.

Hence η has a unique zero y ∈ S₀; and y lies on a closed orbit β of g. Moreover, we
can make y so close to 0 that β ⊂ U. This proves Theorem 1.

The question of the uniqueness of the closed orbit of the perturbation is interesting.
It is not necessarily unique; in fact, it is possible that all points of U lie on
closed orbits of f! But it is true that closed orbits other than γ will have periods
much bigger than λ. In fact, given ε > 0, there exists δ > 0 so small that if 0 <
d(x, γ) < δ and φ_t(x) = x, t > 0, then t > 2λ − ε. The same will hold true for
sufficiently small perturbations of f: the fixed point y of v that we found above
lies on a closed orbit β of g whose period is within ε of λ; while any other closed orbit
of g that meets S₀ will have to circle around β several times before it closes up. This
follows from the relation of closed orbits to the sections; see Fig. A.

FIG. A. A closed orbit β′ near a hyperbolic closed orbit β.

There is one special case where the uniqueness of the closed orbit of the perturbation
can be guaranteed: if γ is a periodic attractor and g is sufficiently close to f,
then β will also be a periodic attractor; hence every trajectory that comes near β
winds closer and closer to β as t → ∞ and so cannot be a closed orbit.
Similarly, if γ is a periodic repeller, so is β, and again uniqueness holds.
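For a periodic attractor this contraction can be seen directly in a computed Poincaré map (a modern numerical aside). A Python sketch for the Van der Pol oscillator written as x′ = y, y′ = −x + μ(1 − x²)y with μ = 1 (all numerical choices here are illustrative), using the positive x-axis as the section:

```python
MU = 1.0                               # x' = y, y' = -x + MU(1 - x^2)y (Van der Pol)

def vdp(s):
    x, y = s
    return [y, -x + MU * (1.0 - x * x) * y]

def step(s, dt):
    # one 4th-order Runge-Kutta step
    k1 = vdp(s)
    k2 = vdp([s[0] + 0.5 * dt * k1[0], s[1] + 0.5 * dt * k1[1]])
    k3 = vdp([s[0] + 0.5 * dt * k2[0], s[1] + 0.5 * dt * k2[1]])
    k4 = vdp([s[0] + dt * k3[0], s[1] + dt * k3[1]])
    return [s[0] + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            s[1] + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6]

def poincare(x0, dt=1e-3):
    # First-return map to the section {y = 0, x > 0}; the orbit circles clockwise,
    # so a return is a downward crossing of the positive x-axis.
    s = step([x0, 0.0], dt)            # leave the section before watching for returns
    while True:
        prev, s = s, step(s, dt)
        if prev[1] > 0.0 >= s[1] and s[0] > 0.0:
            a = prev[1] / (prev[1] - s[1])      # linear interpolation in y
            return prev[0] + a * (s[0] - prev[0])

x_in, x_out = poincare(1.0), poincare(2.5)
```

Starting inside the closed orbit the return moves outward, starting outside it moves inward, and successive returns converge toward the fixed point of the Poincaré map on the attracting closed orbit.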
Consider next the case where γ is a hyperbolic closed orbit. This means that the
derivative at 0 ∈ γ of the Poincaré map has no eigenvalues of absolute value 1. In
this case a weaker kind of uniqueness obtains: there is a neighborhood V ⊂ U of
γ such that if 𝒩 is small enough, every g ∈ 𝒩 will have a unique closed orbit that
is entirely contained in V. It is possible, however, for every neighborhood of a hyperbolic
closed orbit to intersect other closed orbits, although this is hard to picture.
We now state without proof an important approximation result. Let B ⊂ Rⁿ
be a closed ball and ∂B its boundary sphere.

Theorem 2 Let W ⊂ Rⁿ be an open set containing B and f: W → Rⁿ a C¹ vector
field which is transverse to ∂B at every point of ∂B. Let 𝒩 ⊂ 𝒱(W) be any neighborhood
of f. Then there exists g ∈ 𝒩 such that:

(a) if x̄ ∈ B is an equilibrium of g, then x̄ is hyperbolic;
(b) if γ ⊂ B is a closed orbit of g, then γ is hyperbolic.

The condition that f be transverse to ∂B is not actually necessary, and in fact, B
can be replaced by any compact subset of W.

PROBLEMS

1. Show that the eigenvalue condition in the main theorem of this section
is necessary.
2. Let γ be a periodic attractor of x′ = f(x). Show there is a C¹ real-valued
function V(x) on a neighborhood of γ such that V ≥ 0, V⁻¹(0) = γ, and
(d/dt)V(x(t)) < 0 if x(t) is a solution curve not in γ. (Hint: Let z(t) be
the solution curve in γ such that x(t) − z(t) → 0 as t → ∞; see Chapter 13,
Section 1, Theorem 3. Consider ∫₀ᵀ |x(t) − z(t)|² dt for some large constant T.)
3. Let W ⊂ Rⁿ be open and let γ be a periodic attractor for a C¹ vector field f: W →
Rⁿ. Show that γ has a neighborhood U with the following property. For any
ε > 0 there exists δ > 0 such that if g: R × W → Rⁿ is C¹ and |g(t, x) −
f(x)| < δ, then every solution x(t) to x′ = g(t, x) with x(t₀) ∈ U satisfies
x(t) ∈ U for all t ≥ t₀ and d(x(t), γ) < ε for all t greater than some t₁. (Hint:
Problem 2, and Problem 2 of Section 1.)

§3. Structural Stability

In the previous sections we saw that certain features of a flow may be preserved
under small perturbations. Thus if a flow has a sink or attractor, any nearby flow
will have a nearby sink; similarly for periodic attractors.
It sometimes happens that any nearby flow is topologically the same as a given
flow, that is, for any sufficiently small perturbation of the flow, a homeomorphism
exists that carries each trajectory of the original flow onto a trajectory of the perturbation.
(A homeomorphism is simply a continuous map having a continuous
inverse.) Such a homeomorphism sets up a one-to-one correspondence between
equilibria of the two flows, closed orbits, and so on. In this case the original flow
(or its vector field) is called structurally stable.
Here is the precise definition of structural stability, at least in the restricted
setting of vector fields which point in on the unit disk (or ball) in Rⁿ. Let

    Dⁿ = {x ∈ Rⁿ | |x| ≤ 1}.

Consider C¹ vector fields f: W → Rⁿ defined on some open set W containing Dⁿ
such that ⟨f(x), x⟩ < 0 for each x in ∂Dⁿ. Such an f is called structurally stable on
Dⁿ if there exists a neighborhood 𝒩 ⊂ 𝒱(W) such that if g: W → Rⁿ is in 𝒩, then the
flows of f and g are topologically equivalent on Dⁿ. This means there exists a homeomorphism
h: Dⁿ → Dⁿ such that for each x ∈ Dⁿ,

    h({φ_t(x) | t ≥ 0}) = {ψ_t(h(x)) | t ≥ 0},

where ψ_t is the flow of g; and if x is not an equilibrium, h preserves the orientation
of the trajectory. (The orientation of the trajectory is simply the direction that
points move along the curve as t increases.)
This is a very strong condition on a vector field. It means that the flow φ_t cannot
have any "exceptional" dynamical features in Dⁿ. For example, it can be shown
that if x̄ ∈ int Dⁿ is an equilibrium, then it must be hyperbolic; the basic reason
is that linear flows with such equilibria are generic.
The harmonic oscillator illustrates the necessity of this condition as follows.
Suppose that f: W → R², with W ⊃ D², is a vector field which in some neighborhood
of 0 is given by

    x′ = Ax,    A = [0 1; −1 0].

By arbitrarily slight perturbation, the matrix A can be changed to make the origin
either a sink or a source. Since these have different dynamic behavior, the
flows are not topologically the same. Hence f is not structurally stable. In contrast,
it is known that the Van der Pol oscillator is structurally stable.
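The eigenvalue computation behind this example is elementary; a Python sketch (the perturbations A ± εI are one illustrative choice) computes the eigenvalues of the 2 × 2 matrices from their characteristic polynomials:

```python
import cmath

def eig2(a, b, c, d):
    # Eigenvalues of [[a, b], [c, d]] via the characteristic polynomial
    # t^2 - (tr)t + det = 0.
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

eps = 1e-3
center = eig2(0.0, 1.0, -1.0, 0.0)     # A itself: eigenvalues +-i, a center
source = eig2(eps, 1.0, -1.0, eps)     # A + eps*I: real parts +eps, a source
sink = eig2(-eps, 1.0, -1.0, -eps)     # A - eps*I: real parts -eps, a sink
```

An arbitrarily small change of ε thus pushes the purely imaginary eigenvalues off the imaginary axis in either direction, changing the topological type of the flow near 0.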
The following is the main result of this section. It gives an example of a class of
structurally stable systems. (See Fig. A.)

FIG. A. A structurally stable vector field.



Theorem 1 Let f: W → Rⁿ be a C¹ vector field on an open set W ⊃ Dⁿ with the
following properties:

(a) f has exactly one equilibrium 0 ∈ Dⁿ, and 0 is a sink;
(b) f points inward along the boundary ∂Dⁿ of Dⁿ, that is,
    ⟨f(x), x⟩ < 0 if x ∈ ∂Dⁿ;
(c) lim_{t→∞} φ_t(x) = 0 for all x ∈ Dⁿ, where φ_t is the flow of f.

Then f is structurally stable on Dⁿ.

Before proving this we mention three other results on structural stability. These
concern a C¹ vector field f: W → R² where W ⊂ R² is a neighborhood of D². The
first is from the original paper on structural stability by Pontryagin and Andronov.

Theorem 2 Suppose f points inward on D². Then the following conditions taken
together are equivalent to structural stability on D²:

(a) the equilibria in D² are hyperbolic;
(b) each closed orbit in D² is either a periodic attractor or a periodic repeller (that
is, a periodic attractor for the vector field −f(x));
(c) no trajectory in D² goes from saddle to saddle.

The necessity of the third condition is shown by breaking a saddle connection
as in Fig. B(a) by an approximation as in Fig. B(b).
A good deal of force is given to Theorem 2 by the following result of Peixoto;
it implies that structural stability on D² is a generic condition. Let 𝒱₀(W) be the
set of C¹ vector fields on W that point inward on ∂D².

Theorem 3 The set

    S = {f ∈ 𝒱₀(W) | f is structurally stable on D²}

is dense and open. That is, every element of S has a neighborhood in 𝒱₀(W) contained
in S, and every open set in 𝒱₀(W) contains a vector field which is structurally stable
on D².

Unfortunately, it has been shown that there can be no analogue of Theorem 3
for dimensions greater than 2. Nevertheless, there are many interesting vector
fields that are structurally stable, and the subject continues to inspire a lot of
research.
In the important case of gradient dynamical systems, there is an analogue of
Theorem 3 for higher dimensions as follows. Consider in 𝒱(Dⁿ) the set grad(Dⁿ)
of gradient vector fields that point inward on ∂Dⁿ.

FIG. B. (a) Flow near a saddle connection; (b) breaking a saddle connection.

Theorem 4 The set of structurally stable systems contained in grad(Dⁿ) is open
and dense in grad(Dⁿ).

We turn to the proof of Theorem 1. In outline it proceeds as follows. A vector
field g sufficiently close to f is shown to have a unique equilibrium a ∈ Dⁿ near
0; moreover, all trajectories of g in Dⁿ tend toward a. Once this is known, the homeomorphism
h: Dⁿ → Dⁿ is defined to be the identity on ∂Dⁿ; for each x ∈ ∂Dⁿ it
maps the f-trajectory of x onto the g-trajectory of x, preserving the parametrization;
and h(0) = a.
The proof is based on the following result, which is interesting in itself. In Section
1 we showed the persistence of a hyperbolic equilibrium under small perturbations.
In the special case of a sink we have a sharper result showing that the basin of
attraction retains a certain size under perturbation.

Proposition Let 0 ∈ E be a sink for a C¹ vector field f: W → E where W is an open
set containing 0. There exists an inner product on E, a number r > 0, and a neighborhood
𝒩 ⊂ 𝒱(W) of f such that the following holds: for each g ∈ 𝒩 there is a sink a =
a(g) for g such that the set

    B_r = {x ∈ E | |x| ≤ r}

contains a, is in the basin of a, and is positively invariant under the flow of g.
Proof. From Chapter 9 we give E an inner product with the following property.
For some ν < 0 and r > 0 it is true that
⟨f(z), z⟩ < ν |z|^2
if 0 < |z| ≤ 2r. It follows that B_r is in the basin of 0, and that f(z) points inward
along ∂B_r. It is clear that f has a neighborhood N_0 ⊂ V(W) such that if g ∈ N_0,
then g(z) also points inward along ∂B_r.
Let 0 < ε < r and put s = r + ε. If |y| < ε, then the closed ball B_s(y) about y
with radius s satisfies
B_r ⊂ B_s(y) ⊂ B_{2r}.
Let ν < μ < 0. We assert that if ‖g − f‖_1 is sufficiently small, then the sink a
of g will be in B_ε, and moreover,
(1) ⟨g(z), z − a⟩ ≤ μ |z − a|^2
if z ∈ B_s(a). To see this, write
⟨g(z), z − a⟩ = ⟨f(z − a), z − a⟩ + ⟨g(z) − f(z − a), z − a⟩
             ≤ ν |z − a|^2 + ⟨g(z) − f(z − a), z − a⟩.

The map α(z) = g(z) − f(z − a) vanishes at a. The norm of its derivative at z is
estimated thus:
‖Dα(z)‖ ≤ ‖Dg(z) − Df(z)‖ + ‖Df(z) − Df(z − a)‖;
as ‖g − f‖_1 → 0, ‖Dg(z) − Df(z)‖ → 0 uniformly for |z| ≤ 2r; and also a → 0,
so ‖Df(z) − Df(z − a)‖ → 0 uniformly for |z| ≤ 2r. Thus if ‖g − f‖_1
is small enough, ‖Dα(z)‖ ≤ μ − ν, and μ − ν is a Lipschitz constant for α; hence
|α(z)| = |α(z) − α(a)| ≤ (μ − ν) |z − a|.
Consequently, if ‖g − f‖_1 is sufficiently small, say, less than δ > 0,
⟨g(z), z − a⟩ ≤ ν |z − a|^2 + ⟨α(z), z − a⟩
             ≤ ν |z − a|^2 + (μ − ν) |z − a|^2
             = μ |z − a|^2
as required.
Put N_1 = { g ∈ V(W) : ‖g − f‖_1 < δ }, and set N = N_0 ∩ N_1. Suppose g ∈ N,
with sink a ∈ B_ε. By (1) the set B_s(a) is in the basin of a. Since B_r ⊂ B_s(a), and
g(z) points inward along ∂B_r, the proof is complete.
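The proposition lends itself to a quick numerical experiment. The sketch below is illustrative only — the planar fields f and g, the perturbation, and the radius r are all invented for the purpose: it integrates the perturbed field g from points on the circle |z| = r and checks that every trajectory converges to a single sink a lying inside B_r.

```python
import numpy as np

def f(z):
    # unperturbed field with a sink at 0: <f(z), z> = -|z|^2, so nu = -1 works
    return -z

def g(z):
    # a small perturbation of f; its sink a is shifted slightly away from 0
    return -z + np.array([0.05, -0.03]) + 0.02 * np.array([z[1], -z[0]])

def flow(field, z0, dt=0.01, steps=2000):
    # crude Euler integration of z' = field(z)
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z + dt * field(z)
    return z

r = 1.0
# trajectories started on the circle |z| = r all end up at one point a in B_r
ends = [flow(g, (r * np.cos(t), r * np.sin(t)))
        for t in np.linspace(0, 2 * np.pi, 12, endpoint=False)]
a = np.mean(ends, axis=0)
spread = max(np.linalg.norm(e - a) for e in ends)
print(np.linalg.norm(a) < r, spread < 1e-6)
```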

We now prove Theorem 1. Since D^n is compact and f(z) points inward along the
boundary, no solution curve can leave D^n. Hence D^n is positively invariant. Choose
r > 0 and N ⊂ V(W) as in the proposition. Let N_0 ⊂ N be a neighborhood of f
so small that if g ∈ N_0, then g(z) points inward along ∂D^n. Let ψ_t be the flow of
g ∈ N_0, and let φ_t denote the flow of f. Note that D^n is also positively invariant for ψ_t.
For every z ∈ D^n − int B_r, there is a neighborhood U_z ⊂ W of z and t_z > 0
such that if y ∈ U_z and t ≥ t_z, then
|φ_t(y)| < r.
By compactness of D^n − int B_r, a finite number U_{z_1}, ..., U_{z_k} of the sets U_z cover D^n − int B_r. Put
t_0 = max(t_{z_1}, ..., t_{z_k}).
Then φ_t(D^n − int B_r) ⊂ B_r if t ≥ t_0. It follows from continuity of the flow in f
(Chapter 15) that f has a neighborhood N_1 ⊂ N such that if g ∈ N_1, then
ψ_t(D^n − int B_r) ⊂ B_r if t ≥ t_0.
This implies that
lim_{t→∞} ψ_t(z) = a for all z ∈ D^n.

For let z ∈ D^n; then y = ψ_{t_0}(z) ∈ B_r, and B_r ⊂ basin of a under ψ_t.
It also implies that every y ∈ D^n − a is of the form ψ_t(x) for some x ∈ ∂D^n and
t ≥ 0. For otherwise the alpha limit set L_α(y) is not empty; but if x ∈ L_α(y), then ψ_t(x) → a as t → ∞;
hence y = a.
Fix g ∈ N_1. We have proved so far that the map
Ψ: [0, ∞) × ∂D^n → D^n,
Ψ(t, x) = ψ_t(x)
has D^n − a for its image. And the map
Φ: [0, ∞) × ∂D^n → D^n,
Φ(t, x) = φ_t(x)
has D^n − 0 as its image. We define
h: D^n → D^n,
h = Ψ ∘ Φ^{-1} on D^n − 0, h(0) = a.
Another way of saying this is that h maps φ_t(x) to ψ_t(x) for x ∈ ∂D^n, t ≥ 0, and
h(0) = a; therefore h maps trajectories of φ to trajectories of ψ, preserving orientation. Clearly, h(D^n) = D^n. The continuity of h is verified from continuity of the
flows, and by reversing the role of the flow and its perturbation one obtains a continuous inverse to h. Thus h is a homeomorphism; the proof of Theorem 1
is complete.

PROBLEMS

1. Show that if f: R^2 → R^2 is structurally stable on D^2 and f(0) = 0, then 0 is a
hyperbolic equilibrium.
2. Let γ ⊂ R^n, n ≥ 2, be the circle
γ = { x ∈ R^n : x_1^2 + x_2^2 = 4, x_k = 0 for k > 2 }.
Let
N = { x ∈ R^n : d(x, γ) ≤ 1 }.
Let W ⊂ R^n be a neighborhood of N and f: W → R^n a C^1 vector field. Suppose
f(x) points into N for all x in ∂N = { x ∈ R^n : d(x, γ) = 1 }. If γ is a periodic
attractor and γ = L_ω(x) for all x ∈ N, prove that f is structurally stable on
N. (See Fig. C for n = 3.)

FIG. C

3. If f ∈ V(W) is structurally stable on D^n ⊂ R^n, show that f has a neighborhood
N such that every g ∈ N is structurally stable.
4. Show that Theorem 1 can be sharpened as follows. For every ε > 0 there is a
neighborhood N of f such that if g ∈ N, the homeomorphism h (in the definition
of structural stability) can be chosen so that |h(x) − x| < ε for all x ∈ D^n.
5. Find necessary and sufficient conditions that a vector field f: R → R be structurally stable on a compact interval.
6. Let A be an operator on R^n such that the linear flow e^{tA} is hyperbolic. Find
ε > 0 such that if B is an operator on R^n satisfying ‖B − A‖ < ε, then there
is a homeomorphism of R^n onto itself that takes each trajectory of the differential equation x′ = Ax onto a trajectory of y′ = By.
Afterword

This book is only an introduction to the subject of dynamical systems. To proceed further requires the treatment of differential equations on manifolds; and
the formidable complications arising from infinitely many closed orbits must be
faced.
This is not the place to develop the theory of manifolds, but we can try to indicate their use in dynamical systems. The surface S of the unit ball in R^3 is an example of a two-dimensional manifold. A vector field on R^3 might be tangent to S
at all points of S; if it is, then S is invariant under the flow. In this way we get an
example of a dynamical system on the manifold S (see Fig. A).
The compactness of S implies that solution curves of such a system are defined
for all t ∈ R. This is in fact true for all flows on compact manifolds, and is one
reason for the introduction of manifolds.
Manifolds arise quite naturally in mechanics. Consider for example a simple
mechanical system as in Chapter 14. There is the Hamiltonian function H: U → R,
where U is an open subset of a vector space. The “conservation of energy” theorem
states that H is constant on trajectories. Another way of saying the same thing
is that if H(z) = c, then the whole trajectory of z lies in the subset H^{-1}(c). For
“most” values of c this subset is a submanifold of U, just as the sphere S in R^3 can
be viewed as H^{-1}(1) where H(x, y, z) = x^2 + y^2 + z^2. The dimension of H^{-1}(c)
is one less than that of U. Other first integrals cut down the dimension even further.
In the planar Kepler problem, for example, the state space is originally an open
subset U of R^4. The flow conserves both total energy H and angular momentum
h. For all values of c, d the subset { z ∈ U : H(z) = c, h(z) = d } is a manifold
that is invariant under the flow.
Manifolds also arise in mechanical problems with constraints. A pendulum in
three dimensions has a configuration space consisting of the 2-sphere S, and its
state space is the manifold of tangent vectors to S. The configuration space of a
FIG. A. A vector field tangent to S.
rigid body with one point fixed is a compact three-dimensional manifold, the set
of rotations of Euclidean three space.
The topology (global structure) of a manifold plays an important role in the
analysis of dynamical systems on the manifold. For example, a dynamical system
on the two-sphere S must have an equilibrium; this can be proved using the
Poincaré–Bendixson theorem.
The mathematical treatment of electrical circuit theory can be extended if mani-
folds are used. The very restrictive special hypothesis in Chapter 10 was made in
order to avoid manifolds. That hypothesis is that the physical states of a circuit
(obeying Kirchhoff’s and generalized Ohm’s laws) can be parametrized by the
inductor currents and capacitor voltages. This converts the flow on the space of
physical states into a flow on a vector space. Unfortunately this assumption ex-
cludes many circuits. The more general theory simply deals with the flow directly
on the space of physical states, which is a manifold under “generic” hypotheses
on the circuit.
Manifolds enter into differential equations in another way. The set of points
whose trajectories tend to a given hyperbolic equilibrium forms a submanifold called
the stable manifold of the equilibrium. These submanifolds are a key to any deep
global understanding of dynamical systems.
Our analysis of the long-term behavior of trajectories has been limited to the
simplest kinds of limit sets, equilibria and closed orbits. For some types of systems
these are essentially all that can occur, for example gradient flows and planar sys-
tems. But to achieve any kind of general picture in dimensions higher than two, one

must confront limit sets which can be extremely complicated, even for structurally
stable systems. It can happen that a compact region contains infinitely many
periodic solutions with periods approaching infinity. Poincaré was dismayed by
his discovery that this could happen even in the Newtonian three-body problem,
and expressed despair of comprehending such a phenomenon.
In spite of the prevalence of such systems it is not easy to prove their existence,
and we cannot go into details here. But to give some idea of how they arise in apparently simple situations, we indicate in Fig. B a discrete dynamical system in
the plane. Here the rectangle ABCD is sent to its image A′B′C′D′ in the most
obvious way by a diffeomorphism f of R^2; thus f(A) = A′, and so on. It can be
shown that f will have infinitely many periodic points, and that this property is
preserved by perturbations. (A point p is periodic if f^n(p) = p for some n > 0.)
Considering R^2 as embedded in R^3, one can construct a flow in R^3 transverse to R^2
whose time one map leaves R^2 invariant and is just the diffeomorphism f in R^2. Such
a flow has closed orbits through the periodic points of f.

FIG. B

In spite of PoincarC’s discouragement there has been much progress in recent


years in understanding the global behavior of fairly general types of dynamical
systems, including those exhibiting arbitrarily long closed orbits. On the other
hand, we are far from a clear picture of the subject and many interesting problems
are unsolved.
The following books are recommended to the reader who wishes to see how the
subject of dynamical systems has developed in recent years. They represent a good
cross section of current research : Proceedings of Symposia in Pure Alatheinatics
Volume X I V , Global Analysis [3] and Dynamical Systems [19]. See also Nitecki’s
Differentiable Dynamics [lS].
Appendix I
Elementary Facts

This appendix collects various elementary facts that most readers will have
seen before.

1. Set Theoretic Conventions

We use extensively maps, or functions, from one set X to another Y, which we
write
f: X → Y.
Thus the map f assigns to each element x ∈ X (that is, x belongs to X) an element
f(x) = y of Y. In this case we often write x ↦ y or x ↦ f(x). The identity map
i: X → X is defined by i(x) = x, and if Q is a subset of X, Q ⊂ X, the inclusion
map α: Q → X is defined by α(q) = q. If f: X → Y and g: Y → Z are two maps,
the composition g ∘ f (sometimes written gf) is defined by g ∘ f(x) = g(f(x)).

The map f: X → Y is said to be one-to-one if whenever x, x′ ∈ X, x ≠ x′, then
f(x) ≠ f(x′). The image of f is the set described as
Im f = { y ∈ Y : y = f(x), some x ∈ X }.
Then f is onto if Im f = Y. An inverse g (or f^{-1}) of f is a map g: Y → X such that
g ∘ f is the identity map on X and f ∘ g is the identity on Y. If the image of f is
Y and f is one-to-one, then f has an inverse, and conversely.
If f: X → Y is a map and Q ⊂ X, then f | Q: Q → Y denotes the restriction of
f to Q, so f | Q(q) = f(q).

We frequently use the summation sign:
Σ_{i=1}^{n} x_i = x_1 + x_2 + ⋯ + x_n,
where the x_i are elements of a vector space. If there is not much ambiguity, the
limits are omitted:
Σ x_i = x_1 + ⋯ + x_n.

2. Complex Numbers

We recall the elements of complex numbers C. We are not interested in complex
analysis in itself; but sometimes the use of complex numbers simplifies the study
of real differential equations.
The set of complex numbers C is the Cartesian plane R^2 considered as a vector
space, together with a product operation.
Let i be the complex number i = (0, 1) in coordinates on R^2. Then every complex
number z can be written uniquely in the form z = x + iy, where x, y are real numbers. Complex numbers are added as elements of R^2, so if z = x + iy, z′ = x′ + iy′,
then z + z′ = (x + x′) + i(y + y′): the rules of addition carry over from R^2 to C.
Multiplication of complex numbers is defined as follows: if z = x + iy and
z′ = x′ + iy′, then zz′ = (xx′ − yy′) + i(xy′ + x′y). Note that i^2 = −1 (or
“i = √−1”) with this definition of product, and this fact is an aid to remembering
the product definition. The reader may check the following properties of multiplication:
(a) zz’ = z’z.
(b) ( z z ’ ) ~ ’ ’ = ~(”2”).
(c) l z = z (here 1 = 1 + i.0).
(d) If z = z + iy is not 0, then

( e ) If z is real (that is, z = z +


i . 0 ) , then multiplication by z coincides with
scalar multiplication in R2.If z and z’ are both real, complex multiplication
specializes t o ordinary multiplication.
+
( f ) ( 2 z’)w = zw + z’w, 2 , z’, w E c.

The complex conjugate of a complex number z = x + iy is the complex number
z̄ = x − iy. Thus conjugation is a map σ: C → C, σ(z) = z̄, which has as its set
of fixed points the real numbers; that is to say, z̄ = z if and only if z is real. Simple
properties of conjugation are:


z = 2,

(2 + 2') = z + z',
E' = 22'.

The absolute value of a complex number z = 2 + i y is


1zI = (zi)1/' = (2' + y')1/'.
Then
Iz1 = 0 if and only if z = 0,
/z+z'! IIZI + 12'1,

IZZ'I = IZIIZ'I,

and I z 1 is the ordinary absolute value if z is real.


Suppose a complex number z has absolute value 1. Then on R2it is on the unit
+
circle (described by zz y" = 1) and there is a 0 E R such that z = cos 0 i sin 0. +
We define the symbol eie by
eib = cos 0 + i sin 0,
ea+ib = e@eib,

This use of the exponential symbol can be justified by showing that it is con-
sistent with a convergent power series representation of e.. Here one takes the
power series of ea+ib as one does for ordinary real exponentials; thus

ea+ib
c +n!ib)"
= * (a
n-O

One can operate with complex exponentials by the same rules as for real ex-
ponentials.
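These rules are easy to verify numerically. The short sketch below (an illustration, not part of the text) uses Python's built-in complex type to check the coordinate formula for the product, the conjugation and absolute-value identities, and e^{iθ} = cos θ + i sin θ.

```python
import cmath
import math

def product(z, w):
    # the coordinate definition: (x + iy)(x' + iy') = (xx' - yy') + i(xy' + x'y)
    x, y, xp, yp = z.real, z.imag, w.real, w.imag
    return complex(x * xp - y * yp, x * yp + xp * y)

z, w = 2 + 3j, -1 + 0.5j

# the coordinate formula agrees with built-in complex multiplication
assert abs(product(z, w) - z * w) < 1e-12

# conjugation distributes over products, and |z| = (z zbar)^(1/2)
assert (z * w).conjugate() == z.conjugate() * w.conjugate()
assert abs(abs(z) - math.sqrt((z * z.conjugate()).real)) < 1e-12

# e^{i theta} = cos theta + i sin theta, and e^{a+ib} = e^a e^{ib}
theta, a, b = 0.7, 1.2, -0.4
assert abs(cmath.exp(1j * theta) - complex(math.cos(theta), math.sin(theta))) < 1e-12
assert abs(cmath.exp(a + 1j * b) - math.exp(a) * cmath.exp(1j * b)) < 1e-12
print("all identities verified")
```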

3. Determinants

One may find a good account of determinants in Lang's Second Course in Calculus
[12]. Here we just write down a couple of facts that are useful.
First we give a general expression for a determinant. Let A = [a_{ij}] be the
n × n matrix whose entry in the ith row and jth column is a_{ij}. Denote by A_{ij}
the (n − 1) × (n − 1) matrix obtained by deleting the ith row and jth column.
Then if i is a fixed integer, 1 ≤ i ≤ n, the determinant satisfies
Det A = (−1)^{i+1} a_{i1} Det A_{i1} + ⋯ + (−1)^{i+n} a_{in} Det A_{in}.

Thus the expression on the right does not depend on i, and furthermore gives a
way of finding (or defining) Det A inductively. The determinant of a 2 × 2 matrix
| a b |
| c d |
is ad − bc. For a 3 × 3 matrix A = [a_{ij}] one obtains
Det A = a_{11}(a_{22}a_{33} − a_{23}a_{32}) − a_{12}(a_{21}a_{33} − a_{23}a_{31}) + a_{13}(a_{21}a_{32} − a_{22}a_{31}).
Recall that if Det A ≠ 0, then A has an inverse. One way of finding this inverse
is to solve explicitly the system of equations Ax = y for x, obtaining x = By; then
B is an inverse A^{-1} for A.
If Det A ≠ 0, one has the formula
A^{-1} = transpose of [ (−1)^{i+j} Det A_{ij} / Det A ].
It follows easily from the recursive definition that the determinant of a triangular matrix is the product of the diagonal entries.
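The recursive expansion translates directly into code. The sketch below is illustrative only (the matrices are invented examples; exact arithmetic via Fraction is a convenience): it implements cofactor expansion along the first row, the adjugate formula for A^{-1}, and the triangular-matrix rule.

```python
from fractions import Fraction

def det(A):
    # cofactor expansion along the first row (i = 1 in the formula above)
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

def inverse(A):
    # A^{-1} = transpose of [(-1)^{i+j} Det(A_ij) / Det A]
    n, d = len(A), det(A)
    cof = [[(-1) ** (i + j) *
            det([r[:j] + r[j + 1:] for k, r in enumerate(A) if k != i])
            for j in range(n)] for i in range(n)]
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
print(det(A))  # 2*(12-1) - 1*(4-0) + 0 = 18
B = inverse(A)
I = [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
print(I == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
# triangular matrix: determinant is the product of the diagonal entries
print(det([[2, 5, 7], [0, 3, 1], [0, 0, 4]]) == 2 * 3 * 4)
```

Cofactor expansion costs O(n!) operations and is used here only because it mirrors the inductive definition; for numerical work one factors the matrix instead.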

4. Two Propositions on Linear Algebra

The purpose of this section is to prove Propositions 1 and 3 of Section 1B, Chapter 3.

Proposition 1 Every vector space F has a basis, and every basis of F has the same
number of elements. If {e_1, ..., e_k} ⊂ F is an independent subset that is not a basis,
by adjoining to it suitable vectors e_{k+1}, ..., e_m, one can form a basis e_1, ..., e_m.

The proof goes by some easy lemmas.

Lemma 1 A system of n linear homogeneous equations in n + 1 unknowns always
has a nontrivial solution.

The proof of Lemma 1 is done by the process of elimination of one unknown to
obtain a system of n − 1 equations in n unknowns. Then one is finished by induction (the first case, n = 2, being obvious). The elimination is done by using the
first equation to solve for one variable as a linear combination of the rest. The
expression obtained is substituted in the remaining equations to make the reduction.
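Lemma 1 can be illustrated numerically. The sketch below is an invented example, and it finds the kernel by a singular value decomposition rather than by the elimination argument of the proof: for a random system of n homogeneous equations in n + 1 unknowns, the kernel is always nontrivial.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n + 1))   # n equations, n + 1 unknowns

# the last right-singular vector lies in the kernel: an n x (n+1) matrix
# has rank at most n, so at least one singular value "slot" is zero
x = np.linalg.svd(A)[2][-1]

print(np.linalg.norm(x) > 0.99)           # nontrivial (a unit vector)
print(np.allclose(A @ x, 0, atol=1e-10))  # solves all n equations
```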

Lemma 2 Let {e_1, ..., e_n} be a basis for a vector space F. If u_1, ..., u_m are linearly
independent elements of F, then m ≤ n.
Proof. It is sufficient to show that m ≠ n + 1. Suppose otherwise. Then each
u_i is a linear combination of the e_k,
u_i = Σ_{k=1}^{n} a_{ik} e_k, i = 1, ..., n + 1.
By Lemma 1, the system of equations
Σ_{i=1}^{n+1} x_i a_{ik} = 0, k = 1, ..., n,
has a nontrivial solution x = (x_1, ..., x_{n+1}). Then
Σ_i x_i u_i = Σ_i x_i Σ_k a_{ik} e_k = Σ_k (Σ_i x_i a_{ik}) e_k = 0,
so that the u_i are linearly dependent. This contradiction proves Lemma 2.

From Lemma 2 we obtain the part of Proposition 1 which says that two bases
have the same number of elements. If {e_1, ..., e_n} and {u_1, ..., u_m} are the two
bases, then the lemma says m ≤ n. An interchange yields n ≤ m.
Say that a set S = {u_1, ..., u_m} of linearly independent elements of F is maximal
if for every u in F, u ∉ S, the set {u, u_1, ..., u_m} is dependent.

Lemma 3 A maximal set of linearly independent elements B = {u_1, ..., u_m} in a
vector space F is a basis.
Proof. We have to show that any u ∈ F, u ∉ B, is a linear combination of the
u_i. But by hypothesis u, u_1, ..., u_m are dependent, so that one can find numbers x,
x_i, not all zero, such that Σ x_i u_i + xu = 0. Then x ≠ 0 since the u_i are independent.
Thus u = Σ (−x_i/x) u_i. This proves Lemma 3.

Proposition 1 now goes easily. Recall F is a linear subspace of R^n (our definition
of vector space!). If F ≠ 0, let u_1 be any nonzero element. If {u_1} is not a basis,
one can find u_2 ∈ F, u_2 not in the space spanned by {u_1}. Then u_1, u_2 are independent,
and if {u_1, u_2} is maximal, we are finished by Lemma 3. Otherwise we continue
the process. The process must stop with a maximal set of linearly independent
elements {u_1, ..., u_m}, m ≤ n by Lemma 2. This gives us a basis for F. The rest of
the proof of the proposition proceeds in exactly the same manner.

Proposition 3 Let T: E → F be a linear map. Then
dim(Im T) + dim(Ker T) = dim E.
In particular, suppose dim E = dim F. Then the following are equivalent statements:
(a) Ker T = 0;
(b) Im T = F;
(c) T is an isomorphism.
Proof. The second part follows from the first part (and things said in Section 1
of Chapter 3).
To prove the first part of the proposition, let f_1, ..., f_k be a basis for Im T. Choose
e_1, ..., e_k such that Te_i = f_i. Let g_1, ..., g_l be a basis for Ker T. It is sufficient
to show that
{e_1, ..., e_k, g_1, ..., g_l}
is a basis for E, since k = dim Im T and l = dim Ker T.
First, these elements are independent: for if Σ λ_i e_i + Σ μ_j g_j = 0, application
of T yields Σ λ_i Te_i = Σ λ_i f_i = 0. Then the λ_i = 0 since the f_i are independent.
Thus Σ μ_j g_j = 0 and the μ_j = 0 since the g_j are independent.
Second, E is spanned by the e_i and g_j; that is, every element of E can be written
as a linear combination of the e_i and the g_j. Let e be any element of E. Define
v = Σ λ_i e_i, where Te = Σ λ_i f_i defines the λ_i. Then e = (e − v) + v. Now
T(e − v) = 0, so e − v ∈ Ker T, and thus (e − v) can be written as a linear combination of the g_j.
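Proposition 3 is easy to check for a concrete linear map. The sketch below (an invented example, not part of the text) computes dim(Im T) as the rank of a matrix representing T and dim(Ker T) from its singular values.

```python
import numpy as np

# T: R^5 -> R^4, deliberately rank-deficient
# (row 3 = row 1 + row 2, row 4 = 0)
T = np.array([[1., 0., 2., 0., 1.],
              [0., 1., 0., 1., 1.],
              [1., 1., 2., 1., 2.],
              [0., 0., 0., 0., 0.]])

dim_E = T.shape[1]
dim_im = np.linalg.matrix_rank(T)       # dim(Im T)
s = np.linalg.svd(T)[1]
dim_ker = dim_E - int(np.sum(s > 1e-10))  # dim(Ker T) = dim E - rank

print(dim_im, dim_ker, dim_im + dim_ker == dim_E)
```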
Appendix II
Polynomials

1. The Fundamental Theorem of Algebra

Let
p(z) = a_n z^n + a_{n-1} z^{n-1} + ⋯ + a_1 z + a_0, a_n ≠ 0,
be a polynomial of degree n ≥ 1 with complex coefficients a_0, ..., a_n. Then p(z) = 0
for at least one z ∈ C.
The proof is based on the following basic property of polynomials.

Proposition 1 lim_{|z|→∞} |p(z)| = ∞.

Proof. For z ≠ 0 we can write
p(z) = z^n (a_n + a_{n-1} z^{-1} + ⋯ + a_0 z^{-n}).
Hence
(1) |p(z)| ≥ |z|^n ( |a_n| − |a_{n-1}| |z|^{-1} − ⋯ − |a_0| |z|^{-n} ).
Therefore there exists L > 0 such that if |z| ≥ L, then the expression in parentheses on the
right-hand side of (1) is ≥ ½ |a_n| > 0, and hence
|p(z)| ≥ ½ |a_n| |z|^n,
from which the proposition follows.



Proposition 2 |p(z)| attains a minimum value.

Proof. For each k > 0 define the compact set
D_k = { z ∈ C : |z| ≤ k }.
The continuous function |p(z)| attains a minimum value
v_k = |p(z_k)|, z_k ∈ D_k,
on D_k. (z_k may not be unique.) By Proposition 1 there exists k > 0 such that
(2) |p(z)| ≥ v_1 if |z| ≥ k.
We may take k ≥ 1. Then v_k is the minimum value of |p(z)|, for if z ∈ D_k, then
|p(z)| ≥ v_k, while if z ∉ D_k, then |p(z)| ≥ v_1 by (2); and v_1 ≥ v_k since D_1 ⊂ D_k.
Proof of theorem. Let |p(z_0)| be minimal. The function
q(z) = p(z + z_0)
is a polynomial taking the same values as p; hence it suffices to prove that q has a
root. Clearly, |q(0)| is minimal. Hence we may assume that
(3) the minimum value of |p(z)| is |p(0)| = |a_0|.
We write
p(z) = a_0 + a_k z^k + z^{k+1} r(z), a_k ≠ 0, k ≥ 1,
where r is a polynomial of degree n − k − 1 if k < n and r = 0 otherwise.
We choose w so that
(4) a_0 + a_k w^k = 0.
In other words, w is a kth root of −a_0/a_k. Such a root exists, for if
−a_0/a_k = ρ(cos θ + i sin θ),
then we can take
w = ρ^{1/k} (cos(θ/k) + i sin(θ/k)).
We now write, for 0 < t < 1,
p(tw) = (1 − t^k) a_0 + t^k (a_0 + a_k w^k) + (tw)^{k+1} r(tw)
      = (1 − t^k) a_0 + (tw)^{k+1} r(tw).
Hence
|p(tw)| ≤ |a_0| − t^k |a_0| + t^{k+1} |w^{k+1} r(tw)|
        = |a_0| − t^k ( |a_0| − t |w^{k+1} r(tw)| ).

But if |a_0| > 0, then for t sufficiently small we have
|a_0| − t |w^{k+1} r(tw)| > 0,
and such a value of t makes
|p(tw)| < |a_0|.
This contradicts minimality of |p(0)| = |a_0|. Hence |p(0)| = 0.

Corollary A polynomial p of degree n can be factored:
p(z) = a_n (z − λ_1) ⋯ (z − λ_n),
where p(λ_k) = 0, k = 1, ..., n, and p(z) ≠ 0 for z ≠ λ_k.

Proof. For any λ ∈ C we have
p(z) = p((z − λ) + λ) = Σ_{k=0}^{n} a_k ((z − λ) + λ)^k.
Expanding by the binomial theorem, we have
p(z) = Σ_{k=0}^{n} a_k Σ_{j=0}^{k} C(k, j) (z − λ)^j λ^{k−j},
where C(k, j) denotes the binomial coefficient. Every term on the right with j > 0 has a factor of z − λ; hence
p(z) = (z − λ) q(z) + Σ_{k=0}^{n} a_k λ^k,
or
p(z) = (z − λ) q(z) + p(λ),
for some polynomial q(z) of degree n − 1 (which depends on λ). In particular, if
p(λ_1) = 0, which must be true for some λ_1, we have
p(z) = (z − λ_1) q_1(z).
Since q_1 has a root λ_2, we write
p(z) = (z − λ_1)(z − λ_2) q_2(z),
and so on.
The complex numbers λ_1, ..., λ_n are the roots of p. If they are distinct, p has
simple roots. If λ appears k times among {λ_1, ..., λ_n}, λ is a root of multiplicity k,
or a k-fold root. This is equivalent to (z − λ)^k being a factor of p(z).
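The corollary can be illustrated numerically: a root finder recovers λ_1, ..., λ_n, and the factored form a_n(z − λ_1)⋯(z − λ_n) reproduces p at sample points. The particular polynomial below is an invented example.

```python
import numpy as np

# p(z) = 2z^3 - 4z^2 - 10z + 12 = 2(z - 3)(z - 1)(z + 2)
coeffs = [2, -4, -10, 12]
roots = np.roots(coeffs)

# p(lambda_k) = 0 for every root
assert all(abs(np.polyval(coeffs, r)) < 1e-9 for r in roots)

# the factored form a_n (z - l1)(z - l2)(z - l3) agrees with p at sample points
a_n = coeffs[0]
for z in [0.5, -1.3, 2 + 1j]:
    factored = a_n * np.prod([z - r for r in roots])
    assert abs(factored - np.polyval(coeffs, z)) < 1e-9

rts = sorted(int(round(r.real)) for r in roots)
print(rts)  # the three simple roots: [-2, 1, 3]
```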
Appendix III
On Canonical Forms

The goal of this appendix is to prove three results of Chapter 6: Theorem 1 and
the uniqueness of the S + N decomposition, of Section 1; and Theorem 1 of Section 3.

1. A Decomposition Theorem

Theorem 1 (Section 1, Chapter 6) Let T be an operator on V, where V is a
complex vector space, or V is real and T has real eigenvalues. Then V is the direct
sum of the generalized eigenspaces of T. The dimension of each generalized eigenspace
equals the multiplicity of the corresponding eigenvalue.

For the proof we consider thus an operator T: V → V, where we suppose that
V is a complex vector space.
Define subspaces for each nonnegative integer j as follows:
K_j(T) = K_j = Ker T^j; N = ⋃_j K_j;
L_j(T) = L_j = Im T^j; M = ⋂_j L_j.
Then
K_0 ⊂ K_1 ⊂ K_2 ⊂ ⋯ and L_0 ⊃ L_1 ⊃ L_2 ⊃ ⋯.
Choose n and m so that
K_j = K_n if j ≥ n,
L_j = L_m if j ≥ m,
which is possible since V is finite dimensional. Put
N(T) = N = K_n, M(T) = M = L_m.
Clearly, N and M are invariant.

Lemma V = N ⊕ M.
Proof. Since TM = L_{m+1} = M, T | M is invertible; also, T^n(M) = M and
T^n x ≠ 0 for nonzero x in M. Since T^n(N) = 0, we have N ∩ M = 0. If x ∈ V is
any vector, let T^m x = y ∈ M. Since T^m | M is invertible, T^m x = T^m z for some z ∈ M.
Put x = (x − z) + z. Since x − z ∈ N, z ∈ M, this proves the lemma.

Let α_1, ..., α_r be the distinct eigenvalues of T. For each eigenvalue α_k, define
subspaces
N_k = N(T − α_k I) = ⋃_{j≥0} Ker(T − α_k I)^j,
M_k = M(T − α_k I) = ⋂_{j≥0} Im(T − α_k I)^j.
Clearly, these subspaces are invariant under T.
By the lemma,
V = N_1 ⊕ M_1.
Proposition V = N_1 ⊕ ⋯ ⊕ N_r.
Proof. We use induction on the dimension d of V, the cases d = 0 or 1 being
trivial. Suppose d > 1 and assume the theorem for any space of smaller dimension. In particular, the theorem is assumed to hold for T | M_1: M_1 → M_1.
It therefore suffices to prove that the eigenvalues of T | M_1 are α_2, ..., α_r, and
that
(1) N((T − α_k I) | M_1) = N(T − α_k I), all k > 1.
We first prove that
(2) Ker((T − α_1 I) | N_k) = 0, all k > 1.
Suppose x ∈ N_k, (T − α_1 I)x = 0, and x ≠ 0. Then Tx = α_1 x; hence
(T − α_k I)x = (α_1 − α_k)x.
But then
(T − α_k I)^j x = (α_1 − α_k)^j x ≠ 0
for all j ≥ 0, so x ∉ N_k.

Since N_k is invariant under T − α_1 I, we have
(T − α_1 I) N_k = N_k,
by (2). Therefore N_k ⊂ Im(T − α_1 I)^j, all j ≥ 0, k > 1. This shows that
N_k ⊂ M_1, all k > 1.
This implies that α_2, ..., α_r are eigenvalues of T | M_1. It is now clear that the
eigenvalues of T | M_1 are precisely α_2, ..., α_r, since α_1 is not, and any eigenvalue of
T | M_1 is also an eigenvalue of T. The proposition is proved.

We can now prove Theorem 1. Let n_k be the multiplicity of α_k as a root of the
characteristic polynomial of T. Then T | N_k: N_k → N_k has the unique eigenvalue
α_k (the proof is like that of (2) above), and in fact the lemma implies that α_k has
multiplicity n_k as an eigenvalue of T | N_k. Thus the degree of the characteristic
polynomial of T | N_k is n_k = dim N_k.
The generalized eigenspace of T: V → V belonging to α_k is defined by E_k =
E(T, α_k) = Ker(T − α_k I)^{n_k}. Then, clearly, E_k ⊂ N_k.
In fact, it follows that E_k = N_k from the definition of N_k and Lemma 2 of the
next section (applied to T − α_k I). This finishes the proof of the theorem if V is
complex. But everything said above is valid for an operator on a real vector space
provided its eigenvalues are real. The theorem is proved.

2. Uniqueness of S and N

Theorem Let T be a linear operator on a vector space E which is complex if T has
any nonreal eigenvalues. Then there is only one way of expressing T as S + N, where
S is diagonalizable, N is nilpotent, and SN = NS.

Proof. Let E_k = E(λ_k, T), k = 1, ..., r, be the generalized eigenspaces of T.
Then E = E_1 ⊕ ⋯ ⊕ E_r and T = T_1 ⊕ ⋯ ⊕ T_r, where T_k = T | E_k. Note
that E_k is invariant under every operator that commutes with T.
Since S and N both commute with S and N, they both commute with T. Hence
E_k is invariant under S and N.
Put S_k = λ_k I ∈ L(E_k), and N_k = T_k − S_k. It suffices to show that S | E_k = S_k,
for then N | E_k = N_k, proving the uniqueness of S and N.
Since S is diagonalizable, so is S | E_k (Problem 17 of Chapter 6, Section 2). Therefore S | E_k − λ_k I is diagonalizable; in other words, S | E_k − S_k is diagonalizable.
This operator is the same as N_k − N | E_k. Since N | E_k commutes with λ_k I and
with T_k, it also commutes with N_k. It follows that N_k − N | E_k is nilpotent (use
the binomial theorem). Thus S | E_k − S_k is represented by a nilpotent diagonal
matrix. The only such matrix is 0; thus S | E_k = S_k, and the theorem is proved.
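For a concrete operator the decomposition is easy to exhibit. The sketch below (an invented 3 × 3 example with the single eigenvalue 2, so there is one generalized eigenspace) displays T = S + N with S diagonalizable, N nilpotent, and SN = NS.

```python
import numpy as np

# T has the single eigenvalue 2 with multiplicity 3
T = np.array([[2., 1., 0.],
              [0., 2., 1.],
              [0., 0., 2.]])

S = 2. * np.eye(3)   # S = lambda*I on the generalized eigenspace: diagonalizable
N = T - S            # what is left over is nilpotent

assert np.allclose(S + N, T)
assert np.allclose(np.linalg.matrix_power(N, 3), 0)  # N^3 = 0: nilpotent
assert np.allclose(S @ N, N @ S)                     # S and N commute
print("S + N decomposition verified")
```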

3. Canonical Forms for Nilpotent Operators

The goal is to prove the following theorem.

Theorem 1 (Section 3, Chapter 6) Let N be a nilpotent operator on a real or
complex vector space V. Then V has a basis giving N a matrix of the form
A = diag{A_1, ..., A_r},
where A_j is an elementary nilpotent block, and the size of A_k is a nonincreasing function of k. The matrices A_1, ..., A_r are uniquely determined by the operator N.

In this section, V is a real or complex vector space.
A subspace W ⊂ V is a cyclic subspace of an operator T on V if T(W) ⊂ W
and there is a vector x ∈ W such that W is spanned by the vectors T^n x, n = 0, 1, ....
We call such an x a cyclic vector for W.
Any vector x generates a cyclic subspace, for the iterates of x under T, that is,
x, Tx, T^2 x, ..., generate a subspace which is evidently cyclic. We denote this
subspace by Z(x) or Z(x, T).
Suppose N: V → V is a nilpotent operator. For each x ∈ V there is a smallest
positive integer n, denoted by nil(x) or nil(x, N), such that N^n x = 0. If x ≠ 0,
then N^k x ≠ 0 for 0 ≤ k < nil(x).

Lemma 1 Let nil(x, N) = n. Then the vectors N^k x, 0 ≤ k ≤ n − 1, form a basis
for Z(x, N).
Proof. They clearly span Z(x). If they are dependent, there is a relation
Σ_{k=0}^{n-1} a_k N^k x = 0 with not all a_k = 0. Let j be the smallest index, 0 ≤ j ≤ n − 1,
such that a_j ≠ 0. Applying N^{n−j−1} gives
0 = N^{n−j−1} Σ_{k=j}^{n-1} a_k N^k x = Σ_{k=j}^{n-1} a_k N^{n+k−j−1} x
  = a_j N^{n-1} x + Σ_{k=j+1}^{n-1} a_k N^{n+k−j−1} x
  = a_j N^{n-1} x,
since n + k − j − 1 ≥ n if k ≥ j + 1. Thus a_j N^{n-1} x = 0, so N^{n-1} x = 0 because
a_j ≠ 0. But this contradicts n = nil(x, N).
This result proves that in the basis {x, Nx, ..., N^{n-1} x}, n = nil(x), the nilpotent
operator N | Z(x) has the matrix
0 0 ⋯ 0 0
1 0 ⋯ 0 0
0 1 ⋯ 0 0
⋮ ⋮ ⋱ ⋮ ⋮
0 0 ⋯ 1 0
with ones below the diagonal, zeros elsewhere. This is where the ones below the
diagonal in the canonical form come from.
An argument similar to the proof of Lemma 1 shows:
if Σ_{k=0}^{m} a_k N^k x = 0, then a_k = 0 for k < nil(x, N).
It is convenient to introduce the notation p(T) to denote the operator Σ_k a_k T^k
if p is the polynomial
p(t) = a_n t^n + ⋯ + a_1 t + a_0,
where t is an indeterminate (that is, an "unknown"). Then the statement proved
above can be rephrased:

Lemma 2 Let n = nil(x, N). If p(t) is a polynomial such that p(N)x = 0, then
t^n divides p(t); that is, there is a polynomial p_1(t) such that p(t) = t^n p_1(t).

We now prove the existence of a canonical form for a nilpotent operator N.
In view of the matrix discussed above for N | Z(x), this amounts to proving:

Proposition Let N: V → V be a nilpotent operator. Then V is a direct sum of
cyclic subspaces.

The proof goes by induction on dim V, the case dim V = 0 being trivial. If
dim V > 0, then dim N(V) < dim V, since N has a nontrivial kernel. Therefore
there are nonzero vectors y_1, ..., y_r in N(V) such that
N(V) = Z(y_1) ⊕ ⋯ ⊕ Z(y_r).
Let x_j ∈ V be a nonzero vector with
N x_j = y_j, j = 1, ..., r.
We prove the subspaces Z(x_1), ..., Z(x_r) are independent.
Observe that nil(x_j) ≥ 2 since
N x_j = y_j ≠ 0.
If the subspaces Z(x_j) are not independent, there are vectors u_j ∈ Z(x_j), not
all zero, such that Σ_j u_j = 0. Therefore Σ_j N u_j = 0. Since N u_j ∈ N(Z(x_j)) =
Z(y_j) and the Z(y_j) are independent by assumption, it follows that u_j ∈ Ker N,
j = 1, ..., r. Now each u_j has the form
u_j = Σ_{k=0}^{n_j−1} a_{jk} N^k x_j, n_j = nil(x_j).
Hence u_j = p_j(N) x_j for the polynomial p_j(t) = Σ_{k=0}^{n_j−1} a_{jk} t^k. Therefore N u_j =
p_j(N) y_j = 0. By Lemma 2, p_j(t) is divisible by t^m if m ≤ nil(y_j). Since 1 ≤ nil(y_j),
we can write
p_j(t) = s_j(t) t
for some polynomial s_j(t).
But now, substituting N for t, we have
u_j = s_j(N) N x_j = s_j(N) y_j ∈ Z(y_j).
Therefore u_j = 0, since the Z(y_j) are independent.
We now show that
(1) V = Z(x_1) ⊕ ⋯ ⊕ Z(x_r) ⊕ L
with L ⊂ Ker N. Let Ker N = K and let L be a subspace of K such that
K = (K ∩ N(V)) ⊕ L.
Then L is independent from the Z(x_j). To see this, let u ∈ (⊕ Z(x_j)) ∩ L. Then
u ∈ (⊕ Z(x_j)) ∩ K, and by an argument similar to the one above, this implies
u ∈ N(V). But N(V) ∩ L = 0, hence u = 0.
It is clear that every cyclic subspace in K, and hence in L, is one dimensional.
Therefore L = Z(w_1) ⊕ ⋯ ⊕ Z(w_s), where {w_1, ..., w_s} is a basis for L. Finally,
V = Z(x_1) ⊕ ⋯ ⊕ Z(x_r) ⊕ Z(w_1) ⊕ ⋯ ⊕ Z(w_s).
This proposition implies the theorem, except for the question of uniqueness of the
matrices A 1 , . . . , A,. This uniqueness is equivalent to the assertion that the oper-
ator N determines t,he sizes of the blocks A i (or the dimensions of the cyclic sub-
spaces). This is done by induction on dim V .
Consider the restriction of N to its image N ( V ) = F :
N I F : F ---t F .
It is easy to see that if V is the direct sum of cyclic subspaces Z1 Q - * Q 2, Q W:,
where w1 c I<er iv, and zk is generated by X k , dim z k > 1, then N ( v ) is the
direct sum
N(Z1) $ * * ' c B N ( Z r ) ,
where N ( 2 , )is cyclic, generated by N ( z k ) , and dim N (Z,) = dim Z k - 1. Since
dim N ( F ) < dim V , the numbers {dim z k - 1) are determined by N I F , hence
by N . I t follows that (dim Zr)are also determined by N .
This finishes the proof of the theorem.
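The uniqueness argument amounts to a computation: the number of blocks of size at least k equals rank N^{k−1} − rank N^k, so the ranks of the powers of N determine the block sizes. A minimal Python sketch of this computation (the example matrix and the helper names are ours, not the book's):

```python
from fractions import Fraction

def rank(mat):
    """Gaussian elimination over the rationals; returns the rank."""
    m = [[Fraction(x) for x in row] for row in mat]
    nrows, ncols, r = len(m), len(m[0]), 0
    for c in range(ncols):
        piv = next((i for i in range(r, nrows) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(nrows):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def block_sizes(mat):
    """Cyclic-block sizes of a nilpotent matrix: the number of blocks of
    size at least k is rank(mat^(k-1)) - rank(mat^k)."""
    n = len(mat)
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    ranks = [n]                      # rank of mat^0 = I
    for _ in range(n):               # mat^n = 0 when mat is nilpotent
        power = matmul(power, mat)
        ranks.append(rank(power))
    at_least = [ranks[k - 1] - ranks[k] for k in range(1, n + 1)]
    sizes = []
    for k in range(1, n + 1):
        exactly = at_least[k - 1] - (at_least[k] if k < n else 0)
        sizes += [k] * exactly
    return sorted(sizes, reverse=True)

# One 3 x 3 elementary block (1's below the diagonal) plus a 1 x 1 zero block:
N = [[0, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
assert block_sizes(N) == [3, 1]
```

On this 4 × 4 example the computed sizes come out as [3, 1], matching the two cyclic subspaces the matrix was built from.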
Appendix IV
The Inverse Function Theorem

In this appendix we prove the inverse function theorem and the implicit function
theorem.

Inverse function theorem  Let W be an open set in a vector space E and let f: W → E be a C^1 map. Suppose x_0 ∈ W is such that Df(x_0) is an invertible linear operator on E. Then x_0 has an open neighborhood V ⊂ W such that f | V is a diffeomorphism onto an open set.
Proof.  By continuity of Df: W → L(E) there is an open ball V ⊂ W about x_0 and a number ν > 0 such that if y, z ∈ V, then Df(y) is invertible,

    ‖Df(y)^{-1}‖ < ν,

and

    ‖Df(y) − Df(z)‖ < ν^{-1}.

It follows from Lemma 1 of Chapter 16, Section 1, that f | V is one-to-one. Moreover, Lemma 2 of that section implies that f(V) is an open set.

The map f^{-1}: f(V) → V is continuous. This follows from local compactness of f(V). Alternatively, in the proof of Lemma 1 it is shown that if y and z are in V, then

    |y − z| ≤ ν |f(y) − f(z)|;

hence, putting f(y) = a and f(z) = b, we have

    |f^{-1}(a) − f^{-1}(b)| ≤ ν |a − b|,

which proves f^{-1} continuous.

It remains to prove that f^{-1} is C^1. The derivative of f^{-1} at a = f(x) ∈ f(V) is Df(x)^{-1}. To see this, we write, for b = f(y) ∈ f(V):

    f^{-1}(b) − f^{-1}(a) − Df(x)^{-1}(b − a) = y − x − Df(x)^{-1}(f(y) − f(x)).

Hence

    |f^{-1}(b) − f^{-1}(a) − Df(x)^{-1}(b − a)| / |b − a|
        ≤ ν |y − x − Df(x)^{-1}(f(y) − f(x))| / |y − x|,

since |b − a| = |f(y) − f(x)| ≥ ν^{-1} |y − x|. This clearly goes to 0 as |f(y) − f(x)| goes to 0. Therefore D(f^{-1})(a) = [Df(f^{-1}a)]^{-1}.

Thus the map D(f^{-1}): f(V) → L(E) is the composition: f^{-1}, followed by Df, followed by the inversion of invertible operators. Since each of these maps is continuous, so is D(f^{-1}).

Remark.  Induction on r = 1, 2, . . . shows also that if f is C^r, then f^{-1} is C^r.
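The conclusion D(f^{-1})(a) = [Df(f^{-1}a)]^{-1} can be checked numerically on a concrete map. The following sketch is our own illustration (the map f, the Newton-based inversion, and all tolerances are assumptions, not from the text): it inverts f near a point and compares a finite-difference Jacobian of f^{-1} with the inverse Jacobian of f.

```python
# Hypothetical map: f(x, y) = (x + y^2, y + x^2), with
# Df(x, y) = [[1, 2y], [2x, 1]], invertible near the origin.
def f(p):
    x, y = p
    return (x + y * y, y + x * x)

def finv(q):
    # Newton's method for f(p) = q, using the exact Jacobian of f.
    x, y = 0.0, 0.0
    for _ in range(50):
        u = f((x, y))[0] - q[0]
        v = f((x, y))[1] - q[1]
        det = 1.0 - 4.0 * x * y          # det Df(x, y)
        dx = (u - 2.0 * y * v) / det     # Df(x, y)^(-1) applied to (u, v)
        dy = (v - 2.0 * x * u) / det
        x, y = x - dx, y - dy
    return (x, y)

a = f((0.1, 0.2))
base = finv(a)                           # recovers (0.1, 0.2)
h = 1e-6
# Finite-difference Jacobian of f^(-1) at a:
J = [[(finv((a[0] + h * (j == 0), a[1] + h * (j == 1)))[i] - base[i]) / h
      for j in range(2)] for i in range(2)]
# [Df(f^(-1)(a))]^(-1), computed in closed form:
x, y = base
det = 1.0 - 4.0 * x * y
Jinv = [[1.0 / det, -2.0 * y / det],
        [-2.0 * x / det, 1.0 / det]]
assert all(abs(J[i][j] - Jinv[i][j]) < 1e-4 for i in range(2) for j in range(2))
```

The two matrices agree to within the finite-difference truncation error, as the theorem predicts.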

Implicit function theorem  Let W ⊂ E_1 × E_2 be an open set in the Cartesian product of two vector spaces. Let F: W → E_2 be a C^1 map. Suppose (x_0, y_0) ∈ W is such that the linear operator

    ∂F/∂y (x_0, y_0): E_2 → E_2

is invertible. Put F(x_0, y_0) = c. Then there are open sets U ⊂ E_1, V ⊂ E_2 with

    (x_0, y_0) ∈ U × V ⊂ W

and a unique C^1 map

    g: U → V

such that

    F(x, g(x)) = c

for all x ∈ U, and moreover, F(x, y) ≠ c if (x, y) ∈ U × V and y ≠ g(x).

Before beginning the proof we remark that the conclusion can be rephrased thus: the graph of g is the set

    F^{-1}(c) ∩ (U × V).

Thus F^{-1}(c) is a "hypersurface" in a neighborhood of (x_0, y_0).

To prove the implicit function theorem we apply the inverse function theorem to the map

    f: W → E_1 × E_2,    f(x, y) = (x, F(x, y)).

The derivative of f at (x, y) ∈ W is the linear map

    Df(x, y): E_1 × E_2 → E_1 × E_2,
    Df(x, y)(u, v) = (u, ∂F/∂x (x, y)u + ∂F/∂y (x, y)v).

It is easy to find an inverse to this if ∂F(x, y)/∂y is invertible. Thus Df(x_0, y_0) is invertible. Hence there is an open set U_0 × V ⊂ W containing (x_0, y_0) such that f restricts to a diffeomorphism of U_0 × V onto an open set Z ⊂ E_1 × E_2.

Choose open sets U ⊂ U_0, Y ⊂ E_2 such that x_0 ∈ U, c ∈ Y, and

    U × Y ⊂ Z.

The inverse of f: U_0 × V → Z preserves the first coordinate because f preserves it. The restriction of (f | U_0 × V)^{-1} to U × Y is thus a C^1 map of the form

    h: U × Y → U_0 × V,    h(x, w) = (x, φ(x, w)),

where

    φ: U × Y → V

is C^1.

Define a C^1 map

    g: U → V,    g(x) = φ(x, c).

From the relation f ∘ h = identity of U × Y we obtain, for x ∈ U:

    (x, c) = fh(x, c)
           = (x, Fh(x, c))
           = (x, F(x, φ(x, c)))
           = (x, F(x, g(x))).
Thus

    F(x, g(x)) = c

for all x ∈ U. Since f is one-to-one on U × V, if y ≠ g(x), then

    f(x, y) ≠ f(x, g(x));

hence

    (x, F(x, y)) ≠ (x, F(x, g(x))) = (x, c),

so F(x, y) ≠ c. This completes the proof of the implicit function theorem.

We note that if F is C^r, g is C^r.

From the identity

    F(x, g(x)) = c,

we find from the chain rule that for all x in U:

    ∂F/∂x (x, g(x)) + ∂F/∂y (x, g(x)) ∘ Dg(x) = 0.

This yields the formula

    Dg(x) = −[∂F/∂y (x, g(x))]^{-1} ∘ ∂F/∂x (x, g(x)).
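The chain-rule formula Dg(x) = −[∂F/∂y]^{-1} ∂F/∂x can be tested on a familiar example. Take F(x, y) = x^2 + y^2 with c = 1 near (0, 1), where ∂F/∂y = 2y is invertible and g(x) = √(1 − x^2); the formula then gives g′(x) = −x/g(x). A small numerical check (the example is ours, not the book's):

```python
import math

# Hypothetical example: F(x, y) = x^2 + y^2, c = 1, near (x0, y0) = (0, 1);
# g(x) = sqrt(1 - x^2) solves F(x, g(x)) = c on a neighborhood of 0.
def g(x):
    return math.sqrt(1.0 - x * x)

x = 0.3
# Dg(x) = -(dF/dy)^(-1) * dF/dx = -(2 g(x))^(-1) * (2 x) = -x / g(x):
formula = -x / g(x)
numeric = (g(x + 1e-7) - g(x)) / 1e-7        # finite-difference derivative
assert abs(formula - numeric) < 1e-5
assert abs(x * x + g(x) ** 2 - 1.0) < 1e-12  # F(x, g(x)) = c indeed
```

The closed-form derivative and the finite-difference quotient agree to the expected accuracy.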


References

1. R. Abraham, Foundations of Mechanics (New York: Benjamin, 1967).
2. R. Bartle, The Elements of Real Analysis (New York: Wiley, 1964).
3. S. S. Chern and S. Smale (eds.), Proceedings of the Symposium in Pure Mathematics XIV, Global Analysis (Providence, Rhode Island: Amer. Math. Soc., 1970).
4. U. D'Ancona, The Struggle for Existence (Leiden, The Netherlands: Brill, 1954).
5. C. Desoer and E. Kuh, Basic Circuit Theory (New York: McGraw-Hill, 1969).
6. R. Feynman, R. Leighton, and M. Sands, The Feynman Lectures on Physics, Vol. 1 (Reading, Massachusetts: Addison-Wesley, 1963).
7. N. S. Goel, S. C. Maitra, and E. W. Montroll, Nonlinear Models of Interacting Populations (New York: Academic Press, 1972).
8. P. Halmos, Finite Dimensional Vector Spaces (Princeton, New Jersey: Van Nostrand, 1958).
9. P. Hartman, Ordinary Differential Equations (New York: Wiley, 1964).
10. S. Lang, Calculus of Several Variables (Reading, Massachusetts: Addison-Wesley, 1973).
11. S. Lang, Analysis I (Reading, Massachusetts: Addison-Wesley, 1968).
12. S. Lang, Second Course in Calculus, 2nd ed. (Reading, Massachusetts: Addison-Wesley, 1964).
13. J. La Salle and S. Lefschetz, Stability by Liapunov's Direct Method with Applications (New York: Academic Press, 1961).
14. S. Lefschetz, Differential Equations, Geometric Theory (New York: Wiley (Interscience), 1957).
15. L. Loomis and S. Sternberg, Advanced Calculus (Reading, Massachusetts: Addison-Wesley, 1968).
16. E. W. Montroll, On the Volterra and other nonlinear models, Rev. Mod. Phys. 43 (1971).
17. M. H. A. Newman, Topology of Plane Sets (London and New York: Cambridge Univ. Press, 1964).
18. Z. Nitecki, Differentiable Dynamics (Cambridge, Massachusetts: MIT Press, 1971).
19. M. M. Peixoto (ed.), Dynamical Systems (New York: Academic Press, 1973).
20. L. Pontryagin, Ordinary Differential Equations (Reading, Massachusetts: Addison-Wesley, 1962).
21. A. Rescigno and I. Richardson, The struggle for life; I, Two species, Bull. Math. Biophysics 29 (1967), 377-388.
22. S. Smale, On the mathematical foundations of electrical circuit theory, J. Differential Geom. 7 (1972), 193-210.
23. J. Synge and B. Griffiths, Principles of Mechanics (New York: McGraw-Hill, 1949).
24. R. Thom, Stabilité Structurelle et Morphogénèse: Essai d'une théorie générale des modèles (Reading, Massachusetts: Addison-Wesley, 1973).
25. A. Wintner, The Analytical Foundations of Celestial Mechanics (Princeton, New Jersey: Princeton Univ. Press, 1941).
26. E. Zeeman, Differential equations for heartbeat and nerve impulses, in Dynamical Systems (M. M. Peixoto, ed.), p. 683 (New York: Academic Press, 1973).
Answers to Selected Problems

Chapter 1

Section 2, page 12

2. (a) (k_1 e^t, k_2 e^t, k_3 e^t)
   (b) (k_1 e^t, k_2 e^{-2t}, k_3)
   (c) (k_1 e^t, k_2 e^{-2t}, k_3 e^{2t})

6. A = diag{a_1, . . . , a_n} and a_i < 0, i = 1, . . . , n.

8. (b) Any solutions u, v such that u(0) and v(0) are independent vectors.

Chapter 2

Page 27

1. F(x) = −Kx; V(x) = (K/2)‖x‖^2, x ∈ R^2;

       m d^2x/dt^2 = −grad V(x) = −Kx.

   "Most" initial conditions means the set of (x, v) ∈ R^2 × R^2 such that v is not collinear with x.

2. (a) with V(x, y) = −x^3/3 − 2y^3/3 and (c) with V(x, y) = −x^2/2.

7. Hint: Use (4) of Section 6.
344 ANSWERS TO SELECTED PROBLEMS

Chapter 3

Section 3, page 54

1. (a) x(t) = 0, y(t) = 3e^{2t}

4. All eigenvalues are positive.

6. (b) b > 0

Section 4, page 60

1. (d) x = 3e^t cos 2t + 9e^t sin 2t,
       y = 3e^t sin 2t − 9e^t cos 2t.

Chapter 4

Section 1, page 65

2. dim E = dim E_C and dim F ≥ dim F_R


3. F > RCR

Section 2, page 69

1. (a) A basis for E is given by (0, −√2, √2) and (1, −2, −1).

Section 3, page 73

Introduce the new basis (1, 0, 0), (0, −√2, √2), (1, −2, −1), and new coordinates (y_1, y_2, y_3) related to the old by

    x_1 = y_1 + y_3,
    x_2 = −√2 y_2 − 2y_3,
    x_3 = √2 y_2 − y_3.

In the new coordinates the differential equation becomes

    y_1′ = y_1,
    y_2′ = −√2 y_3,
    y_3′ = √2 y_2.

The general solution is

    y_1 = Ce^t,
    y_2 = A cos(√2 t) + B sin(√2 t),
    y_3 = −B cos(√2 t) + A sin(√2 t).

Therefore

    x_1 = Ce^t − B cos(√2 t) + A sin(√2 t),
    x_2 = (2B − A√2) cos(√2 t) − (B√2 + 2A) sin(√2 t),
    x_3 = (B + A√2) cos(√2 t) + (B√2 − A) sin(√2 t).

(The authors solved this problem in only two days.)
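A solution of an uncoupled system like y_1′ = y_1, y_2′ = −√2 y_3, y_3′ = √2 y_2 is easy to spot-check numerically with finite differences. A quick sketch (the sample constants are arbitrary, chosen by us for illustration):

```python
import math

# Claimed general solution of y1' = y1, y2' = -sqrt(2) y3, y3' = sqrt(2) y2,
# with arbitrary sample constants C, A, B:
C, A, B = 1.0, 0.7, -0.4
r = math.sqrt(2.0)
y1 = lambda t: C * math.exp(t)
y2 = lambda t: A * math.cos(r * t) + B * math.sin(r * t)
y3 = lambda t: -B * math.cos(r * t) + A * math.sin(r * t)

t, h = 0.5, 1e-6
d = lambda g: (g(t + h) - g(t - h)) / (2.0 * h)   # central difference
assert abs(d(y1) - y1(t)) < 1e-6                  # y1' = y1
assert abs(d(y2) + r * y3(t)) < 1e-6              # y2' = -sqrt(2) y3
assert abs(d(y3) - r * y2(t)) < 1e-6              # y3' =  sqrt(2) y2
```

All three equations hold to within the finite-difference error, for any choice of C, A, B.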

Chapter 5

Section 2 , page 81

3. A = l , B = &
4. (a) fi ( b) t (c) 1 (d) 4
6 . (a) and (d)

Section 3, page 87

3. Hint: Note |Tx| / |x| = |Ty| / |y| = |Ty| if y = x/|x|.

4. (a) The norm is 1.

7. Hint: Use geometric series.

       Σ_{i=0}^∞ x^i = 1/(1 − x) for 0 < x < 1, with x = ‖I − T‖.
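The geometric-series hint for Problem 7 is the Neumann series: if ‖I − T‖ < 1, then T is invertible and T^{-1} = Σ_{k≥0} (I − T)^k. A quick numerical illustration with a hypothetical 2 × 2 matrix of our own choosing:

```python
# Hypothetical example: T close to the identity, so ||I - T|| < 1 and
# T^(-1) = sum over k >= 0 of (I - T)^k (the geometric/Neumann series).
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

I2 = [[1.0, 0.0], [0.0, 1.0]]
T = [[1.0, 0.2], [-0.1, 0.9]]
E = [[i - t for i, t in zip(ri, rt)] for ri, rt in zip(I2, T)]  # E = I - T

S, P = I2, I2                     # partial sum and current power E^k
for _ in range(200):
    P = matmul(P, E)
    S = [[s + p for s, p in zip(rs, rp)] for rs, rp in zip(S, P)]

check = matmul(T, S)              # should be (numerically) the identity
assert all(abs(check[i][j] - I2[i][j]) < 1e-10
           for i in range(2) for j in range(2))
```

After enough terms the partial sum S multiplies T to the identity within round-off, so S is T^{-1}.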

13. Hint: Show that all the terms in the power series for e^A leave E invariant.
346 ANSWER8 TO BELECTED PROBLEMS

Section 4, page 97

1. (a) x(t) = (K_2 − tK_1)e^{2t},
       y(t) = K_1 e^{2t}.
   (b) x(t) = e^{2t}(K_1 cos t − K_2 sin t),
       y(t) = e^{2t}(K_2 cos t + K_1 sin t).

2. (a) x(t) = (2t + 1)e^{2t},
       y(t) = −2e^{2t}.
   (b) x(t) = 2e^{2t} sin t,
       y(t) = −2e^{2t} cos t.

4. Hint: Consider A restricted to its eigenspaces and use the result of Problem 3.

9. (a) sink (b) source (c) source
   (d) none of these (f) none of these

10. (a) Only if a < −2 are there any such values of k, and in this case for k > … .
    (b) No values of k.

14. Hint: There is a real eigenvalue. Study T on its eigenspace.

Section 5, page 102

1. (a) x(t) = (1/17)(−4 cos t + sin t) + Ke^{4t}.
   (b) x(t) = −(1/16)(4t + 1) + Ke^{4t}.
   (c) x(t) = A cos t + B sin t,
       y(t) = −A sin t + B cos t + 2t.

Section 6, page 107

2. (a) x(t) = cos 2t.
   (b) x(t) = −e^t + e^{2t−2}.

3. (a) cos √2 t, sin √2 t (b) exp √2 t, exp(−√2 t)

4. Hint: Check cases (a), (b), (c) of the theorem.

8. a = 0, b > 0; period is 2π/√b.

Chapter 6

Section 2, page 120

1. (a) Generalized 1-eigenspace spanned by (1, 0), (0, 1);

       S = [1 0]      N = [0 1]
           [0 1],         [0 0].

   (b) Generalized 1-eigenspace spanned by (1, 0); generalized (−1)-eigenspace spanned by (1, 2);

       S = [1 −1]     N = [0 0]
           [0 −1],        [0 0].

2. If the rth power of the matrix is [b_ij], then b_ij = 0 for i < j + r (r = 1, 2, . . .).

3. The only eigenvalue is 0.

5. Consider the S + N decomposition.

6. A preserves each generalized eigenspace E_λ; hence it suffices to consider the restrictions of A and T to E_λ. If T = S + N, then S | E_λ = λI, which commutes with A. Thus S and T both commute with A; so therefore does N = T − S.

8. Use the Cayley–Hamilton theorem.

15. Consider bases of the kernel and the image.

Section 3, page 126

1. Canonical forms:

3. Assume that N is in nilpotent canonical form. Let b denote the number of blocks and s the maximal number of rows in a block. Then bs ≥ n; also b = n − r and s ≤ k.

4. Similar pairs are (a), (d) and (b), (c).

Section 4 , page 132

1. (a) [Z 0 -i
O ]
(c) [ I0f i l + i 3
4. F o r n = 3:

o o c
348 ANSWERS TO SELECTED PROBLEMS

6. If Ax = μx, x ≠ 0, then 0 = q(A)x = q(μ)x.

8. Show that A and A^t have the same Jordan form if A is a complex matrix, and the same real canonical form if A is real.

Section 5, page 136

1. (a) Let every eigenvalue have real part < −b with b > a > 0. Let A = S + N with S semisimple and N nilpotent. In suitable coordinates ‖e^{tS}‖ ≤ e^{−tb}, ‖e^{tN}‖ ≤ Ct^n. Then ‖e^{tA}‖ ≤ Ce^{−tb}t^n, and so e^{ta}‖e^{tA}‖ → 0 as t → ∞. Let s > 0 be so large that e^{ta}‖e^{tA}‖ < 1 for t ≥ s. Put k = min(‖e^{tA}‖^{-1}) for 0 ≤ t ≤ s.

2. If z is an eigenvector belonging to an eigenvalue with nonzero real part, then the solution e^{tA}z is not periodic. If ib, ic are pure imaginary eigenvalues, b ≠ ±c, and z, w ∈ C^n are corresponding eigenvectors, then the real part of e^{tA}(z + w) is a nonperiodic solution.

Section 6, page 141

1. x(t) = e^{−t}.

2. (a) In (7), A = B = 0. Hence s(0) = C, s′(0) = D, s″(0) = −C, s‴(0) = −D.
Chapter 7

Section 1, page 150

3. (a) Use e^{tB} e^{tA} = … .

Section 2, page 153

2. Use the theorem of this section and Theorems 1 and 2 of Section 1.


3. Use Problem 2.

Section 3, page 157

1. (a) dense, open (b) dense (c) dense, open
   (e) open (f) open (g) dense, open
ANSWERS TO SELECTED PROBLEMS 349

Chapter 8

Page 177

1. (a) f(x) = x + 2.

       u_0(t) = 2,

       u_1(t) = 2 + ∫_0^t f(u_0(s)) ds = 2 + ∫_0^t 4 ds = 2 + 4t,

       u_2(t) = 2 + ∫_0^t (4 + 4s) ds = 2 + 4t + 2t^2,

       u_3(t) = 2 + ∫_0^t (4 + 4s + 2s^2) ds = 2 + 4t + 2t^2 + (2/3)t^3.

   By induction

       u_n(t) = 4(1 + t + t^2/2! + ⋯ + t^n/n!) − 2.

   Hence

       x(t) = lim_{n→∞} u_n(t) = 4e^t − 2.

   (b) u_n(t) = 0 for all n; hence x(t) = 0.

   (c) x(t) = t^3.
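The iteration in 1(a) is easy to reproduce mechanically: represent each u_n as a polynomial in t and apply the Picard step u ↦ 2 + ∫_0^t (u(s) + 2) ds. A short sketch (the coefficient-list representation and helper names are ours):

```python
import math

# Picard iteration for x' = f(x) = x + 2, x(0) = 2, reproducing the
# iterates above. Polynomials in t are coefficient lists [c0, c1, ...].
def picard_step(u):
    integrand = list(u)
    integrand[0] += 2.0                     # f(u(s)) = u(s) + 2
    # 2 + integral from 0 to t of the integrand:
    return [2.0] + [c / (k + 1) for k, c in enumerate(integrand)]

u = [2.0]                                   # u_0(t) = 2
for _ in range(20):
    u = picard_step(u)                      # u_1 = 2 + 4t, u_2 = 2 + 4t + 2t^2, ...

value = sum(c * 1.0 ** k for k, c in enumerate(u))   # evaluate at t = 1
assert abs(value - (4.0 * math.e - 2.0)) < 1e-9      # limit is x(t) = 4e^t - 2
```

Twenty steps already match the limit 4e^t − 2 at t = 1 to nine decimal places, illustrating the convergence claimed above.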

4. (a) |f(x) − f(0)| / |x − 0| → ∞ as x → 0; no Lipschitz constant.
   (b) …
   (c) 1

5. (a) For 0 < c < β let

       x(t) = 0,          0 ≤ t ≤ c,
       x(t) = (t − c)^3,  c ≤ t ≤ β.

Chapter 9

Section 1, page 185

2. For example, f(x) = −x^3 (x ∈ R).

3. Hint: Use a special inner product on R^n. Compute the rate of change of |x(t)|^2 where x(t) is a solution such that x(0) is (the real part of) an eigenvector for Df(0) having positive real part; take x(0) very small.

4. Use (b) of the theorem of Section 1.

Section 2 , page 191

2. (a), (b), (e)

3. Hint: Look at the Jordan form of A. It suffices to consider an elementary Jordan block.

Section 3, page 199

1. x^2 + y^2 is a strict Liapunov function.

8. V^{-1}[0, c] is positively invariant. The ω-limit set of any point of V^{-1}[0, c] consists entirely of equilibria in V^{-1}[0, c]; hence it is just z.

Section 4, page 204

2. Let x′ = −grad V(x). Then V decreases along trajectories, so that V is constant on a recurrent trajectory. Hence, a recurrent trajectory consists entirely of equilibrium points, and so is a constant.

3. (a) Each set V^{-1}(−∞, c] is positively invariant.
   (b) Use Theorem 3.

Section 5, page 209

1. Hint: Find eigenvectors.

2. Let Ax = λx, Ay = μy, λ ≠ μ, μ ≠ 0. Then ⟨x, y⟩ = μ^{-1}⟨x, Ay⟩ = μ^{-1}⟨Ax, y⟩ = λμ^{-1}⟨x, y⟩, and λμ^{-1} ≠ 1.

5. Ax = grad ½⟨x, Ax⟩.
ANSWERS TO SELECTED PROBLEMS 351

Chapter 10

Section 1, page 215

    x = i_L, y = v_C.

Section 3, page 226

1. Every solution is periodic! Hint: If (x(t), y(t)) is a solution, so is (−x(−t), y(−t)).

Section 4, page 228

1. μ = −2, μ = −1 ± 2√7.

Section 5, page 237

    x = i_L, y = v_C.

Chapter 11

Section 1, page 241

1. Hint: If the limit set L is not connected, find disjoint open sets U_1, U_2 containing L. Then find a bounded sequence of points x_n on the trajectory with x_n ∉ U_1 ∪ U_2.

4. Hint: Every solution is periodic.

Section 3, page 247

2. Hint: Apply Proposition 2.

4. Hints: (a) If x is not an equilibrium, take a local section at x. (b) See Problem 2 of Section 1.

Section 4 , page 249

2. Hint: Let y ∈ γ. Take a local section at y and apply Proposition 1 of the previous section.

Section 5, page 253

2. Hints: (a) Use Poincaré–Bendixson. (b) Do the problem for 2n + 1 closed orbits; use induction on n.

5. Hint: Let U be the region bounded by a closed orbit γ of f. Then g is transverse to the boundary γ of U. Apply Poincaré–Bendixson.

Chapter 13

Section 1, page 278

(a) Hint: Show that the given condition is equivalent to the existence of an eigenvalue α, with |α| < 1, of the derivative at x. Apply Theorem 2.

Section 3, page 285

3. Hint: If v is periodic of period λ, then so is rv for all r > 0.

5. Hints: (a) Do the problem first in case p is zero and g is linear. Then use Taylor's formula for the general case. (b) Apply the result in (a) after taking a local section.

Chapter 15

Section 2, page 303

2. This is pretty trivial. Since x′ is the C^r function f, x is C^{r+1}.

Chapter 16

Section 1, page 309

1. Hint: If B is close to A, each eigenvalue of B having negative real part will be close to a similar eigenvalue λ of A. Arguing as in the proof that S_1 is open in Theorem 1 of Chapter 7, Section 3, show that the sum of the multiplicities of these eigenvalues μ of B near λ equals the multiplicity of λ. Then show that bases for the generalized eigenspaces of the μ can be chosen near corresponding bases for λ.

Section 3, page 318

1. Suppose Df(0) has 0 as an eigenvalue; let g_ε(x) = f(x) + εx, ε ≠ 0. For |ε| sufficiently small, one of g_{−ε}, g_ε will be a saddle and the other a source or sink; hence f cannot have the same phase portrait as both g_{−ε} and g_ε. If Df(0) has ±λi, λ > 0, as an eigenvalue, then g_{−ε} is a sink and g_{+ε} is a source.

4. Hint: First consider the case where e^{tA} is a contraction or expansion. Then use Problem 1 of Section 1.
Subject Index

A Change
Absolute convergence, 80 of bases, 36
Adjoint, 230 of coordinates, 6, 36
Adjoint of operator, 206 Characteristic, 213, 232
α-Limit point, 198, 239 Characteristic polynomial, 43, 103
Andronov, 314 Closed orbit, 248
Angular momentum, 21 Closed subset, 76
Annulus, 247 Companion matrix, 139
Antisymmetric map, 290 Comparison test, 80
Areal velocity, 22 Competing species, 265
Asymptotic period, 277 Complex Cartesian space, 62
Asymptotic stability, 145, 180, 186 Complex eigenvalues, 43, 55
Asymptotically stable periodic solution, 276 Complex numbers, 323
Asymptotically stable sink, 280 Complex vector space, 62, 63
Autonomous equation, 160 Complexification of operator, 65
Complexification of vector spaces, 64
B Configuration space, 287
Conjugate of complex number, 323
Bad vertices, 269
Conjugate momentum, 293
Based vector, 10
Conjugation, 64
Basic functions, 140
Conservation
Basic regions, 267
Basin, 190 of angular momentum, 21
of energy, 18, 292
Basis, 34
Conservative force field, 17
of solutions, 130
continuous map, 76
Belongs to eigenvector, 42
Bifurcation, 227, 255 Continuously differentiable map, 16
of behavior, 272 Contracting map theorem, 286
Bifurcation point, 3 Contraction, 145
Convergence, 76
Bilinearity, 75
Convex set, 164
Boundary, 229
Coordinate system, 36
Branches, 229, 211
Brayton-Moser theorem, 234 Coordinates, 34
Brouwer fixed point theorem, 253 Cross product, 20
Current, 211
C Current states, 212, 229
Curve, 3, 10
C', C*, 178
Cyclic subspace, 334
Canonicnl forms, 122, 123, 331
Cyclic vector, 334
Capacitance, 232
Capacitors, 21 1, 232
D
Cartesian product of vector spaces, 42
Cartesian space, 10 Dense set, 154
Cauchy sequence, 76 Derivative, 11, 178
Cauchy's inequality, 75 Determinants, 39, 324
Cayley-Hamilton theorem, 115 Diagonal form, 7
Center, 95 Diagonal matrix, 45
Central force fields, 19 Diagonalizability, 45
Chain rule, 17, 178 Diffeomorphism, 243

Differentiation operator, 142 Gradient, 17


Direct sum, 41 Gradient system, 199
Discrete dynamical system, 278, 280 Graph of map, 339
Discrete flow, 279 Gronwall’s inequality, 169
Discriminant, 96 Growth rate, 256
Distance, 10, 76
Dual basis, 205 H
Dual space, 36 Hamiltonian, 291, 293
Dual vector space, 204 Hamiltonian vector field, 291
Dynamical system, 5, 6, 159, 160 Hamilton’s equations, 291
Harmonic motion, 59
E Harmonic oscillator, 15, 105
Eccentricity, 26 Higher order linear equations, 138
Eigenspace, 110 Higher order systems, 102
Eigenvalue, 63 Homeomorphism, 312
Eigenvector, 42, 63 Homogeneous linear systems, 89
Elementary A-block, 127 Hopf bifurcation, 227
Elementary Jordan matrix, 127 Hyperbolic closed orbit, 311
Elementary nilpotent block, 122 Hyperbolic equilibrium, 187
Energy, 18, 289 Hyperbolic flow, 150
Entire orbit, 195 Hyperplane, 242
Equation of limited growth, 257
Equilibrium, 145 1
Equilibrium point, 180 Identity map, 322
Equilibrium state, 145, 181 Image, 34, 322
Euclidean three space, 287 Implicit function theorem, 338
Expansion, 149 Improper node, 93
Exponent (exp), 83 Independent set (subset), 34
Exponential, 74 Inductance, 213, 232
of operator, 82 Inductors, 211, 213, 232
Exponential approach, 181 Infinite series, 86
Exponential series, 83 Initial condition, 2, 162
Initial value problem, 2
F Inner product, 16, 75
Factorial, 83 In phase trajectories, 278
Field of force, 15 Integral, 23
Fixed point, 181, 279 Invariance, 198
Flow, 6, 175 Inverse, 33
Flow box, 243 Inverse function theorem, 337
Focus, 93 Invertibility, 33
Force field, 16, 17, 23 Isomorphism, 35
Fundamental theorem, 162 Iteration scheme, 168
Fundamental theorem of algebra, 328
Fundamental theory, 160 J
Jordan A-block, 127
G Jordan curve theorem, 254
Generalized eigenspace, 110 Jordan form, 127
Generalized momenta, 292 Jordan matrix, 127
Generic property, 154
Genericity, 188 K
Global section, 247 KCL,211, 229
Good vertices, 269 Kepler problem, 58

Kepler’s first law, 23 Nilpotent canonical form, 122


Kernel, 33 Nodes, 93, 211, 229
Kinetic energy, 18, 288 Nonautonomous differential equations, 99, 296
Kirchhoff’s current law, 211 Nonautonomous perturbation, 308
KVL, 212, 230 Nondegenerate bilinear form, 290
Nonhomogeneous, 99
L Nonlinear sink, 182
Lagrange’s theorem, 194 Norm, 77
Latus rectum, 26 0
Legendre transformation, 292 Ohm’s law, 213
Length, 10, 76 o Limit point, 198, 239
Level surface, 195, 200 One-form, 205
Liapunov, 192 Onto mapping, 322
Liapunov function, 193 Open set, 76, 153
Liapunov’s theorem, 180 Operator, 30, 33
Lienard’s equat,ion, 210, 215 Orbits, 5
Limit cycle, 250 Order of differential equation, 22
Limit set, 239 Ordinary boundary points, 268
Limiting population, 257 Oriented branch, 229
Linear contraction, 279 Origin, 10
Linear flow, 97 Orthonormal basis, 206
Linear graph, 229
Linear map, 30, 33 P
Linear part, 181 Parallelogram law, 81
Linear subspace, 33 Parameter, 2
Linear transformation, 5 Parametrized differential equation, 227
Linearity properties, 30 Partial sums, 80
Linearly independent elements, 326 Passive resistor, 217
Liouville’s formula, 278 Peixoto, 314
Lipschitz constant, 163 Pendulum, 183
Lipschitz function, 163 Periodic attractor, 278
Local section, 242, 278 Periodic solutions, 95
Locally Lipschitz, 163 Perturbation, 304
Phase portrait, 4
M Phase space, 292
Manifolds, 232, 319 Physical states, 213, 232
Matrices (matrix), 8, 11 Physical trajectory, 234
Picard iteration, 177
Maxwell, 191
Minimal set, 241 Poincaré–Bendixson theorem, 239, 248
Mixed potential, 233 Poincaré map, 278, 281
Monotone along trajectory, 244 Pontryagin, 314
Multiplicity, 110 Positive definiteness, 75
of a root, 330 Positive invariance, 195
Potential energy, 17, 288
N Power, 231
Predator-prey equation, 259
n-body problem, 287 Primary decomposition theorem, 110
Neighborhood, 76, 305 Product of matrices, 32
Newtonian gravitational field, 24 Proper subspace, 33
Newton’s equations, 289
Newton’s second law, 15 R
nil(z), 334 Rank, 41
Nilpotent, 112, 117 Real canonical form, 130

Real distinct eigenvalues, 46 Symmetric matrix, 46, 207


Real eigenvalue, 42 Symmetry, 75
Real logarithm of operator, 132 Symplectic form, 290
Recurrent point, 248 System, 3
Regular point, 200 of differential equations, 9
Residual sets (subsets), 158
Resistors, 211, 213 T
Restriction, 322 Tangent vector, 3, 11
RLC circuit, 211 Tellegen's theorem, 231
Time one map, 279
S Total energy, 18, 289
Saddle, 92 Trace, 40
Saddle point, I90 Trajectories, 5
Scalars, 33 Translates of vectors, 10
Section map, 221 Transversal crossing, 267
Self-adjoint operator, 207 Transverse to vector field, 242
Semisimplicity, 63, 65, 116, 117 Trivial subspace, 33
Separation of variables, 261
Separatrices, 272 U
Sequence, 76 Uncoupling, 3, 67
Series, 80 Undetermined coefficients, 52
Singularity point, 181 Uniform continuity, 87
Sink, 145, 180, 181, 280 Uniform norm, 82
Similar matrices, 39 Unit ball, 83
Simple harmonic motion, 59 Unlimited growth, 256
Simple mechanical system, 289 Unstable equilibrium, 186
Social phenomena, 257 Unstable subspace, 151
Solution of differential equation, 161
Solution space, 35 V
Source, 95, 149, 190 Variation of constants, 99
Space derivative, 300 Variational equation, 299
of states, 22 Van der Pol's equation, 210, 215, 217
of unrestricted states, 231 Vector, 10, 33
Stability, 145 Vector field, 4, 11
of equilibria, 180 Vector space, 30,33
Stable closed orbit, 285 Vector structure on R",30
Stable equation, 3 Velocity vector, 18
Stable equilibrium, 185 Vertices, 268
Stable fixed point, 285 Voltage, 212
Stable manifolds, 272 Voltage drop, 212
Stable subspace, 151 Voltage potential, 212, 230
Standard basis, 34 Voltage state, 212, 230
State space, 23, 289 Volterra-Lotka equations, 259, 262
States, 22
Stationary point, 181 W
Structural stability, 304, 313 Work, 17
Subspace, 33
Summation sign, 323 Z
Symbiosis, 273 Zero, 181
Pure and Applied Mathematics
A Series of Monographs and Textbooks

Editors: Samuel Eilenberg and Hyman Bass
Columbia University, New York

RECENT TITLES

Robert A. Adams, Sobolev Spaces
John J. Benedetto, Spectral Synthesis
D. V. Widder, The Heat Equation
Irving Ezra Segal, Mathematical Cosmology and Extragalactic Astronomy
J. Dieudonné, Treatise on Analysis: Volume II, enlarged and corrected printing; Volume IV; Volume V; Volume VI
Werner Greub, Stephen Halperin, and Ray Vanstone, Connections, Curvature, and Cohomology: Volume III, Cohomology of Principal Bundles and Homogeneous Spaces
I. Martin Isaacs, Character Theory of Finite Groups
James R. Brown, Ergodic Theory and Topological Dynamics
C. Truesdell, A First Course in Rational Continuum Mechanics: Volume 1, General Concepts
George Grätzer, General Lattice Theory
K. D. Stroyan and W. A. J. Luxemburg, Introduction to the Theory of Infinitesimals
B. M. Puttaswamaiah and John D. Dixon, Modular Representations of Finite Groups
Melvyn Berger, Nonlinearity and Functional Analysis: Lectures on Nonlinear Problems in Mathematical Analysis
Charalambos D. Aliprantis and Owen Burkinshaw, Locally Solid Riesz Spaces
Jan Mikusiński, The Bochner Integral
Thomas Jech, Set Theory
Carl L. DeVito, Functional Analysis
Michiel Hazewinkel, Formal Groups and Applications
Sigurdur Helgason, Differential Geometry, Lie Groups, and Symmetric Spaces
C. Truesdell and R. G. Muncaster, Fundamentals of Maxwell's Kinetic Theory of a Simple Monatomic Gas: Treated as a Branch of Rational Mechanics
Robert B. Burckel, An Introduction to Classical Complex Analysis: Volume 1

In preparation

Louis Halle Rowen, Polynomial Identities in Ring Theory
Joseph J. Rotman, An Introduction to Homological Algebra
Robert B. Burckel, An Introduction to Classical Complex Analysis: Volume 2