Harvard Math-23a (Notes, Problems, Syllabus)
Strange as it may seem, Part I of the math placement test that freshmen have
taken is the most important. Students who do well in Math 23 have almost all
scored 26 or more out of 30 on this part.
Extension students who register for graduate credit are required to learn and
use the scripting language R. This option is also available to everyone else in the
course. You need to be only an experienced software user, not a programmer.
Council and the full faculty last April, the Ad Board will no longer approve such
petitions.
With regard to athletic practices that occur at the same time as classes, policy
is less well defined. Here is the view of the assistant director of athletics:
The basic answer is that our coaches should be accommodating to any academic conflict that comes up with class scheduling. Kids should be able to take
the classes they want and still be a part of the team. Especially for classes that
would only cause a student to miss a small part of a practice.
What complicates things are the classes that would cause a student to miss an
entire practice for 2-3 days a week. Those instances make it hard for a student to
engage fully in the sport and prepare adequately for competition.
It's hard for freshmen to ask a coach - the adult they have the closest relationship with on campus - for practice accommodations, but in my experience many of
them will work with students on their total experience.
The Math 23 policy, based on this opinion: It is OK to take Math 23a and
practice for your sport every Tuesday, but you must not miss Thursday lecture for
a practice.
Extension students may choose between attending lecture or watching videos.
However, students in Math E-23a who will not regularly attend lecture on Thursday
should sign up for a section that meets as late as possible. Then, with occasional
exceptions, they can watch the video of the Thursday lecture to prepare for section.
Sections will begin on September 10-11. Students should indicate their preferences for section time using the student information system. More details will be
revealed once the software is complete!
In order to include your name on a section list, we must obtain your permission
(on the sectioning form) to reveal on the Web site that you are a student taking
Math 23a or E-23a. If you wish to keep this information secret, we will include
your name in alphabetical order, but in the form Xxxx Xxxxxx.
Quizzes are held in the Yenching Auditorium, 2 Divinity Avenue. They run
from 6 to 9 PM, but you can arrive any time before 7 PM, since 120 minutes should
be enough time for the quiz.
Keep these time slots open. Do not, for example, schedule a physics lab
or an LS 1a section on Wednesday evenings. If you know that you tend to work
slowly, it would also be unwise to schedule another obligation that leaves only part
of that time available to you!
Students who have exam accommodations, properly documented by a letter
from the Accessible Education Office, may need to take their quizzes in a separate
location. Please provide the AEO letters as early in the term as you can, since we
may need to reserve one or more extra rooms.
The last day to drop and add courses (like Math 23a and Math 21a) is Monday,
October 5. This is before the first quiz. It is important that you be aware of how
you are managing the material and performing in the course. It is not a good
idea to leave switching out of any course (not just Math 23) until the fifth Monday. Decisions of this nature are best dealt with in as timely a manner as possible!
Quizzes will include questions that resemble the ones done in the early sections, and each quiz will include two randomly-chosen proofs from among the
numbered proofs in the relevant module. There may be other short proofs similar to ones that were done in lecture and problems that are similar to homework
problems. However if you want quizzes on which you are asked to prove difficult
theorems that you have never seen before, you will need to take Math 25a or 55a,
not Math 23a.
If you have an unexpected time conflict for one of the quizzes, contact Kate
as soon as you know about it, and special arrangements can be made. Distance
students will take their quizzes near their home but on the same dates.
The final examination will focus on material from the last five weeks of the
course. Local Extension students will take it at the same time and place as undergraduates. The time (9AM or 2PM) will be revealed when the exam schedule is
posted late in September. If you have two or even three exams scheduled for that
day, don't worry: that is a problem for the Exams Office, not you, to solve.
Except for the final examination, local Extension students can meet all their
course obligations after 5:30pm.
Distance extension students who do not live near Cambridge and cannot
come to Harvard in the evening to hand in homework, attend section and office
hours, take quizzes, and present proofs can still participate online in all course
activities. Details will be available in a separate document. Since this fully-online
Textbooks:
Vector Calculus, Linear Algebra, and Differential Forms, Hubbard and Hubbard,
fourth edition, Matrix Editions, 2009. Try to get the second printing, which includes a few significant changes to chapters 4 and 6.
This book is in stock at the Coop, or you can order it for $84 plus $10 for
priority shipping from the publishers Web site at
http://matrixeditions.com/UnifiedApproach4th.html. The Student Solution
Manual for the fourth edition, not in stock at the Coop, is also available from that
Web site.
We will cover Chapters 1-3 this term, Chapters 4-6 in Math 23b; so this one
textbook will last for the entire year.
Ross, Elementary Analysis: The Theory of Calculus, 2nd Edition, 2013.
This will be the primary text for the module on single-variable real analysis.
It is available electronically through the Harvard library system (use HOLLIS and
search for the author and title). If you like to own bound volumes, used copies can
be found on amazon.com for as little as $25, but be sure to get the correct edition!
Lawvere, Conceptual mathematics: a first introduction to categories, 2nd Edition, 2009.
We will only be using the first chapter, and the book is available for free
download through the Harvard library system.
Proofs:
Learning proofs can be fun, and we have put a lot of work into designing an
enjoyable way to learn high-level and challenging mathematics! Each week's course
materials include two proofs. Often these proofs appear in the textbook and will
also be covered in lecture. They also may appear as quiz questions.
You, as students, will earn points towards your grade by presenting these proofs
to teaching staff and to each other without the aid of your course notes. Here is
how the system works:
When we first learn a proof in class, only members of the teaching staff are qualified listeners. Anyone who presents a satisfactory proof to a qualified listener
also becomes qualified and may listen to proofs by other students. This process of
presenting proofs to qualified listeners occurs separately for every proof.
You are expected to present each proof before the date of the quiz on which it
might appear; so each proof has a deadline date. Distance students may reference
the additional document which details how to go about remotely presenting proofs
to classmates and teaching staff.
Each proof is worth 1 point. Here is the grading system:
Presenting a proof to Paul, Kate, one of the course assistants, or a fellow
student who has become a qualified listener: 0.95 points before the deadline,
0.8 points after the deadline. You may only present each proof once.
Listening to a fellow student's proof: 0.1 point. Only one student can receive
credit for listening to a proof.
After points have been tallied at the end of the term, members of the course
staff may assign the points that they have earned by listening to proofs
outside of section to any students that they feel deserve a bit of extra credit.
Students who do the proofs early and listen to lots of other students proofs can
get more than 100%, but there is a cap of 30 points total. You can almost reach
this cap by doing each proof before the deadline and listening twice to each proof.
Either you do a proof right and get full credit, or you give up and try again
later. There is no partial credit. It is OK for the listener to give a couple of small
hints.
You may consult the official list of proofs that has the statement of each theorem
to be proved, but you may not use notes. That will also be the case when proofs
appear on quizzes and on the final exam.
It is your responsibility to use the proof logging software on the course
Web site to keep a record of proofs that you present or listen to. You can also
use the proof logging software to announce proof parties and to find listeners for
your proofs.
Each quiz will include two questions which are proofs chosen at random from
the four weeks of relevant material. The final exam will have three proofs, all from
material after the second quiz. Students generally do well on the proof questions.
Useful software:
R and RStudio
This is required only for Extension students who register for graduate credit,
but it is an option for everyone. Consider learning R if you
are interested in computer science and want practice in using software
to do things that are more mathematical than can be dealt with in CS
50 or 51.
are thinking of taking a statistics course, which is likely to use R.
are hoping to get an interesting summer job or summer internship that
uses mathematics or deals with lots of data.
want to be able to work with large data files in research projects in any
field (life sciences, economics and finance, government, etc.)
R is free, open-source software. Instructions for download and installation
are on the Web site. You will have the chance to use R at the first section
on Thursday, September 10 or Friday, September 11; so install it right away,
preferably on a laptop computer that you can bring to section.
On the course Website are a set of R scripts, with accompanying YouTube
videos, that explain how to do almost every topic in the course by using
R. These scripts are optional for undergraduates, but they will enhance your
understanding both of mathematics and of R.
LaTeX
This is the technology that is used to create all the course handouts. Once
you learn how to use it, you can create professional-looking mathematics on
your own computer.
The editor that is built into the Canvas course Web site is based on LaTeX.
One of the course requirements is to upload four proofs to the course Web site
in a medium of your choice. One option is to use LaTeX. Alternatively, you
can use the Canvas file editor (LaTeX based), or you can make a YouTube
video.
I learned LaTeX without a book or manual by just taking someone else's files,
ripping out all the content, and inserting my own, and so can you. You will
need to download freeware MiKTeX version 2.9 (see http://www.miktex.org),
which includes an integrated editor named TeXworks.
From http://tug.org/mactex/ you can download a similar package for the
Mac OS X.
When in TeXworks, use the Typeset/pdfLaTeX menu item button to create
a .pdf file. To learn how to create fractions, sums, vectors, etc., just find an
example in the lecture outlines and copy what I did. All the LaTeX source
for lecture outlines, assignments, and practice quizzes is on the Web site, so
you can find working models for anything that you need to do.
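To give the flavor, here is a minimal self-contained example of the sort of source you might copy and adapt. The specific formulas are just illustrative, not taken from any particular course handout:

  \documentclass{article}
  \usepackage{amsmath}   % needed for pmatrix and other math environments
  \begin{document}
  A fraction, a sum, and a vector:
  \[
    \frac{1}{2}, \qquad \sum_{i=1}^{n} x_i y_i, \qquad
    \vec{v} = \begin{pmatrix} 0.2 \\ 1.3 \\ 2.2 \end{pmatrix}.
  \]
  \end{document}

Running Typeset/pdfLaTeX on a file like this produces a one-page .pdf that you can inspect and imitate.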
If you create a .pdf file for your homework, please print out the files and
hand in the paper at class. An exception can be made if you are a distance
Extension student or for some other good reason you are not in Cambridge
on the due date.
The course documents contain examples of diagrams created using TikZ,
the built-in graphics editor. It is also easy to include .jpg or .png files
in LaTeX. If you want to create diagrams, use Paint or try Inkscape at
http://www.inkscape.org, an excellent freeware graphics program. Students have found numerous other solutions to the problem of creating graphics, so just experiment.
If you create a .pdf file for your homework, please print out the files and hand
in the paper. By default, undergraduates and local Extension students may
submit the assignment electronically only if you are out of town on the due
date. Individual section instructors may adopt a more liberal policy about
allowing electronic submission. Do not submit .tex files.
Use of R:
You can earn R bonus points in three ways:
By being a member of a group that uploads solutions to section problems
that require creation of R scripts. These will be available most, but not all,
weeks. (about 10 points)
By submitting R scripts that solve the optional R homework problems (again
available most, but not all, weeks). (about 20 points)
By doing a term project in R. (about 20 points)
To do the graduate credit grade calculation, we will add in your R bonus
points to the numerator of your score. To the denominator, we will add in 95%
of your bonus points or 50% of the possible bonus points, whichever is greater.
Earning a lot of R points is essential if you are registered for graduate credit. Otherwise, earning more than half the bonus points is certain to raise your percentage
score a bit, and it can make a big difference if you have a bad day on a quiz or on
the final exam.
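As a sketch of that calculation in R (every number below is made up, purely for illustration; the actual course totals may differ):

  base.earned   <- 320   # illustrative: points earned without any R work
  base.possible <- 400   # illustrative: points possible without any R work
  r.earned      <- 35    # illustrative: R bonus points you earned
  r.possible    <- 50    # illustrative: R bonus points that were available

  numerator   <- base.earned + r.earned
  denominator <- base.possible + max(0.95 * r.earned, 0.5 * r.possible)
  100 * numerator / denominator   # percentage used for the graduate-credit grade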
Switching from Math 25a to Math 23b at midyear requires you to teach
yourself about multivariable differential calculus and manifolds, but a handful
of students do it every year, and it generally works out OK.
Special material for Physics 15b and Physics 153
Math 23b does an excellent treatment of vector calculus (div, grad, and curl)
and its relation to differential form fields and the exterior derivative. Alas, this
material is needed in Physics 15b and Physics 153 before we can reach it in Math
23.
Week 13 covers these topics in a manner that relies only on Math 23a, never
mentioning multiple integrals. This will be covered in a special lecture during
reading period, and there will be an optional ungraded problem set. If you choose
to do this topic, which physics students last year said was extremely useful, there
will be one question about it on the final exam, which you can use to replace your
lowest score on one of the other questions.
If you are not taking Physics 15b or Physics 153, just wait to see this material
in Math 23b.
YouTube videos
These were made as part of a rather unsuccessful pedagogical experiment last
year. They are quite good, but you will need some extra time to watch them.
The Lecture Preview Videos were made by Kate. They cover the so-called
Executive Summaries in the weekly course materials, which go over all the
course materials, but without proofs or detailed examples.
If you watch these videos (it takes about an hour per week) you will be very
well prepared for lecture, and even the most difficult material will make sense
on a first hearing.
Last year's experiment was unsuccessful because we assumed in lecture that
everyone had watched these videos, when in fact only half the class did
so. Those who did not watch them complained, correctly, that the lectures
skipped over basic material in getting to proofs and examples. This year's
lectures will be self-contained, so the preview videos are not required viewing.
The R script videos were made by Paul. They provide a line-by-line explanation of the R scripts that accompany each week's materials.
Last year's experiment was unsuccessful because going over these scripts in
class was not a good use of lecture time. If you are doing the graduate
option, these scripts are pretty much required viewing, although the scripts
are so thoroughly commented that just working through them on your own
is perhaps a viable alternative.
If you are doing just the undergraduate option, you can ignore the R scripts
completely.
Week 1 (September 3-11): Fields, vectors and matrices
Week 2 (September 15-18): Dot and cross products; Euclidean geometry of R^n
Week 3 (September 22-25): Row reduction, independence, basis
Week 4 (Sept. 29 - Oct. 2): Eigenvectors and eigenvalues
Week 5 (October 6-9): Number systems and sequences
October 7: QUIZ 1 on weeks 1-4
Week 6 (October 13-16): Series, convergence tests, power series
Week 7 (October 20-23): Limits and continuity of functions
Week 8 (October 27-30): Derivatives, inverse functions, Taylor series
Week 9 (November 3-6): Topology, sequences in R^n, linear differential equations
October 29: QUIZ 2 on weeks 5-8
Week 10 (November 10-13): Limits and continuity in R^n; partial and directional derivatives
Week 11 (November 17-20): Differentiability, Newton's method, inverse functions
Fortnight 12 (Nov. 24 - Dec. 3): Manifolds, critical points, Lagrange multipliers
November 26: Thanksgiving
Half-week 13 (December 8): Calculus on parametrized curves; div, grad, and curl
December ?: FINAL EXAM on weeks 9-12
This schedule covers all the math that is needed for Physics 15a, 16, and 15b
with the sole exception of surface integrals, which will be done in the spring.
The real analysis in Math 23a alone will be sufficient for most PhD programs in
economics, though the most prestigious programs will want to see Math 23b also.
All the mathematics that is used in Economics 1011a will be covered by the end
of the term. The coverage of proofs is complete enough to permit prospective
Computer Science concentrators to skip CS 20.
Abstract vector spaces and multiple integration, topics of great importance to
prospective math concentrators, have all been moved to Math 23b.
R Scripts
Script 1.1A-Finite Fields.R
Topic 1 - Why the real numbers form a field
Topic 2 - Making a finite field, with only five elements
Topic 3 - A useful rule for finding multiplicative inverses
Script 1.1B-PointsVectors.R
Topic 1 - Addition of vectors in R2
Topic 2 - A diagram to illustrate the point-vector relationship
Topic 3 - Subtraction and scalar multiplication
Script 1.1C-Matrices.R
Topic 1 - Matrices and Matrix Operations in R
Topic 2 - Solving equations using matrices
Topic 3 - Linear functions and matrices
Topic 4 - Matrices that are not square
Topic 5 - Properties of the determinant
Script 1.1D-MarkovMatrix
Topic 1 - A game of volleyball
Topic 2 - traveling around on ferryboats
Script 1.1L-LinearMystery
Topic 1 - Define a mystery linear function fMyst : R^2 → R^2
Executive Summary
Quantifiers and Negation Rules
The universal quantifier ∀ is read "for all."
The existential quantifier ∃ is read "there exists." It is usually followed by s.t., a standard abbreviation for "such that."
The negation of "∀x, P(x) is true" is "∃x, P(x) is not true."
The negation of "∃x, P(x) is true" is "∀x, P(x) is not true."
The negation of "P and Q are true" is "either P or Q is not true."
The negation of "either P or Q is true" is "both P and Q are not true."
Functions
A function f needs two sets: its domain X and its codomain Y.
f is a rule that, to any element x ∈ X, assigns a specific element y ∈ Y. We write y = f(x).
f must assign a value to every x ∈ X, but not every y ∈ Y must be of the form f(x). The subset of the codomain consisting of elements that are of the form y = f(x) is called the image of f. If the image of f is all of the codomain Y, f is called surjective or onto.
f need not assign different elements of Y to different elements of X. If x1 ≠ x2 implies f(x1) ≠ f(x2), f is called injective or one-to-one.
If f is both surjective and injective, it is bijective and has an inverse f^-1.
Categories
A category C has objects (which might be sets) and arrows (which might be functions).
An arrow f must have a specific domain object X and a specific codomain object Y; we write f : X → Y.
If arrows f : X → Y and g : Y → Z are in the category, then the composition arrow g ∘ f : X → Z is in the category.
For any object X there is an identity arrow I_X : X → X.
Given f : X → Y, f ∘ I_X = f and I_Y ∘ f = f.
Associative law: given f : X → Y, g : Y → Z, h : Z → W, we have h ∘ (g ∘ f) = (h ∘ g) ∘ f.
Given an arrow f : X → Y, an arrow g : Y → X such that g ∘ f = I_X is called a retraction.
Given an arrow f : X → Y, an arrow g : Y → X such that f ∘ g = I_Y is called a section.
If, for arrow f, arrow g is both a retraction and a section, then g is the inverse of f, g = f^-1, and g must be unique.
Almost everything in mathematics is a special case of a category.
1.1 Fields
A field F is a set of elements for which the familiar operations of addition and
multiplication are defined and behave in the usual way. Here is a set of axioms
for a field. You can use them to prove theorems that are true for any field.
1. Addition is commutative: a + b = b + a.
2. Addition is associative: (a + b) + c = a + (b + c).
3. Additive identity: there exists 0 such that for all a ∈ F, 0 + a = a + 0 = a.
4. Additive inverse: for all a ∈ F, there exists -a such that (-a) + a = a + (-a) = 0.
5. Multiplication is associative: (ab)c = a(bc).
6. Multiplication is commutative: ab = ba.
7. Multiplicative identity: there exists 1 such that for all a ∈ F, 1a = a.
8. Multiplicative inverse: for all a ∈ F - {0}, there exists a^-1 such that a^-1 a = 1.
9. Distributive law: a(b + c) = ab + ac.
Examples of fields include:
The rational numbers Q.
The real numbers R.
The complex numbers C.
The finite field Zp , constructed for any prime number p as follows:
Break up the set of integers into p subsets. Each subset is named after the
remainder when any of its elements is divided by p.
[a]_p = {m | m = np + a, n ∈ Z}
Notice that [a + kp]p = [a]p for any k. There are only p sets, but each has
many alternate names. These p infinite sets are the elements of the field
Zp .
Define addition by [a]p + [b]p = [a + b]p . Here a and b can be any names for
the subsets, because the answer is independent of the choice of name. The
rule is Add a and b, then divide by p and keep the remainder.
Define multiplication by [a]p [b]p = [ab]p . Again a and b can be any names
for the subsets, because the answer is independent of the choice of name.
The rule is Multiply a and b, then divide by p and keep the remainder.
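Here is a minimal R sketch of these two rules, using p = 5 as in Script 1.1A; the function names are just illustrative:

  p <- 5                                      # any prime will do
  add.mod  <- function(a, b) (a + b) %% p     # rule: add, then keep the remainder
  mult.mod <- function(a, b) (a * b) %% p     # rule: multiply, then keep the remainder
  add.mod(3, 4)                               # 2, because 7 = 1*5 + 2
  mult.mod(3, 4)                              # 2, because 12 = 2*5 + 2
  outer(0:(p - 1), 0:(p - 1), mult.mod)       # the full multiplication table for Z5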
1.2 Points and vectors in F^n
F n denotes the set of ordered lists of n elements from a field F . Usually the field
is R, but it could be the field of complex numbers C or a finite field like Z5 .
A given element of F n can be regarded either as a point, which represents
position data, or as a vector, which represents incremental data.
If an element of F^n is a point, we represent it by a bold letter like p and write
it as a column of elements enclosed in parentheses, e.g.
p = (1.1, 3.8, 2.3)    (written as a column).
If an element of F^n is a vector, we represent it by a bold letter with an arrow
like ~v and write it as a column of elements enclosed in square brackets, e.g.
~v = [0.2, 1.3, 2.2]    (written as a column).
To add a vector to a point, we add the components in identical positions together.
The result is a point: q = p + ~v. Geometrically we represent this by anchoring
the vector at the initial point p. The location of the arrowhead of the vector is
the point q that represents our sum.
[Diagram: the vector ~v anchored at the point p, with arrowhead at q = p + ~v; the vector 2~v shown for comparison.]
1.3 The standard basis vectors
The standard basis vector ~ek has a 1 as its kth component, and all its other
components are 0. Since the additive identity 0 and the multiplicative identity
1 must be present in any field, there will always be n standard basis vectors in
F n . Geometrically, the standard basis vectors in R2 are usually associated with
one unit east and one unit north respectively.
[Diagram: the standard basis vectors ~e1 (one unit east) and ~e2 (one unit north) in R^2.]
1.4 A matrix acting on a vector
If G is an m × n matrix and ~v is a vector in F^n, the ith component of the vector G~v is
(G~v)_i = ∑_{j=1}^{n} g_{i,j} v_j.
1.5 Matrix multiplication
If G is an m × n matrix and H is an n × p matrix, the (i, j) entry of the product GH is
(GH)_{i,j} = ∑_{k=1}^{n} g_{i,k} h_{k,j}.
1.6 A worked example
Let
A = (2 1 0; 1 1 2)    (a 2 × 3 matrix; rows are separated by semicolons)
B = (0 1; 2 1; 2 0)   (a 3 × 2 matrix).
Then
AB = (2 3; 6 2)             (a 2 × 2 matrix)
BA = (1 1 2; 5 3 2; 4 2 0)  (a 3 × 3 matrix).
The number of columns in the first factor must equal the number of rows in
the second factor.
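A quick R check of both products, using the 2 × 3 matrix A and the 3 × 2 matrix B shown above:

  A <- matrix(c(2, 1, 0,
                1, 1, 2), nrow = 2, byrow = TRUE)   # the 2 x 3 matrix A
  B <- matrix(c(0, 1,
                2, 1,
                2, 0), nrow = 3, byrow = TRUE)      # the 3 x 2 matrix B
  A %*% B   # a 2 x 2 matrix
  B %*% A   # a 3 x 3 matrix, so AB and BA cannot even be compared entry by entry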
1.7 Function inverses
1.8 The determinant of a 2 × 2 matrix
For the matrix A = (a b; c d), det A = ad - bc. If you fix one column, it is a linear
function of the other column, and it changes sign if you swap the two columns.
1.9 Matrix inverses
1.10 Matrix transposes
The transpose of a given matrix A is written AT . The two are closely related.
The rows of A are the columns of AT and the columns of A are the rows of AT .
A = (a b; c d),   A^T = (a c; b d)
The transpose of a matrix product is the product of the transposes, but in
the opposite order:
(AB)^T = B^T A^T
A similar rule holds for matrix inverses:
(AB)^-1 = B^-1 A^-1
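Both rules are easy to spot-check numerically in R; the two matrices below are arbitrary invertible 2 × 2 examples:

  A <- matrix(c(1, 2, 3, 4), nrow = 2)      # columns (1,2) and (3,4)
  B <- matrix(c(2, 0, 1, 5), nrow = 2)      # columns (2,0) and (1,5)
  t(A %*% B);      t(B) %*% t(A)            # these two results agree
  solve(A %*% B);  solve(B) %*% solve(A)    # so do these two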
1.11 Applications of matrix multiplication
In these examples, the sum-of-products rule for matrix multiplication arises
naturally, and so it is efficient to use matrix techniques.
Counting paths: Suppose we have four islands connected by ferry routes. The routes are encoded in a 4 × 4 matrix of 0s and 1s whose (i, j) entry is 1 exactly when there is a route from island j to island i.
Lecture Outline
1. Quantifiers and negation
Especially when you are explaining a proof to someone, it saves some writing
to use the symbols ∃ (there exists) and ∀ (for all).
Be careful when negating these.
The negation of "∀x, P(x) is true" is "∃x, P(x) is not true."
The negation of "∃x, P(x) is true" is "∀x, P(x) is not true."
When negating a statement, also bear in mind that
The negation of "P and Q are true" is "either P or Q is not true."
The negation of "either P or Q is true" is "both P and Q are not true."
For practice, let's negate the following statements (which may or may not
be true!)
There exists an even prime number.
Negation:
All 11-legged alligators are orange with blue spots. (Hubbard, page 5)
Negation:
2. Set notation
Here are the standard set-theoretic symbols:
∈ (is an element of)
{a | p(a)} (the set of a for which p(a) is true)
⊂ (is a subset of)
∩ (intersection)
∪ (union)
× (Cartesian product)
- or \ (set difference)
Using the integers Z and the real numbers R, let's construct some sets. In
each case there is one way to describe the set using a restriction and another
more constructive way to describe the set.
The set of real numbers whose cube is greater than 8 in magnitude.
Restrictive:
Constructive:
The set of coordinate pairs for points on the circle of radius 2 centered
at the origin (an example of a smooth manifold).
Restrictive:
Constructive:
3. Function terminology:
Here are some terms that should be familiar from your study of precalculus
and calculus:
(Table to fill in, with one column for each of Example a, Example b, Example c, and one row for each of: domain, codomain, image, one-to-one = injective, onto = surjective, invertible = bijective.)
Using the sets X = {1, 2} and Y = {A, B, C}, draw diagrams to illustrate
the following functions, and fill in the table to show how the terms apply
to them:
f : X → Y, f(1) = A, f(2) = B.
Did your calculus course use range as a synonym for image or for
codomain?
4. Composition of functions
Sometimes people find that a statement is hard to prove because it is so
obvious. An example is the associativity of function composition, which
will turn out to be crucial for linear algebra.
Prove that (f ∘ g) ∘ h = f ∘ (g ∘ h). Hint: Two functions f1 and f2 are equal
if they have the same domain X and, for all x ∈ X, f1(x) = f2(x).
Consider the set of men who have exactly one brother and at least one son.
h(x) = father of x, g(x) = brother of x, f (x) = oldest son of x
f ∘ g is called
(f ∘ g) ∘ h is
g ∘ h is called
f ∘ (g ∘ h) is
Simpler name for both (f ∘ g) ∘ h and f ∘ (g ∘ h):
Given f : X → Y, g : Y → Z, h : Z → W, we have h ∘ (g ∘ f) = (h ∘ g) ∘ f.
The objects do not have to be sets and the arrows do not have to be
functions. For example, the objects could be courses, and an arrow from
course X to course Y could mean if you have taken course X, you will
probably do better in course Y as a result. Check that the identity and
composition rules are satisfied.
However, it has a pre-inverse (my terminology; the official word is "section"). Starting at an element of Y, choose any element of X from which
there is an arrow to that element. Call that function g. Then f ∘ g = I_Y
but g ∘ f ≠ I_X. Furthermore, g is not unique.
Prove the cancellation law: if f has a section and h ∘ f = k ∘ f, then
h = k (another proof that is valid in any category!).
It has a postinverse (the official word is "retraction"). Just reverse all the
arrows to undo its effect, and define g however you like on the element of
Y that is not in the image of f. Then g ∘ f = I_X but f ∘ g ≠ I_Y.
7. Fields
Loosely speaking, a field F is a set of elements for which the familiar operations of arithmetic are defined and behave in the usual way. Here is a set
of axioms for a field. You can use them to prove theorems that are true for
any field.
(a) Addition is commutative: a + b = b + a.
(b) Addition is associative: (a + b) + c = a + (b + c).
(c) Additive identity: there exists 0 such that for all a ∈ F, 0 + a = a + 0 = a.
(d) Additive inverse: for all a ∈ F, there exists -a such that (-a) + a = a + (-a) = 0.
(e) Multiplication is associative: (ab)c = a(bc).
(f) Multiplication is commutative: ab = ba.
(g) Multiplicative identity: there exists 1 such that for all a ∈ F, 1a = a.
(h) Multiplicative inverse: for all a ∈ F - {0}, there exists a^-1 such that a^-1 a = 1.
(i) Distributive law: a(b + c) = ab + ac.
This set of axioms for a field includes properties (such as the commutativity
of addition) that can be proved as theorems by using the other axioms. It
therefore does not qualify as an independent set, but there is no general
requirement that axioms be independent.
Some well-known laws of arithmetic are omitted from the list of axioms
because they are easily proved as theorems. The most obvious omission is
for all a ∈ F, 0a = 0.
Here is the proof. What axiom justifies each step?
0 + 0 = 0 so (0 + 0)a = 0a.
0a + 0a = 0a.
(0a + 0a) + (-0a) = 0a + (-0a).
0a + (0a + (-0a)) = 0a + (-0a).
0a + 0 = 0.
0a = 0.
8. Finite fields
Computing with real numbers by hand can be a pain, and most of linear
algebra works for an arbitrary field, not just for the real and complex numbers. Alas, the integers do not form a field because in general there is no
multiplicative inverse. Here is a simple way to make from the integers a
finite field in which messy fractions cannot arise.
Choose a prime number p.
Break up the set of integers into p subsets. Each subset is named after
the remainder when any of its elements is divided by p.
[0]_p = {m | m = np, n ∈ Z}
[1]_p = {m | m = np + 1, n ∈ Z}
[a]_p = {m | m = np + a, n ∈ Z}
Notice that [a + kp]p = [a]p for any k. There are only p sets, but each
has many alternate names.
These p infinite sets are the elements of the field Zp .
Define addition by [a]p + [b]p = [a + b]p . Here a and b can be any
names for the subsets, because the answer is independent of the choice
of name. The rule is Add a and b, then divide by p and keep the
remainder.
What is the simplest name for [5]7 + [4]7 ?
9. Rational numbers
The rational numbers Q form a field. You learned how to add and multiply
them years ago! The multiplicative inverse of a/b is b/a as long as a ≠ 0.
The rational numbers are not a big enough field for doing Euclidean
geometry or calculus. Here are some irrational quantities:
√2, π.
most values of trig functions, exponentials, or logarithms.
coordinates of most intersections of two circles.
10. Real numbers
The real numbers R constitute a field that is large enough so that any
characterization of a number in terms of an infinite sequence of real numbers
still leads to a real number.
A positive real number is an expression like 3.141592... where there is no
limit to the number of decimal places that can be provided if requested.
To get a negative number, put a minus sign in front. This is Hubbard's
definition.
An equivalent viewpoint is that a positive real number is the sum of an
integer and an infinite series of the form ∑_{i=1}^{∞} a_i (1/10)^i.
The rational numbers and the real numbers are both ordered fields. This
means that there is a subset of positive elements that is closed under both
addition and multiplication. No finite field is ordered.
In Z5, you can name the elements [0], [1], [2], [-2], [-1], and try to call the
elements [1] and [2] positive. Why does this attempt to make an ordered
field fail?
11. Proof 1.1 - two theorems that are valid in any field
(a) Using nothing but the field axioms, prove that if ab = 0, then either a
or b must be 0.
(b) Using nothing but the field axioms, prove that the additive inverse
of an element a is unique. (Standard strategy for uniqueness proofs:
assume that there are two different inverses b and c, and prove that
b = c.)
12. If an element of F^n is a point, we represent it by a bold letter like p and write it as a column of elements enclosed in parentheses, e.g. p = (1.1, 3.8, 2.3).
If an element of F^n is a vector, we represent it by a bold letter with
an arrow like ~v and write it as a column of elements enclosed in square
brackets, e.g. ~v = [0.2, 1.3, 2.2].
13. Relation between points and vectors, inspired by geometry:
Add vector ~v component by component to point A to get point B.
Subtract point A component by component from point B to get vector
~v.
Vector addition: if adding ~v to point A gives point B and adding ~w
to point B gives point C, then adding ~v + ~w to point A gives point
C.
A vector in F n can be multiplied by any element of F to get another
vector.
Draw a diagram to illustrate these operations without use of coordinates,
as is typically done in a physics course.
14. Let p = (1.4, 3.8) and q = (2.4, 4.8).
What is p + ~v?
What is ~v - 1.5~w?
What, if anything, is p + q?
15. Subsets of F n
A subset of F n can be finite, countably infinite, or uncountably infinite.
The concept is especially useful when the elements of F n are points, but it
is valid also for vectors.
Examples:
(a) In Z_3^2, consider the set {(0, 1), (1, 2), (2, 0)}.
This will turn out (outline 7) to be a line in the small affine plane Z_3^2. Write it in the form {p + t~v | t ∈ Z_3}.
(b) In R2 , consider the set of points whose coordinates are both positive
integers. Is it finite, countably infinite, or uncountably infinite?
(d) In R^2, draw a diagram that might represent the set of points (x, y),
where x is family income and y is family net worth, for which a family
qualifies for free tuition.
16. Subspaces of F n
A subspace is defined only when the elements of F n are vectors. It must
be closed under vector addition and scalar multiplication. The second requirement means that the zero vector must be in the subspace. The empty
set is not a subspace!
Geometrically, a subspace corresponds to a flat subset (line, plane, etc.)
that includes the origin.
For R^3 there are four types of subspace. What is the geometric interpretation of each?
0-dimensional:
1-dimensional:
2-dimensional:
3-dimensional:
17. Any vector in F^n can be written in terms of the standard basis vectors as ~v = ∑_{i=1}^{n} x_i ~e_i.
This will turn out to be true also in an abstract n-dimensional vector space,
but in that case there will be no standard basis.
18. Another meaning for field
Physicists long ago started using the term field to mean a function
that assigns a vector to every point. Examples are the gravitational field,
electric field, and magnetic field.
Another example: in a smoothly flowing stream or in a blood vessel, there
is a function that assigns to each point the velocity vector of the fluid at
that point: a velocity field.
If (x1, x2) is the point whose coordinates are the interest rate x1 and the
unemployment rate x2, then the Fed chairman probably has in mind the
function that assigns to this point a vector: the expected change in these
quantities over the next month.
A function ~F(x1, x2) that assigns to this point a vector of rates of change,
(dx1/dt, dx2/dt) = ~F(x1, x2),
specifies a linear differential equation involving two variables. In November
you will learn to solve such equations by matrix methods.
Here is a formula for a vector field from Hubbard, exercise 1.1.6 (b). Plot it.
~F(x, y) = (x, 0).
Here are formulas for vector fields from Hubbard, exercise 1.1.6, (c) and
(e). Plot them. If you did Physics C Advanced Placement E&M, they may
look familiar.
~F(x, y) = (x, y),    ~F(x, y) = (-y, x).
19. Matrices
An m n matrix over a field F is a rectangular array of elements of F with
m rows and n columns. Watch the convention: the height is specified first!
As a mathematical object, any matrix can be multiplied by any element of
F . This could be meaningless in the context of an application. Suppose
you run a small hospital that has two rooms with three patients in each.
Then the temperatures of the patients form a 2 × 3 matrix:
(98.6 102.4 99.7; 103.2 98.3 99.6)
Compute AB and BA for
A = (2 1 0; 1 1 2),   B = (0 1; 2 1; 2 0).
Find AB and BA for
A = (0 1; 2 1),   B = (1 1; 1 0).
Since A(x_i ~e_i + x_j ~e_j) is the sum of x_i times column i and x_j times column
j, we see that
f(x_i ~e_i + x_j ~e_j) = x_i f(~e_i) + x_j f(~e_j).
This is a requirement if f is to be a linear function.
Use matrix multiplication to calculate f((2, 1)).
The rule for forming the product AB can be stated in terms of the rule for
a matrix acting on a vector: to form AB, just let A act on each column of
B in turn, and put the results side by side to create the matrix AB.
What function does the matrix product AB represent? Consider (AB)~e_i.
This is the ith column of the matrix AB, and it is also the result of letting
B act on ~e_i, then letting A act on the result. So for any standard basis
vector, the matrix AB represents the composition A ∘ B of the functions
represented by B and by A.
What about the matrices (AB)C and A(BC)? These represent the composition of three functions: say (f ∘ g) ∘ h and f ∘ (g ∘ h). But we already know
that composition of functions is associative. So we have proved, without
any messy algebra, that multiplication of matrices is associative also.
If a_{i,j} represents the entry in the ith row, jth column of A, then
(AB)_{i,k} = ∑_{j=1}^{m} a_{i,j} b_{j,k}
((AB)C)_{i,q} = ∑_{k=1}^{p} (AB)_{i,k} c_{k,q} = ∑_{j=1}^{m} ∑_{k=1}^{p} a_{i,j} b_{j,k} c_{k,q}
(BC)_{j,q} =
(A(BC))_{i,q} =
On what basis can you now conclude that matrix multiplication is associative for matrices over any field F ?
Group problem 1.1.1c offers a more elegant version of the same proof by
exploiting the fact that matrix multiplication represents composition of
linear functions.
I_3 = (1 0 0; 0 1 0; 0 0 1)
24. Matrices as the arrows for a category C
Choose a field F, perhaps the real numbers R.
An object of C is a vector space F^n.
An arrow of C is an n × m matrix A, with domain F^m and codomain F^n.
Given arrows B : F^p → F^m and A : F^m → F^n, the composition of arrows A and B is the
matrix product AB. Show that the shape of the matrices is right
for multiplication.
A = (.75 .25; .5 .5; .25 .75),   ~v = A (4, 8) =
By elementary algebra you can reconstruct the price of silver and of gold
from the price of any two of the alloys, so it is no surprise to find two
different left inverses. Apply each of the following to ~v.
B1 = (2 -1 0; -2 3 0),   B1~v =
B2 = (0 3 -2; 0 -1 2),   B2~v =
However, in this case there is no right inverse.
If m < n, then A takes a vector in R^n and produces a shorter vector in
R^m. In general, there will be no left inverse matrix B that can recover the
original vector in R^n, but there may be many different right inverses. Let
A = (1 1) and find two different right inverses. In the lingo of categories,
each such right inverse of A is a section.
If A = (a b; c d), then
A^-1 = (1/(ad - bc)) (d -b; -c a).
If ad - bc = 0 then no inverse exists.
Write down the inverse of (3 1; 4 2), where the elements are in R.
The matrix inversion recipe works in any field: try inverting A = (3 1; 4 2), where the elements are in Z5.
A = (3 1 2; 1 2 3; 2 3 4)
B = (3 0 0; 1 2 0; 2 3 4)
C = (3 1 2; 0 2 3; 0 0 4)
D = (3 0 0; 0 2 0; 0 0 4)
E = (0 1 2; 1 0 3; 2 3 0)
Sugar costs $2 per pound, flour costs $1 per pound, chocolate costs $6
per pound.
Then a~v + b~w is the vector of ingredients required to produce a batches of
brownies and b batches of fudge, while T(~v) is the cost of parts for a single
batch of brownies. The statement
T(a~v + b~w) = aT(~v) + bT(~w)
is sound economics.
Two ways to find the cost of 3 batches of brownies plus 2 batches of fudge.
T(3~v + 2~w) =
3T(~v) + 2T(~w) =
Suppose that T produces a 2-component vector of costs from two competing
grocers. In that case [T] is a 2 × 3 matrix.
[T] = (0 2; 2 0)
By letting [T] multiply an arbitrary vector (a, b) you can determine the effect
of T on any point in the plane. Do this for the vector (2, 1).
32. Inversion
A function f is invertible if it is 1-to-1 (injective) and onto (surjective). If
g is the inverse of f, then both g ∘ f and f ∘ g are the identity function.
How do we reconcile this observation with the existence of matrices that
have one-sided inverses?
Here are two simple examples that identify the problem.
(a) Define f by the formula f (x) = 2x. Then
f : R → R is invertible.
f : Z3 → Z3 is invertible.
f : Z → Z is not invertible.
f : Z → 2Z is invertible. (2Z is the set of even integers.)
In the last case, we have made f invertible by redefining its codomain
to equal its image.
34. Invertibility of linear functions and of matrices (proof 1.3, Hubbard, proposition 1.3.14)
Since the key issue in this proof is the subtle distinction between a linear
function T and the matrix [T] that represents it, it is a good idea to use
* to denote matrix multiplication and ∘ to denote composition of linear
transformations.
It is also a good idea to use ~x for a vector in the domain of T and ~y for a
vector in the codomain of T.
Suppose that the linear transformation T : F^n → F^m is represented by the
m × n matrix [T].
(a) Suppose that the matrix [T ] is invertible. Prove that the linear transformation T is one-to-one and onto (injective and surjective), hence
invertible.
Let A be the 4 × 4 adjacency matrix of the four-island ferry graph: a matrix of 0s and 1s whose (i, j) entry is 1 exactly when there is a ferry route from island j to island i. If the components x_j of the vector ~v count the walks found so far that end at each island V_j, then the ith component of A~v is
(A~v)_i = ∑_j A_{i,j} x_j.
This is a linear function, and we see that the vector A~v represents the
number of distinct ways of reaching each island after extending the existing
list of walks by following one extra edge wherever possible.
If you start on island Vj and make a walk of n steps, then the number of
distinct walks leading to each island is specified by the components of the
vector A^n ~e_j.
Hubbard does the example of a cube, where all edges are two-way.
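Here is a small R sketch of this walk-counting idea. The adjacency matrix below is an illustrative 4-vertex graph, not necessarily the exact ferry-route matrix of the example:

  # A[i, j] = 1 when there is an edge from vertex j to vertex i.
  A <- matrix(c(0, 1, 1, 0,
                1, 0, 0, 1,
                1, 0, 0, 1,
                0, 1, 1, 0), nrow = 4, byrow = TRUE)
  e1 <- c(1, 0, 0, 0)            # start all walks at vertex 1
  A %*% e1                       # number of 1-step walks ending at each vertex
  A %*% A %*% A %*% e1           # number of 3-step walks ending at each vertex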
For the four-island graph, the adjacency matrix A and its powers A^2 and A^3 can be written out explicitly; the (i, j) entry of A^n is the number of distinct n-step walks from island j to island i.
Further examples of transition (Markov) matrices:
(b) Badminton: if player 1 serves, the probability of losing the point and
the serve is 0.2. If player 2 serves, the probability of losing the point
and the serve is 0.3.
(c) If John Hubbard's reference books are on the shelf in the order (2,1,3),
the probability that he consults book 3 and places it at the left to make
the order (3,2,1) is P3.
(d) Roulette: after starting with 2 chips and betting a chip on red, the
probability of having 3 chips is 9/19 and the probability of having 1 chip
is 10/19. (In a fair casino, each probability would be 1/2.)
For the badminton example, the transition matrix is A = (0.8 0.3; 0.2 0.7).
(0.8 0.3; 0.2 0.7)(0.8 0.3; 0.2 0.7) =
What matrix represents the transition resulting from four successive points?
(0.7 0.45; 0.3 0.55)(0.7 0.45; 0.3 0.55) =
If you raise the transition matrix A to a high power, you might conjecture
that after a long time the probability that player 1 is serving is 0.6, no
matter who served first:
A^n → (0.6 0.6; 0.4 0.4)
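The same computations take one line each in R (the matrix is entered column by column):

  A <- matrix(c(0.8, 0.2, 0.3, 0.7), nrow = 2)   # the badminton matrix (0.8 0.3; 0.2 0.7)
  A %*% A                        # two successive points: (0.7 0.45; 0.3 0.55)
  A %*% A %*% A %*% A            # four successive points
  P <- diag(2); for (k in 1:50) P <- P %*% A; P  # each column approaches (0.6, 0.4)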
Group Problems
1. Some short proofs
Once your group has solved its problem, use a cell phone to take a picture
of your solution, and upload it to the topic box for your section on the
Week 1 page of the Web site.
(a) When we say that a matrix A is invertible, we mean that it has both
a right inverse and a left inverse. Prove that the right inverse and the
left inverse are equal, and that the inverse is unique.
If you need a hint, see page 48 of Hubbard.
Illustrate your answer by writing down the inverse B of the matrix
A = (3 2; 2 4), where all the entries are in the finite field Z5, and showing
that both AB and BA are equal to the identity matrix.
Since you are working in a finite field, there are no fractions. In Z5 ,
dividing by 3 is the same as multiplying by 2.
(b) Here are two well-known laws of arithmetic that are not on the list of
field axioms. They do not need to be listed as axioms because they are
provable theorems! In each case, the trick is to start with an identity
that is valid in any field, then apply the distributive law. You should
be able to justify each step of your proof by reference to one or more
of the field axioms.
Starting with 0 + 0 = 0, prove that 0a = 0 for any a ∈ F.
Starting with 1 + (-1) = 0, prove that (-1)a = -a for any a ∈ F.
(c) Prove that composition of functions, whether linear or not, is associative. Illustrate your proof by using the functions
f(x) = x^2, g(x) = e^x, h(x) = 3 log x (natural logarithms)
and computing both f ∘ (g ∘ h) and (f ∘ g) ∘ h.
Then use your result to give a one-line proof that matrix multiplication
must be associative. See Hubbard, page 63.
Homework
3. Hubbard, exercise 1.2.2, parts (a) and (e) only. Do part (a) in the field
R, and do part (e) in the field Z7 , where -1 is the same as 6. Check your
answer in (e) by doing the calculation in two different orders: according to
the associative law these should give the same answer. See Hubbard, figure
1.2.5, for a nice way to organize the calculation.
4. (a) Prove theorem 1.2.17 in Hubbard: that the transpose of a matrix
product is the product of the matrices in the opposite order: (AB)^T = B^T A^T.
(b) Let A = (1 2; 2 3), B = (2 1; 1 3). Calculate AB. Then, using the
theorem you just proved, write down the matrix BA without doing any
matrix multiplication. (Notice that A and B are symmetric matrices.)
(c) Prove that if A is any matrix, then A^T A is symmetric.
5. (a) Here is a matrix whose entries are in the finite field Z5 .
A = ([1]5 [2]5; [3]5 [3]5)
Write down the inverse of A, using the names [0]5 through [4]5 for the entries
in the matrix. Check your answer by matrix multiplication.
(b) Count the number of different 2 × 2 matrices with entries in the finite
field Z5 . Of these, how many are invertible? Hint: for invertibility, the
left column cannot be zero, and the right column cannot be a multiple
of the left column.
6. (a) Hubbard, Exercise 1.3.19, which reads:
If A and B are n × n matrices, their Jordan product is (AB + BA)/2. Show
that this product is commutative but not associative.
Since this problem has an odd number, it is solved in the solutions
manual for the textbook. If you want to consult this manual, OK, but
remember to cite your source!
(b) Denote the Jordan product of A and B by A ∗ B. Prove that it satisfies
the distributive law A ∗ (B + C) = A ∗ B + A ∗ C.
(c) Prove that the Jordan product satisfies the special associative law
A ∗ (B ∗ A^2) = (A ∗ B) ∗ A^2.
7. (a) Suppose that T is linear and that T((3, 2)) = (6, 8) and T((2, 1)) = (5, 5).
Use the linearity of T to determine T((1, 0)) and T((0, 1)), and thereby determine the matrix [T] that represents T. (This brute-force approach
works fine in the 2 × 2 case but not in the n × n case.)
(b) Express the given information about T from part (a) in the form
[T][A] = [B], and determine the matrix [T] that represents T by using
the matrix [A]^-1. (This approach will work in the general case once
you know how to invert an n × n matrix.)
The last two problems require R scripts. It is fine to copy and edit similar
scripts from the course Web site, but it is unacceptable to copy and edit
your classmates scripts!
8. (similar to script 1.1C, topic 5)
Let ~v1 and ~v2 denote the columns of a 2 × 2 matrix M. Write an R script
that draws a diagram to illustrate the rule for the sign of det M, namely:
If you have to rotate ~v1 counterclockwise (through less than 180 degrees) to
make it line up with ~v2, then det M > 0.
If you have to rotate ~v1 clockwise (through less than 180 degrees) to make it
line up with ~v2, then det M < 0.
If ~v1 and ~v2 lie on the same line through the origin, then det M = 0.
9. (similar to script 1.1D, topic 2)
Busch Gardens proposes to open a theme park in Beijing, with four regions
connected by monorail. From region 1 (the Middle Kingdom), a guest can
ride on a two-way monorail to region 2(Tibet), region 3(Shanghai) or region
4(Hunan) or back. Regions 2, 3, and 4 are connected by a one-way monorail
that goes from 2 to 3 to 4 and back to 2.
(a) Draw a diagram to show the four regions and their monorail connections.
(b) Construct the 4 × 4 transition matrix A for this graph of four vertices.
(c) Using matrix multiplication in R, determine how many different sequences of four monorail rides start in Tibet and end in the Middle
Kingdom.
R Scripts
Scripts labeled A, B, ... are closely tied to the Executive Summary.
Scripts labeled X, Y, ... are interesting examples. There is a narrated version on
the Web site. Scripts labeled L are library scripts that you may wish to include
in your own scripts.
Script 1.2A-LengthDotAngle.R
Topic 1 - Length, Dot Product, Angles
Topic 2 - Components of a vector
Topic 3 - Angles in Pythagorean triangles
Topic 4 - Vector calculation using components
Script 1.2B-RotateReflect.R
Topic 1 - Rotation matrices
Topic 2 - Reflection matrices
Script 1.2C-ComplexConformal.R
Topic 1 - Complex numbers in R
Topic 2 - Representing complex numbers by 2x2 matrices
Script 1.2D-CrossProduct.R
Topic 1 - Algebraic properties of the cross product
Topic 2 - Geometric properties of the cross product
Topic 3 - Using cross products to invert a 3x3 matrix
Script 1.2E-DeterminantProduct.R
Topic 1 - Product of 2x2 matrices
Topic 2 - Product of 3x3 matrices
Script 1.2L-VectorLibrary.R
Topic 1 - Some useful angles and basis vectors
Topic 2 - Functions for working with angles in degrees
Script 1.2X-Triangle.R
Topic 1 - Generating and displaying a randomly generated triangle
Topic 2 - Checking some formulas of trigonometry
Script 1.2Y-Angles3D.R
Topic 1 - Angles between vectors in R^3
Topic 2 - Angles and distances in a cube
Topic 3 - Calculating the airline mileage between cities
Executive Summary
1.1 The dot product
~x · ~y = ∑_{i=1}^{n} x_i y_i
1.2 The law of cosines
Consider the triangle whose sides lie along the vectors ~x (length a), ~y (length b),
and ~x - ~y (length c). Let θ denote the angle between the vectors ~x and ~y.
By the distributive law,
(~x - ~y) · (~x - ~y) = ~x · ~x + ~y · ~y - 2~x · ~y, so c^2 = a^2 + b^2 - 2~x · ~y.
Comparing with the law of cosines, we find that angles and dot products are
related by:
~x · ~y = ab cos θ = |~x||~y| cos θ
1.3 Cauchy-Schwarz inequality
The dot product provides a way to extend the definition of length and angle for
vectors to Rn , but now we can no longer invoke Euclidean plane geometry to
guarantee that | cos | 1.
We need to show that for any vectors ~v and ~w in R^n,
|~v · ~w| ≤ |~v||~w|.
This is generally known as the Cauchy-Schwarz inequality.
For a short proof of the Cauchy-Schwarz inequality, make ~v and ~w into unit
vectors and form their sum and difference:
(~v/|~v| ± ~w/|~w|) · (~v/|~v| ± ~w/|~w|) ≥ 0.
Expanding gives 1 + 1 ± 2(~v · ~w)/(|~v||~w|) ≥ 0, and by algebra |(~v · ~w)/(|~v||~w|)| ≤ 1.
We now have a useful definition of angle for vectors in R^n in general:
θ = arccos((~v · ~w)/(|~v||~w|)).
1.4 The triangle inequality
If ~x and ~y, placed head-to-tail, determine two sides of a triangle, the third side
coincides with the vector ~x + ~y.
[Diagram: vectors ~x and ~y placed head-to-tail, with the third side ~x + ~y.]
We need to show that its length cannot exceed the sum of the lengths of the
other two sides:
|~x + ~y| ≤ |~x| + |~y|
The proof uses the distributive law for the dot product.
|~x + ~y|^2 = (~x + ~y) · (~x + ~y) = (~x + ~y) · ~x + (~x + ~y) · ~y
Applying Cauchy-Schwarz to each term on the right-hand side, we have:
|~x + ~y|^2 ≤ |~x + ~y||~x| + |~x + ~y||~y|
In the special case where |~x + ~y| = 0 the inequality is clearly true. Otherwise
we can divide by the common factor of |~x + ~y| to complete the proof.
1.5 Isometries of R^2
1.6 Complex numbers
The same field axioms we reviewed on the first day apply here to the complex
numbers, notated C.
The real and imaginary parts of a complex number can be used as the two
components of a vector in R2 . The rule for addition of complex numbers is the
same as the rule for addition of vectors in R^2 (real and imaginary parts are added
separately), and the modulus of a complex number is the same as the length
of the vector that represents it. So the triangle inequality applies for complex
numbers: |z1 + z2 | |z1 | + |z2 |.
This property extends to vector spaces over complex numbers.
1.7
1.8
In general, matrices do not form a field because multiplication is not commutative. There are two notable exceptions: n × n matrices that are multiples of the
identity matrix and 2 × 2 conformal matrices. Since multiples of the identity matrix
and rotations all commute, the product of two conformal matrices (a -b; b a)
and (c -d; d c) is the same in either order.
1.9 The cross product
~a × ~b = (a1, a2, a3) × (b1, b2, b3) = (a2 b3 - a3 b2, a3 b1 - a1 b3, a1 b2 - a2 b1)
Properties
1. ~a × ~b = -~b × ~a.
2. ~a × ~a = ~0.
3. For fixed ~a, ~a × ~b is a linear function of ~b, and vice versa.
4. For the standard basis vectors, ~e_i × ~e_j = ~e_k if i, j and k are in cyclic
increasing order (123, 231, or 312). Otherwise ~e_i × ~e_j = -~e_k.
5. ~a · (~b × ~c) = (~a × ~b) · ~c. This quantity is also the determinant of the matrix
whose columns are ~a, ~b, and ~c.
6. (~a × ~b) × ~c = (~a · ~c)~b - (~b · ~c)~a
7. ~a × ~b is orthogonal to the plane spanned by ~a and ~b.
8. |~a × ~b|^2 = |~a|^2 |~b|^2 - (~a · ~b)^2
9. The length of ~a × ~b is |~a||~b| sin θ.
10. The length of ~a × ~b is equal to the area of the parallelogram spanned by ~a
and ~b.
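R has no built-in cross product for 3-vectors, but it is a one-line function, and a few of the properties above can then be spot-checked numerically; the test vectors here are arbitrary:

  cross <- function(a, b) c(a[2]*b[3] - a[3]*b[2],
                            a[3]*b[1] - a[1]*b[3],
                            a[1]*b[2] - a[2]*b[1])
  a <- c(2, 1, 0); b <- c(0, 1, 3)                      # arbitrary test vectors
  cross(a, b)
  sum(cross(a, b) * a); sum(cross(a, b) * b)            # property 7: both dot products are 0
  sum(cross(a, b)^2); sum(a^2) * sum(b^2) - sum(a * b)^2   # property 8: both sides are 49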
1.10 Determinants in R^3
If a 3 × 3 matrix A has columns ~a1, ~a2, and ~a3, then its determinant is
det(A) = ~a1 · (~a2 × ~a3).
1. det(A) changes sign if you interchange any two columns. (easiest to prove
for columns 1 and 2, but true for any pair)
2. det(A) is a linear function of each column (easiest to prove for column 3,
but true for any column)
3. For the identity matrix I, det(I) = 1.
The magnitude of ~a · (~b × ~c) is equal to the volume of the parallelepiped spanned
by ~a, ~b and ~c.
If C = AB, then det(C) = det(A) det(B)
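A quick numerical illustration of the product rule in R, with two randomly generated 3 × 3 matrices:

  set.seed(1)
  A <- matrix(sample(0:4, 9, replace = TRUE), nrow = 3)
  B <- matrix(sample(0:4, 9, replace = TRUE), nrow = 3)
  det(A %*% B); det(A) * det(B)    # the two values agree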
Lecture Outline
1. Introducing coordinates:
For three-dimensional geometry, we choose a specific point, the origin,
to correspond to the element O = (0, 0, 0) of R^3. We also choose three
orthogonal, oriented coordinate axes and a unit of length, which determine
the standard basis vectors. These are a right-handed basis: if you hold
your right hand so that the thumb points along ~e3, then the fingers of your
right hand carry ~e1 into ~e2 the short way around, through 90 rather than
270 degrees. Now any point of Euclidean geometry can be represented
by a vector ~a in R^3 with components a1, a2, a3, whose length is
|~a| = √(a1^2 + a2^2 + a3^2). All the basis vectors have unit length.
4. Cauchy-Schwarz inequality
The dot product provides a way to extend the definition of length and
angle for vectors to Rn , but now we can no longer invoke Euclidean plane
geometry to guarantee that | cos | 1.
We need to show that for any vectors ~v and ~w in R^n,
|~v · ~w| ≤ |~v||~w|.
This is generally known as the Cauchy-Schwarz inequality. Hubbard
points out that it was first published by Bunyakovsky. This fact illustrates
Stigler's Law of Eponymy:
No law, theorem, or discovery is named after its originator.
The law applies to itself, since long before Stigler formulated it, A. N.
Whitehead noted that,
Everything of importance has been said before, by someone who did not
discover it.
The best-known proof of the Cauchy-Schwarz inequality incorporates two
useful strategies.
No vector has negative length.
Discriminant of quadratic equation.
Define a quadratic function of the real variable t by
f(t) = |t~v - ~w|^2 = (t~v - ~w) · (t~v - ~w)
Since f (t) is the square of a length of a vector, it cannot be negative, so
the quadratic equation f (t) = 0 does not have two real roots.
But by the quadratic formula, if the equation at2 + bt + c = 0 does not have
two real roots, its discriminant b^2 - 4ac is not positive.
Complete the proof by writing out b^2 - 4ac ≤ 0 for the quadratic function f(t).
Writing out the discriminant condition gives ((~v · ~w)/(|~v||~w|))^2 ≤ 1.
Example: In R^4, what is the angle between two given vectors?
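For a concrete illustration in R (these two vectors are chosen arbitrarily, not necessarily the ones intended in the example above):

  v <- c(1, 1, 0, 2)                           # illustrative vectors in R^4
  w <- c(2, 1, 1, 0)
  cos.theta <- sum(v * w) / (sqrt(sum(v^2)) * sqrt(sum(w^2)))
  acos(cos.theta)                  # the angle in radians
  acos(cos.theta) * 180 / pi       # the same angle in degrees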
(b) Prove that the dot product of vectors ~x and ~y can be expressed solely
in terms of lengths of vectors. It follows that an isometry, which by
definition preserves lengths of all vectors, also preserves dot products
and angles.
(c) A parallelogram has sides with lengths a and b. Its diagonals have
lengths c and d. Prove the parallelogram law, which states that
c2 + d2 = 2(a2 + b2 ).
9. Isometries of R2 .
A linear transformation T : R2 R2 is completely specified by its effect
on the basis vectors ~e1 and ~e2 . These vectors are the two columns of the
matrix that represents T .
Of special interest are isometries: transformations that preserve the distance between any pair of points, and hence the length of any vector.
Since
4~a · ~b = |~a + ~b|^2 - |~a - ~b|^2,
dot products can be expressed in terms of lengths, and any isometry also
preserves dot products.
Prove this useful identity.
F(θ) = (cos 2θ sin 2θ; sin 2θ -cos 2θ),
which has det F = -1.
This represents reflection in a line through the origin that makes an
angle θ with the first basis vector.
Since the composition of isometries is an isometry, the product of any number of matrices of this type is another rotation or reflection.
(a1, a2, a3) × (b1, b2, b3) = (a2 b3 - a3 b2, a3 b1 - a1 b3, a1 b2 - a2 b1)
Since this is a computational definition, the way to prove the following
properties is by brute-force computation.
(a) ~a × ~b = -~b × ~a.
(b) ~a × ~a = ~0.
(c) For fixed ~a, ~a × ~b is a linear function of ~b, and vice versa.
(d) For the standard basis vectors, ~e_i × ~e_j = ~e_k if i, j and k are in cyclic
increasing order (123, 231, or 312). Otherwise ~e_i × ~e_j = -~e_k.
You may find it easiest to calculate cross products in general as
(a1 ~e1 + a2 ~e2 + a3 ~e3) × (b1 ~e1 + b2 ~e2 + b3 ~e3),
using the formula for the cross products of basis vectors. Try this
approach for
~a = (2, 1, 0),  ~b = (0, 1, 3).
(a1, a2, 0) × (b1, b2, 0) = (0, 0, det(a1 b1; a2 b2))
You can think of the determinant as a function of the entire matrix A or
as a function of its two columns.
Matrix A maps the unit square, spanned by the two standard basis vectors,
into a parallelogram whose area is | det(A)|.
Let's prove this for the case where all the entries of A are positive and
det(A) > 0. The area of the parallelogram formed by the columns of A is
twice the area of the triangle that has these columns as two of its sides.
The area of this triangle can be calculated in terms of elementary formulas
for areas of rectangles and right triangles.
22
16. Determinants in R3
Here is our definition:
If a 3 3 matrix A has columns ~a1 , ~a2 , and ~a3 , then its determinant
det(A) = ~a1 ~a2 ~a3 .
Apply this to
A =
[ 1 0 1 ]
[ 2 1 2 ]
[ 0 1 0 ].
(b) det(A) is a linear function of each column (easiest to prove for column
3, but true for any column)
23
Matrix A maps the unit cube, spanned by the three basis vectors, into a
parallelepiped whose volume is | det(A)|. You can think of | det(A)| as a
volume-stretching factor. This interpretation will underlie much of the
theory for change of variables in multiple integrals, a major topic in the
spring term.
If three vectors in R3 all lie in the same plane, the cross product of any
two of them, which is orthogonal to that plane, is orthogonal to the third
vector, so ~v1 · (~v2 × ~v3) = 0.
If four points in R3 all lie in the same plane, the vectors that join any one
of the points to each of the other three points all lie in that plane. Apply
this test to
p = (1, 1, 1),  q = (2, 1, 2),  r = (2, 3, 1),  s = (4, 3, 3).
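Here is one way to carry out this test in R. Treat the specific coordinates as an assumption (they follow the reconstruction above): form the vectors joining p to the other three points and check that their determinant is zero.

p <- c(1, 1, 1); q <- c(2, 1, 2); r <- c(2, 3, 1); s <- c(4, 3, 3)
M <- cbind(q - p, r - p, s - p)   # columns are the three joining vectors
det(M)                            # 0 (up to rounding) means the four points are coplanar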
24
Write each column of C = AB as a linear combination of the columns of A,
then use the distributive law for dot and cross products:
det C = Σ_{i=1}^{3} Σ_{j=1}^{3} Σ_{k=1}^{3} b_{i,1} b_{j,2} b_{k,3} ( ~a_i · (~a_j × ~a_k) ).
There are 27 terms in this sum, but all but six of them involve two subscripts
that are equal, and these are zero because a triple product with two equal
vectors is zero.
The six that are not zero all involve ~a1 · (~a2 × ~a3), three with a plus sign and
three with a minus sign. So
det C = f(B) (~a1 · (~a2 × ~a3)) = f(B) det(A), where f(B) is some messy
function of products of all the entries of B.
This formula is valid for any A. In particular, it is valid when A is the
identity matrix, C = B, and det(A) = 1.
So det B = f (B) det(I) = f (B)
and the messy function is the determinant!
20. Isometries of R2 .
A linear transformation T : R2 R2 is completely specified by its effect
on the basis vectors ~e1 and ~e2 . These vectors are the two columns of the
matrix that represents T .
Of special interest are isometries: transformations that preserve the distance between any pair of points, and hence the length of any vector.
Since
4~a · ~b = |~a + ~b|² − |~a − ~b|²,
dot products can be expressed in terms of lengths, and any isometry also
preserves dot products.
Prove this useful identity.
F(θ) =
[ cos 2θ   sin 2θ ]
[ sin 2θ  −cos 2θ ],
which has det F = −1.
This represents reflection in a line through the origin that makes an
angle θ with the first basis vector.
Since the composition of isometries is an isometry, the product of any number of matrices of this type is another rotation or reflection.
R(θ) =
[ cos θ  −sin θ ]
[ sin θ   cos θ ].
F(θ) =
[ cos 2θ   sin 2θ ]
[ sin 2θ  −cos 2θ ].
Group Problems
1. Dot products, angles, and isometries
(a) Making the reasonable assumption that a rotation through angle 2θ can
be accomplished by making two successive rotations through angle θ,
use matrix multiplication to derive the double-angle formulas for the
sine and cosine functions.
(b) Consider a parallelogram spanned by vectors ~v and ~w. Using the dot
product, prove that it is a rhombus if and only if the diagonals are
perpendicular and that it is a rectangle if and only if the diagonals are
equal in length.
(c) A parallelogram is spanned by two vectors that meet at a 60 degree
angle, one of which is twice as long as the other. Find the ratio of the
lengths of the diagonals and the cosine of the acute angle between the
diagonals. Confirm that the parallelogram law holds in this case.
2. Proofs that involve cross products
(a) Consider a parallelepiped whose base is a parallelogram spanned by
two unit vectors, anchored at the origin, with a 60 degree angle between them. The third side leaving the origin, also a unit vector,
makes a 60 degree angle with each of the other two sides, so that each
face is made up of a pair of equilateral triangles. Using dot and cross
products, show that the angle θ between the third side and a line that
bisects the angle between the other two sides satisfies cos θ = 1/√3,
and that the volume of this parallelepiped is 1/√2.
(b) Using the definition det(A) = ~a1 · (~a2 × ~a3) and properties of the dot and
cross products, prove that the determinant of a 3 × 3 matrix changes
sign if you swap the first column with the third column.
(c) Prove that the cross product, although not associative, satisfies the
Jacobi identity
(~a × ~b) × ~c + (~b × ~c) × ~a + (~c × ~a) × ~b = ~0.
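A numerical spot-check of the Jacobi identity in R (this does not replace the requested proof; crossprod3 is the same user-defined helper sketched earlier, and the random vectors are arbitrary):

crossprod3 <- function(a, b) {
  c(a[2] * b[3] - a[3] * b[2],
    a[3] * b[1] - a[1] * b[3],
    a[1] * b[2] - a[2] * b[1])
}
set.seed(1)
a <- runif(3); b <- runif(3); cc <- runif(3)
crossprod3(crossprod3(a, b), cc) +
  crossprod3(crossprod3(b, cc), a) +
  crossprod3(crossprod3(cc, a), b)     # all three components are 0 up to rounding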
Homework
In working on these problems, you may collaborate with classmates and consult
books and general online references. If, however, you encounter a posted solution
to one of the problems, do not look at it, and email Paul, who will try to get it
removed.
1. One way to construct a regular pentagon
The last two problems require R scripts. Feel free to copy and edit existing
scripts, including student solutions to group problem 3b, and to use the
library script 2l, which has functions for dealing with angles in degrees.
7. Vectors in two dimensions
(a) You are playing golf and have made a good tee shot. Now the hole is
located only 30 yards from your ball, in a direction 32 degrees north
of east. You hit a chip shot that travels 25 yards 22 degrees north of
east, followed by a putt that travels 8 yards 60 degrees north of east.
How far from the hole is your golf ball now located? For full credit,
include a diagram showing the relevant vectors.
(b) The three-reflections theorem, whose proof was problem 5b, states that
if you reflect successively in lines that make angles α, β, and γ with
the x-axis, the effect is simply to reflect in a line that makes angle
α − β + γ with the x-axis. Confirm this, using R, for the case where
α = 40°, β = 30°, and γ = 80°. Make a plot in R to show where the
point P = (1, 0) ends up after each of the three successive reflections.
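Here is a minimal R sketch of the kind of check the first part asks for (it is not a posted solution, and it leaves the plotting to you); the reflection matrix comes from the F(θ) formula earlier in this outline, and degrees are converted by hand rather than with library script 2l.

reflect <- function(theta) {          # reflection in a line at angle theta (radians)
  matrix(c(cos(2 * theta), sin(2 * theta),
           sin(2 * theta), -cos(2 * theta)), 2, 2)
}
deg <- function(d) d * pi / 180
alpha <- deg(40); beta <- deg(30); gamma <- deg(80)
reflect(gamma) %*% reflect(beta) %*% reflect(alpha)   # three successive reflections
reflect(alpha - beta + gamma)                         # a single reflection: same matrix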
8. Vectors in three dimensions (see script 2Y, topic 3)
The least expensive way to fly from Boston (latitude 42.36 N, longitude
71.06 W) to Naples (latitude 40.84 N, longitude 14.26 E) is to buy a ticket
on Aer Lingus and change planes in Dublin (latitude 53.35 N, longitude
6.26 W). Since Dublin is more than 10 degrees further north than either
Boston or Naples, it is possible that the stop in Dublin might lengthen the
journey substantially.
(a) Construct unit vectors in R3 that represent the positions of the three
cities.
(b) By computing angles between these vectors, compare the length in
kilometers of a nonstop flight with the length of a trip that stops
in Dublin. Remember that, by the original definition of the meter,
the distance from the North Pole to the Equator along the meridian
through Paris is 10,000 kilometers. (You may treat the Earth as a
sphere of unit radius.)
(c) Any city that is on the great-circle route from Boston to Naples has a
vector that lies in the same plane as the vectors for Boston and Naples.
Invent a test for such a vector (you may use either cross products or
determinants), and apply it to Dublin.
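As a hedged sketch of one way to set up parts (a) and (b) in R: the latitude/longitude-to-unit-vector formula and the 10,000 km quarter-meridian conversion are the standard ones (west longitudes are entered as negative numbers), and Dublin's latitude is taken to be 53.35 N.

city <- function(latDeg, lonDeg) {      # unit vector on a sphere of radius 1
  lat <- latDeg * pi / 180; lon <- lonDeg * pi / 180
  c(cos(lat) * cos(lon), cos(lat) * sin(lon), sin(lat))
}
boston <- city(42.36, -71.06)
naples <- city(40.84, 14.26)
dublin <- city(53.35, -6.26)
angle <- function(u, v) acos(sum(u * v))         # central angle in radians
kmPerRadian <- 10000 / (pi / 2)                  # quarter meridian = 10,000 km
kmPerRadian * angle(boston, naples)                               # nonstop
kmPerRadian * (angle(boston, dublin) + angle(dublin, naples))     # via Dublin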
39
R Scripts
Script 1.3A-RowReduction.R
Topic 1 - Row reduction to solve two equations, two unknowns
Topic 2 - Row reduction to solve three equations, three unknowns
Topic 3 - Row reduction by elementary matrices
Topic 4 - Automating row reduction in R
Topic 5 - Row reduction to solve equations in a finite field
Script 1.3B-RowReductionApplications.R
Topic 1 - Testing for linear independence or dependence
Topic 2 - Inverting a matrix by row reduction
Topic 3 - Showing that a given set of vectors fails to span Rn
Topic 4 - Constructing a basis for the image and kernel
Script 1.3C-OrthonormalBasis.R
Topic 1 - Using Gram-Schmidt to construct an orthonormal basis
Topic 2 - Making a new orthonormal basis for R3
Topic 3 - Testing the cross-product rule for isometries
Script 1.3P-RowReductionProofs.R
Topic 1 - In Rn , n + 1 vectors cannot be independent
Topic 2 - In Rn , n − 1 vectors cannot span
Topic 3 - An invertible matrix must be square
Executive Summary
1.1
When you solve the equation A~v = ~b you combine the matrix A and the vector
~b into a single matrix. Here is a simple example.
x + 2y = 7, 2x + 5y = 16
Then A = [ 1 2 ; 2 5 ], ~v = (x, y), ~b = (7, 16), so that A~v = ~b exactly corresponds
to our system of equations. Our matrix of interest is therefore
[ 1 2 7 ]
[ 2 5 16 ].
First, subtract twice row 1 from row 2, then subtract twice row 2 from row 1
to get
[ 1 0 3 ]
[ 0 1 2 ].
Interpret the result as a pair of equations (remember what each column corresponded to when we first appended A and ~b together): x = 3, y = 2.
The final form we are striving for is row-reduced echelon form, in which the leftmost nonzero entry of every row is a pivotal 1, each pivotal 1 lies to the right of the pivotal 1s above it, every column containing a pivotal 1 is otherwise zero, and any all-zero rows are at the bottom.
1.2
1 0 0
Example: E1 = 0 3 0 multiplies the second row of matrix A by 3.
0 0 1
Type 2: Adding b times the jth row to the kth row is accomplished by an
elementary matrix formed by starting with the identity matrix and changing
the jth entry in the kth row from 0 to the scalar b.
Example: E2 = [ 1 3 0 ; 0 1 0 ; 0 0 1 ] adds three times the second row of matrix A to
the first row.
You want three times the second row of A, so the 3 must be in the
second column of E2. Since the 3 is in the first row of E2, it will affect the
first row of E2 A.
Type 3: Swapping row j with row k is accomplished by an elementary
matrix formed by starting with the identity matrix, changing the jth and
kth elements on the diagonal to 0, and changing the entries in row j, column
k and in row k, column j from 0 to 1.
Example: E3 = [ 0 0 1 ; 0 1 0 ; 1 0 0 ] swaps the first and third rows of matrix A.
Suppose that A|I row-reduces to A′|B. Then EA = A′ and EI = B, where
E = Ek ⋯ E2 E1 is a product of elementary matrices. Since each elementary
matrix is invertible, so is E. Clearly E = B, which means that we can construct
E during the row-reduction process by appending the identity matrix I to the
matrix A that we are row reducing.
If matrix A is invertible, then A′ = I and E = A⁻¹. However, the matrix
E is invertible even when the matrix A is not invertible. Remarkably, E is also
unique: it comes out the same even if you carry out the steps of the row-reduction
algorithm in a non-standard order.
1.3
1.4
To show that a set of vectors {~v1, ~v2, …, ~vk} does not span Fn, we must exhibit
a vector ~w that is not a linear combination of the vectors in the given set.
Create an n × k matrix A whose columns are the given vectors.
Row-reduce this matrix, forming the product E of the elementary matrices
that accomplish the row reduction.
If the original set of vectors spans Fn, the row-reduced matrix EA will
have n pivotal columns. Otherwise it will have fewer than n pivotal 1s, and
there will be a row of zeroes at the bottom. If that is the case, construct
the vector ~w = E⁻¹ ~en.
Now consider what happens when you row reduce the matrix A|~w. The
last column will contain a pivotal 1. Therefore the vector ~w is independent
of the columns to its left: it is not in the span of the set {~v1, ~v2, …, ~vk}.
If k < n, then matrix A has fewer than n columns, so the matrix EA has
fewer than n pivotal columns and must have a row of zeroes at the bottom. It
follows that the vector ~w = E⁻¹ ~en can be constructed and that a set of fewer
than n vectors cannot span Fn.
1.5
1.6
Hubbard (page 200) gives two arguments that the number of linearly independent
rows of a matrix equals its rank. Here is yet another.
Swap rows to put a nonzero row as the top row. Then swap a row that is
linearly independent of the top row into the second position. Swap a row that is
linearly independent of the top two rows into the third position. Continue until
the top r rows are a linearly independent set, while each of the bottom m − r
rows is a linear combination of the top r rows.
Continuing with elementary row operations, subtract appropriate multiples
of the top r rows from each of the bottom rows in succession, reducing it to
zero. (Easy in principle but hard in practice!). The top rows, still untouched,
are linearly independent, so there is no way for row reduction to convert any of
them to a zero row. In echelon form, the matrix will have r pivotal 1s: rank r.
It follows that r is both the number of linearly independent columns and the
number of linearly independent rows: the rank of A is equal to the rank of its
transpose AT .
1.7
Orthonormal basis
A basis is called orthogonal if any two distinct vectors in the basis have a dot
product of zero. If, in addition, each basis vector is a unit vector, then the basis
is called orthonormal.
Given any basis {~v1, ~v2, …, ~vk} of a subspace W and any vector ~x ∈ W, we
can express ~x as a linear combination of the basis vectors:
~x = c1~v1 + c2~v2 + … + ck~vk,
but determining the coefficients requires row reducing a matrix.
If the basis {~v1, ~v2, …, ~vk} is orthonormal, just take the dot product with ~vi
to determine that ~x · ~vi = ci.
We can convert any spanning set of vectors into an orthonormal basis. Here is the algorithm,
sometimes called the Gram-Schmidt process.
Choose any vector ~w1: divide it by its length to make the first basis vector ~v1.
Choose any vector ~w2 that is linearly independent of ~v1 and subtract off a multiple
of ~v1 to make a vector ~x that is orthogonal to ~v1:
~x = ~w2 − (~w2 · ~v1) ~v1.
Divide this vector by its length to make the second basis vector ~v2.
Choose any vector ~w3 that is linearly independent of ~v1 and ~v2, and subtract off
multiples of ~v1 and ~v2 to make a vector ~x that is orthogonal to both ~v1 and ~v2:
~x = ~w3 − (~w3 · ~v1) ~v1 − (~w3 · ~v2) ~v2.
Divide this vector by its length to make the third basis vector ~v3.
Continue until you can no longer find any vector that is linearly independent of
your basis vectors.
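A minimal R sketch of this algorithm for three vectors; the function below is our own (not a library routine), it assumes the inputs are linearly independent, and the three sample vectors are arbitrary.

gramSchmidt3 <- function(w1, w2, w3) {
  v1 <- w1 / sqrt(sum(w1 * w1))
  x2 <- w2 - sum(w2 * v1) * v1
  v2 <- x2 / sqrt(sum(x2 * x2))
  x3 <- w3 - sum(w3 * v1) * v1 - sum(w3 * v2) * v2
  v3 <- x3 / sqrt(sum(x3 * x3))
  cbind(v1, v2, v3)
}
Q <- gramSchmidt3(c(1, 1, 1), c(2, 1, 0), c(1, 0, 2))
round(t(Q) %*% Q, 10)    # the identity matrix: the columns are orthonormal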
Lecture Outline
1. Row reduction
This is just an organized version of the techniques for solving simultaneous
equations that you learned in high school.
When you solve the equation A~x = ~b you combine the matrix A and the
vector ~b into a single matrix. Here is a simple example.
The equations are
x + 2y = 7
2x + 5y = 16.
Then A = [ 1 2 ; 2 5 ], ~b = (7, 16),
and we must row-reduce the 2 × 3 matrix
[ 1 2 7 ]
[ 2 5 16 ].
First, subtract twice row 1 from row 2 to get
You see the general strategy. First eliminate x from all but the first equation, then eliminate y from all but the second, and keep going until, with
luck, you have converted each row into an equation that involves only a
single variable with a coefficient of 1.
2. Echelon Form
The result of row reduction is a matrix in echelon form, whose properties
are carefully described on p. 165 of Hubbard (definition 2.1.5). Here is
Hubbard's messiest example:
0 1 3 0 0 3 0 4
0 0 0 1 2 1 0 1 .
0 0 0 0 0 0 1 2
Key properties:
The leftmost nonzero entry in every row is a pivotal 1.
Pivotal 1s move to the right as you move down the matrix.
A column with a pivotal 1 has 0 for all its other entries.
Any rows with all 0s are at the bottom.
If a matrix is not in echelon form, you can convert it to echelon form by
applying one or more of the following row operations.
(a) Multiply a row by a nonzero number.
(b) Add (or subtract) a multiple of one row from another row.
(c) Swap two rows.
Here are the "what's wrong?" examples from Hubbard. Find row operations that fix them.
1 0 0 2
0 0 1 1 .
0 1 0 1
1 1 0 1
0 0 2 0 .
0 0 0 1
0 0 0
1 0 0 .
0 1 0
0 1 0 3 0 3
0 0 1 1 1 1 .
0 0 0 0 1 2
Carry out this procedure to row-reduce the matrix
[ 0 3 3 6 ]
[ 2 4 2 4 ]
[ 3 8 4 7 ].
10
4. Solving equations
Once you have row-reduced the matrix, you can interpret it as representing
the equation A′~x = ~b′,
which has the same solutions as the equation with which you started, except
that now they can be solved by inspection.
A pivotal 1 in the last column ~b is the kiss of death, since it is an equation like 0x + 0y = 1. There is no solution. This happens in the second
Mathematica example, where row reduction leads to
[ 1 0 1 0 ]
[ 0 1 1 0 ]
[ 0 0 0 1 ].
Otherwise, choose freely the values of the active unknowns in the nonpivotal columns (excluding the last one). Then each row gives the value of
the passive unknown in the column that has the pivotal 1 for that row.
This happens in the third Mathematica example,
where row reduction converts
[ 2 1 3 1 ]        [ 1 0 1 2/3 ]
[ 1 1 0 1 ]   to   [ 0 1 1 1/3 ]
[ 1 1 2 1 ]        [ 0 0 0  0  ].
The only nonpivotal column (except the last one) is the third. So we can
choose the value of the active unknown z freely.
Then the first row gives x in terms of z: x = 2/3 − z.
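In R, an rref() function (available, for example, from the pracma package) carries out this kind of row reduction; here is a sketch on a made-up augmented matrix with one nonpivotal column, chosen only to show the pattern.

library(pracma)
M <- matrix(c(1, 2, 3, 6,
              2, 4, 7, 14,
              1, 2, 4, 8), nrow = 3, byrow = TRUE)   # augmented matrix [A | b]
rref(M)   # column 2 is nonpivotal: choose y freely, then read off x and z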
Now we must show that if A~x = ~b has a unique solution, the number
of rows m must equal the number of columns n. Consider solving
A~x = ~b by row reduction, converting A to matrix A in echelon form.
To show that m = n, show that m n and n m.
If A has more rows than columns, there is no existence. Row reduction
must leave at least one row of zeroes at the bottom, and there exists
~b for which A~x = ~b has no solution.
If A has more columns than rows, there is no uniqueness. Row reduction must leave at least one nonpivotal column, and the solution to
A~x = ~b is not unique.
So if A is invertible, and A~x = ~b therefore has a unique solution, A
must be a square matrix.
13
The right two columns of the row-reduced matrix are the desired inverse:
check it!
For matrices larger than 2 × 2, row reduction is a more efficient way of constructing a matrix inverse than any techniques involving determinants that
you may have learned! Hubbard, Example 2.3.4, is done in Mathematica.
The matrix
[ 2  1 3 | 1 0 0 ]
[ 1 −1 1 | 0 1 0 ]
[ 1  1 2 | 0 0 1 ]
row reduces to
[ 1 0 0 |  3 −1 −4 ]
[ 0 1 0 |  1 −1 −1 ]
[ 0 0 1 | −2  1  3 ].
What are A and A⁻¹?
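The same computation as an R sketch. The minus signs in this matrix are part of the reconstruction above, so treat them as an assumption; rref() is taken from the pracma package.

library(pracma)
A <- matrix(c(2,  1, 3,
              1, -1, 1,
              1,  1, 2), nrow = 3, byrow = TRUE)
rref(cbind(A, diag(3)))    # the right-hand 3x3 block is A^{-1}
solve(A)                   # built-in inverse, for comparison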
14
8. Elementary matrices:
Each basic operation in the row-reduction algorithm can be achieved by
multiplication on the left by an appropriate invertible elementary matrix.
Here are examples of the three types of elementary matrix. For each, figure
out what it does to a matrix A that it multiplies on the left.
Type 1: E1 =
[ 2 0 0 ]
[ 0 1 0 ]
[ 0 0 1 ]
Type 2: E2 =
[ 1 2 0 ]
[ 0 1 0 ]
[ 0 0 1 ]
Type 3: E3 =
[ 0 0 1 ]
[ 0 1 0 ]
[ 1 0 0 ]
Then A = [ 3 6 ; 2 5 ], ~b = (21, 16),
and we must row-reduce the 2 × 3 matrix
[ 3 6 21 ]
[ 2 5 16 ].
Use an elementary matrix to accomplish each of the three steps needed to
accomplish row reduction.
Matrix E1 divides the top row by 3.
Matrix E2 subtracts twice row 1 from row 2.
Matrix E3 subtracts twice row 2 from row 1.
Interpret the result as a pair of equations and solve them (by inspection)
for x and y.
16
The sum Σ_{i=1}^{k} c_i ~v_i is called a linear combination of ~v1, …, ~vk.
The set of all the linear combinations of ~v1, …, ~vk is called the span of the
set {~v1, …, ~vk}.
Prove that it is a subspace of Fn.
Invent an easy way to describe the span of three given vectors ~v1, ~v2, and ~v3. (Hint:
consider the sum of the components.)
17
The vectors to test for independence are
~v1 = (1, 2, 1),  ~v2 = (2, 1, 1),  ~v3 = (0, 3, 1).
The vector ~w is irrelevant and might as well be zero, so we just make a
matrix from the three given vectors:
[ 1 2 0 ]                [ 1 0  2 ]
[ 2 1 3 ]   reduces to   [ 0 1 −1 ]
[ 1 1 1 ]                [ 0 0  0 ]
The third column is nonpivotal; so the given vectors are linearly dependent.
How can you write the third one as a linear combination of the first two?
Change ~v3 to (0, 1, 1) and test again.
Now
[ 1 2 0 ]                [ 1 0 0 ]
[ 2 1 1 ]   reduces to   [ 0 1 0 ]
[ 1 1 1 ]                [ 0 0 1 ]
so all three columns are pivotal and the vectors are linearly independent.
19
~v1 = (4, 2, 3),  ~v2 = (2, 1, 2).
A = [ 4 2 ; 2 1 ; 3 2 ] reduces to EA = [ 1 0 ; 0 1 ; 0 0 ], and the matrix that does the job is
E =
[  1    0  −1 ]
[ −3/2  0   2 ]
[ −1/2  1   0 ].
We want to append a third column ~b such that when we row reduce the
square matrix A|~b, the resulting matrix EA|E~b will have a pivotal 1 in the
third column. In this case it will be in the bottom row. Since E, being a
product of elementary matrices, must be invertible, we compute
E⁻¹ (0, 0, 1) = (0, 1, 0).
We have found a vector, (0, 1, 0), that is not in the span of ~v1 and ~v2.
Key point: the proof relies on the fact that this procedure will always work,
because the matrix E that accomplishes row reduction is guaranteed to be
invertible!
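Here is the same construction as an R sketch; the entries of ~v1 and ~v2 and the sequence of row operations follow the reconstruction above, and the helper functions are our own.

# Row-reduce A = [v1 v2], accumulating the product E of the elementary operations
A <- cbind(c(4, 2, 3), c(2, 1, 2))
E <- diag(3)
rowScale <- function(M, i, s)    { M[i, ] <- s * M[i, ]; M }
rowAdd   <- function(M, i, j, s) { M[i, ] <- M[i, ] + s * M[j, ]; M }
rowSwap  <- function(M, i, j)    { M[c(i, j), ] <- M[c(j, i), ]; M }
step <- function(op, ...) { A <<- op(A, ...); E <<- op(E, ...) }
step(rowScale, 1, 1/4); step(rowAdd, 2, 1, -2); step(rowAdd, 3, 1, -3)
step(rowSwap, 2, 3);    step(rowScale, 2, 2);   step(rowAdd, 1, 2, -1/2)
A                          # the echelon form EA, with a row of zeroes at the bottom
E                          # matches the matrix E written out above
w <- solve(E, c(0, 0, 1))  # w = E^{-1} e3, which is (0, 1, 0): not in the span
w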
To prove that b ≠ 0, assume the contrary, and show that the vectors
~v1, ~v2, …, ~vk would be linearly dependent.
23
Now we combine this definition of basis with what we already know about
sets of vectors in Rn .
Our conclusions:
In Rn , a basis cannot have fewer than n elements, since they would not
span.
In Rn , a basis cannot have more than n elements, since they would not be
linearly independent.
So any basis must, like the standard basis, have exactly n elements.
Now we proceed to the proof. First we must prove the existence of a basis
by explaining how to construct one.
How to make a basis for a non-empty subspace E in general:
Choose any ~v1 to get started. Notice that we need not specify a method for
doing this! The justification for this step is the so-called axiom of choice.
If ~v1 does not span E, choose ~v2 that is not in the span of ~v1 (not a multiple
of it). Again, we do not say how to do this, but it must be possible since
~v1 does not span E.
If ~v1 and ~v2 do not span E, choose ~v3 that is not in the span of ~v1 and ~v2
(not a linear combination).
Keep going until you have spanned the space. By construction, the set is
linearly independent. So it is a basis.
Second, we must prove that every basis has the same number of vectors.
Imagine that two people have done this and come up with bases of possibly
different sizes.
One is ~v1, ~v2, …, ~vm. The other is ~w1, ~w2, …, ~wp.
Since each basis spans E, we can write each ~wj as a linear combination of
the ~vi. It takes m coefficients to do this for each of the p vectors, so we end
up with an m × p matrix A, each of whose columns represents one of the ~wj.
We can also write each ~vi as a linear combination of the ~wj. It takes p
coefficients to do this for each of the m vectors, so we end up with a p × m
matrix B, each of whose columns represents one of the ~vi.
Clearly AB = I and BA = I. So A is invertible, hence square, and m = p.
The matrix T =
[ 1 2 1 1 ]                     [ 1 2 0 2 ]
[ 0 0 1 1 ]   row reduces to    [ 0 0 1 1 ]
[ 2 4 1 3 ]                     [ 0 0 0 0 ].
By inspecting these two matrices, find a basis for Img T . Notice that the
dimension of Img T is 2, which is less than the number of rows, and that
the two leftmost columns do not form a basis.
28
The matrix T =
[ 1 2 1 1 ]                     [ 1 2 0 2 ]
[ 0 0 1 1 ]   row reduces to    [ 0 0 1 1 ]
[ 2 4 1 3 ]                     [ 0 0 0 0 ].
To find a basis for Ker T , look at the row-reduced matrix and identify
the nonpivotal columns. For each nonpivotal column i in turn, put a 1
in the position of that column, a 0 in the position of all other nonpivotal
columns, and leave blanks in the other positions. The resulting vectors must
be linearly independent, since for each of them, there is a position where
it has a 1 and where all the others have a zero. What are the resulting
(incomplete) basis vectors for Ker T ?
Now fill in the blanks: assign values in the positions of all the pivotal
columns so that T (v~i ) = 0. The vectors v~i span the kernel, since assigning a
value for each nonpivotal variable is precisely the technique for constructing
the general solution to T (~v) = 0.
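As an R sketch of both recipes on a small hypothetical matrix (not the matrix T above, whose entries are uncertain in this copy): the pivotal columns of the original matrix give a basis for the image, and each nonpivotal column yields one kernel basis vector.

library(pracma)
Tmat <- matrix(c(1, 2, 0, 2,
                 0, 0, 1, 1,
                 1, 2, 1, 3), nrow = 3, byrow = TRUE)
rref(Tmat)                 # pivotal columns are 1 and 3
Tmat[, c(1, 3)]            # basis for Img T: the pivotal columns of the ORIGINAL matrix
# Kernel: put a 1 in one nonpivotal position, 0 in the others, then fill in the pivots
k1 <- c(-2, 1, 0, 0)       # from nonpivotal column 2
k2 <- c(-2, 0, -1, 1)      # from nonpivotal column 4
Tmat %*% k1; Tmat %*% k2   # both are the zero vector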
If ~w1 = (1, 1, 1), what is ~v1?
Choose any vector ~w2 that is linearly independent of ~v1 and subtract off
a multiple of ~v1 to make a vector ~x that is orthogonal to ~v1. Divide this
vector by its length to make the second basis vector ~v2.
If ~w2 = (2, 1, 0), calculate ~x = ~w2 − (~w2 · ~v1) ~v1.
Mathematica gives the resulting orthonormal basis vectors ~v1, ~v2, and ~v3.
Group Problems
1. Row reduction and elementary matrices
(a) By row reducing an appropriate matrix to echelon form, solve the
system of equations
2x + y + z = 2
x + y + 2z = 2
x + 2y + 2z = 1
where all the coefficients and constants are elements of the finite field
Z3 . If there is no solution, say so. If there is a unique solution, specify
the values of x, y, and z. If there is more than one solution, determine
all solutions by giving formulas for two of the variables, perhaps in
terms of the third one.
(b) Find the inverse of A = [ 1 2 ; 3 7 ] by using row reduction by means of
elementary matrices, as was done in sample problem 2. Confirm that
the product of the three elementary matrices that you use is indeed
the inverse. Use the familiar rule method for finding a 2 2 inverse to
check your answer!
(c) The matrix
0 1 2
A = 1 2 3
2 3 4
is not invertible. Nonetheless, there is a product E of three elementary
matrices, applied as was done in sample problem 2, that will reduce it
to echelon form. Find these three matrices and their product E.
(a) The director of a budget office has to make changes to four line items
in the budget, but her boss insists that they must sum to zero. Three
of her subordinates make the following suggestions, all of which lie in
the subspace of acceptable changes:
~w1, ~w2, and ~w3.
A =
[ 3 1 1 0 4 ]
[ 1 0 1 1 2 ]
[ 0 1 2 0 1 ]
[ 2 0 0 1 3 ].
Express the columns that are not in the basis for the image as linear
combinations of the ones that are in the basis.
(c) Find two different solutions to the following set of equations in Z5 :
2x + y + 3z + w = 3
3x + 4y + 3w = 1
x + 4y + 2z + 4w = 2
(d) The R function
sample(0:2, n, replace=TRUE)
generates n random numbers, each equally likely to be 0, 1, or 2. Use
it to generate three equations of the form ax + by + cz + dw = e with
coefficients in Z3 , and solve them by row reduction. If the solution is
not unique, find two different solutions.
35
Homework
In working on these problems, you may collaborate with classmates and consult
books and general online references. If, however, you encounter a posted solution
to one of the problems, do not look at it, and email Paul, who will try to get it
removed.
For the first three problems, do the row reduction by hand. That should give
you enough practice so that you can do row reduction by hand on exams. Then
you can use R to do subsequent row reduction.
1. By row reducing an appropriate matrix to echelon form, solve the system
of equations
2x + 4y + z = 2
3x + y = 1
3y + 2z = 3
over the finite field Z5 . If there is no solution, say so. If there is a unique
solution, specify the values of x, y, and z and check your answers. If there
is more than one solution, express two of the variables in terms of an arbitrarily chosen value of the third one. For full credit you must reduce the
matrix to echelon form, even if the answer becomes obvious!
2. (a) By using elementary matrices, find a vector that is not in the span of
1
0
2
36
3. This problem illustrates how you can use row reduction to express a specified
vector as a linear combination of basis vectors.
Your bakery uses flour, sugar, and chocolate to make cookies, cakes, and
brownies. The ingredients for a batch of each product is described by a
vector, as follows:
Suppose ~v1 = (1, 2, 3),  ~v2 = (4, 2, 7),  ~v3 = (7, 8, 11).
This means, for example, that a batch of cookies takes 1 pound of flour, 2
of sugar, 3 of chocolate.
You are about to shut down for vacation and want to clear out your inventory of ingredients, described by the vector ~w = (21, 18, 38).
Use row reduction to find a combination of cookies, cakes, and brownies
that uses up the entire inventory.
4. Hubbard, exercises 2.3.8 and 2.3.11 (column operations: a few brief comments about the first problem will suffice for the second. These column
operations will be used in the spring term to evaluate n n determinants.)
5. (This result will be needed in Math 23b)
Suppose that a 2n 2n matrix T has the following properties:
The first n columns are a linearly independent set.
The last n columns are a linearly independent set.
Each of the first n columns is orthogonal to each of the last n columns.
Prove that T is invertible.
Hint: Write ~w = a~u + ~v, where ~u is a linear combination of the first n
columns and ~v is a linear combination of the last n columns. Start by
showing that ~u is orthogonal to ~v. Then exploit the fact that if ~w = ~0,
~w · ~w = 0.
37
6. (This result will be the key to proving the implicit function theorem, key
to many economic applications.)
Suppose that the m × n matrix C, where n > m, has m linearly independent
columns and that these columns are placed on the left. Then we can split
off a square matrix A and write C = [A|B].
(a) Let ~y be the (n − m)-component vector of the active variables, and let
~x be the m-component vector of passive variables such that C (~x; ~y) = ~0
(the vector ~x stacked on top of ~y).
Prove that ~x = −A⁻¹B~y.
(b) Use this approach to solve the system of equations
5x + 2y + 3z + w = 0
7x + 3y + z − 2w = 0
by inverting a 2 × 2 matrix, without using row reduction or any other
elimination technique. The solution will express the passive variables x and y in terms of the active variables z and w.
The remaining problems are to be solved by writing R scripts. You may
use the rref() function whenever it works.
7. (Like group problem 3b, but in a finite field, so rref will not help!)
In R, the statement
A<-matrix(sample(0:4, 24, replace = TRUE),4)
was used to create a 4 6 matrix A with 24 entries in Z5 . Each entry
randomly has the value 0, 1, 2, 3, or 4.
Here is the resulting matrix:
A =
[ 3 0 4 0 2 2 ]
[ 1 1 3 3 2 1 ]
[ 0 2 1 1 4 2 ]
[ 1 0 2 0 3 4 ].
Use row reduction to find a basis for the image of A and a basis for the
kernel. Please check your answer for the kernel.
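Since rref() works over the real numbers, one option for Z5 is a small hand-rolled row reduction mod p. The function below is our own sketch (not a library routine), shown on an arbitrary small matrix rather than on A, whose entry arrangement is uncertain in this copy.

# Row reduction mod a prime p (a sketch)
rrefModP <- function(M, p) {
  M <- M %% p; r <- 1
  for (col in 1:ncol(M)) {
    piv <- which(M[r:nrow(M), col] != 0)
    if (length(piv) == 0) next
    piv <- piv[1] + r - 1
    M[c(r, piv), ] <- M[c(piv, r), ]                   # swap a nonzero entry into place
    inv <- which((M[r, col] * (1:(p - 1))) %% p == 1)  # inverse of the pivot mod p
    M[r, ] <- (M[r, ] * inv) %% p                      # scale the pivot to 1
    for (i in setdiff(1:nrow(M), r))
      M[i, ] <- (M[i, ] - M[i, col] * M[r, ]) %% p     # clear the rest of the column
    r <- r + 1
    if (r > nrow(M)) break
  }
  M
}
M <- matrix(c(1, 2, 3, 4,
              2, 0, 1, 3,
              4, 1, 0, 2), nrow = 3, byrow = TRUE)     # a small example in Z5
rrefModP(M, 5)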
8. One of the seventeen problems on the first Math 25a problem set for 2014
was to find all the solutions of the system of equations
2x1 3x2 7x3 + 5x4 + 2x5 = 2
x1 2x2 4x3 + 3x4 + x5 = 2
2x1 4x3 + 2x4 + x5 = 3
x1 5x2 7x3 + 6x4 + 2x5 = 7
without the use of a computer.
Solve this problem using R ( like script 3.1A).
38
9. (Like script 3.1C and group problem 3a) A neo-Cubist sculptor wants to use
a basis for R3 with the following properties:
The first basis vector ~w1 = (1, 1, 1) lies along the body diagonal of the
cube.
39
R Scripts
1.4A-EigenvaluesCharacteristic.R
Topic 1 - Eigenvectors for a 2x2 matrix
Topic 2 - Not every 2x2 matrix has real eigenvalues
1.4B-EigenvectorsAxler.R
Topic 1 - Finding eigenvectors by row reduction
Topic 2 - Eigenvectors for a 3 x 3 matrix
1.4C-Diagonalization.R
Topic 1 - Basis of real eigenvectors
Topic 2 - Raising a matrix to a power
Topic 3 - What if the eigenvalues are complex?
Topic 4 - What if there is no eigenbasis?
1.4X-EigenvectorApplications.R
Topic 1 - The special case of a symmetric matrix
Topic 2 - Markov Process (from script 1.1D)
Topic 3 - Eigenvectors for a reflection
Topic 4 - Sequences defined by linear recurrences
Executive Summary
1.1
1.2
Let A = [ −1 4 ; −2 5 ]. Then A − λI = [ −1−λ 4 ; −2 5−λ ]
and χ_A(λ) = det(A − λI) = (−1 − λ)(5 − λ) + 8 = λ² − 4λ + 3.
Setting λ² − 4λ + 3 = (λ − 1)(λ − 3) = 0, we find two eigenvalues, 1 and 3.
3
1.3
Given matrix A, pick an arbitrary vector ~w. Keep computing A~w, A²~w, A³~w,
etc. until you find a vector that is a linear combination of its predecessors. This
situation is easily detected by row reduction.
Now you have found a polynomial p of degree m such that p(A)~w = 0. Furthermore, this is the nonzero polynomial of lowest degree for which p(A)~w = 0.
Over the complex numbers, this polynomial is guaranteed to have a root by
virtue of the fundamental theorem of algebra (Hubbard theorem 1.6.13). Over
the real numbers or a finite field, it will have a root in the field only if you are
lucky. Assuming that the root exists, factor it out: p(t) = (t − λ)q(t).
Now p(A)~w = (A − λI)q(A)~w = 0.
Thus q(A)~w is an eigenvector with eigenvalue λ.
Again, let A = [ −1 4 ; −2 5 ].
As the arbitrary vector ~w choose (1, 0). Then A~w = (−1, −2) and A²~w = (−7, −8).
We need to express the third of these vectors, A²~w, as a linear combination
of the first two. This is done by row reducing the matrix
[ 1 −1 −7 ]        [ 1 0 −3 ]
[ 0 −2 −8 ]   to   [ 0 1  4 ]
to find that A²~w = 4A~w − 3~w.
Equivalently, (A² − 4A + 3I)~w = 0.
p(A) = A² − 4A + 3I, or p(t) = t² − 4t + 3 = (t − 1)(t − 3): eigenvalues 1 and 3.
To get the eigenvector for eigenvalue 1, apply the remaining factor of p(A),
A − 3I, to ~w:
[ −4 4 ] [ 1 ]   [ −4 ]
[ −2 2 ] [ 0 ] = [ −2 ].  Divide by −2 to get ~v1 = (2, 1).
To get the eigenvector for eigenvalue 3, apply the remaining factor of p(A),
A − I, to ~w:
[ −2 4 ] [ 1 ]   [ −2 ]
[ −2 4 ] [ 0 ] = [ −2 ].  Divide by −2 to get ~v2 = (1, 1).
In this case the polynomial p(t) turned out to be the same as the characteristic
polynomial, but that is not always the case.
If we choose ~w = (1, 1), we find A~w = 3~w, p(A) = A − 3I, p(t) = t − 3. We
need to start over with a different ~w to find the other eigenvalue.
If we choose A = [ 2 0 ; 0 2 ], then any vector is an eigenvector with eigenvalue
2. So p(t) = t − 2. But the characteristic polynomial is (t − 2)².
If we choose A = [ 2 1 ; 0 2 ], the characteristic polynomial is (t − 2)². But now
there is only one eigenvector. If we choose ~w = ~e1 we find p(t) = t − 2
and the eigenvector (1, 0). But if we choose a different ~w we find
p(t) = (t − 2)² and we fail to find a second, independent eigenvector.
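Here is the whole procedure as a short R sketch, with the 2 × 2 matrix as reconstructed above (the signs of its entries are part of that reconstruction):

A <- matrix(c(-1, 4,
              -2, 5), nrow = 2, byrow = TRUE)
w <- c(1, 0)
M <- cbind(w, A %*% w)                    # columns w and Aw
cf <- solve(M, A %*% A %*% w)             # A^2 w = cf[1]*w + cf[2]*(Aw); here cf = (-3, 4)
lambda <- sort(Re(polyroot(c(-cf, 1))))   # roots of t^2 - cf[2]*t - cf[1] = 0: 1 and 3
v1 <- (A - lambda[2] * diag(2)) %*% w     # apply the other factor: eigenvector for lambda[1]
v2 <- (A - lambda[1] * diag(2)) %*% w     # eigenvector for lambda[2]
A %*% v1 - lambda[1] * v1                 # zero vector: check
A %*% v2 - lambda[2] * v2                 # zero vector: check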
5
1.4
1.5
Matrix Diagonalization
In the best case we can find a basis of n eigenvectors {~v1, ~v2, …, ~vn} with associated eigenvalues {λ1, λ2, …, λn}. Although the eigenvectors must be independent, some of the eigenvalues may repeat.
Create a matrix P whose columns are the eigenvectors. Since the eigenvectors
form a basis, they are independent and the matrix P has an inverse P⁻¹.
The matrix D = P⁻¹AP is a diagonal matrix.
Proof: D~ek = P⁻¹A(P~ek) = P⁻¹A~vk = P⁻¹λk~vk = λk P⁻¹~vk = λk~ek.
The matrix A can be expressed as A = PDP⁻¹.
Proof: A~vk = PD(P⁻¹~vk) = PD~ek = P(λk~ek) = λk P~ek = λk~vk.
A diagonal matrix D is easy to raise to an integer power k: Dᵏ is again diagonal,
with each diagonal entry raised to the kth power.
1.6
Properties of an eigenbasis
Even if all the eigenvalues are distinct, an eigenbasis is not unique. Any
eigenvector in the basis can be multiplied by a nonzero scalar and remain
an eigenvector.
Eigenvectors that correspond to distinct eigenvalues are linearly independent (your proof 4.1)
If the matrix A is symmetric, eigenvectors that correspond to distinct eigenvalues are orthogonal.
1.7
1.8
Applications of eigenvectors
Markov processes
Suppose that a system can be in one of two or more states and goes through
a number of steps, in each of which it may make a transition from one state
to another in accordance with specified transition probabilities.
For a two-state process, the vector ~vn = (pn, qn) specifies the probabilities for
the system to be in state 1 or state 2 after n steps of the process, where
0 ≤ pn, qn ≤ 1 and pn + qn = 1. The transition probabilities are specified
by a matrix A = [ a b ; c d ], where all the entries are between 0 and 1 and
a + c = b + d = 1.
After a large number of steps, the state of the system is specified by ~vn = Aⁿ~v0.
The easy way to calculate Aⁿ is by diagonalizing A. If there is a stationary
state ~v into which the system settles down, it corresponds to an eigenvector
with eigenvalue 1, since ~v_{n+1} = A~vn and ~v_{n+1} = ~vn = ~v. (A short R
sketch appears at the end of this list of applications.)
Reflections
If a 2 × 2 matrix F represents reflection in a line through the origin with
direction vector ~v, then ~v must be an eigenvector with eigenvalue 1 and a
vector perpendicular to ~v must be an eigenvector with eigenvalue −1.
If a 3 × 3 matrix F represents reflection in a plane P through the origin
with normal vector ~N, then ~N must be an eigenvector with eigenvalue −1
and there must be a two-dimensional subspace of vectors in P, all with
eigenvalue +1.
Linear recurrences and Fibonacci-like sequences.
In computer science, it is frequently the case that the first two terms of a
sequence, a0 and a1, are specified, and subsequent terms are specified by a
linear recurrence of the form a_{n+1} = b a_{n−1} + c a_n. The best-known example
is the Fibonacci sequence (Hubbard, pages 223-225) where a0 = a1 = 1 and
b = c = 1.
Then
[ a_n     ]   [ 0 1 ] [ a_{n−1} ]   [ 0 1 ]ⁿ [ a_0 ]
[ a_{n+1} ] = [ b c ] [ a_n     ] = [ b c ]  [ a_1 ].
The easy way to raise the matrix A = [ 0 1 ; b c ] to the nth power is to diagonalize
it.
Solving systems of linear differential equations
This topic, of crucial importance to physics, will be covered after we have
done some calculus and infinite series.
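A short R sketch of the Markov application, using a hypothetical 2 × 2 transition matrix (any matrix with nonnegative entries whose columns sum to 1 would do):

A <- matrix(c(0.8, 0.3,
              0.2, 0.7), nrow = 2, byrow = TRUE)   # each column sums to 1
e <- eigen(A)
e$values                      # the largest eigenvalue is 1
v <- e$vectors[, 1]
v / sum(v)                    # stationary state, normalized to be a probability vector
v0 <- c(1, 0)
for (i in 1:50) v0 <- A %*% v0
v0                            # after many steps the process sits at the stationary state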
8
Lecture Outline
1. Using the characteristic polynomial to find eigenvalues and eigenvectors
If A~v = λ~v, ~v is called an eigenvector for A, and λ is the corresponding
eigenvalue.
If A is a 2 × 2 or 3 × 3 matrix, there is a quick, well-known way to find
eigenvalues by using determinants.
Rewrite A~v = λ~v as A~v = λI~v, where I is the identity matrix.
Equivalently, (A − λI)~v = ~0.
Suppose that λ is an eigenvalue of A. Then the eigenvector ~v is a nonzero
vector in the kernel of the matrix (A − λI).
It follows that the matrix (A − λI) is not invertible. But we have a formula
for the inverse of a 2 × 2 or 3 × 3 matrix, which can fail only if the determinant
is zero. Therefore a necessary condition for the existence of an eigenvalue
λ is that det(A − λI) = 0.
The polynomial χ_A(λ) = det(A − λI) is called the characteristic polynomial of matrix A. It is easy to compute in the 2 × 2 or 3 × 3 case, where
there is a simple formula for the determinant. For larger matrices χ_A(λ) is
hard to compute efficiently, and this approach should be avoided.
Conversely, suppose that χ_A(λ) = 0 for some real number λ. It follows
that the columns of the matrix (A − λI) are linearly dependent. If we row
reduce the matrix, we will find at least one nonpivotal column, which in
turn implies that there is a nonzero vector in the kernel. This vector is an
eigenvector.
10
3. Consider the matrix A = [ 3 2 ; 3 3 ] with entries from the finite field Z5.
(a) Find the eigenvalues of A by solving the characteristic equation
det(A − λI) = 0, then find the corresponding eigenvectors. Solving a
quadratic equation over Z5 is easy: in a pinch, just try all five possible
roots!
(b) Find the eigenvalues of A by using the technique of example 2.7.5
of Hubbard. You will get the same equation for the eigenvalues, of
course, but it will be more straightforward to find the eigenvectors.
(c) Write down the matrix P whose columns are the basis of eigenvectors,
and check your answer by showing that P 1 AP is a diagonal matrix.
11
12
5. Eigenbases
To construct the matrix P , we need a basis of eigenvectors. A sufficient,
but not necessary, condition is that the matrix A has n distinct eigenvalues.
In examples, these will be real numbers, but the result is valid also in Cn .
Here is your proof 8.1.
If ~v1, …, ~vn are eigenvectors of A : Rn → Rn with distinct eigenvalues
λ1, …, λn, they are linearly independent.
Suppose, for a contradiction, that the eigenvectors are linearly dependent.
There exists a first eigenvector (the jth one) that is a linear combination
of its predecessors:
~vj = a1~v1 + … + a_{j−1}~v_{j−1}.
Multiply both sides by A − λjI. You get zero on the left, and on the right
you get a linear combination where all the coefficients are nonzero because
λj − λi ≠ 0. This is in contradiction to the assumption that ~vj was the first
one that is a linear combination of its predecessors.
Since in Rn there cannot be more than n linearly independent vectors, there
are at most n distinct eigenvalues.
Proof 8.1, start to finish:
13
6. Finding eigenvectors
This method is guaranteed to succeed only for the field of complex numbers, but the algorithm is valid for any field, and it finds the eigenvectors
whenever they exist.
Given matrix A, pick an arbitrary vector ~w. If you are really lucky, A~w
is a multiple of ~w and you have stumbled across an eigenvector. If not,
keep computing A²~w, A³~w, etc. until you find a vector that is a linear
combination of its predecessors. This situation is easily detected by row
reduction.
Now you have found a polynomial p of degree m such that p(A)~w = 0.
Furthermore, this is the nonzero polynomial of lowest degree for which
p(A)~w = 0.
Over the complex numbers, this polynomial is guaranteed to have a root
by virtue of the fundamental theorem of algebra (Hubbard theorem
1.6.13). Over the real numbers or a finite field, it will have a root in the
field only if you are lucky. Assuming that the root exists, factor it out:
p(t) = (t − λ)q(t).
Now p(A)~w = (A − λI)q(A)~w = 0.
Thus q(A)~w is an eigenvector with eigenvalue λ.
Here is a 2 × 2 example where the calculation is easy.
Let A = [ −1 4 ; −2 5 ].
As the arbitrary vector ~w choose (1, 0). Compute A~w and A²~w.
Use row reduction to express the third of these vectors, A²~w, as a linear
combination of the first two.
[ 1 −1 −7 ]
[ 0 −2 −8 ]
Factor: p(t) =
To get the eigenvector for eigenvalue 1, apply the remaining factor of
p(A), A − 3I, to ~w.
15
7. Change of basis
Our old basis consists of the standard basis vectors ~e1 and ~e2 .
Our new basis consists of one eigenvector for each eigenvalue.
Let's choose ~v1 = (2, 1) and ~v2 = (1, 1).
It would be all right to multiply either of these vectors by a constant or to
reverse their order.
Write down the change of basis matrix P whose columns express the new
basis vectors in terms of the old ones.
P = [ 2 1 ; 1 1 ],  P⁻¹ = [ 1 −1 ; −1 2 ],  D = P⁻¹AP = [ 1 0 ; 0 3 ].
16
A =
[ 1 1 0 ]
[ 1 2 1 ]
[ 0 1 1 ]
Since we have help with the computation, make the choice ~w = (2, 3, 5).
The matrix whose columns are ~w, A~w, and A²~w is different from the matrix in Hubbard.
It row reduces to
[ 1 0 3 ]
[ 0 1 4 ]
[ 0 0 0 ].
What eigenvector do we find if we choose ~w = ~e2?
18
Let A = [ 2 0 ; 1 2 ]. In this case there is only one eigenvalue and there is
no eigenbasis.
What happens if we choose ~w = ~e2?
If we choose ~w = ~e1, confirm that
[ 1 2 4 ]
[ 0 1 4 ]
row reduces to
[ 1 0 −4 ]
[ 0 1  4 ].
What is p1?
What happens when we carry out the procedure that usually gives an
eigenvector?
Key point: There was only one eigenvalue, the polynomial (t − 2)²
showed up, and we were unable to find a basis of eigenvectors.
19
A =
[ 2 1 −1 ]
[ 0 2  0 ]
[ 0 1  1 ]
The procedure for finding eigenvectors is carried out in the Mathematica
file, with the following results:
Using ~w = ~e1, we get p1(t) = t − 2 and find an eigenvector
(1, 0, 0) with eigenvalue 2.
Using ~w = ~e2, we get p2(t) = (t − 1)(t − 2) and find two eigenvectors:
(1, 0, 1) with eigenvalue 1, and (1, 1, 1) with eigenvalue 2.
At this point we have found three linearly independent eigenvectors and we
have a basis.
If we use ~w = ~e3, we get p3(t) = (t − 1)(t − 2) and find two eigenvectors:
(1, 0, 1) with eigenvalue 1, and (1, 0, 0) with eigenvalue 2.
In general, if we use some arbitrary ~w, we will get p(t) = (t − 1)(t − 2)
and we will find the eigenvector with eigenvalue 1 along with some linear
combination of the eigenvectors with eigenvalue 2.
Key points about this case:
The polynomial pi(t), in order to be simple, must have degree less than
n.
We need to use more than one standard basis vector in order to find
a basis of eigenvectors.
Let A = [ 1 1 ; −2 4 ], and find an eigenvector, starting with ~e1 = (1, 0).
Then A~e1 = (1, −2) and A²~e1 = (−1, −10).
We row-reduce
[ 1  1  −1 ]        [ 1 0 −6 ]
[ 0 −2 −10 ]   to   [ 0 1  5 ]
and conclude that A²~e1 = −6~e1 + 5A~e1, or A²~e1 − 5A~e1 + 6~e1 = 0.
Complete the process of finding two eigenvalues and show that (1, 1) and (1, 2)
are a pair of eigenvectors that form a basis for R².
p(t) =
For λ = 2,
For λ = 3,
The change of basis matrix P expresses the new basis (eigenvectors) in
terms of the old (standard); so its columns are the eigenvectors. Write
down P and calculate its inverse.
P⁻¹AP = [ 2 −1 ; −1 1 ] [ 1 1 ; −2 4 ] [ 1 1 ; 1 2 ] = [ 2 0 ; 0 3 ].
25
Determine a6 and a7 by using the square of the matrix that was just constructed:
[ 0 1 ] [ 0 1 ]   [ 1 1 ]
[ 1 1 ] [ 1 1 ] = [ 1 2 ].
26
[ c1 0 ]ⁿ   [ c1ⁿ  0  ]
[ 0 c2 ]  = [ 0   c2ⁿ ].
A P = P [ c1 0 ; 0 c2 ],   so   Aⁿ = P [ c1ⁿ 0 ; 0 c2ⁿ ] P⁻¹.
c1 = (1 + √5)/2,  c2 = (1 − √5)/2,
and
P = [ 2      2     ]
    [ 1+√5   1−√5  ].
The accompanying Mathematica notebook file Outline8.nb confirms this.
We need to find a systematic way to construct the matrix P .
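Here is the same idea as an R sketch: diagonalize the Fibonacci matrix numerically and use Aⁿ = P Dⁿ P⁻¹ (this reproduces the closed-form answer numerically rather than symbolically).

A <- matrix(c(0, 1,
              1, 1), nrow = 2, byrow = TRUE)
e <- eigen(A)                       # eigenvalues (1+sqrt(5))/2 and (1-sqrt(5))/2
P <- e$vectors
Apow <- function(n) P %*% diag(e$values^n) %*% solve(P)
a01 <- c(1, 1)                      # a0 = a1 = 1
round(Apow(10) %*% a01)             # gives (a10, a11) = (89, 144)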
27
Group Problems
1. Some interesting examples with 2 2 matrices
(a) Since a polynomial equation with real (or complex) coefficients always
has a root (the fundamental theorem of algebra), a real matrix is
guaranteed to have at least one complex eigenvalue. No such theorem
holds for polynomial equations with coefficients in a finite field, so it is
possible that there are no eigenvalues at all. This is one of the few results in linear
algebra that depends on the underlying field.
Consider the matrix A = [ 3 1 ; n 3 ] with entries from the finite field Z5.
By considering the characteristic equation, find values of n that lead
to 2, 1, or 0 distinct eigenvalues. For the case of 1 eigenvalue, find an
eigenvector.
Hint: After writing the characteristic equation with n isolated on the
right side of the equals sign, make a table of the value of t2 + 4t + 4
for each of the five possible eigenvalues. That table lets you determine
how many solutions there are for each of the five possible values of
n. When the characteristic polynomial is the square of a linear factor,
there is only one eigenvector and it is easy to construct.
1 1
(b) The matrix A =
has only a single eigenvalue and only one
4 3
independent eigenvector.
Find the eigenvalue and eigenvector, show that A = D + N where D is
diagonal and N is nilpotent, and use analysis to calculate A3 without
ever multiplying A by itself (unless you want to check your answer).
(c) Extracting square roots by diagonalization.
The matrix A = [ 2 1 ; 2 3 ]
conveniently has two eigenvalues that are perfect squares. Find a
basis of eigenvectors and construct a matrix P such that P 1 AP is a
diagonal matrix.
Thereby find two independent square roots of A, i.e. find matrices B1
and B2 such that B1² = B2² = A, with B2 ≠ ±B1. Hint: use the
negative square root of one of the eigenvalues, the positive square
root of the other.
If you take Physics 15c next year, you may encounter this technique
when you study coupled oscillators.
28
2. Some proofs. In doing these, you may use the fact that an eigenbasis exists
if and only if all the pi (t) have simple roots.
(a) Suppose that a 5 5 matrix has a basis of eigenvectors, but that its
only eigenvalues are 1 and 2. Using Hubbard Theorem 2.7.6, convince
yourself that you must make at least three different choices of ~ei in
order to find all the eigenvectors.
(b) An alternative approach to proof 4.1 uses induction.
Identify a base case (easy). Then show that if a set of k − 1 eigenvectors
with distinct eigenvalues is linearly independent and you add to the
set an eigenvector ~vk with an eigenvalue λk that is different from any
of the preceding eigenvalues, the resulting set of k eigenvectors with
distinct eigenvalues is linearly independent.
(c) In general, the square matrix A that represents a Markov process has
the property that all the entries are between 0 and 1 and each column
sums to 1. Prove that such a matrix A has an eigenvalue of 1 and
that there is a stationary vector that is transformed into itself by A.
You may use the fact, which we have proved so far only for 2 × 2 and
3 × 3 matrices, that if a matrix has a nonzero vector in its kernel, its
determinant is zero.
29
1 2 0
The matrix A = 2 1 0
0 0 1
has three real, distinct eigenvalues, and there is a basis of eigenvectors.
Find what polynomial equation for the eigenvalues arises from each of
the following choices, and use it to construct as many eigenvectors as
possible:
~w = ~e1.
~w = ~e3.
~w = ~e1 + ~e3.
(b) Find two eigenvectors for the matrix A =
[ 1 1 1 ]
[ 1 1 1 ]
[ 2 2 0 ]
and confirm that using each of the three standard basis vectors will not produce a
third independent eigenvector.
Clearly the columns of A are not independent; so 0 is an eigenvalue.
This property makes the algebra really easy.
(c) Use the technique of example 2.7.5 in Hubbard to find the eigenvalues
3 4 4
30
Homework
1. Consider the sequence of numbers described, in a manner similar to the
Fibonacci numbers, by
b3 = 2b1 + b2
b4 = 2b2 + b3
bn+2 = 2bn + bn+1
(a) Write a matrix B to generate this sequence in the same way that
Hubbard generates the Fibonacci numbers.
(b) By considering the case b1 = 1, b2 = 2 and the case b1 = 1, b2 = 1,
find the eigenvectors and eigenvalues of B.
(c) Express the vector (1, 1) as a linear combination of the two eigenvectors,
and thereby find a formula for bn if b1 = 1, b2 = 1.
2. (This is similar to group problem 1c.)
Consider the matrix A = [ 10 −9 ; 18 −17 ].
(a) By using a basis of eigenvectors, find a matrix P such that P 1 AP is
a diagonal matrix.
(b) Find a cube root of A, i.e. find a matrix B such that B 3 = A.
3. (a) Prove that if ~v1 and ~v2 are eigenvectors of matrix A, both with the
same eigenvalue , then any linear combination of ~v1 and ~v2 is also
an eigenvector.
(b) Suppose that A is a 3 × 3 matrix with a basis of eigenvectors but
with only two distinct eigenvalues. Prove that for any ~w, the vectors
~w, A~w, and A²~w are linearly dependent. (This is another way to
understand why all the polynomials pi(t) are simple when A has a
basis of eigenvectors but a repeated eigenvalue.)
31
4. Harvard graduate Ivana Markov, who concentrated in English and mathematics with economics as a secondary field, just cannot decide whether
she wants to be a poet or an investment banker, and so her career path is
described by the following Markov process:
If Ivana works as a poet in year n, there is a probability of 0.9 that she
will feel poor at the end of the year and take a job as an investment
banker for year n + 1. Otherwise she remains a poet.
If Ivana works as an investment banker in year n, there is a probability
of 0.7 that she will feel overworked and unfulfilled at the end of the
year and take a job as a poet for year n + 1. Otherwise she remains
an investment banker.
Thus, if (pn, qn) describes the probabilities that Ivana works as a poet or a
banker respectively in year n, the corresponding probabilities for year n + 1
are given by
[ p_{n+1} ]       [ p_n ]                   [ 0.1 0.7 ]
[ q_{n+1} ]  = A  [ q_n ] ,   where   A  =  [ 0.9 0.3 ].
(a) Find the eigenvalues and eigenvectors of A.
(b) Construct the matrix P whose columns are the eigenvectors, invert
it, and thereby express the vector (1, 0) as a linear combination of the
eigenvectors.
(c) Suppose that in year 0 Ivana works as a poet, so that (p0, q0) = (1, 0).
Find an explicit formula for (pn, qn) and use it to determine (p10, q10). What
happens in the limit of large n?
32
3 4 4
(a) A = 1 3 1
3 6 4
1 0 0
(b) B = 1 3 1
1 2 0
7. The matrix A = [ 5 1 1 ; 1 3 1 ; 0 0 4 ] has only one eigenvalue, 4, and so its characteristic polynomial must be (t − 4)³.
33
4 1 1
2
A = 1 3
1
2 3
Express A in the form P DP 1 , where D is diagonal and P is an isometry
matrix whose columns are orthogonal unit vectors.
A similar example is in script 1.4X.
34
R Scripts
Script 2.1A-Countability.R
Topic 1 - The set of ordered pairs of natural numbers is countable
Topic 2 - The set of positive rational numbers is countable
Script 2.1B-Uncountability.R
Topic 1 - Cantor's proof of uncountability
Topic 2 - A different-looking version of the same argument
Script 2.1C-Denseness.R
Topic 1 - Placing rational numbers between any two real numbers
Script 2.1D-Sequences.R
Topic 1 - Limit of an infinite sequence
Topic 2 - Limit of sum = sum of limits
Topic 3 - Convergence of sequence of inverses (proof 5.2)
Executive Summary
1.1
1.2
The rational numbers and the real numbers each form an ordered field,
which means that there is a relation ≤ with the properties
O1. Given a and b, either a ≤ b or b ≤ a.
O2. If a ≤ b and b ≤ a, then a = b.
O3. If a ≤ b and b ≤ c, then a ≤ c.
O4. If a ≤ b, then a + c ≤ b + c.
O5. If a ≤ b and 0 ≤ c, then ac ≤ bc.
Many important properties of infinite sequences of real numbers can be
proved on the basis of ordering.
If we think of the rational numbers or the real numbers as lying on a number
line, we can interpret the absolute value |a − b| as the distance between
point a and point b: dist(a, b) = |a − b|. In two dimensions the statement
dist(a, b) ≤ dist(a, c) + dist(c, b) means that the length of one side of a
triangle cannot exceed the sum of the lengths of the other two sides. The
name triangle inequality is also applied to the one-dimensional special
case where c = 0; i.e. |a + b| ≤ |a| + |b|.
Many well-known rules of algebra are not included on the list of field axioms.
Usually, as for (−a)(−b) = ab, this is because they are easily provable
theorems. However, there are properties of the real numbers that cannot
be proved from the field axioms alone because they rely on the axiom that
the real numbers are complete. The Completeness Axiom states that
Every nonempty subset S of R that is bounded above has a least upper
bound.
This least upper bound sup S is not necessarily a member of the set S.
The Archimedean property of the real numbers states that
for any two positive real numbers a and b, there exists a positive integer n
such that na > b. Its proof requires the Completeness Axiom.
The rational numbers are a dense subset of the real numbers. This means that
if a, b ∈ R and a < b, there exists r ∈ Q such that a < r < b.
Again the proof relies on the completeness of the real numbers.
It is not unreasonable to think of real numbers as infinite decimals (though
there are complications). In this view, π (which is not even algebraic) is
the least upper bound of the set
S = {3, 3.1, 3.14, 3.141, 3.1415, 3.14159, …}
The real numbers form an uncountable set. This means that there is no bijection between them and the natural numbers: they cannot be enumerated
as r1 , r2 , .
1.3
Quantifiers are not used by Ross, but they are conventional in mathematics
and save space when you are writing proofs.
∃ is read "there exists." It is usually followed by "such that" or "s.t."
Example: the proposition ∃x s.t. x² = 4 is true, since either 2 or −2 has
the desired property.
∀ is read "for all" or "for each" or "for every." It is used to specify that some
proposition is true for every member of a possibly infinite set or sequence.
Example: ∀x ∈ R, x² ≥ 0 is true, but ∀x ∈ R, x² > 0 is false.
Quantifiers and negation: useful in doing proofs by contradiction.
The negation of "∃x such that P(x) is true" is "∀x, P(x) is false."
The negation of "∀x, P(x) is true" is "∃x such that P(x) is false."
1.4
1.5
Theorems about limits, all provable from the definition. These will be
especially useful for us after we define continuity in terms of sequences.
If lim sn = s, then lim(k·sn) = ks.
If lim sn = s and lim tn = t, then lim(sn + tn) = s + t.
Any convergent sequence is bounded:
if lim sn = s, ∃M such that ∀n, |sn| < M.
If lim sn = s and lim tn = t, then lim(sn·tn) = st.
If lim sn = 0 and (tn) is bounded, then lim(sn·tn) = 0.
If sn ≠ 0 for all n and s = lim sn ≠ 0, then inf |sn| > 0 and (1/sn)
converges to 1/s.
Using the limit theorems above is usually a much more efficient way to find
the limit of the sequence than doing a brute-force calculation of N in terms
of . Ross has six diverse examples.
The symbol +∞ has a precise meaning when used to specify a limit. We
say that the sequence sn diverges to +∞ if
∀M > 0, ∃N such that ∀n > N, sn > M.
Similarly, we say that the sequence sn diverges to −∞ if
∀M < 0, ∃N such that ∀n > N, sn < M.
Theorems about infinite limits:
If lim sn = +∞ and lim tn > 0 (could be +∞), then lim sn·tn = +∞.
If (sn) is a sequence of positive real numbers, then lim sn = +∞ if and
only if lim 1/sn = 0.
If lim sn = +∞, then lim(sn + tn) = +∞ if tn has any of the following
properties:
lim tn > −∞.
tn is bounded (but does not necessarily converge).
inf tn > −∞ (who cares whether tn is bounded above?).
Lecture Outline
1. Peano axioms for the natural numbers N = {1, 2, 3, …}
N1. 1 belongs to N.
N2. If n ∈ N, then n + 1 ∈ N.
N3. 1 is not the successor of any element of N.
N4. If n and m ∈ N have the same successor, then n = m.
N5. A subset S ⊆ N which contains 1, and which contains n + 1
whenever it contains n, must equal N.
Axiom N5 is related to proof by induction, where you want to prove an
infinite set of propositions P1 , P2 , P3 , .
You do this by proving P1 (the base case) and then proving that Pn
implies Pn+1 (the inductive step).
A well-known example: the formula 1 + 2 + 3 + … + n = n(n + 1)/2.
For proposition P1 simply set n = 1: it is true that 1 = 1·(1 + 1)/2.
Write down proposition Pn, and use a little algebra to show that if Pn is in
the sequence of true propositions, then so is Pn+1.
There is less to this approach than meets the eye. Instead of proving that
Pk implies Pk+1 for k ≥ 1, we showed that NOT Pk implies NOT Pk−1 for
k ≥ 2.
But these two statements are logically equivalent: quite generally, for propositions p and q, p ⟹ q if and only if (not q) ⟹ (not p) (the principle of
contraposition).
A practical rule of thumb:
If it is easier to prove that Pk ⟹ Pk+1, use induction.
If it is easier to prove that (not Pk) ⟹ (not Pk−1), use the least-number
principle.
14. Least upper bound principle works for R but not for Q.
Your students at Springfield North are competing with a rival team from
Springfield South to draw up a business plan for a company with m scientists
and n other employees. Entries with m2 > 2n2 get rejected. The entry with
the highest possible ratio of scientists to other employees wins the contest.
Will this competition necessarily have a winner?
.
2
n( n + 1 n2 1)
= 0.99999999874999999...
100( 10001 9999)
(b) Evaluate lim((n + 1)^{4/3} − n^{4/3}). Note: 100^{1/3} = 4.6415….
Group Problems
1. Proofs that use induction
(a) Prove that for all nonnegative integers n,
Σ_{i=1}^{n} i³ = ( Σ_{i=1}^{n} i )²,  where  Σ_{i=1}^{n} i = n(n + 1)/2.
(b)
(c)
Homework
1. Ross, exercise 1.1. Do the proof both by induction (with base case and
inductive step) and by the least number principle (show that the assumption that there is a nonempty set of positive integers for which the formula
is not true leads to a contradiction)
2. Using quantifiers to describe infinite sequences
A Greek hero enters the afterlife and is pleased to learn that the goddess
Artemis is going to be to training him for eternity. He will be shooting an
infinite sequence of arrows. The distance that the nth arrow travels is sn .
Use quantifiers ∀ and ∃ to convert the following to mathematical notation.
(a) He will shoot only finitely many arrows more than 200 meters.
(b) The negation of (a): he will shoot infinitely many arrows more than
200 meters. (You can do this mechanically by using the rules for
negation of statements with quantifiers.)
(c) No matter how small a positive number Artemis chooses, all the rest
of his shots will travel more than 200 meters. (Off the record
this idea can be expressed as lim inf sn = 200)
(d) He will become so consistent that eventually any two of his subsequent
shots will differ in distance by less than 1 meter. (This idea will
resurface next week as the concept of Cauchy sequence.)
3. Denseness of Q
This problem is closely related to group problem 1c.
(a) Find a rational number x such that
355
113
22
.
7
355
.
113
<x<
21
The last three problems must be done in LaTeX. Print the pdf file and
attach it to your handwritten solutions.
7. Ross, Exercise 8.9. The star on the exercise means that it is referred to in
many places.
8. Ross, Exercise 9.12. This ratio test may be familiar from a calculus
course. There is a similar, better known test for infinite series that is
slightly more difficult to prove.
9. Ross, Exercises 9.15 and 9.16(a). The first of these results is invoked frequently in calculus courses, especially in conjunction with Taylor series, but
surprisingly few students can prove it. If you are working the problems in
order, both should be easy.
22
Review the convergence tests you can remember and any specific criteria
for their applications. Use one to show that
en
n=0
X
(1)n
n
n=0
R Scripts
Script 2.2A-MoreSequences.R
Topic 1 Cauchy Sequences
Topic 2 Lim sup and lim inf of a sequence
Script 2.2B-Series.R
Topic 1 Series and partial sums
Topic 2 Passing and failing the root test
Topic 3 Why the harmonic series diverges
1 Executive Summary
1.1 Monotone sequences
1.2
1.3 Cauchy sequences
1.4
Given any bounded sequence, the tail of the sequence, which consists of the
infinite number of elements beyond the N th element, has a well-defined supremum
and infimum.
Let us combine the notion of limit with the definitions of supremum and
infimum. The limit infimum and limit supremum are written and defined as
follows:
lim inf s_n = lim_{N→∞} inf{s_n : n > N}
lim sup s_n = lim_{N→∞} sup{s_n : n > N}
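An R sketch of these definitions (in the spirit of Script 2.2A, Topic 2), using the assumed example s_n = (−1)ⁿ(1 + 1/n):
  # Approximate lim sup and lim inf by taking sup and inf over a long tail.
  s <- function(n) (-1)^n * (1 + 1/n)
  for (N in c(10, 100, 1000)) {
    tail_vals <- s((N + 1):(N + 100000))
    cat("N =", N, " sup of tail:", max(tail_vals), " inf of tail:", min(tail_vals), "\n")
  }
  # The sups decrease toward 1 and the infs increase toward -1,
  # so lim sup s_n = 1 and lim inf s_n = -1.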
1.5
1.6
∑_{k=m}^{n} a_k
1.7 Familiar examples
∑_{n=0}^∞ a rⁿ = a/(1 − r)
∑_{n=1}^∞ 1/nᵖ
1.8
Cauchy criterion
We say that a series satisfies the Cauchy criterion if the sequence of its partial
sums is a Cauchy sequence. Writing this out with quantifiers, we have
∀ε > 0, ∃N s.t. ∀m, n > N, |s_n − s_m| < ε.
Here is a restatement of the Cauchy criterion, which proves more useful for
some proofs:
∀ε > 0, ∃N s.t. ∀n ≥ m > N, |∑_{k=m}^{n} a_k| < ε.
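A small R sketch of the criterion failing (in the spirit of Script 2.2B, Topic 3): for the harmonic series, the block of terms from m+1 to 2m never drops below 1/2, so no N works once ε = 1/2.
  # Each block 1/(m+1) + ... + 1/(2m) has m terms, each at least 1/(2m).
  block <- function(m) sum(1 / ((m + 1):(2 * m)))
  sapply(c(10, 100, 1000, 10000), block)   # all stay above 0.5 (they approach log 2)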
1.9
Convergence tests
1.10 Power series
∑_{n=0}^∞ a_n xⁿ,
where the sequence (a_n) is a sequence of real numbers. A power series defines a
function of x whose domain is the set of values of x for which the series converges.
That, of course, depends on the coefficients (a_n). There are three possibilities:
Converges ∀x ∈ R.
Converges only for x = 0.
Converges ∀x in some interval centered at 0. The interval may be open (−R, R),
closed [−R, R], or a mix of the two like [−R, R). The number R is called the radius
of convergence. Frequently the series converges absolutely in the interior of the
interval, but the convergence at an endpoint is only conditional.
8
Lecture Outline
1. (Ross, p. 62, convergent & Cauchy sequences) A Cauchy sequence is defined as a sequence where ∀ε > 0, ∃N s.t. m, n > N ⟹ |s_n − s_m| < ε.
∑_{k=0}^∞ a rᵏ = a/(1 − r) if |r| < 1.
lim inf |s_{n+1}/s_n| ≤ lim inf |s_n|^{1/n} ≤ lim sup |s_n|^{1/n} ≤ lim sup |s_{n+1}/s_n|.
If lim sup |a_{n+1}/a_n| < 1, then the series converges absolutely.
If lim inf |a_{n+1}/a_n| > 1, then the series diverges.
If lim inf |a_{n+1}/a_n| ≤ 1 ≤ lim sup |a_{n+1}/a_n|, then the test gives no
information.
8. (Ross, p.188, Radius of Convergence)
Consider the power series ∑ a_n xⁿ. Let us refer to lim sup |a_n|^{1/n} as β and to
1/β as R. (Logically, it follows that if β = 0, R = +∞, and if β = +∞, R = 0.)
Prove the following:
If |x| < R, the power series converges.
If |x| > R, the power series diverges.
(You may recognize R here as the radius of convergence.)
log_e 2 = 1 − 1/2 + 1/3 − 1/4 + 1/5 − 1/6 + 1/7 − 1/8 + ⋯
(1/2) log_e 2 = 1/2 − 1/4 + 1/6 − 1/8 + ⋯
Adding term by term gives a rearrangement of the same terms:
(3/2) log_e 2 = 1 + 1/3 − 1/2 + 1/5 + 1/7 − 1/4 + ⋯ = log_e 2 (?),
and if the order of the terms did not matter we could conclude that
3/2 = 1; 3 = 2; 1 = 0.
Using the fact that ∑_{n=2}^∞ 1/(n(n−1)) converges, show by comparison that
∑_{n=2}^∞ 1/n²
is convergent.
13. A case where the root test outperforms the ratio test
(Ross, Example 8 on page 103)
∑_{n=0}^∞ 2^((−1)^n − n) = 2 + 1/4 + 1/2 + 1/16 + 1/8 + 1/64 + ⋯.
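An R sketch of how the two tests behave on this series (illustration only):
  # a_n = 2^((-1)^n - n): the ratios oscillate between 1/8 and 2, so
  # lim inf < 1 < lim sup and the ratio test is silent; the nth roots settle
  # toward 1/2 < 1, so the root test gives convergence.
  n <- 0:30
  a <- 2^((-1)^n - n)
  ratios <- a[-1] / a[-length(a)]
  roots  <- a[n > 0]^(1 / n[n > 0])
  print(range(ratios))    # about 0.125 and 2
  print(tail(roots, 5))   # approaching 0.5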
16
14. (Model for group problems, set 3) Find the radius of convergence and the
exact interval of convergence for the series
∑_{n=0}^∞ (n 3ⁿ / 2ⁿ) xⁿ.
Group Problems
1. Subsequences, monotone sequences, lim sup and lim inf
(a) (Ross, 11.4) Here are four sequences. For each one, give
an example of a monotone subsequence,
its set of subsequential limits,
its lim sup and lim inf,
and decide whether it is bounded, converges, or diverges to +∞.
(b) The series π/4 = 1 − 1/3 + 1/5 − 1/7 + 1/9 − ⋯
i. For the sequence of partial sums (s_n), find an increasing subsequence and a decreasing subsequence.
ii. Prove that lim sup s_n = lim inf s_n.
iii. Prove that the series is not absolutely convergent by showing that
it fails the Cauchy test with ε = 1/2.
18
1
]t
(n+1)2 n
for n > 1.
19
This last set of problems should be done using LaTeX. They provide good
practice with summations, fractions, and exponents.
3. Applying convergence tests to power series (Ross, 23.1 and 23.2)
Find the radius of convergence R and the exact interval of convergence.
In each case, you can apply the root test (works well with powers) or the
ratio test (works well with factorials) to get an equation that can be solved
for x to get the radius of convergence R. Since you have a_n xⁿ, the root test,
which you may not have encountered in AP calculus, is especially useful. At
the endpoints you may need to apply something like the alternating series
test or the integral test.
Remember that lim n^{1/n} = 1 (see the numerical sketch after part (c)).
(a) ∑ (2ⁿ/n!) xⁿ and ∑ x^{n!}.
(b) ∑ (3ⁿ/(n 4ⁿ)) xⁿ and ∑ n xⁿ.
(c) ∑ ((−1)ⁿ/(n² 4ⁿ)) xⁿ and ∑ (3ⁿ/√n) xⁿ.
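Here is the promised R sketch for the fact that lim n^{1/n} = 1 (a numerical illustration, not a proof):
  # n^(1/n) creeps down toward 1 as n grows.
  n <- c(10, 100, 1000, 1e6)
  n^(1/n)    # 1.2589, 1.0471, 1.0069, 1.0000138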
20
Homework
1. Ross, 10.2 (Prove all bounded decreasing sequences converge.)
2. Ross, 10.6
3. Ross, 11.8.
4. Suppose that (sn ) is a Cauchy sequence and that the subsequence (s1 , s2 , s4 , s8 , s16 , )
converges to s. Prove that lim sn = s. Hint use the standard bag of tricks:
the triangle inequality, epsilon-over-2, etc.
5. Sample problem 2 shows that in general, the order of terms in a series must
be respected when calculating the sum. However, addition is commutative
and associative, which makes it surprising that order should matter.
Prove that if a series (an ) has only positive terms, then its sum is
equal to the least upper bound of the numbers that can be obtained
by summing over any finite subset of the terms.
Hint: Call this least upper bound S′. Call the sum as defined by Ross
S. Prove that S′ ≤ S and that S ≤ S′.
Suppose that a series includes both positive and negative terms and
its sum is S. It looks as though you can split it into a series of nonnegative terms and a series of negative terms, sum each separately,
then combine the results. Will this approach work for the series in
sample problem 2?
6. Ross, 14.3 (Determining whether a series converges. Apologies to those who
have already done hundreds of these in a high-school course.)
7. Ross, 14.8.
8. Ross, 15.6
9. Ross, 23.4. You might find it useful to have R generate some terms of the
series.
21
1
for x 6= 0, g(0) = 0
x
1
n
can be used as
Suppose that a function f (x) has the property that the image of the interval
I = [0, 2] is the interval J = [0, 1] [2, 3]. Invent a discontinuous function f
with this property and convince yourself that no continuous function can
have this property.
When you define the arc sine function in a calculus course, you begin by
restricting the domain of the sine function to the interval [−π/2, π/2]. Convince
yourself that this restriction makes Theorems 18.4 and 18.5 apply, while
restricting the domain to [0, π] would not work. Which restricted domain
works for defining the arc cosine function?
Read through examples 1-3 in section 19.1 of Ross. You can skip over the
computational details. The key issue is this:
On the interval (0, ∞) the function f(x) = 1/x² is continuous at any specified
x₀. However, when x₀ is very small, the δ that is needed to prove continuity
must be proportional to x₀³. There is no "one size fits all" δ that is independent of x₀. Example 3 shows that even with ε = 1, it is impossible to meet
the requirement for uniform continuity. When you draw the graph of f(x),
you see what the problem is: the derivative of f(x), which is essentially the
ratio of ε to δ, is unbounded.
The squaring function f(x) = x² is continuous. However, its derivative is
unbounded on [0, ∞), and the function is not uniformly continuous. Convince yourself that no matter how small you require |y − x| to be, you can
always make |f(y) − f(x)| be as large as you like simply by making y and
x be large.
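A quick R sketch of this failure of uniform continuity (the numbers are my own, just for illustration):
  # With |y - x| held at delta = 0.001, |y^2 - x^2| = 2*x*delta + delta^2
  # grows without bound as x grows, so no single delta works for every x.
  delta <- 0.001
  x <- c(1, 10, 1000, 1e6)
  y <- x + delta
  abs(y^2 - x^2)    # roughly 0.002, 0.02, 2, 2000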
Now you have seen two ways to select a function and an interval so that the
function is continuous but not uniformly continuous on the interval. Read
through the rest of section 19.1 to see how to avoid this situation. There
are four ways:
Make the interval be closed and bounded.
If the interval is not closed, make it closed by including its endpoints,
and extend the function so that it remains continuous.
The problem is related to an unbounded derivative: if f 0 (x) is bounded,
it goes away.
If f turns a Cauchy sequence (xn ) into a Cauchy sequence (f (xn )),
there is no problem,
Think hard about definition 20.1. This is not the definition of limit that
is found in most calculus texts, but it is in some ways better because it
incorporates the ideas of limit at infinity and increases without limit.
Look at theorems 20.4 and 20.5, and convince yourself that they are crucial
for proving the well-known formulas for derivatives that are in every calculus
course. If you are fond of entertaining counterexamples, look at example 7
on page 158.
Proofs to present in section or to a classmate who has done them.
7.1 Suppose that a < b, f is continuous on [a, b], and f (a) < y < f (b).
Prove that there exists at least one x [a, b] such that f (x) = y.
Use Rosss no bad sequence definition of continuity, not the epsilon-delta
definition.
7.2 Using the Bolzano-Weierstrass theorem, prove that if function f is continuous on the closed interval [a, b], then f is uniformly continuous on [a, b].
R Scripts
Script 2.3A-Continuity.R
Topic 1 - Two definitions of continuity
Topic 2 Uniform continuity
Script 2.3B-IntermediateValue.R
Topic 1 - Proving the intermediate value theorem
Topic 2 - Corollaries of the IVT
Executive Summary
1.1
1.2
f
g
is continuous at
1.3
It's all a matter of the order of quantifiers. For continuity, y is agreed upon
before the epsilon-delta game is played. For uniform continuity, a challenge is
made using some ε > 0, then a δ has to be chosen that meets the challenge
independent of y.
For a function f whose domain is a set S:
Continuity: ∀y ∈ S, ∀ε > 0,
∃δ > 0 such that ∀x ∈ S, |x − y| < δ implies |f(x) − f(y)| < ε.
Uniform continuity: ∀ε > 0,
∃δ > 0 such that ∀x, y ∈ S, |x − y| < δ implies |f(x) − f(y)| < ε.
On [0, ∞) (not a bounded set), the squaring function is continuous but not
uniformly continuous.
On (0, 1) (not closed) the function f(x) = 1/x is continuous but not uniformly
continuous.
1.4
Limits of functions
1. Definitions of limit
Ross's definition of limit, consistent with the definition of continuity:
S is a subset of R, f is a function defined on S, and a and L are real
numbers, or ±∞. Then lim_{x→a^S} f(x) = L means
for every sequence (x_n) in S with limit a, we have lim(f(x_n)) = L.
The conventional epsilon-delta definition:
f is a function defined on S ⊆ R, a is a real number in the closure of S
(not ±∞) and L is a real number (not ±∞). lim_{x→a} f(x) = L means
∀ε > 0, ∃δ > 0 such that if x ∈ S and |x − a| < δ, then |f(x) − L| < ε.
2. Useful theorems about limits, useful for proving differentiation rules.
Note: a can be ±∞, but L has to be finite.
Suppose that L₁ = lim_{x→a^S} f₁(x) and L₂ = lim_{x→a^S} f₂(x) exist and are
finite.
Then
lim_{x→a^S} (f₁ + f₂)(x) = L₁ + L₂.
lim_{x→a^S} (f₁ f₂)(x) = L₁ L₂.
lim_{x→a^S} (f₁/f₂)(x) = L₁/L₂, provided L₂ ≠ 0.
Lecture outline
1. (Ross, page 124)
For specified x₀ and function f, define the following terminology:
If lim x_n = x₀ and lim f(x_n) = f(x₀), we call (x_n) a good sequence.
If lim x_n = x₀ but (f(x_n)) does not converge to f(x₀), we call (x_n) a bad sequence.
Then Ross's definition of continuity is "every sequence is a good sequence."
Prove the following, which is the more conventional definition:
Let f be a real-valued function with domain U ⊆ R. Then f is continuous
at x₀ ∈ U if and only if
∀ε > 0, ∃δ > 0 such that if x ∈ U and |x − x₀| < δ, then |f(x) − f(x₀)| < ε.
2. (Ross, page 128)
Prove that if f and g are real-valued functions that are continuous at x0 R,
then f + g is continuous at x0 .
3. (Ross, page 133)
Let f be a continuous real-valued function on a closed interval [a, b]. Using the Bolzano-Weierstrass theorem, prove that f is bounded and that f achieves its maximum value, i.e. ∃y₀ ∈ [a, b] such that f(x) ≤ f(y₀) for all x ∈ [a, b].
4. (Ross, page 134: the intermediate value theorem)
Suppose that a < b, f is continuous on [a, b], and f(a) < y < f(b). Prove
that there exists at least one x ∈ [a, b] such that f(x) = y.
Use Ross's "no bad sequence" definition of continuity, not the epsilon-delta
definition.
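As a numerical companion to the intermediate value theorem (in the spirit of Script 2.3B; the lecture proof itself uses sequences, not this algorithm), bisection actually locates such an x:
  # Assumes f is continuous on [a, b] with f(a) < y < f(b).
  bisect <- function(f, a, b, y, tol = 1e-10) {
    while (b - a > tol) {
      mid <- (a + b) / 2
      if (f(mid) < y) a <- mid else b <- mid
    }
    (a + b) / 2
  }
  bisect(function(x) x^3, 0, 2, 5)   # about 1.7099..., the x in [0, 2] with x^3 = 5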
5. (Ross, page 143)
Using the Bolzano-Weierstrass theorem, prove that if function f is continuous on the closed interval [a, b], then f is uniformly continuous on [a, b].
6. (Ross, page 146)
Prove that if f is uniformly continuous on a set S and (sn ) is a Cauchy
sequence in S, then (f (sn )) is a Cauchy sequence. Invent an example where
f is continuous but not uniformly continuous on S and (f (sn )) is not a
Cauchy sequence.
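One standard choice for the requested example, shown here only as an R illustration (the write-up is still yours to do): on S = (0, 1), f(x) = 1/x is continuous but not uniformly continuous, and it turns the Cauchy sequence 1/n into the non-Cauchy sequence n.
  n  <- 2:11
  sn <- 1/n      # a Cauchy sequence in (0, 1)
  fn <- 1/sn     # f(s_n) = n: unbounded, hence not a Cauchy sequence
  print(sn)
  print(fn)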
10
x
|x|
for x 6= 0, 0 for x = 0.
1 − x²/2 + x⁴/24
is equal to zero for one and only one value x ∈ [1, 2].
This result will be useful when we define π without trigonometry.
12
1
.
x2
13
1 cos x
x2
14
15
x
limx0 |x|
|x| = 0.
does not
x3 1
lim
x1 x 1
16
Group Problems
1. Proofs about continuity
For (a) and (b), do two different versions of the proof:
Use the no bad sequence definition and invoke a result for sequences
from week 1.
Use the epsilon-delta definition and mimic the proof for sequences from
week 1.
(a) Prove that if f and g are real-valued functions that are continuous at
x0 R, then f g is continuous at x0 . (Hint: on any closed interval [x0
a, x0 + b] in the domain of f , the continuous function f is bounded.)
(b) Prove that if f is continuous at x0 R, and g is continuous at f (x0 ),
then the composite function g f is continuous at x0 .
(c)
17
It is continuous.
Its domain is [2,4].
Its codomain is [1,2].
There is no x for which 2T (x) = x.
18
3. Calculation of limits (do these in LaTeX to get practice with fractions and
functions)
(a) Limits by brute force
i. Use the epsilon-delta definition of limit to prove that lim_{x→0} x sin(1/x) = 0.
ii. Use the sequence definition of limit to show that lim_{x→0} sin(1/x) does
not exist.
(b) Limits that involve square roots; use the sum and product rules for
limits.
Evaluate
lim_{h→0} (√(x+h) − √x)/h.
Evaluate
lim_{x→∞} (√(x+1) − √x).
(c) Limits that involve trig functions; use the sum and product rules for
limits and the fact that lim_{x→0} (sin x)/x = 1.
Evaluate
lim_{x→0} (cos 2x − 1)/x².
Evaluate
lim_{x→0} (tan x − sin x)/x³.
Homework
Special offer: if you do the entire problem set, with one problem omitted, in
LaTeX and hand in a printout of the PDF file, you will receive full credit for the
omitted problem.
1. Ross, exercises 19.2(b) and 19.2(c). Be sure that you prove uniform continuity, not just continuity!
2. Ross, exercise 19.4.
3. Ross, exercises 20.16 and 20.17. This squeeze lemma is a cornerstone of
elementary calculus, and it is nice to be able to prove it!
4. Ross, exercise 20.18. Be sure to indicate where you are using various limit
theorems.
5. Ross, exercise 17.4. It is crucial that the value of δ is allowed to depend on
x.
6. Ross, exercises 17-13a and 17-14. These functions will be of interest when
we come to the topic of integration in the spring term.
7. Ross, exercise 18-4. To show that something exists, describe a way to
construct it.
8. Ross, exercise 18-10. You may use the intermediate-value theorem to prove
the result.
20
Look at example 1 of section 31.4, where the familiar Taylor series for the
exponential function and the sine function are derived. By looking at the
corollary at the start of the section and the theorem that precedes it, figure
out the importance of the statement the derivatives are bounded.
Skim the proof of the binomial theorem in Section 31.7. Notice that it is
not sufficient just to crank out derivatives and get the Taylor series. We
will need to prove that, for any |x| < 1, the series for (1 + x)^α converges
to the function, and this requires a different form of the remainder. Look
at Corollary 31.6 and Corollary 31.4 and figure out which relies on the
mean-value theorem and which relies on integration by parts.
Proofs to present in section or to a classmate who has done them.
8.1 Suppose that f is a one-to-one continuous function on an open interval I
(either strictly increasing or strictly decreasing). Let open interval J = f(I),
and define the inverse function f⁻¹: J → I for which
(f⁻¹ ∘ f)(x) = x for x ∈ I; (f ∘ f⁻¹)(y) = y for y ∈ J.
Use the chain rule to prove that if f⁻¹ is differentiable at y₀ = f(x₀),
then
(f⁻¹)′(y₀) = 1/f′(x₀).
Let g = f⁻¹; it has already been shown that g is continuous at y₀.
Prove that, if f is differentiable at x₀, then
lim_{y→y₀} (g(y) − g(y₀))/(y − y₀) = 1/f′(x₀).
8.2 Taylor's Theorem with remainder: Let f be defined on (a, b) with a <
0 < b. Suppose that the nth derivative f⁽ⁿ⁾ exists on (a, b).
Define the remainder
R_n(x) = f(x) − ∑_{k=0}^{n−1} (f⁽ᵏ⁾(0)/k!) xᵏ.
Prove, by repeated use of Rolle's theorem, that for each x ≠ 0 in (a, b),
there is some y between 0 and x for which
R_n(x) = (f⁽ⁿ⁾(y)/n!) xⁿ.
(f(b) − f(a))/(b − a)
8.4 (Ross, p. 228, The Chain Rule, easy special case) Assume the following:
Function f is differentiable at a.
Function g is differentiable at f(a).
There is an open interval J containing a on which f is defined and
f(x) ≠ f(a) for x ≠ a (without this restriction, you need the messy Case 2 on
page 229).
Function g is defined on the open interval I = f(J), which contains
f(a).
Using the sequential definition of a limit, prove that the composite function
g ∘ f is defined on J and differentiable at a and that
(g ∘ f)′(a) = g′(f(a)) · f′(a).
R Scripts
Script 2.4A-Taylor Series.R
Topic 1 - Convergence of the Taylor series for the cosine function
Topic 2 - A function that is not the sum of its Taylor series
Topic 3 - Illustrating Rosss proof of Taylor series with remainder.
Script2.4B-LHospital.R Topic 1 - Illustration of proof 6 from Week 8
Script 2.4C-SampleProblems.R
Executive Summary
1.1
xa
1.2
The terminology is the same as what we used for sequences. It applies to functions
whether or not they are differentiable or even continuous.
A function f is strictly increasing on an interval I if x₁, x₂ ∈ I and
x₁ < x₂ ⟹ f(x₁) < f(x₂).
A function f is strictly decreasing on an interval I if x₁, x₂ ∈ I and
x₁ < x₂ ⟹ f(x₁) > f(x₂).
A function f is increasing on an interval I if x₁, x₂ ∈ I and
x₁ < x₂ ⟹ f(x₁) ≤ f(x₂).
A function f is decreasing on an interval I if x₁, x₂ ∈ I and
x₁ < x₂ ⟹ f(x₁) ≥ f(x₂).
1.3
These justify our procedures when we are searching for the critical points of a
given function. They are the main properties we draw on when reasoning about
a function's behavior.
If f is defined on an open interval, achieves its maximum or minimum at
some x₀, and is differentiable there, then f′(x₀) = 0.
Rolle's Theorem. If f is continuous on some interval [a, b] and differentiable
on (a, b) with f(a) = f(b), then there exists at least one x ∈ (a, b) such
that f′(x) = 0.
Mean Value Theorem. If f is continuous on some interval [a, b] and differentiable on (a, b), then there exists at least one x ∈ (a, b) such that
f′(x) = (f(b) − f(a))/(b − a).
1.4
(f⁻¹)′(y) = 1/f′(f⁻¹(y))
(arctan)′(y) = 1/(tan)′(arctan y) = 1/sec²(arctan y) = 1/(1 + tan²(arctan y)) = 1/(1 + y²)
1.5
1.6 L'Hospital's rule
Then
lim_{x→a+} f(x)/g(x) = L.
|f(x)/g(x) − L| < ε.
1.7
Taylor series
∑_{k=0}^∞ (f⁽ᵏ⁾(0)/k!) xᵏ
eˣ = 1 + x + x²/2! + x³/3! + ⋯
cos x = 1 − x²/2! + x⁴/4! − ⋯
Taylor's theorem with remainder, version 2
The fundamental theorem of calculus says that f(x) − f(0) = ∫₀ˣ f′(t) dt.
The generalization is that
f(x) − f(0) − f′(0)x − (f″(0)/2!)x² − ⋯ − (f⁽ⁿ⁻¹⁾(0)/(n−1)!)xⁿ⁻¹ = ∫₀ˣ ((x − t)ⁿ⁻¹/(n−1)!) f⁽ⁿ⁾(t) dt.
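An R sketch of the cosine series in action (cf. Script 2.4A, Topic 1); x = 2 is an arbitrary test point:
  # Partial sums of the Taylor series for cos x, with their actual errors;
  # the errors shrink rapidly, consistent with the remainder formulas above.
  x <- 2
  N <- 1:10
  partial <- sapply(N, function(m) sum((-1)^(0:(m-1)) * x^(2*(0:(m-1))) / factorial(2*(0:(m-1)))))
  cbind(terms = N, partial_sum = partial, error = abs(partial - cos(x)))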
Lecture Outline
1. (Ross, p.226, Sum and Product Rule for Derivatives)
Consider two functions f and g. Prove that if both functions are differentiable at some point a, then both (f + g) and f g are differentiable at a as
well, and:
(f + g)′(a) = f′(a) + g′(a)
(f g)′(a) = f(a)g′(a) + f′(a)g(a)
2. (Ross, p. 228, The Chain Rule, easy special case) Assume the following:
Function f is differentiable at a.
Function g is differentiable at f(a).
There is an open interval J containing a on which f is defined and
f(x) ≠ f(a) for x ≠ a (without this restriction, you need the messy Case 2 on
page 229).
Function g is defined on the open interval I = f(J), which contains
f(a).
Using the sequential definition of a limit, prove that the composite function
g ∘ f is defined on J and differentiable at a and that
(g ∘ f)′(a) = g′(f(a)) · f′(a).
3. The derivative at a maximum or minimum (Ross, page 232)
Prove that if f is defined on an open interval containing x₀, if f has its
maximum or minimum at x₀, and if f is differentiable at x₀, then f′(x₀) = 0.
4. (Ross, pp. 233-234, Rolle's Theorem and the Mean Value Theorem)
Prove Rolle's Theorem: if f is a continuous function on [a, b] that is differentiable on (a, b) and satisfies f(a) = f(b), then there exists at least one x
in (a, b) such that f′(x) = 0.
Using Rolle's Theorem, prove the Mean Value Theorem: if f is a continuous
function on [a, b] that is differentiable on (a, b), then there exists at least
one x in (a, b) such that
f′(x) = (f(b) − f(a))/(b − a).
5. (Ross, theorem 29.9 on pages 237-238, with the algebra done in reverse
order)
Suppose that f is a one-to-one continuous function on an open interval I (either
strictly increasing or strictly decreasing). Let open interval J = f(I), and
define the inverse function f⁻¹: J → I for which
(f⁻¹ ∘ f)(x) = x for x ∈ I; (f ∘ f⁻¹)(y) = y for y ∈ J.
Use the chain rule to prove that if f⁻¹ is differentiable at y₀ = f(x₀),
then
(f⁻¹)′(y₀) = 1/f′(x₀).
Let g = f⁻¹; it has already been shown that g is continuous at y₀.
Prove that
lim_{y→y₀} (g(y) − g(y₀))/(y − y₀) = 1/f′(x₀).
6. (L'Hospital's Rule; based on Ross, 30.2, but simplified to one special case)
Suppose that f and g are differentiable functions and that
lim_{z→a+} f′(z)/g′(z) = L; f(a) = 0, g(a) = 0; g′(a) > 0.
Choose x > a so that for a < z ≤ x, g(z) > 0 and g′(z) > 0.
(You do not have to prove that this can always be done!)
By applying Rolle's Theorem to h(z) = f(z)g(x) − g(z)f(x),
prove that
lim_{x→a+} f(x)/g(x) = L.
R_n(x) = f(x) − ∑_{k=0}^{n−1} (f⁽ᵏ⁾(0)/k!) xᵏ.
Prove, by repeated use of Rolle's theorem, that for each x ≠ 0 in (a, b),
there is some y between 0 and x for which
R_n(x) = (f⁽ⁿ⁾(y)/n!) xⁿ.
L(y) = ∫₁ʸ (1/t) dt.
Prove from this definition the following properties of the natural logarithm:
L′(y) = 1/y for y ∈ (0, ∞).
9. Calculating derivatives
Let f(x) = ∛x.
(a) Calculate f′(x) using the definition of the derivative.
(b) Calculate f′(x) by applying the chain rule to (f(x))³ = x.
(b) Evaluate
lim_{x→0} (x eˣ − sin x)/x².
sinh x = x + x³/3! + x⁵/5! + ⋯ ;  cosh x = 1 + x²/2! + x⁴/4! + ⋯
17
Group Problems
1. Proving differentiation rules
(a) Trig functions
Prove that (sin x)′ = cos x from scratch, using the fact that
lim_{x→0} (sin x)/x = 1.
Let f(x) = csc x, so that sin x · f(x) = 1. Use the product rule to
prove that
(csc x)′ = −csc x cot x.
(b) Integer exponents
Positive: use induction and the product rule to prove that for all
positive integers n,
(xⁿ)′ = n xⁿ⁻¹.
Hint: start with a base case of n = 1.
Negative: let f(x) = x⁻ⁿ, so that xⁿ f(x) = 1. Use the product
rule to prove that for all positive integers n,
(x⁻ⁿ)′ = −n x⁻ⁿ⁻¹.
(c) Non-integer exponents
Rational exponent: let f(x) = x^{m/n}, so that (f(x))ⁿ = xᵐ.
Prove that
f′(x) = (m/n) x^{m/n − 1}.
Irrational exponent:
Let p be any real number and define f(x) = xᵖ = E(pL(x)).
Prove that f′(x) = p xᵖ⁻¹.
18
csc x cot x
.
x
It takes a little bit of algebraic work to rewrite this in a form to
which LHospitals rule can be applied.
lim
x0
19
3. Taylor series
(a) Using the Taylor series for the trig functions
Define functions S(x) and C(x) by the power series
S(x) = x − x³/3! + x⁵/5! − ⋯ ;  C(x) = 1 − x²/2! + x⁴/4! − ⋯
Calculate S′(x) and C′(x), and prove that S²(x) + C²(x) = 1.
Use Taylor's theorem to prove that
C(a + x) = C(a)C(x) − S(a)S(x).
1 − 1/2 + 1/3 − 1/4 + 1/5 − ⋯
20
Homework
Again, if you do the entire assignment in TeX, you may omit one problem and
receive full credit for it.
1. Ross, 28.2
2. Ross, 28.8
3. Ross, 29.12
4. Ross, 29.18
5. Ross, exercises 30-1(d) and 30-2(d). Do these in two ways: once by using
L'Hospital's rule, once by replacing each function by the first two or three
terms of its Taylor series.
6. Ross, 30-4. Use the result to convert exercise 30-5(a) into a problem that
involves a limit as y .
7. One way to define the exponential function is as the sum of its Taylor series:
eˣ = 1 + x + x²/2! + x³/3! + ⋯.
21
|∑_{i=1}^∞ a⃗ᵢ| ≤ ∑_{i=1}^∞ |a⃗ᵢ|
under the assumption that the infinite series on the right is convergent,
which in turn implies that the infinite series of vectors on the left is convergent.
R Scripts
Script 3.1A-FiniteTopology.R
Topic 1 - The standard Web site graph, used in notes and examples
Topic 2 - Drawing a random graph to create a different topology on the
same set
Script 3.1B-SequencesSeriesRn.R
Topic 1 - A convergent sequence of points in R2
Topic 2 - A convergent infinite series of vectors
Topic 3 - A convergent geometric series of matrices
Script 3.1C-DiffEquations.R
Topic 1 - Two real eigenvalues
Topic 2 - A repeated real eigenvalue
Topic 3 - Complex conjugate eigenvalues
Executive Summary
1.1
Axioms of Topology
In topology, we start with a set X and single out some of its subsets as open
sets. The only requirement on a topology is that the collection of open sets
satisfies the following rules (axioms):
The empty set and the set X are both open.
The union of any finite or infinite collection of open sets is open.
The intersection of two open sets is open. It follows by induction that the
intersection of n open sets is open, but the intersection of infinitely many
open sets is not necessarily open.
1.2
A model for a set of axioms is a set of real-world objects that satisfy the axioms.
Consider a Web site of six pages, linked together as follows:
In this model, an open set is defined by the property that no page in the
set can be reached by a link from outside the set. We need to show that this
definition is consistent with the axioms for open sets.
The empty set is open. Since it contains no pages, it contains no page that can
be reached by an outside link.
The set X of all six pages is open, because there is no other page on the site
from which an outside link could come.
If sets A and B are open, no page in either can be reached by an outside link,
and so their union is also open.
If sets A and B are open, so is their intersection A B. Proof by contraposition:
Suppose that A B is not open. Then it contains a page that can be reached
by an outside link. If that link comes from A, then B is not open. If that link
comes from B, then A is not open. If that link comes from outside both A and
B, then both A and B are not open.
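A small R sketch of this model (using the matrix convention from the group problems, where T[i, j] = 1 means page j links to page i; the 3-page example is invented):
  # A set S of pages is open when no link enters S from a page outside S.
  is_open <- function(Tmat, S) {
    outside <- setdiff(seq_len(ncol(Tmat)), S)
    all(Tmat[S, outside, drop = FALSE] == 0)
  }
  Tmat <- matrix(0, 3, 3)
  Tmat[2, 1] <- 1           # page 1 links to page 2
  is_open(Tmat, 2)          # FALSE: page 2 can be reached from outside {2}
  is_open(Tmat, c(1, 2))    # TRUE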
4
1.3
Topology in R and Rn
The usual way to introduce a topology for the set R is to decree that any open
interval is an open set and so is the empty set. Equivalently, we can decree that
the set of points for which |x − x₀| < ε, with ε > 0, is an open set. Notice that the
infinite intersection of the open sets (−1/n, 1/n) is the single point 0, a closed
set!
The usual way to introduce a topology for the set Rⁿ is to decree that any
open ball, the set of points for which |x − x₀| < ε, with ε > 0, is an open set.
1.4
These definitions are intuitively reasonable for R and Rn , but they also apply to
the Web-site finite topology,
Closed sets
A closed set A is one whose complement Ac = X A is open. Careful:
this is different from one that is not open. There are lots of sets that are
neither open nor closed, and there are sets that are both open and closed.
A neighborhood of a point is any set that has as a subset an open set
containing the point. A neighborhood does not have to be open.
The closure of a set A ⊆ Rⁿ, denoted Ā, is the smallest closed set that
contains A, i.e. the intersection of all the closed sets that contain A.
The interior of a set A ⊆ Rⁿ, denoted Å, is the largest open set that is
contained in A, i.e. the union of all the open subsets of A.
The boundary of A, denoted ∂A, is the set of all points x with the property
that any neighborhood of x includes points of A and also includes points
of the complement Aᶜ.
The boundary of A is the difference between the closure of A and its interior.
1.5
1.6
1.7
We need something that can be made less than ε. For vectors the familiar
length is just fine. The infinite triangle inequality (proof 9.2) states that
|∑_{i=1}^∞ a⃗ᵢ| ≤ ∑_{i=1}^∞ |a⃗ᵢ|.
|∑_{r=0}^∞ (At)ʳ/r!| ≤ ∑_{r=0}^∞ (|A|t)ʳ/r!,  or  |exp(At)| ≤ exp(|A|t).
1.8
If D = [b 0; 0 c], then Dt = [bt 0; 0 ct], and
exp(Dt) = [1 0; 0 1] + [bt 0; 0 ct] + (1/2)[(bt)² 0; 0 (ct)²] + ⋯ = [e^{bt} 0; 0 e^{ct}].
If there is a basis of eigenvectors for A,
then A = PDP⁻¹, Aʳ = PDʳP⁻¹, and exp(At) = P exp(Dt) P⁻¹.
Replace D by a conformal matrix C = aI + bJ where J² = −I, and
exp(Ct) = exp(aIt) exp(bJt) can be expressed in terms of sin bt and cos bt.
If A = bI + N and N² = 0, exp(At) = e^{bt} exp(Nt) = e^{bt}(I + Nt).
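An R sketch comparing two ways of computing exp(At) (cf. Script 3.1C); the matrix A is an assumed example with a basis of eigenvectors:
  A <- matrix(c(1, 2, 2, 1), 2, 2)
  t <- 0.5
  e <- eigen(A)
  P <- e$vectors
  expAt_eigen <- P %*% diag(exp(e$values * t)) %*% solve(P)
  # Truncated power series sum_{r=0}^{30} (At)^r / r!
  expAt_series <- diag(2); term <- diag(2)
  for (r in 1:30) {
    term <- term %*% (A * t) / r
    expAt_series <- expAt_series + term
  }
  max(abs(expAt_eigen - expAt_series))    # essentially zero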
1.9
exp(At) = ∑_{r=0}^∞ Aʳtʳ/r!
(d/dt) exp(At) = ∑_{r=1}^∞ r Aʳtʳ⁻¹/r!.
Set s = r − 1.
(d/dt) exp(At) = ∑_{s=0}^∞ A^{s+1}tˢ/s! = A ∑_{s=0}^∞ Aˢtˢ/s! = A exp(At).
So
(d/dt) v⃗ = (d/dt) exp(At) v⃗₀ = A exp(At) v⃗₀ = A v⃗.
7
Lecture outline
1. Proof 9.1
Define Hausdorff space, and prove that in a Hausdorff space the
limit of a sequence is unique.
Prove that Rn , with the topology defined by open balls, is a Hausdorff
space.
2. Convergent sequences in Rn :
A sequence a₁, a₂, ... in Rⁿ converges to the limit a if
∀ε > 0, ∃M such that if m > M, |a_m − a| < ε.
Prove that the sequence converges if and only if the sequences of coordinates
all converge.
Then state and prove the corresponding result for infinite series of vectors
in Rn
3. Proof 9.2
Starting from the triangle inequality for two vectors, prove the triangle
inequality for n vectors, then prove the infinite triangle inequality for Rn
|∑_{i=1}^∞ a⃗ᵢ| ≤ ∑_{i=1}^∞ |a⃗ᵢ|,
under the assumption that the infinite series on the right is convergent,
which in turn implies that the infinite series of vectors on the left is convergent.
4. Prove that if every element of the convergent sequence (xn ) is in the closed
subset C Rn , then the limit x0 of the sequence is also in C.
5. Proof of inequalities involving matrix length
The length of a matrix is calculated by treating it as a vector: take the
square root of the sum of the squares of all the entries.
If matrix A consists of a single row, then |Ab⃗| ≤ |A||b⃗| is just the Cauchy-Schwarz inequality.
Prove the following:
|Ab⃗| ≤ |A||b⃗| when A is an m × n matrix.
|AB| ≤ |A||B|.
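A quick R check of these inequalities with random matrices (an illustration only; the proofs are still required):
  # Matrix length: square root of the sum of the squares of the entries.
  mat_len <- function(M) sqrt(sum(M^2))
  set.seed(1)
  A <- matrix(rnorm(6), 2, 3); B <- matrix(rnorm(12), 3, 4); b <- rnorm(3)
  mat_len(A %*% b) <= mat_len(A) * mat_len(b)    # TRUE
  mat_len(A %*% B) <= mat_len(A) * mat_len(B)    # TRUE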
Find all the other sets that must be open because of the union axiom
and the axiom that set X is open.
We now have the smallest collection of open sets that satisfies the axioms and includes the subbasis. A closed set is one whose complement
is open. List all the closed sets.
What is the smallest legal collection of open sets in the general case?
What is the largest legal collection of open sets in the general case?
7. Web site topology. A set of pages is open if there are no incoming links
from elsewhere on the site. A set of pages is closed if no outgoing link
leads to a page outside the set (i.e. if the complement is an open set.)
10
8. The open ball definition of an open set satisfies the axioms of topology.
A set U ⊆ Rⁿ is open if ∀x ∈ U, ∃r > 0 such that the open ball B_r(x) ⊆ U.
Prove that the empty set is open.
11
Let A =
0
12
(I A)1 = I + A + A2 + ....
1
1
4 0
2 , A2 =
.
0
0 41
12
can be written v⃗′ = A v⃗, where A = [3 1; −1 1].
Our standard technique leads to p(t) = t² − 4t + 4 = (t − 2)², so there is
only one eigenvalue.
Let N = A − 2I = [1 1; −1 −1].
We have found that p(A) = A² − 4A + 4I = (A − 2I)² = 0, so N² = 0.
Since matrices 2I and N commute, exp(At) = exp(2It) exp(Nt).
Show that exp(At) = e^{2t}(I + Nt), and confirm that (exp At)e⃗₁ is a solution
to the differential equation.
12. Solving the harmonic oscillator differential equation (if time permits)
Applying Newton's second law of motion to a mass of 1 attached to a spring
with spring constant 4 leads to the differential equation
x″ = −4x.
Solve this equation by using matrices for the case where x(0) = 1, v(0) = 0.
The trick is to consider a vector
w⃗ = [x(t); v(t)], where v = x′.
15
Group Problems
1. Topology
(a) We can use the same conventions as for the ferryboat graph of week
1. Column j shows the links going out of page j. If Ti,j = 1, there is
a link from page j to page i. If Ti,j = 0, there is no link from page j
to page i.
      0 1 0 0 0 0
      1 0 0 0 0 0
      0 1 0 1 0 0
T =   0 0 0 0 0 0
      0 0 0 1 0 0
      0 1 0 1 0 0
Draw the Web site graph that this matrix represents.
i. Open sets include {12} and {4}. List all the other open sets and
all the closed sets.
ii. Determine the interior, closure, and boundary of {123}.
iii. Determine to what point or points (if any) the sequence
(1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 4, 6, 4, 6, 4, 6 ) converges.
(b) Recall the axioms of topology, which refer only to open sets:
The empty set and the set X are both open.
The union of any collection of open sets is open.
The intersection of two open sets is open.
A closed set C is defined as a set whose complement C c is open.
You may use the following well-known properties of set complements,
sometimes called De Morgan's Laws:
(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ,  (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.
i. Prove directly from the axioms of topology that the union of two
closed sets is closed.
ii. In the Web site topology, a closed set of pages is one that has
no outgoing links to other pages on the site. Prove that in this
model, the union of two closed sets is closed.
iii. Prove that if A and B are closed subsets of R2 (with the topology
specified by open balls), their union is also closed.
(c) Subsets of R
i. Let A = {0} ∪ (1, 2]. Determine Aᶜ, Ā, Å, and ∂A.
ii. What interval is equal to ⋃_{n=2}^∞ [−1 + 1/n, 1 − 1/n]? Is it a problem
that this union of closed sets is not a closed set?
iii. Let Q₁ denote the set of rational numbers in the interval (−1, 1).
Determine the closure, interior, and boundary of this set.
16
2. Convergence in Rn
(a) The sequence a1 , a2 , ... in Rn converges to a.
The sequence b1 , b2 , ... in Rn converges to b.
Define cn = an + bn , c = a + b.
Prove that the sequence c1 , c2 , ... in Rn converges to c. Use the triangle
inequality for vectors: the proof strategy is similar to the one that you
learned for sequences of real numbers.
(b) Suppose that the sequence a₁, a₂, ... in Rⁿ converges to 0, and the sequence of real numbers k₁, k₂, ..., although not necessarily convergent,
is bounded: ∃K > 0 such that ∀n ∈ N, |k_n| < K.
Prove that the sequence k₁a₁, k₂a₂, ... in Rⁿ converges to 0.
(c) Prove that if J = [0 −1; 1 0], then exp(Jt) = I cos t + J sin t. Show that
this is consistent with the Taylor series for e^{it}.
17
3. Differential equations
(a) The original patriarchal differential equation problem
Isaac has established large flocks of sheep for his sons Jacob and Esau.
Anticipating sibling rivalry, he has arranged that the majority of the
growth of each son's flock will come from lambs born to the other son.
So, if x(t) denotes the total weight of all of Jacob's sheep and y(t)
denotes the total weight of all of Esau's sheep, the time evolution of
the weight of the flocks is given by the differential equations
x′ = x + 2y
y′ = 2x + y
i. Calculate exp(At), where A = [1 2; 2 1].
ii. Show that if the flocks are equal in size, they will remain that way.
What has this got to do with the eigenvectors of A?
iii. Suppose that when t = 0, the weight of Jacobs flock is S while the
weight of Esaus flock is 2S. Find formulas for the sizes as functions of time, and show that the flocks will become more nearly
equal in weight as time passes.
3 1
18
Homework
1. Suppose that you want to construct a Web site of six pages numbered 1
through 6, where the open sets of pages, defined as in lecture, include {126},
{124}, and {56}.
(a) Prove that in the Web site model of finite topology, the intersection
of two open sets is open.
(b) What other sets must be open in order for the family of open sets to
satisfy the intersection axiom?
(c) What other sets must be open in order for the family of open sets to
satisfy the union axiom?
(d) List the smallest family of open sets that includes the three given sets
and satisfies all three axioms. (You have already found all these sets!)
(e) Draw a diagram showing how six Web pages can be linked together so
that only the sets in this family are open. This is tricky. First deal with
5 and 6. Then deal with 1 and 2. Then incorporate 4 into the network,
and finally 3. There are many correct answers since, for example, if
page 1 links to page 2 and page 2 links to page 3, then adding a direct
link from page 1 to page 3 does not change the topology.
2. In R², in addition to defining an open ball B_r around x, we can define an
open diamond D_r around x by
D_r(x) = {y ∈ R² such that |x₁ − y₁| + |x₂ − y₂| < r}
and we can define an open square S_r around x by
S_r(x) = {y ∈ R² such that max(|x₁ − y₁|, |x₂ − y₂|) < r}.
3
(a) For x =
, r = 1, make a sketch showing B1 (x), D1 (x), and S1 (x).
2
(b) Suppose that, in Hubbard definition 1.5.2, you replace open ball by
open diamond or open square. Prove that the topology remains
the same: i.e. that an open set according to one definition is an open
set according to either of the others.
(c) (Optional) Show that if, instead of two-component vectors, you use
infinite sequences, there is an open square of radius 1 centered on the
zero vector that is not contained in any open ball and an open ball of
radius 1 that is not contained in any open diamond. You can learn
more about infinite-dimensional vector spaces by taking Math 110,
Math 116, or Physics 143.
5. The differential equation x″ = −3x′ − 2x describes the motion of an overdamped oscillator. The acceleration x″ is the result of the sum of a force
proportional to x′, supplied by a shock absorber, and a force proportional
to x, supplied by a spring.
(a) Introduce v = x′ as a new variable, and define the vector w⃗ = [x; v].
Find a matrix A such that w⃗′ = A w⃗.
(b) Calculate the matrix exp(At).
(c) Graph x(t) for the following three sets of initial values that specify
position and velocity when t = 0:
Release from rest: w⃗₀ = [1; 0].
Quick shove: w⃗₀ = [0; 1].
Push toward the origin: w⃗₀ = [1; −3].
21
6. Suppose that S is a matrix of the form S = [a b; b a]. Prove that
exp(St) = exp(at) [cosh(bt) sinh(bt); sinh(bt) cosh(bt)].
Then use this result to solve
x′ = x + 2y
y′ = 2x + y
without having to diagonalize the matrix S.
1 9
7. Let B =
. Show that there is only one eigenvalue and find an
1 5
eigenvector for it. Then show that N = B I is nilpotent.
(a) By writing B = I + N , calculate B 2 .
(b) By writing B = I + N , solve the system of equations
x = x + 9y
y = x + 5y
x
for arbitrary initial conditions ~v0 = 0 .
y0
7 10
8. Week 4, sample problem 6, showed how to write A =
in the form
2 1
3 2
1 2
A = P CP 1 , where C =
is conformal and P
2 3
0 1
22
10.2 Using the Bolzano-Weierstrass theorem, prove that a continuous realvalued function f defined on a compact subset C Rn has a supremum M
and that there is a point a C (a maximum) where f (a) = M .
R Scripts
Script3.2A-LimitFunctionR2.R
Topic 1 - Sequences that converge to the origin
Topic 2 - Evaluating functions along these sequences
Script 3.2B-AffineApproximation.R
Topic 1 - The tangent-line approximation for a single variable
Topic 2 - Displaying a contour plot for a function
Topic 3 - The gradient as a vector field
Topic 4 - Plotting some pathological functions
Executive Summary
1.1
Limits in Rn
xx0
1.2
Function f is continuous at x0 if, for any open set U in the codomain that
contains f (x0 ), the preimage (inverse image) of U , i.e. the set of points x
in the domain for which f (x) U , is also an open set.
Here is the definition that lets us extend real analysis to n dimensions.
f : Rn Rm is continuous at x0 if, for any open codomain ball of radius
centered on f (x0 ), we can find an open domain ball of radius centered
on x0 such that if x is in the domain ball, f (x) is in the codomain ball.
An equivalent condition (your proof 10.1):
f is continuous at x0 if and only if every sequence that converges to x0 is a
good sequence. We will need to prove this for f : Rn Rm , but the proof
is almost identical to the proof for f : R R, which we have already done.
As was the case in R, sums, products, compositions, etc. of continuous
functions are continuous. If you can write a formula for a function of
several variables that does not appear to involve division by zero, the theorems
on pages 98 and 99 will show that it is continuous.
To show that a function is discontinuous, construct a bad sequence!
1.3
1.4
Xk 6= .
k=1
1
)
k
1.5
The Heine-Borel theorem states that for a compact subset X Rn , any open
cover contains a finite subcover. In other words, if someone gives you a possibly
infinite collection of open sets Ui whose union includes every point in X, you can
select a finite number of them whose union still includes every point in X:
X ⊆ ⋃_{i=1}^m Uᵢ.
The proof (Hubbard, Appendix A.3) uses the nested compact set theorem.
In general topology, where the sets that are considered are not necessarily
subsets of Rn , the statement every open cover contains a finite subcover is
used as the definition of compact set.
1.6
Partial derivatives
If f is a function of the n variables x₁, ..., x_n,
then its partial derivative with respect to the ith variable is
∂f/∂xᵢ (a) = Dᵢf(a) = lim_{h→0} (1/h) (f(a₁, ..., aᵢ + h, ..., a_n) − f(a₁, ..., aᵢ, ..., a_n)).
This does not give the generalization we want. It specifies a good approximation to f only along a line through a, whereas we would like an approximation
that is good in a ball around a.
5
1.7
lim_{h→0} (f(a + h e⃗ᵢ) − f(a))/h = Dᵢf(a)
grad f(a) = (D₁f(a), D₂f(a), ..., D_nf(a)),
so that
∇_v⃗ f(a) = grad f(a) · v⃗.
We now have, for differentiable functions (and we will soon prove that if
the partial derivatives of f are continuous, then f is differentiable), a useful
generalization of the tangent-line approximation of single-variable calculus:
f(a + hv⃗) ≈ f(a) + [Jf(a)](hv⃗).
This sort of approximation (a constant plus a linear approximation) is called
an affine approximation.
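An R sketch of the affine approximation (cf. Script 3.2B); the function f(x, y) = x²y and the point are assumed examples:
  f  <- function(x, y) x^2 * y
  a  <- c(2, 3)
  Jf <- c(2 * a[1] * a[2], a[1]^2)    # (D1 f, D2 f) evaluated at a
  h  <- c(0.01, -0.02)
  exact  <- f(a[1] + h[1], a[2] + h[2])
  affine <- f(a[1], a[2]) + sum(Jf * h)
  c(exact = exact, affine = affine)   # agree to about three decimal places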
6
Lecture outline
1. Given that function f : Rk Rm is continuous at x0 , prove that every
sequence such that xn x0 is a good sequence in the sense that f (xn )
converges to f (x0 ). (This is half of proof 10.1.)
Xk 6= .
k=1
10
m
[
Ui .
i=1
Break up the part of the city where your property lies into closed squares,
each 1 kilometer on a side. There will exist a square B0 that needs infinitely
many guards (the infinite pigeonhole principle).
Break up this square into 4 closed subsquares: again, at least one will need
infinitely many guards. Choose one subsquare and call it B1 . Continue this
procedure to get a decreasing sequence Bi of nested compact sets, whose
intersection includes a point a.
Now show that any guard whose open patrol zone includes a can replace
all but a finite number of other guards.
11
6. Cauchy sequences in Rⁿ
Prove that every Cauchy sequence of vectors a⃗₁, a⃗₂, ... ∈ Rⁿ is bounded:
i.e. ∃M such that ∀n, |a⃗_n| < M.
Hint: a⃗_n = a⃗_n − a⃗_m + a⃗_m. When showing that a sequence is bounded,
you can ignore the first N terms.
Prove that if a sequence a₁, a₂, ... ∈ Rⁿ converges to a, it is a Cauchy
sequence. Hint: a_m − a_n = a_m − a + a − a_n. Use the triangle inequality.
Prove that every convergent sequence of vectors a⃗₁, a⃗₂, ... ∈ Rⁿ is
bounded (very easy, given the preceding results.)
F(x, y, z) = ⋯/(x² + y² + z²),  F(0, 0, 0) = 0.
Prove that F is continuous at the origin.
(b) Define
g(x, y, z) = (xy + xz + yz)/(x² + y² + z²),  g(0, 0, 0) = 0.
Prove that g is discontinuous at the origin.
11. Let f(x, y) = √(xy³).
Evaluate the Jacobian matrix of f at (4, 1) and use it to find the best affine
approximation to f((4, 1) + t(2, 1)) for small t.
By defining g(t) = f((4, 1) + t(2, 1)), you can convert this problem to one
in single-variable calculus. Show that using the tangent-line approximation
near t = 0 leads to exactly the same answer.
Group Problems
1. Theorems related to Bolzano-Weierstrass and Heine-Borel
(a) You are working for Heine-Borel Security and are bidding on a project
to guard the interior of one mile of Pennsylvania Avenue between the
Capitol and the White House, modeled as the open interval I = (0, 1).
Show that you can create a countably infinite set of disjoint open patrol
zones which cover only a subset of I, so that no finite subcover will be
possible. Then show that you cannot do the same with an uncountably
infinite set of disjoint open patrol zones. (Hint: each zone includes a
different rational number.)
(b) A school playground is a compact subset C R2 . Two aspiring quarterbacks are playing catch with a football, and they want to get as far
apart as possible. Show that if sup |x y| = D for any two points in
C, they can find a pair of points x0 and y0 such that |x0 y0 | = D.
Then invent simple examples to show that this cannot be done if the
playground is unbounded or is not closed.
(c) The converse of the Heine-Borel theorem states that if every open
cover of set X Rn contains a finite subcover, then X must be closed
and bounded.
i. By choosing as the open cover a set of open balls of radius 1, 2, ...,
prove that X must be bounded.
ii. To show that X is closed, show that its complement Xᶜ must be
open. Hint: choose any x₀ ∈ Xᶜ and choose an open cover of X
in which the kth set consists of points whose distance from x₀ is
greater than 1/k. This open cover of X must have a finite subcover.
If you need a further hint, look on pages 90 and 91 of Chapter 2 of
Ross.
19
(b) Let
f(x, y) = xy(x² − y²)/(x² + y²)²,  f(0, 0) = 0.
Invent a bad sequence of points (a₁, a₂, ...) that converges to (0, 0) for which
lim_{i→∞} f(aᵢ) ≠ 0.
This bad sequence proves that f is discontinuous at (0, 0).
(c) Let
g(x, y) = xy(x² − y²)/(x² + y²),  g(0, 0) = 0.
By introducing polar coordinates, prove that g is continuous at (0, 0).
Homework
1. A rewrite of Oetzi the Iceman, with lots of sign changes.
Joe the Plumber, who became a minor celebrity in the 2008 presidential
campaign, has hit the jackpot. Barack Obama enrolls him in a health plan,
formerly available only to members of Congress, that makes him immortal,
and gives him a special 401(k) that delivers $10K per month of tax-free
income. Joe retires to pursue his lifelong dream of camping at the lowest
spot in Death Valley.
Assume that Death Valley National Park is a closed set and that altitude
f (x) in the Park is a continuous function. Prove that the altitude in Death
Valley has a greatest lower bound (even though that is obvious on geographical grounds) and that there is a place where that lower bound is achieved,
so that Joe can achieve his goal.
2. You are the mayor of El Dorado. Not all the streets are paved with gold
only the interval [0,1] on Main Street but you still have a serious security
problem, and you ask Heine-Borel Security LLC to submit a proposal for
keeping the street safe at night. Knowing that the city coffers are full, they
come up with the following pricey plan for meeting your requirements by
using a countable infinity of guards:
Guard 0 patrols the interval (−1/N, 1/N), where you may choose any value
greater than 100 for the integer N. She is paid 200 dollars.
Guard 1 patrols the interval (0.4, 1.2) and is paid 100 dollars.
Guard 2 patrols the interval (0.2, 0.6) and is paid 90 dollars.
Guard 3 patrols the interval (0.1, 0.3) and is paid 81 dollars.
Guard k patrols the interval (0.8/2ᵏ, 2.4/2ᵏ) and is paid 100(0.9)ᵏ⁻¹ dollars.
(a) Calculate the total cost of hiring this infinite set of guards (sum a
geometric series).
(b) Show that the patrol regions of the guards form an open cover of
the interval [0,1].
(c) According to the Heine-Borel theorem, this infinite cover has a finite
subcover. Explain clearly how to construct it. (Hint: look at the proof
of the Heine-Borel theorem)
(d) Suppose that you want to protect only the open interval (0,1), which
is not a compact subset of Main Street. In what very simple way can
Heine-Borel Security modify their proposal so that you are forced to
hire infinitely many guards?
8. Let f(x, y) = x²y. Evaluate the Jacobian matrix of f at (2, 4), and use it
to find the best affine approximation to f(1.98, 4.06).
R Scripts
Script 3.3A-ComputingDerivatives.R
Topic 1 - Testing for differentiability
Topic 2 - Illustrating the derivative rules
Script 3.3B-NewtonsMethod.R
Topic 1 - Single variable
Topic 2 - 2 equations, 2 unknowns
Topic 3 - Three equations in three unknowns
Script 3.3C-InverseFunction.R
Topic 1 - A parametrization function and its inverse
Topic 2 - Visualizing coordinates by means of a contour plot
Topic 3 - An example that is economic, not geometric
Executive Summary
1.1
This definition leads to the standard rule for calculating the number m,
m = lim_{h→0} (f(a + h) − f(a))/h.
In that case, [Df (a)] is represented by the Jacobian matrix [Jf (a)].
Proof: Since L exists and is linear, it is sufficient to consider its action on
each standard basis vector. We choose h⃗ = t e⃗ᵢ so that |h⃗| = t. Knowing
that the limit exists, we can use any sequence that converges to the origin
to evaluate it, and so
lim_{t→0} (1/t)(f(a + t e⃗ᵢ) − f(a) − t L e⃗ᵢ) = 0  and  L(e⃗ᵢ) = lim_{t→0} (1/t)(f(a + t e⃗ᵢ) − f(a)).
What is hard is proving that f is differentiable, i.e. that L exists, since that
requires evaluating a limit where h⃗ → 0⃗. Eventually we will prove that f is
differentiable at a if all its partial derivatives are continuous there.
1.2
If f has component functions f₁, ..., f_n, then Df(a) is the matrix whose rows are Df₁(a), ..., Df_n(a).
f + g is the sum of two functions f and g, both differentiable at a.
The derivative of f + g is the sum of the derivatives of f and g. (easy to
prove)
f g is the product of scalar-valued function f and vector-valued g, both
differentiable. Then
[D(f g)(a)]~v = f (a)([Dg(a)]~v) + ([Df (a)]~v)g(a).
g/f is the quotient of vector-valued function g and scalar-valued f, both
differentiable, and f(a) ≠ 0. Then
[D(g/f)(a)]v⃗ = [Dg(a)]v⃗/f(a) − ([Df(a)]v⃗)g(a)/(f(a))².
U Rn and V Rm are open sets, and a is a point in U at which we want
to evaluate a derivative.
g : U V is differentiable at a, and [Dg(a)] is an m n Jacobian matrix.
f : V Rp is differentiable at g(a), and [Df (g(a))] is a p m Jacobian
matrix.
The chain rule states that [D(f g)(a))] = [Df (g(a))] [Dg(a)].
The combined effect of all these rules is that if a function is
defined by well-behaved formulas (no division by zero), it is differentiable,
and its derivative is represented by its Jacobian matrix.
1.3
For f(x) = f(x₁, ..., x_n) = (f₁(x), ..., f_m(x)),
the Jacobian matrix [Jf(x)] is made up of all the partial derivatives of f:
[Jf(a)] = [D₁f₁(a) ... D_nf₁(a); ... ; D₁f_m(a) ... D_nf_m(a)].
We can invent pathological cases where the Jacobian matrix of f exists
(because all the partial derivatives exist), but the function f is not differentiable. In such a case, using the formula
~v f (a) = [Jf (a)]~v
generally gives the wrong answer for the directional derivative! You are
trying to use a linear approximation where none exists.
Using the Jacobian matrix of partial derivatives to get a good affine approximation for f (a + ~h) is tantamount to assuming that you can reach the
point a + ~h by moving along lines that are parallel to the coordinate axes
and that the change in the function value along the solid horizontal line is
well approximated by the change along the dotted horizontal line. With
the aid of the mean value theorem, you can show that this is the case if
(proof 11.2) the partial derivatives of f at a are continuous.
[Diagram: a rectangle with corners (a₁, a₂), (a₁ + h₁, a₂), (a₁, a₂ + h₂), and (a₁ + h₁, a₂ + h₂).]
1.4
1.5
Iterating this procedure is the best-known method for solving systems of nonlinear equations. Hubbard has a detailed discussion (which you are free to ignore) of how
to use Kantorovich's theorem to assess convergence.
1.6
For a function f: [a, b] → [c, d], we know that if f is strictly increasing or strictly
decreasing on the interval [a, b], there is an inverse function g for which g ∘ f and
f ∘ g are both the identity function. We can find g(y) for a specific y by solving
f(x) − y = 0, perhaps by Newton's method. If f(x₀) = y₀ and f′(x₀) ≠ 0, we
can prove that g is differentiable at y₀ and that g′(y₀) = 1/f′(x₀).
"Strictly monotone" does not generalize, but "nonzero f′(x₀)" generalizes to
"invertible [Df(x₀)]". Start with a function f: Rⁿ → Rⁿ whose partial derivatives
are all continuous, so that we know that it is differentiable everywhere. Choose
a point x₀ where the derivative [Df(x₀)] is an invertible matrix. Set y₀ = f(x₀).
Then there is a differentiable local inverse function g = f⁻¹ such that
g(y₀) = x₀,
f(g(y)) = y if y is close enough to y₀, and
[Dg(y)] = [Df(g(y))]⁻¹ (follows from the chain rule).
6
Lecture outline
1. (Proof 11.1)
Let U Rn be an open set, and let f and g be functions from U to R.
Prove that if f and g are differentiable at a then so is f g, and that
[D(f g)(a)] = f (a)[Dg(a)] + g(a)[Df (a)].
(This is simpler than the version in Hubbard because both f and g are
scalar-valued functions)
2. (Chain rule in R2 not a proof, but still pretty convincing)
U R2 and V R2 are open sets, and a is a point in U at which we want
to evaluate a derivative.
g : U V is differentiable at a, and [Dg(a)] is a 2 2 Jacobian matrix.
f : V R2 is differentiable at g(a), and [Df (g(a))] is a 2 2 Jacobian
matrix.
The chain rule states that [D(f g)(a))] = [Df (g(a))] [Dg(a)].
Draw a diagram to illustrate what happens when you use derivatives to find
a linear approximation to f g)(a))]. This can be done in a single step or
in two steps.
3. (Proof 11.2) Using the mean value theorem, prove that if a function f:
R² → R has partial derivatives D₁f and D₂f that are continuous at a, it is
differentiable at a and its derivative is the Jacobian matrix [D₁f(a) D₂f(a)].
4. Newton's method
(a) One variable: Function f is differentiable. You are trying to solve the
equation f(x) = 0, and you have found a value a₀, close to the desired
x, for which f(a₀) is small. Derive the formula a₁ = a₀ − f(a₀)/f′(a₀)
for an improved estimate.
(b) n variables: U is an open subset of Rⁿ, and function f⃗(x): U → Rⁿ is
differentiable. You are trying to solve the equation f⃗(x) = 0⃗,
and you have found a value a₀, close to the desired x, for which f⃗(a₀)
is small. Derive the formula
a₁ = a₀ − [Df⃗(a₀)]⁻¹ f⃗(a₀).
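An R sketch of the one-variable update rule (cf. Script 3.3B); the test equation x² − 2 = 0 is an assumed example:
  newton <- function(f, fprime, a0, steps = 8) {
    a <- a0
    for (k in 1:steps) a <- a - f(a) / fprime(a)
    a
  }
  newton(function(x) x^2 - 2, function(x) 2 * x, 1)   # converges to sqrt(2) = 1.41421...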
f(u, v) = (cos u cos v, cos u sin v, sin u).
Work out the Cartesian coordinates of the point with sin u = 3/5 (37 degrees
North latitude) and sin v = 1 (90 degrees East longitude), and calculate the
Jacobian matrix at that point. Then find the best affine approximation to
the Cartesian coordinates of the nearby point where u is 0.01 radians less
(going south) and v is 0.02 radians greater (going east).
x
3 6
2
2 y
0.03 0.36
19
(b) Use the derivative [Dg] =
to approximate g
0.55 0.6
10
15
Group Problems
1. Chain rule
(a) Chain rule for matrix functions
In sample problem 4, we obtained the differentiation formula for U(A) =
A⁻² by writing U = S ∘ T with S(A) = A², T(A) = A⁻¹. Prove
the same formula from the chain rule in a different way, by writing
U = T ∘ S. You may reuse the formulas for the derivatives of S and
T:
If S(A) = A² then [DS(A)](H) = AH + HA.
If T(A) = A⁻¹ then [DT(A)](H) = −A⁻¹HA⁻¹.
(b) Let U ⊆ R² be the set of points whose coordinates are both positive.
Suppose that f: U → R can be written f(x, y) = φ(y/x), for some
differentiable φ: R → R.
Show that f satisfies the partial differential equation
x D₁f + y D₂f = 0.
(c) Chain rule with 2 × 2 matrices
Start with a pair of polar coordinates (r, θ).
Function g converts them to Cartesian coordinates (x, y).
Function f then converts (x, y) to (2xy, x² − y²).
Confirm that [D(f ∘ g)(r, θ)] = [Df(g(r, θ))] [Dg(r, θ)].
16
2. Issues of differentiability
(a) Let
f(x, y) = x²y²/(x² + y²).
f is defined to be 0 at (0, 0). State, in terms of limits, what it means
to say that f is differentiable at (0, 0), and prove that its derivative
[Df(0, 0)] is the zero linear transformation.
(b) Suppose that A is a matrix and S is the cubing function given by the formula S(A) = A³. Prove that the derivative of S(A) is
[DS(A)](H) = A²H + AHA + HA².
The proof consists in showing that the length of the remainder goes to zero faster than the length of the matrix H.
(c) A continuous but non-differentiable function:
f(x, y) = x²y / (x² + y²),   f(0, 0) = 0.
i. Show that both partial derivatives vanish at the origin, so that the Jacobian matrix at the origin is the zero matrix [0 0], but that the directional derivative along (1, 1) is not zero. How does this calculation show that the function is not differentiable at the origin?
ii. For all points except the origin, the partial derivatives are given by the formulas
D1 f(x, y) = 2xy³ / (x² + y²)²,   D2 f(x, y) = (x⁴ − x²y²) / (x² + y²)².
Construct a bad sequence of points approaching the origin to show that D1 f is discontinuous at the origin.
Homework
1. (similar to group problem 1a)
We know the derivatives of the matrix-squaring function S and the matrix-inversion function T:
If S(A) = A² then [DS(A)](H) = AH + HA.
If T(A) = A⁻¹ then [DT(A)](H) = −A⁻¹HA⁻¹.
(a) Use the chain rule to find a formula for the derivative of the function U(A) = A⁴.
(b) Use the chain rule to find a formula for the derivative of the function W(A) = A⁻⁴.
2. (a) Hubbard, Exercise 1.7.21 (derivative of the determinant function).
This is really easy if you work directly from the definition of the derivative.
(b) Generalize this result to the 3 × 3 case. Hint: consider a matrix whose columns are ~e1 + h~a1, ~e2 + h~a2, ~e3 + h~a3, and use the definition of the determinant as a triple product.
3. Hubbard, Exercise 1.8.6, part (b) only. In the case where f and g are
functions of time t, this formula finds frequent use in physics. You can
either do the proof as suggested in part (a) or model your proof on the one
for the dot product on page 143.
4. (similar to group problem 1b)
Hubbard, Exercise 1.8.9. The equation that you prove can be called a
first-order partial differential equation.
∫ dt / √((t² + x²)(t² + y²))
It was revived in the late 20th century as the basis of the AGM (arithmetic-geometric mean) method for calculating π. You can get 1 million digits with a dozen or so iterations.
The function is meant to be composed with itself; so it will be appropriate to compute the derivative of f ∘ f by the chain rule.
(a) f is differentiable whenever x and y are positive; so its derivative is given by its Jacobian matrix. Calculate this matrix.
We choose to evaluate the derivative of f ∘ f at the point (8, 2). Conveniently, f(8, 2) = (5, 4). The chain rule says that
[D(f ∘ f)(8, 2)] = [Df(5, 4)] [Df(8, 2)].
Evaluate the two numerical Jacobian matrices. Because the derivative of f is evaluated at two different points, they will not be the same.
(b) Write the formula for f ∘ f, compute and evaluate the lower left-hand entry in its Jacobian matrix, and check that it agrees with the value given by the chain rule.
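The value f(8, 2) = (5, 4) quoted above is consistent with f(x, y) = ((x + y)/2, √(xy)), the arithmetic-geometric-mean map; assuming that reading of f, a short R cross-check of the two Jacobian matrices and their product looks like this (the finite-difference helper is my own).

# Assumes f(x, y) = ((x + y)/2, sqrt(x*y)), which is consistent with f(8, 2) = (5, 4).
jacobian <- function(F, x, h = 1e-6)
  sapply(seq_along(x), function(j) {
    e <- rep(0, length(x)); e[j] <- h
    (F(x + e) - F(x - e)) / (2 * h)
  })

f <- function(p) c((p[1] + p[2]) / 2, sqrt(p[1] * p[2]))
jacobian(f, c(8, 2))                              # [Df(8, 2)]
jacobian(f, c(5, 4))                              # [Df(5, 4)]
jacobian(f, c(5, 4)) %*% jacobian(f, c(8, 2))     # chain-rule value of [D(f o f)(8, 2)]
jacobian(function(p) f(f(p)), c(8, 2))            # direct value: should agree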
7. (Related to group problem 3c)
The quintic equation x(x² − 1)(x² − 4) = 0 clearly has five real roots that are all integers. So does the equation x(x² − 1)(x² − 4) − 1 = 0, but you have to find them numerically. Get all five roots using Newton's method, carrying out enough iterations to get an error of less than .001. Use R to do Newton's method and to check your answers. If you have R plot a graph, it will be easy to find an initial guess for each of the five roots.
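A minimal R sketch for this problem, taking the shifted equation with the minus sign as written above; the plotting range, tolerance, and starting guesses (read off the graph) are my own choices.

f  <- function(x) x * (x^2 - 1) * (x^2 - 4) - 1    # i.e. x^5 - 5x^3 + 4x - 1
fp <- function(x) 5 * x^4 - 15 * x^2 + 4           # its derivative

curve(f, from = -2.5, to = 2.5)                    # read off rough locations of the roots
abline(h = 0, lty = 2)

newton1 <- function(a, tol = 1e-6) {
  repeat {
    a_new <- a - f(a) / fp(a)
    if (abs(a_new - a) < tol) return(a_new)
    a <- a_new
  }
}
sapply(c(-2, -1, 0.3, 1, 2), newton1)              # starting guesses suggested by the graph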
9. (a) Hubbard, problem 2.10.2. Make a sketch to show how this mapping
defines an alternative coordinate system for the plane, in which a point
is defined by the intersection of two hyperbolas.
(b) The point x = 3, y = 2 is specified in this new coordinate system
by the coordinates u = 6, v = 5. Use the derivative of the inverse
function to find approximate values of x and y for a nearby point
where u = 6.5, v = 4.5. (This is essentially one iteration of Newton's method.)
(c) Find h such that the point u = 6 + h, v = 5.1 has nearly the same
x-coordinate as u = 6, v = 5.
(d) Find k such that the point x = 3 + k, y = 2.1 has nearly the same
u-coordinate as x = 3, y = 2.
(e) For this mapping, you can actually find a formula for the inverse function that works in the region of the plane where x, y, u, and v are all
positive. Find the rather messy formulas for x and y as functions of
u and v, and use them to answer the earlier questions. Once you calculate the Jacobian matrix and plug in appropriate numerical values,
you will be back on familiar ground.
I could get Mathematica Solve[] to find the inverse function only after
I eliminated y by hand. At this point the quadratic formula does the
job anyway!
R Scripts
Script 3.4A-ImplicitFunction.R
Topic 1 - Three variables, one constraint
Topic 2 - Three variables, two constraints
Script 3.4B-Manifolds2D.R
Topic 1 - A one-dimensional submanifold of R², the unit circle
Topic 2 - Interesting examples from the textbook
Topic 3 - Parametrized curves in R²
Topic 4 - A two-dimensional manifold in R²
Topic 5 - A zero-dimensional manifold in R²
Script 3.4C-Manifolds3D.R
Topic 1 - A manifold as a function graph
Topic 2 - Graphing a parametrized manifold
Topic 3 - Graphing a manifold that is specified as a locus
Script 3.4D-CriticalPoints
Topic 1 - Behavior near a maximum or minimum
Topic 2 - Behavior near a saddle point
Script 3.5A-LagrangeMultiplier.R
Topic 1 - Constrained critical points in R2
1 Executive Summary
1.1 Implicit functions: review of the linear case
1.5 Parametrizing a manifold
1.7 Critical points
These are of great importance in physics, economics, and other areas to which mathematics is applied.
Consider a point c on manifold M where the function f : R^n → R is differentiable. Perhaps f has a maximum or minimum at c when its value is compared to the value at nearby points on M, even though there are points not on M where f is larger or smaller. In that case we should not consider all increment vectors, but only those increment vectors ~v that lie in the tangent space to the manifold. The derivative [Df(c)] does not have to be the zero linear transformation, but it has to give zero when applied to any increment that lies in the tangent space Tc M, or
Tc M ⊂ Ker[Df(c)].
When manifold M is specified as the locus where some function F = 0, there is an ingenious way of finding constrained critical points by using Lagrange multipliers, but not this week!
Proofs
1. Let W be an open subset of R^n, and let F : W → R^(n−k) be a C^1 mapping such that F(c) = 0. Assume that [DF(c)] is onto.
Prove that the n variables can be ordered so that the first n − k columns of [DF(c)] are linearly independent, and that [DF(c)] = [A|B] where A is an invertible (n − k) × (n − k) matrix.
Set c = (a, b), where a are the n − k passive variables and b are the k active variables.
Let g be the implicit function from a neighborhood of b to a neighborhood of a such that g(b) = a and F(g(y), y) = 0.
Prove that [Dg(b)] = −A⁻¹B.
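For orientation (this is not part of the requested proof), the purely linear case already exhibits the formula: if F(x, y) = Ax + By with A invertible, then F(g(y), y) = A g(y) + By = 0 forces g(y) = −A⁻¹By, and therefore [Dg(y)] = −A⁻¹B.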
Sample Problems
1. A cometary-exploration robot is fortunate enough to land on an ellipsoidal comet whose surface is described by the equation
x² + y²/4 + z²/9 = 9.
3. Assume that, at the top level, there are nine categories x1, x2, ..., x9 in the Federal budget. They must satisfy four constraints:
One simply fixes the total dollar amount.
One comes from your political advisors: it makes the budget look good to likely voters in swing states.
One comes from Congress: it guarantees that everyone can have his or her earmarks.
One comes from the Justice Department: it guarantees compliance with all laws.
These four constraints together define a function F whose derivative is onto for budgets that satisfy the constraints. The acceptable budgets, for which F(x) = 0, form a k-dimensional submanifold M of R^n.
Specify the dimension of the domain and codomain for
(a) A function g that specifies the passive variables in terms of the active variables.
(b) The function F that specifies the constraints.
(c) A parametrization function that generates a valid budget from a set of parameters.
For each alternative, specify the shape of the matrix that represents the derivative of the relevant function and explain how, given a valid budget c, it could be used to find a basis for the tangent space Tc M.
5. Critical points
f(x, y) = (1/2)x² + (1/3)y³ − xy
Calculate the partial derivatives as functions of x and y, and show that the only critical points are (0, 0) and (1, 1).
Find the eigenvalues of the Hessian matrix H0 at (0, 0) and classify the critical point there.
Find the eigenvalues of the Hessian matrix H1 at (1, 1) and classify the critical point there.
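To check the classification numerically in R, you can build the Hessian at each candidate point by finite differences and look at the signs of its eigenvalues; the helper and step size below are my own choices.

# Finite-difference Hessian of f(x, y) = x^2/2 + y^3/3 - x*y, then eigen() to classify.
f <- function(p) p[1]^2 / 2 + p[2]^3 / 3 - p[1] * p[2]

hessian <- function(f, p, h = 1e-4) {
  n <- length(p)
  H <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    ei <- ej <- rep(0, n); ei[i] <- h; ej[j] <- h
    H[i, j] <- (f(p + ei + ej) - f(p + ei - ej) - f(p - ei + ej) + f(p - ei - ej)) / (4 * h^2)
  }
  H
}

eigen(hessian(f, c(0, 0)))$values   # the signs classify the critical point at (0, 0)
eigen(hessian(f, c(1, 1)))$values   # and at (1, 1)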
Group Problems
1. Implicitly defined functions
(a) The nonlinear equation
F(x, y, z) = ( x² + y² + z² − 3 , x² + z² − 2 ) = 0
implicitly determines x and y as a function of z. The first equation describes a sphere of radius √3, the second describes a cylinder of radius √2 whose axis is the y-axis. The intersection is a circle in the plane y = 1.
Near the point x = 1, y = 1, z = 1, there is a function that expresses the two passive variables x and y in terms of the active variable z:
g(z) = ( √(2 − z²) , 1 ).
Calculate g'(z) and determine the numerical value of g'(1).
Then get the same answer without using the function g by forming the Jacobian matrix [DF], evaluating it at x = y = z = 1, and using the implicit function theorem to determine g'(z) = −A⁻¹[B].
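One way to check the pencil-and-paper computation for part (a) in R: form [DF] numerically, split off A and B, and compute −A⁻¹B. The finite-difference helper and the object names are my own choices.

# F : R^3 -> R^2 with variables ordered (x, y | z): x, y passive, z active.
F <- function(p) c(p[1]^2 + p[2]^2 + p[3]^2 - 3,
                   p[1]^2 + p[3]^2 - 2)

jacobian <- function(F, x, h = 1e-6)
  sapply(seq_along(x), function(j) {
    e <- rep(0, length(x)); e[j] <- h
    (F(x + e) - F(x - e)) / (2 * h)
  })

DF <- jacobian(F, c(1, 1, 1))
A  <- DF[, 1:2]                 # columns for the passive variables x and y
B  <- DF[, 3, drop = FALSE]     # column for the active variable z
-solve(A, B)                    # compare with your value of g'(1)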
(b) Dean Smith is working on a budget in which he will allocate x to the library, y to pay raises, and z to the Houses. He is constrained.
The Library Committee, happy to see anyone get more funds as long as the library does even better, insists that x² − y² − z² = 1.
The Faculty Council, content to see the Houses do well as long as other areas benefit equally, recommends that x + y − 2z = 1.
To comply with these constraints, the dean tries x = 3, y = 2, z = 2.
Given the constraints, x and y are determined by an implicitly defined function (x, y) = g(z).
Use the implicit function theorem to calculate g'(2), and use it to find approximate values of x and y if z is increased to 2.1.
2. The manifold M is described by the equation
F(x, y, z) = x²z − y² = 0 near the point c = (2, 2, 1).
It can also be described parametrically by
(x, y, z) = (s, st², t⁴) near s = 2, t = 1.
i. Use the parametrization to find a basis for the tangent space Tc M.
ii. Use the function F to confirm that your basis vectors are indeed in the tangent space Tc M.
iii. Use the parametrization to do a wireframe plot of the parametrized manifold near s = 2, t = 1. See script 3.4C, topic 2.
3. Critical points (rigged to make the algebra work, but you should also plot contour lines in R and use them to find the critical points)
Calculate the Jacobian matrix and the Hessian both by using R and with pencil and paper. (A short symbolic-derivative sketch in R follows part (c) below.)
(a) i. Find the one and only critical point of f(x, y) = 4x² + (1/2)y² + 8/(x²y) on the square 1/4 ≤ x ≤ 4, 1/4 ≤ y ≤ 4.
ii. Use second derivatives (the Hessian matrix) to determine whether this critical point is a maximum, minimum, or neither.
(b) The domain of the function F(x, y) = y² + (x² − 3x) log y is the upper half-plane y > 0. Find all the critical points of F, and use the Hessian matrix to classify each as maximum, minimum, or saddle point.
(c) The function F(x, y) = x²y − 3xy + (1/2)x² + y² has three critical points, two of which lie on the line x = y. Find each and use the Hessian matrix to classify it as maximum, minimum, or saddle point.
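For the "by using R" half, base R can produce the partial derivatives and the Hessian symbolically. Here is a sketch for the function in part (a); the object names and the evaluation point are my own choices.

# Symbolic gradient and Hessian via deriv3(), for f(x, y) = 4x^2 + y^2/2 + 8/(x^2 y).
fa <- deriv3(~ 4 * x^2 + y^2 / 2 + 8 / (x^2 * y), c("x", "y"),
             function.arg = c("x", "y"))

out <- fa(1, 1)                 # evaluate at a sample point inside the square
attr(out, "gradient")           # the two partial derivatives there
attr(out, "hessian")            # the Hessian there (a 1 x 2 x 2 array)

D(expression(4 * x^2 + y^2 / 2 + 8 / (x^2 * y)), "x")   # the symbolic partial with respect to x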
Although all of these problems except the last one were designed so that they
could be done with pencil and paper, it makes sense to do a lot of them in R,
and the Week 12 scripts provide good models. For each problem that you choose
to do in R, include a "see my script" reference in the paper version. Put all your
R solutions into a single script, and upload it to the homework dropbox on the
Week 12 page.
When you use R, you will probably want to include some graphs that are not
required by the statement of the problem.
Do appreciate that problems 3 and 4, which use only androgynous names, are sexual-orientation neutral as well as gender-neutral and avoid the use of third-person singular pronouns.
1. (Hubbard, exercise 3.12)
Let X ⊂ R³ be the set of midpoints of segments joining a point of the curve C1 of equation y = x², z = 0 to a point of the curve C2 of equation z = y², x = 0.
(a) Parametrize C1 and C2 .
(b) Parametrize X.
(c) Find an equation for X (i.e. describe X as a locus)
(d) Show that X is a smooth surface.
3. Pat and Terry are in charge of properties for the world premiere of the student-written opera "Goldfinger" at Dunster House. In the climactic scene the anti-hero takes the large gold brick that he has made by melting down chalices that he stole from the Vatican Museum and places it in a safety deposit box in a Swiss bank while singing the aria "Papal gold, now rest in peace."
The gold brick is supposed to have length x = 8, height y = 2, and width
z = 4. With these dimensions in mind, Pat and Terry have spent their
entire budget on 112 square inches of gold foil and 64 cubic inches of an
alloy that melts at 70 degrees Celsius. They plan to fabricate the brick by
melting the alloy in a microwave oven and casting it in a sand mold.
Alas, the student mailboxes that they have borrowed to simulate safety-deposit boxes turn out to be not quite 4 inches wide. Fortunately, the equation
F(x, y, z) = ( xyz − 64 , xy + xz + yz − 56 ) = 0
specifies x and y implicitly in terms of z.
(a) Use the implicit function theorem to find [Dg(4)], where g is the function that specifies (x, y) in terms of z, and find the approximate dimensions of a brick with the same volume and surface area as the original but with a width of only 3.9 inches.
(b) Show that if the original dimensions had been x = 2, y = 2, z = 16,
then the constraints of volume 64, surface area 136 specify y and z in
terms of x but fail to specify x and y in terms of z.
(c) Show that if the original brick had been a cube with x = y = z = 4,
then, with the constraints of volume 64, surface area 96, we cannot
show the existence of any implicit function. In fact there is no implicit
function, but our theorem does not prove that fact. This happens
because this cube has minimum surface area for the given volume.
F(x1, y1, x2, y2) = ( x1² + y1² − 25 , (x1 − x2)² + (y1 − y2)² − 25 ) = 0.
One solution to this equation is x1 = 3, y1 = 4, x2 = 0, y2 = 8.
(You can build a model with a couple of ball-point pens and some Scotch tape).
(a) Show that near the given solution, the constraint equation specifies x1
and y1 as a function of x2 and y2 , but not vice-versa.
(b) Calculate the derivative of the implicit function and show that it is not
onto. Determine in what direction the plutonium container will move
if x2 and y2 are both increased by equal small amounts (or changed in any other way). This system is not really satisfactory, because the
plutonium container can move only along a circle.
(c) Casey and Chris come up with a new design in which one spear has its end confined to the x-axis (coordinate x2 can be changed, but y2 = 0). The other spear has its end confined to the y-axis (coordinate y3 can be changed, but x3 = 0). For this new setup, one solution is x1 = 3, y1 = 4, x2 = 6, y3 = 0. Show that x1 and y1 are now specified locally by a function ~g(x2, y3). Calculate [Dg] and show that it is onto.
(d) Are x2 and y3, near the same solution, now specified locally by a function ~f(x1, y1)? If so, what is [Df]?
(e) For the new setup, another solution is x1 = 3, y1 = 4, x2 = 6, y3 = 8. Show that in this case, although [DF] is onto, the choice of x1 and y1 as passive variables is not possible, and there is no implicitly defined function ~g(x2, y3) as there was in part (c). Draw a diagram to illustrate what the problem is.
Consider the locus in R⁴ where
F(x, y, z, t) = ( … , 3x + 2y + z − 2t − 2 ) = 0.
(a) Show that this surface is a smooth 2-dimensional manifold.
(b) One point on the manifold is x = 1, y = 2, z = 3, t = 4. Near this point the manifold is the graph of a function g that expresses x and y as functions of z and t. Using the implicit function theorem, determine [Dg] at the point z = 3, t = 4.
6. Consider the manifold specified by the parametrization
g(t) = (x, y) = (t + e^t, t + e^(2t)),   −∞ < t < ∞.
Find where it intersects the line 2x + y = 10. You can get an initial estimate by using the graph in script 3.4B, then use Newton's method to improve the estimate.
7. A surface X in R³ is parametrized by
(x, y, z) = ( sec u, tan u cos v, tan u sin v ).
If you use R, you can do a wireframe plot the same way that the sphere was plotted in script 3.4C, topic 2. (A base-graphics sketch follows part (d) below.)
(a) Find the coordinates of the point c on this manifold for which u = π/4, v = π/2.
(b) Find the equation of the tangent plane Tc X as the image of the derivative of the parametrization at (π/4, π/2).
(c) Find an equation F(x, y, z) = 0 that describes the same manifold near c, and find the equation of the tangent plane Tc X as the kernel of [DF(c)].
(d) Find an equation x = g(y, z) that describes the same manifold near c, and find the equation of the tangent plane Tc X as the graph of [Dg(0, 1)].
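For the wireframe plot, script 3.4C, topic 2 is the model to follow; if you prefer plain base graphics, one possibility is to project the parameter grid curves by hand, as in the sketch below (the grid ranges and viewing angles are my own choices).

# Wireframe of the parametrized surface (sec u, tan u cos v, tan u sin v) in base graphics.
u <- seq(0.1, 1.2, length.out = 25)      # stay away from u = pi/2, where sec u blows up
v <- seq(0, 2 * pi, length.out = 25)
X <- outer(u, v, function(u, v) 1 / cos(u))
Y <- outer(u, v, function(u, v) tan(u) * cos(v))
Z <- outer(u, v, function(u, v) tan(u) * sin(v))

# Draw an empty box to get the 3-D projection, then add the two families of grid curves.
pmat <- persp(range(X), range(Y), matrix(range(Z), 2, 2),
              xlab = "x", ylab = "y", zlab = "z",
              theta = 30, phi = 20, col = NA, border = NA)
for (i in seq_along(u)) lines(trans3d(X[i, ], Y[i, ], Z[i, ], pmat))
for (j in seq_along(v)) lines(trans3d(X[, j], Y[, j], Z[, j], pmat))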
8. Hubbard, Exercise 3.6.2. This is the only problem of this genre on the
homework that can be done with pencil and paper, but you must be prepared to do one like it on the final exam!
9. Here is another function that has one maximum, one minimum, and two saddle points, for all of which x and y are less than 3 in magnitude:
f(x, y) = x³ − y³ + 2xy − 5x + 6y.
Locate and classify all four critical points using R, in the manner of script 3.4D. A good first step is to plot contour lines with x and y ranging from −3 to 3. If you do
contour(x,y,z, nlevels = 20)
you will learn enough to start zooming in on all four critical points.
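To make that recipe runnable (the grid resolution is my own choice), build z with outer():

f <- function(x, y) x^3 - y^3 + 2 * x * y - 5 * x + 6 * y
x <- seq(-3, 3, length.out = 201)
y <- seq(-3, 3, length.out = 201)
z <- outer(x, y, f)               # z[i, j] = f(x[i], y[j])
contour(x, y, z, nlevels = 20)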
An alternative, more traditional, approach is to take advantage of the fact
that the function f is a polynomial. If you set both partial derivatives equal
to zero, you can eliminate either x or y from the resulting equations, then
find approximate solutions by plotting a graph of the resulting fourth-degree
polynomial in x or y.