Numerical Analysis
数值分析
Qinghai Zhang (张庆海)
Fall 2021
Contents
2 Polynomial Interpolation 9
2.1 The Vandermonde determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 The Cauchy remainder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 The Lagrange formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 The Newton formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5 The Neville-Aitken algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6 The Hermite interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.7 The Chebyshev polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.8 The Bernstein polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.9 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.9.1 Theoretical questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.9.2 Programming assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3 Splines 18
3.1 Piecewise-polynomial splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 The minimum properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Error analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.4 B-Splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4.1 Truncated power functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4.2 The local support of B-splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4.3 Integrals and derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4.4 Marsden’s identity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4.5 Symmetric polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4.6 B-splines indeed form a basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4.7 Cardinal B-splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.5 Curve fitting via splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.6.1 Theoretical questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.6.2 Programming assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4 Computer Arithmetic 31
4.1 Floating-point number systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2 Rounding error analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.1 Rounding a single number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.2 Binary floating-point operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.3 The propagation of rounding errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Accuracy and stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.1 Avoiding catastrophic cancellation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.2 Backward stability and numerical stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.3 Condition numbers: scalar functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5 Approximation 41
5.1 Orthonormal systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.2 Fourier expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.3 The normal equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.4 Discrete least squares (DLS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4.1 Gaussian and Dirac delta functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.4.2 Reusing the formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.4.3 DLS via normal equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.4.4 DLS via QR decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.5.1 Theoretical questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.5.2 Programming assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
B Linear Algebra 61
B.1 Vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
B.1.1 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
B.1.2 Span and linear independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
B.1.3 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
B.1.4 Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
B.2 Linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
B.2.1 Null spaces and ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
B.2.2 The matrix of a linear map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
B.2.3 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
B.3 Eigenvalues, eigenvectors, and invariant subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
B.3.1 Invariant subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
B.3.2 Upper-triangular matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
B.3.3 Eigenspaces and diagonal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
B.4 Inner product spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
B.4.1 Inner products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
B.4.2 Norms induced from inner products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
B.4.3 Norms and induced inner-products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
B.4.4 Orthonormal bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
B.5 Operators on inner-product spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
B.5.1 Adjoint and self-adjoint operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
B.5.2 Normal operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
B.5.3 The spectral theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
B.5.4 Isometries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
B.5.5 The singular value decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
B.6 Trace and determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
C Basic Analysis 73
C.1 Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
C.1.1 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
C.1.2 Limit points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
C.2 Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
C.3 Continuous functions on R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
C.4 Differentiation of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
C.5 Taylor series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
C.6 Riemann integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
C.7 Convergence in metric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
C.8 Vector calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
D Point-set Topology 81
D.1 Topological spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
D.1.1 A motivating problem from biology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
D.1.2 Generalizing continuous maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
D.1.3 Open sets: from bases to topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
D.1.4 Topological spaces: from topologies to bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
D.1.5 Generalized continuous maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
D.1.6 The subbasis topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
D.1.7 The topology of phenotype spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
D.1.8 Closed sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
D.1.9 Interior–Frontier–Exterior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
D.1.10 Hausdorff spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
D.2 Continuous maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
D.2.1 The subspace/relative topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
D.2.2 New maps from old ones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
D.2.3 Homeomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
D.3 A zoo of topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
D.3.1 Hierarchy of topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
D.3.2 The order topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
D.3.3 The product topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
D.3.4 The metric topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
D.4 Connectedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
D.5 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
E Functional Analysis 97
E.1 Normed and Banach spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
E.1.1 Metric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
E.1.2 Normed spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
E.1.3 The topology of normed spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
E.1.4 Bases of infinite-dimensional spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
E.1.5 Sequential compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
E.1.6 Continuous maps of normed spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
E.1.7 Norm equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
E.1.8 Banach spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
E.2 Continuous linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
E.2.1 The space CL(X, Y ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
E.2.2 The topology of CL(X, Y ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
E.2.3 Invertible operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
E.2.4 Series of operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
E.2.5 Uniform boundedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
E.3 Dual spaces of normed spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
E.3.1 The Hahn-Banach theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
E.3.2 Bounded linear functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Preface
This book grew out of my teaching of the course "Numerical Analysis" (formerly "Numerical Approximation") in the spring semesters of 2018, 2019, and 2020 and in the fall semesters of 2016 and 2021 at the School of Mathematical Sciences of Zhejiang University.
In writing this book, I have made special efforts to
• collect the prerequisites in the appendices so that students can quickly brush up on the preliminaries,
• emphasize the connection between numerical analysis and other branches of mathematics such as
elementary analysis and linear algebra,
• arrange the contents carefully with the hope that the total entropy of this book is minimized,
• encourage the student to understand the motivations of definitions, to formally verify all major theorems on her/his own, to think actively about the contents, to relate mathematical theory to real-world physics, and to form the habit of telling logical and coherent stories.
Throughout my teaching, many students asked for clarifications, pointed out typos, reported errors, raised questions, and suggested improvements. Each and every comment, be it small or big, negative or positive, subjective or objective, contributed to better writing and/or teaching.
Some suggestions on learning mathematics

A. Understand every knowledge point thoroughly: where does each step of a proof or derivation come from? Strive for "nothing without a source"; this cultivates rigorous logical thinking.
B. Look for connections between new material and what you already know, or other branches of mathematics. The branches we have already used in numerical analysis include basic analysis and linear algebra. The essence of learning is to connect new material to knowledge you have already mastered firmly!
C. Think deeply about every knowledge point: what does a definition capture? Can the hypotheses of a theorem be weakened? If not, where do these hypotheses enter the proof and what role do they play? Can the conclusion of a theorem be strengthened? If not, why not? What is the range of applicability of a mathematical method, and where are its limitations?
D. Memorize the core definitions and theorems precisely, then link the related knowledge points into a story through logical relations such as inheritance, composition, implication, and specialization; the purpose of building such a thread is to minimize the entropy (degree of disorder) of your own knowledge system.
E. On top of a well-built knowledge system, do as many exercises as possible; but building the knowledge system always matters more than the exercises themselves.
F. Absorb new knowledge into your knowledge system in a way that is consistent with what you already have. Learning mathematics is building a tower, not pitching many tents on a plain; the height of the tower depends on its foundation and on the solidity of every floor.
G. Every branch of mathematics consists of content and form; the two depend on and complement each other.
H. "A thoroughbred cannot cover ten paces in one leap; a worn-out nag can go far by plodding for ten days: the achievement lies in not giving up. Carve and give up, and even rotten wood will not break; carve without giving up, and even metal and stone can be engraved." (Xunzi)
One baby step at a time!
Do the simplest thing that could possibly work, then keep asking more and refining your answers.
I. "One yin and one yang in alternation: this is called the Dao. What continues it is goodness; what completes it is nature. The benevolent see it and call it benevolence; the wise see it and call it wisdom; ordinary people use it every day without knowing it. Hence the way of the noble person is rarely seen." (I Ching, Xici I)
J. "Think globally, act locally."
K. "The heaviest sword has no edge; supreme skill needs no ornament." (The Return of the Condor Heroes)
Chapter 1
1.5 Newton's method

Algorithm 1.14. Newton's method finds a root of f : R → R near an initial guess x0 by the iteration formula

    x_{n+1} = x_n − f(x_n)/f'(x_n),    n ∈ N.    (1.7)

In the proof of the local convergence theorem for Newton's method (Theorem 1.15), one sets B1 = [α − δ1, α + δ1], defines

    M = max_{x∈B1} |f''(x)| / (2 min_{x∈B1} |f'(x)|),

and picks x0 sufficiently close to α such that
(i) |x0 − α| = δ0 < δ1;
(ii) M δ0 < 1.
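The following is a minimal Python sketch (not part of the original notes) of the iteration (1.7) in Algorithm 1.14; the names newton, tol, and max_iter, as well as the stopping criterion, are illustrative choices.

# Newton's method: x_{n+1} = x_n - f(x_n)/f'(x_n)
def newton(f, df, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) <= tol:      # stop when the residual is small enough
            break
        x -= fx / df(x)
    return x

# Example: the root of f(x) = x^2 - 2 near x0 = 1 is sqrt(2).
print(newton(lambda x: x*x - 2.0, lambda x: 2.0*x, 1.0))   # 1.4142135623...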
the second equation of which yields (1.14).

    (f[x_{n−1}, x_n] − f[x_n, α]) / (x_{n−1} − α) = g'(β)    (1.20)

for some β between x_{n−1} and α. Compute the derivative g'(β) from (1.17), use the Lagrangian remainder Theorem C.60, and we have

    (f[x_{n−1}, x_n] − f[x_n, α]) / (x_{n−1} − α) = f''(ζ_n)/2    (1.21)

for some ζ_n between min(x_{n−1}, x_n, α) and max(x_{n−1}, x_n, α). The proof is completed by substituting (1.19) and (1.21) into (1.18).

Theorem 1.24 (Convergence of the secant method). Consider a C² function f : B → R on B = [α − δ, α + δ] satisfying f(α) = 0 and f'(α) ≠ 0. If both x0 and x1 are chosen sufficiently close to α and f''(α) ≠ 0, then the iterates {x_n} in the secant method converge to the root α with order p = (1 + √5)/2 ≈ 1.618.

Proof. The continuity of f' and the assumption f'(α) ≠ 0 yield

    ∃δ1 ∈ (0, δ)  s.t.  ∀x ∈ B1, f'(x) ≠ 0,

and we have from Lemma 1.23

    E_{n+1} = E1^{F_{n+1}} E0^{F_n} m1^{F_n} m2^{F_{n−1}} ··· m_{n−1}^{F_2} m_n^{F_1},    (1.22)

where F_n is a Fibonacci number as in Definition 1.20. Then

    E_{n+1} / E_n^{r_0}
        = E1^{F_{n+1} − r_0 F_n} E0^{F_n − r_0 F_{n−1}} m1^{F_n − r_0 F_{n−1}} m2^{F_{n−1} − r_0 F_{n−2}} ··· m_{n−2}^{F_3 − r_0 F_2} m_{n−1}^{F_2 − r_0 F_1} m_n^{F_1}
        = E1^{r_1^n} E0^{r_1^{n−1}} m1^{r_1^{n−1}} m2^{r_1^{n−2}} ··· m_{n−1}^{r_1} m_n,    (1.23)

where the second step follows from Corollary 1.22. (1.22) and the convergence we just proved yield

    lim_{n→+∞} m_n = m_α,    (1.24)

which means

    ∃N ∈ N  s.t.  ∀n > N,  m_n ∈ [½ m_α, 2 m_α].    (1.25)

We define

    A := E1^{r_1^n} · E0^{r_1^{n−1}} · m1^{r_1^{n−1}} · m2^{r_1^{n−2}} ··· m_N^{r_1^{n−N}},
    B := m_{N+1}^{r_1^{n−N−1}} · m_{N+2}^{r_1^{n−N−2}} ··· m_{n−1}^{r_1} · m_n
so that E_{n+1}/E_n^{r_0} = AB. Since |r_1| < 1, we have lim_{n→∞} A = 1. As for B, we have from (1.25)

    B ≤ (2 m_α)^{1 + r_1 + r_1^2 + ··· + r_1^{n−N}},

and then

    lim_{n→∞} E_{n+1}/E_n^{r_0} = lim_{n→∞} A · lim_{n→∞} B = lim_{n→∞} B ≤ (2 m_α)^{1/(1−r_1)} = (2 m_α)^{1/r_0}.

The proof is then completed by Definition 1.10.

Corollary 1.25. Consider solving f(x) = 0 near a root α. Let m and sm be the time to evaluate f(x) and f'(x), respectively. The minimum time to obtain the desired absolute accuracy ε with Newton's method and the secant method are respectively

    T_N = (1 + s) m ⌈log_2 K⌉,    (1.26)
    T_S = m ⌈log_{r_0} K⌉,    (1.27)

where r_0 = (1 + √5)/2, c = f''(α)/(2 f'(α)),

    K = log(cε) / log(c|x0 − α|),    (1.28)

and ⌈·⌉ denotes the rounding-up operator, i.e. it rounds towards +∞.

Proof. We showed |x_n − α| ≤ (1/M)(M|x0 − α|)^{2^n} in proving Theorem 1.15. Denote E_n = |x_n − α|; we have

    M E_n ≤ (M E_0)^{2^n}.

Let i ∈ N+ denote the smallest number of iterations such that the desired accuracy is satisfied, i.e. (M E_0)^{2^i} ≤ Mε. When ε is sufficiently small, M → c. Hence we have

    i = ⌈log_2 K⌉.

For each iteration, Newton's method incurs one function evaluation and one derivative evaluation, which cost time m and sm, respectively. Therefore (1.26) holds.
For the secant method, assume M E_0 ≥ M E_1. By the proof of Theorem 1.24, we have

    M E_n ≤ (M E_0)^{r_0^{n+1}/√5}.

Let j ∈ N+ denote the smallest number of iterations such that the desired accuracy is satisfied, i.e. r_0^j ≥ (√5/r_0) K. Hence

    j = ⌈log_{r_0} K + log_{r_0}(√5/r_0)⌉ ≤ ⌈log_{r_0} K⌉ + 1.

Since the first two values x0 and x1 are given in the secant method, the least number of iterations is ⌈log_{r_0} K⌉ (compare to Newton's method!). Finally, only the function value f(x_n) needs to be evaluated per iteration because f(x_{n−1}) has already been evaluated in the previous iteration.

1.7 Fixed-point iterations

Definition 1.26. A fixed point of a function g is a value α of its independent variable satisfying g(α) = α.

Example 1.27. A fixed point of f(x) = x² − 3x + 4 is x = 2.

Lemma 1.28. If g : [a, b] → [a, b] is continuous, then g has at least one fixed point in [a, b].

Proof. The function f(x) = g(x) − x satisfies f(a) ≥ 0 and f(b) ≤ 0. The proof is then completed by the intermediate value theorem (Theorem C.39).

Exercise 1.29. Let A = [−1, 0) ∪ (0, 1]. Give an example of a continuous function g : A → A that does not have a fixed point. Give an example of a continuous function f : R → R that does not have a fixed point.

Theorem 1.30 (Brouwer's fixed point). Any continuous function f : D^n → D^n with

    D^n := {x ∈ R^n : ‖x‖ ≤ 1}

has a fixed point.

Example 1.31. Take a map of your country C and place it on the ground of your room. Let f be the function assigning to each point in your country the point on the map corresponding to it. Then f can be considered as a continuous function C → C. If C is homeomorphic to D², then there must exist a point on the map that corresponds exactly to the point on the ground directly beneath it.

Exercise 1.32. Take two pieces of same-sized paper and lay one on top of the other. Every point on the top sheet of paper is associated with some point right below it on the bottom sheet. Crumple the top sheet into a ball without ripping it. Place the crumpled ball on top of (and simultaneously within the realm of) the bottom sheet of paper. Use Theorem 1.30 to prove that there always exists some point in the crumpled ball that sits above the same point it sat above prior to crumpling.

Definition 1.33. A fixed-point iteration is a method for finding a fixed point of g with a formula of the form

    x_{n+1} = g(x_n),    n ∈ N.    (1.29)

Example 1.34. Newton's method is a fixed-point iteration.

Exercise 1.35. To calculate the square root of some positive real number a, we can formulate the problem as finding the root of f(x) = x² − a. For a = 1, the initial guess x0 = 2, and the three choices g1(x) := x² + x − a, g2(x) := a/x, and g3(x) := (x + a/x)/2, verify that g1 diverges, g2 oscillates, and g3 converges. The theorems in this section will explain why.

Definition 1.36. A function f : [a, b] → [a, b] is a contraction or contractive mapping on [a, b] if

    ∃λ ∈ [0, 1)  s.t.  ∀x, y ∈ [a, b],  |f(x) − f(y)| ≤ λ|x − y|.    (1.30)

Example 1.37. Any linear function f(x) = λx + c with 0 ≤ λ < 1 is a contraction.
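A short Python sketch (not part of the original notes) of the three fixed-point iterations in Exercise 1.35, for a = 1 and x0 = 2; the six-step horizon is an arbitrary illustrative choice. It shows g1 blowing up, g2 oscillating between 2 and 0.5, and g3 converging to 1.

a, x0 = 1.0, 2.0
g1 = lambda x: x*x + x - a
g2 = lambda x: a / x
g3 = lambda x: 0.5 * (x + a / x)

for name, g in [("g1", g1), ("g2", g2), ("g3", g3)]:
    x, xs = x0, []
    for _ in range(6):          # a few iterations of x_{n+1} = g(x_n)
        x = g(x)
        xs.append(x)
    print(name, [f"{v:.6g}" for v in xs])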
Theorem 1.38 (Convergence of contractions). If g(x) is a continuous contraction on [a, b], then it has a unique fixed point α in [a, b]. Furthermore, the fixed-point iteration (1.29) converges to α for any choice x0 ∈ [a, b] and

    |x_n − α| ≤ λ^n/(1 − λ) |x1 − x0|.    (1.31)

Proof. By Lemma 1.28, g has at least one fixed point in [a, b]. Suppose there are two distinct fixed points α and β; then |α − β| = |g(α) − g(β)| ≤ λ|α − β|, which implies |α − β| ≤ 0, i.e. the two fixed points are identical.
By Definition 1.36, x_{n+1} = g(x_n) implies that all x_n's stay in [a, b]. To prove convergence,

    |x_{n+1} − α| = |g(x_n) − g(α)| ≤ λ|x_n − α|.

By induction and the triangle inequality,

    |x_n − α| ≤ λ^n |x0 − α|
             ≤ λ^n (|x1 − x0| + |x1 − α|)
             ≤ λ^n (|x1 − x0| + λ|x0 − α|).

From the first and last right-hand sides (RHSs), we have |x0 − α| ≤ |x1 − x0|/(1 − λ), which yields (1.31).

Theorem 1.39. Consider g : [a, b] → [a, b]. If g ∈ C¹[a, b] and λ = max_{x∈[a,b]} |g'(x)| < 1, then g has a unique fixed point α in [a, b]. Furthermore, the fixed-point iteration (1.29) converges to α for any choice x0 ∈ [a, b], the error bound (1.31) holds, and

    lim_{n→∞} (x_{n+1} − α)/(x_n − α) = g'(α).    (1.32)

Proof. The mean value theorem (Theorem C.51) implies that, for all x, y ∈ [a, b], |g(x) − g(y)| ≤ λ|x − y|. Theorem 1.38 yields all the results except (1.32), which follows from

    x_{n+1} − α = g(x_n) − g(α) = g'(ξ)(x_n − α),

lim x_n = α, and the fact that ξ is between x_n and α.

Corollary 1.40. Let α be a fixed point of g : R → R with |g'(α)| < 1 and g ∈ C¹(B) on B = [α − δ, α + δ] with some δ > 0. If x0 is chosen sufficiently close to α, then the results of Theorem 1.38 hold.

Proof. Choose λ so that |g'(α)| < λ < 1. Choose δ0 ≤ δ so that max_{x∈B0} |g'(x)| ≤ λ < 1 on B0 = [α − δ0, α + δ0]. Then g(B0) ⊂ B0 and applying Theorem 1.39 completes the proof.

Corollary 1.41. Consider g : [a, b] → [a, b] with a fixed point g(α) = α ∈ [a, b]. The fixed-point iteration (1.29) converges to α with pth-order accuracy (p > 1, p ∈ N) for any choice x0 ∈ [a, b] if

    g ∈ C^p[a, b],
    ∀k = 1, 2, . . . , p − 1,  g^{(k)}(α) = 0,    (1.33)
    g^{(p)}(α) ≠ 0.

Proof. By Corollary 1.40, the fixed-point iteration converges uniquely to α because g'(α) = 0. By the Taylor expansion of g at α, we have

    E_abs(x_{n+1}) := |x_{n+1} − α| = |g(x_n) − g(α)|
        = | Σ_{i=1}^{p−1} (x_n − α)^i/i! · g^{(i)}(α) + (x_n − α)^p/p! · g^{(p)}(ξ) |

for some ξ ∈ [a, b]. Since g^{(p)} is continuous on [a, b], Theorem C.48 implies that g^{(p)} is bounded on [a, b]. Hence there exists a constant M such that E_abs(x_{n+1}) < M E_abs^p(x_n).

Example 1.42. The following method has third-order convergence for computing √R:

    x_{n+1} = x_n (x_n² + 3R) / (3x_n² + R).

First, √R is the fixed point of F(x) = x(x² + 3R)/(3x² + R):

    F(√R) = √R (R + 3R)/(3R + R) = √R.

Second, the derivatives of F(x) are

    n    F^{(n)}(x)                                  F^{(n)}(√R)
    1    3(x² − R)²/(3x² + R)²                       0
    2    48Rx(x² − R)/(3x² + R)³                     0
    3    −48R(9x⁴ − 18Rx² + R²)/(3x² + R)⁴           −48R(−8R²)/(4R)⁴ = 3/(2R) ≠ 0

The rest follows from Corollary 1.41.

1.8 Problems

1.8.1 Theoretical questions

I. Consider the bisection method starting with the initial interval [1.5, 3.5]. In the following questions "the interval" refers to the bisection interval whose width changes across different loops.
  • What is the width of the interval at the nth step?
  • What is the maximum possible distance between the root r and the midpoint of the interval?

II. In using the bisection algorithm with its initial interval as [a0, b0] with a0 > 0, we want to determine the root with its relative error no greater than ε. Prove that this goal of accuracy is guaranteed by the following choice of the number of steps,

    n ≥ (log(b0 − a0) − log ε − log a0)/log 2 − 1.

III. Perform four iterations of Newton's method for the polynomial equation p(x) = 4x³ − 2x² + 3 = 0 with the starting point x0 = −1. Use a hand calculator and organize results of the iterations in a table.
IV. Consider a variation of Newton's method in which only the derivative at x0 is used,

    x_{n+1} = x_n − f(x_n)/f'(x0).

Find C and s such that

    e_{n+1} = C e_n^s,

where e_n is the error of Newton's method at step n, s is a constant, and C may depend on x_n, the given function f and its derivatives.

V. Within (−π/2, π/2), will the iteration x_{n+1} = tan⁻¹ x_n converge?

VI. Let p > 1. What is the value of the following continued fraction?

    x = 1/(p + 1/(p + 1/(p + ···)))

Prove that the sequence of values converges. (Hint: this can be interpreted as x = lim_{n→∞} x_n, where x1 = 1/p, x2 = 1/(p + 1/p), x3 = 1/(p + 1/(p + 1/p)), and so forth. Formulate x as a fixed point of some function.)

1.8.2 Programming assignments

B. Test your implementation of the bisection method on the following functions and intervals.
  • x⁻¹ − tan x on [0, π/2],
  • x⁻¹ − 2^x on [0, 1],
  • 2^{−x} + e^x + 2 cos x − 6 on [1, 3],
  • (x³ + 4x² + 3x + 5)/(2x³ − 9x² + 18x − 2) on [0, 4].

C. Test your implementation of Newton's method by solving x = tan x. Find the roots near 4.5 and 7.7.

D. Test your implementation of the secant method by the following functions and initial values.
  • sin(x/2) − 1 with x0 = 0, x1 = π/2,
  • e^x − tan x with x0 = 1, x1 = 1.4,
  • x³ − 12x² + 3x + 1 with x0 = 0, x1 = −0.5.
You should play with other initial values and (if you get different results) think about the reasons.

E. As shown below, a trough of length L has a cross section in the shape of a semicircle with radius r. When filled to within a distance h of the top, the water has the volume

    V = L [ 0.5 π r² − r² arcsin(h/r) − h (r² − h²)^{1/2} ].
where

    A = l sin β1,    B = l cos β1,
    C = (h + 0.5D) sin β1 − 0.5D tan β1,
    E = (h + 0.5D) cos β1 − 0.5D.

(a) Use Newton's method to verify α ≈ 33° when l = 89 in., h = 49 in., D = 55 in. and β1 = 11.5°.
(b) Use Newton's method to find α with the initial guess 33° for the situation when l, h, β1 are the same as in part (a) but D = 30 in.
(c) Use the secant method (with another initial value as far away as possible from 33°) to find α. Show that you get a different result if the initial value is too far away from 33°; discuss the reasons.
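The following is a minimal secant-method sketch (not part of the original notes) that can serve as a starting point for programming assignments D and the assignment above; the names secant, tol, and max_iter are illustrative choices.

import math

def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Iterate x_{n+1} = x_n - f(x_n)(x_n - x_{n-1})/(f(x_n) - f(x_{n-1}))."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) <= tol or f1 == f0:   # stop on small residual or a flat secant
            break
        x0, x1, f0 = x1, x1 - f1 * (x1 - x0) / (f1 - f0), f1
        f1 = f(x1)
    return x1

# First test case of assignment D: sin(x/2) - 1 with x0 = 0, x1 = pi/2.
print(secant(lambda x: math.sin(x / 2) - 1.0, 0.0, math.pi / 2))   # converges towards pi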
Chapter 2
Polynomial Interpolation
Definition 2.1. Interpolation constructs new data points within the range of a discrete set of known data points, usually by generating an interpolating function whose graph goes through all known data points.

Example 2.2. The interpolating function may be piecewise-defined.

Definition 2.3. For n + 1 given points x0, x1, . . . , xn ∈ R, the associated Vandermonde matrix V ∈ R^{(n+1)×(n+1)} is

    V(x0, x1, . . . , xn) = [ 1  x0  ···  x0^n
                              1  x1  ···  x1^n
                              ⋮   ⋮        ⋮
                              1  xn  ···  xn^n ].    (2.1)

so that the coefficient of x^n is det V(x0, x1, . . . , x_{n−1}). Hence we have

    U(x) = det V(x0, x1, . . . , x_{n−1}) · ∏_{i=0}^{n−1} (x − xi).

An induction based on U(x0, x1) = x1 − x0 yields (2.2).

Theorem 2.5 (Uniqueness of polynomial interpolation). Given distinct points x0, x1, . . . , xn ∈ C and corresponding values f0, f1, . . . , fn ∈ C, denote by Pn the class of polynomials of degree at most n. There exists a unique polynomial pn(x) ∈ Pn such that pn(xi) = fi for each i = 0, 1, . . . , n.
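A Python sketch (not part of the original notes) of the unique interpolating polynomial of Theorem 2.5, obtained by solving the Vandermonde system built from (2.1); it assumes NumPy is available and checks the coefficients against the data that also appear in Example 2.11 below.

import numpy as np

def interp_coeffs(xs, fs):
    """Return c_0..c_n with p_n(x) = sum_k c_k x^k and p_n(x_i) = f_i."""
    V = np.vander(np.asarray(xs, dtype=float), increasing=True)   # V[i, k] = x_i^k
    return np.linalg.solve(V, np.asarray(fs, dtype=float))

print(interp_coeffs([1, 2, 4], [8, 1, 5]))   # [21. -16. 3.], i.e. 3x^2 - 16x + 21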
Theorem 2.7 (Cauchy remainder of polynomial interpolation). Let f ∈ C^n[a, b] and suppose that f^{(n+1)}(x) exists at each point of (a, b). Let pn(f; x) denote the unique polynomial in Pn that coincides with f at x0, x1, . . . , xn. Define

    R_n(f; x) := f(x) − pn(f; x)    (2.5)

as the Cauchy remainder of the polynomial interpolation. If a ≤ x0 < x1 < ··· < xn ≤ b, then there exists some ξ ∈ (a, b) such that

    R_n(f; x) = f^{(n+1)}(ξ)/(n + 1)! · ∏_{i=0}^{n} (x − xi),    (2.6)

where the value of ξ depends on x, x0, x1, . . . , xn, and f.

Proof. Since f(xk) = pn(f; xk), the remainder R_n(f; x) vanishes at the xk's. Fix x ≠ x0, x1, . . . , xn and define

    K(x) = (f(x) − pn(f; x)) / ∏_{i=0}^{n} (x − xi)

and a function of t

    W(t) = f(t) − pn(f; t) − K(x) ∏_{i=0}^{n} (t − xi).

The function W(t) vanishes at t = x0, x1, . . . , xn. In addition W(x) = 0. By Theorem 2.6, W^{(n+1)}(ξ) = 0 for some ξ ∈ (a, b), i.e.

    0 = W^{(n+1)}(ξ) = f^{(n+1)}(ξ) − (n + 1)! K(x).

Hence K(x) = f^{(n+1)}(ξ)/(n + 1)! and (2.6) holds.

Corollary 2.8. Suppose f(x) ∈ C^{n+1}[a, b]. Then

    |R_n(f; x)| ≤ M_{n+1}/(n + 1)! ∏_{i=0}^{n} |x − xi| ≤ M_{n+1}/(n + 1)! (b − a)^{n+1},    (2.7)

where M_{n+1} = max_{x∈[a,b]} |f^{(n+1)}(x)|.

Example 2.9. A value for arcsin(0.5335) is obtained by interpolating linearly between the values for x = 0.5330 and x = 0.5340. Estimate the error committed.
Let f(x) = arcsin(x). Then

    f''(x) = x(1 − x²)^{−3/2},    f'''(x) = (1 + 2x²)(1 − x²)^{−5/2}.

Since the third derivative is positive over [0.5330, 0.5340], the maximum value of f'' occurs at 0.5340. By Corollary 2.8 we have |R1| ≤ 4.42 × 10⁻⁷. The true error is about 1.10 × 10⁻⁷.

2.3 The Lagrange formula

Definition 2.10. To interpolate given values f0, f1, . . . , fn at distinct points x0, x1, . . . , xn, the Lagrange formula is

    pn(x) = Σ_{k=0}^{n} fk ℓk(x),    (2.8)

where the fundamental polynomial for pointwise interpolation (or elementary Lagrange interpolation polynomial) ℓk(x) is

    ℓk(x) = ∏_{i=0, i≠k}^{n} (x − xi)/(xk − xi).    (2.9)

In particular, for n = 0, ℓ0 = 1.

Example 2.11. For i = 0, 1, 2, we are given xi = 1, 2, 4 and f(xi) = 8, 1, 5, respectively. The Lagrange formula generates p2(x) = 3x² − 16x + 21.

Lemma 2.12. Define a symmetric polynomial

    πn(x) = 1 if n = 0;    πn(x) = ∏_{i=0}^{n−1} (x − xi) if n > 0.    (2.10)

Then for n > 0 the fundamental polynomial for pointwise interpolation can be expressed as

    ∀x ≠ xk,    ℓk(x) = π_{n+1}(x) / ((x − xk) π'_{n+1}(xk)).    (2.11)

Proof. By the product rule, π'_{n+1}(x) is the summation of n + 1 terms, each of which is a product of n factors. When x is replaced with xk, all of the n + 1 terms vanish except one.

Lemma 2.13 (Cauchy relations). The fundamental polynomials ℓk(x) satisfy the Cauchy relations as follows.

    Σ_{k=0}^{n} ℓk(x) ≡ 1,    (2.12)
    ∀j = 1, . . . , n,    Σ_{k=0}^{n} (xk − x)^j ℓk(x) ≡ 0.    (2.13)

Proof. By Theorems 2.5 and 2.7, for each q(x) ∈ Pn we have pn(q; x) ≡ q(x). Interpolating the constant function f(x) ≡ 1 with the Lagrange formula yields (2.12). Similarly, (2.13) can be proved by interpolating the polynomial q(u) = (u − x)^j for each j = 1, . . . , n with the Lagrange formula.

2.4 The Newton formula

Definition 2.14 (Divided difference and the Newton formula). The Newton formula for interpolating the values f0, f1, . . . , fn at distinct points x0, x1, . . . , xn is

    pn(x) = Σ_{k=0}^{n} ak πk(x),    (2.14)

where πk is defined in (2.10) and the kth divided difference ak is defined as the coefficient of x^k in pk(f; x) and is denoted by f[x0, x1, . . . , xk] or [x0, x1, . . . , xk]f. In particular, f[x0] = f(x0).

Corollary 2.15. Suppose (i0, i1, i2, . . . , ik) is a permutation of (0, 1, 2, . . . , k). Then

    f[x0, x1, . . . , xk] = f[x_{i0}, x_{i1}, . . . , x_{ik}].    (2.15)
Proof. The interpolating polynomial does not depend on the numbering of the interpolating nodes. The rest of the proof follows from the uniqueness of the interpolating polynomial in Theorem 2.5.

Corollary 2.16. The kth divided difference can be expressed as

    f[x0, x1, . . . , xk] = Σ_{i=0}^{k} fi / ∏_{j=0, j≠i}^{k} (xi − xj) = Σ_{i=0}^{k} fi / π'_{k+1}(xi),    (2.16)

where π_{k+1}(x) is defined in (2.10).

Proof. The uniqueness of interpolating polynomials in Theorem 2.5 implies that the two polynomials in (2.8) and (2.14) are the same. Then the first equality follows from (2.9) and Definition 2.14, while the second equality follows from Lemma 2.12.

Theorem 2.17. Divided differences satisfy the recursion

    f[x0, x1, . . . , xk] = (f[x1, x2, . . . , xk] − f[x0, x1, . . . , x_{k−1}]) / (xk − x0).    (2.17)

Proof. By Definition 2.14, f[x1, x2, . . . , xk] is the coefficient of x^{k−1} in a degree-(k − 1) interpolating polynomial, say, P2(x). Similarly, let P1(x) be the interpolating polynomial whose coefficient of x^{k−1} is f[x0, x1, . . . , x_{k−1}]. Construct a polynomial

    P(x) = P1(x) + (x − x0)/(xk − x0) · (P2(x) − P1(x)).

Clearly P(x0) = P1(x0). Furthermore, the interpolation condition implies P2(xi) = P1(xi) for i = 1, 2, . . . , k − 1. Hence P(xi) = P1(xi) for i = 1, 2, . . . , k − 1. Lastly, P(xk) = P2(xk). Therefore, P(x) as above is the interpolating polynomial for the given values at the k + 1 points. In particular, the term f[x0, x1, . . . , xk] x^k in P(x) is contained in x/(xk − x0) · (P2(x) − P1(x)). The rest follows from the definitions of P1, P2, and the kth divided difference.

Definition 2.18. The kth divided difference (k ∈ N+) on the table of divided differences

    x0  f[x0]
    x1  f[x1]  f[x0, x1]
    x2  f[x2]  f[x1, x2]  f[x0, x1, x2]
    x3  f[x3]  f[x2, x3]  f[x1, x2, x3]  f[x0, x1, x2, x3]
    ··· ···    ···        ···            ···

is calculated as the difference of the entry immediately to the left and the one above it, divided by the difference of the x-value horizontally to the left and the one corresponding to the f-value found by going diagonally up.

Example 2.19. Derive the interpolating polynomial via the Newton formula for the function f with the given values as follows. Then estimate f(3/2).

    x      0    1    2    3
    f(x)   6   −3   −6    9

By Definition 2.18, we can construct the following table of divided differences,

    0    6
    1   −3   −9
    2   −6   −3    3
    3    9   15    9    2    (2.18)

By Definition 2.14, the interpolating polynomial is generated from the main diagonal and the first column of the above table as follows.

    p3 = 6 − 9x + 3x(x − 1) + 2x(x − 1)(x − 2).    (2.19)

Hence f(3/2) ≈ p3(3/2) = −6.

Exercise 2.20. Redo Example 2.11 with the Newton formula.

Theorem 2.21. For distinct points x0, x1, . . . , xn, and x, we have

    f(x) = f[x0] + f[x0, x1](x − x0) + ···
           + f[x0, x1, ···, xn] ∏_{i=0}^{n−1} (x − xi)    (2.20)
           + f[x0, x1, ···, xn, x] ∏_{i=0}^{n} (x − xi).

Proof. Take another point z ≠ xi. The Newton formula applied to x0, x1, . . . , xn, z yields an interpolating polynomial

    Q(x) = f[x0] + f[x0, x1](x − x0) + ··· + f[x0, x1, ···, xn] ∏_{i=0}^{n−1} (x − xi) + f[x0, x1, ···, xn, z] ∏_{i=0}^{n} (x − xi).

The interpolation condition Q(z) = f(z) yields

    f(z) = Q(z) = f[x0] + f[x0, x1](z − x0) + ··· + f[x0, x1, ···, xn] ∏_{i=0}^{n−1} (z − xi) + f[x0, x1, ···, xn, z] ∏_{i=0}^{n} (z − xi).

Replacing the dummy variable z with x yields (2.20).
The above argument assumes x ≠ xi. We now consider the case of x = xj for some fixed j. Rewrite (2.20) as f(x) = pn(f; x) + R(x), where R(x) is clearly the last term in (2.20). We need to show

    ∀j = 0, 1, ···, n,    pn(f; xj) + R(xj) − f(xj) = 0,

where pn(f; xj) is the value of pn(f; x) at x = xj; this clearly holds because R(xj) = 0 and the interpolation condition at xj dictates pn(f; xj) = f(xj).

Corollary 2.22. Suppose f ∈ C^n[a, b] and f^{(n+1)}(x) exists at each point of (a, b). If a = x0 < x1 < ··· < xn = b and x ∈ [a, b], then there exists ξ(x) ∈ (a, b) such that

    f[x0, x1, ···, xn, x] = f^{(n+1)}(ξ(x)) / (n + 1)!.    (2.21)
Proof. This follows from Theorems 2.21 and 2.7.

Corollary 2.23. If x0 < x1 < ··· < xn and f ∈ C^n[x0, xn], we have

    lim_{xn→x0} f[x0, x1, ···, xn] = f^{(n)}(x0) / n!.    (2.22)

Proof. Set x = x_{n+1} in Corollary 2.22, replace n + 1 by n, and we have ξ → x0 as xn → x0 since each xi → x0.

Definition 2.24. A bisequence is a function f : Z → R.

Definition 2.25. The forward shift E and the backward shift B are linear operators V → V on the linear space V of bisequences given by

    (Ef)(i) = f(i + 1),    (Bf)(i) = f(i − 1).    (2.23)

The forward difference ∆ and the backward difference ∇ are linear operators V → V given by

    ∆ = E − I,    ∇ = I − B,    (2.24)

where I is the identity operator on V.

Example 2.26. With the notation fi := f(i) for a bisequence f, the nth forward difference and the nth backward difference are

    ∆^n fi := (∆^n f)(i),    ∇^n fi := (∇^n f)(i).    (2.25)

In particular, for n = 1 we have

    ∆fi = f_{i+1} − fi,    ∇fi = fi − f_{i−1}.    (2.26)

Theorem 2.27. The forward difference and backward difference are related as

    ∀n ∈ N+,    ∆^n fi = ∇^n f_{i+n}.    (2.27)

Proof. An easy induction.

Theorem 2.28. The forward difference can be expressed explicitly as

    ∆^n fi = Σ_{k=0}^{n} (−1)^{n−k} (n choose k) f_{i+k}.    (2.28)

Proof. . . . where the second line follows from (2.26), the third line from splitting one term out of each sum and replacing the dummy variable in the first sum, and the fourth line from (2.29) and the fact that (−1)^{n+1} fi and f_{i+n+1} contribute to the first and last terms, respectively.

Theorem 2.29. On a grid xi = x0 + ih with uniform spacing h, the sequence of values fi = f(xi) satisfies

    ∀n ∈ N+,    f[x0, x1, . . . , xn] = ∆^n f0 / (n! h^n).    (2.30)

Proof. Of course (2.30) can be proven by induction. Here we provide a more informative proof. For π_{n+1}(x) defined in (2.10), we have π'_{n+1}(xk) = ∏_{i=0, i≠k}^{n} (xk − xi). It follows from xk − xi = (k − i)h that

    π'_{n+1}(xk) = ∏_{i=0, i≠k}^{n} (k − i)h = h^n k! (n − k)! (−1)^{n−k}.    (2.31)

Then we have

    f[x0, x1, . . . , xn] = Σ_{k=0}^{n} fk / π'_{n+1}(xk)
        = Σ_{k=0}^{n} (−1)^{n−k} fk / (h^n k! (n − k)!)
        = 1/(h^n n!) Σ_{k=0}^{n} (−1)^{n−k} (n choose k) fk
        = ∆^n f0 / (h^n n!),

where the first step follows from Corollary 2.16, the second from (2.31), and the last from Theorem 2.28.

Theorem 2.30 (Newton's forward difference formula). Suppose pn(f; x) ∈ Pn interpolates f(x) on a uniform grid xi = x0 + ih at x0, x1, . . . , xn with fi = f(xi). Then . . .
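A Python sketch (not part of the original notes) of the divided-difference table of Definition 2.18 and of evaluating the Newton form (2.14) by nested multiplication; it is checked against Example 2.19, and the function names are illustrative.

def divided_differences(xs, fs):
    """Return [f[x0], f[x0,x1], ..., f[x0,...,xn]] via the recursion (2.17)."""
    coeffs = list(fs)
    n = len(xs)
    for k in range(1, n):
        for i in range(n - 1, k - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - k])
    return coeffs

def newton_eval(xs, coeffs, x):
    """Evaluate the Newton form with coefficients a_k at the point x."""
    p = coeffs[-1]
    for i in range(len(coeffs) - 2, -1, -1):
        p = p * (x - xs[i]) + coeffs[i]
    return p

xs, fs = [0, 1, 2, 3], [6, -3, -6, 9]
a = divided_differences(xs, fs)
print(a, newton_eval(xs, a, 1.5))   # [6, -9.0, 3.0, 2.0] -6.0, as in Example 2.19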
2.5 The Neville-Aitken algorithm

Theorem 2.31. Denote p_0^{[i]} = f(xi) for i = 0, 1, . . . , n. For all k = 0, 1, . . . , n − 1 and i = 0, 1, . . . , n − k − 1, define

    p_{k+1}^{[i]}(x) = ((x − xi) p_k^{[i+1]}(x) − (x − x_{i+k+1}) p_k^{[i]}(x)) / (x_{i+k+1} − xi).    (2.34)

Then each p_k^{[i]} is the interpolating polynomial for the function f at the points xi, x_{i+1}, . . . , x_{i+k}. In particular, p_n^{[0]} is the interpolating polynomial of degree n for the function f at the points x0, x1, . . . , xn.

Proof. The induction basis clearly holds for k = 0 because of the definition p_0^{[i]} = f(xi). Suppose that p_k^{[i]} is the interpolating polynomial of degree k for the function f at the points xi, x_{i+1}, . . . , x_{i+k}. Then the interpolation conditions yield

    ∀j = i + 1, i + 2, . . . , i + k,    p_k^{[i+1]}(xj) = p_k^{[i]}(xj) = f(xj),

which, together with (2.34), implies

    ∀j = i + 1, i + 2, . . . , i + k,    p_{k+1}^{[i]}(xj) = f(xj).

In addition, (2.34) and the induction hypothesis yield

    p_{k+1}^{[i]}(xi) = p_k^{[i]}(xi) = f(xi),
    p_{k+1}^{[i]}(x_{i+k+1}) = p_k^{[i+1]}(x_{i+k+1}) = f(x_{i+k+1}).

The proof is completed by the last three equations and the uniqueness of interpolating polynomials.

Example 2.32. To estimate f(x) for x = 3/2 directly from the table in Example 2.19, we construct a table by repeating (2.34) with xi = i for i = 0, 1, 2, 3.

    xi   x − xi   f(xi)   p1^{[i]}(x)   p2^{[i]}(x)   p3^{[i]}(x)
    0     3/2       6       −15/2         −21/4          −6
    1     1/2      −3        −9/2         −27/4
    2    −1/2      −6       −27/2
    3    −3/2       9                                        (2.35)

The result is the same as that in Example 2.19. In contrast, the calculation and layout of the two tables are distinct.

2.6 The Hermite interpolation

Definition 2.34. The nth divided difference at n + 1 "confluent" (i.e. identical) points is defined as

    f[x0, x0, ···, x0] = f^{(n)}(x0) / n!,    (2.37)

where x0 is repeated n + 1 times on the left-hand side.

Theorem 2.35. For the Hermite interpolation problem in Definition 2.33, denote N = k + Σ_i mi. Denote by pN(f; x) the unique element of PN for which (2.36) holds. Suppose f^{(N+1)}(x) exists in (a, b). Then there exists some ξ ∈ (a, b) such that

    f(x) − pN(f; x) = f^{(N+1)}(ξ)/(N + 1)! · ∏_{i=0}^{k} (x − xi)^{mi+1}.    (2.38)

Proof. The proof is similar to that of Theorem 2.7. Pay attention to the difference caused by the multiple roots of the polynomial ∏_{i=0}^{k} (x − xi)^{mi+1}.

Example 2.36. For the Hermite interpolation problem

    p(x0) = f0,    p'(x0) = f0',    p''(x0) = f0'',

Newton's formula yields the interpolating polynomial as

    p(x) = f0 + f0'(x − x0) + ½ f0''(x − x0)²,

which is exactly the Taylor polynomial of degree 2. Thus a Taylor polynomial is a special case of a Hermite interpolating polynomial. By Theorem 2.35, the Cauchy remainder of this interpolation is

    R2(f; x) = f(x) − p2(f; x) = f^{(3)}(ξ)/6 · (x − x0)³,

which is Lagrange's formula of the remainder term in Taylor's formula; see Theorem C.60.

Example 2.37. For the Hermite interpolation problem

    p(x0) = f0,    p(x1) = f1,    p'(x1) = f1',    p(x2) = f2,

the table of divided differences has the form

    x0   f0
    x1   f1   f[x0, x1]
    x1   f1   f1'        f[x0, x1, x1]
    x2   f2   f[x1, x2]  f[x1, x1, x2]  f[x0, x1, x1, x2]

and the interpolating polynomial follows from Newton's formula. By Theorem 2.35, the Cauchy remainder is

    R3(f; x) = f^{(4)}(ξ)/4! · (x − x0)(x − x1)²(x − x2)

for some ξ ∈ [x0, x2].
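A Python sketch (not part of the original notes) of the Neville-Aitken recursion (2.34), checked against Example 2.32; the function name neville is an illustrative choice.

def neville(xs, fs, x):
    p = list(fs)                      # p[i] holds p_k^[i](x), starting with k = 0
    n = len(xs)
    for k in range(1, n):
        for i in range(n - k):
            p[i] = ((x - xs[i]) * p[i + 1] - (x - xs[i + k]) * p[i]) / (xs[i + k] - xs[i])
    return p[0]

print(neville([0, 1, 2, 3], [6, -3, -6, 9], 1.5))   # -6.0, as in Example 2.32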
Proof. By trigonometric identities, we have

    cos(n + 1)θ = cos nθ cos θ − sin nθ sin θ,
    cos(n − 1)θ = cos nθ cos θ + sin nθ sin θ.

Adding up the two equations and setting cos θ = x completes the proof.

Corollary 2.41. The coefficient of x^n in Tn is 2^{n−1} for each n > 0.

Proof. Use (2.42) and T1 = x in an induction.

Theorem 2.42. Tn(x) has simple zeros at the n points

    xk = cos((2k − 1)π / (2n)).    (2.43)

[Figure: a plot over [−1, 1], referenced in Exercise 2.43.]

Exercise 2.43. Write a program to reproduce the above plot.

Theorem 2.44 (Chebyshev). Denote by P̃n the class of all polynomials of degree n ∈ N+ with leading coefficient 1. Then

    ∀p ∈ P̃n,    max_{x∈[−1,1]} |Tn(x)/2^{n−1}| ≤ max_{x∈[−1,1]} |p(x)|.    (2.45)
Proof. By Theorem 2.42, Tn(x) assumes its extrema n + 1 times at the points x'_k defined in (2.44). Suppose (2.45) does not hold. Then Theorem 2.42 implies that

    ∃p ∈ P̃n  s.t.  max_{x∈[−1,1]} |p(x)| < 1/2^{n−1}.    (2.46)

Consider the polynomial Q(x) = Tn(x)/2^{n−1} − p(x).

    Q(x'_k) = (−1)^k/2^{n−1} − p(x'_k),    k = 0, 1, . . . , n.

By (2.46), Q(x) has alternating signs at these n + 1 points. Hence Q(x) must have n zeros. However, by the construction of Q(x), the degree of Q(x) is at most n − 1. Therefore, Q(x) ≡ 0 and p(x) = Tn(x)/2^{n−1}, which implies max |p(x)| = 1/2^{n−1}. This is a contradiction to (2.46).

Corollary 2.45. For n ∈ N+, we have

    max_{x∈[−1,1]} |x^n + a1 x^{n−1} + ··· + an| ≥ 1/2^{n−1}.    (2.47)

Corollary 2.46. Suppose polynomial interpolation is performed for f on the n + 1 zeros of T_{n+1}(x) as in Theorem 2.42. The Cauchy remainder in Theorem 2.7 satisfies

    |R_n(f; x)| ≤ 1/(2^n (n + 1)!) max_{x∈[−1,1]} |f^{(n+1)}(x)|.    (2.48)

Proof. Theorem 2.7, Corollary 2.41, and Theorem 2.42 yield

    |R_n(f; x)| = |f^{(n+1)}(ξ)|/(n + 1)! · ∏_{i=0}^{n} |x − xi| = |f^{(n+1)}(ξ)|/(2^n (n + 1)!) · |T_{n+1}|.

Definition 2.39 completes the proof as |T_{n+1}| ≤ 1.

2.8 The Bernstein polynomials

Definition 2.47. The Bernstein base polynomials of degree n ∈ N+ relative to the unit interval [0, 1] are

    b_{n,k}(t) = (n choose k) t^k (1 − t)^{n−k},    (2.49)

where k = 0, 1, . . . , n.

Lemma 2.48. The Bernstein base polynomials satisfy

    ∀k = 0, 1, . . . , n, ∀t ∈ (0, 1),  b_{n,k}(t) > 0,    (2.50a)
    Σ_{k=0}^{n} b_{n,k}(t) = 1,    (2.50b)
    Σ_{k=0}^{n} k b_{n,k}(t) = nt,    (2.50c)
    Σ_{k=0}^{n} (k − nt)² b_{n,k}(t) = nt(1 − t).    (2.50d)

Lemma 2.49. The Bernstein base polynomials of degree n form a basis of Pn, the vector space of all polynomials with degree no more than n.

Proof. This follows from Definition 2.47.

Definition 2.50. The nth Bernstein polynomial of a map f ∈ C[0, 1] is

    (Bn f)(t) := Σ_{k=0}^{n} f(k/n) b_{n,k}(t),    (2.51)

where b_{n,k} is a Bernstein base polynomial in (2.49).

Theorem 2.51 (Weierstrass approximation). Every continuous function f : [a, b] → R can be uniformly approximated as closely as desired by a polynomial function.

    ∀f ∈ C[a, b], ∀ε > 0, ∃N ∈ N+ s.t. ∀n > N,
    ∃pn ∈ Pn s.t. ∀x ∈ [a, b], |pn(x) − f(x)| < ε.    (2.52)

Proof. Without loss of generality, we assume a = 0, b = 1. Set pn = Bn f in (2.51). For any ε > 0, there exist δ > 0 and n ∈ N+ such that

    |(Bn f)(t) − f(t)| = |(Bn f)(t) − f(t) Σ_{k=0}^{n} b_{n,k}(t)|
        ≤ Σ_{k=0}^{n} |f(k/n) − f(t)| b_{n,k}(t)
        = ( Σ_{k:|k/n−t|<δ} + Σ_{k:|k/n−t|≥δ} ) |f(k/n) − f(t)| b_{n,k}(t)
        ≤ sup_{|t−s|≤δ} |f(t) − f(s)| + ‖f‖_∞ / (2nδ²)
        ≤ ε/2 + ε/2 = ε,

where the case |k − nt| < nδ in the second inequality follows from (2.50a) and (2.50b), the other case |k − nt| ≥ nδ in the second inequality follows from (2.50d) and

    Σ_{k:|k/n−t|≥δ} b_{n,k}(t) ≤ Σ_{k:|k/n−t|≥δ} (k − nt)²/(δ²n²) b_{n,k}(t)
        ≤ Σ_{k=0}^{n} (k − nt)²/(δ²n²) b_{n,k}(t) = t(1 − t)/(nδ²) ≤ 1/(4nδ²),

and the last inequality follows from the uniform continuity of f (c.f. Theorem C.44) and the choice of n > ‖f‖_∞/(δ²ε).

2.9 Problems

2.9.1 Theoretical questions

I. For f ∈ C²[x0, x1] and x ∈ (x0, x1), linear interpolation of f at x0 and x1 yields

    f(x) − p1(f; x) = f''(ξ(x))/2 · (x − x0)(x − x1).

Consider the case f(x) = 1/x, x0 = 1, x1 = 2.
  • Determine ξ(x) explicitly.
  • Estimate f(2) using Hermite interpolation.
  • Estimate the maximum possible error of the above answer if one knows, in addition, that f ∈ C⁵[0, 3] and |f⁽⁵⁾(x)| ≤ M on [0, 3]. Express the answer in terms of M.

VII. Define the forward difference by

    ∆f(x) = f(x + h) − f(x),
    ∆^{k+1} f(x) = ∆(∆^k f(x)) = ∆^k f(x + h) − ∆^k f(x)

and the backward difference by

B. Run your routine on the function

    f(x) = 1/(1 + x²)

for x ∈ [−5, 5] using xi = −5 + 10i/n, i = 0, 1, . . . , n, and n = 2, 4, 6, 8. Plot the polynomials against the exact function to reproduce the plot in the notes that illustrates the Runge phenomenon.

C. Reuse your subroutine of Newton interpolation to perform Chebyshev interpolation for the function
D. A car traveling along a straight road is clocked at a number of points. The data from the observations are given in the following table, where the time is in seconds, the distance is in feet, and the speed is in feet per second.

    Time       0    3    5    8   13
    Distance   0  225  383  623  993
    Speed     75   77   80   74   72

(a) Use a Hermite polynomial to predict the position of the car and its speed for t = 10 s.
(b) Use the derivative of the Hermite polynomial to determine whether the car ever exceeds the 55 mi/h (81 feet per second) speed limit.

E. It is suspected that the high amounts of tannin in mature oak leaves inhibit the growth of the winter moth larvae that extensively damage these trees in certain years. The following table lists the average weight of two samples of larvae at times in the first 28 days after birth. The first sample was reared on young oak leaves, whereas the second sample was reared on mature leaves from the same tree.

    Day    0     6     10    13    17    20    28
    Sp1  6.67  17.3  42.7  37.3  30.1  29.3  28.7
    Sp2  6.67  16.1  18.9  15.0  10.6  9.44  8.89

(a) Use Newton's formula to approximate the average weight curve for each sample.
(b) Predict whether the two samples of larvae will die after another 15 days.
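A Python sketch (not part of the original notes) in the spirit of assignments B and C above, comparing equispaced and Chebyshev interpolation of f(x) = 1/(1 + x²) on [−5, 5]. It assumes NumPy is available and uses np.polyfit with degree equal to the number of points minus one, which makes the fit an interpolation.

import numpy as np

f = lambda x: 1.0 / (1.0 + x * x)
n = 10
x_eq = np.linspace(-5.0, 5.0, n + 1)                         # equispaced nodes
k = np.arange(1, n + 2)
x_ch = 5.0 * np.cos((2 * k - 1) * np.pi / (2 * (n + 1)))     # zeros of T_{n+1}, scaled to [-5, 5]
t = np.linspace(-5.0, 5.0, 1001)

def interp(xs, x):
    c = np.polyfit(xs, f(xs), len(xs) - 1)
    return np.polyval(c, x)

print("equispaced max error:", np.max(np.abs(interp(x_eq, t) - f(t))))
print("Chebyshev  max error:", np.max(np.abs(interp(x_ch, t) - f(t))))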
Chapter 3
Splines
3.1 Piecewise-polynomial splines

    xi     fi
    xi     fi     mi
    xi+1   fi+1   Ki     (Ki − mi)/(xi+1 − xi)
    xi+1   fi+1   mi+1   (mi+1 − Ki)/(xi+1 − xi)   (mi + mi+1 − 2Ki)/(xi+1 − xi)²

    pi(x) = fi + (x − xi) mi + (x − xi)² (Ki − mi)/(xi+1 − xi)
            + (x − xi)² (x − xi+1) (mi + mi+1 − 2Ki)/(xi+1 − xi)²,    (3.5)
where x ∈ [xi, xi+1] and the derivatives should be interpreted as the right-hand derivatives. Differentiate (3.8) twice, set x = xi+1, and we have

    s'''(xi) = (M_{i+1} − M_i)/(x_{i+1} − x_i).    (3.9)

Substitute (3.9) into (3.8), set x = xi+1, and we have

    s'(xi) = f[xi, x_{i+1}] − (1/6)(M_{i+1} + 2M_i)(x_{i+1} − x_i).    (3.10)

• A complete cubic spline s ∈ S_3^2 satisfies boundary conditions s'(f; a) = f'(a) and s'(f; b) = f'(b).
• A cubic spline with specified second derivatives at its end points: s''(f; a) = f''(a) and s''(f; b) = f''(b).
• A natural cubic spline s ∈ S_3^2 satisfies boundary conditions s''(f; a) = 0 and s''(f; b) = 0.
• A not-a-knot cubic spline s ∈ S_3^2 satisfies that s'''(f; x) exists at x = x2 and x = x_{N−1}.
• A periodic cubic spline s ∈ S_3^2 is obtained from replacing s(f; b) = f(b) with s(f; b) = s(f; a), s'(f; b) = s'(f; a), and s''(f; b) = s''(f; a).

Lemma 3.6. For a complete cubic spline s ∈ S_3^2, denote Mi = s''(f; xi) and we have

    2M1 + M2 = 6 f[x1, x1, x2],    (3.12)
    M_{N−1} + 2M_N = 6 f[x_{N−1}, x_N, x_N].    (3.13)

Proof. As for (3.12), the cubic polynomial on [x1, x2] can be written as . . .

Theorem 3.7. For a given function f : [a, b] → R, there exists a unique complete/natural/periodic cubic spline s(f; x) that interpolates f.

Proof. We only prove the case of complete cubic splines since the other cases are similar.
By the proof of Lemma 3.3, s is uniquely determined if all the mi's are uniquely determined on all intervals. For a complete cubic spline we already have m1 = f'(a) and mN = f'(b). Assemble (3.3) into a linear system in the unknowns m2, . . . , m_{N−1}, whose tridiagonal coefficient matrix has first row (2, µ2, 0, . . . , 0). Its determinant is nonzero and the mi's can be uniquely determined.
Alternatively, a complete cubic spline can be uniquely determined from Lemmas 3.4 and 3.6, following arguments similar to the above.

Example 3.8. Construct a complete cubic spline s(x) on points x1 = 1, x2 = 2, x3 = 3, x4 = 4, x5 = 6 from the function values of f(x) = ln(x) and its derivatives at x1 and x5. Approximate ln(5) by s(5).
From the given conditions, we set up the table of divided differences as follows.

    xi   f[xi]
    1    0
    1    0        1
    2    0.6931   0.6931   −0.3069
    3    1.0986   0.4055   −0.1438
    4    1.3863   0.2877   −0.05889
    6    1.7918   0.2027   −0.02831
    6    1.7918   0.1667   −0.01803

All values of λi and µi are 1/2 except that
3.2 The minimum properties

Theorem 3.9 (Minimum bending energy). For any function g ∈ C²[a, b] that satisfies g'(a) = f'(a), g'(b) = f'(b), and g(xi) = f(xi) for each i = 1, 2, . . . , N, the complete cubic spline s = s(f; x) satisfies

    ∫_a^b [s''(x)]² dx ≤ ∫_a^b [g''(x)]² dx,    (3.16)

where the equality holds only when g(x) = s(f; x).

Proof. Define η(x) = g(x) − s(x). From the given conditions we have η ∈ C²[a, b], η'(a) = η'(b) = 0, and ∀i = 1, 2, . . . , N, η(xi) = 0. Then

    ∫_a^b [g''(x)]² dx = ∫_a^b [s''(x) + η''(x)]² dx
        = ∫_a^b [s''(x)]² dx + ∫_a^b [η''(x)]² dx + 2 ∫_a^b s''(x) η''(x) dx.

Proof. Since s''(x) is linear on [xi, xi+1], |s''(x)| attains its maximum at xj for some j. If j = 2, . . . , N − 1, it follows from Lemma 3.4 and Corollary 2.22 that

    2Mj = 6 f[x_{j−1}, xj, x_{j+1}] − µj M_{j−1} − λj M_{j+1}
    ⇒ 2|Mj| ≤ 6 |f[x_{j−1}, xj, x_{j+1}]| + (µj + λj)|Mj|
    ⇒ ∃ξ ∈ [x_{j−1}, x_{j+1}]  s.t.  |Mj| ≤ 3 |f''(ξ)|
    ⇒ |s''(x)| ≤ 3 max_{x∈[a,b]} |f''(x)|.    (3.19)

If |s''(x)| attains its maximum at x1 or xN, (3.19) clearly holds at these end points for a cubic spline with specified second derivatives. After all, s''(a) = f''(a) and s''(b) = f''(b). As for the complete cubic spline, it suffices to prove (3.19) when |s''(x)| attains its maximum at x1. Since the first derivative f'(a) = f[x1, x1] is specified, f[x1, x1, x2] is a constant. By (3.12), we have

    2|M1| ≤ 6|f[x1, x1, x2]| + |M2| ≤ 6|f[x1, x1, x2]| + |M1|,

which, together with Corollary 2.22, implies
which, together with (3.21), leads to (3.20) for j = 2:

    |f''(x) − s''(x)| ≤ |f''(x) − ŝ''(x)| + |ŝ''(x) − s''(x)|
        ≤ 4 max_{x∈[a,b]} |f''(x) − ŝ''(x)|
        ≤ (1/2) h² max_{x∈[a,b]} |f⁽⁴⁾(x)|.    (3.22)

For j = 0, we have f(x) − s(x) = 0 for x = xi, xi+1. Then Rolle's theorem C.50 implies f'(ξi) − s'(ξi) = 0 for some ξi ∈ [xi, xi+1]. It follows from the second fundamental theorem of calculus (Theorem C.73) that

    ∀x ∈ [xi, xi+1],    f'(x) − s'(x) = ∫_{ξi}^{x} (f''(t) − s''(t)) dt,

which, together with the integral mean value theorem C.71 and (3.22), yields

    |f'(x) − s'(x)|_{x∈[xi,xi+1]} = |x − ξi| |f''(ηi) − s''(ηi)| ≤ (1/2) h³ max_{x∈[a,b]} |f⁽⁴⁾(x)|.

This proves (3.20) for j = 1. Finally, consider interpolating f(x) − s(x) with some linear spline s̄ ∈ S_1^0. The interpolation conditions dictate ∀x ∈ [a, b], s̄(x) ≡ 0. Hence

    |f(x) − s(x)|_{x∈[xi,xi+1]} = |f(x) − s(x) − s̄|_{x∈[xi,xi+1]}
        ≤ (1/8)(xi+1 − xi)² max_{x∈[xi,xi+1]} |f''(x) − s''(x)|
        ≤ (1/16) h⁴ max_{x∈[a,b]} |f⁽⁴⁾(x)|,

where the second step follows from Theorem 2.7 and the third step from (3.22).

Exercise 3.13. Verify Theorem 3.12 using the results in Example 3.8.

3.4 B-Splines

Notation 2. In the notation S_n^{n−1}(t1, t2, ···, tN), the ti's in the parentheses represent knots of a spline. When there is no danger of ambiguity, we also use the shorthand notation S_{n,N}^{n−1} := S_n^{n−1}(t1, t2, ···, tN), or simply S_n^{n−1}.

3.4.1 Truncated power functions

Definition 3.16. The truncated power function with exponent n is defined as

    x_+^n = x^n if x ≥ 0;    x_+^n = 0 if x < 0.    (3.23)

Example 3.17. According to Definition 3.16, we have

    ∀t ∈ [a, b],    ∫_a^b (t − x)_+^n dx = ∫_a^t (t − x)^n dx = (t − a)^{n+1}/(n + 1).    (3.24)

Lemma 3.18. The following is a basis of S_n^{n−1}(t1, . . . , tN),

    1, x, x², . . . , x^n, (x − t2)_+^n, (x − t3)_+^n, . . . , (x − t_{N−1})_+^n.    (3.25)

Proof. ∀i = 2, 3, . . . , N − 1, (x − ti)_+^n ∈ S_{n,N}^{n−1}. Also, ∀i = 0, 1, . . . , n, x^i ∈ S_{n,N}^{n−1}. Suppose

    Σ_{i=0}^{n} ai x^i + Σ_{j=2}^{N−1} a_{n+j} (x − tj)_+^n = 0(x).    (3.26)

To satisfy (3.26) for all x < t2, ai must be 0 for each i = 0, 1, ···, n. To satisfy (3.26) for all x ∈ (t2, t3), a_{n+2} must be 0. Similarly, all a_{n+j}'s must be zero. Hence, the functions in (3.25) are linearly independent by Definition B.25. The proof is completed by Theorem 3.14, Lemma B.41, and the fact that there are n + N − 1 functions in (3.25).

Corollary 3.19. Any s ∈ S_{n,N}^{n−1} can be expressed as
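As a quick numerical illustration of Definition 3.16 and Example 3.17, the following Python sketch (the interval, exponent, and parameter t are chosen arbitrarily for the example) evaluates the truncated power function and checks the integral identity (3.24) by quadrature.

from scipy.integrate import quad

def truncated_power(x, n):
    """x_+^n from Definition 3.16: x**n for x >= 0 and 0 otherwise."""
    return x ** n if x >= 0.0 else 0.0

a, b, n, t = 0.0, 2.0, 3, 1.3          # sample interval [a, b], exponent n, and t in [a, b]
lhs, _ = quad(lambda x: truncated_power(t - x, n), a, b, points=[t])
rhs = (t - a) ** (n + 1) / (n + 1)
print(lhs, rhs)                         # the two values agree up to quadrature error, cf. (3.24)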
3.4.2 The local support of B-splines

Definition 3.21. The hat function at t_i is

    B̂_i(x) = { (x − t_{i−1})/(t_i − t_{i−1}),  x ∈ (t_{i−1}, t_i];
               (t_{i+1} − x)/(t_{i+1} − t_i),  x ∈ (t_i, t_{i+1}];
               0,                              otherwise. }    (3.28)

Theorem 3.22. The hat functions form a basis of S_1^0.

Proof. By Definition 3.21, we have

    B̂_i(t_j) = { 1 if i = j;  0 if i ≠ j. }    (3.29)

Suppose Σ_{i=1}^{N} c_i B̂_i(x) = 0(x). Then we have c_i = 0 for each i = 1, 2, · · · , N by setting x = t_j and applying (3.29). Hence by Definition B.25 the hat functions are linearly independent. It suffices to show that span{B̂_1, B̂_2, . . . , B̂_N} = S_1^0, which is true because

    ∀s(x) ∈ S_1^0,  ∃ s_B(x) = Σ_{i=1}^{N} s(t_i) B̂_i(x)  s.t.  s(x) = s_B(x).

On each interval [t_i, t_{i+1}], (3.29) implies s_B(t_i) = s(t_i) and s_B(t_{i+1}) = s(t_{i+1}). Hence s_B(x) ≡ s(x) because they are both linear. Then Definition B.32 completes the proof.

Proof. This is an easy induction by (3.31) and (3.30).

Definition 3.28. Let X be a vector space. For each x ∈ X we associate a unique real (or complex) number L(x). If ∀x, y ∈ X and ∀α, β ∈ R (or C) we have

    L(αx + βy) = αL(x) + βL(y),    (3.36)

then L is called a linear functional over X.

Example 3.29. Let X = C[a, b]; then the elements of X are functions continuous over [a, b]. Both

    L(f) = ∫_a^b f(x) dx   and   L(f) = ∫_a^b x² f(x) dx

are linear functionals over X.

Notation 3. We have used the notation f[x_0, . . . , x_k] for the kth divided difference of f, in line with considering f[x_0, . . . , x_k] as a generalization of the Taylor expansion. Hereafter, for analyzing B-splines, it is both semantically and syntactically better to use the notation [x_0, . . . , x_k]f, in line with considering the procedure of a divided difference as a linear functional over C[x_0, x_k].

Theorem 3.30 (Leibniz formula). For k ∈ N, the kth divided difference of a product of two functions satisfies
In the above derivation, we have applied Theorem 2.17 to Definition 3.23 and the induction hypothesis yield
go from the kth divided difference to the (k + 1)th. Then
Bin+1 (x) = β(x) + γ(x), with
S1 + S2 x − ti−1
[x0 , . . . , xk+1 ]f g = β(x) = B n (x)
xk+1 − x0 ti+n − ti−1 i
k+1
X = (x − ti−1 ) · [ti−1 , . . . , ti+n ](t − x)n+
= [x0 , . . . , xi ]f · [xi , . . . , xk+1 ]g,
i=0 = [ti , . . . , ti+n ](t − x)n+ − [ti−1 , . . . , ti+n ](t − x)n+1
+ ,
which completes the inductive proof. where the last step follows from (3.39). Similarly,
Example 3.31. There exists a relation between B-splines ti+n+1 − x n
and truncated power functions, e.g., γ(x) = B (x)
ti+n+1 − ti i+1
(ti+1 − ti−1 )[ti−1 , ti , ti+1 ](t − x)+ =(ti+n+1 − x) · [ti , . . . , ti+n+1 ](t − x)n+
=[ti , ti+1 ](t − x)+ − [ti−1 , ti ](t − x)+ =(ti+n+1 − ti ) · [ti , . . . , ti+n+1 ](t − x)n+
(ti+1 − x)+ − (ti − x)+ (ti − x)+ − (ti−1 − x)+ + (ti − x) · [ti , . . . , ti+n+1 ](t − x)n+
= −
ti+1 − ti ti − ti−1 =[ti+1 , . . . , ti+n+1 ](t − x)n+ − [ti , . . . , ti+n ](t − x)n+
+ [ti , . . . , ti+n+1 ](t − x)n+1
x−ti−1
ti −ti−1 x ∈ (ti−1 , ti ],
+
=Bi = tti+1
1 i+1 −x
x ∈ (ti , ti+1 ], − [ti+1 , . . . , ti+n+1 ](t − x)n+
−ti
=[ti , . . . , ti+n+1 ](t − x)n+1 − [ti , . . . , ti+n ](t − x)n+ ,
0 otherwise.
+
The algebra is illustrated by the figures below (sketches of the truncated power functions on the knots t_{i−1}, t_i, t_{i+1} and of the resulting hat function; figures omitted).

where the second last step follows from Theorem 2.17 and (3.39). The above arguments yield

    B_i^{n+1}(x) = [t_i, . . . , t_{i+n+1}](t − x)_+^{n+1} − [t_{i−1}, . . . , t_{i+n}](t − x)_+^{n+1}
                 = (t_{i+n+1} − t_{i−1}) · [t_{i−1}, . . . , t_{i+n+1}](t − x)_+^{n+1},

which completes the inductive proof.
The significance is that, by applying divided difference
to truncated power functions we can “cure” their drawback 3.4.3 Integrals and derivatives
of non-local support. This idea is made precise in the next
Theorem. Corollary 3.33 (Integrals of B-splines). The average of a
B-spline over its support only depends on its degree,
Theorem 3.32 (B-splines as divided difference of truncated Z ti+n
power functions). For any n ∈ N, we have 1 1
Bin (x)dx = . (3.40)
ti+n − ti−1 ti−1 n+1
n n
Bi (x) = (ti+n − ti−1 ) · [ti−1 , . . . , ti+n ](t − x)+ . (3.38)
Proof. The left-hand side (LHS) of (3.40) is
Proof. For n = 0, (3.38) reduces to
Z ti+n
1
Bi0 (x) = (ti − ti−1 ) · [ti−1 , ti ](t − x)0+ Bin (x)dx
ti+n − ti−1 ti−1
= (ti − x)0+ − (ti−1 − x)0+ Z ti+n
0 if x ∈ (−∞, ti−1 ], = [ti−1 , . . . , ti+n ](t − x)n+ dx
ti−1
= 1 if x ∈ (ti−1 , ti ], Z ti+n
=[t , . . . , t ] (t − x)n+ dx
0 if x ∈ (ti , +∞), i−1 i+n
ti−1
which is the same as (3.31). Hence the induction basis holds. (t − ti−1 )n+1
=[ti−1 , . . . , ti+n ]
Now assume the induction hypothesis (3.38) hold. n+1
By Definition 3.16, (t − x)n+1
+ = (t − x)(t − x)n+ . Then =
1
,
the application of the Leibniz formula (Theorem 3.30) with n+1
f = (t − x) and g = (t − x)n+ yields
where the first step follows from Theorem 3.32, the second
[ti−1 , . . . , ti+n ](t − x)n+1 step from the commutativity of integration and taking di-
+
vided difference, the third step from (3.24), and the last step
=(ti−1 − x) · [ti−1 , . . . , ti+n ](t − x)n+ (3.39)
from Corollary 2.22.
+ [ti , . . . , ti+n ](t − x)n+ .
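The representation (3.38) and the average property (3.40) are easy to check numerically. The following Python sketch (the knot values are made up for the example) evaluates B_i^n(x) as a divided difference of truncated power functions and compares the n = 1 case with the hat function of Example 3.31.

def divdiff(ts, g):
    """Divided difference [t_0, ..., t_k]g for distinct points ts (standard recursive table)."""
    d = [g(t) for t in ts]
    for level in range(1, len(ts)):
        d = [(d[j + 1] - d[j]) / (ts[j + level] - ts[j]) for j in range(len(d) - 1)]
    return d[0]

def bspline_divdiff(x, knots, i, n):
    """B_i^n(x) via (3.38): (t_{i+n} - t_{i-1}) * [t_{i-1}, ..., t_{i+n}] (t - x)_+^n."""
    ts = knots[i - 1 : i + n + 1]
    g = lambda t: max(t - x, 0.0) ** n
    return (ts[-1] - ts[0]) * divdiff(ts, g)

knots = [0.0, 0.4, 1.0, 1.9, 2.5, 3.2]     # assumed strictly increasing knots t_0, ..., t_5
x = 0.7                                    # lies in (t_1, t_2]
print(bspline_divdiff(x, knots, 2, 1),     # degree 1: (3.38) reproduces the hat function
      (x - knots[1]) / (knots[2] - knots[1]))   # of Example 3.31 on (t_1, t_2]

# degree 2: the average over the support [t_1, t_4] is 1/(n+1), cf. (3.40)
xs = [knots[1] + (knots[4] - knots[1]) * (j + 0.5) / 4000 for j in range(4000)]
avg = sum(bspline_divdiff(xx, knots, 2, 2) for xx in xs) / len(xs)
print(avg, 1.0 / 3.0)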
Theorem 3.34 (Derivatives of B-splines). For n ≥ 2, we where the product (t − ti ) · · · (t − ti+n−1 ) is defined as 1 for
have, ∀x ∈ R, n = 0.
n−1
d n nBin−1 (x) nBi+1 (x)
Bi (x) = − . (3.41) Proof. For n = 0, (3.43) follows from Definition 3.23. Now
dx ti+n−1 − ti−1 ti+n − ti suppose (3.43) holds. A linear interpolation of the linear
For n = 1, (3.41) holds for all x except at the three knots function f (t) = t − x is the function itself,
ti−1 , ti , and ti+1 , where the derivative of Bi1 is not defined.
t − ti+n t − ti−1
t−x = (ti−1 −x)+ (ti+n −x). (3.44)
Proof. We first show that (3.41) holds for all x except at the ti−1 − ti+n ti+n − ti−1
knots tj . By (3.32), (3.28), and (3.31), we have
Hence for the inductive step we have
∀x ∈R \ {ti−1 , ti , ti+1 },
d 1 1 1 +∞
B (x) = B 0 (x) − B 0 (x). X
dx i ti − ti−1 i ti+1 − ti i+1 (t − x)n+1 = (t − x) (t − ti ) · · · (t − ti+n−1 )Bin (x)
i=−∞
Hence the induction basis holds. Now suppose (3.41) holds +∞
∀x ∈ R \ {ti−1 , . . . , ti+n }. Differentiate (3.30), apply the
X ti−1 − x
= (t − ti ) · · · (t − ti+n ) B n (x)
induction hypothesis (3.41), and we have i=−∞
ti−1 − ti+n i
n +∞
d n+1 Bin (x) Bi+1 (x) X ti+n − x
Bi (x) = − + nC(x), (3.42) + (t − ti−1 ) · · · (t − ti+n−1 ) B n (x)
dx ti+n − ti−1 ti+n+1 − ti
i=−∞
ti+n − ti−1 i
where C(x) is +∞
X x − ti−1
= (t − ti ) · · · (t − ti+n ) B n (x)
ti+n − ti−1 i
" #
x − ti−1 Bin−1 (x) B n−1 (x) i=−∞
− i+1
ti+n − ti−1 ti+n−1 − ti−1 ti+n − ti +∞
X ti+n+1 − x n
+ (t − ti ) · · · (t − ti+n ) B (x)
ti+n+1 − ti i+1
" #
n−1 n−1
ti+n+1 − x Bi+1 (x) Bi+2 (x) i=−∞
+ −
ti+n+1 − ti ti+n − ti ti+n+1 − ti+1 +∞
X
"
n−1
# = (t − ti ) · · · (t − ti+n )Bin+1 (x),
1 (x − ti−1 )Bin−1 (x) (ti+n − x)Bi+1 (x) i=−∞
= +
ti+n − ti−1 ti+n−1 − ti−1 ti+n − ti
"
n−1 n−1
# where the first step follows from the induction hypothesis,
1 (x − ti )Bi+1 (x) (ti+n+1 − x)Bi+2 (x) the second step from (3.44), the third step from replacing i
− +
ti+n+1 − ti ti+n − ti ti+n+1 − ti+1 with i + 1 in the second summation, and the last step from
Bin (x) n
Bi+1 (x) (3.30).
= − ,
ti+n − ti−1 ti+n+1 − ti
where the last step follows from (3.30). Then (3.42) can be Corollary 3.37 (Truncated power functions as linear com-
written as binations of B-splines). For any j ∈ Z and n ∈ N,
n
d n+1 (n + 1)Bin (x) (n + 1)Bi+1 (x) j−n
Bi (x) = − , X
dx ti+n − ti−1 ti+n+1 − ti (tj − x)n+ = (tj − ti ) · · · (tj − ti+n−1 )Bin (x). (3.45)
i=−∞
which completes the inductive proof of (3.41) except at the
knots. Since Bi1 = B̂i is continuous, an easy induction with
Proof. We need to show that the RHS is (tj − x)n if x ≤ tj
(3.30) shows that Bin is continuous for all n ≥ 1. Hence the
and 0 otherwise. Set t = tj in (3.43) and we have
right-hand side of (3.41) is continuous for all n ≥ 2. There-
d
fore, if n ≥ 2, dx Bin (x) exists for all x ∈ R. This completes +∞
the proof. (tj − x)n =
X
(tj − ti ) · · · (tj − ti+n−1 )Bin (x).
n n−1 i=−∞
Corollary 3.35 (Smoothness of B-splines). Bi ∈ Sn .
Proof. For n = 1, the induction basis Bi1 (x) ∈ S01 holds be- For each i = j − n + 1, . . . , j, the corresponding term in the
cause of (3.32). The rest of the proof follows from (3.30) summation is zero regardless of x; for each i ≥ j +1, Lemma
and Theorem 3.34 via an easy induction. 3.27 implies that Bin (x) = 0 for all x ≤ tj . Hence
j−n
3.4.4 Marsden’s identity X
x ≤ tj ⇒ (tj − ti ) · · · (tj − ti+n−1 )Bin (x) = (tj − x)n .
Theorem 3.36 (Marsden’s identity). For any n ∈ N, i=−∞
+∞
n
(3.43) Otherwise x > tj , then Lemma 3.27 implies Bi (x) = 0 for
X
(t − x)n = (t − ti ) · · · (t − ti+n−1 )Bin (x),
i=−∞ each i ≤ j − n. This completes the proof.
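Marsden's identity (3.43) can also be verified numerically. The sketch below (Python; the knots and the sample values of t and x are arbitrary) evaluates B-splines by the recursion (3.30) of Definition 3.23, whose base case B_i^0 is the characteristic function of (t_{i−1}, t_i] as in (3.31), and sums the right-hand side of (3.43).

import math

def bspline(x, t, i, n):
    """B_i^n(x) on knots t via the recursion (3.30); base case from (3.31)."""
    if n == 0:
        return 1.0 if t[i - 1] < x <= t[i] else 0.0
    left  = (x - t[i - 1]) / (t[i + n - 1] - t[i - 1]) * bspline(x, t, i, n - 1)
    right = (t[i + n] - x) / (t[i + n] - t[i])         * bspline(x, t, i + 1, n - 1)
    return left + right

knots = [float(k) for k in range(-6, 7)]   # assumed knots, wide enough that boundary terms vanish
n, tt, x = 3, 1.7, 0.35                    # arbitrary sample values of t and x well inside the knots

# Marsden's identity (3.43): (t - x)^n = sum_i (t - t_i)...(t - t_{i+n-1}) B_i^n(x)
total = sum(math.prod(tt - knots[i + k] for k in range(n)) * bspline(x, knots, i, n)
            for i in range(1, len(knots) - n))
print(total, (tt - x) ** n)                # the two numbers agree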
3.4.6 B-splines indeed form a basis

Theorem 3.47. Given any k ∈ N, the monomial x^k can be expressed as a linear combination of B-splines for any fixed n ≥ k, in the form

    x^k = Σ_{i=−∞}^{+∞} [ σ_k^n(t_i, . . . , t_{i+n−1}) / \binom{n}{k} ] B_i^n(x).    (3.58)

Theorem 3.47 states that each monomial x^j can also be expressed as a linear combination of B-splines. Since the domain is restricted to [t_1, t_N], we know from Lemma 3.27 that only those B-splines in the list of (3.60) appear in the linear combination. Therefore, these B-splines form a spanning list of S_n^{n−1}(t_1, t_2, . . . , t_N). The proof is completed by Lemma B.40, Theorem 3.14, and the fact that the length of the list (3.60) is also n + N − 1.
Theorem 3.55. The cardinal B-spline of degree n can be which proves the middle N − 2 equations of M a = b. By
explicitly expressed as Theorem 3.34, we have
1 X
n d n
n−k n + 1
n−1 n−1
n
Bi,Z (x) = (−1) (k + i − x)n+ . (3.69) B (x) = Bi,Z (x) − Bi+1,Z (x). (3.74)
n! k+1 dx i,Z
k=−1
Differentiate (3.71), apply (3.74), set x = 1, apply (3.66)
Proof. Theorems 3.32, 2.29, and 2.28 yield and we have the first identity in (3.72), which, together with
(3.73), yields
n
Bi,Z (x) = (n + 1)[i − 1, . . . , i + n](t − x)n+
n + 1 n+1 2a0 + a1 = f 0 (1) + 3f (1);
= ∆ (i − 1 − x)n+
(n + 1)! this proves the first equation of M a = b. The last equation
n+1 M a = b and the second identity in (3.72) can be shown
1 X n+1
= (−1)n+1−k (i − 1 + k − x)n+ . similarly. The strictly diagonal dominance of M implies a
n! k
k=0 nonzero determinant of M and therefore a is uniquely deter-
mined. The uniqueness of S(x) then follows from (3.72).
Replacing k with k + 1 and accordingly changing the sum-
mation bounds complete the proof. Theorem 3.58. There is a unique B-spline S(x) ∈ S12 that
interpolates f (x) at ti = i + 21 for each i = 1, 2, . . . , N − 1
Corollary 3.56. The value of a cardinal B-spline at an in-
with end conditions S(1) = f (1) and S(N ) = f (N ). Fur-
teger j is
thermore, this B-spline is
n
n 1 X n+1 N
Bi,Z (j) = (−1)n−k (k + i − j)n (3.70) X
2
n! k+1 S(x) = ai Bi,Z (x), (3.75)
k=j−i+1
i=0
2
− 12 we have used Corollary 3.52. Then is a curve. Its tangent vector is
where for B0,Z
Corollary 3.51 and (3.77) yield
γ 0 (t) = (et (cos t − sin t), et (cos t + sin t)) (3.82)
ai−1 + 6ai + ai+1 = 8f (ti ), (3.78)
and thus the modulus of the tangent vector is
which proves the middle N − 3 equations in M a = b. At
the end point x = 1, only two quadratic cardinal B-splines, √
2
B0,Z 2
(x) and B1,Z , are nonzero. Then Example 3.25 yields kγ 0 (t)k2 = 2et .
Definition 3.61. A unit-speed curve is a curve whose tan- Definition 3.70. For a unit-speed curve γ, its signed cur-
gent vector has unit length at each of its points. vature is defined as
Definition 3.62. A point γ(t0 ) is a regular point of γ if
t(t0 ) exists and t(t0 ) 6= 0 holds; a curve is regular if all of κs := γ 00 · ns . (3.85)
its points are regular.
Definition 3.71. The cumulative chordal lengths associ-
Definition 3.63. The arc-length of a curve starting at the ated with a sequence of n points
point γ(t0 ) is defined as
Z t {xi ∈ RD : i = 1, 2, . . . , n} (3.86)
0
sγ (t) = kγ (u)k2 du. (3.80)
t0 are the n real numbers,
Definition 3.64. A map X 7→ Y is a homeomorphism if it (
is continuous and bijective and its inverse is also continuous; 0, i = 1;
ti = (3.87)
then the two sets X and Y are said to be homeomorphic. ti−1 + kxi − xi−1 k2 , i > 1,
V. The quadratic B-spline B_i^2(x).

C. Run your subroutines on the function
E. The roots of the following equation constitute a closed planar curve in the shape of a heart:

       x² + ( y − ∛(x²) )² = 3.    (3.88)

   Write a program to plot the heart. The parameter of the curve should be the cumulative chordal length defined in (3.87). Choose n = 10, 40, 160 and produce three plots of the heart function. (Hints: Your knots should include the characteristic points, and you should think about (i) how many pieces of splines to use and (ii) what boundary conditions are appropriate.)

F. (*) Write a program to illustrate (3.38) by plotting the truncated power functions for n = 1, 2 and build a table of divided differences where the entries are figures instead of numbers. The pictures you generate for n = 1 should be the same as those in Example 3.31.
Chapter 4
Computer Arithmetic
4.1 Floating-point number systems

Definition 4.1. The base or radix of a positional numeral system is the number of unique symbols used to represent numbers.

Example 4.2. The binary numeral system consists of two digits, "0" and "1," and thus its base is 2. The decimal system consists of ten digits, "0"–"9," and thus its base is 10.

Definition 4.3. A bit is the basic unit of information in computing; it can have only one of the two values 0 and 1.

Definition 4.4. A byte is a unit of information in computing that commonly consists of 8 bits; it is the smallest addressable unit of memory in many computers.

Definition 4.5. A word is a group of bits with fixed size that is handled as a unit by the instruction set architecture (ISA) and/or hardware of the processor. The word size/width/length is the number of bits in a word and is an important characteristic of a processor or computer architecture.

Example 4.6. 32-bit and 64-bit computers are the most common these days. A 32-bit register can store 2^32 values, hence a processor with 32-bit memory addresses can directly access 4GB of byte-addressable memory.

Definition 4.7 (Floating point numbers). A floating point number (FPN) is a number of the form

    x = ±m × β^e,    (4.1)

where β is the base or radix, e ∈ [L, U], and the significand (or mantissa) m is a number of the form

    m = d_0 + d_1/β + · · · + d_{p−1}/β^{p−1},    (4.2)

where the integer d_i satisfies ∀i ∈ [0, p − 1], d_i ∈ [0, β − 1]. d_0 and d_{p−1} are called the most significant digit and the least significant digit, respectively. The string of digits of m is d_0.d_1d_2 · · · d_{p−1}, of which the portion .d_1d_2 · · · d_{p−1} is called the fraction of m.

Algorithm 4.8. A decimal integer can be converted to a binary number via the following method:
• divide by 2 and record the remainder,
• repeat until you reach 0,
• concatenate the remainders backwards.
A decimal fraction can be converted to a binary number via the following method:
• multiply by 2 and check whether the integer part is no less than 1: if so, record 1; otherwise record 0,
• repeat until the fractional part reaches 0 (or enough bits have been recorded),
• concatenate the recorded bits forwards.
Combining the above two methods, we can convert any decimal number to its binary counterpart.

Example 4.9. Convert 156 to a binary number: 156 = (10011100)_2.

Example 4.10. What is the normalized binary form of 2/3?

    2/3 = (0.a_1a_2a_3 · · · )_2 = (0.1010 · · · )_2 = (1.0101010 · · · )_2 × 2^{−1}.

Definition 4.11 (FPN systems). A floating point number system F is a proper subset of the rational numbers Q, and it is characterized by a 4-tuple (β, p, L, U) with
• the base (or radix) β;
• the precision (or number of significand digits) p;
• the exponent range [L, U].

Definition 4.12. An FPN is normalized if its mantissa satisfies 1 ≤ m < β.

Definition 4.13. The subnormal or denormalized numbers are FPNs of the form (4.1) with e = L and m ∈ (0, 1). A normalized FPN system can be extended by including the subnormal numbers.
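A minimal Python sketch of Algorithm 4.8 (the function names and the bit limit are ours) reproduces Examples 4.9 and 4.10:

def int_to_binary(n):
    """Integer part of Algorithm 4.8: divide by 2, record remainders, read them backwards."""
    bits = []
    while n > 0:
        n, r = divmod(n, 2)
        bits.append(str(r))
    return "".join(reversed(bits)) or "0"

def frac_to_binary(x, max_bits=20):
    """Fraction part of Algorithm 4.8: multiply by 2 and record the integer parts."""
    bits = []
    while x > 0 and len(bits) < max_bits:
        x *= 2
        bit, x = int(x), x - int(x)
        bits.append(str(bit))
    return "".join(bits)

print(int_to_binary(156))          # '10011100', as in Example 4.9
print(frac_to_binary(2 / 3, 12))   # '101010101010', cf. Example 4.10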
Definition 4.14 (IEEE standard 754-2019). The single precision and double precision FPNs of the current IEEE (Institute of Electrical and Electronics Engineers) standard 754 published in 2019 are normalized FPN systems with three binary formats (32, 64, and 128 bits) and two decimal formats (64 and 128 bits):

    β = 2,  p = 23 + 1,  e ∈ [−126, 127];        (4.3a)
    β = 2,  p = 52 + 1,  e ∈ [−1022, 1023];      (4.3b)
    β = 2,  p = 112 + 1, e ∈ [−16382, 16383];    (4.3c)
    β = 10, p = 16,      e ∈ [−383, 384];        (4.3d)
    β = 10, p = 34,      e ∈ [−6143, 6144].      (4.3e)

Example 4.15. In the IEEE 754 standard, there are some further details on the representation specifications of FPNs.

    [bit layout: sign ± | exponent (e) | normalized significand (m), with the implicit radix point after the leading significand bit]

For example, some major representation specifications of the 32-bit FPNs are as follows.
(a) Out of the 32 bits, 1 is reserved for the sign, 8 for the exponent, and 23 for the significand (see the layout above for the locations and the implicit radix point).
(b) The precision is 24 because we can choose d_0 = 1 for normalized binary floating point numbers and get away with never storing d_0.
(c) The exponent has 2^8 = 256 possibilities. If we assigned 1, 2, . . . , 256 to these possibilities, it would not be possible to represent numbers whose magnitudes are smaller than one. Hence we subtract 128 from 1, 2, . . . , 256 to shift the exponents to −127, −126, . . . , 0, . . . , 127, 128. Out of these numbers, ±m × β^{−127} is reserved for ±0 and subnormal numbers, while ±m × β^{128} is reserved for ±∞ and NaNs, including qNaN (quiet) and sNaN (signaling).

Definition 4.16. The machine precision of a normalized FPN system F is the distance between 1.0 and the next larger FPN in F,

    ε_M := β^{1−p}.    (4.4)

Definition 4.17. The underflow limit (UFL) and the overflow limit (OFL) of a normalized FPN system F are respectively

    UFL(F) := min |F \ {0}| = β^L,    (4.5)
    OFL(F) := max |F| = β^U (β − β^{1−p}).    (4.6)

Example 4.18. By default Matlab adopts IEEE 754 double precision arithmetic. Three characterizing constants are
• eps is the machine precision ε_M = β^{1−p} = 2^{1−(52+1)} = 2^{−52} ≈ 2.22 × 10^{−16};
• realmin is UFL(F) = min |F \ {0}| = β^L = 2^{−1022} ≈ 2.22 × 10^{−308};
• realmax is OFL(F) = max |F| = β^U (β − β^{1−p}) ≈ 1.80 × 10^{308}.
In C/C++, these constants are defined in <cfloat> and float.h by the macros DBL_EPSILON, DBL_MIN, and DBL_MAX.

Corollary 4.19 (Cardinality of F). For a normalized binary FPN system F,

    #F = 2^p (U − L + 1) + 1.    (4.7)

Proof. The cardinality can be proved by Axiom A.21. The factor 2^p comes from the sign bit and the mantissa. By Example 4.15, U − L + 1 is the number of exponents represented in F. The trailing "+1" in (4.7) accounts for the number 0.

Definition 4.20. The range of a normalized FPN system is a subset of R that consists of two intervals,

    R(F) := {x : x ∈ R, UFL(F) ≤ |x| ≤ OFL(F)}.    (4.8)

Example 4.21. Consider a normalized FPN system with the characterization β = 2, p = 3, L = −1, U = +1.

    [number line from −3 to 3 with a tick at each normalized FPN; figure omitted]

The four FPNs 1.00 × 2^0, 1.01 × 2^0, 1.10 × 2^0, 1.11 × 2^0 correspond to the four ticks in the plot starting at 1, while 1.00 × 2^1, 1.01 × 2^1, 1.10 × 2^1, 1.11 × 2^1 correspond to the four ticks starting at 2. Add the subnormal FPNs and we have the following plot.

    [number line from −3 to 3 including the subnormal numbers near 0; figure omitted]

Definition 4.22. Two normalized FPNs a, b are adjacent to each other in F iff

    ∀c ∈ F \ {a, b},  |a − b| < |a − c| + |c − b|.    (4.9)

Lemma 4.23. Let a, b be two adjacent normalized FPNs satisfying |a| < |b| and ab > 0. Then

    β^{−1} ε_M |a| < |a − b| ≤ ε_M |a|.    (4.10)

Proof. Consider a > 0; then Δa := b − a > 0. By Definitions 4.7 and 4.12, a = m × β^e with 1.0 ≤ m < β. a and b only differ from each other at the least significant digit, hence Δa = ε_M β^e. Since ε_M/β < ε_M/m ≤ ε_M, we have Δa/a ∈ (β^{−1} ε_M, ε_M]. The other case is similar.
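The toy system of Example 4.21 can be enumerated directly. The following Python sketch lists its normalized numbers and checks the cardinality formula (4.7) as well as (4.4)–(4.6); it is essentially a warm-up for programming assignment B below.

# Enumerate the normalized FPNs of Example 4.21: beta = 2, p = 3, L = -1, U = +1.
beta, p, L, U = 2, 3, -1, 1
normalized = sorted({s * (1 + f / beta ** (p - 1)) * float(beta) ** e
                     for s in (-1, 1)
                     for e in range(L, U + 1)
                     for f in range(beta ** (p - 1))} | {0.0})
print(normalized)
print(len(normalized), 2 ** p * (U - L + 1) + 1)     # cardinality matches (4.7): 25
print(beta ** (1 - p))                               # machine precision eps_M, cf. (4.4)
print(beta ** L, beta ** U * (beta - beta ** (1 - p)))   # UFL and OFL, cf. (4.5)-(4.6)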
Theorem 4.28. For x ∈ R(F), we have Here u (p) in step (iii) is the unit round-off for FPNs with
x precision p, c.f. Definition 4.26.
fl(x) = , |δ| ≤ u . (4.14)
1+δ
Example 4.31. Consider the calculation of c := fl(a + b)
Proof. The proof is the same as that of Theorem 4.27, ex-
with a = 1.234 × 104 and b = 5.678 × 100 in an FPN system
cept that we replace the last inequality “< u |x|” in (4.13)
F : (10, 4, −7, 8).
by “≤ u |fl(x)|.” Consequently, the equality in (4.14) holds
when x = 21 (xL + xR ) and |fl(x)| = min(|xL |, |xR |) has its (i) b ← 0.0005678 × 104 ; ec ← 4.
significand as m = 1.0.
(ii) mc ← 1.2345678.
Example 4.29. Find xL , xR of x = 23 in normalized single-
precision IEEE 754 standard, which of them is fl(x)? (iii) do nothing.
By Example 4.10, we have (iv) do nothing.
2 (v) mc ← 1.235.
= (0.1010 · · · )2 = (1.0101010 · · · )2 × 2−1 .
3
xL = (1.010 · · · 10)2 × 2−1 ; (vi) c = 1.235 × 104 .
xR = (1.010 · · · 11)2 × 2−1 , For b = 5.678 × 10−2 , c = a would be returned in step (i).
where the last bit of xL must be 0 because the IEEE 754
Example 4.32. Consider the calculation of c := fl(a + b)
standard states that 23 bits are reserved for the mantissa.
with a = 1.000 × 100 and b = −9.000 × 10−5 in an FPN
It follows that
system F : (10, 4, −7, 8).
2 −24
x − xL = × 2 ;
3 (i) b ← −0.0000900 × 100 ; ec ← 0.
xR − xL = 2−24 ,
(ii) mc ← 0.9999100.
1 −24
xR − x = (xR − xL ) − (x − xL ) = × 2 . (iii) ec ← ec − 1; mc ← 9.9991000.
3
Thus Definition 4.24 implies fl(x) = xR . (iv) do nothing.
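Example 4.29 can be checked with NumPy's float32 type as a stand-in for the single-precision system (treating the double value of 2/3 as exact for this purpose):

import numpy as np

x = 2.0 / 3.0                            # double value of 2/3
fl32 = np.float32(x)                     # fl(x) in IEEE 754 single precision
print(fl32 > x)                          # True: 2/3 rounds up to x_R, as found in Example 4.29
print(abs(fl32 - x) / x, 2.0 ** -24)     # relative roundoff is below the unit roundoff 2^-24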
Exercise 4.33. Repeat Example 4.31 with b = 8.769 × 104 , (i) If mb = 0, return NaN; otherwise set ec ← ea − eb .
b = −5.678 × 100 , and b = −5.678 × 103 . (ii) Perform the division Mc ← Ma /Mb in the register with
rounding to nearest.
Lemma 4.34. For a, b ∈ F, a + b ∈ R(F) implies
(iii) Normalization:
fl(a + b) = (a + b)(1 + δ), |δ| < u . (4.15)
• If |Mc | < 1, set Mc ← Mc β, ec ← ec − 1.
Proof. The round-off error in step (v) always dominates that
in step (ii), which, because of the 2p precision, is nonzero (iv) Check range:
only in the case of ea − eb = p + 1. Then (4.15) follows from • return NaN if ec overflows,
Theorem 4.27.
• return 0 if ec underflows.
Definition 4.35 (Multiplication of two FPNs). Express
(v) Round Mc (to nearest) to precision p.
a, b ∈ F as a = Ma ×β ea and b = Mb ×β eb where Ma = ±ma
and Mb = ±mb . The product c := fl(ab) ∈ F is calculated (vi) Set c ← Mc × β ec .
in a register of precision at least 2p as follows. a
Lemma 4.39. For a, b ∈ F, b ∈ R(F) implies
(i) Exponent sum: ec ← ea + eb . a a
(ii) Perform the multiplication Mc ← Ma Mb in the regis- fl = (1 + δ), |δ| < u . (4.17)
b b
ter.
Proof. In the case of |Ma | = |Mb |, there is no rounding er-
(iii) Normalization:
ror in Definition 4.38 and (4.17) clearly holds. Hereafter we
• If |Mc | ∈ (β, β 2 ), set Mc ← Mc /β and ec ← ec +1. denote by Mc1 and Mc2 the results of steps (ii) and (v) in
Definition 4.38, respectively.
• If |Mc | ∈ [β − u (p), β], set Mc ← 1.0 and
In the case of |Ma | > |Mb |, the condition a, b ∈ F, Defi-
ec ← ec + 1.
nition 4.16, and |Ma |, |Mb | ∈ [1, β) imply
(iv) Check range:
Ma β − M −1
Mb ≥ β − 2M > 1 + β M , (4.18)
• return NaN if ec overflows,
• return 0 if ec underflows.
which further implies that the normalization step (iii) in
(v) Round Mc (to nearest) to precision p. Definition 4.38 is not invoked. By Definitions 4.24, 4.16,
and 4.26, the unit roundoff of a register with precision p + k
(vi) Set c ← Mc × β ec . is
Here u (p) in step (iii) is the unit round-off for FPNs with 1 1−p−k 1
precision p, c.f. Definition 4.26. β = β 1−p β 1−p β p−1−k = β p−1−k u M ,
2 2
Example 4.36. Consider the calculation of c := fl(ab) with and hence the unit roundoff of the register in Definition 4.38
a = 2.345 × 104 and b = 6.789 × 100 in an FPN system is β −2 u M . Therefore we have
F : (10, 4, −7, 8).
Mc2 = Mc1 + δ2 , |δ2 | < u
(i) ec ← 4. Ma
= + δ1 + δ2 , |δ1 | < β −2 u M
(ii) Mc ← 15.920205. Mb
(iii) mc ← 1.5920205, ec ← 5. Ma
= (1 + δ);
Mb
(iv) do nothing.
δ1 + δ2 u 1 + β −2 M
(v) mc ← 1.592. |δ| = < < u ,
Ma /Mb 1 + β −1 M
(vi) c = 1.592 × 105 .
where we have applied (4.18) and the triangular inequality
Lemma 4.37. For a, b ∈ F, |ab| ∈ R(F) implies in deriving the first inequality of the last line.
Consider the last case |Ma | < |Mb |. It is impossible to
fl(ab) = (ab)(1 + δ), |δ| < u . (4.16) have |Mc1 | = 1 in step (ii) because
Proof. The error only comes from the round-off in steps (v). |Ma | β − 2M M
Then (4.16) follows from Theorem 4.27. ≤ =1− < 1 − β −1 M
|Mb | β − M β − M
and the precision of the register is greater than p+1. There- An easy induction then shows that
fore |Mc1 | < 1 must hold and in Definition 4.38 step (iii) is
k
invoked to yield X
∀k ∈ N, |δk+1 | < u (1 + u )i (4.21)
Ma i=0
Mc1 = + δ1 , |δ1 | < β −2 u M ;
Mb (1 + u )k+1 − 1
= u = (1 + u )k+1 − 1,
Mc2 = βMc1 + δ2 , |δ2 | < u 1 + u − 1
Ma βδ1 + δ2 where the second step follows from the summation formula
=β 1+ ,
Mb βMa /Mb of geometric series. The proof is completed by the binomial
theorem.
where the denominator in the parentheses satisfies
Ma
Exercise 4.42. If we sort the positive numbers ai > 0 ac-
β M −1
β ≥ = 1 + > 1 + β M . cording to their magnitudes and carry out the additions in
M b β − M β − M this ascending order, we can minimize the rounding error
Hence we have term δ in Theorem 4.41. Can you give some examples?
βδ1 + δ2 β −1 u M + u Exercise 4.43. Derive fl(a1 b1 + a2 b2 + a3 b3 ) for ai , bi ∈ F
|δ| = < = u . and make
βMa /Mb 1 + β −1 M P Qsome observations on the corresponding derivation
of fl( i j ai,j ).
Theorem 4.40 (Model of machine arithmetic). Denote by
F a normalized FPN system with precision p. For each Theorem 4.44. For given µ ∈ R+ and a positive integer
ln 2
arithmetic operation = +, −, ×, /, we have n ≤ b µ c, suppose |δi | ≤ µ for each i = 1, 2, . . . , n. Then
n
∀a, b ∈ F, ab ∈ R(F) ⇒ fl(ab) = (ab)(1+δ) (4.19) Y
1 − nµ ≤ (1 + δi ) ≤ 1 + nµ + (nµ)2 , (4.22)
where |δ| < u if and only if these binary operations are i=1
performed in a register with precision 2p + 1. 1
or equivalently, for In := [− 1+nµ , 1],
Proof. This follows from Lemmas 4.34, 4.37, and 4.39. n
Y
∃θ ∈ In s.t. (1 + δi ) = 1 + θ(nµ + n2 µ2 ). (4.23)
4.2.3 The propagation of rounding errors i=1
4.3 Accuracy and stability

4.3.1 Avoiding catastrophic cancellation

Definition 4.45. Let x̂ be an approximation to x ∈ R. The accuracy of x̂ can be measured by its absolute error

    E_abs(x̂) = |x̂ − x|    (4.24)

and/or its relative error

    E_rel(x̂) = |x̂ − x| / |x|.    (4.25)

Definition 4.46. For an approximation ŷ to y = f(x) computed by ŷ = f̂(x), the forward error is the relative error of ŷ in approximating y, and the backward error is the smallest relative error in approximating x by an x̂ that satisfies f(x̂) = f̂(x), assuming such an x̂ exists.

    [commutative diagram: x ↦ y = f(x) under the exact map f, x ↦ ŷ under the computed map f̂, and x̂ ↦ f(x̂) = f̂(x); Δx is the backward error and Δy the forward error; figure omitted]

Definition 4.47 (Accuracy). An algorithm ŷ = f̂(x) for computing the function y = f(x) is accurate if its forward error is small for all x, i.e. ∀x ∈ dom(f), E_rel(f̂(x)) ≤ c ε_u, where c is a small constant.

Example 4.48 (Catastrophic cancellation). For two real numbers x, y ∈ R(F), Theorems 4.27 and 4.40 imply

    fl(fl(x) ⊙ fl(y)) = (fl(x) ⊙ fl(y))(1 + δ_3) = (x(1 + δ_1) ⊙ y(1 + δ_2))(1 + δ_3),

where |δ_i| ≤ ε_u and ⊙ is any of the four arithmetic operations. From Theorems 4.40 and 4.44, we know that multiplication is accurate:

    fl(fl(x) × fl(y)) = xy(1 + δ_1)(1 + δ_2)(1 + δ_3) = xy(1 + θ(3ε_u + 9ε_u²)),

where θ ∈ [−1, 1]. Similarly, division is also accurate:

    fl(fl(x)/fl(y)) = [x(1 + δ_1) / (y(1 + δ_2))](1 + δ_3)
                    = (x/y)(1 + δ_1)(1 − δ_2 + δ_2² − · · · )(1 + δ_3)
                    ≈ (x/y)(1 + δ_1)(1 − δ_2)(1 + δ_3).

However, addition and subtraction might not be accurate:

    fl(fl(x) + fl(y)) = (x(1 + δ_1) + y(1 + δ_2))(1 + δ_3)
                      = (x + y + xδ_1 + yδ_2)(1 + δ_3)
                      = (x + y) [ 1 + δ_3 + (xδ_1 + yδ_2)/(x + y) + δ_3 (xδ_1 + yδ_2)/(x + y) ].

In other words, the relative error of addition or subtraction can be arbitrarily large when x + y → 0.

Theorem 4.49 (Loss of most significant digits). Suppose x, y ∈ F, x > y > 0, and

    β^{−t} ≤ 1 − y/x ≤ β^{−s}.    (4.26)

Then the number of most significant digits that are lost in the subtraction x − y is at most t and at least s.

Proof. Rewrite x = m_x × β^n and y = m_y × β^m with 1 ≤ m_x, m_y < β. Definition 4.30 and the condition x > y imply that m_y, the significand of y, is shifted so that y has the same exponent as x before m_x − m_y is performed in the register. Then

    y = (m_y × β^{m−n}) × β^n
    ⇒ x − y = (m_x − m_y × β^{m−n}) × β^n
    ⇒ m_{x−y} = m_x (1 − (m_y × β^m)/(m_x × β^n)) = m_x (1 − y/x)
    ⇒ β^{−t} ≤ m_{x−y} < β^{1−s}.

To normalize m_{x−y} into the interval [1, β), it should be multiplied by at least β^s and at most β^t. In other words, m_{x−y} should be shifted to the left at least s times and at most t times. Therefore the conclusion on the number of lost significant digits follows.

Rule 4.50. Catastrophic cancellation should be avoided whenever possible.

Example 4.51. Calculate y = f(x) = x − sin x for x → 0. When x is small, a straightforward calculation would result in a catastrophic cancellation because x ≈ sin x. The solution is to use the Taylor series

    x − sin x = x − (x − x³/3! + x⁵/5! − x⁷/7! + · · · ) = x³/3! − x⁵/5! + x⁷/7! − · · · .

4.3.2 Backward stability and numerical stability

Definition 4.52 (Backward stability). An algorithm f̂(x) for computing y = f(x) is backward stable if its backward error is small for all x, i.e.

    ∀x ∈ dom(f), ∃x̂ ∈ dom(f) s.t. f̂(x) = f(x̂) ⇒ E_rel(x̂) ≤ c ε_u,    (4.27)

where c is a small constant.

Definition 4.53. An algorithm f̂(x_1, x_2) for computing y = f(x_1, x_2) is backward stable if

    ∀(x_1, x_2) ∈ dom(f), ∃(x̂_1, x̂_2) ∈ dom(f) s.t. f̂(x_1, x_2) = f(x̂_1, x̂_2) ⇒ E_rel(x̂_1) ≤ c_1 ε_u and E_rel(x̂_2) ≤ c_2 ε_u,    (4.28)

where c_1, c_2 are two small constants.
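A short Python experiment contrasts the two ways of evaluating x − sin x in Example 4.51 (the sample point and the number of Taylor terms are arbitrary):

import math

def f_direct(x):
    return x - math.sin(x)           # catastrophic cancellation when x ~ sin(x)

def f_series(x, terms=4):
    """Taylor series from Example 4.51: x^3/3! - x^5/5! + x^7/7! - ..."""
    return sum((-1) ** k * x ** (2 * k + 3) / math.factorial(2 * k + 3)
               for k in range(terms))

x = 1.0e-4
print(f_direct(x))   # roughly 1.67e-13, but only about 7 digits survive the subtraction
print(f_series(x))   # 1.6666666666666...e-13, accurate to full double precision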
Lemma 4.54. For f (x1 , x2 ) = x1 − x2 , x1 , x2 ∈ R(F), the Choose x̂ = x(1 + δ1 + δ2 + δ1 δ2 ) and we have
algorithm fˆ(x1 , x2 ) = fl(fl(x1 ) − fl(x2 )) is backward stable. Erel (x̂) = |δ1 + δ2 + δ1 δ2 | < 3u ,
Proof. We have fˆ(x1 , x2 ) = (fl(x1 ) − fl(x2 ))(1 + δ3 ) from
fˆ(x) − f (x̂) δ2
⇒ = ≤ u ,
Theorem 4.40. Then Theorem 4.27 implies f (x̂) 1 + x(1 + δ1 + δ2 + δ1 δ2 )
fˆ(x1 , x2 ) = (x1 (1 + δ1 ) − x2 (1 + δ2 ))(1 + δ3 ) where the denominator is never close to zero since x > 0.
= x1 (1 + δ1 + δ3 + δ1 δ3 ) − x2 (1 + δ2 + δ3 + δ2 δ3 ).
4.3.3 Condition numbers: scalar functions
Take x̂1 and x̂2 to be the two terms in the above line and
we have Definition 4.59. The (relative) condition number of a
function y = f (x) is a measure of the relative change in
Erel (x̂1 ) = |δ1 + δ3 + δ1 δ3 |, the output for a small change in the input,
0
Erel (x̂2 ) = |δ2 + δ3 + δ2 δ3 |. xf (x)
Cf (x) = . (4.30)
f (x)
Then Definition 4.53 completes the proof.
Definition 4.60. A problem with a low condition number
Example 4.55. For f (x) = 1 + x, x ∈ (0, OFL), show that is said to be well-conditioned. A problem with a high con-
the algorithm fˆ(x) = fl(1.0 + fl(x)) is not backward stable. dition number is said to be ill-conditioned.
We prove a stronger statement that implies the nega- Example 4.61. Definition 4.59 yields
tion of (4.27). For each x ∈ (0, u ), Definition 4.24 yields
Erel (ŷ) / Cf Erel (x̂). (4.31)
fˆ(x) = 1.0. Then fˆ(x) = f (x̂) implies x̂ = 0, which further
implies Erel (x̂) = 1. The approximation mark “≈” refers to the fact that the
quadratic term (∆x)2 has been ignored. As one way to
interpret (4.31) and to understand Definition 4.59, the com-
ŷ b
puted solution to an ill-conditioned problem may have a large
forward error.
∆y
fˆ Example 4.62. For the function f (x) = arcsin(x), its con-
b dition number, according to Definition 4.59, is
f 0
xf (x) x
b Cf (x) =
= √ .
f (x) 2
1 − x arcsin x
∆x Hence Cf (x) → +∞ as x → ±1.
5
x b
4.5
3.5
( ˆ 2
f (x)−f (x̂)
f (x̂) ≤ cf u ,
∀x ∈ dom(f ), ∃x̂ ∈ dom(f ) s.t. 1.5
Erel (x̂) ≤ cu ,
1
(4.29) -1 -0.5 0 0.5 1
where cf , c are two small constants. Lemma 4.63. Consider solving the equation f (x) = 0 near
Lemma 4.57. If an algorithm is backward stable, then it a simple root r, i.e. f (r) = 0 and f 0 (r) 6= 0. Suppose we per-
is numerically stable. turb the function f to F = f + g where f, g ∈ C 2 , g(r) 6= 0,
and |g 0 (r)| |f 0 (r)|. Then the root of F is r + h where
ˆ
Proof. By Definition 4.52, f (x̂) = f (x), hence cf = 0. The g(r)
other condition also follows trivially. h ≈ − 0 . (4.32)
f (r)
Example 4.58. For f (x) = 1 + x, x ∈ (0, OFL), show that Proof. Suppose r + h is the new root, i.e. F (r + h) = 0, or,
the algorithm fˆ(x) = fl(1.0 + fl(x)) is stable.
f (r + h) + g(r + h) = 0.
If |x| < u , then fˆ(x) = 1.0. Choose x̂ = x, then
ˆ Taylor’s expansion of F (r + h) yields
f (x̂) − x = fˆ(x) and f (x)−f (x̂) x
f (x̂) = 1+x < 2u .
Otherwise |x| ≥ . The definitions of the range and f (r) + hf 0 (r) + [g(r) + hg 0 (r)] = O(h2 )
u
unit roundoff (Definitions 4.26 and 4.20) yield x ∈ R(F). and we have
By Theorem 4.27, fˆ(x) = (1 + x(1 + δ1 ))(1 + δ2 ), i.e. g(r) g(r)
fˆ(x) = 1 + δ2 + x(1 + δ1 + δ2 + δ1 δ2 ), where |δ1 |, |δ2 | < u . h ≈ − 0 ≈ − 0 .
f (r) + g 0 (r) f (r)
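The blow-up of the condition number of arcsin in Example 4.62 is easy to observe numerically. The sketch below (sample points chosen arbitrarily) also checks the heuristic (4.31) with a small input perturbation.

import numpy as np

def cond_arcsin(x):
    """Relative condition number of f(x) = arcsin(x), cf. Definition 4.59 and Example 4.62."""
    return abs(x / (np.sqrt(1.0 - x * x) * np.arcsin(x)))

for x in (0.5, 0.9, 0.999, 0.999999):
    print(x, cond_arcsin(x))          # grows without bound as x -> 1

# a perturbation check in the spirit of (4.31): E_rel(y_hat) ~ C_f(x) * E_rel(x_hat)
x, eps = 0.999, 1e-8
rel_out = abs(np.arcsin(x * (1 + eps)) - np.arcsin(x)) / abs(np.arcsin(x))
print(rel_out / eps, cond_arcsin(x))  # the two ratios are close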
How is the root x = p affected by perturbing f to f + g? By Lemma 4.63, the answer is

Definition 4.68. The componentwise condition number of a vector function f : R^m → R^n is
ϕ(x) f
∀x ∈ F, condA (x) ≤ . (4.43)
condf (x) x x∗
Rounding Algorithm fA fA (x∗ )
Proof. Assume ∀x, ∃xA such that f (xA ) = fA (x). Write
xA = x(1 + A ) and we have
By (4.31), the first term is X. The math problem of root finding for a polynomial
∗ ∗
kf (x ) − f (x)k kx − xk n
/ condf (x) X
kf (x)k kxk q(x) = a i xi , an = 1, a0 6= 0, ai ∈ R (4.48)
∗ i=0
= Erel (x )condf (x) .
By (4.31) and Definition 4.74, the second term is can be considered as a vector function f : Rn → C:
kfA (x∗ ) − f (x∗ )k kf (x∗A ) − f (x∗ )k kf (x∗A ) − f (x∗ )k r = f (a0 , a1 , . . . , an−1 ).
= ≈
kf (x)k kf (x)k kf (x∗ )k
kx∗ − x∗ k Derive the componentwise condition number of f based
≤ condf (x∗ ) A ∗ on the 1-norm. For the Wilkinson example, compute
kx k
your condition number, and compare your result with
= u condA (x ) condf (x∗ ) ,
∗
that in the Wilkinson Example. What does the com-
where the last step follows from the fact that we only con- parison tell you?
sider the x∗A that is the least dangerous.
XI. Suppose the division of two FPNs is calculated in a reg-
ister of precision 2p. Give an example that contradicts
4.4 Problems the conclusion of the model of machine arithmetic.
4.4.1 Theoretical questions XII. If the bisection method is used in single precision FPNs
of IEEE 754 starting with the interval [128, 129], can
I. Convert the decimal integer 477 to a normalized FPN we compute the root with absolute accuracy < 10−6 ?
with β = 2. Why?
II. Convert the decimal fraction 3/5 to a normalized FPN
XIII. In fitting a curve by cubic splines, one gets inaccurate
with β = 2.
results when the distance between two adjacent points
e is much smaller than those of other adjacent pairs. Use
III. Let x = β , e ∈ Z, L < e < U be a normalized FPN in
F and xL , xR ∈ F the two normalized FPNs adjacent the condition number of a matrix to explain this phe-
to x such that xL < x < xR . Prove xR −x = β(x−xL ). nomenon.
IV. By reusing your result of II, find out the two normal-
ized FPNs adjacent to x = 3/5 under the IEEE 754 4.4.2 Programming assignments
single-precision protocol. What is fl(x) and the rela- A. Print values of the functions in (4.49) at 101 equally
tive roundoff error? spaced points covering the interval [0.99, 1.01]. Calcu-
V. If the IEEE 754 single-precision protocol did not round late each function in a straightforward way without re-
off numbers to the nearest, but simply dropped excess arranging or factoring. Note that the three functions are
bits, what would the unit roundoff be? theoretically the same, but the computed values might
be very different. Plot these functions near 1.0 using a
VI. How many bits of precision are lost in the subtraction magnified scale for the function values to see the varia-
1 − cos x when x = 41 ? tions involved. Discuss what you see. Which one is the
VII. Suggest at least two ways to compute 1−cos x to avoid most accurate? Why?
catastrophic cancellation caused by subtraction.
B. Consider a normalized FPN system F with the charac-
VIII. What are the condition numbers of the following func- terization β = 2, p = 3, L = −1, U = +1.
tions? Where are they large?
• compute UFL(F) and OFL(F) and output them as
• (x − 1)α , decimal numbers;
• ln x, • enumerate all numbers in F and verify the corollary
• ex , on the cardinality of F in the summary handout;
• arccos x. • plot F on the real axis;
IX. Consider the function f (x) = 1 − e−x for x ∈ [0, 1]. • enumerate all the subnormal numbers of F;
• Show that condf (x) ≤ 1 for x ∈ [0, 1]. • plot the extended F on the real axis.
• Let A be the algorithm that evaluates f (x) for
the machine number x ∈ F. Assume that the ex-
ponential function is computed with relative error
f (x) = x8 − 8x7 + 28x6 − 56x5 + 70x4 − 56x3 + 28x2 − 8x + 1
within machine roundoff. Estimate condA (x) for
(4.49a)
x ∈ [0, 1].
• Plot condf (x) and the estimated upper bound of g(x) = (((((((x − 8)x + 28)x − 56)x + 70)x − 56)x + 28)x − 8)x + 1
condA (x) as a function of x on [0, 1]. Discuss your (4.49b)
8
results. h(x) = (x − 1) (4.49c)
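A possible starting point for programming assignment A is sketched below (Python with matplotlib; plotting details and the discussion of the results are left to the reader). It evaluates the three forms (4.49a)–(4.49c) on the prescribed grid.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.99, 1.01, 101)
f = x**8 - 8*x**7 + 28*x**6 - 56*x**5 + 70*x**4 - 56*x**3 + 28*x**2 - 8*x + 1   # (4.49a)
g = (((((((x - 8)*x + 28)*x - 56)*x + 70)*x - 56)*x + 28)*x - 8)*x + 1          # (4.49b)
h = (x - 1)**8                                                                   # (4.49c)

plt.plot(x, f, label="expanded form (4.49a)")
plt.plot(x, g, label="Horner form (4.49b)")
plt.plot(x, h, label="(x - 1)^8, form (4.49c)")
plt.legend()
plt.show()
# h stays smooth at the 1e-16 scale, while f and g oscillate: their O(1) terms cancel
# and leave only rounding noise of size comparable to the machine precision.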
Chapter 5
Approximation
Definition 5.1. Given a normed vector space Y of func- p : [0, 1) → R2 with its knots at xi ’s and a scaled cumu-
tions and its subspace X ⊂ Y . A function ϕ̂ ∈ X is called lative chordal length as in Definition 3.71. Denote by Int(γ)
the best approximation to f ∈ Y from X with respect to the as the complement of γ that always lies at the left of an ob-
norm k · k iff server who travels γ according to its parametrization. Then
the area difference between S1 := Int(γ) and S2 := Int(p)
∀ϕ ∈ X, kf − ϕ̂k ≤ kf − ϕk. (5.1) can be defined as
Z
Example 5.2. The Chebyshev Theorem 2.44 can be re-
kS1 ⊕ S2 k1 := dx,
stated in the format of Definition 5.1 as follows. As in Ex- S1 ⊕S2
ample B.24, denote by Pn (R) the set of all polynomials with
coefficients in R and degree at most n. For Y = Pn (R), and where
X = Pn−1 (R), the best approximation to f (x) = −xn in Y S1 ⊕ S2 := S1 ∪ S2 \ (S1 ∩ S2 )
from X with respect to the max-norm k · k∞ is the exclusive disjunction of S and S .
1 2
The minimization of this area difference can be formu-
kgk∞ = max |g(x)| (5.2)
x∈[−1,1] lated by a best approximation problem based on the 1-norm.
Tn
is ϕ̂ = 2n−1 − xn , where Tn is the Chebyshev polynomial of Theorem 5.6. Suppose X is a finite-dimensional subspace
degree n. Clearly ϕ̂ satisfies (5.1). of a normed vector space (Y, k · k). Then we have
Definition 5.3. The fundamental problem of linear Pn approx- ∀y ∈ Y, ∃ϕ̂ ∈ X s.t. ∀ϕ ∈ X, kϕ̂ − yk ≤ kϕ − yk. (5.5)
imation is to find the best approximation ϕ̂ = i=1 ai ui to
f ∈ Y from n elements u1 , u2 , . . . , un ∈ X ⊂ Y that are Proof. For a given y ∈ Y , define a closed ball
linearly independent and given a priori.
By := {x ∈ X : kxk ≤ 2kyk}.
Example 5.4. For f (x) = ex in P C ∞ [−1, 1], seeking its best
n
approximation of the form ϕ̂ = i=1 ai ui in the subspace Clearly 0 ∈ By , and the distance from y to By is
2
X = span{1, x, x , . . .} is a problem of linear approximation,
where n can be any positive integer and the norm can be dist(y, By ) := inf ky − xk ≤ ky − 0k = kyk.
x∈By
the max-norm (5.2), the 1-norm
Z +1 By definition, any z ∈ X, z 6∈ By must satisfy kzk > 2kyk,
kgk1 := |g(x)|dx, (5.3) and thus
−1 kz − yk ≥ kzk − kyk > kyk.
or the 2-norm Therefore, if a best approximation to y exists, it must be in
Z +1 1
By . As a subspace of X, By is finite dimensional, closed, and
2
2 bounded, hence By is compact. The extreme value theorem
kgk2 := |g(x)| dx . (5.4)
−1
states that a continuous scalar function attains its minimum
and maximum on a compact set. A norm is a continuous
The three different norms are motivated differently: the function, hence the function d : By → R+ ∪ {0} given by
max-norm corresponds to the min-max error, the 1-norm d(x) = kx − yk must attain its minimum on By .
is related to the area bounded between g(x) and the x-axis,
and the 2-norm is related to the Euclidean distance, c.f. Theorem 5.7. The set C[a, b] of continuous functions over
Section 5.4. [a, b] is an inner-product space over C with its inner product
as
Example 5.5. For a simple closed curve γ : [0, 1) → R2
Z b
and n points xi ∈ γ, consider a spline approximation hu, vi := ρ(t)u(t)v(t)dt, (5.6)
a
where v(t) is the complex conjugate of v(t) and the weight Proof. Definition 5.13 implies that the formulae (5.9) can
function ρ(x) ∈ C[a, b] satisfies ρ(x) > 0 for all x ∈ (a, b). In be rewritten in the form of (5.10); this can be proven
addition, C[a, b] with by induction. The induction basis is the recursion basis
! 21 u∗1 = u1 /ku1 k and the inductive step follows from (5.9) as
Z b
kuk2 := ρ(t)|u(t)|2 dt (5.7) n−1
!
1 X
a u∗n = un − hun , u∗k i u∗k ,
kvn k
k=1
is a normed vector space over R.
Proof. This follows from Definitions B.2, B.108, and where each u∗k
is a linear combination of u1 , u2 , . . . , uk and
B.113. ann , the coefficient of un in (5.10), is clearly kv1n k . By (5.9b),
u∗n+1 is normal. We show by induction that u∗n+1 is orthog-
Definition 5.8. The least-square approximation on C[a, b] onal to u∗n , u∗n−1 , . . ., u∗1 . The induction base holds because
is a best approximation problem with the norm in (5.1) set
to that in (5.7). hv2 , u∗1 i = hu2 − hu2 , u∗1 i u∗1 , u∗1 i
= hu2 , u∗1 i − hu2 , u∗1 i hu∗1 , u∗1 i = 0,
5.1 Orthonormal systems where the second step follows from (IP-3) in Definition B.108
and the third step from u∗1 being normal. The inductive step
Definition 5.9. A subset S of an inner product space X is also holds because for any j < n + 1 we have
called orthonormal if
n
* +
( X
∗ ∗ ∗ ∗
0 if u 6= v, vn+1 , uj = un+1 − hun+1 , uk i uk , uj
∀u, v ∈ S, hu, vi = (5.8)
1 if u = v. k=1
n
∗
X
Example 5.10. The standard basis vectors in Rn are or- hun+1 , u∗k i u∗k , u∗j
= un+1 , uj −
thonormal. k=1
∗
un+1 , u∗j = 0,
Example 5.11. The Chebyshev polynomials of the first = u n+1 , uj −
kind as in Definition 2.39 are orthogonal with respect to where the third step follows from the induction hypothesis
1
(5.6) where a = −1, b = 1, ρ = √1−x 2
. However, they do and (5.9b), i.e.,
not satisfy the second case in (5.8). (
Theorem 5.12. Any finite set of nonzero orthogonal ele-
∗ ∗ 1 if k = j;
uk , uj = (5.11)
ments u1 , u2 , . . . , un is linearly independent. 0 otherwise.
Proof. This is easily proven by contradiction using Defini- Exercise 5.15. Prove akk = 1
by using ku∗n k = 1.
kvk k
tions B.25 and 5.9.
u2
Definition 5.13. The Gram-Schmidt process takes in a fi-
nite or infinite independent list (u1 , u2 , . . .) and output two u1
other lists (v1 , v2 , . . .) and (u∗1 , u∗2 , . . .) by
n
X
vn+1 = un+1 − hun+1 , u∗k i u∗k , (5.9a)
k=1 u∗1 u∗2
u∗n+1 = vn+1 /kvn+1 k, (5.9b)
v2 = u2 − hu2 , u∗1 iu∗1
with the recursion basis as v1 = u1 , u∗1 = v1 /kv1 k.
Theorem 5.14. For a finite or infinite independent list
(u1 , u2 , . . .), the Gram-Schmidt process yields constants Corollary 5.16. For a finite or infinite independent list
a11 (u1 , u2 , . . .), we can find constants
a21 a22
b11
a31 a32 a33
.. b21 b22
. b31 b32 b33
..
such that akk = kv1 k > 0 and the elements u∗1 , u∗2 , . . .
k
.
u∗1 = a11 u1 and an orthonormal list (u∗1 , u∗2 , . . .) such that bii > 0 and
u∗2 = a21 u1 + a22 u2 u1 = b11 u∗1
u∗3 = a31 u1 + a32 u2 + a33 u3 (5.10) u2 = b21 u∗1 + b22 u∗2
.. u3 = b31 u∗1 + b32 u∗2 + b33 u∗3 (5.12)
.
..
are orthonormal. .
Proof. This follows from (5.10) and that a lower-triangular Example 5.21. With the Euclidean inner product in Def-
matrix with positive diagonal elements is invertible. inition B.112, we select orthonormal vectors in R3 as
Corollary 5.17. In Theorem 5.14, we have hu∗n , ui i = 0 for u∗1 = (1, 0, 0)T , u∗2 = (0, 1, 0)T , u∗3 = (0, 0, 1)T .
each i = 1, 2, . . . , n − 1.
Proof. By Corollary 5.16, each ui can be expressed as For the vector w = (a, b, c)T , the Fourier coefficients are
i
X hw, u∗1 i = a, hw, u∗2 i = b, hw, u∗3 i = c,
ui = bik u∗k .
k=1 and the projections of w onto u∗1 and u∗2 are
Inner product the above equation with u∗n , apply the orthog-
hw, u∗1 i u∗1 = (a, 0, 0)T , hw, u∗2 i u∗2 = (0, b, 0)T .
onal conditions, and we reach the conclusion.
Definition 5.18. Using the Gram-Schmidt orthonormaliz- The Fourier expansion of w is
ing process with the inner product (5.6), we obtain from
w = hw, u∗1 i u∗1 + hw, u∗2 i u∗2 + hw, u∗3 i u∗3 ,
the independent list of monomials (1, x, x2 . . .) the following
classic orthonormal polynomials: with the error of Fourier expansion as 0; see Theorem 5.23.
a b ρ(x)
Exercise 5.22. For the orthonormal list in L2ρ=1 [−π, π],
Chebyshev polynomials
of the first kind -1 1 √ 1
1−x2 1 sin x cos x sin(nx) cos(nx)
Chebyshev polynomials √ , √ , √ ,..., √ , √ ,..., (5.14)
√ 2π π π π π
of the second kind -1 1 1−x 2
3 √
1
w = ci u∗i .
u∗3 = 10 x2 − . i=1
4 3
Then the orthogonality of u∗i ’s implies
5.2 Fourier expansions
∀k = 1, 2, · · · , n, hu∗k , wi = ck ,
Definition 5.20. Let (u∗1 , u∗2 , . . .) be a finite or infinite or-
thonormal list. The orthogonal expansion or Fourier expan- which completes the proof.
sion for an arbitrary w is the series Theorem 5.24 (Minimum properties of Fourier expan-
Xm sions). Let u∗1 , u∗2 , . . . be an orthonormal system and let w
hw, u∗n i u∗n , (5.13) be arbitrary. Then
n=1
N N
where the constants hw, u∗n i are known as the Fourier coef-
X
X
∗ ∗
∗
w − hw, u i u ≤ w − ai i
,
u (5.17)
i i
ficients of w and the term hw, u∗n i u∗n the projection of w on
i=1 i=1
u∗n . The error of the Fourier expansion of w with respect to
(u∗1 , u∗2 , . . .) is simply n hw, u∗n i u∗n − w.
P
for any selection of constants a1 , a2 , · · · , aN .
P PN
Proof. With the shorthand notation i = i=1 , we deduce Example 5.28. Consider the problem in Example 5.4 in the
from the definition and properties of inner products sense of least square approximation with the weight function
2 * + ρ = 1. It is equivalent to
X
X X
w − ai u∗i
= w − ai u∗i , w − ai u∗i !2
Z +1 Xn
x i
min e − ai x dx. (5.22)
i i i
* + * + ai −1
X X i=0
= hw, wi − w, ai u∗i − ai u∗i , w
i i For n = 1, 2, use the Legendre polynomials derived in Ex-
ample 5.19:
* +
X X
+ ai u∗i , ai u∗i
1√
r
i i
∗ 1 ∗ 3 ∗
X X u 1 = √ , u2 = x, u3 = 10(3x2 − 1),
= hw, wi − ai hw, u∗i i − ai hu∗i , wi 2 2 4
i i
XX and we have the Fourier coefficients of ex as
ai aj u∗i , u∗j
+
i j Z +1
1 x 1 1
X X X b0 = √ e dx = √ e− ,
= hw, wi − ai hw, u∗i i − ai hu∗i , wi + |ai |2 −1 2 2 e
i i i Z +1 r √
3 x
b = xe dx = 6e−1 ,
X X
∗ ∗ ∗ ∗
− hui , wi hw, ui i + hui , wi hw, ui i 1
2
−1
i i Z +1 √ √
2
X
∗ 2
X
∗ 2
1 2 x 10 7
= kwk − |hw, ui i| + |ai − hw, ui i| , (5.18) b2 = 10(3x − 1)e dx = e− .
−1 4 2 e
i i
where “| · |” denotes the modulus of a complex number. The minimizing polynomials are thus
The first
P two 2 terms are independent of ai . Therefore
(
1
kw − i ai u∗i k is minimized only when ai = hw, u∗i i. (e2 − 1) + 3e x n = 1;
ϕ̂n = 2e 5
(5.23)
ϕ̂1 + 4e (e2 − 7)(3x2 − 1) n = 2.
Corollary 5.25. Let (u1 , u2 , . . . , un ) be an independent
list. The fundamental problem of linearly approximating
an arbitrary vector w is solved by the best approximation 5.3 The normal equations
ϕ̂ = k hw, u∗k i u∗k where u∗k ’s are the uk ’s orthonormalized
P
by the Gram-Schmidt process. The error norm is Theorem 5.29. Let u1 , u2 , . . . , un ∈ X be linearly indepen-
dent and let u∗i be the ui ’s orthonormalized by the Gram-
2
Schmidt process. Then, for any element w,
n
n
2
X X
kw − ϕ̂k2 := min
w − ak uk
= kwk2 − |hw, u∗k i| .
ak
n
!
k=1 k=1 X
(5.19) ∀j = 1, 2, . . . , n, w− hw, u∗k i u∗k ⊥ u∗j , (5.24)
k=1
Proof. This follows directly from (5.18).
where “⊥” denotes orthogonality.
Corollary 5.26 (Bessel inequality). If u∗1 , u∗2 , . . . , u∗N are Pn
orthonormal, then, for an arbitrary w, Proof. If w ∈ X, we have w − k=1 hw, u∗k i u∗k = 0 and
thus (5.24) holds trivially. For the other case of w 6∈ X, set
N w = un+1 , apply Corollary 5.17, and we have (5.24).
2
X
|hw, u∗i i| ≤ kwk2 . (5.20)
i=1 Corollary 5.30.PLet u1 , u2 , . . . , un ∈ X be linearly inde-
n
pendent. If ϕ̂ = k=1 ak uk is the best linear approximant
Proof. This follows directly from Corollary 5.25 and the real
to w, then
positivity of a norm.
∀j = 1, 2, . . . , n, (w − ϕ̂) ⊥ uj . (5.25)
Corollary 5.27. The Gram-Schmidt process in Definition
5.13 satisfies Pn
Proof. Since ϕ̂ = k=1 ak uk is the best linear approximant
n to w, Theorem 5.24 implies that
2
X
+ 2
∀n ∈ N , kvn+1 k = kun+1 k − 2
|hun+1 , u∗k i| . (5.21)
n n
k=1 X X
ak uk = hw, u∗k i u∗k .
Proof. By (5.9a), each vn+1 can be regarded as the error of k=1 k=1
Fourier expansion of un+1 with respect to the orthonormal
list (u∗1 , u∗2 , . . . , u∗n ). In Corollary 5.25, identifying w with Corollary 5.16 and Theorem 5.29 complete the proof.
un+1 completes the proof.
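The Gram-Schmidt process of Definition 5.13 with the inner product (5.6) (here ρ = 1 on [−1, 1]) can be coded directly. The quadrature-based sketch below (function names are ours) reproduces the orthonormal polynomial u_3* of Definition 5.18 used in Example 5.28.

import numpy as np
from scipy.integrate import quad

def inner(u, v, a=-1.0, b=1.0, rho=lambda t: 1.0):
    """Inner product (5.6) with weight rho on [a, b] (real-valued case)."""
    val, _ = quad(lambda t: rho(t) * u(t) * v(t), a, b)
    return val

def gram_schmidt(us):
    """Gram-Schmidt process of Definition 5.13: returns the orthonormal list (u_1*, u_2*, ...)."""
    ortho = []
    for u in us:
        coeffs = [inner(u, q) for q in ortho]
        v = lambda t, u=u, c=tuple(coeffs), qs=tuple(ortho): \
            u(t) - sum(ci * qi(t) for ci, qi in zip(c, qs))
        norm = np.sqrt(inner(v, v))
        ortho.append(lambda t, v=v, n=norm: v(t) / n)
    return ortho

monomials = [lambda t: 1.0, lambda t: t, lambda t: t * t]
q1, q2, q3 = gram_schmidt(monomials)
print(q3(0.5), 0.75 * np.sqrt(10.0) * (0.5**2 - 1.0/3.0))   # matches u_3* of Definition 5.18
print(inner(q2, q3), inner(q3, q3))                          # ~0 and ~1, cf. (5.8)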
Definition 5.31. Let u1 , u2 , . . . , un be a sequence of ele- Theorem 5.33. For nonzero elements u1 , u2 , . . . , un ∈ X,
ments in an inner product space. The n × n matrix we have
n
Y
G = G(u1 , u2 , · · · , un ) = (hui , uj i) 0 ≤ g(u1 , u2 , . . . , un ) ≤ kuk k2 , (5.30)
k=1
hu1 , u1 i hu1 , u2 i . . . hu1 , un i
hu2 , u1 i hu2 , u2 i . . . hu2 , un i
where the lower equality holds if and only if u1 , u2 , . . . , un
= .. .. .. (5.26)
..
are linearly dependent and the upper equality holds if and
. . . .
hun , u1 i hun , u2 i . . . hun , un i only if they are orthogonal.
is the Gram matrix of u1 , u2 , . . . , un . Its determinant Proof. Suppose u1 , u2 , . . . , un are linearly dependent. Then
Pn
we can find constants c1 , c2 , . . . , cn satisfying i=1 ci ui = 0
g = g(u1 , u2 , . . . , un ) = det(hui , uj i) (5.27)
with at least one cj being nonzero. Construct vectors
is the Gram determinant. (P
n
i=1 ci ui = 0, k = j;
Pn
Lemma 5.32. Let wi = j=1 aij uj for i = 1, 2, . . . , n. Let wk =
A = (aij ) and its conjugate transpose AH = (aji ). Then we uk , k 6= j.
have
We have g(w1 , w2 , . . . , wn ) = 0 because hwj , wk i = 0 for
G(w1 , w2 , . . . , wn ) = AG(u1 , u2 , . . . , un )AH (5.28) each k. By the Laplace theorem (Theorem B.188), we can
expand the determinant of C = (cij ) according to minors of
and
its jth row:
2
g(w1 , w2 , . . . , wn ) = | det A| g(u1 , u2 , . . . , un ). (5.29)
1 0 ··· 0 ··· 0
Proof. The inner product of ui and wj yields 0 1 ··· 0 ··· 0
.. .. . . .. .. ..
hu1 , w1 i hu1 , w2 i . . . hu1 , wn i . . . . . .
hu2 , w1 i hu2 , w2 i . . . hu2 , wn i det(C) = det
c1 c2 · · · cj · · · cn
.. .. ..
.. . .. . . . .. ..
.. . ..
. . . . . . .
hun , w1 i hun , w2 i . . . hun , wn i 0 0 ··· 0 ··· 1
hu1 , u1 i hu1 , u2 i . . . hu1 , un i a11 . . . an1 = 0 + · · · + 0 + cj + 0 + · · · + 0 = cj 6= 0,
hu2 , u1 i hu2 , u2 i . . . hu2 , un i a12 . . . an2
= .. .. .. .. ..
.. .. where the determinant of each minor matrix Mi of ci with
. . . . . . .
hun , u1 i hun , u2 i . . . hun , un i a1n . . . ann i 6= j is zero because the ith row of each Mi is a row of all
zeros. Then Lemma 5.32 yields g(u1 , u2 , . . . , un ) = 0.
=G(u1 , u2 , . . . , un )AH . Now suppose u1 , u2 , . . . , un are linearly independent.
Therefore (5.28) holds since Theorem 5.14 yields constants aij such that akk > 0 and
the following vectors are orthonormal:
hw1 , w1 i hw1 , w2 i . . . hw1 , wn i
hw2 , w1 i hw2 , w2 i . . . hw2 , wn i k
X
G(w1 , w2 , . . . , wn ) = ∗
.. .. . u = aki ui .
. .. .. k
. .
i=1
hwn , w1 i hwn , w2 i . . . hwn , wn i
Then Definition 5.31Qimplies g(u∗1 , u∗2 , . . . , u∗n ) = 1. Also,
a11 ... a1n hu1 , w1 i hu1 , w2 i ... hu1 , wn i n
a21 ... a2n hu2 , w1 i hu2 , w2 i ... hu2 , wn i we have det(aij ) = k=1 akk because the matrix (aij ) is
= . .. .. .. .. triangular. It then follows from Lemma 5.32 that
.. .. ..
. . . . . .
an1 ... ann hun , w1 i hun , w2 i . . . hun , wn i n
Y 1
g(u1 , u2 , . . . , un ) = > 0. (5.31)
= AG(u1 , u2 , . . . , un )AH . a2kk
k=1
The following properties of complex conjugate are well
known: Since the list of vectors (u1 , u2 , . . . , un ) is either de-
z + w = z + w, zw = z w. pendent or independent, the arguments so far show that
g(u1 , u2 , . . . , un ) = 0 if and only if u1 , u2 , . . . , un are linearly
Then the identity det(A) = det(AT ) and the Leibniz formula dependent.
of determinants (Definition B.183) yield
Suppose u1 , u2 , . . . , un are orthogonal. By Definition
X n
Y 5.31, G(u1 , u2 , . . . , un ) is a diagonal matrix with kuk k2 on
det A = det AT = sgn(σ) aσi ,i = det AH . the diagonals. Hence the orthogonality of uk ’s implies
σ∈Sn i=1
n
Finally, (5.29) follows from the determinant of (5.28) and
Y
g(u1 , u2 , . . . , un ) = kuk k2 . (5.32)
the identity det(AB) = det(A) det(B). k=1
For the converse statement, suppose (5.32) holds. Then 5.4 Discrete least squares (DLS)
u1 , u2 , . . . , un must be independent because otherwise it
would contradict the lower equality of (5.30) proved as
above. Apply the Gram-Schmidt process to (u1 , u2 , . . . , un ) Example 5.36 (An experiment on Newton’s second law by
and we know from Theorem 5.14 that a1kk = kvk k. Set the discrete least squares). A cart with mass M is pulled along
length of the list in Theorem 5.14 to 1, 2, . . . , n and we know a horizontal track by a cable attached to a weight of mass
from (5.31) and (5.32) that mj through a pulley.
uk , u∗j = 0,
∀k = 1, 2, . . . , n, ∀j = 1, 2, . . . , k − 1,
which, together with Corollary 5.16, implies the orthogo- Neglecting the friction of the track and the pulley system,
nality of uk ’s. Finally, we Q remark that the maximum of we have from Newton’s second law
n
g(u1 , u2 , . . . , un ) is indeed k=1 kuk k2 because of (5.31),
1
akk = kvk k, and Corollary 5.27. d2 x
m j g = (m j + M )a = (mj + M ) .
Pn
Theorem 5.34. Let ϕ̂ = i=1 ai ui be the best approxima- dt2
tion to w constructed from the list of independent vectors
(u1 , u2 , . . . , un ). Then the coefficients The following experiments verify Newton’s second law.
a = [a1 , a2 , . . . , an ]T
(i) For fixed M and mj , we measure a number of data
are uniquely determined from the linear system of normal
points (ti , xi ) by recording the position of the cart with
equations,
a high-speed camera.
G(u1 , u2 , . . . , un )T a = c, (5.34)
where c = [hw, u1 i , hw, u2 i , . . . , hw, un i]T .
Pn
Proof. Take inner product of ϕ̂ = i=1 ai ui with uj , apply (ii) Fit a quadratic polynomial p(t) = c0 + c1 t + c2 t2 by
Corollary 5.30, and we have minimizing the total length squared,
n
X
hw, uj i = ak huk , uj i , X
min (xi − p(ti ))2 .
k=1
i
which is simply the jth equation of (5.34). The uniqueness
of the coefficients follows from Theorem 5.33 and Cramer’s
rule.
(iii) Take aj = 2c2 as the experimental result of accelera-
Example 5.35. Solve Example 5.28 by normal equations. tion for the force Fj = mj (g − aj ).
To find the best approximation ϕ̂ = a0 + a1 x + a2 x2
to ex from the linearly independent list (1, x, x2 ), we first
construct the Gram matrix from (5.26), (5.6), and ρ = 1: (iv) Change the weight mj and repeat steps (i)-(iii) a num-
1, x2 2 0 23 ber of times to get data points (aj , Fj ).
h1, 1i h1, xi
G(1, x, x2 ) =
hx, 1i
hx, xi
x, x2 = 0 32 0 .
2
x2 , 1 x2 , x x2 , x2 3 0 25
(v) Fit a linear polynomial f (x) = c0 + c1 x by minimizing
We then calculate the vector
x the total length squared,
he , 1i e − 1/e
c =
hex , xi = 2/e . X
ex , x2 e − 5/e min (Fj − f (aj ))2 .
j
The normal equations then yields
3(11 − e2 ) 3 15(e2 − 7)
a0 = , a1 = , a2 = .
4e e 4e One verifies Newton’s second law by showing that the data
With these values, it is easily verified that the best approx- fitting result c1 is very close to M . Note that the expressions
imation ϕ̂ = a0 + a1 x + a2 x2 equals that in (5.23). in steps (ii) and (v) justify the name “least squares.”
5.4.1 Gaussian and Dirac delta functions

Definition 5.37. A Gaussian function, or a Gaussian, is a function of the form

f(x) = a exp( −(x − b)² / (2c²) ),    (5.35)

where a ∈ R⁺ is the height of the curve's peak, b ∈ R is the position of the center of the peak, and c ∈ R⁺ is the standard deviation or the Gaussian RMS (root mean square) width.

[Figure: plot of a Gaussian.]

Lemma 5.38. The integral of a Gaussian is

∫_{−∞}^{+∞} a e^{−(x−b)²/(2c²)} dx = ac√(2π).    (5.36)

Definition 5.39. A normal distribution or Gaussian distribution is a continuous probability distribution of the form

f_{μ,σ} = (1/(σ√(2π))) exp( −(x − μ)² / (2σ²) ),    (5.37)

where μ is the mean or expectation and σ is the standard deviation.

Definition 5.40. The Dirac delta function δ(x − x̄) centered at x̄ is

δ(x − x̄) = lim_{ε→0} φ_ε(x − x̄),    (5.38)

where φ_ε(x − x̄) = f_{x̄,ε} is a normal distribution with its mean at x̄ and its standard deviation as ε.

Lemma 5.41. The Dirac delta function satisfies

δ(x − x̄) = { +∞, x = x̄;  0, x ≠ x̄; }    (5.39a)
∫_{−∞}^{+∞} δ(x − x̄) dx = 1.    (5.39b)

Proof. These follow directly from Definitions 5.39 and 5.40 and Lemma 5.38.

Lemma 5.42 (Sifting property of δ). If f : R → R is continuous, then

∫_{−∞}^{+∞} δ(x − x̄) f(x) dx = f(x̄).    (5.40)

Proof. Since I_ε := [x̄ − ε, x̄ + ε] is a compact interval and f(x) is continuous over I_ε, f(x) is bounded over I_ε, say f(x) ∈ [m, M]. The nonnegativeness of φ_ε and the integral mean value theorem C.71 imply that …

Lemma 5.44. The Dirac delta function and the Heaviside function are related as

∫_{−∞}^{x} δ(t) dt = H(x).    (5.42)

Proof. This follows from Definitions 5.40 and 5.43 and Lemma 5.41.
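The sifting property (5.40) can be watched numerically by replacing δ with φ_ε for decreasing ε. This is only a sketch: the test function and the grid are my own choices, and NumPy is assumed.

```python
import numpy as np

def phi(x, xbar, eps):
    """Normal density (5.37) with mean xbar and standard deviation eps."""
    return np.exp(-(x - xbar)**2 / (2.0*eps**2)) / (eps*np.sqrt(2.0*np.pi))

f = np.cos                      # any continuous test function
xbar = 0.3
x = np.linspace(-10.0, 10.0, 2_000_001)
dx = x[1] - x[0]
for eps in (1e-1, 1e-2, 1e-3):
    approx = np.sum(phi(x, xbar, eps) * f(x)) * dx    # simple Riemann sum
    print(eps, approx, f(xbar))                       # approx -> f(xbar) as eps -> 0
```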
5.4.2 Reusing the formalism

Definition 5.45. Define a function λ : R → R,

λ(t) = { 0,  t ∈ (−∞, a);  ∫_a^t ρ(τ) dτ,  t ∈ [a, b];  ∫_a^b ρ(τ) dτ,  t ∈ (b, +∞). }    (5.43)

Then a corresponding continuous measure dλ can be defined as

dλ = { ρ(t) dt,  t ∈ [a, b];  0,  otherwise. }    (5.44)

Example 5.48. Consider a table of sales record (x_i, y_i), i = 1, …, 12. From the plot of the discrete data, it appears that a quadratic polynomial would be a good fit. Hence we formulate the least square problem as finding the coefficients of a quadratic polynomial to minimize

Σ_{i=1}^{12} ( y_i − Σ_{j=0}^{2} a_j x_i^j )².

Reusing the procedures in Example 5.35, we have

G(1, x, x²) = [ ⟨1,1⟩ ⟨1,x⟩ ⟨1,x²⟩ ; ⟨x,1⟩ ⟨x,x⟩ ⟨x,x²⟩ ; ⟨x²,1⟩ ⟨x²,x⟩ ⟨x²,x²⟩ ],

where the inner products are now the discrete sums over the tabulated x_i. …

Similarly, a matrix A is lower triangular iff …
In plain words, (S-1) means that we jump over the lead zero vectors and (S-2) states that, starting from u_{j−1}, we pick the first vector as u_j that is not in span(u_1, u_2, …, u_{j−1}). By Corollary 5.16, the Gram–Schmidt process determines a unique orthogonal matrix A*_r = [u*_1, u*_2, …, u*_r] ∈ R^{m×r} and a unique upper triangular matrix such that

A_r = A*_r [ b_{11} b_{21} … b_{r1} ;  b_{22} … b_{r2} ;  ⋱ ⋮ ;  b_{rr} ].    (5.47)

By definition of the column rank of a matrix, we have r ≤ m. In the rest of this proof, we insert each column vector in X = {ξ_1, ξ_2, …, ξ_n} \ {u_1, u_2, …, u_r} back into (5.47) and show that the QR form of (5.47) is maintained. For those zero column vectors in (S-1), we have

A_ξ = [ξ_1 … ξ_{k_1−1} u_1 u_2 … u_r] = A*_r [ 0 … 0 b_{11} b_{21} … b_{r1} ;  0 … 0 b_{22} … b_{r2} ;  ⋮ ⋮ ⋱ ⋮ ;  0 … 0 b_{rr} ].    (5.48)

For each ξ_ℓ with ℓ ∈ R_j in (S-2), we have

[u_1, u_2, …, u_{j−1}, ξ_ℓ] = [u*_1, u*_2, …, u*_{j−1}] [ b_{11} … b_{j−1,1} c_{ℓ,1} ;  ⋱ ⋮ ⋮ ;  b_{j−1,j−1} c_{ℓ,j−1} ],    (5.49)

where ξ_ℓ = c_{ℓ,1} u*_1 + … + c_{ℓ,j−1} u*_{j−1}. With (5.48) as the induction basis and (5.49) as the inductive step, it is straightforward to prove by induction that we have A = A*_r R where R is an upper triangular matrix.

If r = m, Definitions 5.49 and 5.9 complete the proof. Otherwise r < m and the proof is completed by the well-known fact in linear algebra that a list of orthonormal vectors can be extended to an orthonormal basis.

Lemma 5.52. An orthogonal matrix preserves the 2-norm of the vectors it acts on.

Proof. Definition 5.49 yields

∀x ∈ dom(Q), ‖Qx‖₂² = xᵀ Qᵀ Q x = xᵀ x = ‖x‖₂².

Theorem 5.53. Consider an over-determined linear system Ax = b where A ∈ R^{m×n} and m ≥ n. The discrete linear least square problem

min_{x∈R^n} ‖Ax − b‖₂²

is solved by x* satisfying

R_1 x* = c,    (5.50)

where R_1 ∈ R^{n×n} and c ∈ R^n result from the QR factorization of A:

Qᵀ A = R = [ R_1 ; 0 ],  Qᵀ b = [ c ; r ].    (5.51)

Furthermore, the minimum is ‖r‖₂².

Proof. For any x ∈ R^n, we have

‖Ax − b‖₂² = ‖Qᵀ A x − Qᵀ b‖₂² = ‖R_1 x − c‖₂² + ‖r‖₂²,

where the first step follows from Lemma 5.52.

5.5 Problems

5.5.1 Theoretical questions

I. Give a detailed proof of Theorem 5.7.

II. Consider the Chebyshev polynomials of the first kind.
(a) Show that they are orthogonal on [−1, 1] with respect to the inner product in Theorem 5.7 with the weight function ρ(x) = 1/√(1 − x²).
(b) Normalize the first three Chebyshev polynomials to arrive at an orthonormal system.

III. Least-square approximation of a continuous function. Approximate the circular arc given by the equation y(x) = √(1 − x²) for x ∈ [−1, 1] by a quadratic polynomial with respect to the inner product in Theorem 5.7.
(a) ρ(x) = 1/√(1 − x²) with Fourier expansion,
(b) ρ(x) = 1/√(1 − x²) with normal equations.

IV. Discrete least square via orthonormal polynomials. Consider the example on the table of sales record in Example 5.48.
(a) Starting from the independent list (1, x, x²), construct orthonormal polynomials by the Gram–Schmidt process using

⟨u(t), v(t)⟩ = Σ_{i=1}^N ρ(t_i) u(t_i) v(t_i)    (5.52)

as the inner product with N = 12 and ρ(x) = 1.
(b) Find the best approximation ϕ̂ = Σ_{i=0}^2 a_i x^i such that ‖y − ϕ̂‖ ≤ ‖y − Σ_{i=0}^2 b_i x^i‖ for all b_i ∈ R. Verify that ϕ̂ is the same as that of the example on the table of sales record in the notes.
(c) Suppose there are other tables of sales record in the same format as that in the example. Values of N and the x_i's are the same, but the values of the y_i's are different. Which of the above calculations can be reused? Which cannot be reused? What advantage of orthonormal polynomials over normal equations does this reuse imply?
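For problem IV(a), a possible numerical cross-check is to run Gram–Schmidt under the discrete inner product (5.52). This is only a sketch, not a reference solution: the nodes xs below are placeholders of mine (the sales table itself is not reproduced here), and NumPy is assumed.

```python
import numpy as np

xs = np.arange(1.0, 13.0)   # hypothetical nodes x_1, ..., x_12; replace with the table's x_i

def inner(u, v):
    """Discrete inner product (5.52) with rho = 1."""
    return float(np.sum(u(xs) * v(xs)))

def gram_schmidt(basis):
    """Orthonormalize a list of functions with respect to inner()."""
    ortho = []
    for f in basis:
        g = lambda x, f=f, prev=tuple(ortho): f(x) - sum(inner(f, q) * q(x) for q in prev)
        nrm = np.sqrt(inner(g, g))
        ortho.append(lambda x, g=g, nrm=nrm: g(x) / nrm)
    return ortho

p0, p1, p2 = gram_schmidt([lambda x: np.ones_like(x), lambda x: x, lambda x: x**2])
print(inner(p0, p0), inner(p0, p1), inner(p1, p2))    # ~1, ~0, ~0
```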
5.5.2 Programming assignments

A. Write a program to perform discrete least square via normal equations. Your subroutine should take two arrays x and y as the input and output three coefficients a_0, a_1, a_2 that determine a quadratic polynomial as the best fitting polynomial in the sense of least squares with the weight function ρ = 1. Run your subroutine on the following data.

x | 0.0  0.5  1.0  1.5  2.0  2.5  3.0
y | 2.9  2.7  4.8  5.3  7.1  7.6  7.7
x | 3.5  4.0  4.5  5.0  5.5  6.0  6.5
y | 7.6  9.4  9.0  9.6  10.0 10.2 9.7
x | 7.0  7.5  8.0  8.5  9.0  9.5  10.0
y | 8.3  8.4  9.0  8.3  6.6  6.7  4.1

B. Write a program to solve the previous discrete least square problem via QR factorization. Report the condition number based on the 2-norm of the matrix G in the normal-equation approach and that of the matrix R_1 in the QR-factorization approach, verifying that the former is much larger than the latter.
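A possible starting point for Assignments A and B is sketched below, assuming NumPy; it is not the reference implementation.

```python
import numpy as np

# the data tabulated above
x = np.arange(0.0, 10.5, 0.5)
y = np.array([2.9, 2.7, 4.8, 5.3, 7.1, 7.6, 7.7,
              7.6, 9.4, 9.0, 9.6, 10.0, 10.2, 9.7,
              8.3, 8.4, 9.0, 8.3, 6.6, 6.7, 4.1])

A = np.vander(x, 3, increasing=True)        # columns 1, x, x^2

# Assignment A: normal equations G a = A^T y with G = A^T A (rho = 1)
G = A.T @ A
a_normal = np.linalg.solve(G, A.T @ y)

# Assignment B: QR factorization, R1 a = Q^T y, cf. (5.50)-(5.51)
Q, R1 = np.linalg.qr(A)                     # reduced QR; R1 is 3-by-3
a_qr = np.linalg.solve(R1, Q.T @ y)

print(a_normal, a_qr)                                   # the two fits should agree
print(np.linalg.cond(G, 2), np.linalg.cond(R1, 2))      # cond(G) = cond(R1)^2
```

The last line illustrates why the QR approach is preferred: the 2-norm condition number of G is the square of that of R_1.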
Chapter 6
Definition 6.1. A weighted quadrature formula I_n(f) is a linear functional

I_n(f) := Σ_{k=1}^n w_k f(x_k)    (6.1)

that approximates the integral of a function f ∈ C[a, b],

I(f) := ∫_a^b f(x) ρ(x) dx,    (6.2)

where the weight function ρ ∈ C[a, b] satisfies ∀x ∈ (a, b), ρ(x) > 0. The points x_k at which the integrand f is evaluated are called nodes or abscissas, and the multipliers w_k are called weights or coefficients.

Example 6.2. If a and/or b are infinite, I(f) and I_n(f) in (6.1) may still be well defined if the moment of the weight function

μ_j := ∫_a^b x^j ρ(x) dx    (6.3)

exists and is finite for all j ∈ N.

6.1 Accuracy and convergence

Definition 6.3. The remainder, or error, of I_n(f) is

E_n(f) := I(f) − I_n(f).    (6.4)

I_n(f) is said to be convergent for C[a, b] iff

∀f ∈ C[a, b], lim_{n→+∞} I_n(f) = I(f).    (6.5)

Definition 6.4. A subset V ⊂ C[a, b] is dense in C[a, b] iff

∀f ∈ C[a, b], ∀ε > 0, ∃f_ε ∈ V, s.t. max_{x∈[a,b]} |f(x) − f_ε(x)| ≤ ε.    (6.6)

Theorem 6.5. Let {I_n(f) : n ∈ N⁺} be a sequence of quadrature formulas that approximate I(f), where I_n and I(f) are defined in (6.1) and (6.2). Let V be a dense subset of C[a, b]. I_n(f) is convergent for C[a, b] if and only if

(a) ∀f ∈ V, lim_{n→+∞} I_n(f) = I(f),
(b) ∃B ∈ R s.t. ∀n ∈ N⁺, W_n := Σ_{k=1}^n |w_k| ≤ B.

Proof. For sufficiency, we need to prove that for any given f we have lim_{n→+∞} I_n(f) = I(f). To this end, we find f_ε ∈ V such that (6.6) holds and define K := max_{x∈[a,b]} |f(x) − f_ε(x)|. Then we have

|E_n(f)| ≤ |I(f) − I(f_ε)| + |I(f_ε) − I_n(f_ε)| + |I_n(f_ε) − I_n(f)|
         = | ∫_a^b [f(x) − f_ε(x)] ρ(x) dx | + |I(f_ε) − I_n(f_ε)| + | Σ_{k=1}^n w_k [f_ε(x_k) − f(x_k)] |
         ≤ K [ ∫_a^b ρ(x) dx + Σ_{k=1}^n |w_k| ] + |I(f_ε) − I_n(f_ε)|,

where the first step follows from the triangle inequality, the second from Definition 6.1, and the third from the definition of K. The terms inside the brackets are bounded because of ρ ∈ C[a, b] and condition (b). By condition (a), |I(f_ε) − I_n(f_ε)| can be made arbitrarily small. Since K can also be arbitrarily small, we have (6.5).

For necessity, it is trivial to deduce (a) from (6.5). In contrast, it is nontrivial to deduce (b) from (6.5) as the process involves some key theorems in functional analysis. A reader not familiar with the principle of uniform boundedness may skip the rest of the proof.

The numerical quadrature formula I_n : C[a, b] → R is a linear functional and is continuous at f = 0 because of Definition E.58 and the fact that

∀ε > 0, ∃δ = ε / (2 Σ_k |w_k|), s.t. ∀f ∈ C[a, b],
‖f − 0‖_∞ < δ ⇒ |I_n(f) − I_n(0)| = | Σ_k w_k f(x_k) | ≤ Σ_k |w_k| |f(x_k)| ≤ δ Σ_k |w_k| < ε.

By Theorem E.96, I_n is continuous and for each n ∈ N⁺ we have I_n ∈ CL(C[a, b], R), and the convergence (6.5) implies

∀f ∈ C[a, b], sup_{n∈N⁺} |I_n(f)| < +∞.
Then Theorem E.85 and the principle of uniform boundedness (Theorem E.148) yield (b). Note that the operator norm of I_n, by Lemma E.109, equals W_n.

Definition 6.6. A weighted quadrature formula (6.1) has (polynomial) degree of exactness d_E iff

∀f ∈ P_{d_E}, E_n(f) = 0;  ∃g ∈ P_{d_E+1}, s.t. E_n(g) ≠ 0,    (6.7)

where P_d denotes the set of polynomials with degree no more than d.

Example 6.7. By Definition 6.6, d_E ≥ 0 implies that Σ_k w_k is bounded since I_n(c) = c ∫_a^b ρ(x) dx holds for any constant c ∈ R.

Lemma 6.8. Let x_1, …, x_n be given as distinct nodes of I_n(f). If d_E ≥ n − 1, then its weights can be deduced as

∀k = 1, …, n,  w_k = ∫_a^b ρ(x) ℓ_k(x) dx,    (6.8)

where ℓ_k(x) is the fundamental polynomial for pointwise interpolation in (2.9) applied to the given nodes,

ℓ_k(x) := Π_{i=1, i≠k}^{n} (x − x_i)/(x_k − x_i).    (6.9)

Proof. Let p_{n−1}(f; x) be the unique polynomial that interpolates f at the distinct nodes, as in the theorem on the uniqueness of polynomial interpolation (Theorem 2.5). Then we have …

… ρ(x) ≡ 1, it is simply

I^T(f) = ((b − a)/2) [f(a) + f(b)].    (6.10)

Example 6.11. Derive the trapezoidal rule for the weight function ρ(x) = x^{−1/2} on the interval [0, 1]. Note that one cannot apply (6.10) to ρ(x)f(x) because ρ(0) = ∞. (6.8) yields

w_1 = ∫_0^1 x^{−1/2} (1 − x) dx = 4/3,
w_2 = ∫_0^1 x^{−1/2} x dx = 2/3.

Hence the formula is

I^T(f) = (2/3) [2f(0) + f(1)].    (6.11)

Theorem 6.12. For f ∈ C²[a, b] with weight function ρ(x) ≡ 1, the remainder of the trapezoidal rule satisfies

∃ζ ∈ [a, b] s.t. E^T(f) = − ((b − a)³/12) f″(ζ).    (6.12)

Proof. By Theorem 2.5, the interpolating polynomial p_1(f; x) is unique. Then we have

E^T(f) = − ∫_a^b (f″(ξ(x))/2) (x − a)(b − x) dx = − (f″(ζ)/2) ∫_a^b (x − a)(b − x) dx = − ((b − a)³/12) f″(ζ),

where the first step follows from Theorem 2.7 and the second step from the integral mean value theorem (Theorem C.71). Here we can apply Theorem C.71 because

w(x) = (x − a)(b − x)

is always positive on (a, b). Also note that ξ is a function of x while ζ is a constant depending only on f, a, and b.

Let n − 1 be the number of sub-intervals that partition [a, b] in Definition 6.9. As shown below, the Newton–Cotes formula appears to be non-convergent.
n − 1    | 2      | 4      | 6      | 8      | 10
I_{n−1}  | 5.4902 | 2.2776 | 3.3288 | 1.9411 | 3.5956

For equally spaced nodes, the interpolating polynomials have wilder and wilder oscillations as the degree increases. Consequently, condition (b) of Theorem 6.5 does not hold. Hence Newton–Cotes formulas are not convergent even for well-behaved functions in C[a, b]. In practice, the Newton–Cotes formula with n > 8 is seldom used.

6.3 Composite formulas

Definition 6.16. The composite trapezoidal rule for approximating I(f) in (6.2) with ρ(x) ≡ 1 is

I_n^T(f) = (h/2) f(x_0) + h Σ_{k=1}^{n−1} f(x_k) + (h/2) f(x_n),    (6.16)

where h = (b − a)/n and x_k = a + kh.

Theorem 6.17. For f ∈ C²[a, b], the remainder of the composite trapezoidal rule satisfies

∃ξ ∈ (a, b) s.t. E_n^T(f) = − ((b − a)/12) h² f″(ξ).    (6.17)

Proof. Apply Theorem 6.12 to the subintervals, sum up the errors, and we have

E_n^T(f) = − ((b − a)/12) h² [ (1/n) Σ_{k=0}^{n−1} f″(ξ_k) ].    (6.18)

The proof is completed by (6.18), the intermediate value theorem (Theorem C.39), and the fact that f ∈ C²[a, b] ⇒ f″ ∈ C[a, b].

Definition 6.18. The composite Simpson's rule for approximating I(f) in (6.2) with ρ(x) ≡ 1 is

I_n^S(f) = (h/3) [ f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + 2f(x_4) + ⋯ + 4f(x_{n−1}) + f(x_n) ],    (6.19)

where h = (b − a)/n, x_k = a + kh, and n is even.

Theorem 6.19. For f ∈ C⁴[a, b] and n ∈ 2N⁺, the remainder of the composite Simpson's rule satisfies

∃ξ ∈ (a, b) s.t. E_n^S(f) = − ((b − a)/180) h⁴ f⁽⁴⁾(ξ).    (6.20)

Proof. Exercise.

Lemma 6.20. The trapezoidal rule satisfies …

Proof. It suffices to verify that (6.21) holds for the complex exponential e_m(x) := e^{imx} = cos mx + i sin mx, m ∈ N, i.e.

E_n^T(e_m) = ∫_0^{2π} e_m(x) dx − (2π/n) [ (e_m(0) + e_m(2π))/2 + Σ_{k=1}^{n−1} e_m(2kπ/n) ]
           = ∫_0^{2π} e^{imx} dx − (2π/n) Σ_{k=0}^{n−1} e^{imk·2π/n}.

Since ∫_0^{2π} e^{imx} dx = (im)^{−1} e^{imx} |_0^{2π} = 0 for m > 0, the geometric series yields

E_n^T(e_m) = { 0 if m = 0;  −2π if m ≡ 0 (mod n), m > 0;  −(2π/n)(1 − e^{imn·2π/n})/(1 − e^{im·2π/n}) = 0 if m ≢ 0 (mod n). }    (6.22)

Hence (6.21) holds as E_n^T(e_m) = 0 for m = 0, …, n − 1.

6.4 Gauss formulas

Lemma 6.21. Let n, m ∈ N⁺ and m ≤ n. Given polynomials p = Σ_{i=0}^{n+m} p_i x^i ∈ P_{n+m} and s = Σ_{i=0}^{n} s_i x^i ∈ P_n satisfying p_{n+m} ≠ 0 and s_n ≠ 0, there exist unique polynomials q ∈ P_m and r ∈ P_{n−1} such that

p = qs + r.    (6.23)

Proof. Rewrite (6.23) as

Σ_{i=0}^{n+m} p_i x^i = ( Σ_{i=0}^{m} q_i x^i ) ( Σ_{i=0}^{n} s_i x^i ) + Σ_{i=0}^{n−1} r_i x^i.    (6.24)

Since monomials are linearly independent, (6.24) consists of n + m + 1 equations, the last m + 1 of which are

p_{n+m} = q_m s_n,
p_{n+m−1} = q_m s_{n−1} + q_{m−1} s_n,
⋯
p_n = q_m s_{n−m} + ⋯ + q_0 s_n,

which can be written as Sq = p with S being a lower triangular matrix whose diagonal entries are s_n ≠ 0. The coefficient vector q can thus be determined uniquely from the coefficients of p and s. Then r is determined uniquely as p − qs from (6.24).

Definition 6.22. The node polynomial associated with the nodes x_k of a weighted quadrature formula is

v_n(x) = Π_{k=1}^n (x − x_k).    (6.25)
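The division in Lemma 6.21, which drives the exactness argument that follows, can be checked numerically. The sketch below assumes NumPy; the two nodes and the cubic p are arbitrary choices of mine, not taken from the notes.

```python
import numpy as np

v = np.poly([0.25, 0.75])               # node polynomial v_2(x) = (x - 0.25)(x - 0.75), cf. (6.25)
p = np.array([1.0, -2.0, 0.5, 3.0])     # an arbitrary cubic, highest-degree coefficient first

q, r = np.polydiv(p, v)                 # p = q v_2 + r with deg r < 2, cf. (6.23)
print(q, r)
print(np.allclose(np.polyadd(np.polymul(q, v), r), p))   # True
```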
Proof. For the necessity, we have

∫_a^b v_n(x) p(x) ρ(x) dx = Σ_{k=1}^n w_k v_n(x_k) p(x_k) = 0,

where the first step follows from the facts d_E ≥ n + j − 1 and v_n(x)p(x) ∈ P_{n+j−1}, and the second step from (6.25).

To prove the sufficiency, we must show that E_n(p) = 0 for any p ∈ P_{n+j−1}. Lemma 6.21 yields

∀p ∈ P_{n+j−1}, ∃!q ∈ P_{j−1}, ∃!r ∈ P_{n−1}, s.t. p = q v_n + r.    (6.27)

Consequently, we have

∫_a^b p(x) ρ(x) dx = ∫_a^b q(x) v_n(x) ρ(x) dx + ∫_a^b r(x) ρ(x) dx
                   = ∫_a^b r(x) ρ(x) dx = Σ_{k=1}^n w_k r(x_k)
                   = Σ_{k=1}^n w_k [p(x_k) − q(x_k) v_n(x_k)] = Σ_{k=1}^n w_k p(x_k),

where the first step follows from (6.27), the second from (6.26), the third from the condition d_E ≥ n − 1, the fourth from (6.27), and the last from (6.25).

Definition 6.24. A Gaussian quadrature formula (or simply a Gauss formula) is a formula (6.1) whose nodes are the zeros of the polynomial v_n(x) in (6.25) that satisfies (6.26) for j = n.

Corollary 6.25. A Gauss formula has d_E = 2n − 1.

Proof. The index j in (6.26) cannot be n + 1 because the node polynomial v_n(x) ∈ P_n cannot be orthogonal to itself. Therefore we know that j = n in Theorem 6.23 is optimal: the formula (6.1) achieves the highest degree of exactness 2n − 1. From an algebraic viewpoint, the 2n degrees of freedom of nodes and weights in (6.1) determine a polynomial of degree at most 2n − 1. The proof is completed by Theorem 6.23 and Definition 6.6.

Corollary 6.26. Weights of a Gauss formula I_n(f) are

∀k = 1, …, n,  w_k = ∫_a^b v_n(x) / ((x − x_k) v_n′(x_k)) ρ(x) dx,    (6.28)

where v_n(x) is the node polynomial that defines I_n(f).

Proof. This follows from Lemma 6.8; also see (2.11).

Example 6.27. Derive the Gauss formula of n = 2 for the weight function ρ(x) = x^{−1/2} on the interval [0, 1].
We first construct an orthogonal polynomial

π(x) = c_0 − c_1 x + x²

such that

∀p ∈ P_1, ⟨p(x), π(x)⟩ := ∫_0^1 p(x) π(x) ρ(x) dx = 0,

which is equivalent to ⟨1, π(x)⟩ = 0 and ⟨x, π(x)⟩ = 0 because P_1 = span(1, x). These two conditions yield

∫_0^1 (c_0 − c_1 x + x²) x^{−1/2} dx = 2/5 + 2c_0 − (2/3)c_1 = 0,
∫_0^1 x (c_0 − c_1 x + x²) x^{−1/2} dx = 2/7 + (2/3)c_0 − (2/5)c_1 = 0.

Hence c_1 = 6/7, c_0 = 3/35, and the orthogonal polynomial is

π(x) = 3/35 − (6/7) x + x²

with its zeros at

x_1 = (1/7)(3 − 2√(6/5)),  x_2 = (1/7)(3 + 2√(6/5)).

To calculate w_1 and w_2, we could again use (6.8), but it is simpler to set up a linear system of equations by exploiting Corollary 6.25, i.e. Gauss quadrature is exact for all constants and linear polynomials,

w_1 + w_2 = ∫_0^1 x^{−1/2} dx = 2,
x_1 w_1 + x_2 w_2 = ∫_0^1 x · x^{−1/2} dx = 2/3,

which yields

w_1 = (2/3 − 2x_2)/(x_1 − x_2) = 1 + (1/3)√(5/6),
w_2 = (2x_1 − 2/3)/(x_1 − x_2) = 1 − (1/3)√(5/6).

The desired two-point Gauss formula is thus

I_2^G(f) = (1 + (1/3)√(5/6)) f( (1/7)(3 − 2√(6/5)) ) + (1 − (1/3)√(5/6)) f( (1/7)(3 + 2√(6/5)) ).    (6.29)

The degree of exactness of the trapezoidal rule is 1 while that of the two-point Gauss formula is 3. Hence we expect the Gauss formula to be much more accurate. Indeed, calculating the errors of the two formulas (6.11) and (6.29) for f(x) = cos(½πx) gives

E^T = 0.226453…;  E_2^G = 0.002197…,

which can be verified by simple calculations.

Definition 6.28. A set of orthogonal polynomials is a set of polynomials P = {p_i : deg(p_i) = i} that satisfy

∀p_i, p_j ∈ P, i ≠ j ⇒ ⟨p_i, p_j⟩ = 0.    (6.30)

Example 6.29. In this chapter, the inner product in (6.30) is taken to be

⟨p_i, p_j⟩ = ∫_a^b p_i(x) p_j(x) ρ(x) dx,

where [a, b] and ρ are the same as those in (6.2).
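The error values quoted in Example 6.27 are easy to reproduce. This is only a sketch, assuming NumPy and SciPy are available; the reference value of I(f) is obtained, by my own choice, from the substitution x = t², which removes the singularity of ρ.

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: np.cos(0.5*np.pi*x)
# reference value of I(f) = int_0^1 x^(-1/2) f(x) dx via the substitution x = t^2
I_ref, _ = quad(lambda t: 2.0*f(t*t), 0.0, 1.0)

# weighted trapezoidal rule (6.11)
I_trap = 2.0/3.0 * (2.0*f(0.0) + f(1.0))

# two-point Gauss formula (6.29)
x1 = (3.0 - 2.0*np.sqrt(6.0/5.0)) / 7.0
x2 = (3.0 + 2.0*np.sqrt(6.0/5.0)) / 7.0
w1 = 1.0 + np.sqrt(5.0/6.0)/3.0
w2 = 1.0 - np.sqrt(5.0/6.0)/3.0
I_gauss = w1*f(x1) + w2*f(x2)

print(I_ref - I_trap)     # ~0.2265
print(I_ref - I_gauss)    # ~0.0022
```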
Theorem 6.30. Each zero of a real orthogonal polynomial over [a, b] is real, simple, and inside (a, b).

Proof. For fixed n ≥ 1, suppose p_n(x) does not change sign in [a, b]. Then ∫_a^b ρ(x) p_n(x) dx = ⟨p_n, p_0⟩ ≠ 0. But this contradicts orthogonality. Hence there exists x_1 ∈ [a, b] such that p_n(x_1) = 0.

Suppose there were a zero at x_1 which is multiple. Then p_n(x)/(x − x_1)² would be a polynomial of degree n − 2. Hence 0 = ⟨p_n(x), p_n(x)/(x − x_1)²⟩ = ⟨1, p_n²(x)/(x − x_1)²⟩ > 0, which is false. Therefore every zero is simple.

Suppose that only j < n zeros of p_n, say x_1, x_2, …, x_j, are inside (a, b) and all other zeros are out of (a, b). Let v_j(x) = Π_{i=1}^j (x − x_i) ∈ P_j. Then p_n v_j = P_{n−j} v_j², where P_{n−j} is a polynomial of degree n − j that does not change sign on [a, b]. Hence ⟨P_{n−j}, v_j²⟩ > 0, which contradicts the orthogonality of p_n(x) and v_j(x).

… Proof. For each k = 1, 2, …, n, the definition of ℓ_k(x) in (6.9) implies ℓ_k² ∈ P_{2n−2}; then we have

w_k = Σ_{j=1}^n w_j ℓ_k²(x_j) = ∫_a^b ρ(x) ℓ_k²(x) dx > 0,

6.5 Numerical differentiation

Formula 6.36 (The method of undetermined coefficients). A general method to derive FD formulas that approximate u^(k)(x̄) is based on an arbitrary stencil of n > k distinct points x_1, x_2, …, x_n. Taylor expansions of u at each point x_i in the stencil about u(x̄) yield

u(x_i) = u(x̄) + (x_i − x̄) u′(x̄) + ⋯ + (1/k!)(x_i − x̄)^k u^(k)(x̄) + ⋯

for i = 1, 2, …, n. This leads to a linear combination of point values that approximates u^(k)(x̄),

u^(k)(x̄) = c_1 u(x_1) + c_2 u(x_2) + ⋯ + c_n u(x_n) + O(h^p),

where the c_j's are chosen to make p as large as possible: …

… D²u(x̄) = a u(x̄) + b u(x̄ − h) + c u(x̄ − 2h),    (6.33)

we determine the coefficients a, b, and c to give the best possible accuracy. Taylor expansions at x̄ yield …
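The bookkeeping in Formula 6.36 is easy to automate: matching the first n Taylor terms gives an n-by-n linear system for the stencil weights. A sketch assuming NumPy; the helper name fd_weights is mine, not from the notes.

```python
import numpy as np
from math import factorial

def fd_weights(stencil, xbar, k):
    """Weights c_i with sum_i c_i u(x_i) ~ u^(k)(xbar), by matching Taylor terms."""
    n = len(stencil)
    A = np.array([[(xi - xbar)**j / factorial(j) for xi in stencil] for j in range(n)])
    rhs = np.zeros(n)
    rhs[k] = 1.0                                 # pick out the k-th derivative
    return np.linalg.solve(A, rhs)

h = 0.1
# one-sided stencil {xbar, xbar-h, xbar-2h} for the second derivative, cf. (6.33)
print(fd_weights([0.0, -h, -2.0*h], 0.0, 2))     # approximately [1, -2, 1] / h^2
```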
Exercise 6.41. Construct a table of divided differences (as in Definition 2.18) to derive a quadratic polynomial that agrees with u(x) at x̄, x̄ − h, and x̄ − 2h. Then take the derivative of this polynomial to obtain the FD formula (6.36).

Lemma 6.42. In approximating the second derivative of u ∈ C⁴(R), the formula …

… and express E_2(f) in terms of f⁽⁴⁾(τ) for some τ > 0.

(c) Apply the formula in (b) to approximate

I = ∫_0^{+∞} e^{−t}/(1 + t) dt.
There are elementary Hermite interpolation polynomials h_m, q_m such that the solution of (6.43) can be expressed in the form

p(t) = Σ_{m=1}^n [h_m(t) f_m + q_m(t) f′_m],

analogous to the Lagrange interpolation formula.

(a) Seek h_m and q_m in the form

h_m(t) = (a_m + b_m t) ℓ_m²(t),  q_m(t) = (c_m + d_m t) ℓ_m²(t),

where ℓ_m is the elementary Lagrange polynomial in (2.9). Determine the constants a_m, b_m, c_m, d_m.

(b) Obtain the quadrature rule

I_n(f) = Σ_{k=1}^n [w_k f(x_k) + μ_k f′(x_k)]

that satisfies E_n(p) = 0 for all p ∈ P_{2n−1}.

(c) What conditions on the node polynomial or on the nodes x_k must be imposed so that μ_k = 0 for each k = 1, 2, …, n?

V. Prove Lemma 6.42. How do you choose h to minimize the error bound in (6.42)? Design a fourth-order accurate formula based on a symmetric stencil, derive its error bound, and minimize the error bound. What do you observe in comparing the second-order case and the fourth-order case?
Appendix A
Notation 4. R, Z, N, Q, C denote the sets of real numbers, integers, natural numbers, rational numbers and complex numbers, respectively. R⁺, Z⁺, N⁺, Q⁺ denote the sets of positive such numbers. In particular, N contains the number zero while N⁺ does not.

Definition A.2. S is a subset of U, written S ⊆ U, if and only if (iff) x ∈ S ⇒ x ∈ U. S is a proper subset of U, written S ⊂ U, if S ⊆ U and ∃x ∈ U s.t. x ∉ S.

Definition A.3 (Statements of first-order logic). A universal statement is a logical statement of the form

U = (∀x ∈ S, A(x)).    (A.2)

An existential statement has the form

E = (∃x ∈ S, s.t. A(x)),    (A.3)

where ∀ ("for each") and ∃ ("there exists") are the quantifiers, S is a set, "s.t." means "such that," and A(x) is the formula.
A statement of implication/conditional has the form …

Example A.8. True or false:
∀x ∈ [2, +∞), ∃y ∈ Z⁺ s.t. xy < 10⁵;
∃y ∈ R s.t. ∀x ∈ [2, +∞), x > y;
∃y ∈ R s.t. ∀x ∈ [2, +∞), x < y.

Example A.9 (Translating an English statement into a logical statement). Goldbach's conjecture states that every even natural number greater than 2 is the sum of two primes. Let P ⊂ N⁺ denote the set of prime numbers. Then Goldbach's conjecture is ∀a ∈ 2N⁺ + 2, ∃p, q ∈ P, s.t. a = p + q.

Theorem A.10. The existential-universal statement implies the corresponding universal-existential statement, but not vice versa.

Example A.11 (Translating a logical statement to an English statement). Let S be the set of all human beings.
UE = (∀p ∈ S, ∃q ∈ S s.t. q is p's mom.)
EU = (∃q ∈ S s.t. ∀p ∈ S, q is p's mom.)
UE is probably true, but EU is certainly false. If EU were true, then UE would be true. Why?

Axiom A.12 (First-order negation of logical statements). The negations of the statements in Definition A.3 are …
Exercise A.15. Negate the logical statement in Definition Example A.26. If a set S ⊂ R has a maximum, we have
C.64. max S = sup S.
Axiom A.16 (Contraposition). A conditional statement is Example A.27. sup[a, b] = sup[a, b) = sup(a, b] =
logically equivalent to its contrapositive. sup(a, b).
(A ⇒ B) ⇔ (¬B ⇒ ¬A) (A.9) Theorem A.28 (Existence and uniqueness of least upper
Example A.17. “If Jack is a man, then Jack is a human bound). Every nonempty subset of R that is bounded above
being.” is equivalent to “If Jack is not a human being, then has exactly one least upper bound.
Jack is not a man.” Corollary A.29. Every nonempty subset of R that is
Exercise A.18. Draw an Euler diagram of subsets to illus- bounded below has a greatest lower bound.
trate Example A.17.
Definition A.30. A binary relation between two sets X and
Exercise A.19. Rewrite each of the following statements Y is an ordered triple (X , Y, G) where G ⊆ X × Y.
and its negation into logical statements using symbols, quan- A binary relation on X is the relation between X and X .
tifiers, and formulas. The statement (x, y) ∈ R is read “x is R-related to y,” and
denoted by xRy or R(x, y).
(a) The only even prime is 2.
(b) Multiplication of integers is associative. Definition A.31. An equivalence relation “∼” on A is a
binary relation on A that satisfies ∀a, b, c ∈ A,
(c) Goldbach’s conjecture has at most a finite number of
counterexamples. • a ∼ a (reflexivity);
• a ∼ b implies b ∼ a (symmetry);
A.2 Ordered sets • a ∼ b and b ∼ c imply a ∼ c (transitivity).
Definition A.20. The Cartesian product X × Y between Definition A.32. A binary relation “≤” on some set S is
two sets X and Y is the set of all possible ordered pairs with a total order or linear order on S iff, ∀a, b, c ∈ S,
first element from X and second element from Y:
• a ≤ b and b ≤ a imply a = b (antisymmetry);
X × Y = {(x, y) | x ∈ X , y ∈ Y}. (A.10)
• a ≤ b and b ≤ c imply a ≤ c (transitivity);
Axiom A.21 (Fundamental principle of counting). Con-
sider a task that consists of a sequence of k independent • a ≤ b or b ≤ a (totality).
steps. Let ni denote the number of different choices for the A set equipped with a total order is a chain or totally ordered
i-th step, the total number of distinct ways to complete the set.
task is
k
Y Example A.33. The real numbers with less or equal.
ni = n1 n2 · · · nk . (A.11)
i=1 Example A.34. The English letters of the alphabet with
Example A.22. Let A, E, D be the set of appetizers, main dictionary order.
entrees, desserts in a restaurant. A × E × D is the set of
Example A.35. The Cartesian product of a set of totally
possible dinner combos. If #A = 10, #E = 5, #D = 6,
ordered sets with the lexicographical order.
#(A × E × D) = 300.
Definition A.23 (Maximum and minimum). Consider S ⊆ Example A.36. Sort your book in lexicographical order
R, S 6= ∅. If ∃s ∈ S s.t. ∀x ∈ S, x ≤ s , then s is the and save a lot of time. log26 N N !
m m m
maximum of S and denoted by max S. If ∃sm ∈ S s.t. Definition A.37. A binary relation “≤” on some set S is
∀x ∈ S, x ≥ sm , then sm is the minimum of S and denoted a partial order on S iff, ∀a, b, c ∈ S, antisymmetry, transi-
by min S. tivity, and reflexivity (a ≤ a) hold.
Definition A.24 (Upper and lower bounds). Consider A set equipped with a partial order is called a poset.
S ⊆ R, S =6 ∅. a is an upper bound of S ⊆ R if ∀x ∈ S,
Example A.38. The set of subsets of a set S ordered by
x ≤ a; then the set S is said to be bounded above. a is a
inclusion “⊆.”
lower bound of S if ∀x ∈ S, x ≥ a; then the set S is said to
be bounded below. S is bounded if it is bounded above and Example A.39. The natural numbers equipped with the
bounded below. relation of divisibility.
Definition A.25 (Supremum and infimum). Consider a Example A.40. The set of stuff you will put on your body
nonempty set S ⊆ R. If S is bounded above and S has every morning with the time ordered: undershorts, pants,
a least upper bound then we call it the supremum of S and belt, shirt, tie, jacket, socks, shoes, watch.
denote it by sup S. If S is bounded below and S has a great-
est lower bound, then we call it the infimum of S and denote Example A.41. Inheritance (“is-a” relation) is a partial
it by inf S. order. A → B reads “B is a special type of A”.
Appendix B
Linear Algebra
B.1 Vector spaces Example B.5. The simplest vector space is {0}. Another
simple example of a vector space over a field F is F itself,
Definition B.1. A field F is a set together with two binary equipped with its standard addition and multiplication.
operations, usually called “addition” and “multiplication”
and denoted by “+” and “∗”, such that ∀a, b, c ∈ F, the
following axioms hold, B.1.1 Subspaces
Definition B.6. A subset U of V is called a subspace of V
• commutativity: a + b = b + a, ab = ba;
if U is also a vector space.
• associativity: a + (b + c) = (a + b) + c, a(bc) = (ab)c;
• identity: a + 0 = a, a1 = a; Definition B.7. Suppose U1 , . . . , Um are subsets of V. The
sum of U1 , . . . , Um is the set of all possible sums of elements
• invertibility: a + (−a) = 0, aa−1 = 1 (a 6= 0); of U1 , . . . , Um :
• distributivity: a(b + c) = ab + ac.
X m
Definition B.2. A vector space or linear space over a field F U1 + . . . + Um := uj : uj ∈ Uj . (B.1)
is a set V together with two binary operations “+” and “×”
j=1
respectively called vector addition and scalar multiplication
that satisfy the following axioms: Example B.8. For U = {(x, x, y, y) ∈ F4 : x, y ∈ F} and
W = {(x, x, x, y) ∈ F4 : x, y ∈ F}, we have
(VSA-1) commutativity
∀u, v ∈ V, u + v = v + u; U + W = {(x, x, z, y) ∈ F4 : x, y, z ∈ F}.
(VSA-2) associativity
∀u, v, w ∈ V, (u + v) + w = u + (v + w); Lemma B.9. Suppose U1 , . . . , Um are subspaces of V. Then
U1 + . . . + Um is the smallest subspace of V that contains
(VSA-3) compatibility
U1 , . . . , Um .
∀u ∈ V, ∀a, b ∈ F, (ab)u = a(bu);
(VSA-4) additive identity Definition B.10. Suppose U1 , . . . , Um are subspaces of V.
∃0 ∈ V, ∀u ∈ V, s.t. u + 0 = u; The sum U1 + . . . + Um is called a direct sum if each element
(VSA-5) additive inverse in U1 + . . . + Um can be written in only one way as a sum
P m
∀u ∈ V, ∃v ∈ V, s.t. u + v = 0; j=1 uj with uj ∈ Uj for each j = 1, . . . , m. In this case
we write the direct sum as U1 ⊕ . . . ⊕ Um .
(VSA-6) multiplicative identity
∃1 ∈ F, s.t. ∀u ∈ V, 1u = u; Exercise B.11. Show that U1 + U2 + U3 is not a direct sum:
(VSA-7) distributive laws
U1 = {(x, y, 0) ∈ F3 : x, y ∈ F},
∀u, v ∈ V, ∀a, b ∈ F,
(a + b)u = au + bu, U2 = {(0, 0, z) ∈ F3 : z ∈ F},
a(u + v) = au + av.
U3 = {(0, y, y) ∈ F3 : y ∈ F}.
The elements of V are called vectors and the elements of F
Lemma B.12. Suppose U1 , . . . , Um are subspaces of V.
are called scalars.
Then U1 + . . . + Um is a direct
Pm sum if and only if the only
Definition B.3. A real vector space or a complex vector way to write 0 as a sum j=1 uj , where uj ∈ Uj for each
space is a vector space with F = R or F = C, respectively. j = 1, . . . , m, is by taking each uj equal to 0.
Exercise B.4. Show that a complex vector space can also Theorem B.13. Suppose U and W are subspaces of V.
be considered as a real vector space. Then U + W is a direct sum if and only if U ∩ W = {0}.
B.1.2 Span and linear independence Definition B.25. A list of vectors (v1 , v2 , . . . , vm ) in V is
called linearly independent iff
Definition B.14. A list of length n or n-tuple is an or-
dered collection of n elements (which might be numbers, a1 v1 + . . . + am vm = 0 ⇒ a1 = · · · = am = 0. (B.4)
other lists, or more abstract entities) separated by commas
and surrounded by parentheses: x = (x1 , x2 , . . . , xn ). Otherwise the list of vectors is called linearly dependent.
Definition B.15. A vector space composed of all the n-
tuples of a field F is known as a coordinate space, denoted Example B.26. The empty list is declared to be linearly
by Fn (n ∈ N+ ). independent. A list of one vector (v) is linearly independent
iff v 6= 0. A list of two vectors is linearly independent iff
Example B.16. The properties of forces or velocities in the neither vector is a scalar multiple of the other.
real world can be captured by a coordinate space R2 or R3 .
Example B.27. The list (1, z, . . . , z m ) is linearly indepen-
Example B.17. The set of continuous real-valued functions
dent in Pm (F) for each m ∈ N.
on the interval [a, b] forms a real vector space.
Notation 5. For a set S, define a vector space Example B.28. (2, 3, 1), (1, −1, 2), and (7, 3, 8) is linearly
dependent in R3 because
FS := {f : S → F}.
Fn is a special case of FS because n can be regarded as the 2(2, 3, 1) + 3(1, −1, 2) + (−1)(7, 3, 8) = (0, 0, 0).
n
set {1, 2, . . . , n} and each element in F can be considered
as a function {1, 2, . . . , n} 7→ F. Example B.29. Every list of vectors containing the 0 vec-
tor is linearly dependent.
Definition B.18. A linear P combination of a list of vectors
{vi } is a vector of the form i ai vi where ai ∈ F. Lemma B.30 (Linear dependence lemma). Suppose
Example B.19. (17, −4, 2) is a linear combination of V = (v1 , v2 , · · · , vm ) is a linearly dependent list in V. Then
(2, 1, −3), (1, −2, 4) because there exists j ∈ {1, 2, . . . , m} such that
where F is a scalar field. In particular, a linear map Example B.51. The null space of the differentiation map
T : V → W is called a linear operator if W = V. in Example B.43 is R.
Notation 6. The set of all linear maps from V to W is de- Definition B.52. The range of a linear map T ∈ L(V, W)
noted by L(V, W). The set of all linear operators from V to is the subset of W consisting of those vectors that are of the
itself is denoted by L(V). form T v for some v ∈ V:
Corollary B.56. The matrix MT in (B.12) of a linear map By Corollary B.56, applying this equation to vk yields
T ∈ L(V, W) satisfies
n
X
T [v1 , v2 , . . . , vn ] = [w1 , w2 , . . . , wm ]MT . (B.14) (ψ j ◦ T )(v k ) = cr,j ϕr (vk ) = ck,j .
r=1
Proof. This follows directly from (B.12).
On the other hand, we have
B.2.3 Duality n
!
X
(ψj ◦ T )(vk ) = ψj (T vk ) = ψj ar,k wr
Dual vector spaces r=1
n
Definition B.57. The dual space of a vector space V is the X
= ar,k ψj (wr ) = aj,k .
vector space of all linear functionals on V ,
r=1
Lemma B.74. Suppose V is finite-dimensional and U is a Proof. Theorem B.54, Lemma B.75, and Lemma B.74 yield
subspace of V . Then
ϕ ∈ rangeT 0 ⇒ ∃ψ ∈ W 0 s.t T 0 (ψ) = ϕ
dim U + dim U 0 = dim V. (B.23)
⇒ ∀v ∈ nullT, ϕ(v) = ψ(T v) = 0
Proof. Apply Theorem B.54 to the dual of an inclusion ⇒ ϕ ∈ (nullT )0 .
i0 : V 0 → U 0 and we have
The proof is completed by
dim rangei0 + dim nulli0 = dim V 0
⇒ dim rangei0 + dim U 0 = dim V, dim rangeT 0 = dim(rangeT )
where the second line follows from Exercise B.71 and Lemma = dim V − dim nullT
0 0
B.60. For any ϕ ∈ U , Exercise B.73 states that ϕ ∈ U can = dim(nullT )0 .
0 0 0
be extended to ψ ∈ V such that i (ψ) = ϕ. Hence i is
surjective and we have U 0 = rangei0 . The proof is then Corollary B.80. For finite-dimensional vector spaces V
completed by Lemma B.60. and W , any linear map T ∈ L(V, W ) is injective if and
only if T 0 is surjective.
Lemma B.75. Any linear map T ∈ L(V, W ) satisfies
0 0
nullT 0 = (rangeT )0 . (B.24) Proof. T is injective ⇔ nullT = {0} ⇔ (nullT ) = V ⇔
0 0 0
rangeT = V ⇔ T is surjective. The second step follows
Proof. Definitions B.50, B.52, B.63, and B.69 yield from Lemmas B.74 and B.60, and the third step follows from
Lemma B.79.
ϕ ∈ nullT 0 ⇔ 0 = T 0 (ϕ) = ϕ ◦ T
⇔ ∀v ∈ V, ϕ(T v) = 0
Matrix ranks
⇔ ϕ(rangeT ) = 0
⇔ ϕ ∈ (rangeT )0 . Definition B.81. For a matrix A ∈ Fm×n : Fn → Fm , its
column space (or range or image) consists of all linear com-
Lemma B.76. For finite-dimensional vector spaces V and binations of its columns, its row space (or coimage) is the
W , any linear map T ∈ L(V, W ) satisfies column space of AT , its null space (or kernel) is the null
space of A as a linear operator, and the left null space (or
dim nullT 0 = dim nullT + dim W − dim V. (B.25)
cokernel) is the null space of AT .
Proof. Lemma B.75 and Theorem B.54 yield
Definition B.82. The column rank and row rank of a ma-
dim nullT 0 = dim(rangeT )0 = dim W − dim(rangeT ) trix A ∈ Fm×n is the dimension of its column space and row
= dim W − dim V + dim(nullT ) space, respectively.
= dim nullT + dim W − dim V.
Lemma B.83. Let AT denote the matrix of a linear op-
Corollary B.77. For finite-dimensional vector spaces V erator T ∈ L(V, W ). Then the column rank of AT is the
and W , any linear map T ∈ L(V, W ) is surjective if and dimension of rangeT .
only if T 0 is injective. P
Proof. For u = i ci vi , Corollary B.56 yields
Proof. T is surjective ⇔ W = rangeT ⇔ (rangeT )0 = {0}
⇔ nullT 0 = {0} ⇔ T 0 is injective. The second step follows
X
Tu = ci T vi = T [v1 , . . . , vn ]c = [w1 , . . . , wm ]AT c.
from Lemma B.74 applied to W : i
dim rangeT 0 = dim rangeT. (B.26) The LHS is rangeT while {AT c : c ∈ Fn } is the column space
of AT . Since (w1 , . . . , wm ) is a basis, by Definition B.82 the
Proof. Theorem B.54, Lemma B.75, and Lemma B.74 yield column rank of the matrix [w , . . . , w ] is m. Taking dim to
1 m
0
dim rangeT = dim W − dim nullT 0 both sides of the above equation yields the conclusion. Note
m
that the RHS is a subspace of F (why?) and the dimension
= dim W − dim(rangeT )0 of it does not depend on the special choice of its basis, hence
= dim(rangeT ). we can choose (w1 , . . . , wm ) to be the standard basis and
then [w1 , . . . , wm ] is simply the identity matrix.
Lemma B.79. For finite-dimensional vector spaces V and
W , any linear map T ∈ L(V, W ) satisfies
Theorem B.84. For any A ∈ Fm×n , its row rank equals
0 0
rangeT = (nullT ) . (B.27) its column rank.
Proof. Define a linear map T : Fn → Fm as T x = Ax. Lemma B.91. Suppose λ1 , . . . , λm are distinct eigenvalues
Clearly, A is the matrix of T for the standard bases of Fn of T ∈ L(V) with corresponding eigenvectors v1 , . . . , vm .
and Fm . Then we have, Then v1 , . . . , vm is linearly independent.
column rank of A = dim rangeT Lemma B.92. Suppose V is finite-dimensional. Then each
= dim rangeT 0 operator on V has at most dim V distinct eigenvalues.
= column rank of the matrix of T 0
B.3.2 Upper-triangular matrices
= column rank of AT
= row rank of A, Notation 7. Suppose T ∈ L(V) and p ∈ P(F) is a polyno-
mial given by
where the first step follows from Lemma B.83, the second
from Lemma B.78, the third from Lemma B.83, the fourth p(z) = a0 + a1 z + · · · + am z m
from Theorem B.65, and the last from the definition of ma-
for z ∈ F. Then p(T ) is the operator given by
trix transpose and matrix products.
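A quick numerical illustration of Theorem B.84 (a sketch, assuming NumPy; the test matrix is a random product with rank at most 3):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))      # rank at most 3
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))        # equal, here 3
```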
p(T ) = a0 I + a1 T + · · · + am T m ,
Definition B.85. The rank of a matrix is its column rank.
Theorem B.86 (Fundamental theorem of linear algebra). where I = T 0 is the identity operator.
For a matrix A ∈ Fm×n : Fn → Fm , its column space and Example B.93. Suppose D ∈ L(P(R)) is the differentia-
row space both have dimension r ≤ min(m, n); its null space tion operator defined by Dq = q 0 and p is the polynomial
and left null space have dimensions n − r and m − r, respec- defined by p(x) = 7 − 3x + 5x2 . Then we have
tively. In addition, we have
p(D) = 7 − 3D + 5D2 , (p(D))q = 7q − 3q 0 + 5q 00 .
Fm = rangeA ⊕ nullAT , (B.28a)
Fn = rangeAT ⊕ nullA, (B.28b) Definition B.94. The product polynomial of two polyno-
mials p, q ∈ P(F) is the polynomial defined by
where rangeA ⊥ nullAT and rangeAT ⊥ nullA.
∀z ∈ F, (pq)(z) := p(z)q(z). (B.29)
Proof. The first sentence is a rephrase of Theorem B.84 and
follows from Theorem B.54. For the second sentence, we Lemma B.95. Any T ∈ L(V) and p, q ∈ P(F) satisfy
only prove (B.28b). x ∈ nullA implies x ∈ Fn and Ax = 0.
The latter expands to (pq)(T ) = p(T )q(T ) = q(T )p(T ). (B.30)
B.3.3 Eigenspaces and diagonal matrices Corollary B.110. An inner product has conjugate homo-
geneity in the second slot, i.e.
Definition B.102. A diagonal entry of a matrix is an en-
try of the matrix of which the row index equals the column ∀a ∈ F, ∀v, w ∈ V, hv, awi = ā hv, wi . (B.33)
index. The diagonal of a matrix consists of all diagonal en-
tries of the matrix. A diagonal matrix is a square matrix Exercise B.111. Prove Corollaries B.109 and B.110 from
that is zero everywhere except possibly along the diagonal. Definition B.108.
n
Definition B.103. The eigenspace of T ∈ L(V) corre- Definition B.112. The Euclidean inner product on F is
sponding to λ ∈ F is Xn
hv, wi = vi wi . (B.34)
E(λ, T ) := null(T − λI). (B.31) i=1
Lemma B.104. Suppose λ1 , . . . , λm are distinct eigenval- B.4.2 Norms induced from inner products
ues of T ∈ L(V) on a finite-dimensional space V. Then
Definition B.113. Let F be the underlying field of an inner
E(λ1 , T ) + · · · + E(λm , T ) product space V. The norm induced by an inner product on
V is a function V → F:
is a direct sum and p
kvk = hv, vi. (B.35)
dim E(λ1 , T ) + · · · + dim E(λm , T ) ≤ dim V. (B.32)
Definition B.114. For p ∈ [1, ∞), the Euclidean `p norm
Definition B.105. An operator T ∈ L(V) is diagonalizable of a vector v ∈ Fn is
if it has a diagonal matrix with respect to some basis of V. ! p1
Xn
Theorem B.106 (Conditions of diagonalizability). Sup- kvkp = |vi |p (B.36)
pose λ1 , . . . , λm are distinct eigenvalues of T ∈ L(V) on i=1
a finite-dimensional space V. Then the following are equiv- and the Euclidean ` norm is
∞
alent:
kvk∞ = max |vi |. (B.37)
(a) T is diagonalizable; i
(b) V has a basis consisting of eigenvectors of T ; Theorem B.115 (Equivalence of norms). Any two norms
(c) there exist one-dimensional subspaces U1 , . . . , Un of V, k · kN and k · kM on a finite dimensional vector space V = Cn
each invariant under T , such that V = U1 ⊕ · · · ⊕ Un ; satisfy
An inner product space is a vector space V equipped with hCB, ABi = a2 − hCB, CAi ;
an inner product on V. − hCA, ABi = − hCA, CBi + b2 .
Corollary B.109. An inner product has additivity in the The proof is completed by adding up all three equations and
second slot, i.e. hu, v + wi = hu, vi + hu, wi. applying (B.39).
Theorem B.118 (The law of cosines: abstract version). More precisely, we have in the above plot
Any induced norm on a real vector space satisfies
(AB)2 +(BC)2 +(CD)2 +(DA)2 = (AC)2 +(BD)2 . (B.42)
ku − vk2 = kuk2 + kvk2 − 2 hu, vi . (B.41)
Proof. Apply the law of cosines to the two diagonals, add
the two equations, and we obtain (B.42).
Proof. Definitions B.113 and B.108 and F = R yield
Theorem B.123 (The parallelogram law: abstract ver-
ku − vk2 = hu − v, u − vi sion). Any induced norm (B.35) satisfies
= hu, ui + hv, vi − hu, vi − hv, ui
2kuk2 + 2kvk2 = ku + vk2 + ku − vk2 . (B.43)
= kuk2 + kvk2 − 2 hu, vi .
Proof. Replace v in (B.41) with −v and we have
B.4.3 Norms and induced inner-products ku + vk2 = kuk2 + kvk2 + 2 hu, vi .
Definition B.119. A function k · k : V → F is a norm for (B.43) follows from adding the above equation to (B.41).
a vector space V iff it satisfies
Exercise B.124. In the case of Euclidean `p norms, show
(NRM-1) real positivity: ∀v ∈ V, kvk ≥ 0; that the parallelogram law (B.43) holds if and only if p = 2.
(NRM-2) point separation: kvk = 0 ⇒ v = 0.
Theorem B.125. The induced norm (B.35) holds for some
(NRM-3) absolute homogeneity: inner product h·, ·i if and only if the parallelogram law (B.43)
∀a ∈ F, ∀v ∈ V, kavk = |a|kvk; holds for every pair of u, v ∈ V.
(NRM-4) triangle inequality: Exercise B.126. Prove Theorem B.125.
∀u, v ∈ V, ku + vk ≤ kuk + kvk.
Example B.127. By Theorem B.125 and Exercise B.124,
The function k · k : V → F is called a semi-norm iff it satifies the `1 and `∞ spaces do not have a corresponding inner
(NRM-1,3,4). A normed vector space (or simply a normed product for the Euclidean `1 and `∞ norms.
space) is a vector space V equipped with a norm on V.
Exercise B.120. Explain how (NRM-1,2,3,4) relate to the B.4.4 Orthonormal bases
geometric meaning of the norm of vectors in R3 . Definition B.128. Two vectors u, v are called orthogonal
if hu, vi = 0, i.e., their inner product is the additive identity
Lemma B.121. The norm induced by an inner product is
of the underlying field.
a norm as in Definition B.119.
Example B.129. An inner product on the vector space of
Proof. The induced norm as in (B.35) satisfies (NRM-1,2) continuous real-valued functions on the interval [−1, 1] is
trivially. For (NRM-3),
Z +1
2 2 2
kavk = hav, avi = a hv, avi = aā hv, vi = |a| kvk . hf, gi = f (x)g(x)dx.
−1
To prove (NRM-4), we have f and g are said to be orthogonal if the integral is zero.
where the second step follows from (IP-5) and the fourth where the equality holds iff one of u, v is a scalar multiple
step from Cauchy-Schwarz inequality. of the other.
Proof. For any complex number λ, (IP-1) implies
Theorem B.122 (The parallelogram law). The sum of
squares of the lengths of the four sides of a parallelogram hu + λv, u + λvi ≥ 0
equals the sum of squares of the two diagonals.
⇒ hu, ui + λ hv, ui + λ̄ hu, vi + λλ̄ hv, vi ≥ 0.
Example B.134. If f, g : [a, b] → R are continuous, then Definition B.142. The adjoint of a linear map
Z 2 ! Z ! T ∈ L(V, W) between inner-product spaces is a function
b Z b b
T ∗ : W → V that satisfies
2 2
f (x)g(x)dx ≤ f (x)dx g (x)dx
a a a
∀v ∈ V, ∀w ∈ W, hT v, wi = hv, T ∗ wi .
(B.48)
Definition B.135. A list of vectors (e1 , e2 , . . . , em ) is
called orthonormal if the vectors in it are pairwise orthogo- Example B.143. Define a linear operator T : R3 → R2 ,
nal and each vector has norm 1, i.e.
( T (x1 , x2 , x3 ) = (x2 + 3x3 , 2x1 ).
∀i = 1, 2, . . . , m, kei k = 1;
(B.45)
∀i 6= j, hei , ej i = 0. Then T ∗ (y1 , y2 ) = (2y2 , y1 , 3y1 ) because
Corollary B.150. A unitary matrix U preserves norms and Proof. By Definition B.108 and (B.51), we have, ∀u, w ∈ V,
inner products. More precisely, we have
hT (u + w), u + wi − hT (u − w), u − wi
n
hT u, wi =
∀v, w ∈ C , hU v, U wi = hv, wi . 4
hT (u + iw), u + iwi − hT (u − iw), u − iwi
+i
Proof. This follows from Definitions B.142 and B.148. 4
2×2 =0.
Theorem B.151. Every unitary matrix U ∈ C with
det U = 1 is of the form Setting w = T u completes the proof.
a b B.154. An operator T ∈ L(V) is self-adjoint iff
U= , (B.50) Definition
−b a T = T ∗ , i.e.
(a) T is normal but not self-adjoint. B.5.5 The singular value decomposition
(b) The matrix of T with respect to every orthonormal basis Definition B.166. A self-adjoint linear operator whose
of V has the form eigenvalues are non-negative is called positive semidefinite
a −b or positive, and called positive definite if it is also invertible.
M (T ) = (B.56)
b a
Corollary B.167. For any linear operator f ∈ L(V), both
where b 6= 0.
f ∗ ◦ f and f ◦ f ∗ are self-adjoint and positive semidefinite.
Proof. (b) ⇒ (a) trivially holds, so we only prove (a) ⇒ (b).
Let (e1 , e2 ) be an orthonormal basis of V and set Proof. By Definition B.154, f ∗ ◦ f is self-adjoint since
a c h(f ∗ ◦ f )u, vi = hf u, f vi = hu, (f ∗ ◦ f )vi .
M (T, (e1 , e2 )) = .
b d
∗
By Definition B.55, Theorem B.130, and Definition B.135, Suppose (λ, u) is an eigen-pair of (f ◦ f ). Then we have
we have kT e1 k2 = a2 + b2 . In addition, Theorem B.152
yields kT ∗ e1 k2 = a2 + c2 . Then Lemma B.159 implies λ hu, ui = h(f ∗ ◦ f )u, ui = hf u, f ui
b2 = c2 and the condition of T being not self-adjoint fur- hf u, f ui
⇒λ = ≥ 0.
ther yields c = −b 6= 0. Considering kT e2 k2 and kT ∗ e2 k2 hu, ui
yields a = d.
Similar arguments apply to f ◦ f ∗ .
B.5.3 The spectral theorem
Definition B.168. The singular values of a linear map f
Theorem B.161 (Complex spectral). For a linear operator are the square roots of the nonnegative eigenvalues of f ∗ ◦ f .
T ∈ L(V) with F = C, the following are equivalent:
(a) T is normal; Definition B.169. For a rectangular matrix A ∈ Fm×n ,
the factorization A = P ΣQ∗ is a singular value decomposi-
(b) V has an orthonormal basis consisting of eigenvectors of
tion (SVD) iff any entry of Σ ∈ Rm×n is zero except pos-
T;
sibly at a diagonal entry (an entry of which the column
(c) T has a diagonal matrix with respect to some orthonor- index equals the row index), and P ∈ Fm×m and Q ∈ Fn×n
mal basis of V. are unitary matrices or orthogonal matrices for F = C or
Theorem B.162 (Real spectral). For a linear operator F = R, respectively. The diagonal entries of Σ, written
T ∈ L(V) with F = R, the following are equivalent: σ 1 ≥ σ2 ≥ · · · ≥ σq ≥ 0 where q = min(m, n), are the
singular values of A. The column vectors of P and Q are
(a) T is self-adjoint; the left singular vectors and the right singular vectors of A,
(b) V has an orthonormal basis consisting of eigenvectors of respectively.
T;
m×n
(c) T has a diagonal matrix with respect to some orthonor- Theorem B.170. Any matrix A ∈ C has an SVD.
mal basis of V.
Definition B.171. Two matrices A, B ∈ Rn×n are called
similar iff there exists an invertible matrix P such that
B.5.4 Isometries B = P −1 AP . The map A 7→ P −1 AP is called a similar-
Definition B.163. An operator S ∈ L(V) is called a (lin- ity transformation or conjugation of the matrix A.
ear) isometry iff
∀v ∈ V, kSvk = kvk. (B.57) B.6 Trace and determinant
Theorem B.164. An operator S ∈ L(V) on a real inner
product space is an isometry if and only if there exists an Definition B.172. The trace of a matrix A, denoted by
orthonormal basis of V with respect to which S has a block Trace A, is the sum of the diagonal entries of A.
diagonal matrix such that each block on the diagonal is a
Lemma B.173. The trace of a matrix is the sum of its
1-by-1 matrix containing 1 or −1, or, is a 2-by-2 matrix of
eigenvalues, each of which is repeated according to its mul-
the form
cos θ − sin θ
tiplicity.
(B.58)
sin θ cos θ
Definition B.174. A permutation of a set A is a bijective
where θ ∈ (0, π). function σ : A → A.
Corollary B.165. For an operator S ∈ L(V) on a two-
dimensional real inner product space, the following are Definition B.175. Let σ be a permutation of A =
equivalent: {1, 2, . . . , n} and let s denote the number of pairs of inte-
gers (j, k) with 1 ≤ j < k ≤ n such that j appears after k
(a) S is an isometry; in the list (m1 , . . . , mn ) given by mi = σ(i). The sign of the
(b) S is either an identity or a reflection or a rotation. permutation σ is 1 if s is even and −1 if s is odd.
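A small numerical illustration of Definition B.169 and Theorem B.170 above (a sketch, assuming NumPy): numpy.linalg.svd returns P, the diagonal of Σ, and Q*, and the factors reassemble A.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
P, sigma, Qh = np.linalg.svd(A, full_matrices=True)   # A = P Sigma Q^*
Sigma = np.zeros((4, 3))
np.fill_diagonal(Sigma, sigma)
print(np.allclose(A, P @ Sigma @ Qh))                 # True
print(sigma)                                          # singular values, non-increasing
```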
Definition B.176. The signed volume of a parallelotope Theorem B.185. The signed volume function satisfying
spanned by n vectors v1 , v2 , . . . , vn ∈ Rn is a function (SVP-1,2,3) in Definition B.176 is unique and is the same as
δ : Rn×n → R that satisfies the determinant in (B.62).
(SVP-1) δ(I) = 1; Proof. Let the parallelotope be spanned by the column vec-
(SVP-2) δ(v1 , v2 , . . . , vn ) = 0 if vi = vj for some i 6= j; tors v1 , v2 , . . . , vn . We have
(SVP-3) δ is linear, i.e., ∀j = 1, . . . , n, ∀c ∈ R, v11 v12 . . . v1n
v21 v22 . . . v2n
δ(v1 , . . . , vj−1 , v + cw, vj+1 , . . . , vn ) δ .
.. .. ..
= δ(v1 , . . . , vj−1 , v, vj+1 , . . . , vn ) (B.59) .
. . . .
+cδ(v1 , . . . , vj−1 , w, vj+1 , . . . , vn ). vn1 vn2 . . . vnn
| v12 . . . v1n
Exercise B.177. Give a geometric proof that the signed n ei1 v22 . . . v2n
volume of the parallelogram determined by the two vectors =
X
vi 1 1 δ .. ..
..
v = (a, b)T and v = (c, d)T is
1 2
| i1 =1 . . .
| vn2 ... vnn
v1⊥ , v2
δ(v1 , v2 ) = ad − bc = . (B.60)
| | v13 ... v1n
Lemma B.178. Adding a multiple of one vector to another X n ei1 ei2 v23 ... v2n
does not change the signed volume. = vi1 1 vi2 2 δ .. ..
..
i1 ,i2 =1
| | . . .
Proof. This follows directly from (SVP-2,3). | | vn2 ... vnn
Lemma B.179. If the vectors v1 , v2 , . . . , vn are linearly =···
dependent, then δ(v1 , v2 , . . . , vn ) = 0.
n
X | | ... |
Pn
Proof. WLOG, we assume v1 = i=2 ci vi . Then the result = vi1 1 vi2 2 · · · vin n δ ei1 ei2 . . . ein
follows from (SVP-2,3). i1 ,i2 ,...,in =1 | | ... |
| | ... |
Lemma B.180. The signed volume δ is alternating, i.e., X
= vσ(1),1 vσ(2),2 · · · vσ(n),n δ eσ(1) eσ(2) . . . eσ(n)
δ(v1 , . . . , vi , . . . , vj , . . . , vn ) = −δ(v1 , . . . , vj , . . . , vi , . . . , vn ). σ∈Sn | | ... |
(B.61) =
X
vσ(1),1 vσ(2),2 · · · vσ(n),n sgn(σ)
Exercise B.181. Prove Lemma B.180 using (SVP-2,3). σ∈S n
n
Lemma B.182. Let Mσ denote the matrix of a permuta- =
X Y
sgn(σ) vσ(i),i ,
tion σ : E → E where E is the set of standard basis vectors σ∈Sn i=1
in (B.5). Then we have δ(Mσ ) = sgn(σ).
where the first four steps follow from (SVP-3), the sixth
Proof. There is a one-to-one correspondence between the
step from Lemma B.182, and the fifth step from (SVP-
vectors in the matrix
2). In other words, the signed volume δ(·) is zero for any
Mσ = [eσ(1) , eσ(2) , . . . , eσ(n) ] ij = ik and hence the only nonzero terms are those of which
(i1 , i2 , . . . , in ) is a permutation of (1, 2, . . . , n).
and the scalars in the one-line notation
(σ(1) σ(2) . . . σ(n)). Exercise B.186. Use the formula in (B.62) to show that
det A = det AT .
A sequence of transpositions taking σ to the identity map
n×n
also takes Mσ to the identity matrix. By Lemma B.180, each Definition B.187. The i, j cofactor of A ∈ R is
transposition yields a multiplication factor −1. Definition
B.175 and (SVP-1) give δ(Mσ ) = sgn(σ)δ(I) = sgn(σ). Cij = (−1)i+j Mij , (B.63)
Definition B.183 (Leibniz formula of determinants). The where Mij is the i, j minor of a matrix A, i.e. the determi-
determinant of a square matrix A ∈ Rn×n is nant of the (n−1)×(n−1) matrix that results from deleting
n the i-th row and the j-th column of A.
X Y
det A = sgn(σ) aσ(i),i , (B.62)
Theorem B.188 (Laplace formula of determinants). Given
σ∈Sn i=1
fixed indices i, j ∈ 1, 2, . . . , n, the determinant of an n-by-n
where the sum is over the symmetric group Sn of all permu- matrix A = [aij ] is given by
tations and aσ(i),i is the element of A at the σ(i)th row and
n n
the ith column. X X
det A = aij 0 Cij 0 = ai0 j Ci0 j . (B.64)
Lemma B.184. The determinant of a matrix is the prod- j 0 =1 i0 =1
uct of its eigenvalues, each of which is repeated according to
its multiplicity. Exercise B.189. Prove Theorem B.188 by induction.
Appendix C
Basic Analysis
• If you want to build a fence over your backyard swimming pool, several digits of π are probably enough;
• in NASA, calculations involving π use 15 digits for Guidance Navigation and Control;

… ∀ε > 0, ∀n ≥ N, |a_n − a| ≤ |a_n − a_N| + |a_N − a| ≤ ε,

which completes the proof.

Lemma C.10. Every Cauchy sequence is bounded.
Lemma C.11. Every real sequence has a monotone subse- Lemma C.21. Let (an )∞ n=m be a sequence of real numbers.
quence. For L+ = lim sup an and L− = lim inf an , we have
Theorem C.12. A bounded monotone sequence is conver- (a) For every x > L+ , elements of the sequence are eventu-
gent. ally less than x:
Theorem C.13 (Bolzano-Weierstrass). Every bounded se-
∀x > L+ , ∃N ≥ m s.t. ∀n ≥ N, an < x.
quence has a convergent subsequence.
Theorem C.14 (Cauchy criterion). Every Cauchy se- Similarly, for every x < L− , elements of the sequence
quence in R converges to a limit in R. are eventually greater than x:
Proof. By Lemma C.10, the Cauchy sequence (an ) is ∀x < L− , ∃N ≥ m s.t. ∀n ≥ N, an > x.
bounded. Theorem C.13 implies that (an )n∈N has a con-
vergent subsequence (ank )k∈N . Then Lemma C.9 completes
(b) For every x < L+ , there are an infinite number of ele-
the proof.
ments in the sequence that are greater than x:
Theorem C.15 (Completeness of R). A sequence of real
numbers is Cauchy if and only if it is convergent. ∀x < L+ , ∀N ≥ m, ∃n ≥ N s.t. an > x.
Proof. This is a summary of Lemma C.8 and Theorem Similarly, for every x > L− , there are an infinite number
C.14. of elements in the sequence that are less than x:
Definition C.18. A real number x is a limit point or ad- Theorem C.22 (Squeeze test or the Sandwich Theorem).
herent point of a sequence (an )∞
n=m of real numbers if it is
Let (an )∞ ∞ ∞
n=m , (bn )n=m , and (cn )n=m be sequences of real
∞
continually -adherent to (an )n=m for every ≥ 0. numbers that satisfy
and (a− ∞
n )n=N is the sequence We also write
C.2  Series

Definition C.23 (Finite series). Let m, n be integers and let (a_i)_{i=m}^n be a finite sequence of real numbers. The finite series or finite sum associated with the sequence (a_i)_{i=m}^n is the number Σ_{i=m}^n a_i given by the recursive formula

    Σ_{i=m}^n a_i := 0                          if n < m;
                     a_n + Σ_{i=m}^{n−1} a_i    otherwise.            (C.9)

Definition C.24 (Formal infinite series). A (formal) infinite series associated with an infinite sequence {a_n} is the expression Σ_{n=0}^∞ a_n.

Definition C.25. The sequence of partial sums (S_n)_{n=0}^∞ associated with a formal infinite series Σ_{i=0}^∞ a_i is defined for each n as the sum of the sequence {a_i} from a_0 to a_n,

    S_n = Σ_{i=0}^n a_i.            (C.10)

Definition C.26. A formal infinite series is said to be convergent and to converge to L if its sequence of partial sums converges to some limit L. In this case we write L = Σ_{n=0}^∞ a_n and call L the sum of the infinite series.

Definition C.27. A formal infinite series is said to be divergent if its sequence of partial sums diverges. In this case we do not assign any real number value to this series.

Lemma C.28. An infinite series Σ_{n=0}^∞ a_n of real numbers is convergent if and only if

    ∀ε > 0, ∃N ∈ N s.t. ∀p, q ≥ N, |Σ_{n=p}^q a_n| ≤ ε.            (C.11)

Definition C.29. An infinite series Σ_{n=0}^∞ a_n is absolutely convergent iff the series Σ_{n=0}^∞ |a_n| is convergent.

Lemma C.30. An infinite series that is absolutely convergent is convergent.

Theorem C.31 (Root test). For an infinite series Σ_{n=0}^∞ a_n, define

    α := lim sup_{n→∞} |a_n|^{1/n}.            (C.12)

The series is convergent if α < 1 and divergent if α > 1.

Theorem C.32 (Ratio test). An infinite series Σ_{n=0}^∞ a_n of nonzero real numbers is

• absolutely convergent if lim sup_{n→∞} |a_{n+1}|/|a_n| < 1;

• divergent if lim inf_{n→∞} |a_{n+1}|/|a_n| > 1.

C.3  Continuous functions on R

Definition C.33. A scalar function is a function whose range is a subset of R.

Definition C.34 (Limit of a scalar function with one variable). Consider a function f : I → R with I(c, r) = (c − r, c) ∪ (c, c + r). The limit of f(x) exists as x approaches c, written lim_{x→c} f(x) = L, iff

    ∀ε > 0, ∃δ > 0 s.t. ∀x ∈ I(c, δ), |f(x) − L| < ε.            (C.13)

Example C.35. Show that lim_{x→2} 1/x = 1/2.

Proof. If ε ≥ 1/2, choose δ = 1. Then x ∈ (1, 3) implies |1/x − 1/2| < ε since 1/x − 1/2 is a monotonically decreasing function with its supremum at x = 1.
If ε ∈ (0, 1/2), choose δ = ε. Then x ∈ (2 − ε, 2 + ε) ⊂ (3/2, 5/2). Hence |1/x − 1/2| = |2 − x|/|2x| < |2 − x| < ε. The proof is completed by Definition C.34.

Definition C.36. f : R → R is continuous at c iff

    lim_{x→c} f(x) = f(c).            (C.14)

Definition C.37. A scalar function f is continuous on (a, b), written f ∈ C(a, b), if (C.14) holds ∀c ∈ (a, b).

Theorem C.38 (Extreme values). A continuous function f : [a, b] → R attains its maximum at some point x_max ∈ [a, b] and its minimum at some point x_min ∈ [a, b].

Theorem C.39 (Intermediate value). A scalar function f ∈ C[a, b] satisfies

    ∀y ∈ [m, M], ∃ξ ∈ [a, b] s.t. y = f(ξ),            (C.15)

where m = inf_{x∈[a,b]} f(x) and M = sup_{x∈[a,b]} f(x).

Definition C.40. Let I = (a, b). A function f : I → R is uniformly continuous on I iff

    ∀ε > 0, ∃δ > 0 s.t. ∀x, y ∈ I, |x − y| < δ ⇒ |f(x) − f(y)| < ε.            (C.16)

Example C.41. Show that, on (a, ∞), f(x) = 1/x is uniformly continuous if a > 0 and is not so if a = 0.

Proof. If a > 0, then |f(x) − f(y)| = |x − y|/(xy) < |x − y|/a². Hence ∀ε > 0, ∃δ = a²ε s.t.

    |x − y| < δ ⇒ |f(x) − f(y)| < |x − y|/a² < a²ε/a² = ε.

If a = 0, negating the condition of uniform continuity, i.e. eq. (C.16), yields ∃ε > 0 s.t. ∀δ > 0, ∃x, y > 0 s.t. (|x − y| < δ) ∧ (|1/x − 1/y| ≥ ε).
We prove a stronger version: ∀ε > 0, ∀δ > 0, ∃x, y > 0 s.t. |x − y| < δ and |f(x) − f(y)| ≥ ε.
If δ ≥ 1/2, choose x = 1/2 and y = 1/4. This choice satisfies |x − y| < δ since x − y = 1/4 < 1/2 ≤ δ. However, |f(x) − f(y)| = |x − y|/(xy) = 2 > ε.
If δ < 1/2, then 1/δ > 2. Choose x ∈ (0, δ/2) and y ∈ (δ/2, δ) with |x − y| > δ/2. This choice satisfies |x − y| < δ. However, |f(x) − f(y)| = |x − y|/(xy) > (δ/2)/(xy) > 1/y > 1/δ > 2 > ε.

Exercise C.42. Show that, on (a, ∞), f(x) = 1/x² is uniformly continuous if a > 0 and is not so if a = 0.

Theorem C.43. Uniform continuity implies continuity but the converse is not true.
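The following sketch (not part of the notes) illustrates Definition C.25 and Theorem C.32 numerically: it accumulates partial sums of Σ 1/n! and prints the ratios |a_{n+1}|/|a_n|, which tend to 0 < 1, consistent with absolute convergence.

    #include <cstdio>

    int main() {
        // Partial sums S_n of the series sum_{n>=0} 1/n!  (which converges to e)
        // together with the ratios |a_{n+1}|/|a_n| used in the ratio test.
        double term = 1.0;   // a_0 = 1/0! = 1
        double S = 0.0;
        for (int n = 0; n <= 12; ++n) {
            S += term;
            double next = term / (n + 1);          // a_{n+1} = a_n/(n+1)
            std::printf("n=%2d  S_n=%.12f  |a_{n+1}|/|a_n|=%.6f\n",
                        n, S, next / term);
            term = next;
        }
        return 0;
    }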
    f'(a) = lim_{h→0} (f(a + h) − f(a))/h.            (C.17)

If the limit exists, f is differentiable at a.

Example C.46. For the power function f(x) = x^α, we have f' = αx^{α−1} due to Newton's generalized binomial theorem,

    (a + h)^α = Σ_{n=0}^∞ (α choose n) a^{α−n} h^n.

Definition C.47. A function f(x) is k times continuously differentiable on (a, b) iff f^{(k)}(x) exists on (a, b) and is itself continuous. The set or space of all such functions on (a, b) is denoted by C^k(a, b). In comparison, C^k[a, b] is the space of functions f for which f^{(k)}(x) is bounded and uniformly continuous on (a, b).

Theorem C.48. A scalar function f is bounded on [a, b] if f ∈ C[a, b].

Theorem C.49. If f : (a, b) → R assumes its maximum or minimum at x_0 ∈ (a, b) and f is differentiable at x_0, then f'(x_0) = 0.

Proof. Suppose f'(x_0) > 0. Then we have

    f'(x_0) = lim_{x→x_0} (f(x) − f(x_0))/(x − x_0) > 0.

The definition of a limit implies

    ∃δ > 0 s.t. a < x_0 − δ < x_0 + δ < b,

which, together with |x − x_0| < δ, implies (f(x) − f(x_0))/(x − x_0) > 0. This is a contradiction to f(x_0) being a maximum when we choose x ∈ (x_0, x_0 + δ).

Theorem C.50 (Rolle's). If a function f : R → R satisfies f ∈ C[a, b], f is differentiable on (a, b), and f(a) = f(b), then ∃ξ ∈ (a, b) s.t. f'(ξ) = 0.

. . . where the a_n's are the coefficients. The interval of convergence is the set of values of x for which the series converges:

    I_c(p) = {x | p(x) converges}.            (C.19)

Definition C.54. If the derivatives f^{(i)}(x) with i = 1, 2, . . . , n exist for a function f : R → R at x = c, then

    T_n(x) = Σ_{k=0}^n f^{(k)}(c)/k! (x − c)^k            (C.20)

is called the nth Taylor polynomial for f(x) at c. In particular, the linear approximation for f(x) at c is

    T_1(x) = f(c) + f'(c)(x − c).            (C.21)

Example C.55. If f ∈ C^∞, then ∀n ∈ N, we have

    T_n^{(m)}(x) = Σ_{k=m}^n f^{(k)}(c)/(k − m)! (x − c)^{k−m}    if m ∈ N, m ≤ n;
                   0                                              if m ∈ N, m > n.

This can be proved by induction. In the inductive step, we regroup the summation into a constant term and another shifted summation.

Definition C.56. The Taylor series (or Taylor expansion) for f(x) at c is

    Σ_{k=0}^∞ f^{(k)}(c)/k! (x − c)^k.            (C.22)
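As a concrete illustration of Definition C.54 (a sketch, not part of the notes), the code below evaluates the nth Taylor polynomial of exp at c = 0 by accumulating the terms x^k/k! and compares it with std::exp.

    #include <cmath>
    #include <cstdio>

    // T_n(x) for f = exp at c = 0:  sum_{k=0}^{n} x^k / k!
    double taylor_exp(double x, int n) {
        double term = 1.0, sum = 1.0;            // k = 0 term
        for (int k = 1; k <= n; ++k) {
            term *= x / k;                       // x^k/k! from x^{k-1}/(k-1)!
            sum += term;
        }
        return sum;
    }

    int main() {
        double x = 2.0;
        for (int n = 2; n <= 12; n += 2)
            std::printf("n=%2d  T_n(2)=%.10f  error=%.3e\n",
                        n, taylor_exp(x, n),
                        std::fabs(std::exp(x) - taylor_exp(x, n)));
    }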
Theorem C.60 (Taylor's theorem with Lagrangian form). Consider a function f : R → R. If f ∈ C^n[c − d, c + d] and f^{(n+1)}(x) exists on (c − d, c + d), then ∀x ∈ [c − d, c + d], there exists some ξ between c and x such that

    E_n(x) = f^{(n+1)}(ξ)/(n + 1)! (x − c)^{n+1}.            (C.25)

Proof. Fix x ≠ c and let M be the unique solution of

    E_n(x) = f(x) − T_n(x) = M (x − c)^{n+1}/(n + 1)!.

Consider the function

    g(t) := E_n(t) − M (t − c)^{n+1}/(n + 1)!.            (C.26)

Clearly g(x) = 0. By Lemma C.59, g^{(k)}(c) = 0 for each k = 0, 1, . . . , n. Then Rolle's theorem implies that

    ∃x_1 ∈ (c, x) s.t. g'(x_1) = 0.

If x < c, change (c, x) above to (x, c). Apply Rolle's theorem to g'(t) on (c, x_1) and we have

    ∃x_2 ∈ (c, x_1) s.t. g^{(2)}(x_2) = 0.

Repeatedly using Rolle's theorem,

    ∃x_{n+1} ∈ (c, x_n) s.t. g^{(n+1)}(x_{n+1}) = 0.            (C.27)

Since T_n is a polynomial of degree n, we have T_n^{(n+1)}(t) = 0, which, together with (C.27) and (C.26), yields

    f^{(n+1)}(x_{n+1}) − M = 0.

The proof is completed by identifying ξ with x_{n+1}.

Example C.61. How many terms are needed to compute e² correctly to four decimal places?
The requirement of four decimal places means an accuracy of at least ε = 10^{−5}. By Definition C.56, the Taylor series of e^x at c = 0 is

    e^x = Σ_{n=0}^{+∞} x^n/n!.

By Theorem C.60, we have

    ∃ξ ∈ [0, 2] s.t. E_n(2) = e^ξ 2^{n+1}/(n + 1)! < e² 2^{n+1}/(n + 1)!.

Then e² 2^{n+1}/(n + 1)! ≤ ε yields n ≥ 12, i.e., 13 terms.

C.6  Riemann integral

Definition C.62. A partition of an interval I = [a, b] is a totally-ordered finite subset P_n ⊆ I of the form

    P_n(a, b) = {a = x_0 < x_1 < · · · < x_n = b}.            (C.28)

The interval I_i = [x_{i−1}, x_i] is the ith subinterval of the partition. The norm of the partition is the length of the longest subinterval,

    h = h(P_n) = max(x_i − x_{i−1}),  i = 1, 2, . . . , n.            (C.29)

Definition C.63. The Riemann sum of f : R → R over a partition P_n is

    S_n(f) = Σ_{i=1}^n f(x*_i)(x_i − x_{i−1}),            (C.30)

where x*_i ∈ I_i is a sample point of the ith subinterval.

Definition C.64. A function f : R → R is Riemann integrable on [a, b] iff

    ∃L ∈ R s.t. ∀ε > 0, ∃δ > 0 s.t. ∀P_n(a, b) with h(P_n) < δ, |S_n(f) − L| < ε.            (C.31)

In this case we write L = ∫_a^b f(x)dx and call it the Riemann integral of f on [a, b].

Example C.65. The following function f : [a, b] → R is not Riemann integrable.

    f(x) = 1 if x is rational;
           0 if x is irrational.

To see this, we first negate the logical statement in (C.31) to get

    ∀L ∈ R, ∃ε > 0 s.t. ∀δ > 0, ∃P_n(a, b) with h(P_n) < δ s.t. |S_n(f) − L| ≥ ε.

If |L| < (b − a)/2, we choose all x*_i's to be rational so that f(x*_i) ≡ 1; then (C.30) yields S_n(f) = b − a. For ε = (b − a)/4, the formula |S_n(f) − L| ≥ ε clearly holds.
If |L| ≥ (b − a)/2, we choose all x*_i's to be irrational so that f(x*_i) ≡ 0; then (C.30) yields S_n(f) = 0. For ε = (b − a)/4, the formula |S_n(f) − L| ≥ ε clearly holds.

Definition C.66. If f : R → R is integrable on [a, b], then the limit of the Riemann sum of f is called the definite integral of f on [a, b]:

    ∫_a^b f(x)dx = lim_{h_n→0} S_n(f).            (C.32)

Theorem C.67. A scalar function f is integrable on [a, b] if f ∈ C[a, b].

Definition C.68. A monotonic function is a function between ordered sets that either preserves or reverses the given order. In particular, f : R → R is monotonically increasing if ∀x, y, x ≤ y ⇒ f(x) ≤ f(y); f : R → R is monotonically decreasing if ∀x, y, x ≤ y ⇒ f(x) ≥ f(y).

Theorem C.69. A scalar function is integrable on [a, b] if it is monotonic on [a, b].

Exercise C.70. True or false: a bijective function is either order-preserving or order-reversing?

Theorem C.71 (Integral mean value). Let w : [a, b] → R^+ be integrable on [a, b]. For f ∈ C[a, b], ∃ξ ∈ [a, b] s.t.

    ∫_a^b w(x)f(x)dx = f(ξ) ∫_a^b w(x)dx.            (C.33)
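The following sketch (not from the notes) forms the Riemann sum (C.30) for f(x) = x² on [0, 1] with uniform partitions and midpoint sample points; the sums approach the integral 1/3 as h(P_n) → 0.

    #include <cstdio>

    // Riemann sum S_n(f) over a uniform partition of [a, b] with midpoint samples.
    double riemann_sum(double (*f)(double), double a, double b, int n) {
        double h = (b - a) / n, S = 0.0;
        for (int i = 1; i <= n; ++i) {
            double xstar = a + (i - 0.5) * h;    // sample point in the i-th subinterval
            S += f(xstar) * h;                   // f(x_i^*)(x_i - x_{i-1})
        }
        return S;
    }

    double square(double x) { return x * x; }

    int main() {
        for (int n = 4; n <= 4096; n *= 4)
            std::printf("n=%5d  S_n=%.10f\n", n, riemann_sum(square, 0.0, 1.0, n));
        // The values approach 1/3 = 0.3333333333...
    }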
Proof. Denote m = inf_{x∈[a,b]} f(x), M = sup_{x∈[a,b]} f(x), and I = ∫_a^b w(x)dx. Then m w(x) ≤ f(x)w(x) ≤ M w(x) and

    mI ≤ ∫_a^b w(x)f(x)dx ≤ MI.

w > 0 implies I ≠ 0, hence

    m ≤ (1/I) ∫_a^b w(x)f(x)dx ≤ M.

Example C.75. Set X to be C[a, b], the set of continuous functions [a, b] → R. Then the following is a metric on X,

    d(x, y) = max_{t∈[a,b]} |x(t) − y(t)|.            (C.37)

Definition C.76 (Limiting value of a function). Let (X, d_X) and (Y, d_Y) be metric spaces. Let E be a subset of X and x_0 ∈ X be an adherent point of E. A function f : X → Y is said to converge to L ∈ Y as x converges to x_0 ∈ E, written

    lim_{x→x_0; x∈E} f(x) = L,            (C.38)

iff

    ∀ε > 0, ∃δ > 0 s.t. ∀x ∈ E, |x − x_0|_X < δ ⇒ |f(x) − L|_Y < ε.            (C.39)

Notation 9. In Definition C.76 we used the synonym notation

    |u − v|_X := d_X(u, v).            (C.40)

Definition C.77 (Pointwise convergence). Let (f_n)_{n=1}^∞ be a sequence of functions from one metric space (X, d_X) to another (Y, d_Y), and let f : X → Y be another function. We say that (f_n)_{n=1}^∞ converges pointwise to f on X iff

    ∀x ∈ X, lim_{n→∞} f_n(x) = f(x).            (C.41)

Example C.81. Pointwise convergence does not preserve boundedness. For example, the function sequence

    f_n(x) = exp(x)  if exp(x) ≤ n;
             n       if exp(x) > n            (C.43)

converges pointwise to f(x) = exp(x). Similarly, the function sequence

    f_n(x) = 1/x  if x ≥ 1/n;
             0    if x ∈ (0, 1/n)            (C.44)

converges pointwise to f(x) = 1/x. As another example, the function sequence

    f_n(x) = n sin(x/n)            (C.45)

converges pointwise to f(x) = x.
Definition C.97 (Partial derivative). Let E be a subset of R^n, f : E → R^m a function, x_0 ∈ E an interior point of E, and 1 ≤ j ≤ n. The partial derivative of f with respect to the x_j variable at x_0 is defined by

    ∂f/∂x_j (x_0) := lim_{t→0; t>0, x_0+te_j∈E} (f(x_0 + te_j) − f(x_0))/t = d/dt f(x_0 + te_j)|_{t=0},            (C.52)

provided that the limit exists. Here e_j is the jth standard basis vector of R^n.

Exercise C.98. Show that the existence of partial derivatives at x_0 does not imply that the function is differentiable at x_0 by considering the differentiability of the following function f : R² → R at (0, 0).

    f(x, y) = x³/(x² + y²)  if (x, y) ≠ (0, 0);
              0             if (x, y) = (0, 0).

Theorem C.99. Let E be a subset of R^n, f : E → R^m a function, F a subset of E, and x_0 ∈ E an interior point of F. If all the partial derivatives ∂f/∂x_j exist on F and are continuous at x_0, then f is differentiable at x_0, and the linear transformation f'(x_0) : R^n → R^m is defined by

    f'(x_0)(v) = Σ_{j=1}^n v_j ∂f/∂x_j (x_0).            (C.53)

Definition C.100. The derivative matrix or differential matrix or Jacobian matrix of a differentiable function f : R^n → R^m is an m × n matrix,

    Df := [ ∂f_1/∂x_1  ∂f_1/∂x_2  · · ·  ∂f_1/∂x_n ]
          [ ∂f_2/∂x_1  ∂f_2/∂x_2  · · ·  ∂f_2/∂x_n ]
          [    ...        ...      ...      ...    ]
          [ ∂f_m/∂x_1  ∂f_m/∂x_2  · · ·  ∂f_m/∂x_n ].            (C.54)
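The Jacobian in (C.54) can be approximated column by column with one-sided difference quotients in the spirit of (C.52). The sketch below (an illustration, not part of the notes) does this for a hypothetical map f : R² → R².

    #include <array>
    #include <cmath>
    #include <cstdio>

    using Vec2 = std::array<double, 2>;

    // A sample map f : R^2 -> R^2 (purely illustrative).
    Vec2 f(const Vec2& x) {
        return { x[0] * x[0] + x[1], std::sin(x[0]) * x[1] };
    }

    // Forward-difference approximation of the 2x2 Jacobian Df(x0):
    // column j is (f(x0 + t e_j) - f(x0)) / t for a small t > 0.
    std::array<std::array<double, 2>, 2> jacobian(const Vec2& x0, double t = 1e-6) {
        std::array<std::array<double, 2>, 2> J{};
        Vec2 f0 = f(x0);
        for (int j = 0; j < 2; ++j) {
            Vec2 xp = x0;
            xp[j] += t;
            Vec2 fp = f(xp);
            for (int i = 0; i < 2; ++i)
                J[i][j] = (fp[i] - f0[i]) / t;
        }
        return J;
    }

    int main() {
        auto J = jacobian({1.0, 2.0});
        std::printf("[%8.5f %8.5f]\n[%8.5f %8.5f]\n",
                    J[0][0], J[0][1], J[1][0], J[1][1]);
        // Exact Jacobian at (1,2): [[2, 1], [2*cos(1), sin(1)]].
    }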
Appendix D
Point-set Topology
In the above plots, imagine a population with phenotype 'star' in an evolutionary situation where phenotype 'triangle' would be advantageous or desirable. But phenotype 'triangle' may not be accessible to phenotype 'star' in the vicinity of the population's current location. However, due to the neutral network of 'star,' the population is not stuck, but can drift on that network into far away regions, vastly improving its chances of encountering the neutral network of 'triangle.' Therefore, neutral networks enable phenotypic innovation by permitting the accumulation of neutral mutations.

Exercise D.13. How do we capture and quantify the accessibility of one (favorable) phenotype from another (less favorable) by means of mutations in the sequence space? For any two phenotypes, is there always a directed path from one to the other?

Definition D.14. A phenotype space is a set of RNA shapes on which a topology is defined to quantify proximity of RNA shapes.

Definition D.15. The mutation probability of an RNA shape r to another RNA shape s is defined as

    p_{r,s} := m_{r,s} / m_{r,*},            (D.1)

where m_{r,s} is the number of point mutations that change a sequence in N(r) to a neighboring sequence in N(s) and m_{r,*} is the number of point mutations that change a sequence in N(r) to a neighboring sequence in any other network.

Exercise D.16. Show that the mutation probability cannot be a metric on the phenotype space.

Example D.17 (Bubble sort). To sort the sequence 51428, the first pass of the algorithm goes as follows.

    ( 5 1 4 2 8 ) --> ( 1 5 4 2 8 )
    ( 1 5 4 2 8 ) --> ( 1 4 5 2 8 )
    ( 1 4 5 2 8 ) --> ( 1 4 2 5 8 )
    ( 1 4 2 5 8 ) --> ( 1 4 2 5 8 )

The second pass goes as follows.

    ( 1 4 2 5 8 ) --> ( 1 4 2 5 8 )
    ( 1 4 2 5 8 ) --> ( 1 2 4 5 8 )
    ( 1 2 4 5 8 ) --> ( 1 2 4 5 8 )
    ( 1 2 4 5 8 ) --> ( 1 2 4 5 8 )

Now the array is already sorted, but the algorithm does not know whether it is completed. The algorithm needs one whole pass without any swap to know it is sorted. The third pass goes as follows.

    ( 1 2 4 5 8 ) --> ( 1 2 4 5 8 )
    ( 1 2 4 5 8 ) --> ( 1 2 4 5 8 )
    ( 1 2 4 5 8 ) --> ( 1 2 4 5 8 )
    ( 1 2 4 5 8 ) --> ( 1 2 4 5 8 )

This algorithm is expressed in code as follows; note that the reference parameters of swap already require C++, and the array length n is passed explicitly because sizeof(a)/sizeof(a[0]) does not work on an array parameter that has decayed to a pointer.

    void swap(int& b, int& c){
        int temp = b;
        b = c;
        c = temp;
    }

    // a: array of n elements to be sorted in ascending order.
    void bubble_sort(int a[], int n){
        for (int j=0; j<n-1; j++)
            for (int i=0; i<n-1-j; i++)
                if (a[i] > a[i+1])
                    swap(a[i], a[i+1]);
    }

As a limit of the above implementation, the program does not apply to the data type char, nor to any other data type without an implicit conversion, even if the "less than" binary relation for such a data type is natural. You would have to repeat the above program manually for each data type. An elegant solution is to use a function template in C++ as follows (swap is defined first so that bubble_sort can call it).

    template<typename T>
    void swap(T& b, T& c){
        T temp = b;
        b = c;
        c = temp;
    }

    template<typename T>
    void bubble_sort(T a[], int n)
    {
        for (int i=0; i<n-1; i++)
            for (int j=0; j<n-1-i; j++)
                if (a[j] > a[j+1])
                    swap<T>(a[j], a[j+1]);
    }

D.1.2  Generalizing continuous maps

Definition D.18. A function f : R → R is continuous at a iff

    ∀ε > 0, ∃δ > 0 s.t. |x − a| < δ ⇒ |f(x) − f(a)| < ε.            (D.2)

Definition D.19. A function f : R^n → R^m is continuous at x = a iff

    ∀ε > 0, ∃δ > 0 s.t. f(B(a, δ)) ⊂ B(f(a), ε),            (D.3)

where the n-dimensional open ball B(p, r) is

    B(p, r) = {x ∈ R^n : ||x − p||_2 < r}.            (D.4)

Definition D.20. A function f : X → Y with X ⊂ R^n and Y ⊂ R^m is continuous at x = a iff

    ∀ε > 0, ∃δ > 0 s.t. f(V_a) ⊂ U_a,            (D.5)

where the two sets associated with a are

    V_a := B(a, δ) ∩ X,   U_a := B(f(a), ε) ∩ Y.            (D.6)

Definition D.21. A function f : X → Y is continuous if it is continuous at every point a ∈ X.
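A short usage sketch (not in the notes, and assuming the template bubble_sort and swap above are in scope): with the size passed explicitly, the same template sorts int and char arrays alike.

    #include <cstdio>

    int main() {
        int  a[] = {5, 1, 4, 2, 8};
        char c[] = {'d', 'b', 'a', 'c'};
        bubble_sort(a, 5);
        bubble_sort(c, 4);
        for (int x : a)  std::printf("%d ", x);   // 1 2 4 5 8
        std::printf("\n");
        for (char x : c) std::printf("%c ", x);   // a b c d
        std::printf("\n");
    }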
Example D.22. Is the function x ↦ 1/x continuous? It depends on whether its domain includes the origin. But it is indeed continuous on domains such as (0, 1], R \ {0}, and [1, 2]. Note that the definitions of one-sided continuity in calculus are nicely incorporated in Definition D.20.

Definition D.23. A function f : X → Y with X ⊂ R^n and Y ⊂ R^m is continuous iff

    ∀U_a ∈ γ_Y, ∃V_a ∈ γ_X s.t. f(V_a) ⊂ U_a,            (D.7)

where γ_X and γ_Y are sets of intersections of the open balls to X and Y, respectively,

    γ_X := {B(a, δ) ∩ X : a ∈ X, δ ∈ R^+};
    γ_Y := {B(f(a), ε) ∩ Y : f(a) ∈ Y, ε ∈ R^+}.

Definition D.24. A basis of neighborhoods (or a basis) on a set X is a collection B of subsets of X such that

• covering: ∪B = X, and

• refining: ∀U, V ∈ B, ∀x ∈ U ∩ V, ∃B ∈ B s.t. x ∈ B ⊂ (U ∩ V).

Definition D.25. For two sets X, Y with bases of neighborhoods B_X, B_Y, a surjective function f : X → Y is continuous iff

    ∀U ∈ B_Y, ∃V ∈ B_X s.t. f(V) ⊂ U.            (D.8)

Lemma D.26. If a surjective function f : X → Y is continuous in the sense of Definitions D.20 and D.21, then it is continuous in the sense of Definition D.25.

Proof. By Definition D.24, the following collections are bases of X ⊆ R^m and Y = f(X) ⊆ R^n, respectively,

    B_X = {B(a, δ) ∩ X : a ∈ X, δ > 0};
    B_Y = {B(b, ε) ∩ Y : b ∈ Y, ε > 0}.

The rest follows from Definitions D.25 and D.20.

Example D.27. The right rays

    B_RR = {{x : x > s} : s ∈ R}            (D.9)

form a basis of R.

Exercise D.28. Prove that the set of all right half-intervals in R is a basis of neighborhoods:

    B = {[a, b) : a < b}.            (D.10)

Example D.29. A basis on R² is the set of all quadrants

    B_q = {Q(r, s) : r, s ∈ R},            (D.11)
    Q(r, s) = {(x, y) ∈ R² : x > r, y > s}.            (D.12)

• for (m, n) ∈ R² and d > 0,

    ∀(a, b) ∈ S((m, n), d), ∃r > 0 s.t. S((a, b), r) ⊂ S((m, n), d);

• the set of all open squares in R² is a basis of R²,

    B_s = {S((a, b), d) : (a, b) ∈ R², d > 0}.

Exercise D.31. Show that the closed balls (r > 0)

    B̄(p, r) = {x ∈ R^n : ||x − p||_2 ≤ r}            (D.13)

do not form a basis of R^n. However, the following collection is indeed a basis:

    B_p = {B̄(a, r) : a ∈ R^n, r ≥ 0},            (D.14)

which is the union of all closed balls and all singleton sets.

D.1.3  Open sets: from bases to topologies

Definition D.32. A subset U of X is open (with respect to a given basis of neighborhoods B of X) iff

    ∀x ∈ U, ∃B ∈ B s.t. x ∈ B ⊂ U.            (D.15)

Lemma D.33. Each neighborhood in the basis B is open.

Proof. This follows from B ⊂ B ∈ B and Definition D.32.

Exercise D.34. What are the open subsets of R with respect to the right rays in (D.9)?

Lemma D.35. The intersection of two open sets is open.

Proof. Let U_1 and U_2 be two open sets and fix a point x ∈ U_1 ∩ U_2. By Definition D.32, there exist B_1, B_2 ∈ B such that x ∈ B_1 ⊂ U_1 and x ∈ B_2 ⊂ U_2. Then Definition D.24 implies that there exists B_3 ∈ B such that x ∈ B_3 ⊂ B_1 ∩ B_2 ⊂ U_1 ∩ U_2. The proof is completed by Definition D.32 and x being arbitrary.

Lemma D.36. The union of two open sets is open.

Lemma D.37. The union of any collection of open sets is open.

Definition D.38. The topology of X generated by a basis B is the collection T of all open subsets of X in the sense of Definition D.32.

Definition D.39. The standard topology is the topology generated by the standard Euclidean basis, which is the collection of all open balls in X = R^n.

Theorem D.40. The topology of X generated by a basis satisfies

• ∅, X ∈ T;
Example D.41. The largest basis on a set X is the set of all subsets of X, and the topology it generates is called the discrete topology, which coincides with the basis. This topology is more economically generated by the basis of all singletons,

    B_s(X) = {{x} : x ∈ X}.            (D.17)

The smallest basis on X is simply {X} and the topology it generates is called the trivial/anti-discrete/indiscrete topology T_a = {∅, X}.

Exercise D.42. Show that if U is open with respect to a basis B, then B ∪ {U} is also a basis.

D.1.4  Topological spaces: from topologies to bases

Definition D.43. For an arbitrary set X, a collection T of subsets of X is called a topology on X iff it satisfies the following conditions,

(TPO-1) ∅, X ∈ T;

(TPO-2) α ⊂ T ⇒ ∪α ∈ T;

(TPO-3) U, V ∈ T ⇒ U ∩ V ∈ T.

The pair (X, T) is called a topological space. The elements of T are called open sets.

Corollary D.44. The topology of X generated by a basis B as in Definition D.38 is indeed a topology in the sense of Definition D.43.

Proof. This follows directly from Theorem D.40.

Example D.45. For each n ∈ Z, define

    B(n) = {n}               if n is odd;
           {n − 1, n, n + 1} if n is even.            (D.18)

The topology generated by the basis B = {B(n) : n ∈ Z} is called the digital line topology and we refer to Z with this topology as the digital line.

Theorem D.46. A topology generated by a basis B equals the collection of all unions of elements of B. (In particular, the empty set is the union of "empty collections" of elements of B.)

Proof. Given a collection of elements of B, Lemma D.33 states that each of them belongs to T. Since T is a topology, (TPO-2) implies that all unions of these elements are also in T. Conversely, given an open set U ∈ T, we can choose for each x ∈ U an element B_x ∈ B such that x ∈ B_x ⊂ U. Hence U = ∪_{x∈U} B_x and this completes the proof.

Corollary D.47. Let T be a topology on X generated by the basis B. Then every open set U ∈ T is a union of some basis neighborhoods in B. (In particular, the empty set is the union of "empty collections" of elements of B.)

Lemma D.48. Let (X, T) be a topological space. Suppose a collection of open sets C ⊂ T satisfies

    ∀U ∈ T, ∀x ∈ U, ∃C ∈ C s.t. x ∈ C ⊂ U.            (D.19)

Then C is a basis for T.

Proof. We first show that C is a basis. The covering relation holds trivially by setting U = X in (D.19). As for the refining condition, let x ∈ C_1 ∩ C_2 where C_1, C_2 ∈ C. Since C_1 ∩ C_2 is open, (D.19) implies that there exists C_3 ∈ C such that x ∈ C_3 ⊂ C_1 ∩ C_2. Hence C is a basis by Definition D.24.
Then we show that the topology T' generated by C equals T. On one hand, for any U ∈ T and any x ∈ U, by (D.19) there exists C ∈ C such that x ∈ C ⊂ U. By Definitions D.32 and D.38, we have U ∈ T'. On the other hand, it follows from Corollary D.47 that any W ∈ T' is a union of elements of C. Since each element of C is in T, we have W ∈ T.

Example D.49. The following countable collection

    B = {(a, b) : a < b, a and b are rational}            (D.20)

is a basis that generates the standard topology on R.

Lemma D.50. A collection of subsets of X is a topology on X if and only if it generates itself.

Proof. The necessity holds trivially since (TPO-1) implies the covering condition and (TPO-3) implies the refining condition. As for the sufficiency, suppose U, V ∈ T. By Definition D.32, U ∪ V is also open, hence U ∪ V ∈ T. This argument holds for the union of an arbitrary number of open sets.

D.1.5  Generalized continuous maps

Definition D.51. The preimage of a set U ⊂ Y (or the fiber over U) under f : X → Y is

    f^{−1}(U) := {x ∈ X : f(x) ∈ U}.            (D.21)

Exercise D.52. Show that the operation f^{−1} preserves inclusions, unions, intersections, and differences of sets:

    B_0 ⊆ B_1 ⇒ f^{−1}(B_0) ⊆ f^{−1}(B_1),
    f^{−1}(B_0 ∪ B_1) = f^{−1}(B_0) ∪ f^{−1}(B_1),
    f^{−1}(B_0 ∩ B_1) = f^{−1}(B_0) ∩ f^{−1}(B_1),            (D.22)
    f^{−1}(B_0 \ B_1) = f^{−1}(B_0) \ f^{−1}(B_1).

In comparison, f only preserves inclusions and unions:

    A_0 ⊆ A_1 ⇒ f(A_0) ⊆ f(A_1),
    f(A_0 ∪ A_1) = f(A_0) ∪ f(A_1),
    f(A_0 ∩ A_1) ⊆ f(A_0) ∩ f(A_1),            (D.23)
    f(A_0 \ A_1) ⊇ f(A_0) \ f(A_1),

where the equalities in the last two equations hold if f is injective.

Lemma D.53. For a map f : X → Y, A ⊆ X, and B ⊆ Y, we have

    A ⊆ f^{−1}(f(A)),   f(f^{−1}(B)) ⊆ B,            (D.24)

where the first inclusion is an equality if f is injective and the second is an equality if f is surjective or B ⊆ f(X).
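The identities in (D.22) can be checked mechanically on small finite sets. The sketch below (illustrative only, not part of the notes) verifies f^{-1}(B_0 ∩ B_1) = f^{-1}(B_0) ∩ f^{-1}(B_1) for a non-injective map on a tiny domain.

    #include <set>
    #include <cstdio>

    using Set = std::set<int>;

    // Preimage of B under f restricted to a finite domain X.
    template <typename F>
    Set preimage(const Set& X, F f, const Set& B) {
        Set P;
        for (int x : X)
            if (B.count(f(x))) P.insert(x);
        return P;
    }

    Set intersect(const Set& A, const Set& B) {
        Set I;
        for (int a : A)
            if (B.count(a)) I.insert(a);
        return I;
    }

    int main() {
        Set X = {-2, -1, 0, 1, 2};
        auto f = [](int x) { return x * x; };   // not injective
        Set B0 = {0, 1}, B1 = {1, 4};
        Set lhs = preimage(X, f, intersect(B0, B1));                  // f^{-1}(B0 ∩ B1)
        Set rhs = intersect(preimage(X, f, B0), preimage(X, f, B1));  // f^{-1}(B0) ∩ f^{-1}(B1)
        std::printf("identity holds: %s\n", lhs == rhs ? "yes" : "no");
    }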
    ∀U ∈ B_Y, ∀x ∈ X satisfying f(x) ∈ U, ∃V ∈ B_X satisfying x ∈ V s.t. f(V) ⊂ U.

Example D.58. A continuous function is not necessarily "well behaved," as exemplified by the space-filling Hilbert curve.
    f^{−1}(U) = f^{−1}(Y \ (Y \ U)) = X \ f^{−1}(Y \ U).

Exercise D.67. Change the threshold value in Example from 1/7 to 1/10 and repeat the entire process. What is the minimum value of q such that T_q on GC10 becomes the discrete topology?

D.1.8  Closed sets

Definition D.68. A subset of X is called closed if its complement is open.

Example D.69. The set

    K = {1/n : n ∈ Z^+}            (D.28)

is neither open nor closed. In comparison, K ∪ {0} is closed.

Theorem D.70. The set σ of all closed subsets of X satisfies the following conditions:

(TPC-1) ∅, X ∈ σ;

(TPC-2) α ⊂ σ ⇒ ∩α ∈ σ;

(TPC-3) U, V ∈ σ ⇒ U ∪ V ∈ σ.

Example D.71. The following example shows that infinite intersections of open sets might not be open and infinite unions of closed sets might not be closed:

    ∩ {(−1/n, 1/n) : n = 1, 2, . . .} = {0};
    ∪ {[−1 + 1/n, 1 − 1/n] : n = 1, 2, . . .} = (−1, 1).

D.1.9  Interior–Frontier–Exterior

Definition D.77. A point x ∈ X is an interior point of A if there is a neighborhood W of x that lies entirely in A. The set of interior points of a set U is called its interior and denoted by Int(U).

Lemma D.78. Int(A) is open for any A.

Proof. Exercise.

Example D.79. The interior of a closed ball is the corresponding open ball.

Definition D.80. A point x ∈ X is an exterior point of A if there is a neighborhood W of x that lies entirely in X \ A. The set of exterior points of a set U is called its exterior and denoted by Ext(U).

Example D.81. The exterior of the set K in (D.28) is R \ K \ {0}. Why not 0?

Definition D.82. A point x is a closure point of A if each neighborhood of x contains some point in A.

Example D.83. Any point in the set K in (D.28) is a closure point of K, so is 0.

Definition D.84. A point x is an accumulation point (or a limit point) of A if each neighborhood of x contains some point p ∈ A with p ≠ x.

Example D.85. The only accumulation point of the set K in (D.28) is 0.
Example D.86. Each point in R is an accumulation point of Q.

Definition D.87. A point x in a set A is isolated if there exists a neighborhood of x such that x is the only point of A in this neighborhood.

Example D.88. Every point of the set K in (D.28) is isolated.

Definition D.89. A point x is a frontier point of a set A iff it is a closure point for both A and its complement. The set of all frontier points is called the frontier Fr(A) of A.

Theorem D.90. For any set A in X, its interior, its frontier, and its exterior form a partition of X.

Proof. Consider an arbitrary point a ∈ X. If there exists a neighborhood N_a of a such that N_a ⊂ A, then Definition D.77 implies a ∈ Int(A). If N_a ⊂ X \ A, then Definition D.80 implies a ∈ Ext(A). Otherwise, for all neighborhoods of a we have N_a ⊄ A and N_a ⊄ X \ A, which implies that any N_a contains points both from A and X \ A. The rest follows from Definition D.89.

Exercise D.95. Prove Cl(A ∩ B) ⊂ Cl(A) ∩ Cl(B). What if we have infinitely many sets?

Theorem D.96. The interior of a set A is the largest open set contained in A,

    Int(A) = ∪{U : U ⊂ A, U is open in X}.            (D.30)

Theorem D.97. Let A' be the set of accumulation points of A. Then Cl(A) = A ∪ A'.

Proof. Suppose x ∈ Cl(A). If x ∈ A, then x ∈ A ∪ A' trivially holds. Otherwise x ∉ A, and Definition D.91 dictates that each of its neighborhoods must contain at least one point in A. Hence Definition D.84 yields x ∈ A'. In both cases we have x ∈ A ∪ A'.
Conversely, suppose x ∈ A ∪ A'. If x ∈ A, Lemma D.92 implies x ∈ Cl(A). If x ∈ A', x is an accumulation point of A and is thus a closure point of A.

Corollary D.98. A subset of a topological space is closed if and only if it contains all of its accumulation points.

Proof. Suppose A is a superset of A', the set of all accumulation points of A. We have A ∪ A' = A = Cl(A) from Theorem D.97. Definition D.91 implies that A is closed.
Suppose A is closed, but there is an accumulation point x of A such that x ∉ A. By Definition D.84, in any neighborhood of x there exists a point p ∈ A such that p ≠ x; this contradicts the complement of A being open.

D.1.10  Hausdorff spaces

Definition D.99. Suppose X is a set with a basis of neighborhoods γ. Let {x_n : n = 1, 2, . . .} be a sequence of elements of X and a ∈ X. Then we say the sequence converges to a, written

    lim_{n→∞} x_n = a,  or  x_n → a as n → ∞,

iff every basis neighborhood of a contains all but finitely many terms of the sequence:

    ∀N ∈ γ with a ∈ N, ∃K ∈ N s.t. ∀n ≥ K, x_n ∈ N.

Exercise D.104. For the metric topology, show that a function f : X → Y is continuous if and only if the function commutes with limits for any convergent sequence in X.

Example D.105. When do we have x_n → a for the discrete topology?

Example D.106. When do we have x_n → a for the anti-discrete topology?

Definition D.107. A topological space (X, T) is called a Hausdorff space iff

    ∀a, b ∈ X, a ≠ b, ∃U, V ∈ T s.t. a ∈ U, b ∈ V, U ∩ V = ∅.            (D.32)

Lemma D.108. Every subset of finite points in a Hausdorff space is closed.
Proof. By (TPC-3) in Theorem D.70, it suffices to show that every singleton set is closed. Consider X \ {x_0}. For any x ≠ x_0, Definition D.107 states that there exist open sets U ∋ x and V ∋ x_0 such that U ∩ V = ∅; hence x_0 ∉ U and U ⊂ X \ {x_0}. Therefore X \ {x_0} is open.

Exercise D.109. Does there exist a topological space X that is not Hausdorff but in which every finite point set is closed?

Definition D.110. A topological space is called a T1 space iff every finite subset is closed in it.

Theorem D.111. Let X be a T1 space and A a subset of X. A point x is an accumulation point of A if and only if every neighborhood of x intersects with infinitely many points of A.

Proof. The sufficiency follows directly from Definition D.84. As for the necessity, suppose there exists a neighborhood U of x such that (A \ {x}) ∩ U = {x_1, x_2, . . . , x_m}. Then by Definition D.110 we know

    U ∩ (X \ {x_1, x_2, . . . , x_m}) = U ∩ (X \ (A \ {x}))

is an open set containing x, yet it does not contain any points in A other than x. This contradicts the condition of x being an accumulation point of A.

Theorem D.112. A sequence of points in a Hausdorff space X converges to at most one point in X.

Proof. By Definition D.99, a convergence to two points in X would be a contradiction to Definition D.107.

Proof. The covering condition for A holds because the covering condition of X holds. As for the refining condition, for any U, V ∈ γ_A and any x ∈ U ∩ V, there exist U', V' ∈ γ_X such that U = U' ∩ A, V = V' ∩ A, and W' ⊂ U' ∩ V' for some W' ∈ γ_X. Setting W := W' ∩ A, we have

    x ∈ W ⊂ (U' ∩ V') ∩ A = (U' ∩ A) ∩ (V' ∩ A) = U ∩ V,

which completes the proof.

Definition D.114. The topology generated by γ_A in (D.33) is called the relative topology or subspace topology on A generated by the basis γ_X of X.

Lemma D.115. Consider a subset A of a topological space X. Suppose T_X is a topology on X. Then

    T_A := {W ∩ A : W ∈ T_X}            (D.34)

is a topology on A.

Proof. For (TPO-1), we choose W = ∅, A. For (TPO-2),

    ∪_{W∈α} (W ∩ A) = (∪_{W∈α} W) ∩ A,

where ∪_{W∈α} W is a subset of X. For (TPO-3),

    (U ∩ A) ∩ (V ∩ A) = (U ∩ V) ∩ A,

where U ∩ V is a subset of X.

Definition D.116 (Subspace and subspace topology). Given a topological space (X, T) and a subset A ⊂ X, the topological space (A, T_A) is called a subspace of X and the topology T_A in (D.34) is called the subspace topology or relative topology induced by X.

Theorem D.117. Let γ_X be a basis that generates the topology T_X on a topological space X. Then the subspace topology on A induced by T_X is equivalent to the subspace topology generated by γ_X. In other words, T_A is generated by γ_A.

        γ_X  --(open)-->  T_X
         |                 |
        ∩A                ∩A
         |                 |
        γ_A  --(open)-->  T_A

Proof. We first show that U is open with respect to (w.r.t.) γ_A for any given U ∈ T_A. By Lemma D.115, there exists U' ∈ T_X such that U = U' ∩ A. The condition of γ_X being a basis of X yields

    ∀y ∈ U', ∃B' ∈ γ_X s.t. y ∈ B' ⊂ U',

hence each y ∈ U lies in B' ∩ A ∈ γ_A with B' ∩ A ⊂ U, i.e., U is open w.r.t. γ_A. Conversely, suppose U is open w.r.t. γ_A; then each x ∈ U has a neighborhood N_x ∈ γ_A with x ∈ N_x ⊂ U, where N_x = N'_x ∩ A for some N'_x ∈ γ_X. We then choose

    U' := ∪_{x∈U} N'_x.

Theorem D.46 implies that U' is open and U = U' ∩ A.

Lemma D.118. Let A be a subspace of X. If U is open in A and A is open in X, then U is open in X.

Proof. Since U is open in A, Definition D.116 yields

    ∃U' open in X s.t. U = U' ∩ A;

the rest of the proof follows from A being open in X.

Lemma D.119 (Closedness in a subspace). Let A be a subspace of X. Then a set V ⊂ A is closed in A if and only if it equals the intersection of A with a closed subset of X.
Proof. Suppose V is closed in A. Then

    ∃V' ⊂ A s.t. V ∪ V' = A, V' ∈ T_A.

Since A is a subspace of X, we have from Definition D.116 . . .

Theorem D.125 (Restricting the domain). Any restriction of a continuous function is continuous.

Proof. For any open set U in Y, we have i_A^{−1}(U) = U ∩ A. The rest follows from the relative topology.
Exercise D.130. Show that Lemma D.129 fails if A and B are not closed.

Exercise D.131. Formulate the pasting lemma in terms of open sets and prove it.

Exercise D.132. What is the counterpart of the pasting lemma in complex analysis?

Definition D.133 (Expanding the domain). For A ⊂ X and a given function f : A → Y, a function F : X → Y is called an extension of f if F|_A = f.

Definition D.134. A function f : X → Y between topological spaces X and Y is called a homeomorphism iff f is bijective and both f and f^{−1} are continuous. Then X and Y are said to be homeomorphic or topologically equivalent, written X ≈ Y.

Lemma D.135. If two spaces X and Y are homeomorphic, then

    ∀a ∈ X, ∃b ∈ Y s.t. X \ {a} ≈ Y \ {b}.            (D.39)

Exercise D.136. Show that the function f : {A, B} → {C} given by f(A) = f(B) = C is continuous, but not a homeomorphism. Hence a necessary condition for homeomorphism is the number of connected components.

Example D.137. Consider X the letter "T" and Y a line segment. They are not homeomorphic because removing the junction point in T would result in three pieces while removing any point in the line segment yields at most two connected components.

Exercise D.138. Classify the following symbols of the standard computer keyboard by considering them as 1-dimensional topological spaces.

    ` 1 2 3 4 5 6 7 8 9 0 - =
    q w e r t y u i o p [ ] \
    a s d f g h j k l ; '
    z x c v b n m , . /
    ~ ! @ # $ % ^ & * ( ) _ +
    Q W E R T Y U I O P { } |
    A S D F G H J K L : "
    Z X C V B N M < > ?

Exercise D.139. Consider the identity function f = I_X : (X, T) → (X, κ) where κ is the anti-discrete topology and T is not. Show that f^{−1} is not continuous and hence f is not a homeomorphism.

Exercise D.140. Give an example of a continuous bijection f : X → Y that isn't a homeomorphism; this time both X and Y are subspaces of R².

Lemma D.142. All closed intervals of a non-zero, finite length are homeomorphic.

Lemma D.143. All open intervals, including infinite ones, are homeomorphic.

Proof. The tangent function gives a homeomorphism between (−π/2, π/2) and (−∞, +∞).

Lemma D.144. An open interval is not homeomorphic to a closed interval (nor to a half-open one).

Definition D.145. The n-sphere is the subset S^n = {x ∈ R^{n+1} : ||x||_2 = 1} of R^{n+1}. Its north pole is denoted by N = (0, 0, · · · , 0, 1).

Definition D.146. The stereographic projection P : S^n \ N → R^n is given by

    P(x) := ( x_1/(1 − x_{n+1}), x_2/(1 − x_{n+1}), . . . , x_n/(1 − x_{n+1}) ).            (D.41)

Lemma D.147. The stereographic projection is a homeomorphism with its inverse as

    P^{−1}(y) = 1/(1 + ||y||²) ( 2y_1, 2y_2, . . . , 2y_n, ||y||² − 1 ).            (D.42)

Exercise D.148. Show that the 2-sphere and the hollow cube are homeomorphic by using the radial projection f,

    f(x) = x/||x||.            (D.43)

Theorem D.149. Homeomorphisms form an equivalence relation on the set of all topological spaces.

Proof. For a homeomorphism f : (X, T_X) → (Y, T_Y), we can define a function f_T : T_X → T_Y by setting f_T(V) := f(V). It is easy to show that f_T is also a bijection from Definition D.134.

Definition D.150. An embedding of X in Y is a function f : X → Y that maps X homeomorphically to the subspace f(X) in Y.

Example D.151. For an embedding f : [0, 1] → X, its image is called an arc in X. For an embedding f : S¹ → X, its image is called a simple closed curve in X.

D.3  A zoo of topologies

D.3.1  Hierarchy of topologies

Definition D.152. Suppose that T and T' are two topologies on a given set X. If T' ⊃ T, we say that T' is finer than T.
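To make (D.41) and (D.42) concrete, the sketch below (not part of the notes) applies the stereographic projection and its inverse for n = 2 and checks that P^{-1}(P(x)) returns the original point of S² \ {N}.

    #include <array>
    #include <cmath>
    #include <cstdio>

    using Vec3 = std::array<double, 3>;
    using Vec2 = std::array<double, 2>;

    // Stereographic projection P : S^2 \ {N} -> R^2, formula (D.41) with n = 2.
    Vec2 P(const Vec3& x) {
        double d = 1.0 - x[2];
        return { x[0] / d, x[1] / d };
    }

    // Inverse of the stereographic projection, formula (D.42) with n = 2.
    Vec3 Pinv(const Vec2& y) {
        double r2 = y[0] * y[0] + y[1] * y[1];
        double s = 1.0 / (1.0 + r2);
        return { 2.0 * y[0] * s, 2.0 * y[1] * s, (r2 - 1.0) * s };
    }

    int main() {
        // A point on the unit sphere (not the north pole).
        Vec3 x = { std::sin(1.0) * std::cos(0.5),
                   std::sin(1.0) * std::sin(0.5),
                   std::cos(1.0) };
        Vec3 x2 = Pinv(P(x));
        std::printf("round-trip error: %.3e\n",
                    std::fabs(x[0] - x2[0]) + std::fabs(x[1] - x2[1]) + std::fabs(x[2] - x2[2]));
    }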
Lemma D.153. Let B and B' be bases for the topologies T and T', respectively, on X. T' is finer than T if and only if

    ∀x ∈ X, ∀B ∈ B with x ∈ B, ∃B' ∈ B' s.t. x ∈ B' ⊂ B.            (D.44)

Proof. The sufficiency U ∈ T ⇒ U ∈ T' follows directly from (D.44) and Definition D.152.
As for the necessity, we start with given x ∈ X and B ∈ B with x ∈ B. By Lemma D.33, B is open, i.e. B ∈ T. Then by hypothesis B ∈ T'. Definition D.32 implies that there exists B' ∈ B' such that x ∈ B' ⊂ B, which completes the proof.

Exercise D.154. The bounded complements of all non-degenerate Jordan curves form a basis of neighborhoods. Is the topology generated by this basis finer than that generated by the open balls?

Definition D.155. The finite complement topology on X is

    T = {U ⊂ X : U = ∅ or X \ U is finite}.            (D.45)

The countable complement topology on X is

Proof. For any x ∈ (a, b), we can always find [x, b) ∈ T_ℓ and (a, b) ∈ T_K such that x ∈ [x, b) ⊂ (a, b) and x ∈ (a, b) ⊂ (a, b). On the other hand, for any x ∈ R and any neighborhood [x, b) ∈ R_ℓ, no open interval in the standard topology simultaneously contains x and is a subset of [x, b). Similarly, for 0 ∈ R and B_K := (−1, 1) \ K ⊃ {0}, no open interval simultaneously contains 0 and is a subset of B_K. Hence R_ℓ and R_K are strictly finer than the standard topology on R.
To show that R_ℓ and R_K are not comparable, it suffices to give two examples. For any x ∈ K ⊂ R and any neighborhood [x, b) ∈ T_ℓ, no open set in T_K simultaneously contains x and is a subset of [x, b). Conversely, for 0 ∈ R and the above B_K, no interval [a, b) ∈ T_ℓ simultaneously contains 0 and is a subset of B_K.

Exercise D.163. The topologies on R² generated by the open balls and the open squares are the same topology.

Exercise D.164. Show that the collection

    C = {[a, b) : a < b, a and b are rational}            (D.49)

is a basis that generates a topology T_Q different from the lower limit topology T_ℓ on R. Compare this to Example D.49.
Exercise D.172. Show that Definition D.165 does not generalize to posets.

Definition D.173. Let X be an ordered set and a ∈ X. The rays determined by a are the four subsets of X:

    (a, +∞) := {x : x > a};            (D.51a)
    (−∞, a) := {x : x < a};            (D.51b)
    [a, +∞) := {x : x ≥ a};            (D.51c)
    (−∞, a] := {x : x ≤ a}.            (D.51d)

The first two are open rays while the last two are closed rays.

Exercise D.174. Show that the open rays form a subbasis for the order topology on X.

Definition D.175. The set [0, 1] × [0, 1] in the dictionary order topology is called the ordered square, denoted by I_o².

D.3.3  The product topology

Definition D.176. Let X and Y be topological spaces. The product topology on X × Y is the topology generated by the basis

    γ̄_{X×Y} := {B_1 × B_2 : B_1 ∈ T_X, B_2 ∈ T_Y},            (D.52)

where T_X and T_Y are topologies on X and Y, respectively.

Exercise D.177. Check that γ̄_{X×Y} in (D.52) is indeed a basis.

Exercise D.178. Give an example that γ̄_{X×Y} is not a topology.

Exercise D.179. The product of two Hausdorff spaces X and Y is Hausdorff.

Theorem D.180. Let X and Y be topological spaces with bases γ_X and γ_Y, respectively. Then the set

    γ_{X×Y} := {B_1 × B_2 : B_1 ∈ γ_X, B_2 ∈ γ_Y}            (D.53)

is a basis for the topology of X × Y.

Proof. Both the covering and refining conditions hold trivially.

Definition D.181. For topological spaces X and Y, the functions π_1 : X × Y → X and π_2 : X × Y → Y given by

    π_1(x, y) = x,   π_2(x, y) = y            (D.54)

are called the projections of X × Y onto its first and second factors, respectively.

Lemma D.182. The product topology on X × Y is the same as the topology generated by the subbasis

    S := {π_1^{−1}(U) : U ∈ T_X} ∪ {π_2^{−1}(V) : V ∈ T_Y}.            (D.55)

Proof. Let T denote the product topology in Definition D.176 and T' denote the topology generated by the subbasis (D.55). Every element in S belongs to T, so do any unions of finite intersections of elements of S. Hence Definition D.59 yields T' ⊂ T.
Conversely, each element in the basis of T is an intersection of elements in S,

    B_1 × B_2 = π_1^{−1}(B_1) ∩ π_2^{−1}(B_2),

hence B_1 × B_2 ∈ T' and thus T ⊂ T'.

Corollary D.183. The projections in Definition D.181 are continuous (with respect to the product topology).

Proof. Consider π_1 : X × Y → X. For each open set U ∈ T_X, Lemma D.182 and Definition D.61 imply that its preimage under π_1 is open in the product topology.

Theorem D.184 (Product of maps). Given f_1 : A → X and f_2 : A → Y, the map f : A → X × Y with

    f(a) := (f_1(a), f_2(a))            (D.56)

is continuous if and only if both f_1 and f_2 are continuous.

Proof. Write f_1 = π_1 ◦ f and f_2 = π_2 ◦ f. The necessity follows from Corollary D.183 and Theorem D.121.
As for the sufficiency, we need to show that the preimage f^{−1}(U × V) of any basis element U × V is open. By Definition D.176, U × V ∈ B_{X×Y} implies that U ∈ T_X and V ∈ T_Y. By Definition D.51, any point a ∈ f^{−1}(U × V) if and only if f(a) ∈ U × V, which, by (D.56), is equivalent to f_1(a) ∈ U and f_2(a) ∈ V. Hence, we have

    f^{−1}(U × V) = f_1^{−1}(U) ∩ f_2^{−1}(V).

The rest of the proof follows from the conditions of both f_1 and f_2 being continuous.

Example D.185. A parametrized curve γ(t) = (x(t), y(t)) is continuous if and only if both x and y are continuous.

D.3.4  The metric topology

Definition D.186. For a metric d on X in Definition C.74, the number d(x, y) is called the distance between x and y.

Definition D.187. In a metric space (X, d), an open ball B_r(x) centered at x ∈ X with radius r is the subset

    B_r(x) := {y ∈ X : d(x, y) < r}.            (D.57)

Lemma D.188. If d is a metric on X, then the collection of all open balls is a basis on X.

Definition D.189. The topology on X generated by the basis of all open balls in Definition D.187 is called the metric topology induced by the metric d.

Lemma D.190. A set U is open in the metric topology induced by d if and only if

    ∀x ∈ U, ∃r > 0 s.t. B_r(x) ⊂ U.

Definition D.191. A topological space X is said to be metrizable if there exists a metric d on X that induces the topology of X. A metric space is a metrizable topological space together with a specific metric d that gives the topology of X.
Definition D.192. A point x in a normed space X is an interior point of A if there is an open ball B_r(x) that lies entirely in A. The set of interior points of a set U is called its interior and denoted by Int(U).

Definition D.193. A point x in a normed space X is an exterior point of A if there is an open ball B_r(x) that lies entirely in X \ A. The set of exterior points of a set U is called its exterior and denoted by Ext(U).

Definition D.194. For metric spaces (X, d_1) and (Y, d_2), a function f : X → Y is continuous iff

    ∀ε > 0, ∀x ∈ X, ∃δ > 0 s.t. ∀y ∈ X, d_1(x, y) < δ ⇒ d_2(f(x), f(y)) < ε.            (D.58)

Definition D.195. For metric spaces (X, d_1) and (Y, d_2), a function f : X → Y is uniformly continuous iff

    ∀ε > 0, ∃δ > 0 s.t. ∀x, y ∈ X, d_1(x, y) < δ ⇒ d_2(f(x), f(y)) < ε.            (D.59)

D.4  Connectedness

Definition D.196. Let X be a topological space. A separation of X is a pair U, V of disjoint nonempty open subsets of X whose union is X. A topological space is connected if there does not exist a separation of X.

Exercise D.197. Why do we define the separation as a pair of disjoint open sets? Can we define separation using closed sets?

Example D.198. A space X with the indiscrete topology is connected, since there exists no separation of X.

Lemma D.199. For a subspace Y of X, a separation of Y is a pair of disjoint nonempty sets A and B such that A ∪ B = Y and neither of them contains a limit point of the other. The space Y is connected if there exists no separation of Y.

Proof. Suppose first that A and B form a separation of Y. By Definition D.196, A and B are both open in Y. Furthermore, A is also closed since its complement B is open in Y. Thus Ā ∩ Y = A and hence Ā ∩ B = ∅; by symmetry, B̄ ∩ A = ∅.
Conversely, suppose A ∪ B = Y, Ā ∩ B = ∅, and B̄ ∩ A = ∅. Then we have

    Ā ∩ Y = Ā ∩ (A ∪ B) = (Ā ∩ A) ∪ (Ā ∩ B) = A.

Similarly, B̄ ∩ Y = B. Both A and B are closed in Y, hence they are both open in Y and form a separation of Y.

Example D.200. Let Y = [−1, 1] be a subspace of X = R. The sets [−1, 0] and (0, 1] are disjoint and nonempty, but they do not form a separation of Y because [−1, 0] is not open in Y. Alternatively, one can use Lemma D.199 to say that [−1, 0] contains a limit point 0 of the other set (0, 1].

Example D.201. Let Y = [−1, 0) ∪ (0, 1]. Each of the sets [−1, 0) and (0, 1] is nonempty and open in Y; therefore, they form a separation of Y. Again, an alternative argument utilizes Lemma D.199.

Example D.202. The rationals Q are not connected. The only connected subspaces are the one-point spaces: for Y = {p, q} ⊂ Q with p < q, choose an irrational number a ∈ (p, q) and write

    Y = (Y ∩ (−∞, a)) ∪ (Y ∩ (a, +∞)).

According to Definition D.196, this separation implies that Y is not connected.

Theorem D.203. Connectedness is preserved by continuous functions; i.e., the image of a connected space under a continuous map is connected.

Proof. Let X be a connected space and f : X → Y a continuous function. We show that the image space Z := f(X) is connected. Suppose Z is not connected. Then there exist disjoint nonempty open sets U, V such that Z = U ∪ V. By Definition D.54, f^{−1}(U) and f^{−1}(V) are disjoint open sets and X = f^{−1}(U) ∪ f^{−1}(V), which contradicts the condition of X being connected.

Theorem D.204 (Intermediate value theorem (generalized)). Let f : X → Y be a continuous function where X is a connected space and Y is an ordered set in the order topology. If a and b are two points of X and if r is a point of Y lying between f(a) and f(b), then there exists a point c of X such that f(c) = r.

Definition D.205. A path in a topological space X is a continuous map f : I → X, where I := [0, 1], and x_0 := f(0) and x_1 := f(1) are called its initial point and final point, respectively.

Definition D.206. A space X is path-connected if, for every x_0, x_1 ∈ X, there exists a path from x_0 to x_1.

Exercise D.207. Prove that if [a, b] is path-connected, so are (a, b) and [a, b).

Theorem D.208. Path-connectedness is preserved by continuous functions; i.e., the image of a path-connected space under a continuous function is path-connected.

Proof. Let X be a path-connected space and f : X → Y a continuous function. We show that the image space Z := f(X) is path-connected. Any C, D ∈ Z have nonempty preimages A = f^{−1}({C}) ⊂ X and B = f^{−1}({D}) ⊂ X. The path-connectedness of X implies that there exists a continuous function q : [0, 1] → X such that q(0) ∈ A and q(1) ∈ B. By Theorem D.121, the composition p = f ◦ q is continuous, p(0) = f(q(0)) = C, and p(1) = f(q(1)) = D. Hence Z is path-connected by Definition D.206.

Lemma D.209. Every path-connected space is connected.

Proof. Suppose a topological space X is not connected but path-connected. Then there exists a separation U, V of X such that X = U ∪ V. Consider an arbitrary path f : [0, 1] → X. Since f([0, 1]) is a continuous image of a connected set, we know from Theorem D.203 that f([0, 1]) is connected, hence it must lie entirely in either U or V. Consequently, there is no path in X joining a point of U to a point of V, contradicting the condition of X being path-connected.
Exercise D.210. A connected space is not necessarily path-connected, c.f. the topologist's sine curve. The space

    S = {(x, sin(1/x)) : x ∈ (0, 1]}            (D.60)

is connected because it is the image of the connected space (0, 1] under a continuous map. Hence the closure of S,

    S̄ = S ∪ {(0, y) : y ∈ [−1, 1]},            (D.61)

is also connected in R². But S̄ is not path-connected. Can you prove it?

Exercise D.211. Deduce Theorem C.39 from Theorem D.208.

Theorem D.212 (Fixed points in one dimension). Every continuous function f : [−1, 1] → [−1, 1] has a fixed point.

Proof. If f(−1) = −1 or f(1) = 1, we are done; otherwise we have f(−1) = a > −1 and f(1) = b < 1. Hence none of the following two disjoint sets is empty,

    A := {(x, f(x)) : f(x) > x},   B := {(x, f(x)) : f(x) < x}.

By Theorems D.184 and D.203, the graph of f is connected; if f had no fixed point, A and B would form a separation of the graph, a contradiction.

Exercise D.213. Prove Theorem D.212 via connectedness.

Definition D.214. The equivalence classes resulting from connectedness and path-connectedness are called components and path components, respectively.

Example D.215. The topologist's sine curve S̄ in Exercise D.210 has only one component, but has two path components: S and V := S̄ \ S. Note that S is open in S̄ but not closed, while V is closed in S̄ but not open.
If one forms a space from S̄ by deleting all points of V having rational second coordinate, one obtains a space that has only one component but uncountably many path components.

Definition D.216. A space X is called locally connected at x iff for every neighborhood U of x, there exists a connected neighborhood V of x contained in U. X is locally connected iff it is locally connected at each of its points.

Example D.217. Q is neither connected nor locally connected; the subspace [−1, 0) ∪ (0, +1] is not connected but locally connected; the topologist's sine curve is connected but not locally connected; each interval and each ray in the real line is both connected and locally connected.

Definition D.218. A space X is called locally path-connected at x iff for every neighborhood U of x, there exists a path-connected neighborhood V of x contained in U. X is locally path-connected iff it is locally path-connected at each of its points.

Theorem D.219. A space X is locally connected if and only if for every open set U of X, each component of U is open in X.

Theorem D.220. A space X is locally path-connected if and only if for every open set U of X, each path component of U is open in X.

Theorem D.221. Each path component of a topological space X lies in a component of X. If X is locally path-connected, then the components and the path components of X are the same.

Proof. The first statement follows from Lemma D.209. Let C be a component of X and P a path component of X with a point x ∈ P ∩ C; then P ⊂ C. Suppose P ≠ C. Let Q be the union of all other path components of X that intersect C; each of them thus lies in C. Hence we have C = P ∪ Q. By Theorem D.220 and the local path-connectedness of X, each path component of X must be open in X. Thus P and Q constitute a separation of C, contradicting the connectedness of C.

Definition D.223. A collection α of subsets of a topological space X is said to cover X, or to be a covering of X, if the union of all elements of α equals X; it is an open covering of X if each element of α is an open subset of X.

Definition D.224. An (open) cover of a subset X in a topological space Y is a collection α of (open) subsets in Y such that X ⊂ ∪α. A subcover of X is a subcollection of a cover that also covers X.

Example D.225. Consider K in (D.28) and X = K ∪ {0}. An open cover of K in R is {U_n : n ∈ N^+} where

    U_n = (1/n − ε_n, 1/n + ε_n),   ε_n := 1/(n(n + 1));

elements of this open cover are pairwise disjoint for all n > 1. An open cover of X in R is {U_n : n ∈ N^+} ∪ {(−ε, ε)} with ε := 1/N for some N ∈ N^+.

Example D.226. Consider K in Example D.225 as a space with the relative topology induced from R. Each singleton set

    s_n := {1/n}

is open in K since s_n = U_n ∩ K and U_n is open in R. Hence {s_n : n ∈ N^+} is an infinite open cover of K.
Exercise D.227. Consider X in Example D.225 as a space with the relative topology induced from R. Is the collection

    {{0}} ∪ {s_n : n ∈ N^+}

an open cover of X? If not, can you find an infinite open cover of X whose elements are pairwise disjoint for sufficiently large n? If not, can you give a finite open cover of X?

Exercise D.228. What is the crucial difference between K and X in the space R in terms of covers and subcovers?

Proof. For any open cover U of X, there exists an element of U containing all but finitely many of the points 1/n. Hence, we have a finite subcover in U for X. This is not true for K.

Definition D.229. A compact topological space is a topological space X where every open cover of X has a finite subcover.

Lemma D.230. A subspace Y of a topological space X is compact if and only if every open cover of Y contains a finite subcover of Y.

Lemma D.231. If X is a compact subset of a space Y, then X is compact in the relative topology.

Theorem D.232 (Bolzano-Weierstrass). In a compact space, every infinite subset has an accumulation point.

Proof. Suppose there exists an infinite subset A that does not have an accumulation point. Then we can construct an open cover of X,

    α = {U_x : x ∈ X},

such that there is at most one element of A in each element of α. By compactness, α contains a finite subcover α_0 that covers X. However, since each element of the finite set α_0 contains at most one element of A and α_0 covers A, A must be finite, which contradicts the condition of A being infinite.

Corollary D.233. In a compact space, every sequence has a convergent subsequence.

Definition D.234. A topological space is said to be locally compact at x iff there is some compact subspace C of X that contains a neighborhood of x; it is locally compact iff it is locally compact at each of its points.

Example D.235. The real line R is not compact, but it is locally compact. The subspace Q is not locally compact.

Theorem D.236. A topological space X is locally compact Hausdorff if and only if there exists a compact Hausdorff space Y such that X is a subspace of Y and Y \ X consists of a single point.

Proof. Munkres p. 183.

Definition D.237. If Y is a compact Hausdorff space and X is a proper subspace of Y such that Cl(X) = Y, then Y is said to be a compactification of X. In particular, if Y \ X is a singleton set, then Y is called the one-point compactification of X.

Example D.238. In Example D.225, X is the one-point compactification of K.

Example D.239. The one-point compactification of the real line R is homeomorphic with the circle. Similarly, the one-point compactification of the complex plane is homeomorphic with the sphere S². The Riemann sphere is the space C ∪ {∞}.

Theorem D.240. Let X be a Hausdorff space. Then X is locally compact if and only if, given x ∈ X and a neighborhood U of x, there is a neighborhood V of x such that Cl(V) is compact and Cl(V) ⊂ U.

Corollary D.241. Let X be locally compact Hausdorff and let A be a subspace of X. If A is closed in X or open in X, then A is locally compact.

Corollary D.242. A space X is homeomorphic to an open subspace of a compact Hausdorff space if and only if X is locally compact Hausdorff.
Appendix E
Functional Analysis
Example E.1. A copper mining company mines in a mountain that has an estimated total amount of Q tonnes of copper. Let x(t) denote the amount of copper removed during the period [0, t], with x(0) = 0 and x(T) = Q. Assume x is a continuously differentiable function [0, T] → R and the cost of extracting copper per unit tonne at time t is

    c(t) = ax(t) + bx'(t),            (E.1)

where a, b ∈ R^+. What is the optimal mining operation x(t) that minimizes the cost function

    f(x) = ∫_0^T (ax(t) + bx'(t)) x'(t) dt?

In math terms, we would like to minimize f : C_Q^1[0, T] → R, where C_Q^1[0, T] is the set of continuously differentiable functions x : [0, T] → R satisfying x(0) = 0 and x(T) = Q.
In calculus, the minimizer x* of a function f ∈ C² is usually found by the conditions f'(x*) = 0 and f''(x*) > 0. However, the above problem does not fit into the usual framework of calculus, since x is not a number but a function that belongs to an infinite-dimensional function space. Solving this problem requires a number of techniques in functional analysis.

E.1  Normed and Banach spaces

E.1.1  Metric spaces

Definition E.2. The ℓ^∞ sequence space is a metric space (ℓ^∞, d), where ℓ^∞ is the set of all bounded sequences of complex numbers,

    ℓ^∞ := {(ξ_1, ξ_2, . . .) : ∃c_x ∈ R s.t. sup_{i∈N^+} |ξ_i| ≤ c_x},            (E.2)

and the metric is given by

    d(x, y) = sup_{i∈N^+} |ξ_i − η_i|,

where y = (η_1, η_2, . . .) ∈ X.

Exercise E.3. Let X be the set of all bounded and unbounded sequences of complex numbers. Show that the following is a metric on X,

    d(x, y) = Σ_{j=1}^∞ (1/2^j) |ξ_j − η_j| / (1 + |ξ_j − η_j|),            (E.3)

where x = (ξ_j) and y = (η_j).

Definition E.4. For a real number p ≥ 1, the ℓ^p sequence space is the metric space (ℓ^p, d) with

    ℓ^p := {(ξ_j)_{j=1}^∞ : ξ_j ∈ C; Σ_{j=1}^∞ |ξ_j|^p < ∞};            (E.4)

    d(x, y) = (Σ_{j=1}^∞ |ξ_j − η_j|^p)^{1/p},            (E.5)

where x = (ξ_j) and y = (η_j) are both in X. In particular, the Hilbert sequence space ℓ² is the ℓ^p space with p = 2.

Definition E.5. A pair of conjugate exponents are two real numbers p, q ∈ [1, ∞] satisfying

    p + q = pq,  i.e.,  1/p + 1/q = 1.            (E.6)

Lemma E.6. Any two positive real numbers α, β satisfy

    αβ ≤ α^p/p + β^q/q,            (E.7)

where p and q are conjugate exponents and the equality holds if β = α^{p−1}.

Proof. By (E.6), we have

    u = t^{p−1}  ⇒  t = u^{q−1}.

(Two plots of the curve u = t^{p−1} against the t- and u-axes, with α marked on the t-axis and β on the u-axis, accompany the proof here; figure omitted.)
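A quick numerical sanity check of (E.6)–(E.7), not part of the notes: for conjugate exponents p and q, αβ never exceeds α^p/p + β^q/q, with equality when β = α^{p−1}.

    #include <cmath>
    #include <cstdio>

    int main() {
        double p = 3.0, q = p / (p - 1.0);          // conjugate exponents: 1/p + 1/q = 1
        double alphas[] = {0.3, 1.0, 2.5};
        double betas[]  = {0.7, 1.9, 4.0};
        for (double a : alphas)
            for (double b : betas) {
                double lhs = a * b;
                double rhs = std::pow(a, p) / p + std::pow(b, q) / q;
                std::printf("alpha=%.2f beta=%.2f  ab=%.6f  a^p/p+b^q/q=%.6f  ok=%d\n",
                            a, b, lhs, rhs, lhs <= rhs + 1e-12);
            }
        // Equality case: beta = alpha^{p-1}.
        double a = 1.7, b = std::pow(a, p - 1.0);
        std::printf("equality check: %.12f vs %.12f\n",
                    a * b, std::pow(a, p) / p + std::pow(b, q) / q);
    }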
where ‖·‖_p is the Euclidean norm. For p, q ∈ (1, ∞), the equality in (E.9) holds if

∃ c ∈ R s.t. ∀j = 1, . . . , n, |x_j|^p = c |y_j|^q.   (E.10)

Proof. If Σ_{j=1}^n |x_j|^p = 0 or Σ_{j=1}^n |y_j|^q = 0 or p = ∞ or q = ∞, then (E.9) holds trivially. Otherwise we define

a_i := |x_i|^p / Σ_{j=1}^n |x_j|^p,   b_i := |y_i|^q / Σ_{j=1}^n |y_j|^q.

It follows from (E.8) that

|x_i y_i| / [ (Σ_{j=1}^n |x_j|^p)^{1/p} (Σ_{j=1}^n |y_j|^q)^{1/q} ] ≤ |x_i|^p / (p Σ_{j=1}^n |x_j|^p) + |y_i|^q / (q Σ_{j=1}^n |y_j|^q).

Lemma E.15. Any open ball in a normed space is a convex set as in Definition 1.18.

Proof. For α ∈ [0, 1] and x, y ∈ B_r(0), we have

‖αx + (1 − α)y‖ ≤ ‖αx‖ + ‖(1 − α)y‖ ≤ α‖x‖ + (1 − α)‖y‖ < αr + (1 − α)r = r,

where we have applied the properties of norms. Hence αx + (1 − α)y ∈ B_r(0).

Exercise E.16. Show that the Euclidean norm ‖·‖_p in Example E.11 satisfies a monotonicity property:

1 ≤ p ≤ q ≤ ∞  ⇒  ∀x ∈ R^n, ‖x‖_q ≤ ‖x‖_p.

Example E.17. (R^n, ‖·‖_∞) is a normed space, where ‖·‖_∞ is the Euclidean norm in Definition B.114:
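As a quick numerical sanity check of Lemma E.6 and of the Hölder-type bound used above, the following sketch evaluates both inequalities on random data (plain Python with random samples; the sample sizes and seed are my own choices).

import random

def conjugate(p):
    """Return q with 1/p + 1/q = 1 as in (E.6)."""
    return p / (p - 1.0)

def young_gap(alpha, beta, p):
    """alpha^p/p + beta^q/q - alpha*beta, which Lemma E.6 says is >= 0."""
    q = conjugate(p)
    return alpha**p / p + beta**q / q - alpha * beta

random.seed(0)
p = 3.0
q = conjugate(p)
for _ in range(5):
    a, b = random.uniform(0.1, 5.0), random.uniform(0.1, 5.0)
    assert young_gap(a, b, p) >= 0.0
    print(abs(young_gap(a, a**(p - 1), p)))   # equality case beta = alpha^{p-1}: ~0

# Hoelder: sum |x_i y_i| <= (sum |x_i|^p)^{1/p} * (sum |y_i|^q)^{1/q}
x = [random.uniform(-1, 1) for _ in range(10)]
y = [random.uniform(-1, 1) for _ in range(10)]
lhs = sum(abs(a * b) for a, b in zip(x, y))
rhs = sum(abs(a)**p for a in x)**(1/p) * sum(abs(b)**q for b in y)**(1/q)
print(lhs <= rhs + 1e-12)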
Example E.21. For the ℓ^p space in (E.4) with p ∈ [1, ∞),

ℓ^p := { (a_n)_{n∈N} : a_n ∈ C; Σ_{n∈N} |a_n|^p < ∞ },

we have …

Definition E.33. A normed space is separable if it has a countable dense set.

Example E.34. By Definitions E.29 and E.33, L^p(Ω) is separable since the set of all polynomials with rational coefficients is countable and is dense in L^p(Ω).
E.1.5 Sequential compactness

Definition E.41. A subset K of a normed space (X, ‖·‖) is sequentially compact if every sequence in K has a convergent subsequence that converges in K,

∀(x_n)_{n∈N} ⊂ K, ∃ n_k : N → N, ∃ L ∈ K s.t. lim_{k→+∞} x_{n_k} = L.   (E.24)

Example E.42. Any interval [a, b] is sequentially compact in R. Indeed, any sequence in [a, b] is bounded, and by the Bolzano-Weierstrass theorem (Theorem C.13) it has a convergent subsequence, of which the limit must be in [a, b], thanks to the completeness of R (Theorem C.15).

Example E.43. (a, b) is not sequentially compact since the sequence (a + (b − a)/2^n)_{n∈N⁺} is contained in (a, b), but its limit a is not contained in (a, b).

Example E.44. R is not sequentially compact because the sequence (n)_{n∈N} in R cannot have a convergent subsequence: the distance between any two terms of any subsequence is at least 1.

Lemma E.45. Every bounded sequence in R^n has a convergent subsequence.

Proof. We prove this statement by induction on n. The induction basis is the Bolzano-Weierstrass theorem (Theorem C.13). Suppose the statement holds for n ≥ 1. For a bounded sequence (x_m)_{m∈N} ⊂ R^{n+1}, we split each x_m as x_m = (α_m, β_m), where α_m ∈ R^n and β_m ∈ R. Since x_m is bounded and ‖α_m‖₂ ≤ ‖x_m‖₂, (α_m) is also bounded. By the induction hypothesis, (α_m)_{m∈N} has a convergent subsequence, say (α_{m_k})_{k∈N}, that converges to α ∈ R^n. Then (β_{m_k})_{k∈N} is bounded and by Theorem C.13 it has a convergent subsequence (β_{m_{k_p}})_{p∈N} that converges to β ∈ R. Therefore we have

lim_{p→∞} x_{m_{k_p}} = lim_{p→∞} (α_{m_{k_p}}, β_{m_{k_p}}) = (α, β) ∈ R^{n+1},

which completes the proof.

Theorem E.46. In a metric space, sequential compactness is equivalent to compactness.

Lemma E.47. A sequentially compact subset K of a normed space X must be closed and bounded.

Proof. Suppose K is sequentially compact but not bounded. Then

∀n ∈ N, ∃ x_n ∈ K s.t. ‖x_n‖ ≥ n.

Hence no subsequence of (x_n)_{n∈N} ⊂ K converges and this contradicts the sequential compactness of K.

For any convergent sequence (x_n)_{n∈N} ⊂ K, Definition E.41 implies that it has a subsequence that converges in K. The uniqueness of limits (Lemma C.6) dictates that the two sequences converge to the same limit in K. Now that any convergent sequence converges to some limit point in K, Corollary D.98 implies that K is closed.

Theorem E.48. A subset K of R^n is sequentially compact if and only if K is closed and bounded.

Proof. The necessity follows from Lemma E.47; we only prove the sufficiency. Any sequence (x_n)_{n∈N} ⊂ K is bounded because K is bounded. Then Lemma E.45 dictates that (x_n)_{n∈N} has a convergent subsequence. Because each term x_n ∈ K and K is closed, Corollary D.98 implies that the limit of this subsequence is also in K. The proof is then completed by Definition E.41.

Example E.49. The intervals (a, b], [a, b), (−∞, b], and [a, +∞) are not sequentially compact in R.

Definition E.50. The Cantor set is a subset of R given by C := ∩_{n=1}^{+∞} F_n, where F₁ = [0, 1] and each F_{n+1} is obtained by deleting from F_n the open middle third of each closed interval.

Example E.51. The Cantor set is an intersection of closed sets and thus it is closed. It is also bounded and thus it is sequentially compact.

Corollary E.52. A subset K of a finite-dimensional normed space X is sequentially compact if and only if K is closed and bounded.

Example E.53. The closed unit ball in (C[0, 1], ‖·‖_∞),

K := {f ∈ C[0, 1] : ‖f‖_∞ ≤ 1},   (E.25)

is closed and bounded, but K is not sequentially compact. Consider the hat function

B_n(x) = { (x − a_n)/(b_n − a_n),  x ∈ [a_n, b_n];   (x − c_n)/(b_n − c_n),  x ∈ [b_n, c_n];   0,  otherwise },   (E.26)

where a_n = 1 − 1/2^n, c_n = a_{n+1}, and b_n = (a_n + c_n)/2. Then the sequence (B_n)_{n∈N} has no convergent subsequence.

Example E.54. The closed unit ball in ℓ²,

K := {x ∈ ℓ² : ‖x‖₂ ≤ 1},   (E.27)

is closed and bounded, but is not sequentially compact. For

e_n = (0, · · · , 0, 1, 0, · · · ) ∈ K ⊂ ℓ²,

where all terms are zero except that the nth term is 1, the sequence (e_n)_{n∈N⁺} has no convergent subsequence.

Example E.55. The Hilbert cube in the normed space ℓ²,

C := { (x_n)_{n∈N⁺} : x_n ∈ [0, 1/n] },   (E.28)

can be shown to be a sequentially compact subset.

Definition E.56. An open cover of a topological space X is a collection of open subsets of X such that any element of X belongs to some open subset in the collection.

Definition E.57. A subset K in a topological space is compact if and only if every open cover of K has a finite subcover.
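A small numerical illustration of Example E.53 (a sketch under my own assumptions: the hat functions B_n in (E.26) and a grid-based approximation of the sup-norm): any two distinct B_m, B_n are at sup-distance about 1 from each other, so no subsequence of (B_n) can be Cauchy in ‖·‖_∞.

def B(n, x):
    a = 1.0 - 0.5**n          # a_n = 1 - 1/2^n
    c = 1.0 - 0.5**(n + 1)    # c_n = a_{n+1}
    b = 0.5 * (a + c)         # b_n = (a_n + c_n)/2
    if a <= x <= b:
        return (x - a) / (b - a)
    if b < x <= c:
        return (x - c) / (b - c)
    return 0.0

grid = [i / 100000.0 for i in range(100001)]
def sup_dist(m, n):
    return max(abs(B(m, x) - B(n, x)) for x in grid)

print(sup_dist(1, 2), sup_dist(2, 5))   # both close to 1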
E.1.6 Continuous maps of normed spaces

Definition E.58. Let X and Y be normed spaces. A function f : X → Y is continuous at x₀ ∈ X iff

∀ε > 0, ∃δ > 0 s.t. ∀x ∈ X, ‖x − x₀‖_X < δ ⇒ ‖f(x) − f(x₀)‖_Y < ε.   (E.29)

The function f : X → Y is continuous iff it is continuous at every x₀ ∈ X.

Lemma E.59. Let X and Y be normed spaces. A function f : X → Y is continuous at x ∈ X iff, for any sequence with lim_{n→∞} x_n = x, we have lim_{n→∞} f(x_n) = f(x).

Exercise E.60. Prove Lemma E.59.

Lemma E.61. The norm function ‖·‖ is continuous.

Proof. By Definition E.58, we have lim_{n→∞} ‖u_n − u‖ = 0 from lim_{n→∞} u_n = u. The rest of the proof follows from the backward triangle inequality (E.12).

Exercise E.62. For V = C[0, 1] and x₀ ∈ [0, 1], define a function ℓ_{x₀} : V → R as

ℓ_{x₀}(v) = v(x₀).

Show that ℓ_{x₀} is continuous on C[0, 1].

Example E.63. The function S : (C[0, 1], ‖·‖_∞) → (R, |·|),

S(f) = ∫₀¹ f²(x) dx,   (E.30)

is continuous. Indeed, for any g ∈ C[0, 1], we have

|S(f) − S(g)| = | ∫₀¹ f²(x) dx − ∫₀¹ g²(x) dx |
≤ ∫₀¹ |f(x) − g(x)| |f(x) − g(x) + 2g(x)| dx
≤ ∫₀¹ ‖f − g‖_∞ (‖f − g‖_∞ + 2‖g‖_∞) dx,

which implies

∀ε > 0, ∃δ = min(1, ε/(1 + 2‖g‖_∞)) s.t. ‖f − g‖_∞ < δ ⇒
‖f − g‖_∞ (‖f − g‖_∞ + 2‖g‖_∞) < ‖f − g‖_∞ (1 + 2‖g‖_∞) < ε ⇒ |S(f) − S(g)| < ε.

Example E.64. The differentiation map

d/dt : (C¹[a, b], ‖·‖_∞) → (C[a, b], ‖·‖_∞)

is not continuous, but can be made continuous if we change the norm on C¹[a, b] to

‖f‖_{1,∞} := ‖f‖_∞ + ‖f′‖_∞.   (E.31)

Indeed, for f_n(t) = (1/√n) cos(2πnt), we have

∀n ∈ N⁺, ‖f_n′ − 0‖_∞ = 2π√n > 1,

yet ‖f_n − 0‖_∞ can be made arbitrarily small as n → ∞. In contrast, D : (C¹[a, b], ‖·‖_{1,∞}) → (C[a, b], ‖·‖_∞) is continuous because

∀ε > 0, ∃δ = ε s.t. ∀f, g ∈ C¹[0, 1], ‖f − g‖_{1,∞} < δ ⇒ ‖Df − Dg‖_∞ = ‖f′ − g′‖_∞ ≤ ‖f − g‖_{1,∞} < δ = ε.

Exercise E.65. Show that the arc length function L : C¹[0, 1] → R,

L(f) := ∫₀¹ √(1 + (f′(t))²) dt,   (E.32)

is not continuous if the norm of C¹[0, 1] is ‖·‖_∞, whereas it is continuous if we equip C¹[0, 1] with (E.31).

Exercise E.66. Is the function S : (c₀₀, ‖·‖_∞) → (R, |·|),

S((a_n)_{n∈N}) = Σ_{n=1}^∞ a_n²   (E.33)

continuous?

Theorem E.67. A map f : X → Y between normed spaces is continuous if and only if the preimage f⁻¹(V) of each open set V in Y is open in X.

Corollary E.68. A map f : X → Y between normed spaces is continuous if and only if the preimage f⁻¹(V) of each closed set V in Y is closed in X.

Lemma E.69. If f : X → Y and g : Y → Z are continuous functions between normed spaces, then the composition map g ∘ f : X → Z is continuous.

Lemma E.70. Let X, Y be normed spaces and let K be a compact subset of X. If f : X → Y is continuous at each x ∈ K, then f(K) is a compact subset of Y.

Proof. For a sequence (y_n)_{n∈N} ⊂ f(K), there exists for each n ∈ N an x_n ∈ K such that f(x_n) = y_n. This defines a sequence (x_n)_{n∈N} ⊂ K. Because K is compact, Definition E.41 implies the existence of a subsequence (x_{n_k})_{k∈N} that converges to L ∈ K. Since f is continuous, Lemma E.59 implies that (y_{n_k})_{k∈N} converges to f(L) ∈ f(K).

Theorem E.71 (Weierstrass). Suppose K is a nonempty compact subset of a normed space X and the function f : X → R is continuous at each x ∈ K. Then

∃ a, b ∈ K s.t. f(a) = max{f(x) : x ∈ K}, f(b) = min{f(x) : x ∈ K}.

Proof. It suffices to only prove the first clause. By Lemma E.70, f(K) is compact, and thus by Lemma E.47 f(K) is bounded. f(K) is also nonempty because K is nonempty. Then Theorem A.28 implies that f(K) ⊂ R must have a unique supremum

M := sup{f(x) : x ∈ K} ∈ R,

and hence there exists a sequence (x_n)_{n∈N} ⊂ K satisfying lim_{n→∞} f(x_n) = M. By Definition E.41, (x_n)_{n∈N} has a convergent subsequence (x_{n_k})_{k∈N} that converges to some c ∈ K. The continuity of f, Lemma E.59, and Lemma C.9 yield

lim_{k→∞} f(x_{n_k}) = f(c) = M.
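The discontinuity in Example E.64 is easy to observe numerically. The sketch below (my own addition; the grid-based sup-norm is only an approximation) shows ‖f_n‖_∞ = 1/√n shrinking while ‖f_n′‖_∞ = 2π√n grows, so differentiation cannot be continuous from (C¹, ‖·‖_∞) to (C, ‖·‖_∞).

import math

def sup_norm(g, samples=10001):
    return max(abs(g(i / (samples - 1.0))) for i in range(samples))

for n in (1, 4, 16, 64):
    f  = lambda t, n=n: math.cos(2 * math.pi * n * t) / math.sqrt(n)
    df = lambda t, n=n: -2 * math.pi * math.sqrt(n) * math.sin(2 * math.pi * n * t)
    print(n, sup_norm(f), sup_norm(df), 2 * math.pi * math.sqrt(n))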
Example E.72. Since the set K = {x ∈ R³ : ‖x‖₂ = 1} is compact in R³ and the function x ↦ Σ_{j=1}³ x_j is continuous, the optimization problem

minimize Σ_{j=1}³ x_j,  subject to ‖x‖₂ = 1,

has a minimizer.

E.1.7 Norm equivalence

Example E.73. The optimal mining problem in Example E.1 concerns C¹[a, b]. Since C¹[a, b] is a subspace of C[a, b], we could use the norm ‖·‖_∞ of C[a, b] as a norm for C¹[a, b]. But by Example E.64, the differentiation map would then not be continuous; instead, if we equip C¹[a, b] with (E.31), then the differentiation map is continuous. Also, it might be more appropriate to regard two functions in C¹[a, b] as being close to each other if both their function values and their derivatives are close.

Therefore all the Euclidean ℓ^p norms are equivalent.

Corollary E.79. Over a finite-dimensional space, any two norms are equivalent.

Proof. This follows from Theorem E.78 and the isomorphism of linear spaces.

Example E.80. In the normed space V := C[0, 1], consider a sequence of functions {u_n} given by

u_n(x) := { 1 − nx,  x ∈ [0, 1/n];   0,  x ∈ (1/n, 1] }.

For the p-norm in (E.14), we have

‖u_n‖_p = [n(p + 1)]^{−1/p},

and thus the sequence {u_n} converges to u = 0. However, for the ∞-norm in (E.15), we have

‖u_n‖_∞ = 1,

and thus the sequence {u_n} does not converge to u = 0.

Example E.83 (Q is not complete). The sequence …
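A sketch for Example E.80 (the uniform-grid midpoint quadrature is my own addition, not the notes' method): the p-norm of u_n(x) = max(1 − nx, 0) decays like [n(p + 1)]^{−1/p}, while the sup-norm stays equal to 1, so convergence of {u_n} depends on the chosen norm.

def u(n, x):
    return max(1.0 - n * x, 0.0)

def p_norm(n, p, samples=200000):
    h = 1.0 / samples
    s = sum(u(n, (i + 0.5) * h)**p for i in range(samples)) * h   # midpoint rule
    return s**(1.0 / p)

p = 2.0
for n in (1, 10, 100):
    print(n, p_norm(n, p), (n * (p + 1))**(-1.0 / p),
          max(u(n, i / 1000.0) for i in range(1001)))   # last column: sup-norm ~ 1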
In particular, set m = N, let n → ∞, and we have

∀t ∈ [a, b], |f_N(t) − f(t)| ≤ ε/3.

The condition of f_N ∈ C[a, b] implies

∀t ∈ [a, b], ∀ε > 0, ∃δ > 0 s.t. |t − τ| < δ ⇒ |f_N(t) − f_N(τ)| < ε/3.

The above two equations yield

∀t ∈ [a, b], ∀ε > 0, ∃δ > 0 s.t. |t − τ| < δ ⇒
|f(t) − f(τ)| ≤ |f(t) − f_N(t)| + |f_N(t) − f_N(τ)| + |f_N(τ) − f(τ)| ≤ ε/3 + ε/3 + ε/3 = ε,

which shows that f is continuous at every t ∈ [a, b].

Finally, we show that {f_n}_{n≥1} indeed converges to f. The sequence {f_n} ⊂ C[a, b] being Cauchy implies

∀ε > 0, ∃N ∈ N s.t. ∀m, n > N, ‖f_m − f_n‖_∞ < ε.

For a fixed n > N, we have

∀m > N, ∀t ∈ [a, b], |f_n(t) − f_m(t)| ≤ ‖f_n − f_m‖_∞ < ε,

which implies

∀t ∈ [a, b], |f_n(t) − f(t)| = |f_n(t) − lim_{m→∞} f_m(t)| ≤ ε.

It follows that

‖f_n − f‖_∞ = max_{t∈[a,b]} |f_n(t) − f(t)| ≤ ε.

In the above process, we could have fixed any n > N at the outset to obtain the same result. Therefore we have

∀ε > 0, ∃N ∈ N s.t. ∀n > N, ‖f_n − f‖_∞ ≤ ε,

which implies lim_{n→∞} f_n = f.

Exercise E.86. Define C_b[0, ∞) as the set of all functions f that are continuous on [0, ∞) and satisfy

‖f‖_∞ := sup_{x≥0} |f(x)| < ∞.

Show that C_b[0, ∞) with this norm is complete.

Exercise E.87. Define C^α[a, b] as the set of all functions f ∈ C[a, b] satisfying

M_α(f) := sup_{x,y∈[a,b]; x≠y} |f(x) − f(y)| / |x − y|^α < ∞.

Define ‖f‖_α = ‖f‖_∞ + M_α(f). Show that (C^α[a, b], ‖·‖_α) is a Banach space.

Example E.88. For p ∈ [1, ∞), (C(Ω), ‖·‖_p) is not a Banach space. Consider (u_n)_{n∈N} ⊂ C[0, 1] given by

u_n(x) = { 0,  x ∈ [0, 1/2 − 1/(2n)];   nx − (n − 1)/2,  x ∈ [1/2 − 1/(2n), 1/2 + 1/(2n)];   1,  x ∈ [1/2 + 1/(2n), 1] }.   (E.38)

(u_n)_{n∈N} is clearly Cauchy and we have

lim_{n→∞} u_n = u,  where u(x) = 0 for x ∈ [0, 1/2) and u(x) = 1 for x ∈ (1/2, 1].

But u(x) cannot be in C(Ω) no matter how we define u(1/2).

Exercise E.89. Show that the sequence space (ℓ^p, ‖·‖_p) is complete for p ∈ [1, +∞].

Theorem E.90. In a Banach space, absolutely convergent series converge. More precisely, if (x_n)_{n∈N} is a sequence in a Banach space (X, ‖·‖) such that Σ_{n=1}^∞ ‖x_n‖ converges, then Σ_{n=1}^∞ x_n converges in X. Furthermore,

‖ Σ_{n=1}^∞ x_n ‖ ≤ Σ_{n=1}^∞ ‖x_n‖.   (E.39)

Proof. Since X is Banach, it suffices to prove that the sequence (s_n = Σ_{i=1}^n x_i)_{n∈N} is Cauchy. Since the real sequence (σ_n = Σ_{i=1}^n ‖x_i‖)_{n∈N} is Cauchy, we have

∀ε > 0, ∃N ∈ N⁺ s.t. ∀n > m > N, Σ_{i=m+1}^n ‖x_i‖ < ε,

which implies that (s_n = Σ_{i=1}^n x_i)_{n∈N} is Cauchy:

‖ Σ_{i=m+1}^n x_i ‖ ≤ Σ_{i=m+1}^n ‖x_i‖ < ε.

Set L := Σ_{n=1}^∞ x_n = lim_{n→∞} s_n and we have

∀ε > 0, ∃N ∈ N s.t. ∀n > N, ‖s_n − L‖ < ε,

which implies

‖L‖ ≤ ‖s_n − L‖ + ‖s_n‖ < ε + σ_n ≤ ε + Σ_{n=1}^∞ ‖x_n‖,

where the second inequality follows from the triangle inequality and the third from σ_n being a finite partial sum of Σ_{n=1}^∞ ‖x_n‖. Then (E.39) holds because ε can be made arbitrarily small.

Example E.91. The series Σ_{n=1}^∞ (1/n²) sin(nx) converges in (C[0, 2π], ‖·‖_∞) since Σ_{n=1}^∞ 1/n² converges in R. Hence x ↦ Σ_{n=1}^∞ (1/n²) sin(nx) defines a continuous function.

Exercise E.92. Prove the converse of Theorem E.90, i.e., a normed space X is complete if every absolutely convergent series converges in X.
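A small check of Example E.91 (the grid-based sup-norm below is my own approximation): the tails of Σ_n sin(nx)/n² are uniformly small because they are dominated by the tails of Σ_n 1/n², which is exactly the mechanism of Theorem E.90 in (C[0, 2π], ‖·‖_∞).

import math

def partial_sum(x, N):
    return sum(math.sin(n * x) / n**2 for n in range(1, N + 1))

grid = [2 * math.pi * i / 1000 for i in range(1001)]
for N, M in ((10, 20), (50, 100), (200, 400)):
    tail_sup = max(abs(partial_sum(x, M) - partial_sum(x, N)) for x in grid)
    tail_bound = sum(1.0 / n**2 for n in range(N + 1, M + 1))
    print(N, M, tail_sup, tail_bound)   # tail_sup <= tail_bound in every row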
Theorem E.93. For each normed space V, there exists a complete normed space W and a dense subspace V̂ ⊂ W such that one can find an isometric isomorphism between V and V̂, i.e., a bijective linear function I : V → V̂ satisfying

∀v ∈ V, ‖Iv‖_W = ‖v‖_V.   (E.40)

Furthermore, the complete normed space W is unique up to isometric isomorphism.

Definition E.94. The normed space W in Theorem E.93 is called the completion of the normed space V.

Example E.95. If V is the normed space Q of rational numbers, then W = R is a completion of Q, where each element is an equivalence class of Cauchy sequences of rational numbers.

E.2 Continuous linear maps

E.2.1 The space CL(X, Y)

Notation 11. CL(X, Y) denotes the set of all continuous linear transformations or bounded linear transformations from the normed space X to the normed space Y,

CL(X, Y) := C(X, Y) ∩ L(X, Y).   (E.41)

For Y = X, we write CL(X).

Theorem E.96. For any map T ∈ L(X, Y), the following statements are equivalent:

(1) T is continuous,
(2) T is continuous at 0,
(3) ∃ M ∈ R⁺ s.t. ∀x ∈ X, ‖Tx‖_Y ≤ M‖x‖_X.

Proof. (1)⇒(2) follows from Definition E.58. For (2)⇒(3), the continuity of T at 0 implies

for ε = 1, ∃δ > 0 s.t. ‖x‖ < δ ⇒ ‖Tx‖ < 1.

Replacing x with y = (δ/2)(x/‖x‖) in the above inequalities yields ‖Tx‖ ≤ M‖x‖ with M = 2/δ. Finally, (3)⇒(1) follows from

∀ε > 0, ∃δ = ε/M s.t. ‖x − y‖ < δ ⇒ ‖Tx − Ty‖ = ‖T(x − y)‖ ≤ M‖x − y‖ < Mδ = ε.

Example E.97. The left shift operator L : ℓ² → ℓ² and right shift operator R : ℓ² → ℓ²,

L(a₁, a₂, a₃, . . .) = (a₂, a₃, . . .),   (E.42)
R(a₁, a₂, a₃, . . .) = (0, a₁, a₂, . . .).   (E.43)

Example E.98. The linear map T : (C[a, b], ‖·‖_∞) → R given by T(f) = ∫_a^b f(t) dt is continuous because

|T(f)| = | ∫_a^b f(t) dt | ≤ ∫_a^b ‖f‖_∞ dt = (b − a)‖f‖_∞.

By Lemma E.59, T preserves convergent sequences:

lim_{n→∞} f_n = f  ⇒  lim_{n→∞} ∫_a^b f_n = ∫_a^b f.

In other words, the continuity of T under ‖·‖_∞ guarantees that T and lim_{n→∞} commute; see Section C.7.

Theorem E.99 (Existence and uniqueness of ODEs). The IVP

dx/dt (t) = f(x(t), t)   (E.44)

with initial condition x(0) = x₀ ∈ R has a unique solution x ∈ C¹[0, T] for some T > 0, if f : R × [0, ∞) → R is Lipschitz continuous in space and continuous in time.

Proof. For existence, we define y₀(t) = x₀ and

(∗):  y_{n+1}(t) = x₀ + ∫₀ᵗ f(y_n(τ), τ) dτ.

For any t ∈ [0, 1/(2L)], where L is the Lipschitz constant,

|y_{n+1}(t) − y_n(t)| = | ∫₀ᵗ f(y_n(τ), τ) − f(y_{n−1}(τ), τ) dτ |
≤ ∫₀ᵗ |f(y_n(τ), τ) − f(y_{n−1}(τ), τ)| dτ
≤ ∫₀ᵗ L |y_n(τ) − y_{n−1}(τ)| dτ
≤ ∫₀ᵗ L ‖y_n − y_{n−1}‖_∞ dτ
≤ (1/2) ‖y_n − y_{n−1}‖_∞.

Hence we have

‖y_{n+1} − y_n‖_∞ ≤ (1/2) ‖y_n − y_{n−1}‖_∞ ≤ (1/2^n) ‖y₁ − y₀‖_∞.

It follows that (y_n)_{n∈N} is a Cauchy sequence and there exists y ∈ C¹[0, T] such that lim_{n→∞} y_n = y. Similarly, (f(y_n, t))_{n∈N} is a Cauchy sequence and there exists f(y, t) such that lim_{n→∞} f(y_n, t) = f(y, t). Take lim_{n→∞} of (∗), apply Example E.98, and we have

(∗∗):  y(t) = x₀ + ∫₀ᵗ f(y(τ), τ) dτ.

It is trivial to check that the above y(t) solves (E.44).

For uniqueness, suppose for two solutions x and y of (E.44) there exists t* ∈ (0, T) satisfying

t* := max{ t ∈ [0, T] : ∀τ ≤ t, y(τ) = x(τ) }.
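The construction (∗) in the proof of Theorem E.99 is the Picard iteration, which can be run numerically. The following is a minimal sketch under my own assumptions (composite trapezoidal quadrature on a uniform grid and the test problem f(x, t) = x with x₀ = 1, whose exact solution is e^t); it is an illustration, not the notes' algorithm.

import math

def picard(f, x0, T, n_iter=20, m=200):
    h = T / m
    ts = [i * h for i in range(m + 1)]
    y = [x0] * (m + 1)                       # y_0(t) = x0
    for _ in range(n_iter):
        g = [f(y[i], ts[i]) for i in range(m + 1)]
        new = [x0]
        acc = 0.0
        for i in range(m):                   # y_{n+1}(t) = x0 + int_0^t f(y_n, tau) dtau
            acc += 0.5 * h * (g[i] + g[i + 1])
            new.append(x0 + acc)
        y = new
    return ts, y

ts, y = picard(lambda x, t: x, x0=1.0, T=1.0)
print(max(abs(yi - math.exp(t)) for t, yi in zip(ts, y)))   # small, limited by quadrature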
… to obtain t* + 1/(LN) ≤ T. Then (∗∗) implies

∀t ∈ [t*, t* + 1/(LN)],
|x(t) − y(t)| = | ∫_{t*}^t [f(x(τ), τ) − f(y(τ), τ)] dτ |
≤ ∫_{t*}^t |f(x(τ), τ) − f(y(τ), τ)| dτ ≤ ∫_{t*}^t L |x(τ) − y(τ)| dτ
≤ L M (t − t*) ≤ M/N,

which yields M ≤ M/N and contradicts N ≥ 2. Hence the uniqueness is proved by the non-existence of such a t*.

Example E.100. The continuity of the differentiation maps in Example E.64 can be determined by Theorem E.96. For ‖·‖_{1,∞}, we have ‖Dx‖_∞ ≤ ‖x‖_{1,∞}, and thus the operator D : (C¹[0, 1], ‖·‖_{1,∞}) → (C[0, 1], ‖·‖_∞) is continuous. In comparison, D : (C¹[0, 1], ‖·‖_∞) → (C[0, 1], ‖·‖_∞) is not continuous: for x_n = t^n, we have ‖x_n‖_∞ = 1 yet lim_{n→∞} ‖x_n′‖_∞ = ∞.

Corollary E.101. For finite-dimensional normed spaces X and Y, we have L(X, Y) = CL(X, Y).

Proof. Each linear transformation T_A ∈ L(R^n, R^m) has a matrix A ∈ R^{m×n} such that

‖T_A x‖₂² = ‖Ax‖₂² = Σ_{i=1}^m ( Σ_{j=1}^n a_{ij} x_j )² ≤ Σ_{i=1}^m ( Σ_{j=1}^n a_{ij}² )( Σ_{j=1}^n x_j² ) = ‖x‖₂² Σ_{i=1}^m Σ_{j=1}^n a_{ij}²,

where the inequality is the Cauchy-Schwarz inequality; hence Theorem E.96(3) holds with M = (Σ_{i,j} a_{ij}²)^{1/2}.

Exercise E.103. For A, B ∈ C[a, b] and

S := { f ∈ C¹[a, b] : f(a) = f(b) = 0 },   (E.45)

show that the map L : (S, ‖·‖_{1,∞}) → R given by

L(f) = ∫_a^b ( A(t) f(t) + B(t) f′(t) ) dt

is a bounded linear transformation.

Exercise E.104. For A ∈ R^{m×n}, show that the subspace ker A is closed in R^n.

Theorem E.105. Every subspace of R^n is closed.

Exercise E.106. Prove Theorem E.105.

Lemma E.107. CL(X, Y) is a subspace of L(X, Y).

Exercise E.108. Prove Lemma E.107.

Lemma E.109. The operator norm ‖·‖ : CL(X, Y) → R,

∀T ∈ CL(X, Y), ‖T‖ := sup{ ‖Tx‖ : x ∈ X, ‖x‖ ≤ 1 },   (E.46)

is well defined, i.e., ‖T‖ is a unique bounded real number.

Proof. By Theorem A.28, it suffices to show that

S := { ‖Tx‖ : x ∈ X, ‖x‖ ≤ 1 }   (E.47)

is a nonempty bounded subset of R. S is nonempty because 0 ∈ X and T0 = 0_Y imply 0 ∈ S. The boundedness of S follows from Theorem E.96(3) and ‖x‖_X ≤ 1.

Lemma E.110. For any T ∈ CL(X, Y), we have

(∀x ∈ X, ‖Tx‖ ≤ M‖x‖) ⇒ ‖T‖ ≤ M.   (E.48)

Proof. M is an upper bound of the set S in (E.47) while ‖T‖ is the least upper bound of S.

Lemma E.111. ∀T ∈ CL(X, Y), ∀x ∈ X, ‖Tx‖ ≤ ‖T‖‖x‖.

Proof. The statement holds trivially for x = 0. Otherwise, for y = x/‖x‖ we have ‖Ty‖ ∈ S where S is in (E.47). Hence

‖Ty‖ ≤ ‖T‖ ⇒ ‖Tx‖ ≤ ‖T‖‖x‖.

Lemma E.112. ∀S ∈ CL(X, Y), ∀T ∈ CL(Y, Z), we have ‖TS‖ ≤ ‖T‖‖S‖.

… Since Y is complete, (T_n x)_{n∈N} converges to some L(x) ∈ Y. This defines a map T(x) = L(x). The second step is to show T ∈ CL(X, Y). The third step is to show lim_{n→∞} T_n = T.

Exercise E.116. Supplement the proof of Lemma E.115 with all details.

Corollary E.117. If X is a normed space over R, then the dual space of X, X′ = CL(X, R), is a Banach space with the operator norm.

Proof. This follows directly from Lemma E.115.

Corollary E.118. If X is a Banach space, then CL(X) is a Banach space with the operator norm.

Proof. This follows directly from Lemma E.115.
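For finite-dimensional operators, i.e. matrices, the operator norm (E.46) and the bounds of Lemmas E.111-E.112 can be probed numerically. The sketch below is my own crude illustration (random sampling of the unit sphere only gives a lower bound on the operator 2-norm, and the specific 2×2 matrices are hypothetical choices): it estimates ‖A‖ and checks the submultiplicative bound ‖TSx‖ ≤ ‖T‖‖S‖‖x‖ on a sample vector.

import random

def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm2(x):
    return sum(xi * xi for xi in x) ** 0.5

def op_norm_estimate(A, n, samples=5000):
    best = 0.0
    for _ in range(samples):
        x = [random.gauss(0, 1) for _ in range(n)]
        x = [xi / norm2(x) for xi in x]
        best = max(best, norm2(mat_vec(A, x)))
    return best   # a lower bound on ||A||, usually tight for small n

random.seed(1)
S = [[1.0, 2.0], [0.0, 1.0]]
T = [[3.0, 0.0], [1.0, 1.0]]
nS, nT = op_norm_estimate(S, 2), op_norm_estimate(T, 2)
x = [0.6, -0.8]                                   # a unit vector
print(norm2(mat_vec(T, mat_vec(S, x))) <= nT * nS + 1e-9)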
∀u, v ∈ V, ‖uv‖ ≤ ‖u‖‖v‖.   (E.50)

A Banach algebra is a normed algebra that is complete.

E.2.2 The topology of CL(X, Y)

Notation 12. For a vector space X and its subsets A, A₁, A₂, we write

∀α ∈ R, αA := {αa : a ∈ A};
∀w ∈ X, A + w := {a + w : a ∈ A};
∀A₁, A₂ ⊂ X, A₁ + A₂ := {a₁ + a₂ : a₁ ∈ A₁, a₂ ∈ A₂}.   (E.51)

Definition E.121. A linear map T : X → Y between normed spaces X and Y is open if the image of any open set under T is open.

Lemma E.122. Let X and Y be normed spaces. A bounded linear map T ∈ CL(X, Y) is open if and only if the image of the unit open ball in X under T contains some open ball centered at 0_Y in Y, i.e.,

Proof. Suppose that no F_n contains any nonempty open set. Then Lemma E.123 implies that X \ F_n is dense in X for each n ∈ N. Therefore we have

∃ x₁ ∈ (X \ F₁), ∃ r₁ > 0 s.t. B(x₁, r₁) ⊂ (X \ F₁).

Both B(x₁, r₁) and (X \ F₁) are open and thus their intersection D₂ := B(x₁, r₁) ∩ (X \ F₁) is also open. Hence,

∃ x₂ ∈ D₂, ∃ r₂ ∈ (0, r₁/2) s.t. B(x₂, r₂) ⊂ D₂;

proceed inductively and we have

∃ x_n ∈ D_n, ∃ r_n ∈ (0, r_{n−1}/2) s.t. B(x_n, r_n) ⊂ D_n,

where D_n := B(x_{n−1}, r_{n−1}) ∩ (X \ F_{n−1}). By construction, n > m implies B(x_n, r_n) ⊂ B(x_m, r_m) and

‖x_n − x_m‖ < r_m < r₁/2^{m−1}.

Hence (x_n)_{n∈N} is a Cauchy sequence and converges to x in the Banach space X. For any m ∈ N, we have

x ∈ B(x_m, r_m) ⊂ (X \ ∪_{i=1}^m F_i),
To sum up the above arguments, we have …

Proof. TS = I implies ker S = {0} because …
Proof. For the bijective map A, define a map B : Y → X,

∀v ∈ Y, A(Bv) = v.

The existence and uniqueness of Bv are guaranteed by the surjectivity and injectivity of A. Therefore, AB = I. Furthermore, BA = I follows from the injectivity of A and

∀v ∈ X, A(BAv) = (AB)Av = Av.

Finally, Lemma E.133 implies that B is a linear map.

Theorem E.135. Suppose X and Y are finite-dimensional normed spaces. Then a map A ∈ CL(X, Y) is invertible with A⁻¹ ∈ CL(Y, X) if and only if A is bijective.

Proof. This follows from Lemmas E.132, E.133, E.134, and Corollary E.101.

Example E.136. The map A : c₀₀ → c₀₀ given by

∀(x_n)_{n∈N} ∈ c₀₀, A(x₁, x₂, x₃, . . .) = (x₁, x₂/2, x₃/3, . . .)

is linear, bijective, and continuous (since ‖Ax‖_∞ ≤ ‖x‖_∞). However, it is not invertible in CL(c₀₀). Suppose it is and B ∈ CL(c₀₀) is the inverse of A. Then for the sequences e_m := (0, . . . , 0, 1, 0, . . .), where all terms are 0 except that the mth term is 1, we have

1 = ‖e_m‖_∞ = ‖BAe_m‖_∞ ≤ ‖B‖ ‖Ae_m‖_∞ = ‖B‖/m.

Hence ∀m ∈ N, ‖B‖ ≥ m and this contradicts Lemma E.109.

Theorem E.137 (Banach). For Banach spaces X and Y, a map T ∈ CL(X, Y) is invertible with T⁻¹ ∈ CL(Y, X) if and only if T is bijective.

Proof. The necessity follows from Lemma E.132. For sufficiency, the bijective map T induces a map T⁻¹ : Y → X,

∀y = Tx ∈ Y, T⁻¹(y) = x.

Since the bijectiveness of T guarantees that T⁻¹ is well defined, T⁻¹ is indeed an inverse of T. By Lemma E.133, T⁻¹ is linear. It remains to show that T⁻¹ is continuous. By the surjectivity of T and Theorem E.126, T is open. Hence T(U) is open whenever U is open. Meanwhile we have

(T⁻¹)⁻¹(U) = { y ∈ Y : T⁻¹ y ∈ U } = { y ∈ Y : y ∈ T(U) } = T(U).

Thus T⁻¹ is continuous by Theorem E.67.

E.2.4 Series of operators

Theorem E.141 (Neumann series). Suppose X is a Banach space and A ∈ CL(X) has ‖A‖ < 1. Then we have

(NST-1) I − A is invertible in CL(X),
(NST-2) (I − A)⁻¹ = I + A + · · · + A^n + · · · = Σ_{n=0}^∞ A^n,
(NST-3) ‖(I − A)⁻¹‖ ≤ 1/(1 − ‖A‖).

Proof. Since X is a Banach space, Corollary E.118 states that CL(X) is also a Banach space. By Theorem E.90, the convergence of Σ_{n=0}^∞ ‖A‖^n implies that the sequence (S_n)_{n∈N} with

S_n = Σ_{k=0}^n A^k

converges to some S ∈ CL(X). It follows that

S_n A = A S_n = Σ_{k=1}^{n+1} A^k = S_{n+1} − I,
‖A S_n − A S‖ ≤ ‖A‖ ‖S_n − S‖,  ‖S_n A − S A‖ ≤ ‖A‖ ‖S − S_n‖,
⇒ SA = AS = S − I
⇒ (I − A)S = I = S(I − A)
⇒ (I − A)⁻¹ = S = Σ_{n=0}^∞ A^n,

where the last step follows from Definition E.130. Finally, (NST-3) follows from

‖(I − A)⁻¹‖ = ‖ Σ_{n=0}^∞ A^n ‖ ≤ Σ_{n=0}^∞ ‖A^n‖ ≤ Σ_{n=0}^∞ ‖A‖^n = 1/(1 − ‖A‖),

where the first inequality follows from Theorem E.90 and the second inequality from Lemma E.112.

Theorem E.142. Suppose X is a Banach space. Then the exponential of A ∈ CL(X), defined as

e^A := Σ_{n=0}^∞ (1/n!) A^n,   (E.54)

converges in CL(X).

Proof. By Lemma E.112, we have

Σ_{n=0}^∞ ‖(1/n!) A^n‖ ≤ Σ_{n=0}^∞ (1/n!) ‖A‖^n = e^{‖A‖}.
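A finite-dimensional illustration of Theorem E.141 (matrices stand in for the operators; the specific 2×2 example and the use of the matrix ∞-norm are my own choices): if ‖A‖ < 1, the partial sums Σ_{k≤n} A^k approach (I − A)⁻¹ and obey the bound (NST-3).

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inf_norm(A):
    return max(sum(abs(a) for a in row) for row in A)

A = [[0.2, 0.3], [0.1, 0.4]]             # inf-norm 0.5 < 1
I = [[1.0, 0.0], [0.0, 1.0]]
S, P = I, I                              # S = partial sum, P = current power A^k
for _ in range(60):
    P = mat_mul(P, A)
    S = mat_add(S, P)

IminusA = [[0.8, -0.3], [-0.1, 0.6]]
print(mat_mul(IminusA, S))               # approximately the identity (NST-2)
print(inf_norm(S) <= 1.0 / (1.0 - inf_norm(A)))   # (NST-3)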
Corollary E.145. For a Banach space X and A ∈ CL(X), e^A is always invertible with inverse e^{−A}.

Theorem E.146 (Existence and uniqueness of ODEs). For a Banach space X and A ∈ CL(X), the IVP

dx/dt (t) = A x(t)   (E.57)

with initial condition x(0) = x₀ ∈ X has a unique solution x(t) = e^{tA} x₀ for t ∈ R.

Proof. If x(t) solves (E.57), then

d/dt (e^{−tA} x(t)) = e^{−tA}(−A) x(t) + e^{−tA} (d/dt x(t)) = 0,

which implies e^{−tA} x(t) = x₀ and thus x(t) = e^{tA} x₀.

E.2.5 Uniform boundedness

Lemma E.147. Suppose X is a normed space and a subset A ⊂ X satisfies

• A is symmetric, i.e., −A = A;
• A is mid-point convex, i.e., ∀x, y ∈ A, (x + y)/2 ∈ A;
• there exists a nonempty open set U ⊂ A.

Then there exists δ > 0 such that B(0_X, δ) ⊂ A.

Proof. For α ≠ 0 and a ∈ X, the maps x ↦ x + a and x ↦ αx are both continuous with continuous inverses. By Theorem E.67, U being open in X implies that its preimage U + {−a} under x ↦ x + a is also open in X. Adopting the notations in (E.51), we find that the set

U + (−A) := ∪_{a∈A} (U + {−a})

is open since it is a union of open sets. For a ∈ U, we have

0_X = (a − a)/2 ∈ (U + (−A))/2 ⊂ (A + (−A))/2 = (A + A)/2 = A,

where the last two equalities follow from A being symmetric and mid-point convex, respectively. The proof is completed by Lemma D.190 and (U + (−A))/2 being open.

Theorem E.148 (Uniform boundedness principle). Suppose X is a Banach space and Y is a normed linear space. For a family of maps T_i ∈ CL(X, Y), i ∈ I, "pointwise boundedness" implies "uniform boundedness":

∀x ∈ X, sup_{i∈I} ‖T_i x‖ < +∞  ⇒  sup_{i∈I} ‖T_i‖ < +∞.

Proof. For any given n ∈ N, we define

F_n := ∩_{i∈I} {x ∈ X : ‖T_i x‖ ≤ n} = {x ∈ X : sup_{i∈I} ‖T_i x‖ ≤ n}.

As an intersection of closed sets, each F_n is closed. By pointwise boundedness, we have X = ∪_{n∈N} F_n. The Baire theorem E.124 implies that there exists some F_n that contains a nonempty open subset. Since F_n is also symmetric and mid-point convex, Lemma E.147 implies that F_n contains an open ball B(0_X, δ). Consequently, x ∈ B(0_X, δ) ⊂ F_n implies

‖x‖ < δ ⇒ ∀i ∈ I, ‖T_i x‖ ≤ n.

Thus for any nonzero x ∈ X, there exists y = (δ/2)(x/‖x‖) such that

∀i ∈ I, ‖T_i y‖ ≤ n ⇒ ‖T_i x‖ ≤ (2n/δ) ‖x‖,

and the proof is completed by Lemma E.110.

Example E.149. Many PDEs can be written in the form

T x = y,

where y is a known vector incorporating initial and boundary conditions, x is the unknown, and T is a continuous linear operator. If the PDE is well-posed, we can often assume that T is a bijection; hence by Theorem E.137 the inverse of T is a bounded linear operator and we write x = T⁻¹ y. In numerically solving the PDE, we usually approximate y by a grid function y_n and approximate T⁻¹ by a discrete operator T_n⁻¹. Convergence usually means

∀y ∈ C^r(Ω), lim_{n→∞} y_n = y, lim_{n→∞} T_n⁻¹ y_n = x,

i.e., sup_{n∈N} ‖T_n⁻¹ y_n‖ < ∞. Theorem E.148 then implies sup_{n∈N} ‖T_n⁻¹‖ < ∞, which usually implies some form of numerical stability.

Theorem E.150 (Banach-Steinhaus). Suppose X and Y are Banach spaces. If a sequence (T_n)_{n∈N} ⊂ CL(X, Y) is such that lim_{n→∞} T_n x exists for each x ∈ X, then the map x ↦ lim_{n→∞} T_n x belongs to CL(X, Y).

Proof. Clearly the map T(x) = lim_{n→∞} T_n x is linear; it remains to show that it is continuous. Because the limit lim_{n→∞} T_n x exists, we have sup_{n∈N} ‖T_n x‖ < ∞ for all x ∈ X. Then Theorem E.148 implies sup_{n∈N} ‖T_n‖ < ∞. Hence

∃M ∈ R, ∀x ∈ X, ∀n ∈ N, ‖T_n x‖ ≤ M‖x‖;

the limit of the above yields ∀x ∈ X, ‖Tx‖ ≤ M‖x‖. The proof is completed by Theorem E.96.

E.3 Dual spaces of normed spaces

Definition E.151. The dual space of a normed vector space X over a field F, written X′, is the normed space CL(X, F) equipped with the operator norm. The elements of X′ are called bounded linear functionals.

Theorem E.152 (Isomorphism of (ℓ^p)′ and ℓ^q). (ℓ^p)′ ≅ ℓ^q where p and q are conjugate exponents with p ≠ ∞.
Proof. Consider a given T ∈ CL(ℓ^p, R). Let e_k ∈ ℓ^p denote the sequence (0, · · · , 0, 1, 0, · · · ) with the kth term being 1 and all other terms being 0. Define a function φ : (ℓ^p)′ → ℓ^q,

φ(T) = y := (T e_k)_{k∈N⁺},

and we need to show y ∈ ℓ^q.

For p = 1, we have y ∈ ℓ^∞ because |y_k| = |T(e_k)| ≤ ‖T‖ ‖e_k‖₁ = ‖T‖.

For p > 1, we have (Σ_{k=1}^n |y_k|^q)^{1/q} ≤ ‖T‖ for n > 1 since

Σ_{k=1}^n |y_k|^q = Σ_{k=1}^n |y_k|^{q−1} T(e_k) sgn(y_k)
= T( Σ_{k=1}^n |y_k|^{q−1} e_k sgn(y_k) )
≤ ‖T‖ ‖ Σ_{k=1}^n |y_k|^{q−1} e_k sgn(y_k) ‖_p
= ‖T‖ ( Σ_{k=1}^n |y_k|^{p(q−1)} )^{1/p} = ‖T‖ ( Σ_{k=1}^n |y_k|^q )^{1/p}.

Then y ∈ ℓ^q follows from taking the limit n → ∞.

It is straightforward to show that φ is a continuous linear bijection. The proof is completed by Definition E.138.

E.3.1 The Hahn-Banach theorems

… By Definition A.45, ĝ is an upper bound of C with respect to ≺. Then Zorn's lemma (Axiom A.46) implies that E_p(f) has a maximal element T.

We claim that dom(T) = X. Suppose this is not true; then we can choose y₁ ≠ 0 in X \ dom(T) and consider the subspace Y₁ = span(y₁, dom(T)). Any x ∈ Y₁ can be uniquely expressed as x = y + αy₁ with y ∈ dom(T) and α ∈ R. Indeed, x = y + αy₁ = z + βy₁ implies y − z = (β − α)y₁ with y − z ∈ dom(T) and y₁ ∉ dom(T), and thus we must have y − z = 0 = (β − α)y₁. We then extend T to a linear functional g₁ : Y₁ → R that satisfies

(∗):  g₁(x) = g₁(y + αy₁) = T(y) + αc,

where c is some real constant to be determined later. Since T ≺ g₁, it suffices to prove that g₁ ∈ E_p(f), because this would contradict the conclusion in the first paragraph that T is a maximal element of E_p(f).

It remains to show that there exists some choice of c such that ∀x ∈ dom(g₁), g₁(x) ≤ p(x). For a, b ∈ dom(T),

T(a) − T(b) ≤ T(a − b) ≤ p(a − b) ≤ p(a + y₁) + p(−y₁ − b),

which implies

−T(b) − p(−y₁ − b) ≤ −T(a) + p(y₁ + a).

Since a, b ∈ dom(T) are arbitrary, we choose any c satisfying
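Theorem E.152 can be illustrated numerically on truncated sequences (my own finite stand-ins for ℓ^p elements; the specific y below is a hypothetical choice): a sequence y in ℓ^q induces the functional T(x) = Σ_k x_k y_k on ℓ^p, Hölder's inequality gives |T(x)| ≤ ‖x‖_p ‖y‖_q, and the extremal choice x_k = |y_k|^{q−1} sgn(y_k) used in the proof attains ‖T‖ = ‖y‖_q.

p = 3.0
q = p / (p - 1.0)
y = [0.9, -0.5, 0.3, 0.1, -0.05]             # a (truncated) element of l^q

def norm(v, r):
    return sum(abs(a)**r for a in v)**(1.0 / r)

def T(x):
    return sum(a * b for a, b in zip(x, y))

x = [0.2, 0.7, -0.4, 0.0, 1.0]
print(abs(T(x)) <= norm(x, p) * norm(y, q) + 1e-12)   # Hoelder bound

ext = [abs(b)**(q - 1) * (1 if b > 0 else -1 if b < 0 else 0) for b in y]
print(T(ext) / norm(ext, p), norm(y, q))              # the two numbers agree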
Proof. First, we prove the simple case F = R, cf. Exercise B.3. Since F = R yields f(x) ≤ |f(x)| ≤ p(x), Theorem E.155 implies that we can extend f(x) to a linear functional T : X → R with T(x) ≤ p(x). Then (E.60) follows from

−T(x) = T(−x) ≤ p(−x) = |−1| p(x) = p(x).

For F = C, let u(x) be the real part of f(x). Lemma B.48 states that f(x) = u(x) − i u(ix). Since u(x) is a real linear functional satisfying |u(x)| ≤ |f(x)| ≤ p(x), the first paragraph implies that we can extend u(x) to a real linear functional T_u : X → R satisfying (E.60). Then we construct a map T_f : X → C by T_f(x) = T_u(x) − i T_u(ix). By Lemma B.49, T_f is a complex linear functional. Furthermore, the polar form T_f(x) = |T_f(x)| e^{iθ} yields

|T_f(x)| = T_f(x) e^{−iθ} = T_f(x e^{−iθ}) = T_u(x e^{−iθ}) ≤ p(x e^{−iθ}) = |e^{−iθ}| p(x) = p(x).

… there exists some a ∈ ℓ¹ such that φ_a = Λ in Example E.158. However, this cannot hold for the sequences

e_k = (0, · · · , 0, 1, 0, · · · ) ∈ ℓ^∞,

where the kth term is 1 and all other terms are 0, because

∀n ∈ N, 0 = Λ(e_n) = λ(e_n) = φ_a(e_n) = a_n ⇒ a = 0 ⇒ Λ = 0,

but Λ(1, 1, · · · ) = 1 shows that Λ ≠ 0. To sum up, (ℓ^∞)′ ≇ ℓ¹ because (ℓ^∞)′ is "bigger" than ℓ¹.

E.3.2 Bounded linear functionals

Theorem E.160. For any x₀ ≠ 0 of a normed space X, there exists a bounded linear functional T on X such that
where P_n = {t₀ = a, t₁, . . . , t_n = b} is a partition of [a, b] as in Definition C.62 and P[a, b] is the set of all such partitions.

Definition E.166. A function w : [a, b] → R is said to have bounded variation on [a, b] if var(w) is finite on [a, b].

Lemma E.167. Denote by BV[a, b] the set of all functions of bounded variation on [a, b] with the usual pointwise operations. Then (BV[a, b], ‖·‖) is a normed space where

∀w ∈ BV[a, b], ‖w‖ := |w(a)| + var(w).   (E.66)

Exercise E.168. Prove Lemma E.167.

where ξ_t ∈ BV[a, b] is the characteristic function of [a, t],

ξ_t(y) := { 1 if y ∈ [a, t];  0 otherwise }.

Hence for a partition P_n = {t₀ = a, t₁, . . . , t_n = b} we have

(∗): ∀t ∈ (t_{j−1}, t_j],  ξ_{t₀}(t) = · · · = ξ_{t_{j−1}}(t) = 0,  ξ_{t_j}(t) = · · · = ξ_{t_n}(t) = 1,  (ξ_{t_j} − ξ_{t_{j−1}})(t) = 1,  and (ξ_{t_i} − ξ_{t_{i−1}})(t) = 0 for all i ≠ j.
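The quantities in Definition E.166 and the norm (E.66) are easy to compute on concrete functions. The sketch below is my own illustration (the sample function w and the uniform partitions are hypothetical choices): it approximates var(w) by the partition sums over increasingly fine uniform partitions.

def variation(w, a, b, n):
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(w(ts[j]) - w(ts[j - 1])) for j in range(1, n + 1))

w = lambda t: abs(t - 0.5)          # total variation 1 on [0, 1]
for n in (2, 10, 100):
    print(n, variation(w, 0.0, 1.0, n))
print("norm (E.66):", abs(w(0.0)) + variation(w, 0.0, 1.0, 1000))   # |w(a)| + var(w)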
Second, we claim that var(w) ≤ ‖T‖. For any n ∈ N,

Σ_{j=1}^n |w(t_j) − w(t_{j−1})| = |T(ξ_{t₁})| + Σ_{j=2}^n |T(ξ_{t_j}) − T(ξ_{t_{j−1}})|
= T(ξ_{t₁}) u₁ + Σ_{j=2}^n u_j ( T(ξ_{t_j}) − T(ξ_{t_{j−1}}) )
= T( ξ_{t₁} u₁ + Σ_{j=2}^n u_j (ξ_{t_j} − ξ_{t_{j−1}}) )
≤ ‖T‖ ‖ ξ_{t₁} u₁ + Σ_{j=2}^n u_j (ξ_{t_j} − ξ_{t_{j−1}}) ‖_∞
≤ ‖T‖,

where in the second step u_j ∈ C is given by …

… By the definition of ξ_t and (∗), we have z_n(a) = x(a) and … Hence z_n is bounded. Furthermore,

∀ε > 0, ∃δ > 0 s.t. max_j |t_j − t_{j−1}| < δ ⇒ ‖z_n − x‖_∞ < ε.

… above equation and applying Lemma E.59, …

… Definition E.170, and lim_{n→∞} T(z_n) = lim_{n→∞} RS_n(x; w):

|T(z_n) − RS_n(x; w)| = | Σ_{j=1}^n [x(t_j) − x(t_{j−1})] [w(t_j) − w(t_{j−1})] | ≤ h(P_n) Σ_{j=1}^n |w(t_j) − w(t_{j−1})|.

Finally, Lemma E.172 and (E.69) yield

|f(x)| ≤ max_{t∈[a,b]} |x(t)| var(w) = ‖x‖_∞ var(w).

Take the supremum over all x ∈ C[a, b] of unit norm and we have ‖f‖_∞ ≤ var(w). The proof is then completed by ‖f‖_∞ ≥ var(w), the conclusion of the second paragraph.

∀x ∈ X, ∀ψ ∈ Y′, (T′ψ)(x) = ψ(Tx).   (E.70)

• ∀ψ ∈ Y′, T′ψ ∈ X′;
• T′ ∈ CL(Y′, X′).

Example E.178. A dual operator D′ of the differentiation … ∈ BV[a, b], ∀y ∈ C[0, 1], ψ(y) = ∫₀¹ y(t) dw_ψ.