Preface

Computational mathematics is essentially the foundation of modern scientific computing. Traditional ways of doing sciences consist of two major paradigms: by theory and by experiment. With the steady increase in computer power, there emerges a third paradigm of doing sciences: by computer simulation. Numerical algorithms are the very essence of any computer simulation, and computational mathematics is just the science of developing and analyzing numerical algorithms.
The science that studies numerical algorithms is numerical analysis or more broadly computational mathematics. Loosely speaking, numerical algorithms and analysis should include four categories of algorithms: numerical linear algebra, numerical optimization, numerical solutions of differential equations (ODEs and PDEs) and stochastic data modelling.
Many numerical algorithms were developed well before the computer was invented. For example, Newton's method for finding roots of nonlinear equations was developed in 1669, and Gauss quadrature for numerical integration was formulated in 1814. However, their true power and efficiency have been demonstrated again and again in modern scientific computing. Since the invention of the modern computer in the 1940s, a great number of numerical algorithms have been developed, especially from the 1950s onwards. As the speed of computers increases, together with the increase in the efficiency of numerical algorithms, a diverse range of complex and challenging problems in mathematics, science and engineering can nowadays be solved numerically to very high accuracy. Numerical algorithms have become more important than ever.
The topics of computational mathematics are broad and the related literature is vast. It is often a daunting task for beginners to find the right book(s) and to learn the right algorithms that are widely used in computational mathematics. Even for lecturers and educators, it is no trivial task to decide what algorithms to teach and to provide balanced coverage of a wide range of topics, because there are so many algorithms to choose from.
The first edition of this book was published by World Scientific Publishing in 2008 and was well received. Many university courses used it as a main reference, and constructive feedback and helpful comments have also been received from readers. This second edition has incorporated all these comments and consequently includes additional and new algorithms to reflect state-of-the-art developments such as computational intelligence and swarm intelligence.
Therefore, this new edition strives to provide extensive coverage of efficient algorithms commonly used in computational mathematics and modern scientific computing. It covers all the major topics including root-finding algorithms, numerical integration, interpolation, linear algebra, eigenvalues, numerical methods of ordinary differential equations (ODEs) and partial differential equations (PDEs), finite difference methods, finite element methods, finite volume methods, algorithm complexity, optimization, mathematical programming, stochastic models such as least squares and regression, machine learning such as neural networks and support vector machines, and computational intelligence and swarm intelligence such as cuckoo search, bat algorithm, firefly algorithm as well as particle swarm optimization.
The book covers both traditional methods and new algorithms with
dozens of worked examples to demonstrate how these algorithms work.
Thus, this book can be used as a textbook and/or reference book, especially
suitable for undergraduates and graduates in computational mathematics,
engineering, computer science, computational intelligence, data science and
scientific computing.
Xin-She Yang

London, 2014

Contents

Preface v

I Mathematical Foundations 1

1. Mathematical Foundations 3
1.1 The Essence of an Algorithm . . . . . . . . . . . . . . . . 3
1.2 Big-O Notations . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Differentiation and Integration . . . . . . . . . . . . . . . 6
1.4 Vector and Vector Calculus . . . . . . . . . . . . . . . . . 10
1.5 Matrices and Matrix Decomposition . . . . . . . . . . . . 15
1.6 Determinant and Inverse . . . . . . . . . . . . . . . . . . . 20
1.7 Matrix Exponential . . . . . . . . . . . . . . . . . . . . . . 24
1.8 Hermitian and Quadratic Forms . . . . . . . . . . . . . . . 26
1.9 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . 28
1.10 Definiteness of Matrices . . . . . . . . . . . . . . . . . . . 31

2. Algorithmic Complexity, Norms and Convexity 33


2.1 Computational Complexity . . . . . . . . . . . . . . . . . 33
2.2 NP-Complete Problems . . . . . . . . . . . . . . . . . . . 34
2.3 Vector and Matrix Norms . . . . . . . . . . . . . . . . . . 35
2.4 Distribution of Eigenvalues . . . . . . . . . . . . . . . . . 37
2.5 Spectral Radius of Matrices . . . . . . . . . . . . . . . . . 44
2.6 Hessian Matrix . . . . . . . . . . . . . . . . . . . . . . . . 47
2.7 Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . 48


3. Ordinary Differential Equations 51


3.1 Ordinary Differential Equations . . . . . . . . . . . . . . . 51
3.2 First-Order ODEs . . . . . . . . . . . . . . . . . . . . . . 52
3.3 Higher-Order ODEs . . . . . . . . . . . . . . . . . . . . . 53
3.4 Linear System . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.5 Sturm-Liouville Equation . . . . . . . . . . . . . . . . . . 58

4. Partial Differential Equations 59


4.1 Partial Differential Equations . . . . . . . . . . . . . . . . 59
4.1.1 First-Order Partial Differential Equation . . . . . 60
4.1.2 Classification of Second-Order Equations . . . . . 61
4.2 Mathematical Models . . . . . . . . . . . . . . . . . . . . . 61
4.2.1 Parabolic Equation . . . . . . . . . . . . . . . . . 61
4.2.2 Poisson’s Equation . . . . . . . . . . . . . . . . . . 61
4.2.3 Wave Equation . . . . . . . . . . . . . . . . . . . . 62
4.3 Solution Techniques . . . . . . . . . . . . . . . . . . . . . 64
4.3.1 Separation of Variables . . . . . . . . . . . . . . . 65
4.3.2 Laplace Transform . . . . . . . . . . . . . . . . . . 67
4.3.3 Similarity Solution . . . . . . . . . . . . . . . . . . 68

II Numerical Algorithms 71
5. Roots of Nonlinear Equations 73
5.1 Bisection Method . . . . . . . . . . . . . . . . . . . . . . . 73
5.2 Simple Iterations . . . . . . . . . . . . . . . . . . . . . . . 75
5.3 Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . 76
5.4 Iteration Methods . . . . . . . . . . . . . . . . . . . . . . . 78
5.5 Numerical Oscillations and Chaos . . . . . . . . . . . . . . 81

6. Numerical Integration 85
6.1 Trapezium Rule . . . . . . . . . . . . . . . . . . . . . . . . 86
6.2 Simpson’s Rule . . . . . . . . . . . . . . . . . . . . . . . . 87
6.3 Gaussian Integration . . . . . . . . . . . . . . . . . . . . . 89

7. Computational Linear Algebra 95


7.1 System of Linear Equations . . . . . . . . . . . . . . . . . 95
7.2 Gauss Elimination . . . . . . . . . . . . . . . . . . . . . . 97
7.3 LU Factorization . . . . . . . . . . . . . . . . . . . . . . . 101


7.4 Iteration Methods . . . . . . . . . . . . . . . . . . . . . . . 103
7.4.1 Jacobi Iteration Method . . . . . . . . . . . . . . . 103
7.4.2 Gauss-Seidel Iteration . . . . . . . . . . . . . . . . 107
7.4.3 Relaxation Method . . . . . . . . . . . . . . . . . 108
7.5 Newton-Raphson Method . . . . . . . . . . . . . . . . . . 109
7.6 QR Decomposition . . . . . . . . . . . . . . . . . . . . . . 110
7.7 Conjugate Gradient Method . . . . . . . . . . . . . . . . . 115

8. Interpolation 117
8.1 Spline Interpolation . . . . . . . . . . . . . . . . . . . . . . 117
8.1.1 Linear Spline Functions . . . . . . . . . . . . . . . 117
8.1.2 Cubic Spline Functions . . . . . . . . . . . . . . . 118
8.2 Lagrange Interpolating Polynomials . . . . . . . . . . . . . 123
8.3 Bézier Curve . . . . . . . . . . . . . . . . . . . . . . . . . 125

III Numerical Methods of PDEs 127


9. Finite Difference Methods for ODEs 129
9.1 Integration of ODEs . . . . . . . . . . . . . . . . . . . . . 129
9.2 Euler Scheme . . . . . . . . . . . . . . . . . . . . . . . . . 130
9.3 Leap-Frog Method . . . . . . . . . . . . . . . . . . . . . . 131
9.4 Runge-Kutta Method . . . . . . . . . . . . . . . . . . . . . 132
9.5 Shooting Methods . . . . . . . . . . . . . . . . . . . . . . 134

10. Finite Difference Methods for PDEs 139


10.1 Hyperbolic Equations . . . . . . . . . . . . . . . . . . . . . 139
10.2 Parabolic Equation . . . . . . . . . . . . . . . . . . . . . . 142
10.3 Elliptical Equation . . . . . . . . . . . . . . . . . . . . . . 143
10.4 Spectral Methods . . . . . . . . . . . . . . . . . . . . . . . 146
10.5 Pattern Formation . . . . . . . . . . . . . . . . . . . . . . 148
10.6 Cellular Automata . . . . . . . . . . . . . . . . . . . . . . 150

11. Finite Volume Method 153


11.1 Concept of the Finite Volume . . . . . . . . . . . . . . . . 153
11.2 Elliptic Equations . . . . . . . . . . . . . . . . . . . . . . . 154
11.3 Parabolic Equations . . . . . . . . . . . . . . . . . . . . . 155
11.4 Hyperbolic Equations . . . . . . . . . . . . . . . . . . . . . 156
12. Finite Element Method 157


12.1 Finite Element Formulation . . . . . . . . . . . . . . . . . 157
12.1.1 Weak Formulation . . . . . . . . . . . . . . . . . . 157
12.1.2 Galerkin Method . . . . . . . . . . . . . . . . . . . 158
12.1.3 Shape Functions . . . . . . . . . . . . . . . . . . . 159
12.2 Derivatives and Integration . . . . . . . . . . . . . . . . . 163
12.2.1 Derivatives . . . . . . . . . . . . . . . . . . . . . . 163
12.2.2 Gauss Quadrature . . . . . . . . . . . . . . . . . . 164
12.3 Poisson’s Equation . . . . . . . . . . . . . . . . . . . . . . 165
12.4 Transient Problems . . . . . . . . . . . . . . . . . . . . . . 169

IV Mathematical Programming 171

13. Mathematical Optimization 173


13.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 173
13.2 Optimality Criteria . . . . . . . . . . . . . . . . . . . . . . 175
13.3 Unconstrained Optimization . . . . . . . . . . . . . . . . . 177
13.3.1 Univariate Functions . . . . . . . . . . . . . . . . 177
13.3.2 Multivariate Functions . . . . . . . . . . . . . . . 178
13.4 Gradient-Based Methods . . . . . . . . . . . . . . . . . . . 180
13.4.1 Newton’s Method . . . . . . . . . . . . . . . . . . 181
13.4.2 Steepest Descent Method . . . . . . . . . . . . . . 182

14. Mathematical Programming 187


14.1 Linear Programming . . . . . . . . . . . . . . . . . . . . . 187
14.2 Simplex Method . . . . . . . . . . . . . . . . . . . . . . . 189
14.2.1 Basic Procedure . . . . . . . . . . . . . . . . . . . 189
14.2.2 Augmented Form . . . . . . . . . . . . . . . . . . 191
14.2.3 A Case Study . . . . . . . . . . . . . . . . . . . . 192
14.3 Nonlinear Programming . . . . . . . . . . . . . . . . . . . 196
14.4 Penalty Method . . . . . . . . . . . . . . . . . . . . . . . . 196
14.5 Lagrange Multipliers . . . . . . . . . . . . . . . . . . . . . 197
14.6 Karush-Kuhn-Tucker Conditions . . . . . . . . . . . . . . 199
14.7 Sequential Quadratic Programming . . . . . . . . . . . . . 200
14.7.1 Quadratic Programming . . . . . . . . . . . . . . 200
14.7.2 Sequential Quadratic Programming . . . . . . . . 200
14.8 No Free Lunch Theorems . . . . . . . . . . . . . . . . . . 202
V Stochastic Methods and Data Modelling 205

15. Stochastic Models 207


15.1 Random Variables . . . . . . . . . . . . . . . . . . . . . . 207
15.2 Binomial and Poisson Distributions . . . . . . . . . . . . . 209
15.3 Gaussian Distribution . . . . . . . . . . . . . . . . . . . . 211
15.4 Other Distributions . . . . . . . . . . . . . . . . . . . . . . 213
15.5 The Central Limit Theorem . . . . . . . . . . . . . . . . . 215
15.6 Weibull Distribution . . . . . . . . . . . . . . . . . . . . . 216

16. Data Modelling 221


16.1 Sample Mean and Variance . . . . . . . . . . . . . . . . . 221
16.2 Method of Least Squares . . . . . . . . . . . . . . . . . . . 223
16.2.1 Maximum Likelihood . . . . . . . . . . . . . . . . 223
16.2.2 Linear Regression . . . . . . . . . . . . . . . . . . 223
16.3 Correlation Coefficient . . . . . . . . . . . . . . . . . . . . 226
16.4 Linearization . . . . . . . . . . . . . . . . . . . . . . . . . 227
16.5 Generalized Linear Regression . . . . . . . . . . . . . . . . 229
16.6 Nonlinear Regression . . . . . . . . . . . . . . . . . . . . . 233
16.7 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . 237
16.7.1 Confidence Interval . . . . . . . . . . . . . . . . . 237
16.7.2 Student’s t-Distribution . . . . . . . . . . . . . . . 238
16.7.3 Student’s t-Test . . . . . . . . . . . . . . . . . . . 240

17. Data Mining, Neural Networks and Support Vector Machine 243
17.1 Clustering Methods . . . . . . . . . . . . . . . . . . . . . . 243
17.1.1 Hierarchy Clustering . . . . . . . . . . . . . . . . . 243
17.1.2 k-Means Clustering Method . . . . . . . . . . . . 244
17.2 Artificial Neural Networks . . . . . . . . . . . . . . . . . . 247
17.2.1 Artificial Neuron . . . . . . . . . . . . . . . . . . . 247
17.2.2 Artificial Neural Networks . . . . . . . . . . . . . 248
17.2.3 Back Propagation Algorithm . . . . . . . . . . . . 250
17.3 Support Vector Machine . . . . . . . . . . . . . . . . . . . 251
17.3.1 Classifications . . . . . . . . . . . . . . . . . . . . 251
17.3.2 Statistical Learning Theory . . . . . . . . . . . . . 252
17.3.3 Linear Support Vector Machine . . . . . . . . . . 253
17.3.4 Kernel Functions and Nonlinear SVM . . . . . . . 256
18. Random Number Generators and Monte Carlo Method 259


18.1 Linear Congruential Algorithms . . . . . . . . . . . . . . . 259
18.2 Uniform Distribution . . . . . . . . . . . . . . . . . . . . . 260
18.3 Generation of Other Distributions . . . . . . . . . . . . . . 262
18.4 Metropolis Algorithms . . . . . . . . . . . . . . . . . . . . 266
18.5 Monte Carlo Methods . . . . . . . . . . . . . . . . . . . . 267
18.6 Monte Carlo Integration . . . . . . . . . . . . . . . . . . . 270
18.7 Importance of Sampling . . . . . . . . . . . . . . . . . . . 273
18.8 Quasi-Monte Carlo Methods . . . . . . . . . . . . . . . . . 275
18.9 Quasi-Random Numbers . . . . . . . . . . . . . . . . . . . 276

VI Computational Intelligence 279


19. Evolutionary Computation 281
19.1 Introduction to Evolutionary Computation . . . . . . . . . 281
19.2 Simulated Annealing . . . . . . . . . . . . . . . . . . . . . 282
19.3 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . 286
19.3.1 Basic Procedure . . . . . . . . . . . . . . . . . . . 287
19.3.2 Choice of Parameters . . . . . . . . . . . . . . . . 289
19.4 Differential Evolution . . . . . . . . . . . . . . . . . . . . . 291

20. Swarm Intelligence 295


20.1 Introduction to Swarm Intelligence . . . . . . . . . . . . . 295
20.2 Ant and Bee Algorithms . . . . . . . . . . . . . . . . . . . 296
20.3 Particle Swarm Optimization . . . . . . . . . . . . . . . . 297
20.4 Accelerated PSO . . . . . . . . . . . . . . . . . . . . . . . 299
20.5 Binary PSO . . . . . . . . . . . . . . . . . . . . . . . . . . 301

21. Swarm Intelligence: New Algorithms 303


21.1 Firefly Algorithm . . . . . . . . . . . . . . . . . . . . . . . 303
21.2 Cuckoo Search . . . . . . . . . . . . . . . . . . . . . . . . 306
21.3 Bat Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 310
21.4 Flower Algorithm . . . . . . . . . . . . . . . . . . . . . . . 313
21.5 Other Algorithms . . . . . . . . . . . . . . . . . . . . . . . 317

Bibliography 319
Index 325

Part I

Mathematical Foundations

Chapter 1

Mathematical Foundations

Computational mathematics concerns a wide range of topics, from basic root-finding algorithms and linear algebra to advanced numerical methods for partial differential equations and nonlinear mathematical programming. In order to introduce various algorithms, we first review some mathematical foundations briefly.

1.1 The Essence of an Algorithm

Let us start by asking: what is an algorithm? In essence, an algorithm is a step-by-step procedure of providing calculations or instructions. Many algorithms are iterative. The actual steps and procedures will depend on the algorithm used and the context of interest. However, in this book, we place more emphasis on iterative procedures and ways for constructing algorithms.
For example, a simple algorithm for finding the square root of any positive number k > 0, or x = \sqrt{k}, can be written as

x_{n+1} = \frac{1}{2}\Big(x_n + \frac{k}{x_n}\Big),    (1.1)

starting from a guess solution x_0 ≠ 0, say, x_0 = 1. Here, n is the iteration counter or index, also called the pseudo-time or generation counter. The above iterative equation comes from the re-arrangement of x^2 = k in the following form

\frac{x}{2} = \frac{k}{2x},    (1.2)

which can be rewritten as

x = \frac{1}{2}\Big(x + \frac{k}{x}\Big).    (1.3)


For example, for k = 7 with x_0 = 1, we have

x_1 = \frac{1}{2}\Big(x_0 + \frac{7}{x_0}\Big) = \frac{1}{2}\Big(1 + \frac{7}{1}\Big) = 4,    (1.4)

x_2 = \frac{1}{2}\Big(x_1 + \frac{7}{x_1}\Big) = 2.875,  \quad  x_3 ≈ 2.654891304,    (1.5)

x_4 ≈ 2.645767044,  \quad  x_5 ≈ 2.6457513111.    (1.6)

We can see that x_5 after just 5 iterations (or generations) is very close to the true value of \sqrt{7} = 2.64575131106459..., which shows that this iteration method is very efficient.
The reason that this iterative process works is that the series x_1, x_2, ..., x_n converges to the true value \sqrt{k} due to the fact that

\frac{x_{n+1}}{x_n} = \frac{1}{2}\Big(1 + \frac{k}{x_n^2}\Big) → 1,  \quad  x_n → \sqrt{k},    (1.7)

as n → ∞. However, a good choice of the initial value x_0 will speed up the convergence. A wrong choice of x_0 could make the iteration fail; for example, we cannot use x_0 = 0 as the initial guess, and we cannot use x_0 < 0 either as \sqrt{k} > 0 (in this case, the iterations will approach another root, -\sqrt{k}). So a sensible choice should be an educated guess. At the initial step, if x_0^2 < k, then x_0 is the lower bound and k/x_0 is the upper bound. If x_0^2 > k, then x_0 is the upper bound and k/x_0 is the lower bound. For subsequent iterations, the new bounds are x_n and k/x_n. In fact, the value x_{n+1} is always between these two bounds x_n and k/x_n, and the new estimate x_{n+1} is thus the mean or average of the two bounds. This guarantees that the series converges to the true value of \sqrt{k}. This method is similar to the well-known bisection method.
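As a quick illustration of Eq. (1.1), the short Python sketch below implements the iteration; the function name sqrt_iter, the tolerance and the stopping test are our own choices and are not part of the text.

```python
def sqrt_iter(k, x0=1.0, tol=1e-12, max_iter=100):
    """Approximate sqrt(k) by the iteration x_{n+1} = (x_n + k/x_n)/2 of Eq. (1.1)."""
    if k <= 0 or x0 <= 0:
        raise ValueError("require k > 0 and a positive initial guess x0")
    x = x0
    for n in range(1, max_iter + 1):
        x_new = 0.5 * (x + k / x)
        if abs(x_new - x) < tol:      # stop when successive iterates agree
            return x_new, n
        x = x_new
    return x, max_iter

root, iterations = sqrt_iter(7.0)
print(root, iterations)   # about 2.6457513110645906 after a handful of iterations
```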
You may have already wondered why x^2 = k was converted to Eq. (1.1). Why do we not write it as the following iterative formula:

x_{n+1} = \frac{k}{x_n},    (1.8)

starting from x_0 = 1? With this formula and k = 7, we have

x_1 = \frac{7}{x_0} = 7,  \quad  x_2 = \frac{7}{x_1} = 1,  \quad  x_3 = 7,  \quad  x_4 = 1,  \quad  x_5 = 7, ...,    (1.9)

which leads to an oscillation between the two distinct values 1 and 7. You may wonder whether this is a problem of the initial value x_0. In fact, for any initial value x_0 ≠ 0, the above formula will lead to oscillations between two values: x_0 and k/x_0. This clearly demonstrates that the way to design a good iterative formula is very important.

Mathematically speaking, an algorithm A is a procedure to generate a new and better solution x_{n+1} to a given problem from the current solution x_n at iteration (or pseudo-time) n. That is,

x_{n+1} = A(x_n),    (1.10)

where A is a mathematical function of x_n. In fact, A can be a set of mathematical equations in general. In some literature, especially that on numerical analysis, n is often used for the iteration index. In many textbooks, the upper index form x^{(n+1)} or x^{n+1} is commonly used. Here, x^{n+1} does not mean x to the power of n + 1. Such notations will become useful and no confusion will occur when used appropriately. We will use such notations when appropriate in this book.

1.2 Big-O Notations

In analyzing the complexity of an algorithm, we usually estimate the order of computational effort in terms of its problem size. This often requires the order notations, often in terms of big O and small o.
Loosely speaking, for two functions f(x) and g(x), if

\lim_{x \to x_0} \frac{f(x)}{g(x)} → K,    (1.11)

where K is a finite, non-zero limit, we write

f = O(g).    (1.12)

The big O notation means that f is asymptotically equivalent to the order of g(x). If the limit is unity or K = 1, we say f(x) is order of g(x). In this special case, we write

f ∼ g,    (1.13)

which is equivalent to f/g → 1 and g/f → 1 as x → x_0. Obviously, x_0 can be any value, including 0 and ∞. The notation ∼ does not necessarily mean ≈ in general, though it may give the same results, especially in the case when x → 0. For example, sin x ∼ x and sin x ≈ x if x → 0.
When we say f is order of 100 (or f ∼ 100), this does not mean f ≈ 100, but it can mean that f could be between about 50 and 150. The small o notation is often used if the limit tends to 0. That is

\lim_{x \to x_0} \frac{f}{g} → 0,    (1.14)

or

f = o(g).    (1.15)

If g > 0, f = o(g) is equivalent to f ≪ g. For example, for ∀x ∈ R, we have

e^x ≈ 1 + x + O(x^2) ≈ 1 + x + \frac{x^2}{2} + o(x).

Example 1.1: A classic example is Stirling's asymptotic series for factorials,

n! ∼ \sqrt{2\pi n}\,\Big(\frac{n}{e}\Big)^n \Big(1 + \frac{1}{12n} + \frac{1}{288 n^2} - \frac{139}{51840 n^3} - ...\Big),

which can demonstrate the fundamental difference between an asymptotic series and the standard approximate expansions. For the standard power expansions, the error R_k(h^k) → 0, but for an asymptotic series, the error of the truncated series R_k decreases compared with the leading term [here \sqrt{2\pi n}\,(n/e)^n]. However, R_n does not necessarily tend to zero. In fact,

R_2 = \frac{1}{12n} \cdot \sqrt{2\pi n}\,(n/e)^n,

is still very large as R_2 → ∞ if n ≫ 1. For example, for n = 100, we have n! = 9.3326 × 10^{157}, while the leading approximation is \sqrt{2\pi n}\,(n/e)^n = 9.3248 × 10^{157}. The difference between these two values is 7.7740 × 10^{154}, which is still very large, though three orders smaller than the leading approximation.
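The numbers quoted for n = 100 can be checked directly; the following is a minimal sketch using Python's standard math module (the variable names are our own).

```python
import math

n = 100
exact = math.factorial(n)                                   # n! as an exact integer
leading = math.sqrt(2 * math.pi * n) * (n / math.e) ** n    # sqrt(2*pi*n) * (n/e)^n
print(f"n!           = {float(exact):.4e}")                 # about 9.3326e+157
print(f"leading term = {leading:.4e}")                      # about 9.3248e+157
print(f"difference   = {float(exact) - leading:.4e}")       # about 7.77e+154
```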

1.3 Differentiation and Integration

Differentiation is essentially to find the gradient of a function. For any curve y = f(x), we define the gradient as

f'(x) ≡ \frac{dy}{dx} ≡ \frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}.    (1.16)

The gradient is also called the first derivative. The three notations f'(x), dy/dx and df(x)/dx are interchangeable. Conventionally, the notation dy/dx is called Leibnitz's notation, while the prime notation ' is called Lagrange's notation. Newton's dot notation \dot{y} = dy/dt is now exclusively used for time derivatives. The choice of such notations is purely for clarity, convention and/or personal preference.

From the basic definition of the derivative, we can verify that differentiation is a linear operator. That is to say that for any two functions f(x), g(x) and two constants α and β, the derivative or gradient of a linear combination of the two functions can be obtained by differentiating the combination term by term. We have

[αf(x) + βg(x)]' = αf'(x) + βg'(x),    (1.17)

which can easily be extended to multiple terms.
If y = f(u) is a function of u, and u is in turn a function of x, we want to calculate dy/dx. We then have

\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx},    (1.18)

or

\frac{df[u(x)]}{dx} = \frac{df(u)}{du} \cdot \frac{du(x)}{dx}.    (1.19)

This is the well-known chain rule.
It is straightforward to verify the product rule

(uv)' = uv' + vu'.    (1.20)

If we replace v by 1/v = v^{-1} and apply the chain rule

\frac{d(v^{-1})}{dx} = -1 \times v^{-2} \times \frac{dv}{dx} = -\frac{1}{v^2}\frac{dv}{dx},    (1.21)

we have the formula for quotients or the quotient rule

\frac{d(u/v)}{dx} = \frac{d(uv^{-1})}{dx} = u\Big(\frac{-1}{v^2}\Big)\frac{dv}{dx} + v^{-1}\frac{du}{dx} = \frac{v\frac{du}{dx} - u\frac{dv}{dx}}{v^2}.    (1.22)

For a smooth curve, it is relatively straightforward to draw a tangent line at any point; however, for a smooth surface, we have to use a tangent plane. For example, we now want to take the derivative of a function of two independent variables x and y, that is z = f(x, y) = x^2 + y^2/2. The question is probably 'with respect to' what? x or y? If we take the derivative with respect to x, then will it be affected by y? The answer is we can take the derivative with respect to either x or y while taking the other variable as constant. That is, we can calculate the derivative with respect to x in the usual sense by assuming that y = constant. Since there is more than one variable, we have more than one derivative and the derivatives can be associated with either the x-axis or the y-axis. We call such derivatives partial derivatives, and use the following notation

\frac{\partial z}{\partial x} \equiv \frac{\partial f(x,y)}{\partial x} \equiv f_x \equiv \frac{\partial f}{\partial x}\Big|_y = \lim_{h \to 0,\ y=\text{const}} \frac{f(x+h, y) - f(x, y)}{h}.    (1.23)

The notation \frac{\partial f}{\partial x}\big|_y emphasises the fact that y = constant; however, we often omit |_y and simply write \frac{\partial f}{\partial x} as we know this fact is implied.
Similarly, the partial derivative with respect to y is defined by

\frac{\partial z}{\partial y} \equiv \frac{\partial f(x,y)}{\partial y} \equiv f_y \equiv \frac{\partial f}{\partial y}\Big|_x = \lim_{k \to 0,\ x=\text{const}} \frac{f(x, y+k) - f(x, y)}{k}.    (1.24)

Then, the standard differentiation rules for univariate functions such as f(x) apply. For example, for z = f(x, y) = x^2 + y^2/2, we have

\frac{\partial f}{\partial x} = \frac{dx^2}{dx} + 0 = 2x,

and

\frac{\partial f}{\partial y} = 0 + \frac{d(y^2/2)}{dy} = \frac{1}{2} \times 2y = y,

where the appearance of 0 highlights the fact that dy/dx = dx/dy = 0 as x and y are independent variables.
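As a quick check of the two partial derivatives above, here is a short sketch using the sympy symbolic library (our choice of tool; any computer algebra system would do).

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2 / 2
print(sp.diff(f, x))   # 2*x, the partial derivative with respect to x
print(sp.diff(f, y))   # y, the partial derivative with respect to y
```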
Differentiation is used to find the gradient for a given function. Now a natural question is how to find the original function for a given gradient. This is the integration process, which can be considered as the reverse of the differentiation process. Since we know that

\frac{d \sin x}{dx} = \cos x,    (1.25)

that is, the gradient of sin x is cos x, we can easily say that the original function is sin x since its gradient is cos x. We can write

\int \cos x \, dx = \sin x + C,    (1.26)

where C is the constant of integration. Here dx is the standard notation showing that the integration is with respect to x, and we usually call this the integral. The function cos x is called the integrand.
The integration constant comes from the fact that a family of curves shifted by a constant will have the same gradient at their corresponding points. This means that the integration can only be determined up to an arbitrary constant. For this reason, we call it an indefinite integral.
Integration is more complicated than differentiation in general. Even when we know the derivative of a function, we have to be careful. For example, we know that (x^{n+1})' = (n+1)x^n or (\frac{1}{n+1} x^{n+1})' = x^n for any integer n, so we can write

\int x^n \, dx = \frac{1}{n+1} x^{n+1} + C.    (1.27)

However, there is a possible problem when n = −1 because 1/(n + 1) will become 1/0. In fact, the above integral is valid for any n except n = −1. When n = −1, we have

\int \frac{1}{x}\, dx = \ln x + C.    (1.28)

If we know that the gradient of a function F(x) is f(x) or F'(x) = f(x), it is possible and sometimes useful to express where the integration starts and ends, and we often write

\int_a^b f(x)\, dx = F(x)\Big|_a^b = F(b) - F(a).    (1.29)

Here a is called the lower limit of the integral, while b is the upper limit of the integral. In this case, the constant of integration drops out because the integral can be determined accurately. The integral becomes a definite integral and it corresponds to the area under the curve f(x) from a to b ≥ a.
From the differentiation rule

\frac{d(uv)}{dx} = u\frac{dv}{dx} + v\frac{du}{dx},    (1.30)

integrating with respect to x, we have

\int \frac{d(uv)}{dx}\, dx = uv = \int u\frac{dv}{dx}\, dx + \int v\frac{du}{dx}\, dx.    (1.31)

By rearranging, we have

\int u\frac{dv}{dx}\, dx = uv - \int v\frac{du}{dx}\, dx,    (1.32)

or

\int uv'\, dx = uv - \int vu'\, dx,    (1.33)

or simply

\int u\, dv = uv - \int v\, du.    (1.34)

This is the well-known formula for the technique known as integration by parts.

Example 1.2: To calculate \int x e^x\, dx, we can set u = x and v' = e^x, which gives u' = 1 and v = e^x. Now we have

\int x e^x\, dx = x e^x - \int e^x \cdot 1\, dx = x e^x - e^x + C.

However, this will not work if we set u = e^x and v' = x. Thus, care should be taken when using integration by parts.
Obviously, there are many other techniques for integration such as substitution and transformation. Integration can also be multivariate, which leads to multiple integrals. We will not discuss this further, and interested readers can refer to more advanced textbooks.
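As a quick symbolic companion to Example 1.2, the sketch below verifies the integration-by-parts result with the sympy library (an assumed dependency; the variable names are ours).

```python
import sympy as sp

x = sp.symbols('x')
F = sp.integrate(x * sp.exp(x), x)          # sympy omits the constant of integration C
print(F)                                    # (x - 1)*exp(x), i.e. x*e^x - e^x
print(sp.simplify(sp.diff(F, x)))           # x*exp(x), confirming F'(x) is the integrand
```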

1.4 Vector and Vector Calculus

Vector analysis is an important part of computational mathematics. Many quantities such as force, velocity, and deformation in sciences are vectors which have both a magnitude and a direction. Vectors are a special class of matrices. Here, we will briefly review the basic concepts in linear algebra.
A vector u is a set of ordered numbers u = (u_1, u_2, ..., u_n), where its components u_i (i = 1, ..., n) ∈ ℜ are real numbers. All these vectors form an n-dimensional vector space V^n. A simple example is the position vector p = (x, y, z), where x, y, z are the 3-D Cartesian coordinates.
To add any two vectors u = (u_1, ..., u_n) and v = (v_1, ..., v_n), we simply add their corresponding components,

u + v = (u_1 + v_1, u_2 + v_2, ..., u_n + v_n),    (1.35)
and the sum is also a vector. The addition of vectors is commutative (u + v = v + u) and associative [(u + v) + w = u + (v + w)]. This
is because each of the components is obtained by simple addition, which
means it has the same properties.

Fig. 1.1 Addition of vectors: (a) parallelogram a = u + v; (b) vector polygon b = u + v + w.

The zero vector 0 is a special case where all its components are zeros. The multiplication of a vector u with a scalar or constant α ∈ ℜ is carried out by the multiplication of each component,

αu = (αu_1, αu_2, ..., αu_n).    (1.36)

Thus, we have

−u = (−u_1, −u_2, ..., −u_n).    (1.37)

The dot product or inner product of two vectors x and y is defined as

x · y = x_1 y_1 + x_2 y_2 + ... + x_n y_n = \sum_{i=1}^{n} x_i y_i,    (1.38)

which is a real number. The length or norm of a vector x is the square root of the dot product of the vector with itself,

|x| = ||x|| = \sqrt{x · x} = \sqrt{\sum_{i=1}^{n} x_i^2}.    (1.39)

When ||x|| = 1, it is a unit vector. It is straightforward to check that the dot product has the following properties:

u · v = v · u,  \quad  u · (v + w) = u · v + u · w,    (1.40)

and

(αu) · (βv) = (αβ) u · v,    (1.41)

where α, β ∈ ℜ are constants.
The angle θ between two vectors u and v can be calculated using their dot product

u · v = ||u|| ||v|| cos(θ),  \quad  0 ≤ θ ≤ π,    (1.42)

which leads to

cos(θ) = \frac{u · v}{||u||\,||v||}.    (1.43)

If the dot product of these two vectors is zero or cos(θ) = 0 (i.e., θ = π/2), then we say that these two vectors are orthogonal.
Since |cos(θ)| ≤ 1, we get the useful Cauchy-Schwartz inequality:

|u · v| ≤ ||u|| ||v||.    (1.44)
The dot product of two vectors is a scalar or a number. On the other hand, the cross product or outer product of two vectors is a new vector

u = x × y = \begin{vmatrix} i & j & k \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix}
  = \begin{vmatrix} x_2 & x_3 \\ y_2 & y_3 \end{vmatrix} i + \begin{vmatrix} x_3 & x_1 \\ y_3 & y_1 \end{vmatrix} j + \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} k
  = (x_2 y_3 - x_3 y_2) i + (x_3 y_1 - x_1 y_3) j + (x_1 y_2 - x_2 y_1) k.    (1.45)

In fact, the norm of x × y is the area of the parallelogram formed by x and y. We have

||x × y|| = ||x|| ||y|| sin θ,    (1.46)

where θ is the angle between the two vectors. In addition, the vector
u = x × y is perpendicular to both x and y, following a right-hand rule.
It is straightforward to check that the cross product has the following
properties:
x × y = −y × x, (x + y) × z = x × z + y × z, (1.47)
and
(αx) × (βy) = (αβ) x × y,  \quad  α, β ∈ ℜ.    (1.48)
A very special case is u × u = 0. For unit vectors, we have
i × j = k, j × k = i, k × i = j. (1.49)

Example 1.3: For two 3-D vectors u = (4, 5, −6) and v = (2, −2, 1/2), their dot product is

u · v = 4 × 2 + 5 × (−2) + (−6) × 1/2 = −5.

As their moduli are

||u|| = \sqrt{4^2 + 5^2 + (-6)^2} = \sqrt{77},
||v|| = \sqrt{2^2 + (-2)^2 + (1/2)^2} = \sqrt{33}/2,

we can calculate the angle θ between the two vectors. We have

cos θ = \frac{u · v}{||u||\,||v||} = \frac{-5}{\sqrt{77} × \sqrt{33}/2} = -\frac{10}{11\sqrt{21}},

or

θ = cos^{-1}\Big(-\frac{10}{11\sqrt{21}}\Big) ≈ 101.4°.

Their cross product is

w = u × v = (5 × 1/2 − (−2) × (−6),  (−6) × 2 − 4 × 1/2,  4 × (−2) − 5 × 2) = (−19/2, −14, −18).

Similarly, we have

v × u = (19/2, 14, 18) = −u × v.

The norm of the cross product is

||w|| = \sqrt{(-19/2)^2 + (-14)^2 + (-18)^2} ≈ 24.70,

while

||u|| ||v|| sin θ = \sqrt{77} × \frac{\sqrt{33}}{2} × sin(101.4°) ≈ 24.70 = ||w||.

It is easy to verify that

u · w = 4 × (−19/2) + 5 × (−14) + (−6) × (−18) = 0,

and

v · w = 2 × (−19/2) + (−2) × (−14) + 1/2 × (−18) = 0.

Indeed, the vector w is perpendicular to both u and v.
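The numbers in Example 1.3 can be reproduced with NumPy, as in the following sketch (our own illustration, not part of the text).

```python
import numpy as np

u = np.array([4.0, 5.0, -6.0])
v = np.array([2.0, -2.0, 0.5])

dot = np.dot(u, v)                                   # -5.0
theta = np.degrees(np.arccos(dot / (np.linalg.norm(u) * np.linalg.norm(v))))
w = np.cross(u, v)                                   # [-9.5, -14., -18.]

print(dot, round(theta, 1))                          # -5.0 101.4
print(w, round(float(np.linalg.norm(w)), 2))         # cross product and its norm, about 24.70
print(np.dot(u, w), np.dot(v, w))                    # both 0: w is perpendicular to u and v
```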
Any vector v in an n-dimensional vector space V^n can be written as a combination of a set of n independent basis vectors or orthogonal spanning vectors e_1, e_2, ..., e_n, so that

v = v_1 e_1 + v_2 e_2 + ... + v_n e_n = \sum_{i=1}^{n} v_i e_i,    (1.50)

where the coefficients/scalars v_1, v_2, ..., v_n are the components of v relative to the basis e_1, e_2, ..., e_n. The most common basis vectors are the orthogonal unit vectors. In a three-dimensional case, they are i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1) for the x-, y-, z-axis, respectively. Thus, we have

x = x_1 i + x_2 j + x_3 k.    (1.51)

The three unit vectors satisfy i · j = j · k = k · i = 0.
Two non-zero vectors u and v are said to be linearly independent if αu + βv = 0 implies that α = β = 0. If α, β are not all zeros, then these two vectors are linearly dependent. Two linearly dependent vectors are parallel (u//v) to each other. Similarly, any three linearly dependent vectors u, v, w are in the same plane.
The differentiation of a vector is carried out over each component, treating each component as the usual differentiation of a scalar. Thus, from a position vector

P(t) = x(t) i + y(t) j + z(t) k,    (1.52)
we can write its velocity as

v = \frac{dP}{dt} = \dot{x}(t) i + \dot{y}(t) j + \dot{z}(t) k,    (1.53)

and acceleration as

a = \frac{d^2 P}{dt^2} = \ddot{x}(t) i + \ddot{y}(t) j + \ddot{z}(t) k,    (1.54)

where \dot{} ≡ d/dt. Conversely, the integral of v is

P = \int_0^t v\, dt + p_0,    (1.55)

where p_0 is a vector constant or the initial position at t = 0.
From the basic definition of differentiation, it is easy to check that the differentiation of vectors has the following properties:

\frac{d(αa)}{dt} = α\frac{da}{dt},  \quad  \frac{d(a · b)}{dt} = \frac{da}{dt} · b + a · \frac{db}{dt},    (1.56)

and

\frac{d(a × b)}{dt} = \frac{da}{dt} × b + a × \frac{db}{dt}.    (1.57)
Three important operators commonly used in vector analysis, especially in the formulation of mathematical models, are the gradient operator (grad or ∇), the divergence operator (div or ∇·) and the curl operator (curl or ∇×).
Sometimes, it is useful to calculate the directional derivative of φ at a point (x, y, z) in the direction of n,

\frac{\partial φ}{\partial n} = n · ∇φ = \frac{\partial φ}{\partial x}\cos(α) + \frac{\partial φ}{\partial y}\cos(β) + \frac{\partial φ}{\partial z}\cos(γ),    (1.58)

where n = (cos α, cos β, cos γ) is a unit vector and α, β, γ are the directional angles. Generally speaking, the gradient of any scalar function φ of x, y, z can be written in a similar way,

grad φ = ∇φ = \frac{\partial φ}{\partial x} i + \frac{\partial φ}{\partial y} j + \frac{\partial φ}{\partial z} k.    (1.59)

This is the same as the application of the del operator ∇ to the scalar function φ,

∇ = \frac{\partial}{\partial x} i + \frac{\partial}{\partial y} j + \frac{\partial}{\partial z} k.    (1.60)

The application of the gradient operator to a scalar field gives a vector field.
As the gradient operator is a linear operator, it is straightforward to check that it has the following properties:

∇(αψ + βφ) = α∇ψ + β∇φ,  \quad  ∇(ψφ) = ψ∇φ + φ∇ψ,    (1.61)

where α, β are constants and ψ, φ are scalar functions.
For a vector field

u(x, y, z) = u(x, y, z) i + v(x, y, z) j + w(x, y, z) k,    (1.62)

the application of the operator ∇ can lead to either a scalar field or a vector field, depending on how the del operator is applied to the vector field. The divergence of a vector field is the dot product of the del operator ∇ and u,

div u ≡ ∇ · u = \frac{\partial u_1}{\partial x} + \frac{\partial u_2}{\partial y} + \frac{\partial u_3}{\partial z},    (1.63)

and the curl of u is the cross product of the del operator and the vector field u,

curl u ≡ ∇ × u = \begin{vmatrix} i & j & k \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ u_1 & u_2 & u_3 \end{vmatrix}.    (1.64)

One of the most commonly used operators in engineering and science is the Laplacian operator

∇^2 φ = ∇ · (∇φ) = \frac{\partial^2 φ}{\partial x^2} + \frac{\partial^2 φ}{\partial y^2} + \frac{\partial^2 φ}{\partial z^2},    (1.65)

for Laplace's equation

Δφ ≡ ∇^2 φ = 0.    (1.66)

Some important theorems are often rewritten in terms of the above three operators, especially in fluid dynamics and finite element analysis. For example, Gauss's theorem connects the integral of divergence with the related surface integral

\int_\Omega (∇ · Q)\, d\Omega = \int_S Q · n\, dS.    (1.67)
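For readers who wish to experiment, the grad, div and curl operators can be evaluated symbolically; the sketch below uses sympy.vector with a scalar field and a vector field of our own choosing (not taken from the text).

```python
from sympy.vector import CoordSys3D, gradient, divergence, curl

# A Cartesian frame: N.x, N.y, N.z are coordinates, N.i, N.j, N.k the unit vectors.
N = CoordSys3D('N')

phi = N.x**2 * N.y + N.z                              # an example scalar field
u = N.x*N.y*N.i + N.y*N.z*N.j + N.z*N.x*N.k           # an example vector field

print(gradient(phi))      # 2*x*y along i, x**2 along j, 1 along k
print(divergence(u))      # y + z + x
print(curl(u))            # -y along i, -z along j, -x along k
```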

1.5 Matrices and Matrix Decomposition

Matrices are widely used in scientific computing, engineering and sciences, especially in the implementation of many algorithms. A matrix is a table or array of numbers or functions arranged in rows and columns. The elements or entries of a matrix A are often denoted as a_{ij}. For a matrix A with m rows and n columns,

A ≡ [a_{ij}] = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn} \end{pmatrix},    (1.68)

we say the size of A is m by n, or m × n. A is a square matrix if m = n. For example,

A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix},  \quad  B = \begin{pmatrix} e^x & \sin x \\ -i\cos x & e^{i\theta} \end{pmatrix},    (1.69)

and

u = \begin{pmatrix} u \\ v \\ w \end{pmatrix},    (1.70)

where A is a 2 × 3 matrix, B is a 2 × 2 square matrix, and u is a 3 × 1 column matrix or column vector.
The sum of two matrices A and B is possible only if they have the same size m × n, and their sum, which is also m × n, is obtained by adding their corresponding entries

C = A + B,  \quad  c_{ij} = a_{ij} + b_{ij},    (1.71)

where (i = 1, 2, ..., m; j = 1, 2, ..., n). The product of a matrix A with a scalar α ∈ ℜ is obtained by multiplying each entry by α. The product of two matrices is possible only if the number of columns of A is the same as the number of rows of B. That is to say, if A is m × n and B is n × r, then the product C is m × r,

c_{ij} = (AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.    (1.72)

If A is a square matrix, then we have A^n = AA...A (n factors). The multiplication of matrices is generally not commutative, i.e., AB ≠ BA. However, the multiplication is associative, A(uv) = (Au)v, and distributive, A(u + v) = Au + Av.
The transpose A^T of A is obtained by switching the position of rows and columns, and thus A^T will be n × m if A is m × n, with (a^T)_{ij} = a_{ji}, (i = 1, 2, ..., m; j = 1, 2, ..., n). Generally,

(A^T)^T = A,  \quad  (AB)^T = B^T A^T.    (1.73)

The differentiation and integral of a matrix are carried out over each of its members or elements. For example, for a 2 × 2 matrix,

\frac{dA}{dt} = \dot{A} = \begin{pmatrix} \frac{da_{11}}{dt} & \frac{da_{12}}{dt} \\ \frac{da_{21}}{dt} & \frac{da_{22}}{dt} \end{pmatrix},    (1.74)

and

\int A\, dt = \begin{pmatrix} \int a_{11}\, dt & \int a_{12}\, dt \\ \int a_{21}\, dt & \int a_{22}\, dt \end{pmatrix}.    (1.75)

A diagonal matrix A is a square matrix whose every entry off the main diagonal is zero (a_{ij} = 0 if i ≠ j). Its diagonal elements or entries may or may not have zeros. In general, it can be written as

D = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix}.    (1.76)

For example, the matrix

I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}    (1.77)

is a 3 × 3 identity or unit matrix. In general, we have

AI = IA = A.    (1.78)

A zero or null matrix 0 is a matrix with all of its elements being zero.
There are three important matrices: the lower (upper) triangular matrix, the tridiagonal matrix, and the augmented matrix, and they are important in the solution of linear equations. A tridiagonal matrix often arises naturally from the finite difference and finite volume discretization of partial differential equations, and it can in general be written as

Q = \begin{pmatrix} b_1 & c_1 & 0 & 0 & \cdots & 0 & 0 \\ a_2 & b_2 & c_2 & 0 & \cdots & 0 & 0 \\ 0 & a_3 & b_3 & c_3 & \cdots & 0 & 0 \\ \vdots & & & & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & a_n & b_n \end{pmatrix}.    (1.79)

An augmented matrix is formed by two matrices with the same number of rows. For example, the following system of linear equations

a_{11} u_1 + a_{12} u_2 + a_{13} u_3 = b_1,
a_{21} u_1 + a_{22} u_2 + a_{23} u_3 = b_2,
a_{31} u_1 + a_{32} u_2 + a_{33} u_3 = b_3,    (1.80)



can be written in a compact form in terms of matrices

\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix},    (1.81)

or

Au = b.    (1.82)

This can in turn be written as the following augmented form

[A|b] = \begin{pmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ a_{31} & a_{32} & a_{33} & b_3 \end{pmatrix}.    (1.83)

The augmented form is widely used in Gauss-Jordan elimination and linear programming.
A lower (upper) triangular matrix is a square matrix with all the elements above (below) the diagonal entries being zeros. In general, a lower triangular matrix can be written as

L = \begin{pmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{pmatrix},    (1.84)

while the upper triangular matrix can be written as

U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{pmatrix}.    (1.85)

Any n × n square matrix A = [a_{ij}] can be decomposed or factorized as a product of an L and a U, that is

A = LU,    (1.86)

though such a decomposition is not unique because we have n^2 + n unknowns: n(n + 1)/2 coefficients l_{ij} and n(n + 1)/2 coefficients u_{ij}, but we can only provide n^2 equations from the coefficients a_{ij}. Thus, there are n free parameters. The uniqueness of the decomposition is often achieved by imposing either l_{ii} = 1 or u_{ii} = 1 where i = 1, 2, ..., n.
Other LU variants include the LDU and LUP decompositions. An LDU decomposition can be written as

A = LDU,    (1.87)

where L and U are lower and upper triangular matrices with all the diagonal entries being unity, and D is a diagonal matrix. On the other hand, the LUP decomposition can be expressed as

A = LUP,  or  A = PLU,    (1.88)

where P is a permutation matrix, which is a square matrix that has exactly one entry 1 in each column and each row with 0's elsewhere. However, most numerical libraries and software packages use the following LUP decomposition

PA = LU,    (1.89)

which makes it easier to decompose some matrices. However, the requirement for LU decompositions is relatively strict. An invertible matrix A has an LU decomposition provided that the determinants of all its diagonal minors or leading submatrices are not zeros.
A simpler way of decomposing a square matrix A for solving a system of linear equations is to write

A = D + L + U,    (1.90)

where D is a diagonal matrix, and L and U are the strictly lower and upper triangular matrices without diagonal elements, respectively. This decomposition is much simpler to implement than the LU decomposition because there is no multiplication involved here.
Example 1.4: The following 3 × 3 matrix

A = \begin{pmatrix} 2 & 1 & 5 \\ 4 & -4 & 5 \\ 5 & 2 & -5 \end{pmatrix},

can be decomposed as A = LU. That is,

A = \begin{pmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{pmatrix} \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix},

which becomes

A = \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ l_{21} u_{11} & l_{21} u_{12} + u_{22} & l_{21} u_{13} + u_{23} \\ l_{31} u_{11} & l_{31} u_{12} + l_{32} u_{22} & l_{31} u_{13} + l_{32} u_{23} + u_{33} \end{pmatrix} = \begin{pmatrix} 2 & 1 & 5 \\ 4 & -4 & 5 \\ 5 & 2 & -5 \end{pmatrix}.

This leads to u_{11} = 2, u_{12} = 1 and u_{13} = 5. As l_{21} u_{11} = 4, so l_{21} = 4/u_{11} = 2. Similarly, l_{31} = 2.5. From l_{21} u_{12} + u_{22} = −4, we have u_{22} = −4 − 2 × 1 = −6. From l_{21} u_{13} + u_{23} = 5, we have u_{23} = 5 − 2 × 5 = −5. Using l_{31} u_{12} + l_{32} u_{22} = 2, or 2.5 × 1 + l_{32} × (−6) = 2, we get l_{32} = 1/12. Finally, l_{31} u_{13} + l_{32} u_{23} + u_{33} = −5 gives u_{33} = −5 − 2.5 × 5 − 1/12 × (−5) = −205/12. Therefore, we now have

\begin{pmatrix} 2 & 1 & 5 \\ 4 & -4 & 5 \\ 5 & 2 & -5 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 5/2 & 1/12 & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 & 5 \\ 0 & -6 & -5 \\ 0 & 0 & -205/12 \end{pmatrix}.

The D + L + U decomposition can be written as

A = D + L + U = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & -5 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ 4 & 0 & 0 \\ 5 & 2 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 1 & 5 \\ 0 & 0 & 5 \\ 0 & 0 & 0 \end{pmatrix}.
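A minimal sketch of the Doolittle-style LU factorization (with l_ii = 1 and no pivoting) used in Example 1.4; the function name lu_doolittle is our own, and the printed values match the worked example above.

```python
import numpy as np

def lu_doolittle(A):
    """Factorize A = L U with a unit diagonal in L (no pivoting)."""
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):                          # row i of U
            U[i, j] = A[i, j] - L[i, :i] @ U[:i, j]
        for j in range(i + 1, n):                      # column i of L
            L[j, i] = (A[j, i] - L[j, :i] @ U[:i, i]) / U[i, i]
    return L, U

A = np.array([[2.0, 1.0, 5.0],
              [4.0, -4.0, 5.0],
              [5.0, 2.0, -5.0]])
L, U = lu_doolittle(A)
print(L)                        # [[1, 0, 0], [2, 1, 0], [2.5, 1/12, 1]]
print(U)                        # [[2, 1, 5], [0, -6, -5], [0, 0, -205/12]]
print(np.allclose(L @ U, A))    # True
```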

1.6 Determinant and Inverse

The determinant of a square matrix A is a number or scalar obtained by the following recursive formula, or the cofactors, or the Laplace expansion by column or row. For example, expanding by row k, we have

det(A) = |A| = \sum_{j=1}^{n} (-1)^{k+j} a_{kj} M_{kj},    (1.91)

where M_{ij} is the determinant of a minor matrix of A obtained by deleting row i and column j. For a simple 2 × 2 matrix, its determinant simply becomes

\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{12} a_{21}.    (1.92)
The determinants of matrices have the following properties:

|αA| = α^n |A|,  \quad  |A^T| = |A|,  \quad  |AB| = |A||B|,    (1.93)

where A and B have the same size (n × n).
An n × n square matrix is singular if |A| = 0, and is nonsingular if and only if |A| ≠ 0. The trace of a square matrix tr(A) is defined as the sum of the diagonal elements,

tr(A) = \sum_{i=1}^{n} a_{ii} = a_{11} + a_{22} + ... + a_{nn}.    (1.94)

The rank of a matrix A is the number of linearly independent vectors forming the matrix. Generally speaking, the rank of A satisfies

rank(A) ≤ min(m, n).    (1.95)

For an n × n square matrix A, it is nonsingular if rank(A) = n.
The inverse matrix A^{-1} of a square matrix A is defined as

A^{-1} A = A A^{-1} = I.    (1.96)

More generally,

A_l^{-1} A = A A_r^{-1} = I,    (1.97)

where A_l^{-1} is the left inverse while A_r^{-1} is the right inverse. If A_l^{-1} = A_r^{-1}, we say that the matrix A is invertible and its inverse is simply denoted by A^{-1}. It is worth noting that the unit matrix I has the same size as A.
The inverse of a square matrix exists if and only if A is nonsingular or det(A) ≠ 0. From the basic definitions, it is straightforward to prove that the inverse of a matrix has the following properties

(A^{-1})^{-1} = A,  \quad  (A^T)^{-1} = (A^{-1})^T,    (1.98)

and

(AB)^{-1} = B^{-1} A^{-1}.    (1.99)

The inverse of a lower (upper) triangular matrix is also a lower (upper) triangular matrix. The inverse of a diagonal matrix

D = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix},    (1.100)

can simply be written as

D^{-1} = \begin{pmatrix} 1/d_1 & 0 & \cdots & 0 \\ 0 & 1/d_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1/d_n \end{pmatrix},    (1.101)

where d_i ≠ 0. If any of these elements d_i is zero, then the diagonal matrix is not invertible as it becomes singular. For a 2 × 2 matrix, its inverse is simply

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},  \quad  A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.    (1.102)

Example 1.5: For two matrices

A = \begin{pmatrix} 4 & 5 & 0 \\ -2 & 2 & 5 \\ 2 & -3 & 1 \end{pmatrix},  \quad  B = \begin{pmatrix} 2 & 3 \\ 0 & -2 \\ 5 & 2 \end{pmatrix},

their transpose matrices are

A^T = \begin{pmatrix} 4 & -2 & 2 \\ 5 & 2 & -3 \\ 0 & 5 & 1 \end{pmatrix},  \quad  B^T = \begin{pmatrix} 2 & 0 & 5 \\ 3 & -2 & 2 \end{pmatrix}.

Let D = AB be their product; we have

AB = D = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \\ D_{31} & D_{32} \end{pmatrix}.

The first two entries are

D_{11} = \sum_{j=1}^{3} A_{1j} B_{j1} = 4 × 2 + 5 × 0 + 0 × 5 = 8,

and

D_{12} = \sum_{j=1}^{3} A_{1j} B_{j2} = 4 × 3 + 5 × (−2) + 0 × 2 = 2.

Similarly, the other entries are:

D_{21} = 21,  \quad  D_{22} = 0,  \quad  D_{31} = 9,  \quad  D_{32} = 14.

Therefore, we get

AB = \begin{pmatrix} 4 & 5 & 0 \\ -2 & 2 & 5 \\ 2 & -3 & 1 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ 0 & -2 \\ 5 & 2 \end{pmatrix} = D = \begin{pmatrix} 8 & 2 \\ 21 & 0 \\ 9 & 14 \end{pmatrix}.

However, the product BA does not exist, though

B^T A^T = \begin{pmatrix} 8 & 21 & 9 \\ 2 & 0 & 14 \end{pmatrix} = D^T = (AB)^T.

The inverse of A is

A^{-1} = \frac{1}{128} \begin{pmatrix} 17 & -5 & 25 \\ 12 & 4 & -20 \\ 2 & 22 & 18 \end{pmatrix},

and the determinant of A is

det(A) = 128.

It is straightforward to verify that

AA^{-1} = A^{-1}A = I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

For example, the first entry is obtained by

\sum_{j=1}^{3} A_{1j} A^{-1}_{j1} = 4 × \frac{17}{128} + 5 × \frac{12}{128} + 0 × \frac{2}{128} = 1.

Other entries can be verified similarly. Finally, the trace of A is

tr(A) = A_{11} + A_{22} + A_{33} = 4 + 2 + 1 = 7.
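The matrices of Example 1.5 can be checked numerically with NumPy, as in the short sketch below (our own illustration).

```python
import numpy as np

A = np.array([[4.0, 5.0, 0.0],
              [-2.0, 2.0, 5.0],
              [2.0, -3.0, 1.0]])
B = np.array([[2.0, 3.0],
              [0.0, -2.0],
              [5.0, 2.0]])

print(A @ B)                    # [[8, 2], [21, 0], [9, 14]]
print(np.linalg.det(A))         # 128.0 (up to round-off)
print(np.linalg.inv(A) * 128)   # [[17, -5, 25], [12, 4, -20], [2, 22, 18]]
print(np.trace(A))              # 7.0
```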

The algorithmic complexity of most algorithms for obtaining the inverse of a general square matrix is O(n^3). That is why most modern algorithms try to avoid the direct inverse of a large matrix. The solution of a large matrix system is instead carried out either by partial inverse via decomposition or by iteration (or a combination of these two methods). If the matrix can be decomposed into triangular matrices either by LU factorization or by direct decomposition, the aim is then to invert a triangular matrix, which is simpler and more efficient.
For a triangular matrix, the inverse can be obtained using algorithms of O(n^2) complexity. Similarly, the solution of a linear system with a lower (upper) triangular matrix A can be obtained by forward (back) substitutions. In general, for a lower triangular matrix

A = \begin{pmatrix} α_{11} & 0 & \cdots & 0 \\ α_{21} & α_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ α_{n1} & α_{n2} & \cdots & α_{nn} \end{pmatrix},    (1.103)

the forward substitutions for the system Au = b can be carried out as follows:

u_1 = \frac{b_1}{α_{11}},  \quad  u_2 = \frac{1}{α_{22}}(b_2 - α_{21} u_1),  \quad  u_i = \frac{1}{α_{ii}}\Big(b_i - \sum_{j=1}^{i-1} α_{ij} u_j\Big),    (1.104)

where i = 2, ..., n. We see that it takes 1 division to get u_1, 3 floating-point operations to get u_2, and (2i − 1) to get u_i. So the total algorithmic complexity is O(1 + 3 + ... + (2n − 1)) = O(n^2). Similar arguments apply to upper triangular systems.
The inverse A^{-1} of a lower triangular matrix can in general be written as

A^{-1} = \begin{pmatrix} β_{11} & 0 & \cdots & 0 \\ β_{21} & β_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ β_{n1} & β_{n2} & \cdots & β_{nn} \end{pmatrix} = B = \begin{pmatrix} B_1 & B_2 & \cdots & B_n \end{pmatrix},    (1.105)

where B_j is the j-th column vector of B. The inverse must satisfy AA^{-1} = I, or

A \begin{pmatrix} B_1 & B_2 & \cdots & B_n \end{pmatrix} = I = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix},    (1.106)

where e_j is the j-th unit vector of size n with the j-th element being 1 and all other elements being zero. That is, e_j^T = (0\ 0\ \cdots\ 1\ \cdots\ 0). In order to obtain B, we have to solve n linear systems

AB_1 = e_1,  \quad  AB_2 = e_2,  \quad  ...,  \quad  AB_n = e_n.    (1.107)

As A is a lower triangular matrix, the solution of AB_j = e_j can easily be obtained by the direct forward substitutions discussed earlier in this section.
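A direct translation of the forward substitution formula (1.104) into Python reads as follows; this is a sketch, and the test system is an arbitrary one of our own choosing.

```python
import numpy as np

def forward_substitution(A, b):
    """Solve A u = b for a lower triangular matrix A using Eq. (1.104)."""
    n = len(b)
    u = np.zeros(n)
    for i in range(n):
        # subtract the contributions of the already-computed unknowns,
        # then divide by the diagonal entry
        u[i] = (b[i] - A[i, :i] @ u[:i]) / A[i, i]
    return u

A = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, -1.0, 5.0]])
b = np.array([4.0, 5.0, 10.0])
u = forward_substitution(A, b)
print(u)                        # [2.  1.  0.6]
print(np.allclose(A @ u, b))    # True
```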

1.7 Matrix Exponential

Sometimes, we need to calculate exp[A], where A is a square matrix. In this case, we have to deal with matrix exponentials. The exponential of a square matrix A is defined as

e^A ≡ \sum_{n=0}^{\infty} \frac{1}{n!} A^n = I + A + \frac{1}{2!} A^2 + ...,    (1.108)

where I is an identity matrix with the same size as A, and A^2 = AA and so on. This (rather odd) definition in fact provides a method of calculating

the matrix exponential. Matrix exponentials are very useful in solving systems of differential equations.

Example 1.6: For a simple matrix

A = \begin{pmatrix} t & 0 \\ 0 & t \end{pmatrix},

its exponential is simply

e^A = \begin{pmatrix} e^t & 0 \\ 0 & e^t \end{pmatrix}.

For a more complicated matrix

B = \begin{pmatrix} t & a \\ a & t \end{pmatrix},

we have

e^B = \begin{pmatrix} \frac{1}{2}(e^{t+a} + e^{t-a}) & \frac{1}{2}(e^{t+a} - e^{t-a}) \\ \frac{1}{2}(e^{t+a} - e^{t-a}) & \frac{1}{2}(e^{t+a} + e^{t-a}) \end{pmatrix}.

As you can see, it is quite complicated but still straightforward to calculate the matrix exponentials. Fortunately, this can easily be done using most computer software packages. By using the power expansions and the basic definition, we can prove the following useful identities

e^{tA} ≡ \sum_{n=0}^{\infty} \frac{1}{n!}(tA)^n = I + tA + \frac{t^2}{2!}A^2 + ...,    (1.109)

\ln(I + A) ≡ \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} A^n = A - \frac{1}{2}A^2 + \frac{1}{3}A^3 - ...,    (1.110)

e^A e^B = e^{A+B}  \quad  (if AB = BA),    (1.111)

\frac{d}{dt}(e^{tA}) = A e^{tA} = e^{tA} A,    (1.112)

(e^A)^{-1} = e^{-A},  \quad  det(e^A) = e^{tr(A)}.    (1.113)


1.8 Hermitian and Quadratic Forms

The matrices we have discussed so far are real matrices because all their elements are real. In general, the entries or elements of a matrix can be complex numbers, and the matrix then becomes a complex matrix. For a matrix A, its complex conjugate A* is obtained by taking the complex conjugate of each of its elements. The Hermitian conjugate A† is obtained by taking the transpose of its complex conjugate matrix. That is to say, for

A = \begin{pmatrix} a_{11} & a_{12} & \cdots \\ a_{21} & a_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix},    (1.114)

we have

A^* = \begin{pmatrix} a_{11}^* & a_{12}^* & \cdots \\ a_{21}^* & a_{22}^* & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix},    (1.115)

and

A^† = (A^*)^T = (A^T)^* = \begin{pmatrix} a_{11}^* & a_{21}^* & \cdots \\ a_{12}^* & a_{22}^* & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}.    (1.116)
A square matrix A is called orthogonal if and only if A^{-1} = A^T. If a square matrix A satisfies A^† = A, it is called a Hermitian matrix. It is an anti-Hermitian matrix if A^† = −A. If the Hermitian conjugate of a square matrix A is equal to the inverse of the matrix (or A^† = A^{-1}), it is called a unitary matrix.

Example 1.7: For a complex matrix

A = \begin{pmatrix} 2 + 3iπ & 1 + 9i & 0 \\ e^{iπ} & -2i & i\sin θ \end{pmatrix},

its complex conjugate A* is

A^* = \begin{pmatrix} 2 - 3iπ & 1 - 9i & 0 \\ e^{-iπ} & 2i & -i\sin θ \end{pmatrix}.

The Hermitian conjugate of A is

A^† = \begin{pmatrix} 2 - 3iπ & e^{-iπ} \\ 1 - 9i & 2i \\ 0 & -i\sin θ \end{pmatrix} = (A^*)^T.

For the rotation matrix

A = \begin{pmatrix} \cos θ & \sin θ \\ -\sin θ & \cos θ \end{pmatrix},

its inverse and transpose are

A^{-1} = \frac{1}{\cos^2 θ + \sin^2 θ} \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix},

and

A^T = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}.

Since cos^2 θ + sin^2 θ = 1, we have A^T = A^{-1}. Therefore, the original rotation matrix A is orthogonal.

A very useful concept in computational mathematics and computing is the quadratic form. For a real vector q^T = (q_1, q_2, q_3, ..., q_n) and a real symmetric square matrix A, a quadratic form ψ(q) is a scalar function defined by

ψ(q) = q^T A q = \begin{pmatrix} q_1 & q_2 & \cdots & q_n \end{pmatrix} \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{pmatrix} \begin{pmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{pmatrix},    (1.117)

which can be written as

ψ(q) = \sum_{i=1}^{n} \sum_{j=1}^{n} q_i A_{ij} q_j.    (1.118)

Since ψ is a scalar, it should be independent of the coordinates. In the case of a square matrix A, ψ might be more easily evaluated in certain intrinsic coordinates Q_1, Q_2, ..., Q_n. An important result concerning the quadratic form is that it can always be written, through appropriate transformations, as

ψ(q) = \sum_{i=1}^{n} λ_i Q_i^2 = λ_1 Q_1^2 + λ_2 Q_2^2 + ... + λ_n Q_n^2,    (1.119)

where λ_i are the eigenvalues of the matrix A determined by

det|A - λI| = 0,    (1.120)

and Q_i are the intrinsic components along the directions of the eigenvectors in this case.

The natural extension of quadratic forms is the Hermitian form, which is the quadratic form for a complex Hermitian matrix A. Furthermore, the matrix A can consist of linear operators and functionals in addition to numbers.

Example 1.8: For a vector q = (q_1, q_2) and the square matrix

A = \begin{pmatrix} 2 & -5 \\ -5 & 2 \end{pmatrix},

we have a quadratic form

ψ(q) = \begin{pmatrix} q_1 & q_2 \end{pmatrix} \begin{pmatrix} 2 & -5 \\ -5 & 2 \end{pmatrix} \begin{pmatrix} q_1 \\ q_2 \end{pmatrix} = 2q_1^2 - 10 q_1 q_2 + 2q_2^2.

The eigenvalues of the matrix A are determined by

\begin{vmatrix} 2-λ & -5 \\ -5 & 2-λ \end{vmatrix} = 0,

whose solutions are λ_1 = 7 and λ_2 = −3 (see the next section for further details). Their corresponding eigenvectors are

v_1 = \begin{pmatrix} -\sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix},  \quad  v_2 = \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix}.

We can see that v_1 · v_2 = 0, which means that these two eigenvectors are orthogonal. Writing the quadratic form in terms of the intrinsic coordinates, we have

ψ(q) = 7Q_1^2 - 3Q_2^2.

Furthermore, if we assume ψ(q) = 1 as a simple constraint, then the equation 7Q_1^2 - 3Q_2^2 = 1 corresponds to a hyperbola.
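The reduction of the quadratic form to intrinsic coordinates in Example 1.8 can be verified numerically; the sketch below uses NumPy's symmetric eigensolver and an arbitrary test vector of our own.

```python
import numpy as np

A = np.array([[2.0, -5.0],
              [-5.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(A)      # eigh is for symmetric matrices
print(eigvals)                            # [-3.  7.]

q = np.array([1.0, 2.0])                  # an arbitrary test vector
Q = eigvecs.T @ q                         # intrinsic coordinates Q_i
print(q @ A @ q, np.sum(eigvals * Q**2))  # both give the same value of psi(q)
```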

1.9 Eigenvalues and Eigenvectors

The eigenvalue λ of any n × n square matrix A is determined by

A u = \lambda u,  (1.121)

or

(A - \lambda I) u = 0,  (1.122)

where I is the identity matrix of the same size as A. Any non-trivial
solution requires that
det |A − λI| = 0, (1.123)
or

\begin{vmatrix} a_{11}-\lambda & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22}-\lambda & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn}-\lambda \end{vmatrix} = 0,  (1.124)

which again can be written as a polynomial

\lambda^n + \alpha_{n-1}\lambda^{n-1} + \dots + \alpha_0 = (\lambda - \lambda_1)\cdots(\lambda - \lambda_n) = 0,  (1.125)
where the λ_i are the eigenvalues, which could be complex numbers. Setting the determinant to zero leads to a polynomial of order n in λ. For each eigenvalue λ, there is a corresponding eigenvector u whose direction can be uniquely determined. However, the length of the eigenvector is not unique, because any non-zero multiple of u will also satisfy equation (1.121) and thus can be considered an eigenvector. For this reason, it is usual to impose an additional condition by setting the length to unity, so that the eigenvector becomes a unit eigenvector.
Generally speaking, a real n × n matrix A has n eigenvalues λ_i (i = 1, 2, ..., n); however, these eigenvalues are not necessarily distinct. If the real matrix is symmetric, that is to say A^T = A, then the matrix has n linearly independent (in fact, mutually orthogonal) eigenvectors, and all its eigenvalues are real numbers.
The eigenvalues λ_i are related to the trace and determinant of the matrix by

\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii} = \lambda_1 + \lambda_2 + \dots + \lambda_n = \sum_{i=1}^{n} \lambda_i,  (1.126)

and

\det(A) = |A| = \prod_{i=1}^{n} \lambda_i.  (1.127)

Example 1.9: The eigenvalues of the square matrix

A = \begin{pmatrix} 4 & 9 \\ 2 & -3 \end{pmatrix},
can be obtained by solving

\begin{vmatrix} 4-\lambda & 9 \\ 2 & -3-\lambda \end{vmatrix} = 0.
We have
(4 − λ)(−3 − λ) − 18 = (λ − 6)(λ + 5) = 0.
Thus, the eigenvalues are λ = 6 and λ = −5. Let v = (v_1\ v_2)^T be the eigenvector; for λ = 6 we have

(A - \lambda I)v = \begin{pmatrix} -2 & 9 \\ 2 & -9 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = 0,

which means that

-2 v_1 + 9 v_2 = 0, \qquad 2 v_1 - 9 v_2 = 0.
These two equations are virtually the same (not linearly independent), so the solution is

v_1 = \frac{9}{2} v_2.
Any vector parallel to v is also an eigenvector. In order to get a unique eigenvector, we have to impose an extra requirement, that is, that the length of the vector is unity. We now have

v_1^2 + v_2^2 = 1,

or

\Big(\frac{9 v_2}{2}\Big)^2 + v_2^2 = 1,

which gives v_2 = \pm 2/\sqrt{85} and v_1 = \pm 9/\sqrt{85}. As these two choices point in opposite directions, we can choose either of them. So the eigenvector for the eigenvalue λ = 6 is

v = \begin{pmatrix} 9/\sqrt{85} \\ 2/\sqrt{85} \end{pmatrix}.
Similarly, the corresponding eigenvector for the eigenvalue λ = −5 is v = (-\sqrt{2}/2,\ \sqrt{2}/2)^T.
Furthermore, the trace and determinant of A are

\mathrm{tr}(A) = 4 + (-3) = 1, \qquad \det(A) = 4 \times (-3) - 2 \times 9 = -30.

The sum of the eigenvalues is

\sum_{i=1}^{2} \lambda_i = 6 + (-5) = 1 = \mathrm{tr}(A),

while the product of the eigenvalues is

\prod_{i=1}^{2} \lambda_i = 6 \times (-5) = -30 = \det(A).
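These relations between eigenvalues, trace and determinant can be confirmed with a few lines of NumPy; the following sketch is added purely for illustration:

```python
import numpy as np

A = np.array([[4.0, 9.0],
              [2.0, -3.0]])

lam, vecs = np.linalg.eig(A)
print(lam)                                        # approximately [6., -5.]
print(np.isclose(lam.sum(), np.trace(A)))         # True: sum of eigenvalues equals the trace
print(np.isclose(lam.prod(), np.linalg.det(A)))   # True: product equals the determinant
```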

For any real square matrix A with eigenvalues λ_i = eig(A), the eigenvalues of αA are αλ_i, where α ≠ 0 is a real number. This property comes in handy when rescaling the matrices in some iteration formulae so that the rescaled scheme becomes more stable. It is also a major reason why pivoting and the rescaling or removal of exceptionally large elements work.
1.10 Definiteness of Matrices

A square symmetric matrix A is said to be positive definite if all its eigenvalues are strictly positive (λ_i > 0 for i = 1, 2, ..., n). By multiplying (1.121) by u^T, we have

u^T A u = u^T \lambda u = \lambda u^T u,  (1.128)

which leads to

\lambda = \frac{u^T A u}{u^T u}.  (1.129)

This means that

u^T A u > 0, \quad \text{if } \lambda > 0.  (1.130)
In fact, for any non-zero vector v, the following relationship holds:

v^T A v > 0.  (1.131)

In particular, choosing v to be a coordinate unit vector e_i shows that all the diagonal elements of A must be strictly positive as well. If all the eigenvalues are non-negative, or λ_i ≥ 0, then the matrix is called positive semi-definite. In general, an indefinite matrix has both positive and negative eigenvalues.
The inverse of a positive definite matrix is also positive definite. For a linear system Au = f, where f is a known column vector, if A is positive definite, then the system can be solved more efficiently by matrix decomposition methods.

Example 1.10: In general, a 2 × 2 symmetric matrix

A = \begin{pmatrix} \alpha & \beta \\ \beta & \gamma \end{pmatrix},

is positive definite if

\alpha u_1^2 + 2\beta u_1 u_2 + \gamma u_2^2 > 0,

for all u = (u_1, u_2)^T \neq 0. The inverse of A is

A^{-1} = \frac{1}{\alpha\gamma - \beta^2} \begin{pmatrix} \gamma & -\beta \\ -\beta & \alpha \end{pmatrix},

which is also positive definite.
which is also positive definite.
As the eigenvalues of

A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix},
are λ = 3, −1, the matrix is indefinite. For another matrix

B = \begin{pmatrix} 4 & 6 \\ 6 & 20 \end{pmatrix},
we can find its eigenvalues using a similar method as discussed earlier, and the eigenvalues are λ = 2, 22. So matrix B is positive definite. The inverse of B,

B^{-1} = \frac{1}{44} \begin{pmatrix} 20 & -6 \\ -6 & 4 \end{pmatrix},

is also positive definite, because B^{-1} has two positive eigenvalues: λ = 1/2, 1/22.
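In practice, definiteness is usually checked numerically, for example via the eigenvalues or an attempted Cholesky factorization. A brief NumPy sketch, added here for illustration:

```python
import numpy as np

B = np.array([[4.0, 6.0],
              [6.0, 20.0]])

# Eigenvalue test: all eigenvalues strictly positive implies positive definiteness.
print(np.linalg.eigvalsh(B))                   # approximately [2., 22.]

# Cholesky factorization succeeds only for positive definite matrices.
try:
    np.linalg.cholesky(B)
    print("B is positive definite")
except np.linalg.LinAlgError:
    print("B is not positive definite")

# The inverse is also positive definite, with reciprocal eigenvalues.
print(np.linalg.eigvalsh(np.linalg.inv(B)))    # approximately [1/22, 1/2]
```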
We have briefly reviewed the basic algebra of vectors and matrices as well as some basic calculus; we can now move on to more algorithm-related topics such as algorithmic complexity.
Chapter 2

Algorithmic Complexity, Norms and Convexity

When analyzing an algorithm, we often discuss its computational complexity. This also allows us to compare one algorithm with other algorithms in terms of various performance measures.

2.1 Computational Complexity

The efficiency of an algorithm is often measured by its algorithmic complexity or computational complexity. (In parts of the literature this is loosely referred to as Kolmogorov complexity, although, strictly speaking, Kolmogorov complexity measures the descriptive complexity of an object rather than the running time of an algorithm.) For a given problem size n, the complexity is denoted using big-O notation such as O(n^2) or O(n log n).
For a sorting algorithm applied to n data entries, sorting these numbers into either ascending or descending order takes a computational time that is a function of the problem size n. O(n) means linear complexity, while O(n^2) means quadratic complexity. That is, if n is doubled, the computational time or effort doubles for linear complexity, but it quadruples for quadratic complexity.
For example, the bubble sort algorithm starts at the beginning of the data set by comparing the first two elements and swapping them if they are out of order. This comparison-and-swap process continues for each pair of adjacent elements, and the passes are repeated over the whole data set. Since two nested loops over the data give about n × n comparisons, the algorithm complexity is O(n^2).
On the other hand, the quicksort algorithm uses a divide-and-conquer
approach via partition. By first choosing a pivot element, we then put all
the elements into two sublists with all the smaller elements before the pivot
and all the greater elements after it. Then, the sublists are recursively
sorted in a similar manner. This algorithm results in an average complexity of O(n log n), and quicksort is thus much more efficient than bubble sort.

For n = 1000, the bubble algorithm needs about O(n^2) ≈ O(10^6) calculations, while quicksort requires only O(n log n) ≈ O(3 × 10^3) calculations (at least two orders of magnitude fewer in this simple case).
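The different growth rates can be made concrete by counting comparisons; the rough Python sketch below (added for illustration; the exact constant depends on the logarithm base and implementation details) contrasts bubble sort with the n log n scale of an efficient sort:

```python
import math
import random

def bubble_sort_comparisons(data):
    """Sort a copy of data by bubble sort and return the number of comparisons made."""
    a = list(data)
    n = len(a)
    count = 0
    for i in range(n):
        for j in range(n - 1 - i):
            count += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return count

n = 1000
data = [random.random() for _ in range(n)]
print(bubble_sort_comparisons(data))   # about n^2 / 2, i.e. roughly 5 * 10^5
print(n * math.log2(n))                # about 10^4, the n log n scale for comparison
```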

2.2 NP-Complete Problems

In mathematical programming, an easy or tractable problem is a problem whose solution can be obtained by computer algorithms with a solution time (or number of steps) that is a polynomial function of the problem size n. Polynomial-time algorithms are considered efficient. A problem is called a P-problem, or polynomial-time problem, if the number of steps needed to find the solution is bounded by a polynomial in n and it has at least one algorithm that solves it.
On the other hand, a hard or intractable problem requires a solution time that is an exponential function of n, and thus exponential-time algorithms are considered inefficient. A problem is called nondeterministic polynomial (NP) if a candidate solution can be guessed and then verified in polynomial time, while there is no known rule for making such a guess (hence, nondeterministic). Consequently, guessed solutions cannot be guaranteed to be optimal or even near-optimal.
In fact, no polynomial-time algorithms are known for solving NP-hard problems exactly, and often only approximate or heuristic solutions are possible. Thus, heuristic and metaheuristic methods are very promising for obtaining approximate or nearly optimal/suboptimal solutions. We will introduce some popular nature-inspired metaheuristic algorithms, especially those based on swarm intelligence, in the last two chapters of the book in Part VI.
A problem is called NP-complete if it belongs to NP and is also NP-hard; that is, every other problem in NP can be reduced to it by a polynomial-time reduction algorithm. A classic example of an NP-hard problem is the Travelling Salesman Problem (TSP), whose objective is to find the shortest route (or minimum travelling cost) that visits each of n given cities exactly once and then returns to the starting city.
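To make the exponential growth concrete, a brute-force TSP solver simply enumerates all (n−1)! closed tours, which quickly becomes infeasible as n grows. The toy Python sketch below is purely illustrative (the cities are random points, and this is not a method advocated by the book):

```python
import itertools
import math
import random

def tour_length(order, pts):
    """Total length of the closed tour visiting pts in the given order."""
    total = 0.0
    for i in range(len(order)):
        x1, y1 = pts[order[i]]
        x2, y2 = pts[order[(i + 1) % len(order)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(8)]

# Fix city 0 as the start and enumerate permutations of the rest: (n-1)! tours.
best = min(itertools.permutations(range(1, len(pts))),
           key=lambda p: tour_length((0,) + p, pts))
print((0,) + best, tour_length((0,) + best, pts))
```

Even for a modest n such as 15, the number of tours already exceeds 10^11, which is why heuristic and metaheuristic approaches are used in practice.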
Whether NP-complete problems can be solved in polynomial time (the P versus NP question) is still an open problem, which is why the Clay Mathematics Institute is offering a million-dollar prize for a formal proof. Many real-world problems are NP-hard, and thus any advance in dealing with NP problems will have a profound impact on many applications.
The analysis of an algorithm often involves the calculations of norms
and other quantities. In addition, the Hessian matrices are often used
in optimization while the spectral radius of a matrix is widely used in
the stability analysis of an iteration procedure. We will now review these
fundamental concepts.

2.3 Vector and Matrix Norms

For a vector v, its p-norm is denoted by ‖v‖_p and defined as

\|v\|_p = \Big(\sum_{i=1}^{n} |v_i|^p\Big)^{1/p},  (2.1)

where p is a positive integer. From this definition, it is straightforward to show that the p-norm satisfies the following conditions: ‖v‖ ≥ 0 for all v, and ‖v‖ = 0 if and only if v = 0. This is the non-negativeness condition. In addition, for any real number α, we have the scaling condition ‖αv‖ = |α| ‖v‖.
The three most common norms are the one-, two- and infinity-norms, corresponding to p = 1, 2, and ∞, respectively. For p = 1, the one-norm is just the sum of the absolute values of the components |v_i|, while the 2-norm (or two-norm) ‖v‖_2 for p = 2 is the standard Euclidean norm, because ‖v‖_2 is the length of the vector v:

\|v\|_2 = \sqrt{v \cdot v} = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2},  (2.2)

where the notation u · v denotes the inner product of two vectors u and v.
For the special case p = ∞, we denote by v_max the maximum absolute value of all the components v_i, that is, v_{\max} \equiv \max_i |v_i| = \max(|v_1|, |v_2|, ..., |v_n|). Then

\|v\|_\infty = \lim_{p\to\infty} \Big(\sum_{i=1}^{n} |v_i|^p\Big)^{1/p}
= \lim_{p\to\infty} \Big( v_{\max}^p \sum_{i=1}^{n} \Big|\frac{v_i}{v_{\max}}\Big|^p \Big)^{1/p}
= v_{\max} \lim_{p\to\infty} \Big(\sum_{i=1}^{n} \Big|\frac{v_i}{v_{\max}}\Big|^p\Big)^{1/p}.  (2.3)

Since |v_i/v_{\max}| ≤ 1 for all i, and |v_i/v_{\max}| < 1 for every component with |v_i| < v_{\max}, we have |v_i/v_{\max}|^p → 0 as p → ∞ for those components. Thus, the only terms that do not vanish are those with |v_i/v_{\max}| = 1, so the sum tends to the (finite) number of such components, and

\lim_{p\to\infty} \Big(\sum_{i=1}^{n} |v_i/v_{\max}|^p\Big)^{1/p} = 1.  (2.4)
Therefore, we finally have

\|v\|_\infty = v_{\max} = \max_i |v_i|.  (2.5)
To qualify as a norm, ‖·‖ must also satisfy the triangle inequality

\|u + v\| \le \|u\| + \|v\|.  (2.6)

It is straightforward to check from the definitions that the cases p = 1, 2, and ∞ indeed satisfy this inequality; equality occurs, for example, when u = v. It is left as an exercise to check that the inequality holds for any p ≥ 1.

Example 2.1: For two 4-dimensional vectors u = (5, 2, 3, -2)^T and v = (-2, 0, 1, 2)^T, the p-norms of u are

\|u\|_1 = |5| + |2| + |3| + |-2| = 12,

\|u\|_2 = \sqrt{5^2 + 2^2 + 3^2 + (-2)^2} = \sqrt{42},

and

\|u\|_\infty = \max(|5|, |2|, |3|, |-2|) = 5.
Similarly, ‖v‖_1 = 5, ‖v‖_2 = 3 and ‖v‖_∞ = 2. We know that

u + v = \begin{pmatrix} 5 + (-2) \\ 2 + 0 \\ 3 + 1 \\ -2 + 2 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \\ 4 \\ 0 \end{pmatrix},

and its corresponding norms are ‖u+v‖_1 = 9, ‖u+v‖_2 = \sqrt{29} and ‖u+v‖_∞ = 4. It is straightforward to check that
\|u+v\|_1 = 9 < 12 + 5 = \|u\|_1 + \|v\|_1,

\|u+v\|_2 = \sqrt{29} < \sqrt{42} + 3 = \|u\|_2 + \|v\|_2,

and

\|u+v\|_\infty = 4 < 5 + 2 = \|u\|_\infty + \|v\|_\infty.
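These norms and the triangle inequality can be reproduced directly with numpy.linalg.norm; the following sketch is added only as an illustration:

```python
import numpy as np

u = np.array([5.0, 2.0, 3.0, -2.0])
v = np.array([-2.0, 0.0, 1.0, 2.0])

for p in (1, 2, np.inf):
    lhs = np.linalg.norm(u + v, p)
    rhs = np.linalg.norm(u, p) + np.linalg.norm(v, p)
    print(p, lhs, rhs, lhs <= rhs)   # the triangle inequality holds for each p
```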
Matrices are the extension of vectors, so we can define corresponding norms for them. For an m × n matrix A = [a_{ij}], a simple way to extend the norms is to use the fact that Au is a vector for any vector u with ‖u‖ = 1. The p-norm can then be defined as

\|A\|_p = \Big(\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^p\Big)^{1/p}.  (2.7)
Alternatively, we can consider all the elements or entries a_{ij} as forming a single vector. A popular norm, called the Frobenius norm (also called the Hilbert-Schmidt norm), is defined as

\|A\|_F = \Big(\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2\Big)^{1/2}.  (2.8)

In fact, the Frobenius norm is just the entrywise 2-norm of the matrix.


Other popular norms are based on the absolute column sums or row sums. For example,

\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{m} |a_{ij}|,  (2.9)

which is the maximum of the absolute column sums, while

\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^{n} |a_{ij}|,  (2.10)

is the maximum of the absolute row sums. The max norm is defined as

\|A\|_{\max} = \max_{i,j} |a_{ij}|.  (2.11)
From the definitions of these norms, we know that they satisfy the non-negativeness condition ‖A‖ ≥ 0, the scaling condition ‖αA‖ = |α| ‖A‖, and the triangle inequality ‖A + B‖ ≤ ‖A‖ + ‖B‖.
 
Example 2.2: For the matrix

A = \begin{pmatrix} 2 & 3 \\ 4 & -5 \end{pmatrix},

it is easy to calculate that

\|A\|_F = \|A\|_2 = \sqrt{2^2 + 3^2 + 4^2 + (-5)^2} = \sqrt{54},

\|A\|_\infty = \max(|2| + |3|,\ |4| + |-5|) = 9,

and \|A\|_{\max} = 5.
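The common matrix norms are also available in NumPy; a short sketch added for illustration (np.linalg.norm(A, 1) and np.linalg.norm(A, np.inf) return the column-sum and row-sum norms used above, and 'fro' gives the Frobenius norm):

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, -5.0]])

print(np.linalg.norm(A, 'fro'))   # sqrt(54), the Frobenius norm
print(np.linalg.norm(A, np.inf))  # 9, the maximum absolute row sum
print(np.linalg.norm(A, 1))       # 8, the maximum absolute column sum
print(np.abs(A).max())            # 5, the max norm
```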

2.4 Distribution of Eigenvalues

For any n × n matrix A = [a_{ij}], there is an important theorem, called the Gerschgorin theorem, concerning the locations of all the eigenvalues λ_i of A.
Let us first define a number (or radius) r_i by

r_i \equiv \sum_{j=1, j \ne i}^{n} |a_{ij}| = \sum_{j=1}^{n} |a_{ij}| - |a_{ii}|, \qquad (i = 1, 2, ..., n),  (2.12)

and then denote by Ω_i the circle |z - a_{ii}| ≤ r_i, centred at a_{ii} with radius r_i in the complex plane z ∈ C. Such circles are often called Gerschgorin's circles or discs.
Since the eigenvalues λ_i (counting the multiplicity of roots) and their corresponding eigenvectors u^{(i)} are determined by

A u^{(i)} = \lambda_i u^{(i)},  (2.13)

for all i = 1, 2, ..., n, each component u_k^{(i)} (k = 1, 2, ..., n) satisfies

\sum_{j=1}^{n} a_{kj} u_j^{(i)} = \lambda_i u_k^{(i)},  (2.14)

where u^{(i)} = (u_1^{(i)}, u_2^{(i)}, ..., u_n^{(i)})^T and u_j^{(i)} is the j-th component of the vector u^{(i)}. Furthermore, we also define the largest absolute component of u^{(i)} (or its infinity norm) as

|u^{(i)}| = \|u^{(i)}\|_\infty = \max_{1 \le j \le n} |u_j^{(i)}|.  (2.15)
As the length of an eigenvector is not zero, we have |u^{(i)}| > 0. Now let k be the index of this largest component, so that u_k^{(i)} ≠ 0. Then

a_{kk} u_k^{(i)} + \sum_{j \ne k} a_{kj} u_j^{(i)} = \lambda_i u_k^{(i)},  (2.16)

whose norm leads to

|\lambda_i - a_{kk}| = \left|\frac{\sum_{j \ne k} a_{kj} u_j^{(i)}}{u_k^{(i)}}\right| \le \frac{\sum_{j \ne k} |a_{kj}|\, |u^{(i)}|}{|u^{(i)}|} = \sum_{j \ne k} |a_{kj}| = r_k.

This is equivalent to the simple statement that every eigenvalue λ satisfies

|\lambda - a_{ii}| \le r_i,  (2.17)

for at least one index i, which is essentially the Gerschgorin circle theorem. Geometrically speaking, this important theorem states that each eigenvalue λ_i of A must lie inside one of these circles Ω_i. In addition, if p of these circles form a connected set S which is disjoint from the remaining n − p circles, it can be proven that exactly p of the eigenvalues lie inside the set S, counting the multiplicity of roots. Furthermore, if A is symmetric and real (or A = A^T), all the eigenvalues are real, and thus they all fall on the real axis.
Fig. 2.1 Gerschgorin circles of eigenvalues.

First, let us look at a simple example.

Example 2.3: For example, the matrix

A = \begin{pmatrix} 2 & 2 & 0 \\ 2 & -2 & 4 \\ 4 & 0 & 0 \end{pmatrix},

has three eigenvalues λ. These eigenvalues should satisfy

|λ − 2| ≤ r1 = |2| + |0| = 2,

|λ − (−2)| ≤ r2 = |2| + |4| = 6,

|λ − 0| ≤ |4| + |0| = 4.

These circles are shown in Fig. 2.1.


Following the method of finding the eigenvalues discussed earlier in this book, we have the eigenvalues of A:

\lambda_i = 4, \quad -2 + 2i, \quad -2 - 2i,

which are marked as solid dots in the same figure. We can see that all these eigenvalues λ_i lie within the union of all the Gerschgorin discs.
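The disc bounds of this example can also be checked programmatically. The sketch below, added for illustration (assuming NumPy), computes the Gerschgorin radii and verifies that every eigenvalue lies in at least one disc:

```python
import numpy as np

A = np.array([[2.0, 2.0, 0.0],
              [2.0, -2.0, 4.0],
              [4.0, 0.0, 0.0]])

centres = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centres)   # r_i = off-diagonal row sum
eigs = np.linalg.eigvals(A)
print(eigs)                                           # 4, -2+2i, -2-2i

for lam in eigs:
    in_some_disc = np.any(np.abs(lam - centres) <= radii + 1e-12)
    print(lam, in_some_disc)                          # True for every eigenvalue
```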
Tridiagonal matrices are important in many applications. For example, the finite difference scheme (to be discussed in later chapters) on an equally-spaced grid often results in a simple tridiagonal matrix

A = \begin{pmatrix} b & a & & & 0 \\ a & b & a & & \\ & a & b & \ddots & \\ & & \ddots & \ddots & a \\ 0 & & & a & b \end{pmatrix}.  (2.18)
In this case, the radius r_i becomes

r_1 = r_n = |a|, \qquad r_i = |a| + |a| = 2|a|, \quad (i = 2, 3, ..., n-1).  (2.19)

All the eigenvalues of A will satisfy

|\lambda - b| \le r_i,  (2.20)

or

b - 2|a| \le \lambda \le b + 2|a|,  (2.21)

where we have used r_1 = r_n = |a| \le 2|a|.
In many applications, we also have to use the inverse A^{-1} of A. The eigenvalues Λ of the inverse A^{-1} are simply the reciprocals of the eigenvalues λ of A; that is, Λ = 1/λ. Provided that 0 does not lie in the interval [b - 2|a|, b + 2|a|], we now have

b - 2|a| \le \frac{1}{\Lambda} \le b + 2|a|,  (2.22)

or

\frac{1}{b + 2|a|} \le \Lambda \le \frac{1}{b - 2|a|}.  (2.23)

For example, if a = 1 and b = 4, we have 1/6 ≤ Λ ≤ 1/2. However, if a = −1 and b = −4, we have −1/2 ≤ Λ ≤ −1/6.
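For a concrete check of these bounds with a = 1 and b = 4, one can build the tridiagonal matrix explicitly; the short NumPy sketch below is added for illustration (the size n = 10 is arbitrary):

```python
import numpy as np

n, a, b = 10, 1.0, 4.0
A = b * np.eye(n) + a * (np.eye(n, k=1) + np.eye(n, k=-1))   # tridiagonal matrix

lam = np.linalg.eigvalsh(A)                 # A is symmetric, so eigvalsh applies
print(lam.min(), lam.max())                 # all eigenvalues lie in [b-2|a|, b+2|a|] = [2, 6]

Lam = np.linalg.eigvalsh(np.linalg.inv(A))
print(Lam.min(), Lam.max())                 # all lie in [1/6, 1/2]
```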
Eigenvalues and eigenvectors have important applications. For example, they can be used to carry out an eigendecomposition of the original square matrix A. If A has n eigenvalues λ_i (i = 1, 2, ..., n) with n linearly independent eigenvectors u_i, then A can be decomposed in terms of its eigenvalues and eigenvectors as

A = P \Lambda P^{-1},  (2.24)
where P is an n × n matrix whose i-th column is the eigenvector u_i corresponding to the eigenvalue λ_i. That is,

P = (u_1\ u_2\ \dots\ u_n),  (2.25)

which is sometimes called the modal matrix.
Furthermore, the diagonal eigenvalue matrix is given by

\Lambda = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}.  (2.26)
This kind of matrix decomposition is often referred to as eigendecomposi-
tion. It is also often called spectral decomposition because the eigenvalues
form the spectrum of A.
In the case of a symmetric matrix, all its eigenvalues are real, and the
matrix P becomes orthogonal. An orthogonal matrix Q is a real square
matrix satisfying
QQT = QT Q = I, (2.27)
where I is the identity matrix of the same size n × n as Q. Since any invertible matrix also satisfies QQ^{-1} = Q^{-1}Q = I, comparing the two relations means that

Q^{-1} = Q^T.  (2.28)
A distinct advantage of an orthogonal matrix is that its inverse can easily
be obtained by simple transposition. Therefore, for a symmetric matrix A,
we have Q = P and
A = QΛQT . (2.29)
Eigendecomposition can be used to invert the matrix, especially for a symmetric matrix. If a matrix A is invertible (so that all its eigenvalues are non-zero), we have

A^{-1} = Q \Lambda^{-1} Q^{-1},  (2.30)

where

\Lambda^{-1} = \begin{pmatrix} 1/\lambda_1 & & 0 \\ & \ddots & \\ 0 & & 1/\lambda_n \end{pmatrix}.  (2.31)
Let us use a simple example to demonstrate this.

Example 2.4: Since the following matrix

A = \begin{pmatrix} 4 & 12 & 0 \\ 12 & 11 & 0 \\ 0 & 0 & 30 \end{pmatrix},
is symmetric, all its eigenvalues are real. Using the standard method of
computing the eigenvalues and eigenvectors, we have its eigenvalues
λ1 = 20, λ2 = −5, λ3 = 30,
and their corresponding eigenvectors

u_1 = \begin{pmatrix} 3/5 \\ 4/5 \\ 0 \end{pmatrix}, \quad u_2 = \begin{pmatrix} 4/5 \\ -3/5 \\ 0 \end{pmatrix}, \quad u_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
It is straightforward to verify that these three eigenvectors are mutually orthogonal. For example,

u_1 \cdot u_2 = u_1^T u_2 = \frac{3}{5} \times \frac{4}{5} + \frac{4}{5} \times \Big(-\frac{3}{5}\Big) + 0 \times 0 = 0.
As the lengths of the eigenvectors are normalized to unity, these three eigenvectors are thus orthonormal, too. The modal matrix Q can be formed from these three eigenvectors:

Q = (u_1\ u_2\ u_3) = \begin{pmatrix} 3/5 & 4/5 & 0 \\ 4/5 & -3/5 & 0 \\ 0 & 0 & 1 \end{pmatrix},
which is orthogonal. In fact, we have

Q^{-1} = \begin{pmatrix} 3/5 & 4/5 & 0 \\ 4/5 & -3/5 & 0 \\ 0 & 0 & 1 \end{pmatrix} = Q^T.
By defining Λ as

\Lambda = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix},

we have

A = \begin{pmatrix} 4 & 12 & 0 \\ 12 & 11 & 0 \\ 0 & 0 & 30 \end{pmatrix} = Q \Lambda Q^T
= \begin{pmatrix} 3/5 & 4/5 & 0 \\ 4/5 & -3/5 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 20 & 0 & 0 \\ 0 & -5 & 0 \\ 0 & 0 & 30 \end{pmatrix} \begin{pmatrix} 3/5 & 4/5 & 0 \\ 4/5 & -3/5 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
In addition, the inverse of A can be obtained by

A^{-1} = Q \Lambda^{-1} Q^T = \frac{1}{100} \begin{pmatrix} -11 & 12 & 0 \\ 12 & -4 & 0 \\ 0 & 0 & 10/3 \end{pmatrix}.
Since the determinant of A has the property

\det(A) = \det(Q)\det(\Lambda)\det(Q^{-1}) = \det(\Lambda),  (2.32)

it is not necessary to normalize the eigenvectors. In fact, if we repeat the above example with the unnormalized eigenvectors

u_1 = \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix}, \quad u_2 = \begin{pmatrix} 4 \\ -3 \\ 0 \end{pmatrix}, \quad u_3 = \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix},  (2.33)

the eigendecomposition will lead to the same matrix A.
Another important question is how to construct a matrix for a given set
of eigenvalues and their corresponding eigenvectors (not necessarily mutu-
ally orthogonal). This is basically the reverse of the eigendecomposition
process. Now let us look at another example.

Example 2.5: For given eigenvalues λ_1 = 1/4, λ_2 = −1/5 and λ_3 = −2, and their corresponding eigenvectors

u_1 = \begin{pmatrix} 3 \\ 4 \\ -2 \end{pmatrix}, \quad u_2 = \begin{pmatrix} 4 \\ -3 \\ 0 \end{pmatrix}, \quad u_3 = \begin{pmatrix} 3 \\ 4 \\ 25 \end{pmatrix},
we will now construct the original matrix A. First we have to use the eigenvectors to form a modal matrix P:

P = (u_1\ u_2\ u_3) = \begin{pmatrix} 3 & 4 & 3 \\ 4 & -3 & 4 \\ -2 & 0 & 25 \end{pmatrix}.
Its inverse is

P^{-1} = \begin{pmatrix} 1/9 & 4/27 & -1/27 \\ 4/25 & -3/25 & 0 \\ 2/225 & 8/675 & 1/27 \end{pmatrix}.
Therefore, the matrix A becomes

A = P \Lambda P^{-1},

where the eigenvalue matrix Λ is given by

\Lambda = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix} = \begin{pmatrix} 1/4 & 0 & 0 \\ 0 & -1/5 & 0 \\ 0 & 0 & -2 \end{pmatrix}.
Finally, we have

A = P \Lambda P^{-1} = \begin{pmatrix} -49/500 & 17/125 & -1/4 \\ 17/125 & -7/375 & -1/3 \\ -1/2 & -2/3 & -11/6 \end{pmatrix}.
It is straightforward to verify that the eigenvalues of A are 1/4, −1/5 and
−2, which are indeed the same as Λ.
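This construction is easy to reproduce numerically; the following NumPy sketch, added for illustration, rebuilds A = PΛP^{-1} and checks its eigenvalues:

```python
import numpy as np

P = np.array([[3.0, 4.0, 3.0],
              [4.0, -3.0, 4.0],
              [-2.0, 0.0, 25.0]])
Lam = np.diag([0.25, -0.2, -2.0])

A = P @ Lam @ np.linalg.inv(P)          # reconstruct A from eigenvalues and eigenvectors
print(A)
print(np.sort(np.linalg.eigvals(A)))    # approximately [-2, -0.2, 0.25], as prescribed
```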

2.5 Spectral Radius of Matrices

Another important concept related to the eigenvalues of a matrix is the spectral radius of a square matrix. If λ_i (i = 1, 2, ..., n) are the eigenvalues (either real or complex) of a matrix A, then the spectral radius ρ(A) is defined as

\rho(A) \equiv \max_{1 \le i \le n} |\lambda_i|,  (2.34)

which is the maximum absolute value of all the eigenvalues. Geometrically


speaking, if we plot all the eigenvalues of the matrix A on the complex
plane, and draw a circle with its centre at the origin (0, 0) on a complex
plane so that it encloses all the eigenvalues inside, then the minimum radius
of such a circle is the spectral radius.
For any integer p > 0, we have A^p u = λ^p u, and the eigenvector u is non-zero, ‖u‖ ≠ 0. Taking norms, we have

|\lambda|^p \|u\| = \|\lambda^p u\| = \|A^p u\| \le \|A^p\| \, \|u\|.  (2.35)

By dividing both sides of the above inequality by ‖u‖ ≠ 0, we reach the following inequality:

|\lambda| \le \|A^p\|^{1/p},  (2.36)

which is valid for any eigenvalue. Therefore, it should also be valid for the maximum absolute value, that is, ρ(A). We finally have

\rho(A) \le \|A^p\|^{1/p},  (2.37)
which becomes an equality when p → ∞.

Example 2.6: Let us now calculate the spectral radius of the following matrix:

A = \begin{pmatrix} -5 & 1/2 & 1/2 \\ 0 & -1 & -2 \\ 1 & 0 & -3/2 \end{pmatrix}.
From Gerschgorin's theorem, we know that

|\lambda - (-5)| \le |1/2| + |1/2| = 1, \qquad |\lambda - (-1)| \le 2, \qquad |\lambda - (-3/2)| \le 1.

These Gerschgorin discs are shown in Figure 2.2. The two discs Ω_2 and Ω_3 form a connected region S, which means that there are exactly two eigenvalues inside this connected region, and there is a single eigenvalue inside the isolated disc Ω_1 centred at (−5, 0).

Fig. 2.2 Spectral radius of a square matrix.

We can calculate its eigenvalues by


det(A − λI) = 0,
and we have
λ1 = −5.199, λ2 = −1.150 + 0.464i, λ3 = −1.150 − 0.464i.
These three eigenvalues are marked as solid dots in Figure 2.2 inside the
three Gerschgorin discs. The spectral radius is
ρ(A) = max{|λi |} = 5.199,
which is also shown in the same figure. Indeed, there are two eigenvalues
(λ2 and λ3 ) inside the connected region (S) and there is a single (λ1 ) inside
the isolated disc (Ω1 ).
The spectral radius is very useful in determining whether an iteration
algorithm is stable or not. Most iteration schemes can be written as
u(n+1) = Au(n) + b, (2.38)
where b is a known column vector and A is a square matrix with known
coefficients. The iterations start from an initial guess u(0) (often, set u(0) =
0), and proceed to the approximate solution u(n+1) . For the iteration
procedure to converge, it is required that

\rho(A) < 1.

If ρ(A) > 1, then the algorithm will not be stable, and any initial errors will be amplified in each iteration.
When A is a lower (or upper) triangular matrix,

A = \begin{pmatrix} a_{11} & 0 & \dots & 0 \\ a_{21} & a_{22} & \dots & 0 \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix},  (2.39)

its eigenvalues are simply the diagonal entries a_{11}, a_{22}, ..., a_{nn}. In addition, the determinant of the triangular matrix A is simply the product of its diagonal entries. That is,

\det(A) = |A| = \prod_{i=1}^{n} a_{ii} = a_{11} a_{22} \cdots a_{nn}.  (2.40)

Obviously, a diagonal matrix is just a special case of a triangular matrix.


Thus, the properties for its inverse, eigenvalues and determinant are the
same as the above.
These properties are convenient in determining the stability of an iter-
ation scheme such as the Jacobi-type and Gauss-Seidel iteration methods
where A may contain triangular matrices.

Example 2.7: Determine whether the following iteration is stable or not:

\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}_{n+1} = \begin{pmatrix} 2 & 2 & 3 \\ 7 & 6 & 5 \\ 0 & 4 & 5 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}_n + \begin{pmatrix} 2 \\ -2 \\ 1/2 \end{pmatrix}.
We know that the eigenvalues of

A = \begin{pmatrix} 2 & 2 & 3 \\ 7 & 6 & 5 \\ 0 & 4 & 5 \end{pmatrix},

are λ_1 = 0.6446 + 1.5773i, λ_2 = 0.6446 − 1.5773i, λ_3 = 11.7109. The spectral radius is therefore

\rho(A) = \max_{i \in \{1,2,3\}} |\lambda_i| \approx 11.7109 > 1,

and the iteration process will not be convergent.
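The divergence predicted by the spectral radius can be observed directly; the brief NumPy sketch below is added only as an illustration and iterates the scheme a few times:

```python
import numpy as np

A = np.array([[2.0, 2.0, 3.0],
              [7.0, 6.0, 5.0],
              [0.0, 4.0, 5.0]])
b = np.array([2.0, -2.0, 0.5])

rho = np.max(np.abs(np.linalg.eigvals(A)))
print(rho)                       # about 11.71 > 1, so the iteration diverges

u = np.zeros(3)                  # initial guess u^{(0)} = 0
for k in range(5):
    u = A @ u + b                # u^{(n+1)} = A u^{(n)} + b
    print(k, np.linalg.norm(u))  # the norm grows roughly by a factor rho each step
```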

2.6 Hessian Matrix

The gradient vector of a multivariate function f(x) is defined as a column vector

G(x) \equiv \nabla f(x) \equiv \Big(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}\Big)^T,  (2.41)

where x = (x1 , x2 , ..., xn )T is a vector. As the gradient ∇f (x) of a linear


function f (x) is always a constant vector k, then any linear function can
be written as
f (x) = kT x + b, (2.42)
where b is a vector constant.
The second derivatives of a generic function f(x) form an n × n matrix, called the Hessian matrix, given by

H(x) \equiv \nabla^2 f(x) \equiv \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \dots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix},  (2.43)

which is symmetric due to the fact that \frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}. When the Hessian matrix H(x) = A is a constant matrix (the values of its entries are independent of x), the function f(x) is called a quadratic function, and can subsequently be written in the following generic form:
subsequently be written as the following generic form
1
f (x) = xTAx + kT x + b. (2.44)
2
The use of the factor 1/2 in the expression is to avoid the appearance
everywhere of a factor 2 in the derivatives, and this choice is purely for
convenience.
Example 2.8: The gradient of f(x, y, z) = xy − y exp(−z) + z cos(x) is simply

G = \big(y - z\sin x,\ \ x - e^{-z},\ \ y e^{-z} + \cos x\big)^T.

The Hessian matrix is

H = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial x \partial z} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y \partial z} \\ \frac{\partial^2 f}{\partial z \partial x} & \frac{\partial^2 f}{\partial z \partial y} & \frac{\partial^2 f}{\partial z^2} \end{pmatrix} = \begin{pmatrix} -z\cos x & 1 & -\sin x \\ 1 & 0 & e^{-z} \\ -\sin x & e^{-z} & -y e^{-z} \end{pmatrix}.
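These analytic expressions can be checked against central finite differences; the following sketch (added for illustration, with an arbitrary test point) assumes NumPy:

```python
import numpy as np

def f(p):
    x, y, z = p
    return x * y - y * np.exp(-z) + z * np.cos(x)

def grad(p):
    x, y, z = p
    return np.array([y - z * np.sin(x),
                     x - np.exp(-z),
                     y * np.exp(-z) + np.cos(x)])

def hess(p):
    x, y, z = p
    return np.array([[-z * np.cos(x), 1.0, -np.sin(x)],
                     [1.0, 0.0, np.exp(-z)],
                     [-np.sin(x), np.exp(-z), -y * np.exp(-z)]])

p = np.array([0.3, -1.2, 0.7])   # arbitrary test point
h = 1e-5
num_grad = np.array([(f(p + h * e) - f(p - h * e)) / (2 * h) for e in np.eye(3)])
print(np.allclose(num_grad, grad(p), atol=1e-6))   # True: gradients agree

num_hess = np.array([[(grad(p + h * e)[i] - grad(p - h * e)[i]) / (2 * h)
                      for e in np.eye(3)] for i in range(3)])
print(np.allclose(num_hess, hess(p), atol=1e-5))   # True: Hessians agree
```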

2.7 Convexity

Knowing the properties of a function can be useful for finding the maxi-
mum or minimum of the function. In fact, in mathematical optimization,
nonlinear problems are often classified according to the convexity of the
defining function(s). Geometrically speaking, an object is convex if for any
two points within the object, every point on the straight line segment join-
ing them is also within the object. Examples are a solid ball, a cube or
a pyramid. Obviously, a hollow object is not convex. Three examples are
given in Fig. 2.3.

Fig. 2.3 Convexity: (a) non-convex, and (b) convex.

Mathematically speaking, a set S ⊂ ℜ^n in a real vector space is called a convex set if

t x + (1 - t) y \in S, \qquad \forall x, y \in S,\ t \in [0, 1].  (2.45)
Fig. 2.4 Convexity of a function f(x). Chord AB lies above the curve segment joining A and B. For any point P, we have L_α = αL, L_β = βL and L = |x_B − x_A|.

A function f(x) defined on a convex set Ω is called convex if and only if it satisfies

f(\alpha x + \beta y) \le \alpha f(x) + \beta f(y), \qquad \forall x, y \in \Omega,  (2.46)

with

\alpha \ge 0, \quad \beta \ge 0, \quad \alpha + \beta = 1.  (2.47)
Geometrically speaking, the chord AB lies above the curve segment APB joining A and B (see Fig. 2.4). For example, for any point P between A and B, we have x_P = αx_A + βx_B with

\alpha = \frac{L_\alpha}{L} = \frac{x_B - x_P}{x_B - x_A} \ge 0, \qquad \beta = \frac{L_\beta}{L} = \frac{x_P - x_A}{x_B - x_A} \ge 0,  (2.48)

which indeed gives α + β = 1. In addition, we know that

\alpha x_A + \beta x_B = \frac{x_A (x_B - x_P)}{x_B - x_A} + \frac{x_B (x_P - x_A)}{x_B - x_A} = x_P.  (2.49)
The value of the function f (xP ) at P should be less than or equal to the
weighted combination αf (xA ) + βf (xB ) (or the value at point Q). That is
f (xP ) = f (αxA + βxB ) ≤ αf (xA ) + βf (xB ). (2.50)

Example 2.9: For example, the convexity of f(x) = x^2 − 1 requires

(\alpha x + \beta y)^2 - 1 \le \alpha(x^2 - 1) + \beta(y^2 - 1), \qquad \forall x, y \in \Re,  (2.51)
where α, β ≥ 0 and α + β = 1. This is equivalent to

\alpha x^2 + \beta y^2 - (\alpha x + \beta y)^2 \ge 0,  (2.52)

where we have used α + β = 1. Expanding the square and using β = 1 − α, we have

\alpha x^2 + \beta y^2 - \alpha^2 x^2 - 2\alpha\beta x y - \beta^2 y^2 = \alpha(1-\alpha) x^2 - 2\alpha\beta x y + \beta(1-\beta) y^2 = \alpha\beta (x - y)^2 \ge 0,  (2.53)

which is always true because α, β ≥ 0 and (x − y)^2 ≥ 0. Therefore, f(x) = x^2 − 1 is convex for all x ∈ ℜ.
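The convexity inequality can also be spot-checked numerically over random points and weights; a tiny Python sketch, added purely as an illustration:

```python
import random

def f(x):
    return x**2 - 1

random.seed(1)
ok = True
for _ in range(10000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    alpha = random.random()
    beta = 1.0 - alpha
    # Convexity: f(alpha*x + beta*y) <= alpha*f(x) + beta*f(y), up to rounding error.
    ok &= f(alpha * x + beta * y) <= alpha * f(x) + beta * f(y) + 1e-12
print(ok)   # True: no counterexample found, consistent with convexity
```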
A function f(x) on Ω is concave if and only if g(x) = −f(x) is convex. An interesting property of a convex (or concave) function f is that the vanishing of the gradient, df/dx|_{x∗} = 0, guarantees that the point x∗ is a global minimum (for a convex f) or a global maximum (for a concave f); in the multivariate case this corresponds to the positive (or negative) definiteness of the Hessian. If a function is neither convex nor concave, it is much more difficult to find its global minima or maxima.
Chapter 3

Ordinary Differential Equations

Most mathematical models in physics, engineering, and applied mathemat-


ics are formulated in terms of differential equations. If the variables or
quantities (such as velocity, temperature, pressure) change with other in-
dependent variables such as spatial coordinates and time, their relationship
can in general be written as a differential equation or even a set of differ-
ential equations.

3.1 Ordinary Differential Equations

An ordinary differential equation (ODE) is a relationship between a func-


tion y(x) (of an independent variable x) and its derivatives y ′ , y ′′ , ..., y (n) .
It can be written in a generic form as

Ψ(x, y, y ′ , y ′′ , ..., y (n) ) = 0. (3.1)

The solution of the equation is a function y = f (x), satisfying the equation


for all x in a given domain Ω.
The order of the differential equation is equal to the order n of the
highest derivative in the equation. For example, the Riccati equation:

y ′ + a(x)y 2 + b(x)y = c(x), (3.2)

is a first-order ODE, and the Euler Equation

x2 y ′′ + a1 xy ′ + a0 y = 0, (3.3)

is a second-order ODE. The degree of the equation is defined as the power


to which the highest derivative occurs. Therefore, both the Riccati equation
and Euler equation are of the first degree.
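Although analytical solutions are discussed first, such equations are often solved numerically in practice. As a quick illustration (a sketch added here, not part of the original text, and assuming SciPy is available), the Riccati-type equation y′ + y² = 1 with y(0) = 0 can be integrated and compared with the exact solution y = tanh(x):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Riccati-type ODE: y' + y^2 = 1, i.e. y' = 1 - y^2, with y(0) = 0.
sol = solve_ivp(lambda x, y: 1.0 - y**2, (0.0, 3.0), [0.0], dense_output=True)

x = np.linspace(0.0, 3.0, 7)
print(sol.sol(x)[0])   # numerical solution at selected points
print(np.tanh(x))      # exact solution y = tanh(x) for comparison
```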
