Preface
London, 2014
Contents
Preface  v

I  Mathematical Foundations  1

1. Mathematical Foundations  3
1.1 The Essence of an Algorithm  3
1.2 Big-O Notations  5
1.3 Differentiation and Integration  6
1.4 Vector and Vector Calculus  10
1.5 Matrices and Matrix Decomposition  15
1.6 Determinant and Inverse  20
1.7 Matrix Exponential  24
1.8 Hermitian and Quadratic Forms  26
1.9 Eigenvalues and Eigenvectors  28
1.10 Definiteness of Matrices  31
II  Numerical Algorithms  71

5. Roots of Nonlinear Equations  73
5.1 Bisection Method  73
5.2 Simple Iterations  75
5.3 Newton's Method  76
5.4 Iteration Methods  78
5.5 Numerical Oscillations and Chaos  81

6. Numerical Integration  85
6.1 Trapezium Rule  86
6.2 Simpson's Rule  87
6.3 Gaussian Integration  89
8. Interpolation  117
8.1 Spline Interpolation  117
8.1.1 Linear Spline Functions  117
8.1.2 Cubic Spline Functions  118
8.2 Lagrange Interpolating Polynomials  123
8.3 Bézier Curve  125
17. Data Mining, Neural Networks and Support Vector Machine  243
17.1 Clustering Methods  243
17.1.1 Hierarchy Clustering  243
17.1.2 k-Means Clustering Method  244
17.2 Artificial Neural Networks  247
17.2.1 Artificial Neuron  247
17.2.2 Artificial Neural Networks  248
17.2.3 Back Propagation Algorithm  250
17.3 Support Vector Machine  251
17.3.1 Classifications  251
17.3.2 Statistical Learning Theory  252
17.3.3 Linear Support Vector Machine  253
17.3.4 Kernel Functions and Nonlinear SVM  256
Bibliography 319
Index 325
Part I
Mathematical Foundations
Chapter 1
Mathematical Foundations
or
f = o(g). (1.15)
If g > 0, f = o(g) is equivalent to f ≪ g. For example, for ∀x ∈ R, we
have
e^x ≈ 1 + x + O(x^2) ≈ 1 + x + x^2/2 + o(x).
9.3248 × 10^157. The difference between these two values is 7.7740 × 10^154, which is still very large, though three orders of magnitude smaller than the leading approximation.
From the basic definition of the derivative, we can verify that differ-
entiation is a linear operator. That is to say that for any two functions
f (x), g(x) and two constants α and β, the derivative or gradient of a linear
combination of the two functions can be obtained by differentiating the
combination term by term. We have
[αf (x) + βg(x)]′ = αf ′ (x) + βg ′ (x), (1.17)
which can easily be extended to multiple terms.
If y = f (u) is a function of u, and u is in turn a function of x, we want
to calculate dy/dx. We then have
dy/dx = (dy/du) · (du/dx),   (1.18)
or
df[u(x)]/dx = (df(u)/du) · (du(x)/dx).   (1.19)
This is the well-known chain rule.
It is straightforward to verify the product rule
(uv)′ = uv ′ + vu′ . (1.20)
If we replace v by 1/v = v^{-1} and apply the chain rule
d(v^{-1})/dx = -1 × v^{-1-1} × dv/dx = -(1/v^2) dv/dx,   (1.21)
we have the formula for quotients or the quotient rule
d(u/v)/dx = d(u v^{-1})/dx = u (-1/v^2) dv/dx + v^{-1} du/dx = (v du/dx - u dv/dx)/v^2.   (1.22)
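These differentiation rules can also be checked symbolically. The following minimal sketch uses the SymPy library with an arbitrarily chosen pair u(x) = sin x and v(x) = 1 + x^2 (any differentiable pair would do) to verify the quotient rule (1.22).

import sympy as sp

x = sp.symbols('x')
u = sp.sin(x)        # hypothetical choice of u(x)
v = 1 + x**2         # hypothetical choice of v(x)

lhs = sp.diff(u / v, x)                                  # d(u/v)/dx computed directly
rhs = (v * sp.diff(u, x) - u * sp.diff(v, x)) / v**2     # right-hand side of (1.22)
print(sp.simplify(lhs - rhs))                            # prints 0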
For a smooth curve, it is relatively straightforward to draw a tangent
line at any point; however, for a smooth surface, we have to use a tangent
plane. For example, we now want to take the derivative of a function of two
independent variables x and y, that is, z = f(x, y) = x^2 + y^2/2. The question
is probably ‘with respect to’ what? x or y? If we take the derivative with
respect to x, then will it be affected by y? The answer is we can take
the derivative with respect to either x or y while taking the other variable
as constant. That is, we can calculate the derivative with respect to x in
the usual sense by assuming that y = constant. Since there is more than
one variable, we have more than one derivative and the derivatives can be
associated with either the x-axis or y-axis. We call such derivatives partial
derivatives, and use the following notation
∂z/∂x ≡ ∂f(x, y)/∂x ≡ f_x ≡ ∂f/∂x|_y = lim_{h→0, y=const} [f(x + h, y) - f(x, y)]/h.   (1.23)
The notation ∂f/∂x|_y emphasises the fact that y = constant; however, we often omit |_y and simply write ∂f/∂x as we know this fact is implied.
Similarly, the partial derivative with respect to y is defined by
∂z/∂y ≡ ∂f(x, y)/∂y ≡ f_y ≡ ∂f/∂y|_x = lim_{k→0, x=const} [f(x, y + k) - f(x, y)]/k.   (1.24)
Then, the standard differentiation rules for univariate functions such as f(x) apply. For example, for z = f(x, y) = x^2 + y^2/2, we have
∂f/∂x = d(x^2)/dx + 0 = 2x,
and
∂f/∂y = 0 + d(y^2/2)/dy = (1/2) × 2y = y,
where the appearance of 0 highlights the fact that dy/dx = dx/dy = 0 as x and y are independent variables.
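As a quick illustration, the same partial derivatives can be reproduced with a short SymPy sketch (an added example, not part of the derivation above).

import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2 / 2
print(sp.diff(f, x))   # 2*x  (y is held constant)
print(sp.diff(f, y))   # y    (x is held constant)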
Differentiation is used to find the gradient for a given function. Now a
natural question is how to find the original function for a given gradient.
This is the integration process, which can be considered as the reverse of
the differentiation process. Since we know that
d(sin x)/dx = cos x,   (1.25)
that is, the gradient of sin x is cos x, we can easily say that the original function is sin x since its gradient is cos x. We can write
∫ cos x dx = sin x + C,   (1.26)
where C is the constant of integration. Here dx is the standard notation
showing the integration is with respect to x, and we usually call this the
integral. The function cos x is called the integrand.
The integration constant comes from the fact that a family of curves
shifted by a constant will have the same gradient at their corresponding
points. This means that the integration can be determined up to an arbi-
trary constant. For this reason, we call it an indefinite integral.
Integration is more complicated than differentiation in general. Even
when we know the derivative of a function, we have to be careful. For
example, we know that (x^{n+1})' = (n + 1) x^n, or ((1/(n+1)) x^{n+1})' = x^n for any integer n ≠ -1, so we can write
∫ x^n dx = x^{n+1}/(n + 1) + C.   (1.27)
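A short SymPy sketch (an illustrative addition) reproduces these indefinite integrals; note that the library omits the constant of integration, and that a symbolic exponent n is handled piecewise because n = -1 is a special case.

import sympy as sp

x, n = sp.symbols('x n')
print(sp.integrate(sp.cos(x), x))   # sin(x); the constant C is left out
print(sp.integrate(x**3, x))        # x**4/4
print(sp.integrate(x**n, x))        # a Piecewise result that treats n = -1 separately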
[Figure: vector diagrams (a) and (b) illustrating the vectors u, v and w.]
The zero vector 0 is a special case where all its components are zeros.
The multiplication of a vector u with a scalar or constant α ∈ ℜ is carried
out by the multiplication of each component,
αu = (αu1 , αu2 , ..., αun ). (1.36)
Thus, we have
−u = (−u1 , −u2 , ..., −un ). (1.37)
The dot product or inner product of two vectors x and y is defined as
x · y = x_1 y_1 + x_2 y_2 + ... + x_n y_n = \sum_{i=1}^{n} x_i y_i,   (1.38)
which is a real number. The length or norm of a vector x is the square root of the dot product of the vector with itself,
|x| = ||x|| = \sqrt{x · x} = \sqrt{\sum_{i=1}^{n} x_i^2}.   (1.39)
x × y = \begin{vmatrix} x_2 & x_3 \\ y_2 & y_3 \end{vmatrix} i + \begin{vmatrix} x_3 & x_1 \\ y_3 & y_1 \end{vmatrix} j + \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} k,
where θ is the angle between the two vectors. In addition, the vector
u = x × y is perpendicular to both x and y, following a right-hand rule.
It is straightforward to check that the cross product has the following
properties:
x × y = −y × x, (x + y) × z = x × z + y × z, (1.47)
and
(αx) × (βy) = (αβ) x × y,   α, β ∈ ℜ.   (1.48)
A very special case is u × u = 0. For unit vectors, we have
i × j = k, j × k = i, k × i = j. (1.49)
Example 1.3: For two 3-D vectors u = (4, 5, −6) and v = (2, −2, 1/2),
their dot product is
u · v = 4 × 2 + 5 × (−2) + (−6) × 1/2 = −5.
As their moduli are
||u|| = \sqrt{4^2 + 5^2 + (-6)^2} = \sqrt{77},
||v|| = \sqrt{2^2 + (-2)^2 + (1/2)^2} = \sqrt{33}/2,
we can calculate the angle θ between the two vectors. We have
cos θ = u · v / (||u|| ||v||) = -5 / (\sqrt{77} × \sqrt{33}/2) = -10/(11\sqrt{21}),
or
θ = cos^{-1}(-10/(11\sqrt{21})) ≈ 101.4°.
Their cross product is
w = u × v = (-19/2, -14, -18),
whose norm is ||w|| = \sqrt{(-19/2)^2 + (-14)^2 + (-18)^2} ≈ 24.70, while
||u|| ||v|| sin θ = \sqrt{77} × (\sqrt{33}/2) × sin(101.4°) ≈ 24.70 = ||w||.
It is easy to verify that
u · w = 4 × (−19/2) + 5 × (−14) + (−6) × (−18) = 0,
and
v · w = 2 × (−19/2) + (−2) × (−14) + 1/2 × (−18) = 0.
Indeed, the vector w is perpendicular to both u and v.
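The calculations of Example 1.3 are easy to reproduce numerically; the following NumPy sketch (an added check) evaluates the dot product, the cross product, the angle and the orthogonality of w.

import numpy as np

u = np.array([4.0, 5.0, -6.0])
v = np.array([2.0, -2.0, 0.5])

dot = np.dot(u, v)                                  # -5.0
w = np.cross(u, v)                                  # [-9.5, -14., -18.]
theta = np.degrees(np.arccos(dot / (np.linalg.norm(u) * np.linalg.norm(v))))
print(dot, w, theta)                                # theta is about 101.4 degrees
print(np.dot(u, w), np.dot(v, w))                   # both 0 (up to rounding): w is perpendicular to u and v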
Any vector v in an n-dimensional vector space V n can be written as a
combination of a set of n independent basis vectors or orthogonal spanning
vectors e1 , e2 , ..., en , so that
v = v_1 e_1 + v_2 e_2 + ... + v_n e_n = \sum_{i=1}^{n} v_i e_i,   (1.50)
the application of the operator ∇ can lead to either a scalar field or vector
field depending on how the del operator is applied to the vector field. The
divergence of a vector field is the dot product of the del operator ∇ and u
div u ≡ ∇ · u = ∂u_1/∂x + ∂u_2/∂y + ∂u_3/∂z,   (1.63)
and the curl of u is the cross product of the del operator and the vector
field u
curl u ≡ ∇ × u = \begin{vmatrix} i & j & k \\ ∂/∂x & ∂/∂y & ∂/∂z \\ u_1 & u_2 & u_3 \end{vmatrix}.   (1.64)
Δφ ≡ ∇2 φ = 0. (1.66)
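Divergence and curl can also be evaluated symbolically by applying the definitions (1.63) and (1.64) term by term. The sketch below is an added example with a hypothetical vector field u = (xy, yz, zx).

import sympy as sp

x, y, z = sp.symbols('x y z')
u1, u2, u3 = x*y, y*z, z*x          # hypothetical vector field components

div_u = sp.diff(u1, x) + sp.diff(u2, y) + sp.diff(u3, z)
curl_u = sp.Matrix([sp.diff(u3, y) - sp.diff(u2, z),
                    sp.diff(u1, z) - sp.diff(u3, x),
                    sp.diff(u2, x) - sp.diff(u1, y)])
print(div_u)    # x + y + z
print(curl_u)   # Matrix([[-y], [-z], [-x]])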
and
\int A \, dt = \begin{pmatrix} \int a_{11} \, dt & \int a_{12} \, dt \\ \int a_{21} \, dt & \int a_{22} \, dt \end{pmatrix}.   (1.75)
A diagonal matrix A is a square matrix whose every entry off the main diagonal is zero (a_{ij} = 0 if i ≠ j). Its diagonal elements or entries may or may not be zero. In general, it can be written as
D = \begin{pmatrix} d_1 & 0 & ... & 0 \\ 0 & d_2 & ... & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & ... & d_n \end{pmatrix}.   (1.76)
For example, the matrix
I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}   (1.77)
is a 3 × 3 identity (or unit) matrix. In general, we have
AI = IA = A. (1.78)
A zero or null matrix 0 is a matrix with all of its elements being zero.
Three other important types of matrices, the lower (upper) triangular matrix, the tridiagonal matrix, and the augmented matrix, play a key role in the solution of linear equations.
solution of linear equations. A tridiagonal matrix often arises naturally
from the finite difference and finite volume discretization of partial differ-
ential equations, and it can in general be written as
Q = \begin{pmatrix} b_1 & c_1 & 0 & 0 & ... & 0 & 0 \\ a_2 & b_2 & c_2 & 0 & ... & 0 & 0 \\ 0 & a_3 & b_3 & c_3 & ... & 0 & 0 \\ \vdots & & & & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & ... & a_n & b_n \end{pmatrix}.   (1.79)
An augmented matrix is formed by two matrices with the same number
of rows. For example, the following system of linear equations
a11 u1 + a12 u2 + a13 u3 = b1 ,
where L and U are lower and upper triangular matrices with all the diagonal entries being unity, and D is a diagonal matrix. On the other hand, the LUP
decomposition can be expressed as
A = LU P , or A = P LU , (1.88)
where P is a permutation matrix which is a square matrix and has exactly
one entry 1 in each column and each row with 0’s elsewhere. However, most
numerical libraries and software packages use the following LUP decompo-
sition
P A = LU , (1.89)
which makes it easier to decompose some matrices. However, the require-
ment for LU decompositions is relatively strict. An invertible matrix A
has an LU decomposition provided that the determinants of all its diagonal
minors or leading submatrices are not zeros.
A simpler way of decomposing a square matrix A for solving a system
of linear equations is to write
A = D + L + U, (1.90)
where D is a diagonal matrix. L and U are the strictly lower and upper
triangular matrices without diagonal elements, respectively. This decom-
position is much simpler to implement than the LU decomposition because
there is no multiplication involved here.
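In practice the LUP factorization is rarely coded by hand. The following SciPy sketch (an added illustration with an arbitrarily chosen matrix) shows the library convention A = P L U, which is equivalent to P^T A = L U.

import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])       # an arbitrary example matrix

P, L, U = lu(A)                        # SciPy returns factors with A = P @ L @ U
print(np.allclose(A, P @ L @ U))       # True
print(L)                               # unit lower triangular factor
print(U)                               # upper triangular factor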
and
D_{12} = \sum_{j=1}^{3} A_{1j} B_{j2} = 4 × 3 + 5 × (-2) + 0 × 2 = 2.
u_2 = \frac{1}{\alpha_{22}} (b_2 - \alpha_{21} u_1),
and, in general,
u_i = \frac{1}{\alpha_{ii}} \Big( b_i - \sum_{j=1}^{i-1} \alpha_{ij} u_j \Big),   (1.104)
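Equation (1.104) translates directly into a short forward-substitution routine. The sketch below (an added example with a made-up lower triangular system) solves L u = b and compares the result with a general solver.

import numpy as np

def forward_substitution(L, b):
    # Solve L u = b for lower triangular L using Eq. (1.104).
    n = len(b)
    u = np.zeros(n)
    for i in range(n):
        u[i] = (b[i] - np.dot(L[i, :i], u[:i])) / L[i, i]
    return u

L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, -1.0, 5.0]])       # hypothetical lower triangular matrix
b = np.array([2.0, 5.0, 8.0])
print(forward_substitution(L, b))
print(np.linalg.solve(L, b))           # same solution, for comparison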
e^A = I + A + \frac{1}{2!} A^2 + ...,   (1.108)
where I is an identity matrix of the same size as A, and A^2 = AA and so on. This (rather odd) definition in fact provides a method of calculating the matrix exponential. Matrix exponentials are very useful in solving systems of differential equations.
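One naive way to evaluate e^A is to truncate the series (1.108); library routines use more careful algorithms, but for small, well-behaved matrices the truncated sum already agrees closely. The sketch below (an added example with a hypothetical 2 × 2 matrix) compares the two.

import numpy as np
from scipy.linalg import expm

def expm_series(A, nterms=30):
    # Truncated power series of Eq. (1.108); a naive sketch, not production code.
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, nterms):
        term = term @ A / k            # term now holds A^k / k!
        E = E + term
    return E

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])            # hypothetical test matrix
print(np.allclose(expm_series(A), expm(A)))   # True for this small matrix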
we have
e^B = \begin{pmatrix} \frac{1}{2}(e^{t+a} + e^{t-a}) & \frac{1}{2}(e^{t+a} - e^{t-a}) \\ \frac{1}{2}(e^{t+a} - e^{t-a}) & \frac{1}{2}(e^{t+a} + e^{t-a}) \end{pmatrix}.
\ln(I + A) \equiv \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} A^n = A - \frac{1}{2} A^2 + \frac{1}{3} A^3 - ...,   (1.110)

\frac{d}{dt}\big(e^{tA}\big) = A e^{tA} = e^{tA} A,   (1.112)
The matrices we have discussed so far are real matrices because all their
elements are real. In general, the entries or elements of a matrix can be
complex numbers, and the matrix becomes a complex matrix. For a matrix
A, its complex conjugate A∗ is obtained by taking the complex conjugate
of each of its elements. The Hermitian conjugate A† is obtained by taking
the transpose of its complex conjugate matrix. That is to say, for
A = \begin{pmatrix} a_{11} & a_{12} & ... \\ a_{21} & a_{22} & ... \\ ... & ... & ... \end{pmatrix},   (1.114)
we have
A^* = \begin{pmatrix} a_{11}^* & a_{12}^* & ... \\ a_{21}^* & a_{22}^* & ... \\ ... & ... & ... \end{pmatrix},   (1.115)
and
A^{\dagger} = (A^*)^T = (A^T)^* = \begin{pmatrix} a_{11}^* & a_{21}^* & ... \\ a_{12}^* & a_{22}^* & ... \\ ... & ... & ... \end{pmatrix}.   (1.116)
A square matrix A is called orthogonal if and only if A^{-1} = A^T. If a square matrix A satisfies A^{\dagger} = A, it is called a Hermitian matrix. It is an anti-Hermitian matrix if A^{\dagger} = -A. If the Hermitian conjugate of a square matrix A is equal to the inverse of the matrix (or A^{\dagger} = A^{-1}), it is called a unitary matrix.
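These definitions are easy to test numerically: the Hermitian conjugate of a NumPy array is A.conj().T. The sketch below (an added example with a hypothetical complex matrix) checks the Hermitian and unitary conditions.

import numpy as np

A = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])               # hypothetical complex matrix
A_dagger = A.conj().T                       # Hermitian conjugate (conjugate transpose)
print(np.allclose(A, A_dagger))             # True: this A is Hermitian

Q, _ = np.linalg.qr(A)                      # Q from a QR factorization is unitary
print(np.allclose(Q.conj().T @ Q, np.eye(2)))   # True: Q† Q = I, so Q† = Q^{-1}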
whose solutions are λ1 = 7 and λ2 = −3 (see the next section for further
details). Their corresponding eigenvectors are
v_1 = \begin{pmatrix} -\sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix}.
We can see that v 1 ·v 2 = 0, which means that these two eigenvectors are or-
thogonal. Writing the quadratic form in terms of the intrinsic coordinates,
we have
ψ(q) = 7Q_1^2 - 3Q_2^2.
Furthermore, if we assume ψ(q) = 1 as a simple constraint, then the equation 7Q_1^2 - 3Q_2^2 = 1 corresponds to a hyperbola.
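The symmetric matrix consistent with these eigenpairs works out to A = [[2, -5], [-5, 2]] (reconstructed here from the stated eigenvalues and eigenvectors, since the page giving the original quadratic form is not reproduced above). A small NumPy check might look as follows.

import numpy as np

A = np.array([[2.0, -5.0],
              [-5.0, 2.0]])          # assumed/reconstructed matrix of the quadratic form
vals, vecs = np.linalg.eigh(A)       # eigh is suited to symmetric matrices
print(vals)                          # [-3.  7.] in ascending order
print(vecs)                          # columns proportional to (1, 1)/sqrt(2) and (-1, 1)/sqrt(2)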
or
\begin{vmatrix} a_{11}-\lambda & a_{12} & ... & a_{1n} \\ a_{21} & a_{22}-\lambda & ... & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & ... & a_{nn}-\lambda \end{vmatrix} = 0,   (1.124)
and
\det(A) = |A| = \prod_{i=1}^{n} \lambda_i.   (1.127)
We have
(4 − λ)(−3 − λ) − 18 = (λ − 6)(λ + 5) = 0.
For any real square matrix A with the eigenvalues λ_i = eig(A), the
eigenvalues of αA are αλ_i, where α ≠ 0, α ∈ ℜ. This property becomes handy
when rescaling the matrices in some iteration formulae so that the rescaled
scheme becomes more stable. This is also the major reason why the pivoting
and removing/rescaling of exceptionally large elements works.
Chapter 2
For n = 1000, the bubble sort algorithm will need about O(n^2) ≈ O(10^6) calculations, while quicksort only requires O(n log n) ≈ O(3 × 10^3) calculations, at least two orders of magnitude fewer in this simple case.
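To make the contrast concrete, the following small timing sketch (an added illustration, not from the original text) compares a naive O(n^2) bubble sort with Python's built-in O(n log n) sort for n = 1000.

import random
import time

def bubble_sort(a):
    # Naive O(n^2) bubble sort, for illustration only.
    a = list(a)
    n = len(a)
    for i in range(n):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

data = [random.random() for _ in range(1000)]
t0 = time.perf_counter(); bubble_sort(data); t1 = time.perf_counter()
t2 = time.perf_counter(); sorted(data); t3 = time.perf_counter()
print(f"bubble sort: {t1 - t0:.4f} s, built-in sort: {t3 - t2:.5f} s")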
and other quantities. In addition, the Hessian matrices are often used
in optimization while the spectral radius of a matrix is widely used in
the stability analysis of an iteration procedure. We will now review these
fundamental concepts.
||v||_p = \Big( \sum_{i=1}^{n} |v_i|^p \Big)^{1/p},   (2.1)
where p is a positive integer. From this definition, it is straightforward to show that the p-norm satisfies the following conditions: ||v|| ≥ 0 for all v, and ||v|| = 0 if and only if v = 0. This is the non-negativeness condition. In addition, for any real number α, we have the scaling condition: ||αv|| = |α| ||v||.
The three most common norms are the one-, two- and infinity-norms, corresponding to p = 1, 2, and ∞, respectively. For p = 1, the one-norm is just the simple sum of the absolute values of the components |v_i|, while the two-norm ||v||_2 for p = 2 is the standard Euclidean norm, because ||v||_2 is the length of the vector v:
||v||_2 = \sqrt{v · v} = \sqrt{v_1^2 + v_2^2 + ... + v_n^2},   (2.2)
where u · v is the inner product of two vectors u and v.
For the special case p = ∞, we denote v_max as the maximum absolute value of all the components v_i, or v_max ≡ max_i |v_i| = max(|v_1|, |v_2|, ..., |v_n|).
||v||_∞ = \lim_{p\to\infty} \Big( \sum_{i=1}^{n} |v_i|^p \Big)^{1/p} = \lim_{p\to\infty} \Big( v_{max}^p \sum_{i=1}^{n} \Big| \frac{v_i}{v_{max}} \Big|^p \Big)^{1/p}
= \lim_{p\to\infty} \big( v_{max}^p \big)^{1/p} \Big( \sum_{i=1}^{n} \Big| \frac{v_i}{v_{max}} \Big|^p \Big)^{1/p} = v_{max} \lim_{p\to\infty} \Big( \sum_{i=1}^{n} \Big| \frac{v_i}{v_{max}} \Big|^p \Big)^{1/p}.   (2.3)
Since |v_i/v_max| ≤ 1 for all i, and |v_i/v_max|^p → 0 as p → ∞ for every term with |v_i/v_max| < 1, the only non-zero term left in the sum is the one with |v_i/v_max| = 1, which means that
\lim_{p\to\infty} \sum_{i=1}^{n} |v_i/v_{max}|^p = 1.   (2.4)
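NumPy's norm function implements exactly these vector p-norms; the following sketch (an added example with an arbitrarily chosen vector) evaluates the one-, two- and infinity-norms.

import numpy as np

v = np.array([-5.0, 2.0, 3.0, -1.0])       # arbitrary illustrative vector
print(np.linalg.norm(v, 1))                # 11.0: sum of absolute values
print(np.linalg.norm(v, 2))                # sqrt(39), about 6.245: Euclidean length
print(np.linalg.norm(v, np.inf))           # 5.0: largest absolute component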
and its corresponding norms are ||u + v||_1 = 9, ||u + v||_2 = \sqrt{29} and ||u + v||_∞ = 4. It is straightforward to check that
||u + v||_1 = 9 < 12 + 5 = ||u||_1 + ||v||_1,
||u + v||_2 = \sqrt{29} < \sqrt{42} + 3 = ||u||_2 + ||v||_2,
and
||u + v||_∞ = 4 < 5 + 4 = ||u||_∞ + ||v||_∞.
Matrices are the extension of vectors, so we can define the corresponding norms. For an m × n matrix A = [a_{ij}], a simple way to extend the norms is to use the fact that Au is a vector for any vector with ||u|| = 1. So the p-norm is defined as
||A||_p = \Big( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^p \Big)^{1/p}.   (2.7)
Alternatively, we can consider that all the elements or entries a_{ij} form a vector. A popular norm, called the Frobenius norm (also called the Hilbert-Schmidt norm), is defined as
||A||_F = \Big( \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 \Big)^{1/2}.   (2.8)
is the maximum of the absolute row sums. The max norm is defined as
||A||_{max} = \max_{i,j} |a_{ij}|.   (2.11)
From the definitions of these norms, we know that they satisfy the non-negativeness condition ||A|| ≥ 0, the scaling condition ||αA|| = |α| ||A||, and the triangle inequality ||A + B|| ≤ ||A|| + ||B||.
Example 2.2: For the matrix
A = \begin{pmatrix} 2 & 3 \\ 4 & -5 \end{pmatrix},
it is easy to calculate that
||A||_F = ||A||_2 = \sqrt{2^2 + 3^2 + 4^2 + (-5)^2} = \sqrt{54},
||A||_∞ = \max( |2| + |3|, \; |4| + |-5| ) = 9,
and ||A||_{max} = 5.
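These values can be confirmed with a small NumPy check (an added sketch; note that np.linalg.norm(A, 2) would instead return the spectral norm, which differs from the entrywise two-norm used above).

import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, -5.0]])
print(np.linalg.norm(A, 'fro'))        # sqrt(54), about 7.348: the Frobenius norm
print(np.linalg.norm(A, np.inf))       # 9.0: the maximum absolute row sum
print(np.abs(A).max())                 # 5.0: the max norm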
and then denote Ωi as the circle, |z − aii | ≤ ri , centred at aii with a radius
ri in the complex plane z ∈ C. Such circles are often called Gerschgorin’s
circles or discs.
Since the eigenvalues λ_i (counting the multiplicity of roots) and their corresponding eigenvectors u^{(i)} are determined by
A u^{(i)} = \lambda_i u^{(i)},   (2.13)
for all i = 1, 2, ..., n, we have each component u_k^{(i)} (k = 1, 2, ..., n) satisfying
\sum_{j=1}^{n} a_{kj} u_j^{(i)} = \lambda_i u_k^{(i)},   (2.14)
where u^{(i)} = (u_1^{(i)}, u_2^{(i)}, ..., u_n^{(i)})^T and u_j^{(i)} is the j-th component of the vector u^{(i)}. Furthermore, we also define the largest absolute component of u^{(i)} (or its infinity norm) as
|u^{(i)}| = ||u^{(i)}||_∞ = \max_{1 \le j \le n} |u_j^{(i)}|.   (2.15)
As the length of an eigenvector is not zero, we get |u^{(i)}| > 0. We now have
a_{kk} u_k^{(i)} + \sum_{j \ne k} a_{kj} u_j^{(i)} = \lambda_i u_k^{(i)},   (2.16)
[Figure: Gerschgorin discs (circles) and the eigenvalues (solid dots) in the complex plane.]
|λ − 2| ≤ r_1 = |2| + |0| = 2,   |λ − 0| ≤ |4| + |0| = 4.
The eigenvalues of this matrix are λ_i = 4, −2 + 2i, −2 − 2i, which are marked as solid dots in the same figure. We can see that all these eigenvalues λ_i are within the union of all the Gerschgorin discs.
= \begin{pmatrix} 3/5 & 4/5 & 0 \\ 4/5 & -3/5 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 20 & 0 & 0 \\ 0 & -5 & 0 \\ 0 & 0 & 30 \end{pmatrix} \begin{pmatrix} 3/5 & 4/5 & 0 \\ 4/5 & -3/5 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
In addition, the inverse of A can be obtained by
A^{-1} = Q \Lambda^{-1} Q^T = \frac{1}{100} \begin{pmatrix} -11 & 12 & 0 \\ 12 & -4 & 0 \\ 0 & 0 & 10/3 \end{pmatrix}.
Since the determinant of A has the property
det(A) = det(Q) det(Λ) det(Q−1 ) = det(Λ), (2.32)
it is not necessary to normalize the eigenvectors. In fact, if we repeat the above example with eigenvectors
u_1 = \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix}, \quad u_2 = \begin{pmatrix} 4 \\ -3 \\ 0 \end{pmatrix}, \quad u_3 = \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix},   (2.33)
the eigendecomposition will lead to the same matrix A.
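The matrix behind this example is easy to rebuild from its factors. In the NumPy sketch below (an added check), A = Q Λ Q^T is reconstructed, and its inverse and determinant are compared with the values quoted above.

import numpy as np

Q = np.array([[3/5, 4/5, 0.0],
              [4/5, -3/5, 0.0],
              [0.0, 0.0, 1.0]])
Lam = np.diag([20.0, -5.0, 30.0])

A = Q @ Lam @ Q.T                                  # rebuild A from its eigendecomposition
print(A)                                           # [[ 4. 12.  0.], [12. 11.  0.], [ 0.  0. 30.]]
print(np.allclose(np.linalg.inv(A), Q @ np.diag(1 / np.diag(Lam)) @ Q.T))  # True
print(np.isclose(np.linalg.det(A), 20 * (-5) * 30))                        # True: det(A) equals the product of eigenvalues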
Another important question is how to construct a matrix for a given set
of eigenvalues and their corresponding eigenvectors (not necessarily mutu-
ally orthogonal). This is basically the reverse of the eigendecomposition
process. Now let us look at another example.
Example 2.6: Let us now calculate the spectral radius of the following matrix
A = \begin{pmatrix} -5 & 1/2 & 1/2 \\ 0 & -1 & -2 \\ 1 & 0 & -3/2 \end{pmatrix}.
From Gerschgorin’s theorem, we know that
|λ − (−5)| ≤ |1/2| + |1/2| = 1, |λ − (−1)| ≤ 2, |λ − (−3/2)| ≤ 1.
These Gerschgorin discs are shown in Figure 2.2. Two of the discs (Ω2 and Ω3) form a connected region S, which means that there are exactly two eigenvalues inside this connected region, while there is a single eigenvalue inside the isolated disc (Ω1) centred at (−5, 0).
[Figure 2.2: the Gerschgorin discs Ω1, Ω2 and Ω3, the eigenvalues (solid dots) and the spectral radius ρ(A) in the complex plane.]
The spectral radius ρ(A) is also shown in the same figure. Indeed, there are two eigenvalues (λ2 and λ3) inside the connected region (S) and a single eigenvalue (λ1) inside the isolated disc (Ω1).
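Gerschgorin discs and the spectral radius of the matrix in Example 2.6 can be computed directly; the sketch below (an added NumPy check) uses the off-diagonal absolute row sums as the disc radii.

import numpy as np

A = np.array([[-5.0, 0.5, 0.5],
              [0.0, -1.0, -2.0],
              [1.0, 0.0, -1.5]])

for i in range(A.shape[0]):
    radius = np.sum(np.abs(A[i])) - np.abs(A[i, i])       # off-diagonal absolute row sum
    print(f"disc {i + 1}: centre {A[i, i]}, radius {radius}")

vals = np.linalg.eigvals(A)
print("eigenvalues:", vals)
print("spectral radius:", np.max(np.abs(vals)))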
The spectral radius is very useful in determining whether an iteration algorithm is stable or not. Most iteration schemes can be written as
u^{(n+1)} = A u^{(n)} + b,   (2.38)
where b is a known column vector and A is a square matrix with known coefficients. The iterations start from an initial guess u^{(0)} (often, set u^{(0)} = 0), and proceed to the approximate solution u^{(n+1)}. For the iteration procedure to be stable, it requires that
ρ(A) < 1.
If ρ(A) > 1, then the algorithm will not be stable and any initial errors will be amplified in each iteration.
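The role of ρ(A) is easy to see experimentally. The sketch below (an added example with hypothetical iteration matrices) runs u^{(k+1)} = A u^{(k)} + b for a contractive and a non-contractive A.

import numpy as np

def iterate(A, b, n_steps=50):
    # Run u_{k+1} = A u_k + b starting from u_0 = 0.
    u = np.zeros_like(b)
    for _ in range(n_steps):
        u = A @ u + b
    return u

b = np.array([1.0, 1.0])
A_stable = np.array([[0.5, 0.1], [0.2, 0.4]])      # hypothetical matrix, spectral radius 0.6 < 1
A_unstable = np.array([[1.2, 0.0], [0.0, 0.3]])    # hypothetical matrix, spectral radius 1.2 > 1

print(max(abs(np.linalg.eigvals(A_stable))))        # about 0.6
print(iterate(A_stable, b))                         # converges towards the fixed point
print(np.linalg.solve(np.eye(2) - A_stable, b))     # the fixed point (I - A)^{-1} b = [2.5, 2.5]
print(max(abs(np.linalg.eigvals(A_unstable))))      # 1.2: iterates grow without bound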
In the case where A is a lower (or upper) triangular matrix
A = \begin{pmatrix} a_{11} & 0 & ... & 0 \\ a_{21} & a_{22} & ... & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & ... & a_{nn} \end{pmatrix},   (2.39)
its eigenvalues are the diagonal entries: a_{11}, a_{22}, ..., a_{nn}. In addition, the determinant of the triangular matrix A is simply the product of its diagonal entries. That is,
\det(A) = |A| = \prod_{i=1}^{n} a_{ii} = a_{11} a_{22} ... a_{nn}.   (2.40)
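A quick numerical confirmation (an added sketch with a made-up triangular matrix):

import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [1.0, -3.0, 0.0],
              [4.0, 5.0, 6.0]])        # hypothetical lower triangular matrix
print(np.linalg.eigvals(A))            # 2, -3, 6: the diagonal entries (possibly reordered)
print(np.linalg.det(A))                # -36 = 2 * (-3) * 6, up to rounding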
2.7 Convexity
Knowing the properties of a function can be useful for finding the maxi-
mum or minimum of the function. In fact, in mathematical optimization,
nonlinear problems are often classified according to the convexity of the
defining function(s). Geometrically speaking, an object is convex if for any
two points within the object, every point on the straight line segment join-
ing them is also within the object. Examples are a solid ball, a cube or
a pyramid. Obviously, a hollow object is not convex. Three examples are
given in Fig. 2.3.
[Fig. 2.3: three example objects illustrating convex and non-convex shapes.]
[Fig. 2.4: plot of a convex function f(x) with a chord AB lying above the curve and a point P between A and B.]
Fig. 2.4 Convexity of a function f (x). Chord AB lies above the curve segment joining
A and B. For any point P , we have Lα = αL, Lβ = βL and L = |xB − xA |.
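Convexity of a function can be probed numerically through the chord condition f(αx + βy) ≤ αf(x) + βf(y) with α + β = 1 and α, β ≥ 0. The sketch below (an added illustration with hypothetical test functions) checks this crudely on a grid of sample points.

import numpy as np

def is_convex_on_grid(f, xs, alphas):
    # Crude check of f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) on sampled points.
    for x in xs:
        for y in xs:
            for a in alphas:
                if f(a*x + (1 - a)*y) > a*f(x) + (1 - a)*f(y) + 1e-12:
                    return False
    return True

xs = np.linspace(-3.0, 3.0, 61)
alphas = np.linspace(0.0, 1.0, 5)
print(is_convex_on_grid(lambda t: t**2, xs, alphas))   # True: x^2 is convex
print(is_convex_on_grid(np.sin, xs, alphas))           # False: sin x is not convex on [-3, 3]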
Chapter 3
x^2 y'' + a_1 x y' + a_0 y = 0,   (3.3)