Outline
1 Numerical Linear Algebra
LU Factorization
Solving Ax = b
Solving Ax = b is central to scientific computing. It also arises in:
• Kernel Ridge Regression
• Second-order optimization methods
Steps to solve Ax = b:
• Factor the given matrix as
A = LU,
where L is a lower triangular matrix and U is an upper triangular matrix.
• Solve Ax = b by solving LUx = b in the following steps:
1 Solve Ly = b, called the forward sweep
2 Solve Ux = y, called the backward sweep
Forward and Backward Sweeps
Forward Sweep:
    [ ℓ11   0   ···   0  ] [ y1 ]   [ b1 ]
    [ ℓ21  ℓ22  ···   0  ] [ y2 ] = [ b2 ]
    [  ⋮    ⋮    ⋱    ⋮  ] [ ⋮  ]   [ ⋮  ]
    [ ℓn1  ℓn2  ···  ℓnn ] [ yn ]   [ bn ]
Forward and Backward Sweeps
Backward Sweep:
    [ u11  u12  ···  u1n ] [ x1 ]   [ y1 ]
    [  0   u22  ···  u2n ] [ x2 ] = [ y2 ]
    [  ⋮    ⋮    ⋱    ⋮  ] [ ⋮  ]   [ ⋮  ]
    [  0    0   ···  unn ] [ xn ]   [ yn ]
Algorithms for Forward and Backward Sweeps
Algorithm for forward sweep:
Algorithm for backward sweep:
Algebra of Triangular Matrices
1 The inverse of an upper (lower) triangular matrix is upper (lower) triangular
2 The product of two upper (lower) triangular matrices is upper (lower) triangular
3 The inverse of a unit upper (lower) triangular matrix is unit upper (lower) triangular
4 The product of two unit upper (lower) triangular matrices is unit upper (lower) triangular
Solving simultaneous linear systems: Algebraic way
Find x1 and x2 such that
3x1 + 5x2 = 9
6x1 + 7x2 = 4
Elementary Row Operations
1 Row switching: Ri ↔ Rj
2 Row multiplication: Ri ← kRi
3 Row addition: Ri ← Ri + kRj
Gauss Transforms
What is τ ?
    [  1  0 ] [ v1 ]   [ v1 ]
    [ −τ  1 ] [ v2 ] = [ 0  ]
More generally, which matrix to multiply on the left to create zeros below vk ?
    [ v1   ]    [ v1 ]
    [ ⋮    ]    [ ⋮  ]
    [ vk   ] →  [ vk ]
    [ vk+1 ]    [ 0  ]
    [ ⋮    ]    [ ⋮  ]
    [ vn   ]    [ 0  ]
Gauss Transforms
Suppose v ∈ Rn with vk ≠ 0. If

    τ^T = [0, . . . , 0, τk+1 , . . . , τn ],   τi = vi /vk ,   i = k + 1 : n,

and we define Mk = In − τ ek^T , then

    Mk v = [ 1  ···  0     0  ···  0 ] [ v1   ]   [ v1 ]
           [ ⋮   ⋱   ⋮     ⋮       ⋮ ] [ ⋮    ]   [ ⋮  ]
           [ 0  ···  1     0  ···  0 ] [ vk   ] = [ vk ]
           [ 0  ··· −τk+1  1  ···  0 ] [ vk+1 ]   [ 0  ]
           [ ⋮       ⋮     ⋮   ⋱   ⋮ ] [ ⋮    ]   [ ⋮  ]
           [ 0  ··· −τn    0  ···  1 ] [ vn   ]   [ 0  ]
Upper Triangularizing a Matrix
        [ 1  4   7 ]
    A = [ 2  5   8 ]
        [ 3  6  10 ]

1 Make zeros below the diagonal of the 1st column:

         [  1  0  0 ]               [ 1   4    7 ]
    M1 = [ −2  1  0 ]   =⇒  M1 A = [ 0  −3   −6 ]
         [ −3  0  1 ]               [ 0  −6  −11 ]

2 Make zeros below the diagonal of the above matrix:

         [ 1   0  0 ]                  [ 1   4   7 ]
    M2 = [ 0   1  0 ]   =⇒  M2 M1 A = [ 0  −3  −6 ]
         [ 0  −2  1 ]                  [ 0   0   1 ]
Remarks on upper triangularization
1 At the start of the kth loop we have a matrix
A(k−1) = Mk−1 · · · M1 A
that is upper triangular in columns 1 through k − 1
2 The multipliers in the kth Gauss transform Mk are based on
A(k−1) (k + 1 : n, k), and the pivot A(k−1) (k, k) must be non-zero in order to proceed
Solving simultaneous linear systems: Matrix view
    [ 3  5 ] [ x1 ]   [ 9 ]
    [ 6  7 ] [ x2 ] = [ 4 ]

Idea: keep making zeros below the main diagonal. The matrix then becomes upper
triangular, and the system can be solved with the backward sweep.
Existence of LU factorization
If no zero pivots are encountered, then Gauss transforms M1 , . . . , Mn−1 are
generated such that

    Mn−1 · · · M1 A = U

is upper triangular.
If Mk = In − τ^(k) ek^T , then Mk^{-1} = In + τ^(k) ek^T , and

    A = LU,

where

    L = M1^{-1} · · · Mn−1^{-1}
LU Factorization
Theorem
If A ∈ Rn×n and det(A(1 : k, 1 : k)) ≠ 0 for k = 1 : n − 1, then there exists a unit
lower triangular L ∈ Rn×n and an upper triangular U ∈ Rn×n such that A = LU. If
this is the case and A is nonsingular, then the factorization is unique and
det(A) = u11 u22 · · · unn .
Simplify L
• We have

    L = M1^{-1} · · · Mn−1^{-1}

• Construction of L is not complicated:

    L = M1^{-1} · · · Mn−1^{-1}
      = (In − τ^(1) e1^T)^{-1} · · · (In − τ^(n−1) en−1^T)^{-1}
      = (In + τ^(1) e1^T) · · · (In + τ^(n−1) en−1^T)

• Here τ^(k) = [0, · · · , 0, τk+1 , · · · , τn ]^T .
• Have a look at the “mixed” terms:

    τ^(i) ei^T τ^(j) ej^T
Simplify L
• Do these “mixed” terms

    τ^(i) ei^T τ^(j) ej^T

survive? They don't: ei^T τ^(j) is the ith entry of τ^(j) , which is zero whenever i ≤ j. Exercise!
• We have simplified L:

    L = In + ∑_{k=1}^{n−1} τ^(k) ek^T ,    L(k + 1 : n, k) = τ^(k) (k + 1 : n)
Practical Implementation
1 It is enough to update A(k + 1 : n, k + 1 : n)
2 We can overwrite A(k + 1 : n, k) with L(k + 1 : n, k)
Steps:
LU Algorithm
1: for k = 1 to n − 1 do
2: A(k + 1 : n, k) ← A(k + 1 : n, k)/A(k, k)
3: for i = k + 1 to n do
4: for j = k + 1 to n do
5: A(i, j) ← A(i, j) − A(i, k) · A(k, j)
6: end for
7: end for
8: end for
Vectorize the jth loop.
LU Algorithm: After Vectorization of jth loop
1: for k = 1 to n − 1 do
2: A(k + 1 : n, k) ← A(k + 1 : n, k)/A(k, k)
3: for i = k + 1 to n do
4: A(i, k + 1 : n) ← A(i, k + 1 : n) − A(i, k) · A(k, k + 1 : n)
5: end for
6: end for
Eigenvalues and Eigenvectors
A ∈ Rn×n . A vector v ∈ Rn , v ≠ 0, is called an eigenvector of A if there exists a
λ ∈ C such that
Av = λv
Here λ is called the eigenvalue of A
• The pair (λ, v) is called an eigenpair of A
• Each eigenvector has a unique eigenvalue associated with it
• Each eigenvalue is associated with many eigenvectors: any non-zero scalar multiple of an eigenvector is again an eigenvector
• The set of all eigenvalues of A is called the spectrum of A
Facts about Eigenvalues and Eigenvectors
• λ is an eigenvalue of A if and only if
det(λI − A) = 0
• The above equation is called characteristic equation of A
• Useful theoretical device, but of little value for computing eigenvalues
• Not hard to see that det(λI − A) is a polynomial of degree n in λ
• The polynomial det(λI − A) is called the characteristic polynomial of A
Computing Eigenvalues and Eigenvectors
• The eigenvalue problem and the problem of finding polynomial roots are equivalent
• (Abel) No general formula for the roots of a polynomial equation of degree > 4
• Hence no general formula for computing eigenvalues for n > 4
Division of numerical methods:
• Direct: produce the result in a finite number of steps. Examples: LU, QR
• Iterative: produce a sequence of approximations converging to the required result
Power method and extensions
Assume:
• A ∈ Rn×n
• A is semi-simple: A has n linearly independent eigenvectors, which form a
basis of Rn
• Eigenvalues are ordered: |λ1 | ≥ |λ2 | ≥ · · · ≥ |λn |
• If |λ1 | > |λ2 |, then λ1 is called the dominant eigenvalue
Power Method: If A has a dominant eigenvalue, then we can find it and an
associated eigenvector.
Power Method: Basic Idea
Idea: Generate the following sequence

    q, Aq, A^2 q, · · ·

Claim: After suitable scaling, the above sequence converges to an eigenvector
associated with the dominant eigenvalue of A, for almost any initial vector q. Why?
Power Method Finds the Largest Eigenvector
Proof: Given a vector q, since v1 , v2 , · · · , vn form a basis for Rn , there exist
constants c1 , · · · , cn such that

    q = c1 v1 + c2 v2 + · · · + cn vn

In general c1 will be non-zero. Multiplying q by A repeatedly, we have

    Aq = c1 Av1 + c2 Av2 + · · · + cn Avn
       = c1 λ1 v1 + c2 λ2 v2 + · · · + cn λn vn
    A^2 q = c1 λ1^2 v1 + c2 λ2^2 v2 + · · · + cn λn^2 vn
    A^j q = c1 λ1^j v1 + c2 λ2^j v2 + · · · + cn λn^j vn
          = λ1^j (c1 v1 + c2 (λ2 /λ1 )^j v2 + · · · + cn (λn /λ1 )^j vn )

The second term onwards goes to zero as j → ∞, since |λi /λ1 | < 1 for i ≥ 2 when λ1 is dominant
Power Method Algorithm
Let qj = A^j q/λ1^j ; then qj → c1 v1 as j → ∞. We have

    ∥qj − c1 v1 ∥ = ∥c2 (λ2 /λ1 )^j v2 + · · · + cn (λn /λ1 )^j vn ∥
                  ≤ |c2 ||λ2 /λ1 |^j ∥v2 ∥ + · · · + |cn ||λn /λ1 |^j ∥vn ∥
                  ≤ (|c2 |∥v2 ∥ + · · · + |cn |∥vn ∥)|λ2 /λ1 |^j

Note: We used |λi | ≤ |λ2 |, i = 3, · · · , n.
Let C = |c2 |∥v2 ∥ + · · · + |cn |∥vn ∥; then

    ∥qj − c1 v1 ∥ ≤ C|λ2 /λ1 |^j ,   j = 1, 2, 3, . . .

Since |λ1 | > |λ2 |, we have |λ2 /λ1 | < 1, and therefore

    |λ2 /λ1 |^j → 0 as j → ∞
Algorithm: Power Method
Find largest eigenvector of A
1: Choose a random vector q1 , tolerance tol
2: for i = 1, 2, . . . do
3:     qi+1 ← Aqi /∥Aqi ∥∞
4:     if ∥qi+1 − qi ∥ ≤ tol then
5:         break
6:     end if
7: end for
• Flops per iteration: 2n^2 , assuming A is dense
• For sparse matrices the flop count is considerably lower
• If A has five non-zero entries per row, then the cost of Aqi is only 10n flops
Inverse Iteration and Shift-and-Invert Strategy
Assumption: A ∈ Rn×n is semisimple with linearly independent eigenvectors
v1 , · · · , vn and associated eigenvalues λ1 , · · · , λn , arranged in descending order
of magnitude.
Fact
If A is non-singular, then all the eigenvalues of A are non-zero. Show that if v is
an eigenvector of A associated with eigenvalue λ, then v is also an eigenvector
of A^{-1} associated with eigenvalue λ^{-1} .
Proof in class.
Inverse Iteration
Find smallest eigenvector of A
Key Idea: The smallest eigenvector of A is the largest eigenvector of A^{-1}
1: Choose a random vector q1 , tolerance tol
2: for i = 1, 2, . . . do
3:     qi+1 ← A^{-1} qi /∥A^{-1} qi ∥∞
4:     if ∥qi+1 − qi ∥ ≤ tol then
5:         break
6:     end if
7: end for
• Only change compared to the power method is in line 3
• In practice, A^{-1} qi is computed by solving Aq̂ = qi , reusing one LU factorization across iterations, rather than by forming A^{-1}
Towards Shift-and-Invert Iteration
Fact
Let ρ ∈ R. Show that if v is an eigenvector of A with eigenvalue λ, then v is also
an eigenvector of A − ρI with eigenvalue λ − ρ.
Proof in class.
Shift-and-Invert Idea
• Let λ1 ≥ λ2 ≥ · · · ≥ λn be the eigenvalues of A
• A − ρI has eigenvalues λ1 − ρ, λ2 − ρ, . . . , λn − ρ; here ρ is the shift
• To find the eigenvector corresponding to eigenvalue λi , choose a shift ρ close
to λi , so that the smallest-magnitude eigenvalue of A − ρI is λi − ρ
• Now apply inverse iteration to find the smallest eigenvalue δi = λi − ρ
of A − ρI (a code sketch follows below)
• The ith eigenvalue λi of A is then δi + ρ
• How to guess ρ?
• Does the Gershgorin theorem1 help? What is that?
1 https://en.wikipedia.org/wiki/Gershgorin_circle_theorem
Rayleigh Quotient Iteration
Idea: Use Rayleigh quotient as a shift for the next iteration.
1: Choose a random vector q1 , tolerance tol
2: for i = 1, 2, . . . do
3:     ρi ← (qi^∗ Aqi )/(qi^∗ qi )
4:     Solve (A − ρi I)q̂i+1 = qi
5:     qi+1 ← q̂i+1 /σi+1
6:     if ∥qi+1 − qi ∥ ≤ tol then
7:         break
8:     end if
9: end for
• σi+1 is a suitable scaling factor, e.g. σi+1 = ∥q̂i+1 ∥∞