2.4 Solving Systems of Linear Equations

- Line search methods find a descent direction (like Newton) but then perform a 1D search along that direction to ensure sufficient decrease in the objective function. This guarantees progress at each step and avoids the divergence issues seen in pure Newton's method.
- The descent principle and line search rules like Armijo ensure the objective decreases at each step, so the algorithm cannot diverge if the function is bounded below.
- With a suitable line search, the convergence properties of Newton's method can be maintained while avoiding divergence problems.


2.4 SOLVING SYSTEMS OF LINEAR EQUATIONS

L and U Matrices

Lower Triangular Matrix

$$[L] = \begin{bmatrix} l_{11} & 0 & 0 & 0 \\ l_{21} & l_{22} & 0 & 0 \\ l_{31} & l_{32} & l_{33} & 0 \\ l_{41} & l_{42} & l_{43} & l_{44} \end{bmatrix}$$

Upper Triangular Matrix

$$[U] = \begin{bmatrix} u_{11} & u_{12} & u_{13} & u_{14} \\ 0 & u_{22} & u_{23} & u_{24} \\ 0 & 0 & u_{33} & u_{34} \\ 0 & 0 & 0 & u_{44} \end{bmatrix}$$

LU Decomposition for Ax=b


LU decomposition / factorization
[A]{x}=[L][U]{x}={b}
Forward substitution
[L]{d}={b}
Back substitution
[U]{x}={d}
Q: Why might I do this instead of Gaussian elimination?
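The two triangular solves are what make the factorization pay off: [L] and [U] can be reused for many right-hand sides. Below is a minimal sketch (not from the slides; the matrix and right-hand sides are made up) using numpy and scipy.linalg.lu, which returns A = P L U.

```python
import numpy as np
import scipy.linalg as la

def forward_substitution(L, b):
    """Solve [L]{d} = {b} for a lower-triangular L."""
    n = len(b)
    d = np.zeros(n)
    for i in range(n):
        d[i] = (b[i] - L[i, :i] @ d[:i]) / L[i, i]
    return d

def back_substitution(U, d):
    """Solve [U]{x} = {d} for an upper-triangular U."""
    n = len(d)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (d[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Factor once, then reuse L and U for several right-hand sides --
# the usual answer to "why LU instead of plain Gaussian elimination".
A = np.array([[4.0, 3.0], [6.0, 3.0]])
P, L, U = la.lu(A)                                   # scipy convention: A = P @ L @ U
for b in (np.array([10.0, 12.0]), np.array([1.0, 0.0])):
    x = back_substitution(U, forward_substitution(L, P.T @ b))
    print(x, np.allclose(A @ x, b))
```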

Complexity of LU Decomposition
To solve Ax=b:
decompose A into LU -- cost 2n^3/3 flops
solve Ly=b for y by forward substitution -- cost n^2 flops
solve Ux=y for x by back substitution -- cost n^2 flops

Slower alternative:
compute A^-1 -- cost 2n^3 flops
multiply x = A^-1 b -- cost 2n^2 flops
This costs about 3 times as much as LU.

(Source: 15-859B, Introduction to Scientific Computing, 26 Sept. 2000.)

Cholesky LU Factorization
If [A] is symmetric and positive definite, it is convenient to use Cholesky decomposition:
[A] = [L][L]^T = [U]^T[U]
No pivoting or scaling is needed if [A] is symmetric and positive definite (all eigenvalues are positive).
If [A] is not positive definite, the procedure may encounter the square root of a negative number.
Complexity is about half that of LU (the symmetry is exploited).

Cholesky LU Factorization
[A] = [U]^T[U]
Recurrence relations:

$$u_{ii} = \sqrt{a_{ii} - \sum_{k=1}^{i-1} u_{ki}^2}$$

$$u_{ij} = \frac{a_{ij} - \sum_{k=1}^{i-1} u_{ki}\,u_{kj}}{u_{ii}} \qquad \text{for } j = i+1, \ldots, n$$
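As an illustration only (not part of the slides), the recurrences above transcribe directly into code; the test matrix here is made up.

```python
import numpy as np

def cholesky_upper(A):
    """Compute the upper-triangular U with A = U^T U using the recurrences above.

    Raises ValueError when a pivot is not positive, i.e. A is not positive definite."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    U = np.zeros_like(A)
    for i in range(n):
        pivot = A[i, i] - U[:i, i] @ U[:i, i]        # a_ii - sum_k u_ki^2
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        U[i, i] = np.sqrt(pivot)
        for j in range(i + 1, n):
            U[i, j] = (A[i, j] - U[:i, i] @ U[:i, j]) / U[i, i]
    return U

A = np.array([[4.0, 2.0], [2.0, 3.0]])               # symmetric positive definite
U = cholesky_upper(A)
print(np.allclose(U.T @ U, A))                       # True
```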

Pivoting in LU Decomposition
Still need pivoting in LU decomposition (why?)
Pivoting messes up the order of [L]. What to do?
Need to pivot both [L] and a permutation matrix [P]:
Initialize [P] as the identity matrix and pivot it whenever [A] is pivoted. Also pivot [L].

LU Decomposition with Pivoting
Permutation matrix [P]: a permutation of the identity matrix [I].
The permutation matrix performs the bookkeeping associated with the row exchanges.
Permuted matrix: [P][A]
LU factorization of the permuted matrix: [P][A] = [L][U]
Solution: [L][U]{x} = [P]{b}
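For illustration, a short sketch (assumed, not from the slides) using scipy.linalg.lu_factor and lu_solve, which handle the partial pivoting and the [P]{b} bookkeeping internally; the matrix and right-hand side below are made up, with a zero in the (1,1) position so that pivoting is actually required.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0],
              [2.0, 1.0, 3.0]])
b = np.array([4.0, 2.0, 9.0])

lu, piv = lu_factor(A)          # compact L, U storage plus the pivot (permutation) record
x = lu_solve((lu, piv), b)      # forward/back substitution applied to the permuted system
print(x, np.allclose(A @ x, b))
```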

LU-factorization for a real symmetric indefinite matrix A
(constrained optimization problems have saddle points, so indefinite systems arise there)
LU factorization

$$A = \begin{bmatrix} E & c^T \\ c & B \end{bmatrix} = \begin{bmatrix} I & 0 \\ cE^{-1} & I \end{bmatrix} \begin{bmatrix} E & c^T \\ 0 & B - cE^{-1}c^T \end{bmatrix}$$

LDL^T factorization

$$A = \begin{bmatrix} E & c^T \\ c & B \end{bmatrix} = \begin{bmatrix} I & 0 \\ cE^{-1} & I \end{bmatrix} \begin{bmatrix} E & 0 \\ 0 & B - cE^{-1}c^T \end{bmatrix} \begin{bmatrix} I & E^{-1}c^T \\ 0 & I \end{bmatrix}$$

where

$$L = \begin{bmatrix} I & 0 \\ cE^{-1} & I \end{bmatrix} \qquad \text{and} \qquad L^T = \begin{bmatrix} I & (cE^{-1})^T \\ 0 & I \end{bmatrix} = \begin{bmatrix} I & E^{-1}c^T \\ 0 & I \end{bmatrix}$$
Questions:
1) If A is not singular, can I be guaranteed to find a nonsingular principal block E after pivoting? Of what size?
2) Why not LU-decomposition?

History of LDL decomposition: 1x1, 2x2 pivoting

Diagonal pivoting method with complete pivoting:
Bunch and Parlett, "Direct methods for solving symmetric indefinite systems of linear equations," SIAM J. Numer. Anal., vol. 8, 1971, pp. 639-655.
Diagonal pivoting method with partial pivoting:
Bunch and Kaufman, "Some Stable Methods for Calculating Inertia and Solving Symmetric Linear Systems," Mathematics of Computation, vol. 31, no. 137, January 1977, pp. 163-179.
DEMOS

2.4 COMPLEXITY OF LINEAR ALGEBRA; SPARSITY

Complexity of LU Decomposition
To solve Ax=b:
decompose A into LU -- cost 2n^3/3 flops
solve Ly=b for y by forward substitution -- cost n^2 flops
solve Ux=y for x by back substitution -- cost n^2 flops

Slower alternative:
compute A^-1 -- cost 2n^3 flops
multiply x = A^-1 b -- cost 2n^2 flops
This costs about 3 times as much as LU.

Complexity of linear algebra
Lesson: if you see A^-1 in a formula, read it as "solve a system", not "invert a matrix".

Cholesky factorization -- cost n^3/3 flops
LDL^T factorization -- cost n^3/3 flops

Q: What is the cost of Cramer's rule (roughly)?
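A small illustration of the lesson (made-up matrix, numpy assumed): both lines below compute A^-1 b, but the solve never forms the inverse and is both cheaper and more accurate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)   # made-up, well-conditioned test matrix
b = rng.standard_normal(n)

x_solve = np.linalg.solve(A, b)   # one LU factorization + two triangular solves (~2n^3/3 flops)
x_inv = np.linalg.inv(A) @ b      # explicit inverse (~2n^3 flops) plus a matrix-vector product
print(np.allclose(x_solve, x_inv))
```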

Sparse Linear Algebra
Suppose you are applying a matrix-vector multiply and the matrix has lots of zero elements. What is the computation cost? What are the space requirements?

General sparse matrix representation concepts
Primarily represent only the nonzero data values (nnz of them).
Auxiliary data structures describe the placement of the nonzeros within the dense matrix.
And *MAYBE* LU or Cholesky can be done in O(nnz) work, so not as bad as O(n^3), since very often nnz = O(n).
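A minimal sketch (scipy assumed; the entries are made up) of storing only the nonzeros and doing an O(nnz) matrix-vector product, built from the same (i, j, s) triplet idea mentioned on the next slide.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Triplet (i, j, s) representation: only the nonzero values and their positions are stored.
i = np.array([0, 1, 2, 2, 3])
j = np.array([0, 1, 0, 2, 3])
s = np.array([4.0, 5.0, 1.0, 6.0, 2.0])
A = coo_matrix((s, (i, j)), shape=(4, 4)).tocsr()   # CSR is convenient for mat-vec products

x = np.ones(4)
y = A @ x              # O(nnz) work and O(nnz) storage, instead of O(n^2)
print(A.nnz, y)
```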

Sparse Linear Algebra
Because of its phenomenal potential for computational and storage savings, sparse linear algebra is a huge research topic. It is VERY difficult to develop.
Matlab implements sparse linear algebra based on the (i, j, s) triplet format.
DEMO
Conclusion: Maybe I can SCALE well and solve problems of size O(10^12) in O(10^12) work.

(Source: CS6963, L12: Sparse Linear Algebra.)

SUMMARY SECTION 2
The heaviest components of numerical software are numerical differentiation (AD/DIVDIFF) and linear algebra.
Factorization is always preferable to direct (Gaussian) elimination.
Keeping track of sparsity in linear algebra can enormously improve performance.

3.1 FAILURE OF NEWTON METHODS

Problem definition

$$\min_{x} f(x), \qquad f : \mathbb{R}^n \rightarrow \mathbb{R}$$

- continuously differentiable
- gradient is available
- Hessian is unavailable

Necessary optimality conditions: $\nabla f(x^*) = 0$

Sufficient optimality conditions: $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*) \succ 0$

DEMO

Algorithm: Newton.
Note: not only does the algorithm not converge, the function values go to infinity.
So we should have detected ahead of time that something was going wrong and done something else earlier.

Ways of enforcing that things do not blow up or wander:

1. Line-search methods.
Make a guess of a good direction.
Make good progress along that direction. At least know that you will decrease f.

2. Trust-region methods.
Create a quadratic model of the function.
Define a region where we trust the model and find a good point in that region.
If at that point the model is far from f, trust it less (smaller region); if not, trust it more (larger region). (A sketch of the radius update follows below.)
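As a sketch of the trust-region bookkeeping just described (generic textbook-style constants, assumed here rather than taken from the course):

```python
def update_radius(rho, delta, step_norm, eta=0.25, shrink=0.25, grow=2.0):
    """One standard trust-region radius update.

    rho = (actual decrease of f) / (decrease predicted by the quadratic model)."""
    if rho < eta:
        return shrink * delta          # model was a poor predictor: trust it less
    if rho > 0.75 and step_norm >= 0.99 * delta:
        return grow * delta            # model was good and the step hit the boundary: trust it more
    return delta                       # otherwise keep the current region
```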

3.2 LINE SEARCH METHODS

3.2.1 LINE SEARCH METHODS: ESSENTIALS

Line Search Methods Idea:
At the current point x_k find a Newton-like direction d_k.
Along that direction d_k, do a 1-dimensional minimization (simpler than minimizing over the whole space):

$$x_{k+1} \approx \arg\min_{\alpha} f(x_k + \alpha d_k)$$

Because the line search always decreases f, we will have an accumulation point (the iteration cannot diverge if f is bounded below), unlike Newton proper.

Descent Principle
Descent Principle: carry out a one-dimensional search along a line where I will decrease the function.

$$g(\alpha) = f(x_k + \alpha p_k), \qquad \text{with } \nabla f(x_k)^T p_k < 0$$

If this holds, there exists an $\alpha$ (why?) such that

$$f(x_k + \alpha p_k) < f(x_k),$$

so I will keep making progress.

Typical choice (why?): $B_k p_k = -\nabla f(x_k)$ with $B_k \succ 0$.
Newton may need to be modified (why?); one common modification is sketched below.
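One common way to get a positive definite B_k when the Hessian is not positive definite is to shift it until a Cholesky factorization succeeds. The sketch below is an assumed, generic version of that idea, not necessarily the modification used in this course.

```python
import numpy as np

def newton_like_direction(grad, hess, tau0=1e-3):
    """Solve B_k p_k = -grad with B_k = hess + tau*I forced positive definite.

    Since B_k > 0, the returned p_k satisfies grad^T p_k < 0, i.e. it is a descent direction."""
    n = grad.size
    tau = 0.0
    while True:
        B = hess + tau * np.eye(n)
        try:
            np.linalg.cholesky(B)          # succeeds exactly when B is positive definite
            return -np.linalg.solve(B, grad)
        except np.linalg.LinAlgError:
            tau = max(2.0 * tau, tau0)     # not positive definite yet: increase the shift
```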

Line Search - Armijo
I cannot accept just about ANY decrease, for I may NEVER converge (why? example of spurious convergence).
IDEA: accept only decreases PROPORTIONAL TO THE SQUARE OF THE GRADIENT. Then I have to converge (since the process stops only when the gradient is 0).
Example: the Armijo rule. It uses the concept of BACKTRACKING.

$$f(x_k) - f(x_k + \beta^m d_k) \ge -\sigma\, \beta^m\, \nabla f(x_k)^T d_k, \qquad \beta \in (0,1),\ \sigma \in (0, 1/2)$$

The accepted step is $\alpha_k = \beta^m$ for the smallest integer $m \ge 0$ that satisfies the inequality.

[Figure: g(alpha) plotted against the lines g(0) + c1*alpha*g'(0) and g(0) + alpha*g'(0), illustrating the sufficient-decrease condition.]
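A minimal backtracking implementation of this rule; the test function and the values of sigma and beta are illustrative choices, not taken from the slides. The acceptance test is the same inequality as above, rearranged to f(x + alpha d) <= f(x) + sigma * alpha * grad^T d.

```python
import numpy as np

def armijo_backtracking(f, grad, x, d, sigma=1e-4, beta=0.5, max_iter=50):
    """Try alpha = 1, beta, beta^2, ... until the sufficient-decrease condition holds."""
    slope = grad(x) @ d                    # must be negative for a descent direction
    alpha = 1.0
    for _ in range(max_iter):
        if f(x + alpha * d) <= f(x) + sigma * alpha * slope:
            return alpha                   # sufficient decrease achieved
        alpha *= beta                      # backtrack
    return alpha

# Tiny usage example on f(x) = x1^2 + 10*x2^2 with the steepest-descent direction.
f = lambda x: x[0]**2 + 10.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
x0 = np.array([1.0, 1.0])
d = -grad(x0)
alpha = armijo_backtracking(f, grad, x0, d)
print(alpha, f(x0 + alpha * d) < f(x0))    # the accepted step strictly decreases f
```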

Some Theory
Global Convergence:
Fast Convergence: the Newton step is accepted by the line search (LS).

Extensions
Line Search Refinements:
Use interpolation
Wolfe and Goldstein rules

Other optimization approaches:
Steepest descent
CG (conjugate gradients)
