
Solution of Nonlinear Equations

Feng-Nan Hwang

Department of Mathematics
National Central University,
Jhongli District, Taoyuan City, Taiwan
Email: [email protected]
Homepage: http://www.math.ncu.edu.tw/~hwangf

References:
Chapter 3 of the textbook

Introduction

- For a single equation, we are interested in finding the roots
  (or zeros) of f(x) = 0, where f(x) is a given function of a real
  variable x.
- For systems, we solve
      F(X) = 0,
  where X = (x1, x2, ..., xn)^T.
- Example applications:
  - In the theory of diffraction of light, we need to find the roots
    of the equation
        x − tan x = 0
  - In the calculation of planetary orbits, we need to find the roots
    of Kepler's equation
        x − a sin x = b
    for different values of a and b.

- This is not an easy task, in general. Let us look at three
  equations:
  - f(x) = x^4 − 12x^3 + 47x^2 − 60x
  - f(x) = x^4 − 12x^3 + 47x^2 − 60x + 24
  - f(x) = x^4 − 12x^3 + 47x^2 − 60x + 24.1
- The first function has roots 0, 3, 4, and 5
- The real roots of the second function are 1 and 0.888...
- The third function has no real roots at all

- The basic questions:
  1. Does the solution exist?
  2. Is the solution unique?
  3. How do we find it?
- In this class, we don't try to answer (1) and (2): Too Hard!
- We assume that the problem has a unique solution and focus
  only on question (3)
- We will study iterative methods for finding the solution: first
  find an initial guess x0, then a better guess x1, ..., and in the
  end we hope lim_{n→∞} x_n = x*

Convergence rate

Two types of convergence rate: q-convergence and r-convergence

- q-convergence:

      ‖x_{n+1} − x*‖ ≤ C ‖x_n − x*‖^p

  - if p = 2, we say the convergence is q-quadratic
  - if 1 < p < 2, we say the convergence is q-superlinear
  - if p = 1, we say the convergence is q-linear, and in this case
    we need C < 1
- r-convergence:

      ‖x_n − x*‖ ≤ C ξ_n,

  where ξ_n converges to zero q-(quadratically, superlinearly, etc.)
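
To see these rates concretely, here is a small Python illustration
(ours, not from the slides): since log ‖e_{n+1}‖ = log C + p log ‖e_n‖,
the ratio of successive log-errors tends to the order p.

    import math

    # Made-up error sequence that is exactly q-quadratic with C = 1:
    # each error is the square of the previous one.
    errors = [1e-1, 1e-2, 1e-4, 1e-8, 1e-16]

    # log e_{n+1} = log C + p log e_n, so the ratio of successive
    # log-errors approaches the order p (here p = 2).
    for e_prev, e_next in zip(errors, errors[1:]):
        print(math.log(e_next) / math.log(e_prev))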

Section 3.1 Bisection method

Bisection method

- An observation: If f(x) is a continuous function on an interval
  [a, b], and f(a) and f(b) have different signs (f(a)f(b) < 0),
  then f(x) must have a root in the interval
- The basic algorithm: Compute c = (a + b)/2. If f(a)f(c) < 0,
  then the root is in [a, c]; otherwise the root is in [c, b]. In
  either case, a new interval containing the root is produced,
  and the size of the new interval is half of the original one.
  Repeat the basic algorithm until the interval is sufficiently
  small; then any point in the interval can be used as an
  approximation of the root
- Intermediate-Value Theorem for Continuous Functions:
  If f is continuous on [a, b], and if f(a) < y < f(b), then
  f(x) = y for some x ∈ (a, b).

An example: Use the bisection method to find the root of
e^x = sin(x).
A rough plot of f(x) = e^x − sin(x) shows there is no positive root,
and the first root to the left of 0 is somewhere in the interval
[−4, −3].
[Figure: plot of f(x) = exp(x) − sin(x) on [−5, 1]]

The output obtained by the bisection algorithm running on a machine
similar to the Marc-32:

 k      c          f(c)
 1   −3.5000   −0.321
 2   −3.2500   −0.694 × 10^-1
 3   −3.1250    0.605 × 10^-1
 4   −3.1875   −0.462 × 10^-2
 ...    ...        ...
13   −3.1829    0.122 × 10^-3
14   −3.1830    0.193 × 10^-4
15   −3.1831   −0.124 × 10^-4
16   −3.1831    0.345 × 10^-5

- How do we implement the algorithm? What do we need?
  - We need an initial interval. This is often the hardest thing to
    find.
  - We need some stopping conditions:
    - If |f(c)| ≤ ε, we stop
    - If k ≥ M, we stop, to avoid an infinite loop
    - If |b − a| ≤ δ, we stop

A pseudocode for the bisection algorithm

input a, b, M, δ, ε
u ← f(a), v ← f(b), and e ← b − a
if sign(u) = sign(v), then stop
for k = 1 to M do
    e ← e/2, c ← a + e, w ← f(c)
    if |e| < δ or |w| < ε, then stop
    if sign(u) ≠ sign(w), then
        b ← c, v ← w
    else
        a ← c, u ← w
    end if
end do

Note 1: Testing sign(u) ≠ sign(w) is better than testing if( uv < 0 ). Why?
Note 2: Compute the midpoint as c ← a + (b − a)/2 rather than
c ← (a + b)/2. Why?
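
This pseudocode translates directly into Python. A minimal sketch
(the function name and default tolerances are our own choices):

    import math

    def bisect(f, a, b, M=60, delta=1e-12, eps=1e-12):
        """Bisection; follows the pseudocode above."""
        u, v = f(a), f(b)
        e = b - a
        if math.copysign(1.0, u) == math.copysign(1.0, v):
            raise ValueError("f(a) and f(b) must have opposite signs")
        for k in range(1, M + 1):
            e /= 2.0
            c = a + e                  # midpoint as a + (b - a)/2 (Note 2)
            w = f(c)
            if abs(e) < delta or abs(w) < eps:
                break
            if math.copysign(1.0, u) != math.copysign(1.0, w):  # Note 1
                b, v = c, w            # root is in [a, c]
            else:
                a, u = c, w            # root is in [c, b]
        return c

    # Example from the earlier slide: root of exp(x) = sin(x) in [-4, -3]
    print(bisect(lambda x: math.exp(x) - math.sin(x), -4.0, -3.0))  # ~ -3.1831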

Convergence analysis
- We have three sequences of numbers: left point a_n, midpoint
  c_n, and right point b_n, with n = 0, 1, ..., and they satisfy

      b_{n+1} − a_{n+1} = (b_n − a_n)/2

  or

      b_n − a_n = 2^{-n} (b_0 − a_0).

- Taking the limit, we have

      lim_{n→∞} (b_n − a_n) = (b_0 − a_0) lim_{n→∞} 2^{-n} = 0,

  no matter what b_0 − a_0 is. We also have

      lim_{n→∞} a_n = lim_{n→∞} b_n.

- Let us assume the limit is r = lim_{n→∞} a_n.

- Take the limit of

      f(a_n) f(b_n) ≤ 0

  and use the assumption that f(x) is continuous
  (lim_{n→∞} f(a_n) = f(lim_{n→∞} a_n)); we obtain

      f(r) f(r) ≤ 0,

  which implies that

      f(r) = 0.

- So r is the root → convergence

How fast is the convergence?

Suppose we use c_n as the approximate solution.

Since r ∈ (a_n, b_n) and c_n is the midpoint, we always have

    |r − c_n| ≤ 2^{-1} (b_n − a_n) = 2^{-(n+1)} (b_0 − a_0).

This estimate can be useful. (ref. Theorem on Bisection Method,
pg. 79)

- For example, if we start with the initial interval [50, 63], how
  many steps do we need in order for the relative accuracy to be
  smaller than 10^{-12}?
- This is what we want:

      |r − c_n| / |r| ≤ 10^{-12}

- Since we know r ≥ 50, it is sufficient to have

      |r − c_n| / 50 ≤ 10^{-12}.

- Using the above estimate, all we need is

      2^{-(n+1)} (63 − 50)/50 ≤ 10^{-12}.

  That means n ≥ 37.
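
A quick sanity check of this count (our own one-liner, using the
bound above):

    import math

    # smallest n with 2^{-(n+1)} * (63 - 50)/50 <= 1e-12
    n = math.ceil(math.log2((63 - 50) / 50 / 1e-12)) - 1
    print(n)  # 37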

Some major problems with the bisection method

- Finding the initial interval is not easy
- Often slow
- Doesn't work for higher-dimensional problems

Section 3.2 Newton’s method

Newton’s method

- Motivation: We know how to solve f(x) = 0 if f is linear. For
  nonlinear f, we can always approximate it with a linear
  function
- Let x* be the root, and x an approximation of x*. Let
  x* = x + h. Using Taylor's expansion,

      0 = f(x*) = f(x + h) = f(x) + h f′(x) + O(h²)

- If h is small, then we can drop the O(h²) term and get

      0 = f(x) + h f′(x),

  which means

      h = −f(x)/f′(x)

- If x is an approximation of x*, then x + h should be a better
  approximation of x*. Hence,

      x + h = x − f(x)/f′(x)

- Newton's method can be defined as

      x ← x − f(x)/f′(x)

  or

      x_{n+1} = x_n − f(x_n)/f′(x_n)

- Example: Find the root of f(x) = e^x − 1.5 − tan⁻¹(x).
  Suppose we start with x0 = −7; then the iteration goes like

      x0 = −7,    f(x0) = −0.7 × 10^-1
      x1 = −10.7, f(x1) = −0.2 × 10^-1
      ...
      x3 = −14.0, f(x3) = −0.2 × 10^-3
      x4 = −14.1, f(x4) = −0.8 × 10^-6

- In-class exercise: Find an efficient method for computing
  square roots of a given positive number using Newton's
  method.

Geometrical interpretation

See http://www.math.umn.edu/˜garrett/qy/Newton.html

A little informal convergence theory
Define the error as e_n = x_n − x*. Then

    e_{n+1} = x_{n+1} − x* = x_n − f(x_n)/f′(x_n) − x*
            = e_n − f(x_n)/f′(x_n) = (e_n f′(x_n) − f(x_n))/f′(x_n)

Using Taylor's expansion,

    0 = f(x*) = f(x_n − e_n)
      = f(x_n) − e_n f′(x_n) + 0.5 e_n² f″(ξ_n)

Therefore

    e_{n+1} = 0.5 (f″(ξ_n)/f′(x_n)) e_n² ≈ C e_n²,

assuming we are close to the solution, so that

    0.5 f″(ξ_n)/f′(x_n) ≈ 0.5 f″(x*)/f′(x*) = C

So we obtain

    e_{n+1} ≈ C e_n²

or, more precisely,

    |e_{n+1}| ≤ C |e_n|²

Theorem on Newton’s method

- Let f″ be continuous and let x* be a simple root of f (so
  f′(x*) ≠ 0). Then there is a neighborhood of x* such that if
  the initial guess x0 is in this neighborhood, then Newton's
  method converges and satisfies

      |x_{n+1} − x*| ≤ C |x_n − x*|²

- Note: The convergence is q-quadratic!
- Good: The convergence is quadratic
- Bad: The initial guess has to be close to the solution
  (something we don't know)

How bad could this be?

Example: Find the root of f(x) = α − 1/x, for any given α > 0.
(We know the exact solution is x* = 1/α.)

- Using Newton's method:

      x_{n+1} = x_n − (α − 1/x_n)/(1/x_n²),

  which is the same as

      x_{n+1} = 2x_n − α x_n²,   n = 0, 1, 2, ...
Questions:
- Does the sequence x0, x1, x2, ... converge?
- How fast?
- Does the convergence depend on the initial guess x0?
Let us define the error

    e_n = x* − x_n = 1/α − x_n.

Then

    e_{n+1} = 1/α − x_{n+1}
            = 1/α − 2x_n + α x_n²
            = α (1/α − x_n)²
            = α e_n².

That is,

    e_{n+1} = α e_n².

If it converges, then the rate is q-quadratic.

We now have

    e_{n+1} = α e_n² = α (α e_{n-1}²)² = (1/α)(α e_{n-1})^{2²}
            = ··· = (1/α)(α e_0)^{2^{n+1}}.

Therefore, x_n converges to x* only if |α e_0| < 1, i.e.,

    |e_0| < 1/α
or
    |1/α − x_0| < 1/α
or
    −1/α < 1/α − x_0 < 1/α
or
    0 < x_0 < 2/α
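
A small numerical check of this convergence region (our own
illustration; α = 4 is chosen arbitrarily, so the root is 1/α = 0.25
and the region is (0, 0.5)):

    alpha = 4.0

    def reciprocal_newton(x0, steps=8):
        """Division-free Newton iteration x <- 2x - alpha*x^2 for 1/alpha."""
        x = x0
        for _ in range(steps):
            x = 2*x - alpha*x*x
        return x

    print(reciprocal_newton(0.1))  # inside (0, 2/alpha): converges to 0.25
    print(reciprocal_newton(0.6))  # outside: diverges toward -infinity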
Some remarks

- Newton's method converges only locally; i.e., the initial guess
  has to be close enough to the solution
- The convergence is quadratic
- It needs the first derivative of f(x)
- Newton's method works for higher-dimensional problems

Implementation of Newton’s method

input x and M
y ← f(x)
for k = 1 to M do
    x ← x − y/f′(x)
    y ← f(x)
end do

Stopping conditions

- Using the residual information f(x_k):
  - If |f(x_k)| ≤ ε, stop (absolute residual condition)
  - If |f(x_k)| ≤ ε |f(x_0)|, stop (relative residual condition)
- Using the step-size information |x_{k+1} − x_k|:
  - If |x_{k+1} − x_k| ≤ δ, stop
  - If |x_{k+1} − x_k| / |x_{k+1}| ≤ δ, stop
- Maximum number of iterations: M

A better implementation of Newton’s method

input x0, M, ε, δ
v ← f(x0)
if |v| ≤ ε, stop
for k = 1 to M do
    x1 ← x0 − v/f′(x0)
    v ← f(x1)
    if |x1 − x0| ≤ δ or |v| ≤ ε, stop
    x0 ← x1
end do
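
A runnable Python sketch of this loop (our own translation; f and
its derivative are passed in as callables):

    def newton(f, fprime, x0, M=50, eps=1e-12, delta=1e-12):
        """Newton's method; follows the pseudocode above."""
        v = f(x0)
        if abs(v) <= eps:
            return x0
        for k in range(1, M + 1):
            x1 = x0 - v / fprime(x0)   # breaks down if f'(x0) is (near) zero
            v = f(x1)
            if abs(x1 - x0) <= delta or abs(v) <= eps:
                return x1
            x0 = x1
        return x0

    # Example: the square root of 2 as the root of f(x) = x^2 - 2
    print(newton(lambda x: x*x - 2.0, lambda x: 2.0*x, 1.0))  # ~ 1.41421356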

- Another condition: if |f′(x)| is too small, the division by f′(x)
  may overflow
- A question: What happens if f′(x) is not available? We will
  study this in the next section.

Systems of nonlinear equations

- We wish to solve

      f1(x1, x2) = 0
      f2(x1, x2) = 0,

  where f1 and f2 are nonlinear functions of x1 and x2.
- Apply Taylor's expansion in two variables around (x1, x2) to
  obtain:

      0 = f1(x1 + h1, x2 + h2) ≈ f1(x1, x2) + h1 ∂f1/∂x1 + h2 ∂f1/∂x2
      0 = f2(x1 + h1, x2 + h2) ≈ f2(x1, x2) + h1 ∂f2/∂x1 + h2 ∂f2/∂x2

- Putting it into matrix form:

      [0]   [f1(x1, x2)]   [∂f1/∂x1  ∂f1/∂x2] [h1]
      [0] = [f2(x1, x2)] + [∂f2/∂x1  ∂f2/∂x2] [h2]

- To simplify the notation, we introduce the Jacobian matrix

      J = [∂f1/∂x1  ∂f1/∂x2]
          [∂f2/∂x1  ∂f2/∂x2]

- Then we have

      [0]   [f1(x1, x2)]     [h1]
      [0] = [f2(x1, x2)] + J [h2]

- If J is nonsingular, then J⁻¹ exists, so we can solve for
  [h1, h2]^T:

      [h1]        [f1(x1, x2)]
      [h2] = −J⁻¹ [f2(x1, x2)]

- Newton's method in 2D:

      [x1^(k+1)]   [x1^(k)]   [h1^(k)]
      [x2^(k+1)] = [x2^(k)] + [h2^(k)]

  with

            [h1^(k)]     [f1(x1^(k), x2^(k))]
      J^(k) [h2^(k)] = − [f2(x1^(k), x2^(k))]

- In-class exercise: Solve the following nonlinear system by
  using Newton's method with the initial guess x^(0) = (0, 1)^T.
  Perform only one iteration.

      4 x1² − x2² = 0
      4 x1 x2² − x1 = 1

- In general, we can use Newton's method for

      F(X) = 0,

  where X = (x1, x2, ..., xn)^T and F = (f1, f2, ..., fn)^T
- For a higher-dimensional function, the first derivative is
  defined as a matrix:

              [∂f1/∂x1  ∂f1/∂x2  ···  ∂f1/∂xn]
      F′(X) = [∂f2/∂x1  ∂f2/∂x2  ···  ∂f2/∂xn]
              [   ⋮        ⋮      ⋱      ⋮   ]
              [∂fn/∂x1  ∂fn/∂x2  ···  ∂fn/∂xn]

- Newton's method in n-dimensional space: Given
  X^(0) = [x1^(0), ..., xn^(0)]^T, iterate

      X^(k+1) = X^(k) + H^(k)

      F′(X^(k)) H^(k) = −F(X^(k)),

  which requires solving a large linear system of equations
  at every iteration
- Four types of operations are involved in Newton's method
  (each appears in the sketch below):
  - vector operations (not expensive)
  - function evaluation (can be expensive)
  - computing the Jacobian (can be expensive; topic of the next
    section)
  - solving matrix equations (very expensive; topic of the next
    chapter)
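
A minimal NumPy sketch of this iteration (our own illustration; the
stopping tolerances are assumptions, and the 2-by-2 exercise from the
previous slide is reused as a test case):

    import numpy as np

    def newton_system(F, J, X0, M=50, eps=1e-12, delta=1e-12):
        """Newton's method for F(X) = 0 in n dimensions."""
        X = np.asarray(X0, dtype=float)
        for k in range(M):
            FX = F(X)                       # function evaluation
            if np.linalg.norm(FX) <= eps:
                break
            H = np.linalg.solve(J(X), -FX)  # solve F'(X) H = -F(X)
            X = X + H                       # vector operation
            if np.linalg.norm(H) <= delta:
                break
        return X

    # Test case: 4*x1^2 - x2^2 = 0 and 4*x1*x2^2 - x1 = 1, X^(0) = (0, 1)
    F = lambda X: np.array([4*X[0]**2 - X[1]**2,
                            4*X[0]*X[1]**2 - X[0] - 1.0])
    J = lambda X: np.array([[8*X[0],          -2*X[1]],
                            [4*X[1]**2 - 1.0, 8*X[0]*X[1]]])
    print(newton_system(F, J, [0.0, 1.0]))

Note the linear solve: we never form J⁻¹ explicitly; this solve is the
expensive step studied in the next chapter.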

Section 3.3 Secant method

Methods without using derivatives

- Basic idea:

      x ← x − f(x)/f′(x)

  If f′(x) is too hard or too expensive to compute, we can use
  an approximation
- Questions:
  - How do we obtain an approximation?
  - Do we lose the fast convergence?
- Two classes of methods:
  - Finite-difference Newton's method
  - Secant method

Finite-difference Newton's method

- Let h be a small nonzero parameter; then

      a = (f(x + h) − f(x))/h

  can be a good approximation of f′(x)
- FD-Newton:
  1. Compute a = (f(x + h) − f(x))/h
  2. Compute x ← x − f(x)/a
- Remarks (see the sketch below):
  - The method needs an extra parameter h. What shall we use?
  - The method needs two function evaluations per iteration
  - What is the convergence rate?
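
A minimal Python sketch of FD-Newton (ours; the residual-based
stopping test is an assumption):

    def fd_newton(f, x, M=50, h=1e-7, eps=1e-12):
        """Newton's method with a forward-difference slope."""
        for k in range(M):
            fx = f(x)
            if abs(fx) <= eps:
                return x
            a = (f(x + h) - fx) / h    # two function evaluations per iteration
            x = x - fx / a
        return x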

Secant method
- Since h can be any small number in FD-Newton, why don't we
  simply use

      h = x_n − x_{n-1},

  which may be positive or negative, but usually not zero
- Secant method:
  1. Compute a = (f(x_n) − f(x_{n-1}))/(x_n − x_{n-1})
  2. Compute x_{n+1} = x_n − f(x_n)/a
- Remarks:
  - Now we need only one function evaluation per iteration
  - x_{n+1} depends on the two previous iterates. For example, to
    compute x2, we need both x1 and x0.
  - How do we obtain x1? We can use one FD-Newton step: pick a
    small parameter h, compute a0 = (f(x0 + h) − f(x0))/h, then
    x1 = x0 − f(x0)/a0 (see the sketch below)
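
A minimal Python sketch of the secant method, bootstrapped with one
FD-Newton step as described above (ours; the tolerances are
assumptions):

    def secant(f, x0, M=50, h=1e-7, eps=1e-12, delta=1e-12):
        """Secant method; x1 comes from one FD-Newton step."""
        a0 = (f(x0 + h) - f(x0)) / h   # bootstrap slope for x1
        x1 = x0 - f(x0) / a0
        f0, f1 = f(x0), f(x1)
        for k in range(M):
            a = (f1 - f0) / (x1 - x0)  # secant slope, h = x_n - x_{n-1}
            x2 = x1 - f1 / a
            f2 = f(x2)                 # the single new evaluation per iteration
            if abs(x2 - x1) <= delta or abs(f2) <= eps:
                return x2
            x0, f0, x1, f1 = x1, f1, x2, f2
        return x1

    # The example on the next slide: f(x) = x^2 - 1 with x0 = 2
    print(secant(lambda x: x*x - 1.0, 2.0))  # ~ 1.0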

Which of the three methods is better?

An example: f(x) = x² − 1, and we take x0 = 2

iteration   Newton         FD-Newton (h = 10^-7)   Secant
x0          2              2                       2
x1          1.25           1.2500000266            1.2500000266
x2          1.025          1.0250000179            1.0769230844
x3          1.000304878    1.0003048001            1.008264464
x4          1.000000046    1.00000004647           1.000304878
x5          1.0            1.0                     1.0000012544
x6                                                 1.0000000001
x7                                                 1.0
Convergence rates

- If |h_n| ≤ C |x_n − x*|, then the convergence of FD-Newton is
  quadratic
- The convergence of the secant method is q-superlinear. More
  precisely,

      |e_{n+1}| ≤ C |e_n|^{(1+√5)/2},

  where (1 + √5)/2 ≈ 1.62 < 2
- Remark: When selecting algorithms for a particular problem,
  one should consider not only the rate of convergence but also
  the cost of computing f and f′

More remarks

- The three methods (bisection, Newton, and secant) illustrate
  a common phenomenon in scientific computing: the trade-off
  between speed and reliability (robustness).
- Speed is directly related to computing cost.
- An algorithm is robust if it is able to cope with a wide variety
  of different numerical situations without the intervention of
  the user.
- Bisection (and some other algorithms) is global; the
  Newton-type algorithms are local.
- Local algorithms are often fast, and global algorithms are
  often slow.
