Roots of Equations: 1.0.1 Newton's Method
x - f/D[f, x] /. x -> x1

The replacement x -> x1 in the function f = f(x) is necessary because we are dealing with numerical
values in the calculation. The iteration of this function can be carried out by a special Mathematica
2 Lecture_007.nb
2012 G. Baumann
function called Nest[] or NestList[]. These functions generate a nested expression of the newtonsMethod[]
function and deliver the approximation of the root. For example, if we are going to determine one of the
roots of the polynomial
p(x) = x^3 - 4 x^2 + 5 == 0

defined as

p(x_) := x^3 - 4 x^2 + 5
whose graph is given in Figure 0.0
Figure 0.0. Graph of the polynomial p(x) = x^3 - 4 x^2 + 5.
If we apply Newton's Method to this polynomial, we get for an initial value x_1 = 0.155 the following list of 7
approximations

res = NestList[newtonsMethod[p[x], #] &, 0.155, 7]

{0.155, 4.357, 3.82396, 3.64124, 3.61838, 3.61803, 3.61803, 3.61803}
The result is a list of approximations of the root starting with the initial value.
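The same nested iteration can be sketched in Python (the helper `newton_list` and the lambdas are illustrative names of mine, not part of the lecture's Mathematica code); it plays the role of NestList[] applied to the Newton update:

```python
def newton_list(f, df, x1, steps):
    """Collect the Newton iterates x, x - f(x)/f'(x), ... like NestList[]."""
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs

p  = lambda x: x**3 - 4*x**2 + 5      # the polynomial from above
dp = lambda x: 3*x**2 - 8*x           # its derivative

res = newton_list(p, dp, 0.155, 7)
print([round(v, 5) for v in res])     # approaches the root 3.61803
```

The first step jumps from 0.155 to about 4.357 because f'(0.155) is small; afterwards the iterates settle quickly onto the root.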
Example 0.0. Newton's Method I
Use Newton's Method to find 2^(1/7).
Solution 0.3. Observe that finding 2^(1/7) is equivalent to finding the positive root of the equation

x^7 - 2 = 0

so we take

p(x_) := x^7 - 2
and apply Newton's Method to this formula with an appropriate initial value x_1 = 0.85. The iteration of the
method delivers

NestList[newtonsMethod[p[x], #] &, 0.85, 17]

{0.85, 1.48613, 1.30035, 1.17368, 1.11532, 1.10442, 1.10409, 1.10409, 1.10409,
1.10409, 1.10409, 1.10409, 1.10409, 1.10409, 1.10409, 1.10409, 1.10409, 1.10409}

This means that 2^(1/7) = 1.10409.
Example 0.0. Newton's Method II
Find the solution of the equation cos(x) = x.
Solution 0.4. In applying Newton's method we are not restricted to polynomials; we can apply the method
to any kind of function which allows a first order derivative. For the current case, we rewrite the equation as

p(x_) := cos(x) - x
and apply Newton's Method to this expression to get

res = NestList[newtonsMethod[p[x], #] &, 0.155, 7]

{0.155, 0.876609, 0.742689, 0.739088, 0.739085, 0.739085, 0.739085, 0.739085}
The symbolic expression for this iteration can be found by replacing the numerical initial value by a
general symbol, as shown in the next line. We use x1 as a symbol instead of a number.

symbolicNewton = NestList[newtonsMethod[p[x], #] &, x1, 2] // Simplify
{x1,
 (Cos[x1] + x1 Sin[x1])/(1 + Sin[x1]),
 (Cos[x2] + x2 Sin[x2])/(1 + Sin[x2])}

where x2 abbreviates the second iterate (Cos[x1] + x1 Sin[x1])/(1 + Sin[x1]); Mathematica prints the third element with this expression fully substituted.
The result is a symbolic representation of the nested application of Newton's method and thus represents
an approximation formula for the root if we insert an initial value x1 into this formula.

symbolicNewton /. x1 -> 0.155

{0.155, 0.876609, 0.742689}
The symbolic formula delivers the same values for the approximation as expected. However, the symbolic
representation of Newton's formula allows us to set up a tree of approximation formulas which can
be efficiently used for different initial values. The advantage of the symbolic approach is that we get a
formula for the approximation which needs only a single numeric value to get the final answer. There are,
for example, no intermediate rounding errors.
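In Python the same idea can be imitated with function composition: nesting the Newton update yields a reusable "formula" that only needs an initial value. This is a sketch of mine (the helpers `newton_step` and `nest` are illustrative names, not from the lecture):

```python
import math

def newton_step(f, df):
    """Return the Newton update x -> x - f(x)/df(x) as a reusable function."""
    return lambda x: x - f(x) / df(x)

def nest(h, n):
    """Compose h with itself n times: the analogue of the nested symbolic formula."""
    def nested(x):
        for _ in range(n):
            x = h(x)
        return x
    return nested

f  = lambda x: math.cos(x) - x
df = lambda x: -math.sin(x) - 1

two_steps = nest(newton_step(f, df), 2)   # reusable for any start value
print(two_steps(0.155))                   # same value as the symbolic substitution
```

Calling `two_steps` with different start values plays the role of substituting different numbers for x1 in the symbolic formula.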
In the following we will examine the error bounds of Newton's method. Let us assume f(x) has at least
two continuous derivatives for all x in some interval about the root a. Further assume that

f'(a) != 0.

This says that the graph of y = f(x) is not tangent to the x-axis when the graph intersects it at x = a. Also
note that combining f'(a) != 0 with the continuity of f'(x) implies that f'(x) != 0 for all x near a.
To estimate the error we use Taylor's theorem to write

f(a) = f(x_n) + (a - x_n) f'(x_n) + (1/2) (a - x_n)^2 f''(c_n)

where c_n is an unknown point between a and x_n. Note that f(a) = 0 by assumption; dividing by f'(x_n), we obtain
0 = f(x_n)/f'(x_n) + (a - x_n) + (a - x_n)^2 f''(c_n)/(2 f'(x_n))
Solving for a - x_{n+1}, and using the iteration formula x_{n+1} = x_n - f(x_n)/f'(x_n), we have

a - x_{n+1} = (a - x_n)^2 (-f''(c_n)/(2 f'(x_n))).
This formula says that the error in x_{n+1} is nearly proportional to the square of the error in x_n. When the
initial error is sufficiently small, this shows that the error in the succeeding iterates will decrease very
rapidly. This formula can also be used to give a formal mathematical proof of the convergence of Newton's method.
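The quadratic error behaviour can be observed numerically. The following Python sketch (my own illustration, not from the lecture) tracks the ratio (a - x_{n+1})/(a - x_n)^2 for f(x) = x^2 - 7, which the formula above predicts should approach -f''(a)/(2 f'(a)) = -1/(2 sqrt(7)):

```python
import math

# Quadratic error behaviour of Newton's method for f(x) = x^2 - 7,
# whose positive root is a = sqrt(7).
f, df = (lambda x: x*x - 7), (lambda x: 2*x)
a = math.sqrt(7)

x, ratios = 2.0, []
for _ in range(4):
    x_new = x - f(x) / df(x)
    ratios.append((a - x_new) / (a - x)**2)   # should tend to -1/(2*sqrt(7))
    x = x_new

print(ratios)
```

After only four steps the iterate agrees with sqrt(7) to roughly machine precision, and the ratios stabilize near -0.189.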
For the estimation of the error, we are computing a sequence of iterates x_n, and we would like to estimate their accuracy to know when to stop the iteration. To estimate a - x_n, we note that, since f(a) = 0,
we have

f(x_n) = f(x_n) - f(a) = f'(ξ_n) (x_n - a)

for some ξ_n between x_n and a, by the mean-value theorem. Solving for the error, we obtain
a - x_n = -f(x_n)/f'(ξ_n) ≈ -f(x_n)/f'(x_n)

provided that x_n is so close to a that f'(x_n) ≈ f'(ξ_n). From Newton's iteration formula this becomes

a - x_n ≈ x_{n+1} - x_n.
This is the standard error estimation formula for Newton's method, and it is usually fairly accurate. The
following function uses this estimation of errors to terminate the iteration.
newtonsMethod[f_, x1_] := Block[{x1in = x1, xnew, e = 0.00001},
  (* --- generate an infinite loop --- *)
  While[0 === 0,
   (* --- Newton's iteration formula --- *)
   xnew = x - f/D[f, x] /. x -> x1in;
   (* --- check the error related to (4.43) --- *)
   If[Abs[xnew - x1in] < e, Return[xnew]];
   x1in = N[xnew];
   Print["x = ", xnew]
   ]
  ]
newtonsMethod[x^6 - x - 1, 1]

x = 6/5
x = 1.14358
x = 1.13491
x = 1.13472

1.13472
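A Python version of this stopping criterion might look as follows (a sketch; `newtons_method_eps` is an illustrative name of mine, and the derivative is passed explicitly instead of being computed symbolically):

```python
def newtons_method_eps(f, df, x1, eps=1e-5, max_iter=100):
    """Newton iteration stopped by the error estimate |x_{n+1} - x_n| < eps."""
    x = x1
    for _ in range(max_iter):
        x_new = x - f(x) / df(x)
        if abs(x_new - x) < eps:      # a - x_n ≈ x_{n+1} - x_n
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter steps")

root = newtons_method_eps(lambda x: x**6 - x - 1, lambda x: 6*x**5 - 1, 1.0)
print(root)    # close to 1.13472
```

The `max_iter` guard replaces the intentionally infinite `While[0 === 0, ...]` loop of the Mathematica version, which relies on `Return[]` to exit.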
From the discussion above, Newton's method converges more rapidly than the secant method. Thus
Newton's method should require fewer iterations to attain a given error. However, Newton's method
requires two function evaluations per iteration, that of f(x_n) and f'(x_n), while the secant method requires
only one evaluation, f(x_n), if it is programmed carefully to retain the value of f(x_{n-1}) from the previous
iteration. Thus, the secant method will require less time per iteration than the Newton method.
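For comparison, here is a secant-method sketch in Python (illustrative, `secant_method` is my own name; note that `f0` and `f1` are carried over so that each step costs only one new function evaluation, as described above):

```python
def secant_method(f, x0, x1, eps=1e-10, max_iter=100):
    """Secant iteration; only one new f-evaluation per step (f0, f1 are reused)."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)   # secant update
        if abs(x2 - x1) < eps:
            return x2
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    raise RuntimeError("no convergence within max_iter steps")

print(secant_method(lambda x: x**6 - x - 1, 1.0, 1.5))
```

Applied to x^6 - x - 1 it reaches the same root 1.13472 as Newton's method, just with a few more (cheaper) iterations.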
Fixed-Point Method
The Newton method and the secant method are examples of one-point and two-point methods, respectively. In this section, we give a more general introduction to iteration methods, presenting a general
theory for one-point iteration formulas for a single variable.
As a motivational example, consider solving the equation

x^2 - 7 = 0

for the root a = √7 = 2.64575. To find this number we use the same ideas as in Newton's approach to
set up a general iteration formula which can be stated as

x_{n+1} = g(x_n)
To solve the simple problem (0.0) we introduce four iteration schemes for this equation

1. x_{n+1} = 7 + x_n - x_n^2
2. x_{n+1} = 7/x_n
3. x_{n+1} = 1 + x_n - (1/7) x_n^2
4. x_{n+1} = (1/2) (x_n + 7/x_n)

As stated above, the iterations (0.0-0) all have the form (0.0) for appropriate continuous functions g(x).
For example, with (0.0), g(x) = 7 + x - x^2.
The formulas (0.0-0) are represented by the graphs of g(x) and its first order derivative,
respectively.

Figure 0.0. Graph of the function g(x) = 7 + x - x^2 and its derivative g'(x) = 1 - 2 x.
We can iterate the function g(x) by using the Mathematica function NestList[] to generate a sequence of
numbers related to the iteration

x_{n+1} = 7 + x_n - x_n^2

which delivers the following result for a specific initial value x_0 = 2.6:

NestList[7 + # - #^2 &, 2.6, 6]

{2.6, 2.84, 1.7744, 5.6259, -19.0249, -373.972, -140222.}
Assuming x = 2 and x = 3 as the lower and upper boundary of an interval in which the actual root is
located, we observe that the magnitude of the first order derivative of this function g is larger than one
on the whole interval (see Figure 0.0); already at the left endpoint, |g'(2)| = 3 > 1. Take this for the moment as an
observation and remember it in the following discussion. The second iteration formula
x_{n+1} = 7/x_n

is also graphically represented on the interval x ∈ [2, 3] in the following Figure 0.0.
Figure 0.0. Graph of the function g(x) = 7/x and its derivative g'(x) = -7/x^2.
Again we can use the function NestList[] to generate a sequence of numbers, here based on the third iteration formula (0.0):

NestList[1 + # - #^2/7 &, 2.5, 6]

{2.5, 2.60714, 2.63612, 2.64339, 2.64517, 2.64561, 2.64572}
For this iteration formula we also observe that the magnitude of the first order derivative g' is
smaller than one (compare Figure 0.0). For the iteration formula (0.0)
x_{n+1} = (1/2) (x_n + 7/x_n)

the graphs of g(x) and g'(x) on the interval x ∈ [2, 3] are shown in the following Figure 0.0.
Figure 0.0. Graph of the function g(x) = (1/2) (x + 7/x) and its derivative g'(x) = (1/2) (1 - 7/x^2).
The generation of the sequence using the initial value x_0 = 2.6 shows

NestList[1/2 (# + 7/#) &, 2.6, 6]

{2.6, 2.64615, 2.64575, 2.64575, 2.64575, 2.64575, 2.64575}
that we approach the same value as for the iteration formula (0.0). Here again the maximum magnitude of the first
order derivative g'(x) is smaller than one.
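The contrasting behaviour of the four schemes can be reproduced with a short Python sketch (my own illustration; the scheme labels are just dictionary keys):

```python
import math

# Iterate each of the four fixed-point schemes for x^2 - 7 = 0
# from the same start value and compare the results after 6 steps.
schemes = {
    "1: 7 + x - x^2":    lambda x: 7 + x - x*x,
    "2: 7/x":            lambda x: 7 / x,
    "3: 1 + x - x^2/7":  lambda x: 1 + x - x*x/7,
    "4: (x + 7/x)/2":    lambda x: (x + 7/x) / 2,
}

results = {}
for name, g in schemes.items():
    x = 2.6
    for _ in range(6):
        x = g(x)
    results[name] = x
    print(f"scheme {name:18s} -> {x:.6g}")

# Scheme 1 diverges, scheme 2 oscillates between two values,
# schemes 3 and 4 approach sqrt(7) ≈ 2.64575.
```

The divergence of scheme 1 and the oscillation of scheme 2 foreshadow the role of |g'(x)| in the convergence theory that follows.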
All four iterations have the property that if the sequence {x_n : n ≥ 0} has a limit a, then a is a root of the
defining equation. For each equation, we check this as follows: replace x_n and x_{n+1} by a, and then show
that this implies a = √7. The next lines show the results of this calculation for the different cases,
respectively.
Solve[-a^2 + a + 7 == a, a]

{{a -> -Sqrt[7]}, {a -> Sqrt[7]}}

Solve[7/a == a, a]

{{a -> -Sqrt[7]}, {a -> Sqrt[7]}}

Solve[-a^2/7 + a + 1 == a, a]

{{a -> -Sqrt[7]}, {a -> Sqrt[7]}}

Solve[1/2 (a + 7/a) == a, a]

{{a -> -Sqrt[7]}, {a -> Sqrt[7]}}
To explain these results, we are now going to discuss a general theory for one-point iteration formulas
which explains all the observed facts.
The iterations (0.0-0) all have the same form

x_{n+1} = g(x_n)

for appropriate continuous functions g(x). If the iterates x_n converge to a point α, then
lim_{n→∞} x_{n+1} = lim_{n→∞} g(x_n)

α = g(α).

Thus α is a solution of the equation x = g(x), and α is called a fixed point of the function g. We call (0.0) a
fixed point equation.
The next step is to set up a general approach to explain when the iteration x_{n+1} = g(x_n) will converge to a
fixed point of g. We begin with a lemma on the existence of solutions of x = g(x).
Corollary 0.0. Fixed Point Existence
Let g(x) be a continuous function on an interval [a, b], and suppose g satisfies the property

a ≤ x ≤ b  implies  a ≤ g(x) ≤ b.

Then the equation x = g(x) has at least one solution α in the interval [a, b].
Proof 0.1. Define the function f(x) = x - g(x). It is continuous for a ≤ x ≤ b. Moreover, f(a) ≤ 0 and f(b) ≥ 0.
By the intermediate value theorem there must be a point x in [a, b] for which f(x) = 0. We usually denote
this value of x by α.
QED
The geometric meaning of Corollary 0.0 is shown in Figure 0.0, which gives a graphical interpretation of the
solution of x = g(x). The solutions α are the x-coordinates of the intersection points of the graphs of y = x
and y = g(x).

Figure 0.0. Example for the fixed point lemma and demonstration of the geometric meaning of the fixed
point equation.
The following panel contains an animation of the interval selection for the different functions g(x) used in
(0.0-0).

Figure 0.0. Representation of the fixed point lemma for the different functions used in the iterations (0.0-0).
The observations made so far are formulated in the following theorem.
Theorem 0.0. Contraction Mapping
Assume g(x) and g'(x) are continuous for a ≤ x ≤ b, and assume g satisfies the conditions of Corollary
0.0. Further assume that

λ = max_{a ≤ x ≤ b} |g'(x)| < 1.

Then the following statements hold:

S1: There is a unique solution α of x = g(x) in the interval [a, b].
S2: For any initial estimate x_0 in [a, b], the iterates x_n will converge to α.
S3: |α - x_n| ≤ (λ^n/(1 - λ)) |x_0 - x_1|,  n ≥ 0
S4: lim_{n→∞} (α - x_{n+1})/(α - x_n) = g'(α).

Thus for x_n close to α,

α - x_{n+1} ≈ g'(α) (α - x_n).
Proof 0.3. There is some useful information in the proof, so we go through most of the details of it. Note
first that the hypotheses on g allow us to use Corollary 0.0 to assert the existence of at least one solution
to x = g(x). In addition, using the mean value theorem, we have that for any two points w and z in [a, b],

g(w) - g(z) = g'(c) (w - z)

for some c between w and z. Using the property of λ in this equation, we obtain

|g(w) - g(z)| = |g'(c)| |w - z| ≤ λ |w - z|  for a ≤ w, z ≤ b.
S1: Suppose there are two solutions, denoted by α and β. Then α = g(α) and β = g(β). By subtracting
these, we find that

α - β = g(α) - g(β).

Take absolute values and use the estimate from above:

|α - β| ≤ λ |α - β|
(1 - λ) |α - β| ≤ 0.

Since λ < 1, we must have α = β; and thus, the equation x = g(x) has only one solution in the interval
[a, b].
S2: From the assumptions in Corollary 0.0, it can be shown that for any initial guess x_0 in [a, b], the
iterates x_n will all remain in [a, b]. For example, if a ≤ x_0 ≤ b, then Corollary 0.0 implies a ≤ g(x_0) ≤ b.
Since x_1 = g(x_0), this shows x_1 is in [a, b]. Repeat the argument to show that x_2 = g(x_1) is in [a, b], and
continue the argument inductively.
To show that the iterates converge, subtract x_{n+1} = g(x_n) from α = g(α), obtaining

α - x_{n+1} = g(α) - g(x_n) = g'(c_n) (α - x_n)

for some c_n between α and x_n. Using the assumption of the theorem, we get

|α - x_{n+1}| ≤ λ |α - x_n|,  n ≥ 0.

Inductively, we can then show that

|α - x_n| ≤ λ^n |α - x_0|,  n ≥ 0.

Since λ < 1, the right side of this expression goes to zero as n → ∞, and this then shows that x_n → α as
n → ∞. A sequence converging in this way is, in particular, a Cauchy sequence.
S3: If we use |α - x_{n+1}| ≤ λ |α - x_n| with n = 0, we get

|α - x_0| ≤ |α - x_1| + |x_1 - x_0| ≤ λ |α - x_0| + |x_1 - x_0|

(1 - λ) |α - x_0| ≤ |x_1 - x_0|

|α - x_0| ≤ (1/(1 - λ)) |x_1 - x_0|.
Combining this with the final result of S2, we can conclude that

|α - x_n| ≤ (λ^n/(1 - λ)) |x_0 - x_1|,  n ≥ 0.
S4: We use α - x_{n+1} = g(α) - g(x_n) = g'(c_n) (α - x_n) to write

lim_{n→∞} (α - x_{n+1})/(α - x_n) = lim_{n→∞} g'(c_n).

Each c_n is between α and x_n, and x_n → α by S2. Thus c_n → α. Combining this with the continuity of the
function g'(x), we obtain

lim_{n→∞} g'(c_n) = g'(α)

which finishes the proof.
QED
We need a more precise way to deal with the concept of the speed of convergence of an iteration
method. We say that a sequence {x_n : n ≥ 0} converges to α with an order of convergence p ≥ 1 if

|α - x_{n+1}| ≤ c |α - x_n|^p,  n ≥ 0

for some constant c ≥ 0. The cases p = 1, p = 2, and p = 3 are referred to as linear, quadratic, and cubic
convergence, respectively. Newton's method usually converges quadratically, and the secant method has
order of convergence p = (1 + √5)/2. For linear convergence, we make the additional requirement that
c < 1; as otherwise, the error α - x_n need not converge to zero.
If |g'(α)| < 1 in the preceding theorem, then the relation |α - x_{n+1}| ≤ λ |α - x_n| shows that the iterates
x_n are linearly convergent. If in addition g'(α) ≠ 0, then the relation α - x_{n+1} ≈ g'(α) (α - x_n) proves the
convergence is exactly linear, with no higher order of convergence being possible. In this case, we call
the value |g'(α)| the linear rate of convergence.
In practice Theorem 0.0 is seldom used directly. The main reason is that it is difficult to find an interval
[a, b] for which the conditions of the Corollary are satisfied. Instead, we look for a way to use the theorem
in a practical way. The key idea is the relation α - x_{n+1} ≈ g'(α) (α - x_n), which shows how the iteration error
behaves when the iterates x_n are near α.
Corollary 0.0. Convergence of the Fixed-Point Method
Assume that g(x) and g'(x) are continuous for some interval c < x < d, with the fixed point α contained in
the interval. Moreover, assume that

|g'(α)| < 1.

Then, there is an interval [a, b] around α for which the hypotheses, and hence also the conclusion, of
Theorem 0.0 are true. If, on the contrary, |g'(α)| > 1, then the iteration method x_{n+1} = g(x_n) will not
converge to α. When |g'(α)| = 1, no conclusion can be drawn.
If we check the iteration formulas (0.0-0) by this corollary, we observe that the first and second iteration
schemes do not converge to the real root. This behavior is one of the shortcomings of the fixed point
method. In general, fixed point methods are only used in practice if we know the interval in which the fixed
point is located and if we have a function g(x) available satisfying the requirements of Theorem 0.0. The
following examples demonstrate the application of the fixed point theorem.
Example 0.0. Fixed Point Method I
Let g(x) = (x^2 - 1)/5 on [-1, 1]. The Extreme Value Theorem implies that the absolute minimum of g
occurs at x = 0 with g(0) = -1/5. Similarly, the absolute maximum of g occurs at x = 1 and has the value
g(1) = 0. Moreover, g is continuous and

|g'(x)| = |2 x/5| ≤ 2/5, for all x ∈ (-1, 1).

So g satisfies all the hypotheses of Theorem 0.0 and has a unique fixed point in [-1, 1].
Solution 0.5. In this example, the unique fixed point α in the interval [-1, 1] can be determined algebraically. If

α = g(α) = (α^2 - 1)/5,  then  α^2 - 5 α - 1 = 0

which, by the quadratic formula, implies that

solfp = Solve[a^2 - 5 a - 1 == 0, a]

{{a -> (1/2) (5 - Sqrt[29])}, {a -> (1/2) (5 + Sqrt[29])}}
g(x_) := (1/5) (x^2 - 1)

D[g(x), x] /. x -> 4

8/5
Note that g actually has two fixed points x = (1/2) (5 ± Sqrt[29]). However, the second solution of {{x ->
-0.19258240356725187}, {x -> 5.192582403567252}} is not located in the interval we selected but is
included in the interval [4, 6]. This second solution of the fixed point equation does not satisfy the assumptions of Theorem 0.0, since g'(4) = 8/5 > 1. Hence, the hypotheses of Theorem 0.0 are sufficient to
guarantee a unique fixed point but are not necessary (see Figure 0.0).
Figure 0.0. Fixed point equation g(x) = (x^2 - 1)/5 and the fixed points in the interval [-1, 1].
To demonstrate the use of the fixed point theorem in finding roots of equations let us examine the following example.
Example 0.0. Fixed Point Method II
The equation x^3 + 4 x^2 - 10 = 0 has a unique root in [1, 2]. There are many ways to change the equation
to the fixed-point form x = g(x) using simple algebraic manipulation. We select the following representation of the fixed point equation with

g(x) = (10/(4 + x))^(1/2).
Solution 0.7. Using the function g(x) as represented in (0.0) we first have to check the prerequisites of
Theorem 0.0. The function

g(x_) := (10/(x + 4))^(1/2)

assumes the following values at the boundaries of the interval

{g(1), g(2)}

{Sqrt[2], Sqrt[5/3]}
showing that the interval [1, 2] is mapped to itself. The first order derivative of g(x) generates the values
{D[g(x), x] /. x -> 1, D[g(x), x] /. x -> 2}

{-1/(5 Sqrt[2]), -Sqrt[5/3]/12}
representing values which are smaller than one in magnitude. Thus the assumptions of Theorem 0.0 are
satisfied and the fixed point is reached by a direct iteration:

NestList[(10/(4 + #))^(1/2) &, 1., 7]

{1., 1.41421, 1.35904, 1.36602, 1.36513, 1.36524, 1.36523, 1.36523}

which represents the root located in the interval [1, 2] (see Figure 0.0). A sufficient accuracy of the root is
reached within a few iteration steps.
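The direct iteration can be sketched in Python as well (`fixed_point` is an illustrative helper of mine, playing the role of NestList[]):

```python
import math

def fixed_point(g, x0, steps):
    """Direct fixed-point iteration x_{n+1} = g(x_n); returns all iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append(g(xs[-1]))
    return xs

g = lambda x: math.sqrt(10 / (4 + x))   # the contraction selected above
xs = fixed_point(g, 1.0, 7)
print([round(v, 5) for v in xs])        # settles on the root 1.36523
```

Note that no derivative is evaluated during the iteration itself; the derivative is only needed beforehand to verify the contraction condition.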
Figure 0.0. Fixed point equation g(x) = (10/(4 + x))^(1/2) and the fixed points in the interval [1, 2].
So far we did not talk about the accuracy of the fixed point method. Actually the accuracy of the method
is determined by part S3 of Theorem 0.0. This relation connects the actual fixed point to the rate of
convergence λ:

|α - x_n| ≤ (λ^n/(1 - λ)) |x_0 - x_1|,  n ≥ 0.
If we know the accuracy, which is given by |α - x_n| = ε in the nth iteration, we are able to estimate the
number of iterations by

Solve[e == l^n Abs[x0 - x1]/(1 - l), n]

{{n -> Log[e (1 - l)/Abs[x0 - x1]]/Log[l]}}
This formula needs two of the iteration steps x_0 and x_1 and the convergence rate λ = max |g'(x)| with
x ∈ [a, b].
Example 0.0. Fixed Point Method III
We will estimate the number of iterations for the fixed point problem

x = 2^(-x)  with x ∈ [1/3, 1].

We are interested in an accuracy of ε = 10^(-5).
Solution 0.8. The solution of equation (0.0) tells us that the number of iterations is related to the accuracy ε and the convergence rate of the fixed point equation. The convergence rate λ is determined for
this equation by defining g as

g(x_) := 2^(-x)

and its derivative

derg = D[g(x), x]

-2^(-x) Log[2]
For the given interval we find

l = Max[Abs[derg /. x -> 1/3], Abs[derg /. x -> 1]]

Log[2]/2^(1/3)

which is smaller than one. The first two iterations of the given fixed point equation follow by
initials = NestList[2^-# &, 1/3, 1]

{1/3, 1/2^(1/3)}
a - x_n
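The iteration-count formula derived above can be evaluated numerically; this Python sketch (my own illustration, variable names chosen to match the Mathematica symbols) plugs in the data of Example III:

```python
import math

# Evaluate n = log(eps (1 - l) / |x0 - x1|) / log(l)
# for g(x) = 2^(-x) on [1/3, 1] with eps = 10^-5.
g = lambda x: 2.0 ** (-x)

l   = math.log(2) / 2 ** (1 / 3)   # max |g'(x)| on [1/3, 1], attained at x = 1/3
x0  = 1 / 3
x1  = g(x0)                        # = 2^(-1/3)
eps = 1e-5

n = math.log(eps * (1 - l) / abs(x0 - x1)) / math.log(l)
print(n)    # about 19.3, i.e. roughly 20 iterations are needed
```

Since the bound of S3 is only an upper estimate, the actual iteration may well reach the accuracy ε a few steps earlier.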