Note Set 1 - The Basics: 1.1 - Overview
1.1 Overview
In this note set, we will cover the very basic tools of numerical methods.
Numerical methods are useful for both formal theory and methods because they free us
from having to employ restrictive assumptions in order to obtain solutions to important
problems.
Consider applications to formal theory first. Stylized formal models often show that some result holds under a certain set of conditions. A good theorist, however, will attempt to characterize all conditions under which that result holds. The ideal is an if-and-only-if result. This may not be possible, but general conditions are more informative than specific ones. When we go down this road, however, we may be able to derive certain nice results but lack specificity. Without restrictive assumptions, we may not be able to solve our model analytically. When we do make restrictive assumptions, we would at least like to make them realistic, and the set of realistic assumptions will not always be the one that leads to an easy solution. Here is where numerical methods provide a great advantage: they free us from having to choose the set of assumptions that make our model easy to solve, and allow us to choose these assumptions based on other concerns (e.g., realism, generality).
We begin with numerical differentiation. Recall the definition of the derivative,

$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$
We assume that we have access to a numerical function that computes $f(x)$. For example, we may have the C++ code,
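    /* An illustrative stand-in for the function of interest; any smooth
       function of one variable could appear here. */
    double Func(double x)
    {
        return exp(x);    /* f(x) = e^x, so f'(x) = e^x is easy to check */
    }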
We would like to be able to take the function Func and the point $x$ as inputs and return the derivative of Func at $x$. The definition of the derivative suggests an approach: select a small value, $h$, and compute,

$f'(x) \approx \frac{f(x+h) - f(x)}{h}$
The key question here is: how small should $h$ be? For example, why not choose $h = 1.0 \times 10^{-80}$? How small we can select $h$ is limited by the precision of the computer. Real numbers are stored on a computer using a finite number of bits representing the mantissa, the sign, and the exponent. A float is represented by 4 bytes (or 32 bits) and a double is represented by 8 bytes (or 64 bits). If we select $h = 1.0 \times 10^{-80}$, then the computer will likely not be able to return different values for $f(x+h)$ and $f(x)$, and our estimated derivative will be (arbitrarily) evaluated as zero.
There are two major sources of error in computing numerical derivatives: truncation error and round-off error. The first trick we employ is to make sure that $x$ and $x+h$ differ exactly by a number that can be represented by the computer. This will reduce one source of round-off error to zero. This can be accomplished using the following lines of code,

    double Temp = x + h;   /* rounds x + h to the nearest representable double */
    h = Temp - x;          /* h now equals the exactly representable difference */
Let $\epsilon_m$ denote the machine precision: the smallest number such that $1 + \epsilon_m$ is distinguishable from $1$ on the computer. For example, for a typical Intel/AMD PC using doubles, this number will be $2.22045 \times 10^{-16}$. There exist routines that will compute this number for your computer. The remaining round-off error will have size $\sim \epsilon_m |f(x)/h|$. To calculate the truncation error, consider the Taylor expansion,
$f(x+h) = f(x) + f'(x)h + \tfrac{1}{2} f''(x)h^2 + \cdots$

so the truncation error has size $\sim \tfrac{1}{2}|f''(x)|\,h$. The total error, $\epsilon_m |f(x)|/h + \tfrac{1}{2}|f''(x)|\,h$, is minimized by $h \sim \sqrt{\epsilon_m\, |f(x)/f''(x)|}$. It is often the case that $\sqrt{|f(x)/f''(x)|} \sim |x|$ when $x$ is not too close to zero. This leads to the heuristic,

$h = \begin{cases} \sqrt{\epsilon_m}\,|x|, & |x| \neq 0 \\ \sqrt{\epsilon_m}, & \text{otherwise} \end{cases}$

An alternative that is sometimes used is $h = \sqrt{\epsilon_m}\,(1 + |x|)$. This may seem a little ad hoc, but we will see later that this method is quite good (and certainly a lot better than guessing!).
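As a sketch of how these pieces fit together, a first-difference routine might look as follows, assuming the Func defined earlier and using DBL_EPSILON from the standard header as $\epsilon_m$:

    #include <cfloat>   /* DBL_EPSILON plays the role of eps_m */
    #include <cmath>    /* sqrt, fabs */

    /* Forward-difference derivative with the heuristic step
       h = sqrt(eps_m) * (1 + |x|), using the trick above to make
       x and x + h differ by an exactly representable amount. */
    double FirstDeriv(double (*f)(double), double x)
    {
        double h = sqrt(DBL_EPSILON) * (1.0 + fabs(x));
        double Temp = x + h;   /* round x + h to a representable double */
        h = Temp - x;          /* h is now exactly representable */
        return (f(x + h) - f(x)) / h;
    }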
The truncation error in the above calculation is $O(h)$. We can do better than this by employing a higher-order Taylor expansion,

$\frac{f(x+h) - f(x-h)}{2h} = f'(x) + \tfrac{1}{6} f'''(x)h^2 + \cdots$

This is the second (central) difference formula for computing numerical derivatives. Notice that this yields a more accurate calculation, with an accuracy of $O(h^2)$, but requires an extra function evaluation (assuming that $f(x)$ is already available). One can use the same approach to calculate an optimal $h$. We get $h \sim \epsilon_m^{1/3}\,|x|$.
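A central-difference version is nearly identical (a sketch; the only changes are the two-sided evaluation and the $\epsilon_m^{1/3}$ step size):

    /* Central-difference derivative, accurate to O(h^2), with the
       corresponding heuristic h = eps_m^(1/3) * (1 + |x|). */
    double FirstDerivCentral(double (*f)(double), double x)
    {
        double h = cbrt(DBL_EPSILON) * (1.0 + fabs(x));
        double Temp = x + h;
        h = Temp - x;
        return (f(x + h) - f(x - h)) / (2.0 * h);
    }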
We may also want to obtain higher-order derivatives. We can obtain these derivatives by iterating the approach suggested above. For example, let us combine,

$f(x+h) = f(x) + f'(x)h + \tfrac{1}{2} f''(x)h^2 + \tfrac{1}{6} f'''(x)h^3 + \tfrac{1}{24} f^{(4)}(x)h^4 + \cdots$
$f(x-h) = f(x) - f'(x)h + \tfrac{1}{2} f''(x)h^2 - \tfrac{1}{6} f'''(x)h^3 + \tfrac{1}{24} f^{(4)}(x)h^4 - \cdots$

to obtain approximations to $f'(x)$ at two points,

$\frac{f(x+h) - f(x)}{h} = f'(x) + \tfrac{1}{2} f''(x)h + \tfrac{1}{6} f'''(x)h^2 + \tfrac{1}{24} f^{(4)}(x)h^3 + \cdots$
$\frac{f(x) - f(x-h)}{h} = f'(x) - \tfrac{1}{2} f''(x)h + \tfrac{1}{6} f'''(x)h^2 - \tfrac{1}{24} f^{(4)}(x)h^3 + \cdots$

We can then apply the first-difference principle again to obtain,

$\frac{1}{h}\left( \frac{f(x+h) - f(x)}{h} - \frac{f(x) - f(x-h)}{h} \right) = f''(x) + \tfrac{1}{12} f^{(4)}(x)h^2 + \cdots$

that is,

$\frac{f(x+h) - 2f(x) + f(x-h)}{h^2} = f''(x) + \tfrac{1}{12} f^{(4)}(x)h^2 + \cdots$
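A sketch of the corresponding second-derivative routine: balancing the round-off error, now of size $\sim \epsilon_m |f(x)|/h^2$, against the $O(h^2)$ truncation error suggests a larger step, $h \sim \epsilon_m^{1/4}(1 + |x|)$.

    /* Second derivative via (f(x+h) - 2 f(x) + f(x-h)) / h^2,
       with the step-size heuristic h = eps_m^(1/4) * (1 + |x|). */
    double SecondDeriv(double (*f)(double), double x)
    {
        double h = pow(DBL_EPSILON, 0.25) * (1.0 + fabs(x));
        double Temp = x + h;
        h = Temp - x;
        return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h);
    }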
Table 1.1: First derivatives. Error of the forward, backward, and central difference approximations as a function of h, together with the error at the heuristic ("optimal") h.

h            Forward      Backward     Central
1.00E-01     6.96E-03    -7.48E-03    -2.63E-04
1.00E-02     7.19E-04    -7.24E-04    -2.62E-06
1.00E-03     7.21E-05    -7.22E-05    -2.62E-08
1.00E-04     7.21E-06    -7.21E-06    -2.62E-10
1.00E-05     7.21E-07    -7.21E-07     8.26E-13
1.00E-06     7.21E-08    -7.22E-08    -5.05E-11
1.00E-07     7.26E-09    -7.18E-09     4.13E-11
1.00E-08     8.26E-10    -1.03E-08    -4.73E-09
1.00E-09     7.34E-09     7.34E-09     7.34E-09
1.00E-10     2.29E-07    -8.81E-07    -3.26E-07
1.00E-11     5.78E-06    -5.32E-06     2.29E-07
1.00E-12     7.89E-05    -3.21E-05     2.34E-05
1.00E-13     5.69E-04    -5.42E-04     1.38E-05
1.00E-14     2.69E-03    -8.17E-03    -2.74E-03
1.00E-15     7.33E-02    -5.17E-02     1.08E-02
1.00E-16     -nan         -nan         -nan
1.00E-17     -nan         -nan         -nan
1.00E-18     -nan         -nan         -nan
1.00E-19     -nan         -nan         -nan
Optimal h    5.08E-09    -3.05E-09    -6.17E-12

Table 1.2: Second derivatives. Error of the forward and backward second-difference approximations as a function of h.

h            Forward      Backward
1.00E-01    -1.45E-02     1.72E-02
1.00E-02    -1.56E-03     1.59E-03
1.00E-03    -1.57E-04     1.58E-04
1.00E-04    -1.58E-05     1.57E-05
1.00E-05    -3.38E-06     1.06E-06
1.00E-06     7.66E-05    -1.45E-04
1.00E-07     7.66E-05     7.66E-05
1.00E-08    -1.44E-01    -1.25E+00
1.00E-09    -1.44E-01    -1.44E-01
1.00E-10    -1.44E-01    -1.11E+04
1.00E-11    -1.11E+06    -1.11E+06
1.00E-12    -1.11E+08    -1.44E-01
1.00E-13    -1.11E+10    -1.11E+10
1.00E-14    -1.44E-01    -1.06E+12
1.00E-15    -1.41E+14    -1.44E-01
1.00E-16     -nan         -nan
1.00E-17     -nan         -nan
1.00E-18     -nan         -nan
1.00E-19     -nan         -nan
Optimal h   -5.28E-05     5.28E-05

Notice in both tables that the error first falls as h shrinks (truncation error dominates) and then rises again (round-off error dominates), and that the heuristic choice of h lands near the bottom of this U shape.
Let us now turn to numerical integration; many integrals that arise in practice do not have an analytic solution. Like numerical differentiation, you can probably guess what the first approach to numerical integration will be.
Recall the definition of the Riemann integral. For any partition $a = x_0 < x_1 < \cdots < x_n = b$ and evaluation points $t_i \in [x_{i-1}, x_i]$, we consider the approximation,

$\int_{x=a}^{b} f(x)\,dx \approx \sum_{i=1}^{n} f(t_i)(x_i - x_{i-1})$

If we consider partitions such that the distance between $x_{i-1}$ and $x_i$ is arbitrarily small for all $i$, we get the value of the integral. This suggests the following formula for numerical integration,

$\int_{x=a}^{b} f(x)\,dx \approx \sum_{i=0}^{n-1} f\!\left( \tfrac{1}{2}(x_i + x_{i+1}) \right)(x_{i+1} - x_i)$

If we set $x_i = a + \tfrac{i}{n}(b-a)$, we get,

$\int_{x=a}^{b} f(x)\,dx \approx \frac{b-a}{n} \sum_{i=0}^{n-1} f\!\left( a + \left(i + \tfrac{1}{2}\right)\frac{b-a}{n} \right)$
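In code, this midpoint rule is only a few lines (a sketch):

    /* Midpoint rule: integrate f over [a, b] using n equally spaced
       subintervals, evaluating f at the midpoint of each. */
    double Midpoint(double (*f)(double), double a, double b, int n)
    {
        double h = (b - a) / n;
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += f(a + (i + 0.5) * h);   /* midpoint of subinterval i */
        return h * sum;
    }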
More precisely, let us consider integrating the function between $x_i$ and $x_{i+1}$, where $h = x_{i+1} - x_i$. Using a Taylor expansion, we can obtain,

$F(x_{i+1}) = F(x_i) + f(x_i)h + \tfrac{1}{2} f'(x_i)h^2 + \cdots$

where $F$ denotes the indefinite integral of $f$. Hence, we have,

$\int_{x=x_i}^{x_{i+1}} f(x)\,dx = F(x_{i+1}) - F(x_i) = f(x_i)h + \tfrac{1}{2} f'(x_i)h^2 + \cdots$

Summing over subintervals,

$\int_{x=a}^{b} f(x)\,dx = \sum_{i=0}^{n-1} \int_{x=x_i}^{x_{i+1}} f(x)\,dx = h \sum_{i=0}^{n-1} f(x_i) + \tfrac{1}{2} h^2 \sum_{i=0}^{n-1} f'(x_i) + \cdots$

We therefore have, to first order,

$\int_{x=a}^{b} f(x)\,dx \approx h \sum_{i=0}^{n-1} f(x_i)$

This is almost exactly the same as the formula derived above. A higher-order expansion will improve on the accuracy, but requires higher-order differentiability of the function being integrated.
The methods outlined above are sometimes useful, but only for problems that are very messy or for which extremely high precision is desired. I'll elaborate on this later. Perhaps the most widely used method is Gaussian quadrature. Gaussian quadrature can be used for functions that are well approximated by a polynomial. In particular, $n$-point quadrature will yield an exactly correct expression for functions that are equal to a $(2n-1)$-degree polynomial multiplied by some known weighting function $W(x)$. In particular, the formula is,

$\int_{x=a}^{b} W(x) f(x)\,dx \approx \sum_{i=1}^{n} w_i f(x_i)$

This will produce an exactly correct answer when $f(x)$ is a polynomial of degree at most $2n-1$. The trick then is to find the weights $w_i$ and the evaluation points $x_i$.
The following are the weighting functions that are typically used (each one is given a name):
1. Gauss-Legendre quadrature: $W(x) = 1$ on $[-1, 1]$
2. Gauss-Chebyshev quadrature: $W(x) = (1 - x^2)^{-1/2}$ on $(-1, 1)$
3. Gauss-Laguerre: $W(x) = x^{\alpha} e^{-x}$ on $[0, \infty)$
4. Gauss-Hermite: $W(x) = e^{-x^2}$ on $(-\infty, \infty)$
5. Gauss-Jacobi: $W(x) = (1-x)^{\alpha}(1+x)^{\beta}$ on $(-1, 1)$
How do we go about finding the weights and evaluation points, then? Consider Gauss-Hermite quadrature, and let $f$ be a polynomial of degree $2n-1$, $f(x) = \sum_{i=0}^{2n-1} a_i x^i$. We have,

$\int_{x=-\infty}^{\infty} \left( \sum_{i=0}^{2n-1} a_i x^i \right) e^{-x^2}\,dx = \sum_{i=0}^{2n-1} a_i \int_{x=-\infty}^{\infty} x^i e^{-x^2}\,dx$

Integration by parts gives the recursion,

$\int_{x=-\infty}^{\infty} x^i e^{-x^2}\,dx = \tfrac{1}{2}(i-1) \int_{x=-\infty}^{\infty} x^{i-2} e^{-x^2}\,dx$

with base cases $\int_{x=-\infty}^{\infty} x e^{-x^2}\,dx = 0$ and $\int_{x=-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$. An $n$-point rule with weights $w_i$ and nodes $x_i$ reproduces the integral for every choice of the coefficients $a_i$ exactly when,

$\int_{x=-\infty}^{\infty} x^j e^{-x^2}\,dx = \sum_{i=1}^{n} w_i x_i^j, \qquad j = 0, \ldots, 2n-1$

For example, with $n = 2$, the exact integral is,

$a_0 \int_{-\infty}^{\infty} e^{-x^2}dx + a_1 \int_{-\infty}^{\infty} x e^{-x^2}dx + a_2 \int_{-\infty}^{\infty} x^2 e^{-x^2}dx + a_3 \int_{-\infty}^{\infty} x^3 e^{-x^2}dx = a_0 \sqrt{\pi} + a_2 \tfrac{1}{2}\sqrt{\pi}$

and matching this term by term against $w_1 f(x_1) + w_2 f(x_2)$ requires,

$w_1 + w_2 = \sqrt{\pi}$
$w_1 x_1 + w_2 x_2 = 0$
$w_1 x_1^2 + w_2 x_2^2 = \tfrac{1}{2}\sqrt{\pi}$
$w_1 x_1^3 + w_2 x_2^3 = 0$

with solution $w_1 = w_2 = \tfrac{1}{2}\sqrt{\pi}$, $x_1 = -\tfrac{1}{\sqrt{2}}$, $x_2 = \tfrac{1}{\sqrt{2}}$.
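As a quick check, the following sketch applies the two-point rule just derived to $f(x) = x^3 + x^2$, for which the exact answer is $0 + \tfrac{1}{2}\sqrt{\pi}$:

    #include <cmath>
    #include <cstdio>

    /* Verify the two-point Gauss-Hermite rule on f(x) = x^3 + x^2:
       the rule is exact for polynomials of degree <= 3 against e^{-x^2}. */
    int main()
    {
        const double pi = 3.14159265358979;
        double x1 = -1.0 / sqrt(2.0), x2 = 1.0 / sqrt(2.0);
        double w  = sqrt(pi) / 2.0;               /* w1 = w2 */
        double f1 = x1 * x1 * x1 + x1 * x1;       /* f(x1) */
        double f2 = x2 * x2 * x2 + x2 * x2;       /* f(x2) */
        printf("quadrature: %f, exact: %f\n", w * (f1 + f2), 0.5 * sqrt(pi));
        return 0;
    }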
This procedure works more generally, and for all of the quadrature formulas, except that we don't solve them by hand! Instead, there are standard computer programs that are designed to compute the solutions to such systems of equations.
How do we choose which formula to use? The most important concern is the range of the function. For example, if we want to compute the integral $\int_{x=3}^{7} x e^{-\frac{1}{2}x^2}\,dx$, we would see that the function has a finite range. Therefore, we would apply Legendre, Chebyshev, or Jacobi. Gauss-Hermite would be a poor choice here because, even though we can write this integral as,

$\int_{x=-\infty}^{\infty} 1\{3 \le x \le 7\}\, x e^{-\frac{1}{2}x^2}\,dx$

the indicator function makes the integrand a poor candidate for polynomial approximation.
As a second example, consider a probit model with sample selection: a latent outcome $y_n^* = \beta' x_n + \epsilon_n$ and a latent selection equation $r_n^* = \gamma' z_n + \eta_n$, where $\epsilon_n$ and $\eta_n$ are standard normal random deviates with correlation $\rho$. Now, we observe $y_n = 1\{y_n^* \ge 0\}$ only if $r_n = 1\{r_n^* \ge 0\} = 1$. This means that, conditional on $(x_n, z_n)$, we observe three possible events: $r_n = 0$; $y_n = 0, r_n = 1$; and $y_n = 1, r_n = 1$. Computing the probability of the first event is easy,

$\Pr(r_n = 0 \mid x_n, z_n) = \Pr(r_n^* \le 0 \mid x_n, z_n) = \Pr(\gamma' z_n + \eta_n \le 0 \mid x_n, z_n) = \Pr(\eta_n \le -\gamma' z_n \mid x_n, z_n) = \Phi(-\gamma' z_n)$
Consider one of the other events,

$\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) = \Pr(r_n^* \ge 0,\; y_n^* \le 0 \mid x_n, z_n)$
$= \Pr(\beta' x_n + \epsilon_n \le 0,\; \gamma' z_n + \eta_n \ge 0 \mid x_n, z_n)$
$= \Pr(\epsilon_n \le -\beta' x_n,\; \eta_n \ge -\gamma' z_n \mid x_n, z_n)$
$= \int_{\eta=-\gamma' z_n}^{\infty} \int_{\epsilon=-\infty}^{-\beta' x_n} \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\!\left( -\frac{\epsilon^2 - 2\rho\epsilon\eta + \eta^2}{2(1-\rho^2)} \right) d\epsilon\, d\eta$
We can reduce the integrals by completing the square (or factoring the joint distribution into a marginal and a conditional),

$\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) = \int_{\eta=-\gamma' z_n}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\eta^2} \int_{\epsilon=-\infty}^{-\beta' x_n} \frac{1}{\sqrt{2\pi(1-\rho^2)}} \exp\!\left( -\frac{(\epsilon - \rho\eta)^2}{2(1-\rho^2)} \right) d\epsilon\, d\eta$

Changing variables to $u = (\epsilon - \rho\eta)/\sqrt{1-\rho^2}$, we obtain,

$\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) = \int_{\eta=-\gamma' z_n}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\eta^2} \int_{u=-\infty}^{\frac{-\beta' x_n - \rho\eta}{\sqrt{1-\rho^2}}} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}u^2}\, du\, d\eta = \int_{\eta=-\gamma' z_n}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\eta^2}\, \Phi\!\left( \frac{-\beta' x_n - \rho\eta}{\sqrt{1-\rho^2}} \right) d\eta$
There are standard algorithms for computing $\Phi$ efficiently, so we have reduced our problem to a one-dimensional integral. Since the bounds are half-infinite, Gauss-Laguerre integration is a good choice here (with $\alpha = 0$). We need to transform the range, however.
Define $v = \eta + \gamma' z_n$, so that $v$ ranges over $(0, \infty)$. We have,

$\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) = \int_{v=0}^{\infty} e^{-v} \left[ e^{v}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(v - \gamma' z_n)^2}\, \Phi\!\left( \frac{-\beta' x_n - \rho(v - \gamma' z_n)}{\sqrt{1-\rho^2}} \right) \right] dv$
$\approx \sum_i w_i\, e^{v_i}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(v_i - \gamma' z_n)^2}\, \Phi\!\left( \frac{-\beta' x_n - \rho(v_i - \gamma' z_n)}{\sqrt{1-\rho^2}} \right)$

where the $w_i$ and $v_i$ are the Gauss-Laguerre weights and nodes.
As a final example, consider computing $E[1/(1+X^2)]$ where $X \sim N(\mu, \sigma^2)$. Here the range is the whole real line and the density already contains a Gaussian kernel, so Gauss-Hermite quadrature is the natural choice. We have,

$E\!\left[ \frac{1}{1+X^2} \right] = \int_{x=-\infty}^{\infty} \frac{1}{1+x^2}\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx$

Changing variables to $y = (x - \mu)/(\sigma\sqrt{2})$,

$= \frac{1}{\sqrt{\pi}} \int_{y=-\infty}^{\infty} \frac{e^{-y^2}}{1 + (\mu + \sigma y \sqrt{2})^2}\, dy \approx \frac{1}{\sqrt{\pi}} \sum_i \frac{w_i}{1 + (\mu + \sigma y_i \sqrt{2})^2}$

where the $w_i$ and $y_i$ are the Gauss-Hermite weights and nodes.
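A sketch of this calculation, reusing the two-point Gauss-Hermite rule derived earlier and the includes from the earlier sketches (in practice one would use routines that supply many more nodes):

    /* E[1/(1+X^2)] for X ~ N(mu, sigma^2) by two-point Gauss-Hermite:
       nodes +/- 1/sqrt(2), weights sqrt(pi)/2. */
    double ExpectationGH(double mu, double sigma)
    {
        const double pi = 3.14159265358979;
        double y[2] = { -1.0 / sqrt(2.0), 1.0 / sqrt(2.0) };
        double w[2] = { sqrt(pi) / 2.0, sqrt(pi) / 2.0 };
        double sum = 0.0;
        for (int i = 0; i < 2; i++) {
            double x = mu + sigma * y[i] * sqrt(2.0);   /* undo the change of variables */
            sum += w[i] / (1.0 + x * x);
        }
        return sum / sqrt(pi);
    }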
Some functions, however, will not be well approximated by a polynomial of any order. It is when very high accuracy is desired and the function is poorly approximated by a polynomial that we will rely on the trapezoid and related formulas (or, as we discuss later, on simulation methods).

In Table 1.3, we consider several examples, taken from page 254 of Kenneth Judd's textbook. We see that Gaussian quadrature sometimes performs extremely well, even with only a few points. The trapezoid rule, alternatively, does well for a large number of points.
Table 1.3: Quadrature examples (from page 254 of Judd's textbook). The four test integrals are (a) $\int_0^1 x^{1/4}\,dx$, (b) $\int_1^{10} x^{-2}\,dx$, (c) $\int_0^1 e^x\,dx$, and (d) $\int_{-1}^{1} (x + 0.05)^+\,dx$.

Rule             Points    (a)       (b)       (c)       (d)
Trapezoid        4         .7212     1.7637    1.7342    .6056
Trapezoid        7         .7664     1.1922    1.7223    .5583
Trapezoid        10        .7797     1.0448    1.72      .5562
Trapezoid        13        .7858     .9857     1.7193    .5542
Simpson          3         .6496     1.3008    1.4662    .4037
Simpson          7         .7816     1.0017    1.7183    .5426
Simpson          11        .7524     .9338     1.6232    .4844
Simpson          15        .7922     .9169     1.7183    .5528
Gauss-Legendre   4         .8023     .8563     1.7183    .5713
Gauss-Legendre   7         .8006     .8985     1.7183    .5457
Gauss-Legendre   10        .8003     .9        1.7183    .5538
Gauss-Legendre   13        .8001     .9        1.7183    .5513
Truth                      .8        .9        1.7183    .55125
Let us now turn to solving nonlinear equations: given a function $f$, we seek a root $x^*$ such that $f(x^*) = 0$. Near a candidate point $x$, a first-order Taylor expansion gives,

$0 = f(x^*) \approx f(x) + f'(x)(x^* - x)$

We have,

$x^* - x \approx -\frac{f(x)}{f'(x)}$

This suggests the following algorithm (Newton's method). Given a current point, $x_k$, compute a new point using,

$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$
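A sketch of the resulting iteration, stopping when $|f(x_k)|$ is sufficiently small (fp denotes the derivative $f'$):

    #include <cmath>

    /* Newton's method: iterate x <- x - f(x)/f'(x). */
    double Newton(double (*f)(double), double (*fp)(double),
                  double x, double tol, int maxit)
    {
        for (int k = 0; k < maxit; k++) {
            double fx = f(x);
            if (fabs(fx) < tol) break;    /* converged */
            x = x - fx / fp(x);           /* Newton update */
        }
        return x;
    }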
One might worry about whether this iteration converges: a nonlinear difference equation can exhibit chaos even in a single dimension, and these problems do exist for Newton's method in practice.

In practice, we also often cannot (or would prefer not to) compute the derivative $f'$. In that case, set,

$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$

but replace the derivative with the finite difference built from the two most recent iterates,

$f'(x_k) \approx \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}$

so that,

$x_{k+1} = x_k - \frac{(x_k - x_{k-1})\, f(x_k)}{f(x_k) - f(x_{k-1})}$
This approach is called the secant method. It achieves q-superlinear convergence, which is faster than the bisection method, but slower than Newton's method. Like Newton's method, there is no guarantee that the secant method will converge. A variant of the secant method is the false position method, which makes sure to keep one point on each side of the root, but is otherwise similar to the secant method. This procedure retains the q-superlinear convergence rate, but is guaranteed to converge. A picture will illustrate this.
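A sketch of the secant iteration, starting from two points x0 and x1:

    /* Secant method: Newton's method with f'(x_k) replaced by the
       finite difference (f(x_k) - f(x_{k-1})) / (x_k - x_{k-1}). */
    double Secant(double (*f)(double), double x0, double x1,
                  double tol, int maxit)
    {
        double f0 = f(x0), f1 = f(x1);
        for (int k = 0; k < maxit && fabs(f1) > tol; k++) {
            double x2 = x1 - f1 * (x1 - x0) / (f1 - f0);   /* secant update */
            x0 = x1; f0 = f1;
            x1 = x2; f1 = f(x1);
        }
        return x1;
    }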
The second problem relates to convergence of Newton's method, the secant method, and the false position method. These methods all converge faster than the bisection method once we get close to the root, but may perform worse far from the root. Brent's method follows the same lines as these other methods, but checks the progress of the algorithm and reverts to the bisection method in cases of poor performance. Brent's method is otherwise similar to the false position method, but uses an inverse quadratic approximation rather than a linear one. This is the algorithm that works best in practice, and also the one that is hardest to illustrate with a simple figure. The secant and Newton's methods are also of interest because they are the only algorithms that extend to the multidimensional case.
Let us now consider an example where we can apply the one-dimensional root-finding algorithms. Consider two countries that must divide a surplus (whose value is normalized to one) among themselves. A country may choose to fight, or back down and agree to a default settlement of $\tfrac{1}{2}$ each. In the event that only one country fights, that country gets the full surplus. In the event that both countries fight, each country wins with probability one half (and the country that wins gets the full surplus). The surplus is discounted at rate $0 < \delta < 1$, however, and fighting costs country $i$ an amount $c_i$.

Each country knows its own cost, but not the cost of the other country. The costs are known to be drawn from the common distribution, $F$, where $F$ admits a derivative on $[0, \tfrac{1}{2}]$. We have the following utility functions (rows are country 1's actions, columns are country 2's, and each cell lists country 1's payoff first):

                            Country 2
                     Fight                         Don't
Country 1   Fight    (δ/2 - c1, δ/2 - c2)          (1 - c1, 0)
            Don't    (0, 1 - c2)                   (1/2, 1/2)

We will assume that all equilibria have the form: country 1 fights if its cost is lower than $c_1^*$ and country 2 fights if its cost is lower than $c_2^*$.

Country 1's expected utility from fighting, given country 2's strategy, is,

$F(c_2^*)\left(\tfrac{1}{2}\delta - c_1\right) + (1 - F(c_2^*))(1 - c_1) = 1 + F(c_2^*)\left(\tfrac{1}{2}\delta - 1\right) - c_1$

while its expected utility from not fighting is,

$F(c_2^*) \cdot 0 + (1 - F(c_2^*)) \cdot \tfrac{1}{2} = \tfrac{1}{2} - \tfrac{1}{2} F(c_2^*)$

Now, the cut point must be the point $c_1 = c_1^*$ that equates these utilities,

$c_1^* = \tfrac{1}{2} - \tfrac{1}{2} F(c_2^*)(1 - \delta)$

A similar calculation for the other country will show that,

$c_2^* = \tfrac{1}{2} - \tfrac{1}{2} F(c_1^*)(1 - \delta)$
In Numerical Recipes, we could call the bisection method using the code below. In a symmetric equilibrium, $c_1^* = c_2^* = c^*$, so we need the root of $c - \tfrac{1}{2} + \tfrac{1}{2} F(c)(1-\delta) = 0$. For illustration, the code takes $F(c) = 1 - e^{-c}$ for $c \ge 0$ (an exponential distribution) and sets $\delta$ (Beta in the code) to $0.7$.
/* Include NR Code */
#include "nr3.h"
#include "roots.h"

/* Define Nonlinear Equation to Solve: the symmetric cut point solves
   c = 0.5 - 0.5 * F(c) * (1 - Beta), where F(c) = 1 - exp(-c) for c >= 0 */
const double Beta = 0.7;

double Func(double x);
double Func(double x)
{
    return x - 0.5 + 0.5 * (x >= 0.0) * (1.0 - exp(-x)) * (1.0 - Beta);
}

int main()
{
    /* Bisection Method: bracket the root in [-10, 10], solve to 1e-8 */
    try
    {
        cout << "rtbis, x: " << rtbis(Func, -10.0, 10.0, 0.00000001) << "\n\n";
    }
    catch (...)
    {
        cout << "rtbis: FAILED\n\n";
    }
    return 0;
}
Finally, recall how the bisection method itself works. We begin with two boundary points at which $f$ takes opposite signs, so the interval between them must contain a root, and we evaluate $f$ at the middle point of the interval. If $f$ at the middle point has the opposite sign from $f$ at a boundary point, the root lies between them, and that boundary point is retained along with the new middle point. Otherwise, it becomes one of the new boundary points. Continuing this procedure, we will obtain smaller and smaller intervals, each containing a root.
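A bare-bones version of this procedure (a sketch, without the safeguards of the NR routine):

    /* Bisection: [a, b] must bracket a root, i.e. f(a) and f(b) have
       opposite signs. Halve the bracket until it is shorter than tol. */
    double Bisect(double (*f)(double), double a, double b, double tol)
    {
        double fa = f(a);
        while (b - a > tol) {
            double mid = 0.5 * (a + b);
            double fmid = f(mid);
            if (fa * fmid <= 0.0) b = mid;     /* root in [a, mid] */
            else { a = mid; fa = fmid; }       /* root in [mid, b] */
        }
        return 0.5 * (a + b);
    }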