ProblemSheets2015 Solutions
A steady and persistent effort spent on homework problems is essential for success in
the course.
You should expect to spend 4-6 hours per week on trying to solve the homework problems.
Since many involve small coding projects, the time it will take an individual student to
arrive at a solution is hard to predict.
The assignment sheets will be uploaded on the course webpage on Thursday every
week.
Some or all of the problems of an assignment sheet will be discussed in the tutorial
classes on Monday, 1½ weeks after the problem sheet has been published.
A few problems on each sheet will be marked as core problems. Every participant
of the course is strongly advised to try and solve at least the core problems.
If you want your tutor to examine your solution of the current problem sheet, please
put it into the plexiglass trays in front of HG G 53/54 by the Thursday after the
publication. You should submit your codes using the online submission interface.
This is voluntary, but feedback on your performance on homework problems can be
important.
You are encouraged to hand in incomplete and wrong solutions, since you can receive
valuable feedback even on incomplete attempts.
Please clearly mark the homework problems that you want your tutor to examine.
Prof. R. Hiptmair AS 2015
ETH Zürich
G. Alberti,
D-MATH
F. Leonardi Numerical Methods for CSE
Problem Sheet 0
These problems are meant as an introduction to Eigen in the first tutorial classes of the
new semester.
(1a) Based on the C++ linear algebra library Eigen implement a function

template <class Matrix>
Matrix gramschmidt(const Matrix &A);
(1b) Test your implementation by applying it to a small random matrix and checking
the orthonormality of the columns of the output matrix.
Solution: See gramschmidt.cpp.
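For orientation, a minimal sketch of what such a function could look like (a sketch only, assuming dense Eigen matrix types such as MatrixXd; the provided gramschmidt.cpp is authoritative):

#include <Eigen/Dense>

// Classical Gram-Schmidt: orthonormalize the columns of A from left to right.
template <class Matrix>
Matrix gramschmidt(const Matrix &A) {
    Matrix Q = A;
    Q.col(0).normalize();
    for (int j = 1; j < A.cols(); ++j) {
        // subtract the projections onto the already orthonormalized columns
        Q.col(j) -= Q.leftCols(j) * (Q.leftCols(j).transpose() * A.col(j));
        Q.col(j).normalize();
    }
    return Q;
}

The orthonormality test of (1b) then amounts to checking that (Q.transpose()*Q - Matrix::Identity(A.cols(), A.cols())).norm() is close to zero.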
1
MatrixXd strassenMatMult(const MatrixXd &A, const MatrixXd &B)

that uses Strassen's algorithm to multiply the two matrices A and B and return the result
as output.
Solution: See Listing 1.
(2b) Validate the correctness of your code by comparing the result with Eigen's
built-in matrix multiplication.
Solution: See Listing 1.
(2c) Measure the runtime of your function strassenMatMult for random matrices
of sizes 2^k, k = 4, ..., 10, and compare with the matrix multiplication offered by the
*-operator of Eigen.
Solution: See Listing 1.
using namespace Eigen;
using namespace std;

// ... (function head omitted in this excerpt)
if (n == 2)
{
    C << A(0,0)*B(0,0) + A(0,1)*B(1,0),
         A(0,0)*B(0,1) + A(0,1)*B(1,1),
         A(1,0)*B(0,0) + A(1,1)*B(1,0),
         A(1,0)*B(0,1) + A(1,1)*B(1,1);
    return C;
}
else
{
    MatrixXd Q0(n/2,n/2), Q1(n/2,n/2), Q2(n/2,n/2), Q3(n/2,n/2),
             Q4(n/2,n/2), Q5(n/2,n/2), Q6(n/2,n/2);

    MatrixXd A11 = A.topLeftCorner(n/2,n/2);
    MatrixXd A12 = A.topRightCorner(n/2,n/2);
    MatrixXd A21 = A.bottomLeftCorner(n/2,n/2);
    MatrixXd A22 = A.bottomRightCorner(n/2,n/2);

    MatrixXd B11 = B.topLeftCorner(n/2,n/2);
    MatrixXd B12 = B.topRightCorner(n/2,n/2);
    MatrixXd B21 = B.bottomLeftCorner(n/2,n/2);
    MatrixXd B22 = B.bottomRightCorner(n/2,n/2);

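    // The seven recursive Strassen products were omitted in the excerpt
    // above; as a sketch, these are the standard formulas that the
    // assembly of C below relies on (see the provided Listing 1 for the
    // authoritative code):
    Q0 = strassenMatMult(A11 + A22, B11 + B22);
    Q1 = strassenMatMult(A21 + A22, B11);
    Q2 = strassenMatMult(A11, B12 - B22);
    Q3 = strassenMatMult(A22, B21 - B11);
    Q4 = strassenMatMult(A11 + A12, B22);
    Q5 = strassenMatMult(A21 - A11, B11 + B12);
    Q6 = strassenMatMult(A12 - A22, B21 + B22);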
    C << Q0 + Q3 - Q4 + Q6,
         Q2 + Q4,
         Q1 + Q3,
         Q0 + Q2 - Q1 + Q5;
    return C;
  }
}
int main(void)
{
    srand((unsigned int) time(0));

    // ... (timer and matrix setup omitted in this excerpt)
    for (unsigned int k = 4; k <= 10; k++) {
        tm_x.reset();
        tm_strassen.reset();
        for (unsigned int r = 0; r < repeats; ++r) {
            unsigned int n = pow(2, k);
            A = MatrixXd::Random(n, n);
            B = MatrixXd::Random(n, n);
            MatrixXd AB(n, n);

            tm_x.start();
            AB = A * B;
            tm_x.stop();

            tm_strassen.start();
            AB = strassenMatMult(A, B);
            tm_strassen.stop();
        }
        std::cout << "The standard matrix multiplication took: "
                  << tm_x.min().count() / 1000000. << " ms" << std::endl;
        std::cout << "The Strassen's algorithm took: "
                  << tm_strassen.min().count() / 1000000. << " ms" << std::endl;
        // ... (remainder of the loop and output omitted)
    }
}
Listing 2: MATLAB implementation for Problem 3 in file houserefl.m
1  function Z = houserefl(v)
2  % Porting of houserefl.cpp to Matlab code
3  % v is a column vector
4  % Size of v
5  n = size(v,1);
6
7  w = v/norm(v);
8  u = w + [1; zeros(n-1,1)];
9  q = u/norm(u);
10 X = eye(n) - 2*q*q';
11
that is equivalent to the MATLAB function houserefl(). Use data types from Eigen.
Solution:
(3b) Show that the matrix X, defined at line 10 in Listing 2, satisfies:
    X^T X = I_n.
HINT: ‖q‖_2 = 1.
Solution:
    X^T X = (I_n - 2qq^T)(I_n - 2qq^T)
          = I_n - 4qq^T + 4q (q^T q) q^T
          = I_n - 4qq^T + 4qq^T        (using ‖q‖_2 = 1)
          = I_n.
(3c) Show that the first column of X, after line 9 of the function houserefl, is a
multiple of the vector v.
HINT: Use the previous hint, and the facts that u = w + e_1, where e_1 = (1, 0, ..., 0)^T,
and ‖w‖ = 1.
Solution: Let X = [X_1, ..., X_n] be the matrix of line 9 in Listing 2. Since u = w + e_1
and ‖w‖ = 1,
    ‖u‖^2 = (w_1+1)^2 + w_2^2 + ... + w_n^2 = 2(w_1+1).
In view of the identity X_1 = e^(1) - 2q_1 q and q = u/‖u‖, we have

          [ 1 - 2q_1^2  ]   [ 1 - 2(w_1+1)^2 / (2(w_1+1)) ]   [ -w_1 ]
    X_1 = [ -2q_1 q_2   ] = [ -2(w_1+1)w_2 / (2(w_1+1))   ] = [ -w_2 ] = -w,
          [    ...      ]   [            ...              ]   [  ... ]
          [ -2q_1 q_n   ]   [ -2(w_1+1)w_n / (2(w_1+1))   ]   [ -w_n ]

which is a multiple of v, since w = v/‖v‖.
(3d) What property does the set of columns of the matrix Z have? What is the
purpose of the function houserefl?
HINT: Use (3b) and (3c).
Solution: The columns of X = [X_1, ..., X_n] are an orthonormal basis (ONB) of R^n
(cf. (3b)). Thus, the columns of Z = [X_2, ..., X_n] are an ONB of the orthogonal
complement of Span(X_1) = Span(v) (by (3c)). The function houserefl computes an
ONB of the orthogonal complement of Span{v}.
(3e) What is the asymptotic complexity of the function houserefl as the length n
of the input vector v goes to ∞?
Solution: O(n^2): this is the asymptotic complexity of the construction of the tensor
product at line 9 of Listing 3.
(3f) Rewrite the function as a MATLAB function and use a standard function of
MATLAB to achieve the same result as lines 5-9 with a single call to this function.
HINT: It is worth reading [1, Rem. 1.5.11] before mulling over this problem.
Solution: Check the code in Listing 2 for the porting to MATLAB. Using the QR-
decomposition qr, one can rewrite (cf. Listing 4) the C++ code in MATLAB in a few
lines.
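In Eigen the same shortcut is available through a QR object; a minimal sketch of one possible realization (an assumption for illustration, not the content of Listing 4):

#include <Eigen/Dense>

// The Q-factor of a QR decomposition of the n x 1 matrix v is an orthogonal
// matrix whose first column spans Span{v}; its remaining n-1 columns form
// an ONB of the orthogonal complement, which is what houserefl computes.
Eigen::MatrixXd houserefl_qr(const Eigen::VectorXd &v) {
    const int n = v.size();
    Eigen::HouseholderQR<Eigen::MatrixXd> qr(v);
    Eigen::MatrixXd Q = qr.householderQ(); // full n x n orthogonal factor
    return Q.rightCols(n - 1);             // ONB of Span{v}^perp
}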
Prof. R. Hiptmair AS 2015
ETH Zürich
G. Alberti,
D-MATH
F. Leonardi Numerical Methods for CSE
Problem Sheet 1
You should try your best to do the core problems. If time permits, please try to do the
rest as well.
(1a) For general vectors d = (d_1, ..., d_n)^T and a = (a_1, ..., a_n)^T, sketch the matrix
A created in line 6 of Listing 5.
HINT: This MATLAB script is provided as file arrowmatvec.m.
Solution:
        [ d_1                          a_1     ]
        [      d_2                     a_2     ]
    A = [           ...                ...     ]
        [                  d_{n-1}     a_{n-1} ]
        [ a_1  a_2  ...    a_{n-1}     d_n     ]
(1b) The tic-toc timing results for arrowmatvec.m are available in Figure 1.
Give a detailed explanation of the results.
[Figure 1: timings for arrowmatvec; log-log plot of time [s] versus vector size n, showing the measured tic-toc times together with an O(n^3) reference line.]
Solution: The standard matrix-matrix multiplication has runtimes growing like O(n^3)
and the standard matrix-vector multiplication has runtimes growing like O(n^2). Hence,
the overall computational complexity is dominated by the O(n^3) contribution.
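The parenthesization alone decides the cost of evaluating A*A*x; a minimal illustration (assuming an Eigen matrix A and vector x of matching sizes):

// (A*A)*x: forms an n x n matrix product first -- O(n^3), as measured above
Eigen::VectorXd y_slow = (A * A) * x;
// A*(A*x): two matrix-vector products -- O(n^2)
Eigen::VectorXd y_fast = A * (A * x);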
function y = arrowmatvec2(d,a,x)
that computes the same multiplication as in code 5 but with optimal asymptotic complex-
ity with respect to n. Here d passes the vector (d1 , . . . , dn )T and a passes the vector
(a1 , . . . , an )T .
Solution: Due to the sparsity and special structure of the matrix, it is possible to write
a more efficient implementation than the standard matrix-vector multiplication. See code
listing 6.
(1d) What is the complexity of your algorithm from sub-problem (1c) (with respect
to problem size n)?
Solution: The efficient implementation only needs two vector-vector element-wise multi-
plications and one vector-scalar multiplication. Therefore the complexity is O(n).
(1e) Compare the runtime of your implementation and the implementation given
in code 5 for n = 2^5, 2^6, ..., 2^12. Use the routines tic and toc as explained in example [1,
Ex. 1.4.10] of the Lecture Slides.
Solution: The standard matrix multiplication has runtimes growing like O(n^3). The
runtimes of the more efficient implementation grow like O(n). See Listing 7 and
Figure 2.
ns = 2.^(2:12);
for n = ns
    a = rand(n,1); d = rand(n,1); x = rand(n,1);
    t = realmax;
    t2 = realmax;
    for k=1:nruns
        tic; y = arrowmatvec(d,a,x); t = min(toc,t);
        tic; y2 = arrowmatvec2(d,a,x); t2 = min(toc,t2);
    end;
    res = [res; t t2];
end
figure('name','timings arrowmatvec and arrowmatvec2');
c1 = sum(res(:,1))/sum(ns.^3);
c2 = sum(res(:,2))/sum(ns);
loglog(ns, res(:,1),'r+', ns, res(:,2),'bo',...
       ns, c1*ns.^3, 'k-', ns, c2*ns, 'g-');
xlabel('{\bf vector size n}','fontsize',14);
ylabel('{\bf time[s]}','fontsize',14);
title('{\bf timings for arrowmatvec and arrowmatvec2}','fontsize',14);
legend('arrowmatvec','arrowmatvec2','O(n^3)','O(n)',...
       'location','best');
print -depsc2 '../PICTURES/arrowmatvec2timing.eps';
(1f) Write the Eigen codes corresponding to the functions arrowmatvec and
arrowmatvec2.
Solution: See Listing 8 and Listing 9.
using namespace Eigen;
[Figure 2: timings for arrowmatvec and arrowmatvec2; log-log plot of time [s] versus vector size n with O(n^3) and O(n) reference lines.]
MatrixXd A(n, n);
MatrixXd D = dcut.asDiagonal();
// If you do not create the temporary matrix D, you will get an error:
// D must be cast to MatrixXd
A << D, a.head(n-1), acut.transpose(), d(n-1);

y = A * A * x;
}

int main(void)
{
    // srand((unsigned int) time(0));
    VectorXd a = VectorXd::Random(5);
    VectorXd d = VectorXd::Random(5);
    VectorXd x = VectorXd::Random(5);
    VectorXd y;

    arrowmatvec(d, a, x, y);
    std::cout << "A*A*x = " << y << std::endl;
}
using namespace Eigen;

template <class Matrix>
void Atimesx(const Matrix &d, const Matrix &a, const Matrix &x, Matrix &Ax)
{
    int n = d.size();
    Ax = (d.array() * x.array()).matrix();
    VectorXd Axcut = Ax.head(n-1);
    VectorXd acut  = a.head(n-1);
    VectorXd xcut  = x.head(n-1);
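    // The lines completing the product were omitted in this excerpt; a
    // sketch of what they must compute, given the arrow structure of A:
    Ax.head(n-1) += x(n-1) * acut;  // contribution of the last column (a)
    Ax(n-1)      += acut.dot(xcut); // contribution of the last row (a^T)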
}

int main(void)
{
    VectorXd a = VectorXd::Random(5);
    VectorXd d = VectorXd::Random(5);
    VectorXd x = VectorXd::Random(5);
    VectorXd Ax(5);

    Atimesx(d, a, x, Ax);
    VectorXd AAx(5);
    Atimesx(d, a, Ax, AAx);
    std::cout << "A*A*x = " << AAx << std::endl;
}
(2a) We consider the function
    f_1(x_0, h) = sin(x_0 + h) - sin(x_0).
It can be transformed into another form, f_2(x_0, h), using the trigonometric identity
    sin(α) - sin(β) = 2 cos((α + β)/2) sin((α - β)/2).
Thus, f_1 and f_2 give the same values, in exact arithmetic, for any given argument values
x_0 and h.
1. Derive f_2(x_0, h), which no longer involves the difference of return values of
trigonometric functions.
2. Suggest a formula that avoids cancellation errors for computing the approximation
(f(x_0 + h) - f(x_0))/h of the derivative of f(x) = sin(x) at x = x_0. Write a
MATLAB program that implements your formula and computes an approximation
of f'(1.2), for h = 10^-20, 10^-19, ..., 1.
HINT: For background information refer to [1, Ex. 1.5.43].
3. Plot the error (in doubly logarithmic scale using MATLAB's loglog plotting func-
tion) of the derivative computed with the suggested formula and with the naive im-
plementation using f_1.
Solution: Check the MATLAB implementation in Listing 10 and the plot in Fig. 3. We
can clearly observe that the computation using f_1 leads to a big error as h → 0. This
is due to the cancellation error caused by the subtraction of two numbers of approximately
the same magnitude. The second implementation, using f_2, is very stable and does not
display round-off errors.
% Derivative
g1 = (sin(x+h) - sin(x)) ./ h;                   % Naive
g2 = 2 .* cos(x + h * 0.5) .* sin(h * 0.5) ./ h; % Better
ex = cos(x);                                     % Exact

% Plot
loglog(h, abs(g1-ex), 'r', h, abs(g2-ex), 'b', h, h, 'k--');
title('Error of the approximation of f''(x_0)');
legend('g_1','g_2', 'O(h)');
xlabel('h');
ylabel('| f''(x_0) - g_i(x_0,h) |');
grid on
[Figure 3: log-log plot of the errors |f'(x_0) - g_i(x_0,h)| versus h for the naive formula g_1 and the stable formula g_2.]
Which of the two formulas is more suitable for numerical computation? Explain why, and
provide a numerical example in which the difference in accuracy is evident.
Solution: We immediately derive ln(x - √(x^2 - 1)) + ln(x + √(x^2 - 1)) = ln(x^2 - (x^2 - 1)) = ln(1) = 0.
As x → ∞, the left logarithm involves the subtraction of two numbers of equal magnitude, whilst
the right logarithm involves the addition of two numbers of approximately the same magnitude.
Therefore, in the first case there may be cancellation for large values of x, making it worse
for numerical computation. Try, in MATLAB, with x = 10^8.
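A C++ rendering of the suggested experiment (a sketch; the text proposes the equivalent MATLAB test):

#include <cmath>
#include <cstdio>

int main() {
    double x = 1e8;
    double s = std::sqrt(x * x - 1); // rounds to exactly x in double precision
    double bad  = std::log(x - s);   // x - s underflows to 0: log returns -inf
    double good = -std::log(x + s);  // mathematically equal, no cancellation
    std::printf("naive: %e, stable: %e\n", bad, good);
}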
(2c) For the following expressions, state the numerical difficulties that may occur,
and rewrite the formulas in a way that is more suitable for numerical computation.
1. √(x + 1/x) - √(x - 1/x), where x ≫ 1.
2. √(1/a^2 + 1/b^2), where a ≈ 0, b ≈ 1.
Solution:
1. Inside the square roots we have the addition (resp. subtraction) of a small number 1/x
to a big number x. The difference of the square roots incurs cancellation, since both terms
have the same, large magnitude. With A = √(x + 1/x) and B = √(x - 1/x), we can
instead compute
    A - B = (A - B)(A + B)/(A + B) = (2/x) / (√(x + 1/x) + √(x - 1/x))
          = 2 / (√(x(x^2 + 1)) + √(x(x^2 - 1))).
2. 1/a^2 becomes very large as a approaches 0, whilst 1/b^2 ≈ 1 as b ≈ 1. The relative
size of 1/a^2 and 1/b^2 becomes so big that, in computer arithmetic, 1/a^2 + 1/b^2 = 1/a^2.
On the other hand, (1/a)√(1 + (a/b)^2) avoids this problem by performing a division between
two numbers of very different magnitude instead of a summation.
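A small check of the second rewriting (a sketch with illustrative sample values):

#include <cmath>
#include <cstdio>

int main() {
    double a = 1e-200, b = 1.0;
    // naive: 1/a^2 overflows to inf although the final result is finite
    double naive = std::sqrt(1.0/(a*a) + 1.0/(b*b));
    // rewritten: (1/a)*sqrt(1 + (a/b)^2) stays finite
    double good = (1.0/a) * std::sqrt(1.0 + (a/b)*(a/b));
    std::printf("naive = %e, rewritten = %e\n", naive, good);
}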
y = kron(A, B) * x,    (2)
(3a) Obtain further information about the kron command from the MATLAB help by
issuing doc kron in the MATLAB command window.
Solution: See MATLAB help.
(3b) Explicitly write Eq. (2) in the form y = Mx (i.e. write down M), for
A = [1 2; 3 4] and B = [5 6; 7 8].
Solution:
        [  5   6  10  12 ]
    y = [  7   8  14  16 ] x.
        [ 15  18  20  24 ]
        [ 21  24  28  32 ]
(3c) What is the asymptotic complexity (→ [1, Def. 1.4.3]) of the MATLAB code
(2)? Use the Landau symbol from [1, Def. 1.4.4] to state your answer.
Solution: kron(A,B) results in a matrix of size n^2 × n^2 and x has length n^2. So the
complexity is the same as a matrix-vector multiplication for the resulting sizes. In total
this is O(n^2 · n^2) = O(n^4).
(3d) Measure the runtime of (2) for n = 2^3, 2^4, 2^5, 2^6 and random matrices. Use the
MATLAB functions tic and toc as explained in example [1, Ex. 1.4.10] of the Lecture
Slides.
Solution: Since kron(A,B) creates a large matrix consisting of smaller blocks of
size n, i.e. B multiplied by A(i,j), we can split the problem into n matrix-vector
multiplications of size n. This results in a routine with complexity n · O(n^2) = O(n^3).
The implementation is listed in 11. The runtimes are shown in Figure 4.
% check size of A
[n,m] = size(A);
assert(n == m, 'expected quadratic matrix')

% init
y = zeros(n*n,1);
% loop first over columns and then (!) over rows
for j = 1:n
    % reuse B * x(...) part (constant in given column) => O(n^2)
    z = B*x((j-1)*n+1:j*n);
    % add to result vector (need to go through full vector) => O(n^2)
    for i = 1:n
        y((i-1)*n+1:i*n) = y((i-1)*n+1:i*n) + A(i,j)*z;
    end
end
% Note: complexity is O(n^3)
end
(3e) Explain in detail why (2) can be replaced with the single line of MATLAB code
(3), and compare the execution times of (2) and (3) for random matrices of size n = 2^3, 2^4, 2^5, 2^6.
Solution:
% check size of A
[n,m] = size(A); assert(n == m, 'expected quadratic matrix')

% init
yy = zeros(n,n);
xx = reshape(x,n,n);
a4 = sum(T(1:5,2)) / sum(T(1:5,1).^4);
figure('name','kron timings');
loglog(T(:,1),T(:,2),'m+', T(:,1),T(:,3),'ro',...
       T(:,1),T(:,4),'bd', T(:,1),T(:,5),'gp',...
       T(:,1),(T(:,1).^2)*a2,'k-', T(:,1),(T(:,1).^3)*a3,'k--',...
       T(1:5,1),(T(1:5,1).^4)*a4,'k-.', 'linewidth', 2);
xlabel('{\bf problem size n}','fontsize',14);
ylabel('{\bf average runtime (s)}','fontsize',14);
title(sprintf('tic-toc timing averaged over %d runs', nruns),'fontsize',14);
legend('slow evaluation','efficient evaluation',...
       'efficient ev. with reshape','Kevin 1-line',...
       'O(n^2)','O(n^3)','O(n^4)','location','northwest');
print -depsc2 '../PICTURES/kron_timings.eps';
(3f) Based on the Eigen numerical library (→ [1, Section 1.2.3]) implement a C++
function

template <class Matrix>
void kron(const Matrix &A, const Matrix &B, Matrix &C) {
    // Your code here
}

that returns the Kronecker product of the argument matrices A and B in the matrix C.
HINT: Feel free (but not forced) to use the partial codes provided in kron.cpp as well as
the CMake file CMakeLists.txt (including cmake-modules) and the timing header
file timer.h.
Solution: See kron.cpp or Listing 14.
template <class Matrix, class Vector>
void kron_mv(const Matrix &A, const Matrix &B, const Vector &x, Vector &y);
(3h) Now, using a function definition similar to that of the previous sub-problem,
implement the C++ equivalent of (3) in the function kron_mv_fast.
HINT: Study [1, Rem. 1.2.23] about reshaping matrices in Eigen.
(3i) Compare the runtimes of your two implementations as you did for the MATLAB
implementations in sub-problem (3e).
Solution:
//! \brief Compute y = kron(A,B)*x. Exploit the matrix-vector product.
//! A and B must have dimension n x n; x must have dimension n^2.
//! \param[in] A Matrix n x n
//! \param[in] B Matrix n x n
//! \param[in] x Vector of dim n^2
//! \param[out] y Vector y = kron(A,B) * x
template <class Matrix, class Vector>
void kron_fast(const Matrix &A, const Matrix &B, const Vector &x, Vector &y)
{
    y = Vector::Zero(A.rows() * B.rows());

    unsigned int n = A.rows();
    for (unsigned int j = 0; j < A.cols(); ++j) {
        Vector z = B * x.segment(j*n, n);
        for (unsigned int i = 0; i < A.rows(); ++i) {
            y.segment(i*n, n) += A(i,j) * z;
        }
    }
}
// (from kron_super_fast:)
Matrix t = B * Matrix::Map(x.data(), n, n) * A.transpose();
y = Matrix::Map(t.data(), n*n, 1);
}

int main(void) {
    // ...
    kron_fast(A, B, x, y);
    std::cout << "Using kron_fast: y = " << std::endl << y << std::endl;
    kron_super_fast(A, B, x, y);
    std::cout << "Using kron_super_fast: y = " << std::endl << y << std::endl;
tm_kron.reset();
tm_kron_fast.reset();
tm_kron_super_fast.reset();
for (unsigned int r = 0; r < repeats; ++r) {
    unsigned int M = pow(2, p);
    A = Eigen::MatrixXd::Random(M, M);
    B = Eigen::MatrixXd::Random(M, M);
    x = Eigen::VectorXd::Random(M*M);

    // ...
    tm_kron_fast.start();
    kron_fast(A, B, x, y);
    tm_kron_fast.stop();

    tm_kron_super_fast.start();
    kron_super_fast(A, B, x, y);
    tm_kron_super_fast.stop();
}
}

for (auto it = times_kron.begin(); it != times_kron.end(); ++it) {
    std::cout << *it << " ";
}
std::cout << std::endl;
for (auto it = times_kron_fast.begin(); it != times_kron_fast.end(); ++it) {
    std::cout << *it << " ";
}
std::cout << std::endl;
for (auto it = times_kron_super_fast.begin(); it != times_kron_super_fast.end(); ++it) {
    std::cout << *it << " ";
}
std::cout << std::endl;
}
(4a) What is the asymptotic complexity (for n → ∞) of the evaluation of the MATLAB
command displayed above, with respect to the problem size parameter n?
Solution: Matrix-vector multiplication: quadratic dependence, O(n^2).
function y = multAmin(x)
that computes the same multiplication as (4) but with a better asymptotic complexity with
respect to n.
[Figure 4: tic-toc timings averaged over 3 runs; log-log plot of average runtime (s) versus problem size n for the MATLAB and C++ kron variants (slow evaluation, efficient evaluation, efficient evaluation with reshape, one-liner), with O(n^2), O(n^3) and O(n^4) reference lines.]
HINT: you can test your implementation by comparing the returned values with the ones
obtained with code (4).
Solution: For every j we have y_j = Σ_{k=1}^{j} k·x_k + j·Σ_{k=j+1}^{n} x_k, so we pre-compute the two
sums for every j only once.
v(1) = x(n);
w(1) = x(1);
for j = 2:n
    v(j) = v(j-1)+x(n+1-j);
    w(j) = w(j-1)+j*x(j);
end
for j = 1:n-1
    y(j) = w(j) + v(n-j)*j;
end
y(n) = w(n);
(4c) What is the asymptotic complexity (in terms of problem size parameter n) of
your function multAmin?
Solution: Linear dependence: O(n).
(4d) Compare the runtime of your implementation and the implementation given in
(4) for n = 2^5, 2^6, ..., 2^12. Use the routines tic and toc as explained in example [1, Ex. 1.4.10]
of the Lecture Slides.
Solution:
The matrix multiplication in (4) has runtimes growing like O(n^2). The runtimes of
the more efficient implementations, with hand-coded loops or using the MATLAB function
cumsum, grow like O(n).
% timing multAmin
tic;
for k=1:nruns
    y = multAmin(x);
end
ts(j,2) = toc/nruns;

% timing multAmin2
tic;
for k=1:nruns
    y = multAmin2(x);
end
ts(j,3) = toc/nruns;
end

c1 = sum(ts(:,2)) / sum(ns);
c2 = sum(ts(:,1)) / sum(ns.^2);

print -depsc2 '../PICTURES/multAmin_timings.eps';
[Figure 5: tic-toc timings averaged over 10 runs; log-log plot of runtime (s) versus problem size n for the MATLAB and C++ multAmin variants (naive, multAmin, multAmin2), with O(n) and O(n^2) reference lines.]
(4e) Can you solve task (4b) without using any for- or while-loop?
Implement it in the function
function y = multAmin2(x)
(4f) Consider the following MATLAB script multAB.m:
(4g) Run the code of Listing 18 several times and conjecture a relationship between
the matrices A and B from the output. Prove your conjecture.
HINT: You must take into account that computers inevitably commit round-off errors, see
[1, Section 1.5].
Solution: It is easy to verify with MATLAB (or to prove) that B = A^{-1}.
For 2 ≤ j ≤ n-1, we obtain:

    (AB)_{i,j} = Σ_{k=1}^{n} a_{i,k} b_{k,j} = a_{i,j-1} b_{j-1,j} + a_{i,j} b_{j,j} + a_{i,j+1} b_{j+1,j}
               = -min(i, j-1) + 2 min(i, j) - min(i, j+1)
                 { -i + 2i - i = 0              if i < j,
               = { -(i-1) + 2i - i = 1          if i = j,
                 { -(j-1) + 2j - (j+1) = 0      if i > j.
(4h) Implement a C++ function with declaration
that realizes the efficient version of the MATLAB line of code (4). Test your function by
comparing with output from the equivalent MATLAB functions.
Solution:
#include <iostream>
#include <vector>

#include "timer.h"
unsigned int n = x.size();

Eigen::MatrixXd A(n,n);

// (from multAmin:)
v(0) = x(n-1);
w(0) = x(0);
int main(void) {
    // Build matrix B with 10x10 dimensions such that B = inv(A)
    unsigned int n = 10;
    Eigen::MatrixXd B = Eigen::MatrixXd::Zero(n,n);
    for (unsigned int i = 0; i < n; ++i) {
        B(i,i) = 2;
        if (i < n-1) B(i+1,i) = -1;
        if (i > 0) B(i-1,i) = -1;
    }
    B(n-1,n-1) = 1;
    std::cout << "B = " << B << std::endl;
    tm_slow.start();
    multAminSlow(x, y);
    tm_slow.stop();

    tm_slow_loops.start();
    multAminLoops(x, y);
    tm_slow_loops.stop();

    tm_fast.start();
    multAmin(x, y);
    tm_fast.stop();
}
times_slow.push_back(tm_slow.avg().count());
times_slow_loops.push_back(tm_slow_loops.avg().count());
times_fast.push_back(tm_fast.avg().count());
}
Pow(A, k)
that, using only basic linear algebra operations (including matrix-vector or matrix-matrix
multiplications), computes efficiently the k-th power of the n×n matrix A.
HINT: use the MATLAB operator ^ to test your implementation on random matrices A.
HINT: use the MATLAB function de2bi to extract the binary digits of an integer.
Solution: Write k in binary format: k = Σ_{j=0}^{M} b_j 2^j, b_j ∈ {0,1}. Then

    A^k = Π_{j=0}^{M} (A^{2^j})^{b_j} = Π_{j : b_j = 1} A^{2^j}.

We compute A, A^2, A^4, ..., A^{2^M} (one matrix-matrix multiplication each) and we
multiply together only those A^{2^j} for which b_j ≠ 0.
% transform k in basis 2
bin_k = de2bi(k);
M = length(bin_k);
X = eye(size(A));

for j = 1:M
    if bin_k(j) == 1
        X = X*A;
    end
    A = A*A; % now A_{new} = A_{initial}^(2^j)
end
(5b) Find the asymptotic complexity in k (and n), taking into account that in MATLAB
a matrix-matrix multiplication requires an O(n^3) effort.
Solution: Using the simplest implementation (k - 1 successive multiplications by A),
the cost is O(k·n^3).
Using the efficient implementation from Listing 20, for each j ∈ {1, 2, ..., ⌈log_2(k)⌉} we
have to perform at most two multiplications (X·A and A·A), for a total cost of O(log(k)·n^3).
(5c) Plot the runtime of the built-in MATLAB power (^) function and find out the
complexity. Compare it with the function Pow from (5a).
Use the matrix
    A_{j,k} = (1/√n) exp(2πi·jk / n)
to test the two functions.
Solution:
for i = 1:length(nn)
    n = nn(i);

    for j=1:length(kk)
        k = kk(j);
        tic
        for run = 1:nruns
            X = Pow(A, k);
        end
        tt1(i,j) = toc;

        tic
        for run = 1:nruns
            XX = A^k;
        end
        tt2(i,j) = toc;
        n_k_err = [n, k, max(max(abs(X-XX)))]
    end

end

figure('name','Pow timings');
subplot(2,1,1)
n_sel=6; %plot in k only for a selected n
% expected logarithmic dep. on k, semilogx used:
semilogx(kk,tt1(n_sel,:),'m+', kk,tt2(n_sel,:),'ro',...
    kk,sum(tt1(n_sel,:))*log(kk)/(length(kk)*log(k)), 'linewidth', 2);
xlabel('{\bf power k}','fontsize',14);
ylabel('{\bf average runtime (s)}','fontsize',14);
title(sprintf('tic-toc timing averaged over %d runs, matrix size = %d',...
    nruns, nn(n_sel)),'fontsize',14);
legend('our implementation','Matlab built-in',...
    'O(C log(k))','location','northwest');

subplot(2,1,2)
k_sel = 35; %plot in n only for a selected k
loglog(nn, tt1(:,k_sel),'m+', nn, tt2(:,k_sel),'ro',...
    nn, sum(tt1(:,k_sel))*nn.^3/sum(kk.^3), 'linewidth', 2);
xlabel('{\bf dimension n}','fontsize',14);
ylabel('{\bf average runtime (s)}','fontsize',14);
title(sprintf('tic-toc timing averaged over %d runs, power = %d',...
    nruns, kk(k_sel)),'fontsize',14);
legend('our implementation','Matlab built-in',...
    'O(n^3)','location','northwest');
print -depsc2 'Pow_timings.eps';
[Figure 6: Timings for Problem 5 — top: average runtime (s) versus power k for our implementation, the MATLAB built-in, and an O(C log(k)) reference; bottom: average runtime (s) versus dimension n with an O(n^3) reference.]
The MATLAB function has (at most) logarithmic complexity in k, but its timing is slightly
better than our implementation's.
All the eigenvalues of the Vandermonde matrix A have absolute value 1, so the powers A^k
are stable: the eigenvalues of A^k approach neither 0 nor ∞ as k grows.
(5d) Using Eigen, devise a C++ function with the calling sequence
that computes the k-th power of the square matrix A (passed in the argument A). Of course,
your implementation should be as efficient as the MATLAB version from sub-problem (5a).
HINT: matrix multiplication suffers no aliasing issues (you can safely write A = A*A).
HINT: feel free to use the provided matPow.cpp.
HINT: you may want to use log and ceil.
HINT: Eigen's implementation of the power function (A.pow(k)) can be found in:
#include <unsupported/Eigen/MatrixFunctions>
Solution:
if ((k & p) != 0) {
    X = X*A;
}

A = A*A;
p = p << 1;
}
A = X;
}

int main(void) {
    // Check/Test with provided, complex, matrix
    unsigned int n = 3; // size of matrix
    unsigned int k = 9; // power

    Eigen::MatrixXcd A(n,n);

    // Output results
    std::cout << "A = " << A << std::endl;
    std::cout << "Eigen:" << std::endl << "A^" << k << " = " << A.pow(k) << std::endl;
    matPow(A, k);
    std::cout << "Ours:" << std::endl << "A^" << k << " = " << A << std::endl;
}
HINT: Give the command doc eig in MATLAB to understand what eig does.
HINT: You may use that eig applied to an n×n matrix requires an asymptotic computa-
tional effort of O(n^3) for n → ∞.
HINT: in MATLAB, the function diag(x), for x ∈ R^n, builds a diagonal n×n matrix
with x as diagonal. If M is an n×n matrix, diag(M) returns (extracts) the diagonal of M
as a vector in R^n.
HINT: the operator v.^k, for v ∈ R^n and k ∈ N ∪ {0}, returns the vector with components
v_i^k (i.e. component-wise exponentiation).
    A^k = (S D S^{-1})^k = S D^k S^{-1},
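A sketch of this approach in Eigen (an assumption about the idea behind the function getit, which is not reproduced here):

#include <Eigen/Dense>
#include <complex>

// One O(n^3) eigendecomposition, then a component-wise power of the
// eigenvalues: A^k = S * D^k * S^{-1}.
Eigen::MatrixXcd powEig(const Eigen::MatrixXd &A, unsigned int k) {
    Eigen::EigenSolver<Eigen::MatrixXd> eig(A);
    Eigen::VectorXcd d = eig.eigenvalues();
    Eigen::MatrixXcd S = eig.eigenvectors();
    for (int i = 0; i < d.size(); ++i)
        d(i) = std::pow(d(i), (int)k); // D^k: eigenvalues to the k-th power
    return S * d.asDiagonal() * S.inverse();
}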
(6b) Fix k ∈ N. Discuss (in detail) the asymptotic complexity of getit as n → ∞.
Solution: The algorithm comprises the following operations:
The complexity of the algorithm is dominated by the operations with the highest exponent.
Therefore the total complexity of the algorithm is O(n^3) for n → ∞.
Prof. R. Hiptmair AS 2015
ETH Zürich
G. Alberti,
D-MATH
F. Leonardi Numerical Methods for CSE
Problem Sheet 2
You should try your best to do the core problems. If time permits, please try to do the
rest as well.
as desired.
In the sequel let vec(M) ∈ R^{n^2} denote the column vector obtained by reinterpreting the
internal coefficient array of a matrix M ∈ R^{n,n} stored in column major format as the
data array of a vector with n^2 components. In MATLAB, vec(M) would be the column
vector obtained by reshape(M,n*n,1) or by M(:). See [1, Rem. 1.2.23] for the
implementation with Eigen.
Problem (5) is equivalent to a linear system of equations
    C vec(X) = b    (6)
with system matrix C ∈ R^{n^2,n^2} and right hand side vector b ∈ R^{n^2}.
(1b) Refresh yourself on the notion of sparse matrix, see [1, Section 1.7] and, in
particular, [1, Notion 1.7.1], [1, Def. 1.7.3].
(1d) Use the Kronecker product to find a general expression for C in terms of a
general A.
Solution: We have C = I ⊗ A + A ⊗ I. The first term is related to AX, the second to
XA^T.
that returns the matrix C from (6) when given a square matrix A. (The function kron
may be used.)
Solution: See Listing 24.
function C = buildC(A)

n = size(A);
I = eye(n);
C = kron(A,I) + kron(I,A);
(1f) Give an upper bound (as sharp as possible) for nnz(C) in terms of nnz(A). Can
C be legitimately regarded as a sparse matrix for large n even if A is dense?
HINT: Run the following MATLAB code:
n=4;
A=sym('A',[n,n]);
I=eye(n);
C=buildC(A)
Solution: Note that, for general matrices A and B, we have nnz(A ⊗ B) = nnz(A)·nnz(B).
This follows from the fact that the block in position (i, j) of the matrix A ⊗ B is a_{ij}·B. In
our case, we immediately obtain
    nnz(C) = nnz(I ⊗ A + A ⊗ I) ≤ nnz(I ⊗ A) + nnz(A ⊗ I) ≤ 2·nnz(I)·nnz(A),
namely
    nnz(C) ≤ 2n·nnz(A).
The sharpness of this bound can be checked by taking the matrix A = [0 1; 1 0].
The bound says that, even if A is not sparse (nnz(A) ≤ n^2), we have nnz(C) ≤ 2n^3 ≪ n^4.
Therefore, C can be regarded as a sparse matrix for any A.
(1h) Validate the correctness of your C++ implementation of buildC by comparing
with the equivalent MATLAB function for n = 5 and

        [ 10   2   3   4   5 ]
        [  6  20   8   9   1 ]
    A = [  1   2  30   4   5 ].
        [  6   7   8  20   0 ]
        [  1   2   3   4  10 ]
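A possible C++ counterpart (a sketch assuming Eigen's unsupported KroneckerProduct module; the actual solution file may be organized differently):

#include <Eigen/Dense>
#include <unsupported/Eigen/KroneckerProduct>

// C = I (x) A + A (x) I, cf. sub-problem (1d); dense for simplicity --
// for large n one would assemble C directly in sparse format.
Eigen::MatrixXd buildC(const Eigen::MatrixXd &A) {
    const long n = A.rows();
    Eigen::MatrixXd I = Eigen::MatrixXd::Identity(n, n);
    Eigen::MatrixXd C1 = Eigen::kroneckerProduct(I, A);
    Eigen::MatrixXd C2 = Eigen::kroneckerProduct(A, I);
    return C1 + C2;
}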
    A = [ R    v
          u^T  0 ],    (7)
(2a) Give a necessary and sufficient condition for the triangular matrix R to be
invertible.
Solution: Since R is upper triangular, det(R) = Π_{i=1}^{n} (R)_{i,i}; hence all the diagonal
elements must be non-zero for R to be invertible.
for computing the solution of Ax = b (with A as in (7)) efficiently. Perform size checks on
the input matrices and vectors.
HINT: Use the decomposition from (2b).
HINT: you can rely on the triangularView() function to instruct Eigen of the tri-
angular structure of R, see [1, Code 1.2.14].
HINT: using the construct:
typedef typename Matrix::Scalar Scalar;
you can obtain the scalar type of the Matrix type (e.g. double for MatrixXd). This
can then be used as:
Scalar a = 5;
HINT: using triangularView and templates you may incur weird compile errors.
If this happens to you, check http://eigen.tuxfamily.org/dox/TopicTemplateKeyword.html
HINT: sometimes the C++ keyword auto (only in std. C++11) can be used if you do not
want to explicitly write the return type of a function, as in:
MatrixXd a;
auto b = 5*a;
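A sketch of the block elimination solver (assuming the decomposition of (2b), which is not reproduced here, amounts to eliminating the last row; the function name and signature are an illustration only):

#include <Eigen/Dense>

// Solve [R v; u^T 0] [x1; xi] = [b1; b2] with R upper triangular,
// using two O(n^2) triangular solves.
template <class Matrix, class Vector>
void solveArrow(const Matrix &R, const Vector &v, const Vector &u,
                const Vector &b, Vector &x) {
    typedef typename Matrix::Scalar Scalar;
    const long n = R.rows();
    Vector Rinv_b = R.template triangularView<Eigen::Upper>().solve(b.head(n));
    Vector Rinv_v = R.template triangularView<Eigen::Upper>().solve(v);
    // from u^T x1 = b2 and x1 = R^{-1}(b1 - xi*v):
    Scalar xi = (u.dot(Rinv_b) - b(n)) / u.dot(Rinv_v);
    x.resize(n + 1);
    x.head(n) = Rinv_b - xi * Rinv_v;
    x(n) = xi;
}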
        [ 2    a_1  0    ...  ...      0       ]
        [ 0    2    a_2  0    ...      0       ]
        [ b_1  0    2    a_3           .       ]
    A = [ 0    b_2  .    .    .        .       ] ∈ R^{n,n}
        [ .         .    .    .        a_{n-1} ]
        [ 0    0    ...  0    b_{n-2}  2       ]

with a_i, b_i ∈ R.
Remark. The matrix A is an instance of a banded matrix, see [1, Section 1.7.6] and, in
particular, the examples after [1, Def. 1.7.53]. However, you need not know any of the
content of this section for solving this problem.
template <class Vector>
void multAx(const Vector &a, const Vector &b, const Vector &x, Vector &y);
a_i + b_i ≤ 2, unless x = const. (in which case Ax ≠ 0, as we see from the first equation).
By contradiction, ker A = {0}.
template <class Vector>
void solvelseAupper(const Vector &a, const Vector &r, Vector &x);

solving Ax = r.
Solution: See banded_matrix.cpp.
template <class Vector>
void solvelseA(const Vector &a, const Vector &b, const Vector &r, Vector &x);

that computes the solution of Ax = r by means of Gaussian elimination. You cannot use
any high level solver routines of Eigen.
HINT: Thanks to the constraint a_i, b_i ∈ [0, 1], pivoting is not required in order to ensure
stability of Gaussian elimination. This is asserted in [1, Lemma 1.8.9], but you may just
use this fact here. Thus, you can perform a straightforward Gaussian elimination from top
to bottom as you have learned it in your linear algebra course.
Solution: See banded_matrix.cpp.
(3f) Implement solvelseAEigen as in (3d), this time using Eigen's sparse elim-
ination solver.
HINT: The standard way of initializing a sparse Eigen matrix efficiently is via the triplet
format, as discussed in [1, Section 1.7.3]. You may also use direct initialization of a sparse
matrix, provided that you reserve() enough space for the non-zero entries of each
column, see the documentation.
Solution: See banded_matrix.cpp.
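A sketch of the triplet-based assembly combined with Eigen's SparseLU (assuming Vector is Eigen::VectorXd; banded_matrix.cpp is authoritative):

#include <Eigen/Sparse>
#include <vector>

template <class Vector>
void solvelseAEigen(const Vector &a, const Vector &b,
                    const Vector &r, Vector &x) {
    const int n = a.size() + 1; // a holds the n-1 superdiagonal entries
    std::vector<Eigen::Triplet<double>> triplets;
    triplets.reserve(3 * n);
    for (int i = 0; i < n; ++i) {
        triplets.push_back(Eigen::Triplet<double>(i, i, 2.0));          // diagonal
        if (i < n - 1) triplets.push_back(Eigen::Triplet<double>(i, i + 1, a(i))); // superdiag.
        if (i < n - 2) triplets.push_back(Eigen::Triplet<double>(i + 2, i, b(i))); // 2nd subdiag.
    }
    Eigen::SparseMatrix<double> A(n, n);
    A.setFromTriplets(triplets.begin(), triplets.end());
    Eigen::SparseLU<Eigen::SparseMatrix<double>> solver;
    solver.compute(A);
    x = solver.solve(r);
}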
function X = solvepermb(A,b)
[n,m] = size(A);
if ((n ~= numel(b)) || (m ~= numel(b))), error('Size mismatch'); end
X = [];
for l=1:n
    X = [X,A\b];
    b = [b(end);b(1:end-1)];
end
(4b) Port the MATLAB function solvepermb to C++ using Eigen. (This means
that the C++ code should perform exactly the same computations in exactly the same
order.)
Solution: See file solvepermb.cpp.
Issue date: 24.09.2015
Hand-in: 01.10.2015 (in the boxes in front of HG G 53/54).
Prof. R. Hiptmair AS 2015
ETH Zürich
G. Alberti,
D-MATH
F. Leonardi Numerical Methods for CSE
Problem Sheet 3
(1b) What is the asymptotic complexity of the loop body of the function rankoneinvit?
More precisely, you should look at the asymptotic complexity of the code in lines 8-12
of Listing 25.
Solution: The total asymptotic complexity is dominated by the solution of the linear sys-
tem with matrix M done in line 10, which has asymptotic complexity O(n^3).
(1c) Write an efficient implementation in Eigen of the loop body, possibly with
optimal asymptotic complexity. Validate it by comparing the result with the other imple-
mentation in Eigen.
HINT: Take the clue from [1, Code 1.6.114].
Solution: See file rankoneinvit.cpp.
(1d) What is the asymptotic complexity of the new version of the loop body?
Solution: The loop body of the C++ function rankoneinvit_fast consists only of
vector-vector operations, and so the asymptotic complexity is O(n).
(1e) Tabulate the runtimes of the two inner loops of the C++ implementations with
different vector sizes n = 2^k, k = 1, 2, 3, ..., 9. Use, as test vector,
Eigen::VectorXd::LinSpaced(n,1,2)
How can you read off the asymptotic complexity from these data?
HINT: Whenever you provide a figure from runtime measurements, you have to specify the
operating system and the compiler (options) used.
Solution: See file rankoneinvit.cpp.
function y = sinh_unstable(x)
t = exp(x);
y = (t - 1/t)/2;
end
(2a) Explain why the function given in Listing 26 may not give a good approximation
of the hyperbolic sine for small values of x, and compute the relative error
    |sinh_unstable(x) - sinh(x)| / |sinh(x)|
with MATLAB for x = 10^{-k}, k = 1, 2, ..., 10, using as exact value the result of the MATLAB
built-in function sinh.
Solution: As x → 0, the terms t and 1/t become close to each other, thereby creating
cancellation errors in y. For x = 10^{-3}, the relative error computed with MATLAB is
6.2 · 10^{-14}.
(2b) Write the Taylor expansion of length m around x = 0 of the function e^x and
also specify the remainder.
Solution: Given m ∈ N and x ∈ R, there exists ξ ∈ [0, x] such that
    e^x = Σ_{k=0}^{m} x^k/k! + e^ξ x^{m+1}/(m+1)!.    (8)
(2c) Prove that for every x ≥ 0 the following inequality holds true:
    sinh x ≥ x.    (9)
(2d) Based on the Taylor expansion, find an approximation of sinh(x), for 0 ≤ x ≤
10^{-3}, so that the relative approximation error is smaller than 10^{-15}.
Solution: The idea is to use the Taylor expansion given in (8). Inserting this identity into
the definition of the hyperbolic sine yields
    sinh(x) = (e^x - e^{-x})/2
            = (1/2) Σ_{k=0}^{m} (1 - (-1)^k) x^k/k! + (e^{ξ_1} + e^{ξ_2}) x^{m+1} / (2(m+1)!),
with ξ_1 ∈ [0, x], ξ_2 ∈ [-x, 0] (the remainders of the expansions of e^x and e^{-x}; note m + 1
will be odd). The parameter m gives the precision of the approximation, since
x^{m+1}/(m+1)! → 0 as m → ∞. We will choose it later to obtain the desired tolerance.
Since 1 - (-1)^k = 0 if k is even, we set m = 2n for some n ∈ N to be chosen later. From
the above expression we obtain the new approximation given by
    y_n = (1/2) Σ_{k=0}^{m} (1 - (-1)^k) x^k/k! = Σ_{j=0}^{n-1} x^{2j+1}/(2j+1)!,
with remainder
    sinh(x) - y_n = (e^{ξ_1} + e^{ξ_2}) x^{2n+1} / (2(2n+1)!).
Therefore, by (9) and using the obvious inequalities e^{ξ_1} ≤ e^x and e^{ξ_2} ≤ e^x, the relative
error can be bounded by
    |y_n - sinh(x)| / sinh(x) ≤ e^x x^{2n} / (2n+1)!.
Calculating the right hand side with MATLAB for n = 1, 2, 3 and x = 10^{-3} we obtain
1.7 · 10^{-7}, 8.3 · 10^{-15} and 2.0 · 10^{-22}, respectively.
In conclusion, y_3 gives a relative error below 10^{-15}, as required.
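A quick numerical confirmation (a sketch; values as in the text):

#include <cmath>
#include <cstdio>

int main() {
    double x = 1e-3;
    // y_3: the first three odd Taylor terms
    double y3 = x + std::pow(x,3)/6.0 + std::pow(x,5)/120.0;
    double unstable = (std::exp(x) - std::exp(-x)) / 2.0;
    std::printf("rel. err. y3:       %e\n", std::fabs(y3 - std::sinh(x)) / std::sinh(x));
    std::printf("rel. err. unstable: %e\n", std::fabs(unstable - std::sinh(x)) / std::sinh(x));
}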
(3a) In [1, Section 1.7.6] you saw how a matrix can be stored in triplet (or coor-
dinate) list format. This format stores a collection of triplets (i, j, v) with i, j ∈ N, i, j ≥ 0
(the indices) and v ∈ K (the value at (i, j)). Repetitions of (i, j) are allowed, meaning that
the values at the same indices i, j must be summed together in the final matrix.
(You may have a look at http://www.cplusplus.com/doc/tutorial/classes/.)
Define a suitable structure:
that stores a matrix of type scalar in COO format. You can store sizes and indices in
the std::size_t predefined type.
HINT: Store the numbers of rows and columns of the matrix inside the structure.
HINT: You can use a std::vector<your_type> to store the collection of triplets.
HINT: You can define an auxiliary structure Triplet containing the values i, j, v (with
the appropriate types), but you may also use the type Eigen::Triplet<double>.
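A minimal sketch of one possible layout (names and types are an illustration, not prescribed by the problem):

#include <cstddef>
#include <vector>
#include <Eigen/Sparse>

// COO/triplet storage: dimensions plus a list of (i, j, value) entries;
// repeated (i, j) pairs are summed when the matrix is assembled.
template <class scalar>
struct TripletMatrix {
    std::size_t rows, cols;
    std::vector<Eigen::Triplet<scalar>> triplets;
};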
(3b) Another format for storing a sparse matrix is the compressed row storage (CRS)
format (have a look at [1, Ex. 1.7.9]).
Remark. Here, we are not particularly strict about the compressed attribute, meaning that
you can store your data in std::vector. This may waste some memory, because the
std::vector container adds padding at the end of its data that allows for push_back
with amortized O(1) complexity.
Devise a suitable structure:
TripletMatrix<scalar>::densify();
Eigen::Matrix<scalar, Eigen::Dynamic, Eigen::Dynamic> CRSMatrix<scalar>::densify();
for the structures TripletMatrix and CRSMatrix that convert your matrix struc-
tures to Eigen dense matrix types. This can be helpful in debugging your code.
that converts a matrix T in COO format to a matrix C in CRS format. Try to be as efficient
as possible.
HINT: The parts of the column indices vector in CRS format that correspond to indi-
vidual rows of the matrix must be ordered and without repetitions, whilst the triplets in the
input may be in arbitrary order and with repetitions. Take care of those aspects in your
function definition.
HINT: If you use a std::vector container, have a look at the function std::sort or
at the functions std::lower_bound and std::insert (both lead to a valid function
with different complexities). Look up their precise definition and specification in a C++11
reference.
HINT: You may want to sort a vector containing a structure with multiple values using a
particular ordering (i.e. define a custom ordering on your structure and sort according to
this ordering). In C++, the standard function std::sort provides a way to sort a vector
of type std::vector<your_type> by defining a your_type member operator:
that returns true if *this is less than other in your particular ordering. Sorting is then
performed according to this ordering.
(3e) What is the worst case complexity of your function (in the number of triplets)?
HINT: Make appropriate assumptions.
HINT: If you use the C++ standard library functions, look at the documentation: there
you can find the complexity of basic container operations.
Solution: In tripletToCTS.cpp, we present two alternatives: pushing back the
column/value pairs to an unsorted vector, and inserting into a sorted vector at the right
position. Let k be the number of triplets. The first method performs k push_back
operations (amortized O(1) each) and a sort of a vector of at most k entries (i.e. O(k log k)
complexity). The second method inserts k triplets, each insertion being an O(k) operation,
for a complexity of O(k^2). However, if there are many repeated triplets with the same
index, or if you are adding triplets to an already defined matrix, the second method could
prove to be less expensive than the first one.
(4a) Learn about MATLAB's way of describing triangulations by two vectors and one
so-called triangle-node incidence matrix from [1, Ex. 1.7.21] or the documentation of the
MATLAB function triplot.
(4b) (This problem is inspired by the dreary reality of software development, where
one is regularly confronted with undocumented code written by somebody else who is no
longer around.)
Listing 27 lists an uncommented MATLAB code, which takes the triangle-node incidence
matrix of a planar triangulation as input.
Describe in detail what is the purpose of the function processmesh defined in the file
and how exactly this is achieved. Comment the code accordingly.
Listing 27: An undocumented MATLAB function extracting some information from a tri-
angulation given in MATLAB format
function [E,Eb] = processmesh(T)
N = max(max(T)); M = size(T,1);
T = sort(T')';
Listing 28: Another undocumented function for extracting specific information from a
planar triangulation
function ET = getinfo(T,E)
% Another creative use of 'sparse'
L = size(E,1); A = sparse(E(:,1),E(:,2),(1:L)',L,L);
ET = [];
for tri=T'
    Eloc = full(A(tri,tri)); Eloc = Eloc + Eloc';
    ET = [ET; Eloc([8 7 4])];
end
(4d) In [1, Def. 1.7.33] you saw the definition of a regular refinement of a triangular
mesh. Write a MATLAB function:
function [x_ref, y_ref, T_ref] = refinemesh(x,y,T)
that takes as argument the data of a triangulation in M ATLAB format and returns the cor-
responding data for the new, refined mesh.
Solution: See refinemesh.m.
(4e) [1, Eq. (1.7.29)] and [1, Eq. (1.7.30)] describe the sparse linear system of equa-
tions satisfied by the coordinates of the interior nodes of a smoothed triangulation. Justify
rigorously why the linear system of equations [1, Eq. (1.7.32)] always has a unique solu-
tion. In other words, show that the part A_int of the matrix of the combinatorial graph
Laplacian associated with the interior nodes is invertible for any planar triangulation.
HINT: Notice that A_int is diagonally dominant (→ [1, Def. 1.8.8]).
This observation paves the way for using the same arguments as for sub-problem (3b) of
Problem 3. You may also appeal to [1, Lemma 1.8.12].
Solution: Consider ker A_int. Notice A_int is (non-strictly, i.e. weakly) diagonally dom-
inant. However, since there is at least one boundary node, the solution vector cannot be
constant (the boundary node is connected to at least one interior node, for which the corre-
sponding row in the matrix does not sum to 0). Hence, the matrix is invertible.
that performs this transformation to the mesh defined by x, y, T . Return the column vectors
xs, ys with the new position of the nodes.
HINT: Use the system of equations in [1, (1.7.32)].
Solution: See smoothmesh.m.
a low-rank update according to [1, Eq. (1.6.108)], provided that the setup phase of an
elimination (→ [1, § 1.6.42]) solver has already been done for the system matrix.
In this problem, we examine the concrete application from [1, Ex. 1.6.115], where the
update formula is key to efficient implementation. This application is the computation of
the impedance of the circuit drawn in Figure 7 as a function of the variable resistance of a
single circuit element.
[Figure 7: resistor network with nodes 1-17; every edge carries resistance R except the variable resistor R_x connecting nodes 14 and 15; the voltage source U feeds node 16, node 17 is grounded.]
(5a) Study [1, Ex. 1.6.3] that explains how to compute voltages and currents in a
linear circuit by means of nodal analysis. Understand how this leads to a linear system
of equations for the unknown nodal potentials. The fundamental laws of circuit analysis
should be known from physics as well as the principles of nodal analysis.
(5b) Use nodal analysis to derive the linear system of equations satisfied by the nodal
potentials of the circuit from Figure 7. The voltage W is applied to node #16 and node
#17 is grounded. All resistors except for the controlled one (colored magenta) have the
same resistance R. Use the numbering of nodes indicated in Figure 7.
HINT: Optionally, you can make the computer work for you and find a fast way to build
the matrix providing only the essential data. This is less tedious, less error prone and more
flexible than specifying each entry individually. For this you can use auxiliary data struc-
tures.
Solution: We use Kirchhoff's first law (as in [1, Ex. 1.6.3]), stating that the sum of the
currents incident to a node is zero. Let W ∈ R^17 be the vector of voltages. Set f(R_x) =
R/R_x. We rescale each sum by multiplying by R. Let us denote W_{i,j} = W_i - W_j. The
system for each node i = 1, ..., 15 becomes:
    W_{1,2} + W_{1,5}                                   = 0
    W_{2,1} + W_{2,3} + W_{2,5} + W_{2,14}              = 0
    W_{3,2} + W_{3,4} + W_{3,15}                        = 0
    W_{4,3} + W_{4,6} + W_{4,15}                        = 0
    W_{5,1} + W_{5,2} + W_{5,7} + W_{5,14}              = 0
    W_{6,4} + W_{6,9} + W_{6,15} + W_{6,16}             = 0
    W_{7,5} + W_{7,10} + W_{7,11} + W_{7,17}            = 0
    W_{8,9} + W_{8,12} + W_{8,13} + W_{8,15}            = 0    (10)
    W_{9,6} + W_{9,8} + W_{9,13}                        = 0
    W_{10,7} + W_{10,11}                                = 0
    W_{11,7} + W_{11,10} + W_{11,12} + W_{11,17}        = 0
    W_{12,8} + W_{12,11} + W_{12,13}                    = 0
    W_{13,8} + W_{13,9} + W_{13,12}                     = 0
    W_{14,2} + W_{14,5} + W_{14,17} + f(R_x) W_{14,15}  = 0
    W_{15,3} + W_{15,4} + W_{15,6} + W_{15,8} + f(R_x) W_{15,14} = 0
with the extra condition W_16 = W, W_17 = 0. We now have to obtain the system matrix.
The system is rewritten in the following matrix notation (with C ∈ R^{15,17}):
with W~ = [W^T, W_16, W_17]^T, and with:

A(R_x) =
[  2 -1  0  0 -1  0  0  0  0  0  0  0  0     0         0      ]
[ -1  4 -1  0 -1  0  0  0  0  0  0  0  0    -1         0      ]
[  0 -1  3 -1  0  0  0  0  0  0  0  0  0     0        -1      ]
[  0  0 -1  3  0 -1  0  0  0  0  0  0  0     0        -1      ]
[ -1 -1  0  0  4  0 -1  0  0  0  0  0  0    -1         0      ]
[  0  0  0 -1  0  4  0  0 -1  0  0  0  0     0        -1      ]
[  0  0  0  0 -1  0  4  0  0 -1 -1  0  0     0         0      ]
[  0  0  0  0  0  0  0  4 -1  0  0 -1 -1     0        -1      ]
[  0  0  0  0  0 -1  0 -1  3  0  0  0 -1     0         0      ]
[  0  0  0  0  0  0 -1  0  0  2 -1  0  0     0         0      ]
[  0  0  0  0  0  0 -1  0  0 -1  4 -1  0     0         0      ]
[  0  0  0  0  0  0  0 -1  0  0 -1  3 -1     0         0      ]
[  0  0  0  0  0  0  0 -1 -1  0  0 -1  3     0         0      ]
[  0 -1  0  0 -1  0  0  0  0  0  0  0  0  3+R/R_x   -R/R_x    ]
[  0  0 -1 -1  0 -1  0 -1  0  0  0  0  0   -R/R_x   4+R/R_x   ]
(5c) Characterize the change in the circuit matrix derived in sub-problem (5b) in-
duced by a change in the value of R_x as a low-rank modification of the circuit matrix. Use
as base state R_x = R.
HINT: Four entries of the circuit matrix will change. This amounts to a rank-2 modification
in the sense of [1, Eq. (1.6.108)] with suitable matrices u and v.
Solution: The matrices
    U = V ∈ R^{15,2},  both columns equal to (0, ..., 0, 1, -1)^T
(non-zero only in rows 14 and 15) are such that, with A(R) denoting A(R_x) for R_x = R,
one can write:
    A(R_x) = A(R) - U V^T (1 - R/R_x) / 2.
Therefore, if we already have a factorization of A_0 = A(R), we can use the SMW formula
for the cheap solution of linear systems with A(R_x).
class ImpedanceMap {
public:
    ImpedanceMap(double R_, double W_) : R(R_), W(W_) {
        // TODO: build A0 = A(R), the rhs, and factorize A0 with lu = A0.lu()
    };
    double operator() (double Rx) const {
        // TODO: compute the perturbation matrix U and solve
        // (A0 - U*U^T*(1 - R/Rx)/2) x = rhs; from x, U and R compute the impedance
    };
private:
    Eigen::PartialPivLU<Eigen::MatrixXd> lu;
    Eigen::VectorXd rhs;
    double R, W;
};
whose ()-operator returns the impedance of the circuit from Figure 7 when supplied with
a concrete value for R_x. Of course, this function should be implemented efficiently using
[1, Lemma 1.6.113]. The setup phase of Gaussian elimination should be carried out in the
constructor performing the LU-factorization of the circuit matrix.
Test your class using R = 1, W = 1 and R_x = 1, 2, 4, ..., 1024.
HINT: See the file impedancemap.cpp.
HINT: The impedance of the circuit is the quotient of the voltage at the input node #16
and the current through the voltage source.
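A sketch of the ()-operator based on the SMW formula (assumptions: lu holds the factorization of A_0 = A(R) and rhs the right hand side, both prepared in the constructor; nodes 14 and 15 correspond to the 0-based indices 13 and 14; since the source node #16 connects only to node #6, the source current is (W - x_6)/R; the provided impedancemap.cpp is authoritative):

double operator() (double Rx) const {
    double beta = (1. - R/Rx) / 2.;        // A(Rx) = A0 - beta*U*U^T
    Eigen::MatrixXd U = Eigen::MatrixXd::Zero(15, 2);
    U(13,0) = U(13,1) = 1.;
    U(14,0) = U(14,1) = -1.;
    Eigen::VectorXd z = lu.solve(rhs);     // A0^{-1} * b
    Eigen::MatrixXd Z = lu.solve(U);       // A0^{-1} * U
    // SMW: (A0 - beta*U*U^T)^{-1} b = z + beta*Z*(I - beta*U^T*Z)^{-1}*U^T*z
    Eigen::Matrix2d S = Eigen::Matrix2d::Identity() - beta * U.transpose() * Z;
    Eigen::VectorXd x = z + beta * Z * S.lu().solve(U.transpose() * z);
    return W * R / (W - x(5));             // impedance = W / source current
}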
Prof. R. Hiptmair AS 2015
ETH Zürich
G. Alberti,
D-MATH
F. Leonardi Numerical Methods for CSE
Problem Sheet 4
(1a) Guess the maximal order of convergence of the method from a numerical exper-
iment conducted in MATLAB.
for k=2:20, e(k+1) = e(k)*sqrt(e(k-1)); end
le = log(e); diff(le(2:end))./diff(le(1:end-1)),
(1b) Find the maximal guaranteed order of convergence of this method through ana-
lytical considerations.
HINT: First of all note that we may assume equality in both the error recursion (11) and the
bound e_{n+1} ≤ C·e_n^p that defines convergence of order p > 1, because in both cases
equality corresponds to a worst case scenario. Then plug the two equations into each other
and obtain an equation of the type ... = 1, where the left hand side involves an error norm
that can become arbitrarily small. This implies a condition on p and allows to determine
C > 0. A formal proof by induction (not required) can finally establish that these values
provide a correct choice.
Solution: Suppose e_n = C·e_{n-1}^p (p is the largest convergence order and C is some
constant). Then
    e_{n+1} = C·e_n^p = C^{1+p}·e_{n-1}^{p^2}.
In (11) we may assume equality, because this is the worst case. Thus,
    e_{n+1} = e_n·√(e_{n-1}) = C·e_{n-1}^{p+1/2},
i.e.
    C^p · e_{n-1}^{p^2 - p - 1/2} = 1.    (14)
Since e_{n-1} can become arbitrarily small, (14) forces p^2 - p - 1/2 = 0 and C^p = 1,
hence C = 1 and p = (1 + √3)/2 ≈ 1.37 (the positive root).
Problem 2 Convergent Newton iteration (core problem)
As explained in [1, Section 2.3.2.1], the convergence of Newton's method in 1D may only
be local. This problem investigates a particular setting, in which global convergence can
be expected.
We recall the notion of a convex function and its geometric definition. A differentiable
function f : [a, b] → R is convex if and only if its graph lies on or above its tangent at
any point. Equivalently, a differentiable function f : [a, b] → R is convex if and only if its
derivative is non-decreasing.
Give a graphical proof of the following statement:
If F(x) belongs to C^2(R), is strictly increasing, is convex, and has a unique zero, then the
Newton iteration [1, (2.3.4)] for F(x) = 0 is well defined and will converge to the zero of
F(x) for any initial guess x^(0) ∈ R.
Solution: The sketches in Figure 8 discuss the different cases.
(3a) Write a MATLAB script that computes the order of convergence to the point x*
of this iteration for the function f(x) = x·e^x - 1 (see [1, Exp. 2.2.3]). Use x^(0) = 1.
Solution:
x0 = 1;
x_star = fzero(f,x0);

x = x0; upd = 1;
while (abs(upd) > eps)
    fx = f(x(end)); % only 2 evaluations of f at each step
    if fx ~= 0;
        upd = fx^2 / (f(x(end)+fx)-fx);
        x = [x, x(end)-upd];
    else upd = 0;
    end
end
residual = f(x);
err = abs(x-x_star);
log_err = log(err);
ratios = (log_err(3:end)-log_err(2:end-1))...
    ./(log_err(2:end-1)-log_err(1:end-2));
The output is

        x                   error e_n          (log e_{n+1} - log e_n)/(log e_n - log e_{n-1})
1.000000000000000 0.432856709590216
0.923262600967822 0.356119310558038
0.830705934728425 0.263562644318641 1.542345498206531
0.727518499997190 0.160375209587406 1.650553641703975
0.633710518522047 0.066567228112263 1.770024323911885
0.579846053882820 0.012702763473036 1.883754995643305
0.567633791946526 0.000490501536742 1.964598248590593
0.567144031581974 0.000000741172191 1.995899954235929
0.567143290411477 0.000000000001693 1.999927865685712
0.567143290409784 0.000000000000000 0.741551601040667
x
(3b) The function g(x) contains a term like exe , thus it grows very fast in x and the
method can not be started for a large x(0) . How can you modify the function f (keeping
the same zero) in order to allow the choice of a larger initial guess?
HINT: If f is a function and h : [a, b] → R with h(x) ≠ 0 for all x ∈ [a, b], then (f·h)(x) = 0
⟺ f(x) = 0.
Solution: The choice f~(x) = e^{-x} f(x) = x - e^{-x} prevents the blow up of the function g
and allows the use of a larger set of positive initial points. Of course, f~(x) = 0 exactly when
f(x) = 0.
(4a) Find an equation satisfied by the smallest positive initial guess x^(0) for which
Newton's method does not converge when it is applied to F(x) = arctan x.
HINT: Find out when the Newton method oscillates between two values.
HINT: Graphical considerations may help you to find the solutions. See Figure 9: you
should find an expression for the function g.
Solution: The function arctan(x) is positive, increasing and concave for positive x;
therefore the first iterates of Newton's method with initial points 0 < x^(0) < y^(0) sat-
isfy y^(1) < x^(1) < 0 (draw a sketch to see it). The function is odd, i.e., arctan(-x) =
-arctan(x) for every x ∈ R, therefore the analogous holds for negative initial values
(y^(0) < x^(0) < 0 gives 0 < x^(1) < y^(1)). Moreover, opposite initial values give opposite
iterates: if y^(0) = -x^(0) then y^(n) = -x^(n) for every n ∈ N.
All these facts imply that, if |x^(1)| < |x^(0)|, then the absolute values of the following iter-
ates will converge monotonically to zero. Vice versa, if |x^(1)| > |x^(0)|, then the absolute
values of the Newton iterates will diverge monotonically. Moreover, the iterates
change sign at each step, i.e., x^(n)·x^(n+1) < 0.
It follows that the smallest positive initial guess x^(0) for which Newton's method does not
converge satisfies x^(1) = -x^(0). This can be written as
    x^(1) = x^(0) - F(x^(0))/F'(x^(0)) = x^(0) - (1 + (x^(0))^2) arctan(x^(0)) = -x^(0).
Therefore, x^(0) is a zero of the function
    g(x) = 2x - (1 + x^2) arctan(x),  with  g'(x) = 1 - 2x·arctan(x).
(4b) Use Newton's method to find an approximation of such an x^(0), and implement it
in MATLAB.
Solution: Newton's iteration to find the smallest positive initial guess reads
    x^(n+1) = x^(n) - g(x^(n))/g'(x^(n))
            = (-x^(n) + (1 - (x^(n))^2) arctan(x^(n))) / (1 - 2x^(n) arctan(x^(n))).
The implementation in MATLAB is given in Listing 31 (see also Figure 9).
figure;
x1 = x0-atan(x0)*(1+x0^2); x2 = x1-atan(x1)*(1+x1^2);
X=[-2:0.01:2];
plot(X, atan(X),'k',...
     X, 2*(X)-(1+(X).^2).*atan((X)),'r--',...
     [x0, x1, x1, x2, x2], [atan(x0), 0, atan(x1), 0, atan(x2)],...
     [x0,x1],[0,0],'ro',[-2,2], [0,0],'k','linewidth',2);
legend('arctan', 'g', 'Newton critical iteration'); axis equal;
print -depsc2 'ex_NewtonArctan.eps'
In other words, ε_0 tells us which distance of the initial guess from x* still guarantees local
convergence.
Solution:
    lim_{k→∞} x^(k) = x*  ⟺  lim_{k→∞} ‖x^(k) - x*‖ = 0.
Thus we seek an upper bound B(k) for ‖x^(k) - x*‖ and claim that lim_{k→∞} B(k) = 0:

    ‖x^(k) - x*‖ ≤ C ‖x^(k-1) - x*‖^p
                 ≤ C · C^p ‖x^(k-2) - x*‖^{p^2}
                 ≤ C · C^p · C^{p^2} ‖x^(k-3) - x*‖^{p^3}
                 ≤ ...
                 ≤ C^{Σ_{i=0}^{k-1} p^i} ‖x^(0) - x*‖^{p^k}
                 = C^{(p^k - 1)/(p - 1)} ‖x^(0) - x*‖^{p^k}      (geom. series)
                 ≤ C^{-1/(p-1)} (C^{1/(p-1)} ε_0)^{p^k} =: B(k),
where the factor C^{-1/(p-1)} is a constant. Hence
    lim_{k→∞} B(k) = 0  ⟺  C^{1/(p-1)} ε_0 < 1,
i.e.
    0 < ε_0 < C^{1/(1-p)}.
(5b) Provided that ‖x^(0) - x*‖ < ε_0 is satisfied, determine the minimal k_min =
k_min(ε_0, C, p, τ) such that
    ‖x^(k) - x*‖ < τ.
Solution: Using the previous upper bound and the condition ‖x^(0) - x*‖ < ε_0, we obtain:
    ‖x^(k) - x*‖ ≤ C^{1/(1-p)} (C^{1/(p-1)} ε_0)^{p^k} < τ.
Solving for the minimal k (and calling the solution k_min), with the additional requirement
that k ∈ N, we obtain:
    (1/(1-p)) ln(C) + p^k ln(C^{1/(p-1)} ε_0) < ln(τ),
where ln(C^{1/(p-1)} ε_0) < 0, hence
    k > (1/ln(p)) · ln( (ln(τ) + (1/(p-1)) ln(C)) / ln(C^{1/(p-1)} ε_0) ),
    k_min = ⌈ (1/ln(p)) · ln( (ln(τ) + (1/(p-1)) ln(C)) / ln(C^{1/(p-1)} ε_0) ) ⌉.
and plot kmin = kmin (0 , ) for the values p = 1.5, C = 2. Test you implementation for
1
every (0 , ) linspace(0, C 1p )2 (0, 1)2 {(i, j) i j}
H INT: Use a M ATLAB pcolor plot and the commands linspace and meshgrid.
Solution: See k_min_plot.m.
8
3 eps_max = C^(1/(1-p));
4
22 % Plotting
23 p c o l o r (eps_msh,tau_msh,k)
24 c o l o r b a r ()
25 t i t l e ('Minimal number of iterations for error < \tau')
26 x l a b e l ('\epsilon_0')
27 y l a b e l ('\tau')
28 xlim([0,eps_max])
29 ylim([0,eps_max])
30 s h a d i n g flat
9
(6a) What is the purpose of the following MATLAB code?
1 f u n c t i o n y = myfn(x)
2 l o g 2 = 0.693147180559945;
3
4 y = 0;
5 w h i l e (x > s q r t (2)), x = x/2; y = y + l o g 2 ; end
6 w h i l e (x < 1/ s q r t (2)), x = x*2; y = y - l o g 2 ; end
7 z = x-1;
8 dz = x* exp(-z)-1;
9 w h i l e (abs(dz/z) > e p s )
10 z = z+dz;
11 dz = x* exp(-z)-1;
12 end
13 y = y+z+dz;
Solution: The MATLAB code computes y = log (x), for a given x. The program can be
regarded as Newton iterations for finding the zero of
f (z) = ez x (15)
(6b) Explain the rationale behind the two while loops in lines #5, 6.
Solution: The purpose of the two while loops is to shift the function values of (15) and
modify the initial z0 = x 1 in such a way that good convergence is reached (according to
the function derivative).
10
(6e) Replace the while-loop of lines #9 through #12 with a fixed number of itera-
tions that, nevertheless, guarantee that the result has a relative accuracy eps.
Solution: Denote the zero of f (z) with z , and e(n) = z (n) z . Use Taylor expansion of
f (z), f (z):
1
f (z (n) ) = ez e(n) + ez (e(n) )2 + O((e(n) )3 )
2
1
= xe(n) + x(e(n) )2 + O((e(n) )3 ),
2
z z (n)
f (z ) = e + e e + O((e(n) )2 )
(n)
= x + xe(n) + O((e(n) )2 )
(n)
Considering Newtons Iteration z (n+1) = z (n) ff(z )
(z (n) )
,
f (z (n)
e(n+1) = e(n)
f (z (n)
xe(n) + 12 x(e(n) )2 + O((e(n) )3 )
= e(n)
x + xe(n) + O((e(n) )2 )
1 (n) 2
(e )
2
=
1 n n+1
( )1+2++2 (e0 )2
2
1 n+1 n+1
= 2 ( )2 (e0 )2
2
1 n+1
= 2 ( e 0 )2 ,
2
where e0 = z 0 z = x 1 log (x). So it is enough for us to determine the number n of
(n+1)
iteration steps by elog x = eps. Thus
log (log (2) log (log (x)) log (eps)) log (log (2) log (e0 ))
n = 1
log (2)
log ( log (eps)) log ( log (e0 ))
1
log (2)
The following code is for your reference.
11
function y = myfn(x)
log2 = 0.693147180559945;
y = 0;
w h i l e (x > sqrt(2)), x = x/2; y = y + log2; end
w h i l e (x < 1/sqrt(2)), x = x*2; y = y - log2; end
z = x-1;
dz = x*exp(-z)-1;
e0=z-log(x);
k=(log(-log(eps))-log(-log(abs(e0))))/log(2);
f o r i=1:k
z = z+dz;
dz = x*exp(-z)-1;
end
y = y+z+dz;
12
If the starting value is chosen to be less than the zero
point, then xk > x for any k 1, and then f(xk ) > 0.
x0
x1
f(xk )
xk+1 = xk f (xk ) < xk , for any k 1,
xk+1 xk xk1
gk (xk )
13
1.5 arctan
g
Newton critical iteration
0.5
0.5
1.5
2 1.5 1 0.5 0 0.5 1 1.5 2
14
0.2
12
10
0.15
0.1
6
4
0.05
0 0
0 0.05 0.1 0.15 0.2 0.25
0
14
Prof. R. Hiptmair AS 2015
ETH Zrich
G. Alberti,
D-MATH
F. Leonardi Numerical Methods for CSE
Problem Sheet 5
(1a) Show that the iteration (16) is consistent with F(x) = 0 in the sense of [1,
Def. 2.2.1], that is, show that x(k) = x(0) for every k N, if and only if F(x(0) ) = 0 and
DF(x(0) ) is regular.
Solution: If F(x(k) ) = 0 then y(k) = x(k) + 0 = x(k) and x(k+1) = y(k) 0 = x(k) .
So, by induction, if F(x(0) ) = 0 then x(k+1) = x(k) = x(0) for every k.
Conversely, if x(k+1) = x(k) , then, by the recursion of the Newton method:
1
(1b) Implement a C++ function
that computes a step of the method (16) for a scalar function F, that is, for the case n = 1.
Here, f is a lambda function for the function F R R and df a lambda to its derivative
F R R.
H INT: Your lambda functions will likely be:
http://en.cppreference.com/w/cpp/language/lambda
https://msdn.microsoft.com/en-us/library/dd293608.aspx
uses the function mod_newt_step from subtask (1b) in order to apply (16) to the
following scalar equation
arctan(x) 0.123 = 0 ;
determines empirically the order of convergence, in the sense of [1, Rem. 2.1.19] of
the course slides;
2
implements meaningful stopping criteria ([1, Section 2.1.2]).
3
system of equations and in this problem we will supplement the theoretical considerations
from class by implementation in E IGEN. We will also learn about a simple fixed point
iteration for that system, see [1, Section 2.2]. Refresh yourself about the relevant parts
of the lecture. You should also try to recall the Sherman-Morrison-Woodbury formula [1,
Lemma 1.6.113].
Consider the nonlinear (quasi-linear) system:
A(x)x = b ,
as in [1, Ex. 2.4.19]. Here, A Rn Rn,n is a matrix-valued function:
(x)
1
1
(x) 1
A(x) = , (x) = 3 + x2
1 (x) 1
1 (x)
where 2 is the Euclidean norm.
(2a) A fixed point iteration fro (Problem 2) can be obtained by the frozen argument
technique; in a step we take the argument to the matrix valued function from the previous
step and just solve a linear system for the next iterate. State the defining recursion and
iteration function for the resulting fixed point iteration.
(2b) We consider the fixed point iteration derived in sub-problem (2a). Implement a
function computing the iterate x(k+1) from x(k) in E IGEN.
H INT: (Optional) This is classical example where lambda C++11 functions may become
handy.
Write the iteration function as:
where func type will be that of a lambda function implementing A. The vector b will be
an input random r.h.s. vector. The vector x will be the input x(k) and x_new the output
x(k+1) .
Then define a lambda function:
4
1 auto A = [ /* TODO */ ] (const Eigen::VectorXd & x) ->
Eigen::SparseMatrix<double> & { /* TODO */ };
returning A(x) for an input x (capture the appropriate variables). You can then call your
stepping function with:
1 fixed_point_step(A, b, x, x_new);
(2c) Write a routine that finds the solution x with the fixed point method applied to
the previous quasi-linear system. Use x(0) = b as initial guess. Supply it with a suitable
correction based stopping criterion as discussed in [1, Section 2.1.2] and pass absolute and
relative tolerance as arguments.
Solution: See quasilin.cpp .
(2d) Let b Rn be given. Write the recursion formula for the solution of
A(x)x = b
(2e) The matrix A(x), being symmetric and tri-diagonal, is cheap to invert. Rewrite
the previous iteration efficiently, exploiting, the Sherman-Morrison-Woodbury inversion
formula for rank-one modifications [1, Lemma 1.6.113].
5
Solution: We replace the inversion with the SMW formula:
1
(k+1) (k) (k) x(k) (x(k) )
x =x (A(x )+ ) (A(x(k) )x(k) b)
x 2
(k)
x (x ) (x(k) A(x(k) )1 b)
(k) (k)
= A(x(k) )1 (b + (k) )
x 2 + (x(k) ) A(x(k) )1 x(k)
(2g) Repeat subproblem (2c) for the Newton method. As initial guess use x(0) = b.
Solution: See quasilin.cpp.
6
Figure 11: non-linear circuit for Problem 1
In this problem we deal with a very simple non-linear circuit element, a diode. The current
through a diode as a function of the applied voltage can be modelled by the relationship
Uk Uj
Ikj = (e UT
1),
with suitable parameters , and the thermal voltage UT .
Now we consider the circuit depicted in Fig. 11 and assume that all resistors have resis-
tance R = 1.
(3a) Carry out the nodal analysis of the electric circuit and derive the corresponding
non-linear system of equations F(u) = 0 for the voltages in nodes 1,2, and 3, cf. [1,
Eq. (2.0.2)]. Note that the voltages in nodes 4 and 5 are known (input voltage and ground
voltage 0).
Solution: We consider Kirchhoffs law Ikj = 0 for every node k. Ikj contributes to the
k,j
sum if node j is connected to node k trough one element (R,C,L,D).
(1) 3U1 0 U2 U3 = 0
(2) 3U2 U1 U3 U = 0
( UU3 TU )
(3) 3U3 0 U1 U2 + e 1 =0
7
Thus the nonlinear system of equations reads
3U1 U2 U3 0 U1
F(u) =
3U2 U1 U3 U
= 0
for u = U2 .
0 U3
3U3 U1 U2 + e UT U e UT U3
that computes the output voltages Uout (at node 1 in Fig. 11) for a sorted vector of input
voltages Uin (at node 4) for a thermal voltage UT = 0.5. The parameters alpha, beta
pass the (non-dimensional) diode parameters.
Use Newtons method to solve F(u) = 0 with a tolerance of = 106 .
Solution: An iteration of the multivariate Newton method is:
1
u(k+1) = u(k) (DF(u(k) ) F(u(k) )
Fi (u)
The components of the Jacobian DF are defined as DF(u)ij = uj . For the function
obtained in a) we get
3 1 1
DF(u) = J =
1 3 1
U3 U
.
1 1 3 + U e UT
T
(3c) We are interested in the nonlinear effects introduced by the diode. Calcu-
late Uout = Uout (Uin ) as a function of the variable input voltage Uin [0, 20] (for non-
dimensional parameters = 8, = 1 and for a thermal voltage UT = 0.5) and infer the
nonlinear effects from the results.
Solution: The nonlinear effects can be observed by calculating the differences between
the solutions Uout , see file nonlinear_circuit.cpp.
8
Problem 4 Julia set
Julia sets are famous fractal shapes in the complex plane. They are constructed from the
basins of attraction of zeros of complex functions when the Newton method is applied to
find them.
In the space C of complex numbers the equation
z3 = 1 (18)
has three solutions: z1 = 1, z2 = 21 + 21 3i, z3 = 12 21 3i (the cubic roots of unity).
(4a) As you know from the analysis course, the complex plane C can be identified
with R via (x, y) z = x + iy. Using this identification, convert equation (18) into a
2
x3 3xy 2 1 0
Thus, equation (18) is equivalent to F(x, y) = ( ) = ( ).
3x2 y y 3 0
(4b) Formulate the Newton iteration [1, Eq. (2.4.1)] for the non-linear equation
F(x) = 0 with x = (x, y)T and F from the previous sub-problem.
Solution: The iteration of Newtons method for multiple variables reads
1
x(k+1) = x(k) (DF(x(k) )) F(x(k) ),
3x2 3y 2 6xy
where DF is the Jacobian DF(x) = ( ).
6xy 3x2 3y 2
(4c) Denote by x(k) the iterates produced by the Newton method from the previous
sub-problem with some initial vector x(0) R2 . Depending on x(0) , the sequence x(k) will
either diverge or converge to one of the three cubic roots of unity.
Analyze the behavior of the Newton iterates using the following procedure:
use equally spaced points on the domain [2, 2]2 R2 as starting points of the
Newton iterations,
color the starting points differently depending on which of the three roots is the limit
of the sequence x(k) .
9
H INT:: useful M ATLABcommands: pcolor, colormap, shading, caxis. You may
stop the iteration once you are closer in distance to one of the third roots of unity than
104 .
The three (non connected) sets of points whose iterations are converging to the different zi
are called Fatou domains, their boundaries are the Julia sets.
Solution: For each starting point at most N_it iterations are accomplished. For a given
(k) (k)
starting point x(0) , as soon as the condition x1 + ix2 zi < 104 , i {1, 2, 3} is
reached, we assign a color depending on the root zi and on the number of iterations done.
Each attractor is associated to red, green or blue; lighter colors correspond to the points
with faster convergence. The points that are not converging in N_it iterations are white.
We set the color scale using the M ATLABcommands colormap and caxis.
Figure 12: Julia set for z 3 1 = 0 on a mesh containing 1000 1000 points, N_it=20.
10
2 close all;
3 i f nargin <2; N_it = 25; num_grid_points = 200; end;
4
11 % roots of unity:
12 z1 = [ 1; 0];
13 z2 = [-1/2; s q r t (3)/2];
14 z3 = [-1/2; - s q r t (3)/2];
15
16 f o r ny = 1: l e n g t h (yy)
17 f o r nx = 1: l e n g t h (xx)
18 % for each starting point in the grid
19 v = [xx(nx); yy(ny)];
20
21 f o r k=1:N_it;
22 F = [v(1)^3-3*v(1)*v(2)^2-1;
3*v(1)^2*v(2)-v(2)^3];
23 DF = [3*v(1)^2-3*v(2)^2, -6*v(1)*v(2);
24 6*v(1)*v(2) , 3*v(1)^2-3*v(2)^2];
25 v = v - DF\F; % Newton update
26
11
36 end
37 st = 1/N_it; % build a RGB colormap ,
38 % 1s at the beginning to recognize slowly -converging
points (white)
39 mycolormap = [ [1,1-st:-st:0, z e r o s (1,2*N_it)];
40 [1, z e r o s (1,N_it), 1-st:-st:0,
z e r o s (1,N_it)];
41 [1, z e r o s (1,2*N_it), 1-st:-st:0] ]';
42 % mycolormap = 1 - mycolormap; % cmy, pyjamas
version...
43 % mycolormap = [ [1,1-st:-st:0, st:st:1, st:st:1,];
44 % [1,st:st:1, 1-st:-st:0, st:st:1,];
45 % [1,st:st:1, st:st:1, 1-st:-st:0]
]';
46 colormap(mycolormap); % built in: colormap
jet(256);
47 % this is the command that creates the plot:
48 p c o l o r (col);
49 c a x i s ([1,3*N_it+1]); s h a d i n g flat;
a x i s ('square','equal','off');
50 p r i n t -depsc2 'ex_JuliaSet.eps';
12
Prof. R. Hiptmair AS 2015
ETH Zrich
G. Alberti,
D-MATH
F. Leonardi Numerical Methods for CSE
Problem Sheet 6
dp = dipoleval(t,y,x)
that returns the row vector (p (x1 ), . . . , p (xm )), when the argument x passes (x1 , . . . , xm ),
m N small. Here, p denotes the derivative of the polynomial p P n interpolating the
data points (ti , yi ), i = 0, . . . , n, for pairwise different ti R and data values yi R.
H INT: Differentiate the recursion formula [1, Eq. (3.2.30)] and devise an algorithm in the
spirit of the Aitken-Neville algorithm implemented in [1, Code 3.2.31].
Solution: Differentiating the recursion formula [1, (3.2.30)] we obtain
pi (t) yi , i = 0, . . . , n,
pi (t) 0, i = 0, . . . , n,
(t ti0 )pi1 ,...,im (t) (t tim )pi0 ,...,im1 (t)
pi0 ,...,im (t) = ,
tim ti0
pi1 ,...,im (t) + (t ti0 )pi1 ,...,im (t) pi0 ,...,im1 (t) (t tim )pi0 ,...,im1 (t)
pi0 ,...,im (t) = .
tim ti0
The implementation of the above algorithm is given in file dipoleval_test.m.
1
(1c) For validation purposes devise an alternative, less efficient, implementation of
dipoleval (call it dipoleval_alt) based on the following steps:
1 c l a s s LinearInterpolant {
2 public:
3 LinearInterpolant( / * TODO: pass pairs * / ) {
4 // TODO: construct your data from (t_i, y_i)'s
5 }
6
7 d o u b l e o p e r a t o r () ( d o u b l e x) {
8 // TODO: return I(x)
9 }
10 private:
11 // Your data here
12 };
H INT: Recall that C++ provides containers such as std::vector and std::pair.
Solution: See linearinterpolant.cpp.
2
(2b) Test the correctness of your code.
(3a) Using the Horner scheme, write an efficient C++ implementation of a function
which returns the pair (p(x), p (x)), where p is the polynomial with coefficients in c. The
vector c contains the coefficient of the polynomial in the monomial basis, using Matlab
convention (leading coefficient in c[0]).
Solution: See file horner.cpp.
(3b) For the sake of testing, write a naive C++ implementation of the above function
which returns the same pair (p(x), p (x)). This time, p(x) and p (x) should be calculated
with the simple sums of the monomials constituting the polynomial.
Solution: See file horner.cpp.
(3d) Check the validity of the two functions and compare the runtimes for polyno-
mials of degree up to 220 1.
Solution: See file horner.cpp.
3
Problem 4 Lagrange interpolant
Given data points (ti , yi )ni=1 , show that the Lagrange interpolant
n n
x tj
p(x) = yi Li (x), Li (x) =
i=0 j=0 ti tj
ji
is given by:
n
yj
p(x) = (x)
j=0 (x tj ) (tj )
n n n
yj yj
p(x) = (x) = (x tj )
j=0 (x tj ) (tj ) j=0 (x tj ) j=0 (ti tj ) j=0
n
ji
n
yj n n n
x tj
= n (x tj ) = yj .
j=0 j=0 (ti tj ) j=0 j=0 j=0 ti tj
ji ji ji
4
Prof. R. Hiptmair AS 2015
ETH Zrich
G. Alberti,
D-MATH
F. Leonardi Numerical Methods for CSE
Problem Sheet 7
(1a) Determine , such that s, is a cubic spline in S3,M with respect to the node
set M = {1, 0, 1, 2}. Verify that you actually obtain a cubic spline.
Solution: We immediately see that = 1 is necessary to get a polynomial of 3rd degree.
Furthermore, from the condition
11 11
8 = s1, (1 ) = s1, (1+ ) = + 8 + we get = .
3 3
It remains to check the continuity of s, s , s in the nodes 0 and 1. Indeed, we have
4(x + 1)3 + 4(x 1)3
12(x + 1)2 + 12(x 1)2 1 x 0,
s1, (x) = 3x2 8 s1, (x) 6x 0 < x 1,
3x + 16x
2
6x + 16 1 < x < 2.
1
0 0+ 1 1+ 0 0+ 1 1+
s 2+ 1 8 + 8 + 11/3 = 1, s 1 1 8 8
s 4 4 8 3 8 3 + 16 = 11/3 s 8 8 5 5
s 12 + 12 0 -6 6 + 16 s 0 0 -6 -6
They agree for our choice of the parameters.
(1b) Use M ATLAB to create a plot of the function defined in (19) in dependance of
and .
Solution:
4 x1 = -1:0.01:0;
5 y1 = (x1+1).^4+ a l p h a *(x1-1).^4+1;
6 x2 = 0:0.01:1;
7 y2 = -x2.^3-8* a l p h a *x2+1;
8 x3 = 1:0.01:2;
9 y3 = b e t a *x3.^3 + 8.*x3.^2+11/3;
10
15 close all;
16 p l o t (x,y,nodes,data,'ro','linewidth',2);
17 l e g e n d ('cubic spline', 'data
points','Location','SouthEast');
18 x l a b e l ('x','fontsize',14); y l a b e l ('s(x)','fontsize',14);
19 t i t l e ('Cubic spline with parameters','fontsize',14)
20 p r i n t -depsc2 'ex_CubicSpline.eps'
2
Cubic spline with parameters
10
s(x)
5
10
cubic spline
data points
15
1 0.5 0 0.5 1 1.5 2
x
(2a) What is the dimension of the subspace of 1-periodic spline functions in S2,M ?
Solution: Counting argument, similar to that used to determine the dimensions of the
spline spaces. We have n + 1 unknowns dk and n unknowns cj , the constraint d0 = s(t0 ) =
3
s(t0 + 1) = s(tn ) = dn leaves us with a total of 2n unknowns. The continuity of the deriva-
tives in the n nodes impose the same number of constraints, therefore the total dimension
of the spline space is 2n n = n.
(2b) What kind of continuity is already guaranteed by the use of the representation
(20)?
Solution: We observe that s(tj ) = s[tj1 ,tj ] (tj ) = dj = s[tj ,tj+1 ] (tj ) = s(t+j ), thus we get
continuity for free. However the derivatives do not necessarily match.
(2c) Derive a linear system of equations (system matrix and right hand side) whose
solution provides the coefficients cj and dj in (20) from the function values yj = f ( 12 (tj1 +
tj )), j = 1, . . . , n.
H INT: By [1, Def. 3.5.1] we know S2,M C 1 ([0, 1]), which provides linear constraints at
the nodes, analogous to [1, Eq. (3.5.6)] for cubic splines.
Solution: We can plug t = 21 (tj + tj1 ) into (20) and set the values equal to yj . We obtain
= 1/2 and the following conditions:
1 1
dj + cj + dj1 = yj , j = 1, ..., n. (21)
4 4
We obtain conditions on dj by matching the derivatives at the interfaces. The derivative of
the quadratic spline can be computed from (20), after defining j = tj tj1 :
2
s [tj1 ,tj ] (t) = 1
j ( dj +4 (1 )cj +(1 )2 dj1 ) = 1 j (2 dj +4(12 )cj 2(1 )dj1 ).
Setting = 1 in [tj1 , tj ] and = 0 in [tj , tj+1 ], the continuity of the derivative in the node
tj enforces the condition
2dj 4cj 4cj+1 2dj
= s [tj1 ,tj ] (tj ) = s (tj ) = s (t+j ) = s [tj1 ,tj ] (tj ) = ;
j j+1
(this formula holds for j = 1, . . . , n if we define tn+1 = t1 + 1 and cn+1 = c1 ). Simplifying
for dj we obtain:
c c
2 jj + 2 j+1 cj j+1 + cj+1 j cj (tj+1 tj ) + cj+1 (tj tj1 )
dj = =2 =2 j = 1, . . . , n.
j+1
,
1
j + j+1
1
j + j+1 tj+1 tj1
Plugging this expression into (21), we get the following system of equations:
1 cj+1 (tj tj1 ) + cj (tj+1 tj ) 1 cj (tj1 tj2 ) + cj1 (tj tj1 )
+cj + = yj , j = 1, . . . , n,
2 tj+1 tj1 2 tj tj2
(22)
4
with the periodic definitions t2 = tn2 1, c0 = cn , c1 = cn1 . We collect the coefficients
and finally we obtain
B1 C2 0 0 0 0 0 0 A0 c1 y1
A1 B2 C3 0 0 0 0 0 0 c2 y2
0 A2 B3 C4 0 0
0 0 0 c3 y3
= . (24)
0
0 0 0 0 0 0 An2 Bn1 Cn cn1 yn1
Cn+1 0 0 0 0 0 0 An1 Bn cn yn
function s=quadspline(t,y,x)
which takes as input a (sorted) node vector t (of length n1, because t0 = 0 and tn = 1 will
be taken for granted), a n-vector y containing the values of a function f at the midpoints
2 (tj1 + tj ), j = 1, . . . , n, and a sorted N -vector x of evaluation points in [0, 1].
1
The function is to return the values of the interpolating quadratic spline s at the positions
x.
You can test your code with the one provided by quadspline_p.p (available on the
lecture website).
Solution: See Listing 35.
5
Listing 35: Construction of the quadratic spline and evaluation.
1 % part d of quadratic splines problem
2 % given: t = nodes (n-1 vect.),
3 % y = data (n vect.),
4 % x = evaluation pts (N vect.)
5 % create the interpolating quadratic spline and evaluate
in x
6 f u n c t i o n eval_x = quadspline_better(t,y,x)
7 % the number of nodes:
8 n = l e n g t h (y); % N = length(x);
9 % ensure nodes and data are line vectors:
10 t = t(:)'; y = y(:)';
11 % create (n+3) extended vectors using the periodicity:
12 ext_t = [t(end)-1,0,t,1,1+t(1)];
13 % increments in t:
14 de_t = d i f f (ext_t); % (n+2)
15 dde_t = ext_t(3:end) - ext_t(1:end-2); % (n+1)
16
6
35
7
71 end
72
73 end
It is important not to get lost in the indexing of the vectors. In this code they can be repre-
sented as:
t=(t1, t2, ..., t{n-1}) length = n 1,
ext_t=(t{n-1}-1, t0=0, t1, t2,..., t{n-1}, tn=1, 1+t1) length
= n + 3,
de_t=(1-t{n-1}, t1, t2-t1,..., t{n-1}-t{n-2}, 1-t{n-1}, t1) ,
dde_t=(t1+1-t{n-1}, t2, t3-t1,..., 1-t{n-2}, t1+1-t{n-1}) .
The vectors A,B,C,c,d have length n and correspond to the definitions given in (2c)
(with the indexing from 1 to n).
(2e) Plot f and the interpolating periodic quadratic spline s for f (t) = exp(sin(2t)),
10
n = 10 and M = { nj }j=0 , that is, the spline is to fulfill s(t) = f (t) for all midpoints t of
knot intervals.
Solution: See code 36 and Figure 13.
3 i f nargin <3
4 f = @(t) exp( s i n (2* p i *t)); % function to be
interpolated
5 n = 10; % number of subintervals
6 N = 200; % number of evaluation
points
7 func = @quadspline_better;
8 end
9
8
points [0,t,1]
14 y = f(middle_points); % evaluate f on the
middle_points
15
25 end
26
27 % test:
28 % ex_QuadraticSplinesPlot(@(t)sin(2 * pi * t).^3, 10,500)
2.5
1.5
0.5
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
9
and in n, provided one exploits the sparse structure of the system and uses a rank-one
modification to reduce the system to the inversion of a (strictly diagonally dominant) tridi-
agonal matrix.
Figure 14: Error and timing for quadratic splines in equispaced nodes.
0 0
10 10
Error Time
1 O( n3 ) O(n)
10
4
10
5 2
10 10
6
10
7
10
3
10
8
10
9
10
10 4
10 10
0 2 4 0 2 4
10 10 10 10 10 10
Number of intervals Number of intervals
10
the mappings and = h describe exactly the same curve. On the other hand, the
selection of nodes will affect the interpolants s1 and s2 and leads to different interpolating
curves.
Concerning the choice of the nodes, we will consider two options:
1
equidistant parametrization: tk = kt, t = (25)
n
pl pl1
k
segment length parametrization: tk = nl=1 . (26)
l=1 pl pl1
Point data will be generated by the M ATLAB function heart that is available on the
course webpage.
which uses global polynomial interpolation (using the intpolyval function, see [1,
Code 3.2.28]) through the n + 1 points pi R2 , i = 0, . . . , n, whose coordinates are stored
in the 2 (n + 1) matrix xy and returns sampled values of the obtained curve in a 2 N
matrix pol. Here, t passes the node vector (t0 , t1 , . . . , tn ) Rn+1 in the parameter domain
and N is the number of equidistant sampling points.
H INT: Code for intpolyval is available as intpolyval.m.
Solution: See Listing 37.
3 x = xy(1,:);
4 y = xy(2,:);
5
11
11
22 pol = [polx; p o l y ];
23 spl = [splx; sply];
24 pch = [pchx; pchy];
25
26 end
which has the same purpose, arguments and return values as polycurveintp, but now
uses monotonicity preserving cubic Hermite interpolation (available through the M ATLAB
built-in function pchip, see also [1, Section 3.4.2]) instead of global polynomial interpo-
lation.
Plot the obtained curves for the heart data set in the figure created in sub-problem (3b).
Use both parameterizations (25) and (26).
Solution: See Listing 37 and Figure 15.
12
which has the same purpose, arguments and return values as polycurveintp, but now
uses complete cubic spline interpolation.
The required derivatives s1 (0), s2 (0), s1 (1), and s2 (1) should be computed from the
directions of the line segments connecting p0 and p1 , and pn1 and pn , respectively. You
can use the M ATLAB built-in function spline. Plot the obtained curves (heart data) in
the same figure as before using both parameterizations (25) and (26).
H INT: read the M ATLAB help page about the spline command and learn how to impose
the derivatives at the endpoints.
Solution: The code for the interpolation of the heart:
3 xy = heart();
4 n = s i z e (xy,2) - 1;
5
6 %evaluation points:
7 tt = 0:0.005:1;
8
9 figure;
10 h o l d on;
11
24 h o l d off;
25 p r i n t -depsc2 '../PICTURES/ex_CurveIntp.eps'
13
26
27 end
28
36 % plotting function
37 f u n c t i o n plot_interpolations (xy,pol,spl,pch)
38 p l o t (xy(1,:),xy(2,:),'o', pol(1,:),pol(2,:),'-.', ...
39 spl(1,:),spl(2,:),'-', pch(1,:),pch(2,:),'--',
'linewidth',2);
40 a x i s equal;
41 a x i s ([-105 105 -105 105]);
42 l e g e n d ('data','polynomial','spline','pchip','Location','Southoutside'
43 end
100 100
50 50
0 0
50 50
100 100
100 50 0 50 100 100 50 0 50 100
data data
polynomial polynomial
spline spline
pchip pchip
14
Remarks:
Global polynomial interpolation fails. The degree of the polynomial is too high (n),
and the typical oscillations can be observed.
In this case, when two nodes are too close, spline interpolation with equidistant
parametrization introduces small spurious oscillations.
Problem 4 Approximation of
In [1, Section 3.2.3.3] we learned about the use of polynomial extrapolation (= interpola-
tion outside the interval covered by the nodes) to compute inaccessible limits limh0 (h).
In this problem we apply extrapolation to obtain the limit of a sequence x(n) for n .
We consider a quantity of interest that is defined as a limit
with a function T {n, n+1, . . .} R. However, computing T (n) for very large arguments
k may not yield reliable results.
The idea of extrapolation is, firstly, to compute a few values T (n0 ), T (n1 ), . . . , T (nk ),
k N, and to consider them as the values g(1/n0 ), g(1/n1 ), . . . , g(1/nk ) of a continuous
function g ]0, 1/nmin ] R, for which, obviously
Thus we recover the usual setting for the application of polynomial extrapolation tech-
niques. Secondly, according to the idea of extrapolation to zero, the function g is approx-
imated by an interpolating polynomial p Pk1 with pk1 (n1 j ) = T (nj ), j = 1, . . . , k.
In many cases we can expect that pk1 (0) will provide a good approximation for x . In
this problem we study the algorithmic realization of this extrapolation idea for a simple
example.
The unit circle can be approximated by inscribed regular polygons with n edges. The
length of half of the circumference of such an n-edged polygon can be calculated by
elementary geometry:
15
n 2 3 4 5 6 8 10
T (n) = Un
2 2 3
2 3 2 2 5
4 10 2 5 3 4 2 2 5
2
( 5 1)
that uses the Aitken-Neville scheme, see [1, Code 3.2.31], to approximate by extrapola-
tion from the data in the above table, using the first k values, k = 1, . . . , 7.
Solution: See pi_approx.cpp.
16
Prof. R. Hiptmair AS 2015
ETH Zrich
G. Alberti,
D-MATH
F. Leonardi Numerical Methods for CSE
Problem Sheet 8
(1a) Given a knot set T = {t0 < t1 < < tn }, which also serves as the set of
interpolation nodes, and values yj , j = 0, . . . , n, write down the linear system of equations
that yields the slopes s (tj ) of the natural cubic spline interpolant s of the data points
(tj , yj ) at the knots.
Solution: Let hi = ti ti1 . Given the natural condition on the spline, one can remove the
columns relative to c0 = s (t0 ) and cn = s (tn ) from the system matrix, which becomes:
2/h1 1/h1 0 0 0
b 0
0 a1 b1 0
0 0
b1 a2 b2
2 2 1
A = , ai = + , bi = (29)
hi hi+1 hi+1
bn3 an2 bn2 0
0 0 bn2 an1 bn1
0 0 1/hn 2/hn
c = [c0 , c1 , . . . , cn ] (30)
b = [r0 , . . . , rn ] (31)
y1 y0 yn yn1
r0 = 3 2
, rn = 3
h1 h2n
The system becomes Ac = b.
1
(1b) Argue why the linear system found in subsubsection (1a) has a unique solution.
H INT: Look up [1, Lemma 1.8.12] and apply its assertion.
Solution: Notice that ai = h2i + hi+1
2
> hi+1 + hi
1 1
= bi +bi1 . The matrix is (strictly) diagonally
dominant and, therefore, invertible.
(1c) Based on E IGEN devise an efficient implementation of a C++ class for the
computation of a natural cubic spline interpolant with the following definition:
1 c l a s s NatCSI {
2 public:
3 //! \brief Build the cubic spline interpolant with
natural boundaries
4 //! Setup the data structures you need.
5 //! Pre-compute the coefficients of the spline
(solve system)
6 //! \param[in] t, nodes of the grid (for pairs (t_i,
y_i)) (sorted!)
7 //! \param[in] y, values y_i at t_i (for pairs (t_i,
y_i))
8 NatCSI( c o n s t c o n s t std::vector<double > & t, c o n s t
c o n s t std::vector<double > & y);
9
15 private:
16 // TODO: store data for the spline
17 };
H INT: Assume that the input array of knots is sorted and perform binary searches for the
evaluation of the interpolant.
Solution: See natcsi.cpp.
2
Problem 2 Monotonicity preserving interpolation (core problem)
This problem is about monotonicity preserving interpolation. Before starting, you should
revise [1, Def. 3.1.15], [1, 3.3.2] and [1, Section 3.4.2] carefully.
Note now that {s(j) j = 0, . . . , n} is a basis for Rn+1 (indeed, they constitute a linearly
independent set with cardinality equal to the dimension of the space). As a consequence,
every y Rn+1 can be written as a linear combination of the s(j) s, namely
n
y = j s(j) .
j=0
as desired.
3
Problem 3 Local error estimate for cubic Hermite interpolation (core
problem)
Consider the cubic Hermite interpolation operator H of a function defined on an interval
[a, b] to the space P3 polynomials of degree at most 3:
H C 1 ([a, b]) P3
defined by:
Assume f C 4 ([a, b]). Show that for every x ]a, b[ there exists [a, b] such that
1 (4)
(f Hf )(x) = f ( )(x a)2 (x b)2 . (33)
24
with
f (x) Hf (x)
C = .
(x a)2 (x b)2
4
Then (a) = (b) = (a) = (b) = 0 (using the definition of H). Moreover (x) = 0 (by
construction). Therefore, by Rolles theorem ( has at least two local extrema), 1 , 2
]a, b[, 1 2 such that (1 ) = (2 ) = 0.
has at least 4 zeros in [a, b] (a, b, 1 and 2 are pairwise distinct).
has at least 3 zeros in [a, b] ( has at least 3 local extrema).
(3) has at least 2 zeros in [a, b] ( has at least 2 local extrema).
(4) has at least 1 zero in [a, b] ((3) has at least one local extrema), let be such zero.
0 = (4) ( ) = f (4) ( ) 24C.
C = 24 f ( ).
1 (4)
where IT is the polynomial interpolation operator for the node set T , until
max f (t) IT (t) tol max f (t) . (37)
tS tS
function t = adaptivepolyintp(f,a,b,tol,N)
that implements the algorithm described above and takes as arguments the function handle
f, the interval bounds a, b, the relative tolerance tol, and the number N of equidistant
sampling points (in the interval [a, b]), that is,
j
S = {a + (b a) , j = 0, . . . , N } .
N
H INT: The function intpolyval from [1, Code 3.2.28] is provided and may be used
(though it may not be the most efficient way to implement the function).
Solution: See Listing 39.
5
(4b) Extend the function from the previous sub-problem so that it reports the quantity
max f (t) TT (t) (38)
tS
6
(4c) For f1 (t) = sin(e2t ) and f2 (t) = 1+16t
t
2 plot the quantity from (38) versus the
number of interpolation nodes. Choose plotting styles that reveal the qualitative decay of
this error as the number of interpolation nodes is increased. Use interval [a, b] = [0, 1],
N=1000 sampling points, tolerance tol = 1e-6.
Solution: See Listing 40 and Figure 16.
13 p r i n t -depsc '../PICTURES/plot_adaptivepolyintp.eps'
7
Error decay for adaptive polynomial interpolation
2
10
f1
f2
1
10
0
10
1
10
2
10
error
3
10
4
10
5
10
6
10
7
10
0 50 100 150
number of nodes
Figure 16: The quantity from (38) versus the number of interpolation nodes.
8
Prof. R. Hiptmair AS 2015
ETH Zrich
G. Alberti,
D-MATH
F. Leonardi Numerical Methods for CSE
Problem Sheet 9
where d([a, b], ) is the geometric distance of the integration contour C from the
interval [a, b] C in the complex plane. The contour must be contractible in the domain
D of analyticity of f and must wind around [a, b] exactly once, see [1, Fig. 142].
Now we consider the interval [1, 1]. Following [1, Rem. 4.1.87], our task is to find an
upper bound for this expression, in the case where f possesses an analytical extension to
a complex neighbourhood of [1, 1].
For the analysis of the Chebychev interpolation of analytic functions we used the elliptical
contours, see [1, Fig. 156],
= ( )d, (40)
I
1
where is the derivative of w.r.t the parameter . Recall that the length of a complex
number z viewed as a vector in R2 is just its modulus.
()
Solution: = sin( i log()), therefore:
2
H INT: The result of (1b), together with the knowledge that describes an ellipsis, tells
you the maximal range (1, max ) of . Sample this interval with 1000 equidistant steps.
H INT: Apply geometric reasoning to establish that the distance of and [1, 1] is 12 ( +
1 ) 1.
H INT: If you cannot find max use max = 2.4.
H INT: You can exploit the properties of cos and the hyperbolic trigonometric functions
cosh and sinh.
Solution: The ellipse must be restricted such that the minor axis has length 2/3 (2
times the smallest point, in absolute value, where f is not-analytic). Since this corresponds
to the imaginary part of (), when = /2, we find:
(1d) Based on the result of (1c), and [1, Eq. (4.1.89)], give an optimal bound for
f Ln f L ([1,1]) ,
where Ln is the operator of Chebychev interpolation on [1, 1] into the space of polyno-
mials of degree n.
Solution: Let M be the approximation of (1c). Then
M 2(2 + 2 )
f Ln f L ([1,1]) .
n+1 1
(1e) Graphically compare your result from (1d) with the measured supremum norm
of the approximation error of Chebychev interpolation of f on [1, 1] for polynomial
degree n = 1, . . . , 20. To that end, write a M ATLAB-code and rely on the provided function
intpolyval (cf. [1, Code 4.4.12]).
H INT: Use semi-logarithmic scale for your plot semilogy.
Solution: See cheby_analytic.m.
(1f) Rely on pullback to [1, 1] to discuss how the error bounds in [1, Eq. (4.1.89)]
will change when we consider Chebychev interpolation on [a, a], a > 0, instead of
[1, 1], whilst keeping the function f fixed.
3
Solution: The rescaled function f will have a different domain of analyticity and a
different growth behavior in the complex plane. The larger a, the closer the pole of f
will move to [1, 1], the more the choice of the ellipses is restricted (i.e. max becomes
smaller). This will result in a larger bound.
Using [1, Eq. (4.1.99)], if follows immediately that the asymptotic behaviour of the inter-
polation does not change after rescaling of the interval. In fact, if is the affine pullback
from [a, a] to [1, 1], then:
Given a mesh T = {0 t0 < t1 < < tn 1} on the unit interval I = [0, 1], n N, we
define the piecewise linear interpolant
IT C 0 (I) P1,T = {s C 0 (I), s[tj1 ,tj ] P1 j}, s.t. (IT f )(tj ) = f (tj ), j = 0, . . . , n;
(2a) If we choose the uniform mesh T = {tj }nj=0 with tj = j/n, given a function
f C 2 (I), what is the asymptotic behavior of the error
f IT f L (I) ,
when n ?
H INT: Look for a suitable estimate in [1, Section 4.5.1].
Solution: Equation [1, (4.5.12)] says
1
f IT f L (I) f (2) L (I) ,
2n2
4
because the meshwidth is h = 1/n. So, the convergence is quadratic, i.e., algebraic with
order 2.
f I R, f (t) = t , 0<<2?
1 P = p o l y f i t ( l o g (x), l o g (y),1);
2 slope = P(1);
Solution: The interpolant is implemented in Listing 41, the convergence for our choice
of f is studied in file PWlineConv.m and the results are plotted in Figure 17. The
convergence is clearly algebraic, the rate is equal to if it is smaller than 2, and equal to
2 otherwise. In brief, we can say that the order is min{, 2}.
Be careful with the case = 1: here the interpolant gets exactly the solution, with every
mesh.
5
2 % compute and evaluate piecewise linear interpolant
3 % t and y data vector of the same size
4 % t_ev vector with evaluation points
5 % --> y_ev column vector with evaluations in t_ev
6
14 y_ev = z e r o s ( s i z e (t_ev));
15 f o r k=1:n
16 t_left = t(k);
17 t_right = t(k+1);
18 ind = f i n d ( t_ev t_left & t_ev < t_right );
19 y_ev(ind) = y(k) +
(y(k+1)-y(k))/(t_right-t_left)*(t_ev(ind)-t_left);
20 end
21 % important! take care of last node:
22 y_ev( f i n d (t_ev == t(end))) = y(end);
f (t) = ( 1)t2
6
Figure 17: h-convergence of piecewise linear interpolation for f (t) = t , =
0.1, 0.3, . . . , 2.9. The convergence rates are shown in the small plot.
3
10
2
conv. rate
1.5
1
0.5
4
0 10
0.5 1 1.5 2 2.5 3 5 10 20 50
alpha n = # subintervals
is monotonically decreasing for 0 < < 2, therefore we can expect a large error in the first
subinterval, the one that is closer to 0.
In line 23 of the code in PWlineConv.m, we check our guess: the maximal error is
found in the first interval for every (0, 2) ( 1) and in the last one for > 2.
7
every t (0, 1/n) and 0 < < 2 ( 1) we compute the minimum of the error function
1
(t) = f (t) (IT f )(t) = t t , ((0) = (1/n) = 0),
n1
1
(t) = t1 ,
n1
1 1/(1) 1
(t ) = 0 if t = ,
n 2n
/(1) 1/(1) 1
max (t) = (t ) =
= /(1) 1/(1) = O(n ) = O(h ).
t(0,1/n) n n n
The order of convergence in h = 1/n is equal to the parameter , as observed in Figure 17.
(2f) Since the interpolation error is concentrated in the left part of the domain, it
seems reasonable to use a finer mesh only in this part. A common choice is an alge-
braically graded mesh, defined as
j
G = {tj = ( ) , j = 0, . . . , n},
n
for a parameter > 1. An example is depicted in Figure 18 for = 2.
0.81
algeb. graded mesh, beta=2
0.64
0.49
0.36
0.25
0.16
0.09
0.04
0.01
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
uniform mesh
8
For a fixed parameter in the definition of f , numerically determine the rate of conver-
gence of the piecewise linear interpolant IG on the graded mesh G as a function of the
parameter . Try for instance = 1/2, = 3/4 or = 4/3.
How do you have to choose in order to recover the optimal rate O(n2 ) (if possible)?
Solution: The code in file PWlineGraded.m studies the dependence of the convergence
rates on and . The result for = 0.5 is plotted in Figure 19.
The comparison of this plot with the analogous ones for different values of suggests that
the choice of = 2/ guarantees quadratic convergence, run the code to observe it.
Proceeding as in (2e), we can see that the maximal error in the first subinterval (0, t1 ) =
(0, 1/n ) is equal to 1/n (/(1) 1/(1) ) = O(n ). This implies that a necessary
condition to have quadratic convergence is 2/. In order to prove un upper bound on
the optimal , we should control the error committed in every subinterval, here the exact
computation of (t ) becomes quite long and complicate.
For larger values of the grading parameter, the error in last few subintervals begins to
increase. The variable LocErr contains the index of the interval where the maximal
error is attained (take a look at its values). It confirms that the largest error appears in the
first subinterval if 2 and in the last one if 2, the intermediate cases are not
completely clear.
Figure 18 has been created with the code in Listing 42.
9
Figure 19: h-convergence of piecewise linear interpolation for f (t) = t , = 0.5, on alge-
braically graded meshes with parameters [1, 5]. The convergence rates in dependence
on are shown in the small plot.
2
10
2.5
3
conv. rate
1.5 10
0.5
4
10
0 1 2
10 10 10
0
1 2 3 4 5 n = # subintervals
beta
(n) 2j + 1
j = cos ( ) , j = 0, . . . , n 1. (49)
2n
We define the family of discrete L2 semi inner products, cf. [1, Eq. (4.2.21)],
n1
(n) (n)
(f, g)n = f (j )g(j ), f, g C 0 ([1, 1]) (50)
j=0
10
and the special weighted L2 inner product
1 1
(f, g)w = f (t)g(t) dt f, g C 0 ([1, 1]) (51)
1 1 t2
(3a) Show that the Chebyshev polynomials are an orthogonal family of polynomials
with respect to the inner product defined in (51) according to [1, Def. 4.2.24], namely
(Tk , Tl )w = 0 for every k l.
H INT: Recall the trigonometric identity 2 cos(x) cos(y) = cos(x + y) + cos(x y).
Solution: For k, l = 0, . . . , n with k l, by using the substitution s = arccos t (ds =
1t
1
2
dt) and simple trigonometric identities we readily compute
1 1
(Tk , Tl )w = cos(k arccos t) cos(l arccos t) dt
1 1 t2
= cos(ks) cos(ls) ds
0
1
= cos((k + l)s) + cos((k l)s) ds
2 0
1
= ([sin((k + l)s)/(k + l)]0 + [sin((k l)s)/(k l)]0 )
2
= 0,
since k + l 0 and k l 0.
Consider the following statement.
11
2 #include <math.h>
3 #include <vector>
4 #include <Eigen/Dense>
5
6 using namespace s t d ;
7
12
k l. For k, l = 0, . . . , n + 1 with k l by (70) we have
n
(n) (n)
(Tk , Tl )n+1 = Tk (j )Tl (j )
j=0
n
2j + 1 2j + 1
= cos (k ) cos (l ) (52)
j=0 2(n + 1) 2(n + 1)
1 n 2j + 1 2j + 1
= (cos ((k + l) ) + cos ((k l) )).
2 j=0 2(n + 1) 2(n + 1)
1 eim 1 eim 1
= im 2(n+1)
1 1 = 1 iR,
e
e im 2(n+1) 2 i sin m 2(n+1)
1 j
implying Re(eim 2(n+1) nj=0 eim n+1 ) = 0 as desired.
(3d) Given a function f C 0 ([1, 1], find an expression for the best approximant
qn Pn of f in the discrete L2 -norm:
qn = argmin f pn+1 ,
pPn
where n+1 is the norm induced by the scalar product ( , )n+1 . You should express qn
through an expansion in Chebychev polynomials of the form
n
qn = j Tj (54)
j=0
13
for suitable coefficients j R.
H INT: The task boils down to determining the coefficients j . Use the theorem you have
just proven and a slight extension of [1, Cor. 4.2.14].
Solution: In view of the theorem, the family {T0 , . . . , Tn } is an orthogonal basis of Pn
with respect to the inner product ( , )n+1 . By (52) and (53) we have
2 1 n (cos(0) + cos(0)) = n + 1 if k = 0,
2k = Tk n+1 = (Tk , Tk )n+1 = 21 j=0 (55)
2 j=0 cos(0) = (n + 1)/2
n
otherwise.
that returns the vector of coefficients (j )j in (54) given a function f . Note that the degree
of the polynomial is indirectly passed with the length of the output alpha. The input f is
a lambda-function, e.g.
(3f) Test bestpolchebnodes with the function f (x) = (5x)12 +1 and n = 20. Ap-
proximate the supremum norm of the approximation error by sampling on an equidistant
grid with 106 points.
H INT: Again, [1, Code 4.1.70] is useful for evaluating Chebychev polynomials.
Solution: See file ChebBest.cpp. The output (plotted with Matlab) is shown in Fig-
ure 20.
14
1.2
Chebyshev best approximation
Exact function
1
0.8
0.6
0.4
0.2
0
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1
H INT: Again use the above theorem to express the coefficients of a Chebychev expansion
of Lj .
Solution: We have already seen that {Tl /l l = 0, . . . , n} is an ONB of Pn . Thus we can
write
n
(Lj , Tl )n+1 n n
(n+1) (n+1) Tl
Lj = 2
Tl = Lj (k )Tl (k ) 2
l=0 l l=0 k=0 l
(n+1)
By definition of Lagrange polynomials we have Lj (k ) = jk , whence
n
(n+1) Tl
Lj = Tl (l ) .
l=0 2l
15
was defined in [1, Section 3.4]. For f C 4 ([a, b]) it enjoys h-convergence with rate 4 as
we have seen in [1, Exp. 4.5.15].
Now we consider cases, where perturbed or reconstructed slopes are used. For instance,
this was done in the context of monotonicity preserving piecewise cubic Hermite interpo-
lation as discussed in [1, Section 3.4.2].
(4a) Assume that piecewise cubic Hermite interpolation is based on perturbed slopes,
that is, the piecewise cubic function s on M satisfies:
j = O(h ) , N0 ,
for mesh-width h 0.
H INT: Use a local generalized cardinal basis functions, cf. [1, 3.4.3].
Solution: Let s be the piecewise cubic polynomial interpolant of f . We can rewrite s
using the local representation with cardinal basis:
since H3 (t)L ([ti1 ,ti ]) = H4 (t)L ([ti1 ,ti ]) = O(h) (attain maximum at t = 3h (ti
1
t)
3 2
resp. minimum at t = 3h 2
(ti t), with value h(( 23 ) ( 32 ) )).
16
(4b) Implement a strange piecewise cubic interpolation scheme in C++ that satisfies:
s(xj ) = f (xj ) , s (xj ) = 0
and empirically determine its convergence on a sequence of equidistant meshes of [5, 5]
with mesh-widths h = 2l , l = 0, . . . , 8 and for the interpoland f (t) = 1+t
1
2.
As a possibly useful guideline, you can use the provided C++ template, see the file
piecewise_hermite_interpolation_template.cpp.
Compare with the insight gained in (4a).
Solution: According to the previous subproblem, since s (xj ) = f (xj ) f (xj ), i.e.
j = O(1), = 0, the convergence order is limited to O(h).
For the C++ solution, cf. piecewise_hermite_interpolation.cpp.
17
Subtracting the second equation to the first equation:
f (xj+1 ) f (xj1 )
= 2f (xj ) + O(h2 )
h
For the one-sided difference we expand at x = xj+2 and x = xj+1 :
f (xj+1 ) f (xj )
= f (xj ) + f (xj )h/2 + O(h2 )
h
f (xj+2 ) f (xj )
= f (xj ) + f (xj )h + O(h2 )
2h
Subtracting the first equation to half of the second equation:
18
Prof. R. Hiptmair AS 2015
ETH Zrich
G. Alberti,
D-MATH
F. Leonardi Numerical Methods for CSE
Problem Sheet 10
(1a) Prove the following interleaving property of the zeros of the Legendre polyno-
mials. For all n N0 we have
1. Understand that it is enough to show that every pair of zeros (ln , l+1
n
) of Pn is
separated by a zero of Pn1 .
2. Argue by contradiction.
3. By considering the auxiliary polynomial jl,l+1 (t jn ) and the fact that the Gauss
quadrature is exact on P2n1 prove that Pn1 (ln ) = Pn1 (l+1
n
) = 0.
4. Choose s Pn2 such that s(jn ) = Pn1 (jn ) for every j l, l + 1, and using again
that Gauss quadrature is exact on P2n1 obtain a contradiction.
Solution: By [1, Lemma 5.3.27], Pn has exactly n distinct zeros in ] 1, 1[. Therefore,
it is enough to prove that every pair of zeros of Pn is separated by one zero of Pn1 . By
1
contradiction, assume that there exists l = 1, . . . , n 1 such that ]ln , l+1
n
[ does not contain
any zeros of Pn1 . As a consequence, Pn1 (l ) and Pn1 (l+1 ) have the same sign, namely
n n
q(t) = (t jn ).
jl,l+1
By construction, q Pn2 , whence Pn1 q P2n3 . Thus, using (57) and the fact that
Gaussian quadrature is exact on P2n1 we obtain
1 n
0= Pn1 q dt = wj Pn1 (jn )q(jn ) = wl Pn1 (ln )q(ln ) + wl+1 Pn1 (l+1
n
)q(l+1
n
).
1 j=1
Hence
[wl Pn1 (ln )q(ln )] [wl+1 Pn1 (l+1
n
)q(l+1
n
)] 0.
On the other hand, by [1, Lemma 5.3.29] we have wl , wl+1 > 0. Moreover, by construction
of q we have that q(ln )q(l+1
n
) > 0. Combining these properties with (56) we obtain that
Choose now s Pn2 such that s(jn ) = Pn1 (jn ) for every j l, l + 1. Using again (57)
and the fact that Gaussian quadrature is exact on P2n1 we obtain
1 n
0= Pn1 s dt = wj Pn1 (jn )s(jn ) = wj (Pn1 (jn ))2 .
1 j=1 jl,l+1
Since Pn1 has only n 1 zeros and the weights wj are all positive, the right hand side of
this equality is strictly positive, thereby contradicting the equality itself.
Solution 2: There is a shorter proof of this fact based on the recursion formula [1,
Eq. (5.3.32)]. We sketch the main steps.
We will prove the statement by induction. If n = 1 there is nothing to prove. Suppose now
that the statement is true for n. By [1, Eq. (5.3.32)] we have Pn+1 (jn ) = n+1
n
Pn1 (jn ) for
2
every j = 1, . . . , n. Further, since the statement is true for n we have (1)nj Pn1 (jn ) > 0.
Therefore
(1)n+1j Pn+1 (jn ) > 0, j = 1, . . . , n. (58)
Since the leading coefficient of Pn+1 is positive, we have Pn+1 (x) > 0 for all x > n+1
n+1
and
(1) Pn+1 (x) > 0 for all x < 1 . Combining these two inequalities with (58) yields
n+1 n+1
(1b) By differentiating [1, Eq. (5.3.32)] derive a combined 3-term recursion for the
sequences (Pn )n and (Pn )n .
Solution: Differentiating [1, Eq. (5.3.32)] immediately gives
2n + 1 2n + 1 n
Pn+1 (t) = Pn (t) + tPn (t) P (t), P0 = 0, P1 = 1,
n+1 n+1 n + 1 n1
which combined with [1, Eq. (5.3.32)] gives the desired recursions.
that fills the matrices Lx and DLx in RN (n+1) with the values (Pk (xj ))jk and (Pk (xj ))jk ,
k = 0, . . . , n, j = 0, . . . , N 1, for an input vector x RN (passed in x).
Solution: See file legendre.cpp.
that computes the Gauss points jk [1, 1], j = 1, . . . , k, k = 1, . . . , n, using the zero find-
ing approach outlined above. The Gauss points should be returned in an upper triangular
n n-matrix.
3
H INT: For simplicity, you may want to write a C++ function
that computes Pk (x) for a scalar x. Reuse parts of the function legvals.
Solution: See file legendre.cpp.
0.4
P8
0.3 P7
0.2
0.1
-0.1
-0.2
-0.3
0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8
Figure 21: P7 and P8 on a part of [1, 1]. The secant method fails to find the zeros of P8
(blue curve) when started with the zeros of P7 (red curve).
(1f) Fix your function gaussPts taking into account the above considerations. You
should use the regula falsi, that is a variant of the secant method in which, at each step,
we choose the old iterate to keep depending on the signs of the function. More precisely,
4
given two approximations x(k) , x(k1) of a zero in which the function f has different signs,
compute another approximation x(k+1) as zero of the secant. Use this as the next iterate,
but then chose as x(k) the value z {x(k) , x(k1) } for which signf (x(k+1) ) signf (z).
This ensures that f has always a different sign in the last two iterates.
H INT: The regula falsi variation of the secant method can be easily implemented with a
little modification of [1, Code 2.3.25]:
1 f u n c t i o n x = secant_falsi(x0,x1,F,rtol,atol)
2 fo = F(x0);
3 f o r i=1:MAXIT
4 fn = F(x1);
5 s = fn*(x1-x0)/(fn-fo); % correction
6 i f (F(x1 - s)*fn < 0)
7 x0 = x1; fo = fn; end
8 x1 = x1 - s;
9 i f (abs(s) < max(atol,rtol* min(abs([x0;x1]))))
10 x = x1; r e t u r n ; end
11 end
We want to approximate this integral using global Gauss quadrature. The nodes (vector x)
and the weights (vector w) of n-point Gaussian quadrature on [1, 1] can be computed us-
ing the provided M ATLAB routine [x,w]=gaussquad(n) (in the file gaussquad.m).
function GaussConv(f_hd)
that produces an appropriate convergence plot of the quadrature error versus the number
n = 1, . . . , 50 of quadrature points. Here, f_hd is a handle to the function f .
Save your convergence plot for f (t) = sinh(t) as GaussConv.eps.
5
H INT: Use the M ATLAB command quad with tolerance eps to compute a reference value
of the integral.
H INT: If you cannot implement the quadrature formula, you can resort to the M ATLAB
function
function I = GaussArcSin(f_hd,n)
6
Convergence of Gauss quadrature
10 0
Gauss quad.
O(n -3 )
10 -1
10 -2
error
10 -3
10 -4
10 -5
10 -6
10 0 10 1 10 2
n = # of evaluation points
(2c) Transform the integral (59) into an equivalent one with a suitable change of vari-
able so that Gauss quadrature applied to the transformed integral converges much faster.
Solution: With the change of variable t = sin(x), dt = cos xdx
1 /2
I = arcsin(t) f (t) dt = x f (sin(x)) cos(x) dx.
1 /2
(the change of variable has to provide a smooth integrand on the integration interval)
function GaussConvCV(f_hd)
which plots the quadrature error versus the number n = 1, . . . , 50 of quadrature points for
the integral obtained in the previous subtask.
Again, choose f (t) = sinh(t) and save your convergence plot as GaussConvCV.eps.
H INT: In case you could not find the transformation, you may rely on the function
function I = GaussArcSinCV(f_hd,n)
7
implemented in GaussArcSinCV.p that applies n-points Gauss quadrature to the trans-
formed problem.
Solution: See Listing 45 and Figure 23:
(2e) Explain the difference between the results obtained in subtasks (2a) and (2d).
Solution: The convergence is now exponential. The integrand of the original integral be-
longs to C 0 ([1, 1]) but not to C 1 ([1, 1]) because the derivative of the arcsin function
blows up in 1. The change of variable provides an analytic integrand: x cos(x) sinh(sin x).
Gauss quadrature ensures exponential convergence only if the integrand is analytic. This
explains the algebraic and the exponential convergence.
8
Convergence
0
of Gauss quadrature for rephrased problem
10
Gauss quad.
eps
10 -5
error
10 -10
10 -15
10 -20
0 10 20 30 40 50
n = # of evaluation points
and then use a standard quadrature rule (like Gauss-Legendre quadrature) on [b, b].
(3a) For the integrand g(t) = 1/(1 + t2 ) determine b such that the truncation error
ET satisfies:
b
ET = g(t)dt g(t)dt 106 (60)
b
9
(3b) What is the algorithmic difficulty faced in the implementation of the truncation
approach for a generic integrand?
Solution: A good choice of b requires a detailed knowledge about the decay of f , which
may not be available for f defined implicitly.
A second option (S) is the transformation of the improper integral to a bounded domain
by substitution. For instance, we may use the map t = cot(s).
(3c) Into which integral does the substitution t = cot(s) convert f (t)dt?
Solution:
dt
= (1 + cot2 (s)) = (1 + t2 ) (62)
ds
0 f (cot(s))
f (t)dt = f (cot(s))(1 + cot2 (s))ds = ds, (63)
0 sin2 (s)
because sin2 () = 1
1+cot2 ()
.
that uses the transformation from (3d) together with n-point Gauss-Legendre quadrature
to evaluate f (t)dt. f passes an object that provides an evaluation operator of the form:
10
1 (double) -> double
function A: fA analytic , fA Pk k N ;
function B: fB C 0 (I) , fB C 1 (I) ;
function C: fC P12 ,
where Pk is the space of the polynomials of degree at most k defined on I. The following
quadrature rules are applied to these functions:
The corresponding absolute values of the quadrature errors are plotted against the number
of function evaluations in Figure 24. Notice that only the quadrature errors obtained with
an even number of function evaluations are shown.
11
Plot #1 Plot #2
0 0
10 10
1
10
2
Absolute error
Absolute error
10
2
10
4
10
3
10
6 4
10 10
0 1 2 0 1 2
10 10 10 10 10 10
Number of function evaluations Number of function evaluations
Plot #3
0
10
5
10
Absolute error
Curve 1
10
10 Curve 2
Curve 3
15
10
20
10
0 5 10 15 20 25 30 35 40
Number of function evaluations
Figure 24: Quadrature convergence plots for different functions and different rules.
(4a) Match the three plots (plot #1, #2 and #3) with the three quadrature rules
(quadrature rule A, B, and C). Justify your answer.
12
(4b) The quadrature error curves for a particular function fA , fB and fC are plotted
in the same style (curve 1 as red line with small circles, curve 2 means the blue solid line,
curve 3 is the black dashed line). Which curve corresponds to which function (fA , fB ,
fC )? Justify your answer.
Solution: Curve 1 red line and small circles fC polynomial of degree 12:
integrated exactly with 8 evaluations with global Gauss quadrature.
Curve 2 blue continuous line only fA analytic function:
exponential convergence with global Gauss quadrature.
Curve 3 black dashed line fB non smooth function:
algebraic convergence with global Gauss quadrature.
13
Prof. R. Hiptmair AS 2015
ETH Zrich
G. Alberti,
D-MATH
F. Leonardi Numerical Methods for CSE
Problem Sheet 11
returns a structure QuadRule containing nodes (xj ) and weights (wj ) of a Gauss-Legendre
quadrature ( [1, Def. 5.3.28]) on [1, 1] with n nodes. Have a look at the file gauleg.hpp
and gauleg.cpp, and understand how the implementation works and how to use it.
H INT: Learn/remember how linking works in C++. To use the function gauleg (de-
clared in gauleg.hpp and defined in gauleg.cpp) in a file file.cpp, first include
the header file gauleg.hpp in the file file.cpp, and then compile and link the files
gauleg.cpp and file.cpp. Using gcc:
1
If you want to use CMake, have a look at the file CMakeLists.txt.
Solution: See documentation in gauleg.hpp and gauleg.cpp.
(1b) Study [1, 5.3.37] in order to learn about the convergence of Gauss-Legendre
quadrature.
1 t e m p l a t e < c l a s s func>
2 d o u b l e quadsingint(func&& f, u n s i g n e d i n t n);
that approximately evaluates (66) using 2n evaluations of f . An object of type func must
provide an evaluation operator
1 d o u b l e o p e r a t o r ( d o u b l e t) c o n s t ;
For the quadrature error asymptotic exponential convergence to zero for n must be
ensured by your function.
H INT: A C++ lambda function provides such operator.
H INT: You may use the classical binomial formula 1 t2 = 1 t 1 + t.
H INT: You can use the template quadsingint_template.cpp.
Solution: Exploiting the hint, we see that the integrand is non-smooth in 1.
2
obtaining
1
W (f ) = 1 t2 f (t)dt
1
/2 /2
= 1 sin2 s f (sin s) cos s ds = cos2 s f (sin s)ds.
/2 /2
(1d) Give formulas for the nodes cj and weights wj of a 2n-point quadrature rule on
[1, 1], whose application to the integrand f will produce the same results as the function
quadsingint that you implemented in (1c).
Solution: Using substitution (I). Let (xj , wj ), j = 1, . . . , n be the Gauss nodes and
weights relative to the Gauss quadrature of order n in the interval [0, 1]. The nodes are
mapped from xj in [0, 1] to cl for l 1, . . . , 2n in [1, 1] as follows:
c2ji = (1)i (1 x2j ), j = 1, . . . , n, i = 0, 1.
The weights wl , l = 1, . . . , 2n, become:
w2ji = 2wj xj 2 x2j ,
2
j = 1, . . . , n, i = 0, 1.
Using substitution (II). Let (xj , wj ), j = 1, . . . , n be the Gauss nodes and weights relative
to the Gauss quadrature of order n in the interval [1, 1]. The nodes are mapped from xj
to cj as follows:
cj = sin(xj /2), j = 1, . . . , n
The weights wj , j = 1, . . . , n, become:
wj = wj cos2 (xj /2)/2.
3
Problem 2 Nested numerical quadrature
A laser beam has intensity
= {(x, y)T R2 x 0, y 0, x + y 1}
as a double integral.
H INT: The radiant power absorbed by a surface is the integral of the intensity over the
surface.
Solution: The radiant power absorbed by $\triangle$ can be written as:
$$\int_\triangle I(x, y)\, dx\, dy = \int_0^1 \int_0^{1-y} I(x, y)\, dx\, dy.$$
template <class func>
double evalgaussquad(double a, double b, func&& f, const QuadRule & Q);
that evaluates the $N$-point quadrature for an integrand passed in f on $[a, b]$. It should rely on the quadrature rule on the reference interval $[-1, 1]$ that is supplied through an object of type QuadRule. (The vectors weights and nodes denote the weights and nodes of the reference quadrature rule, respectively.)
HINT: Use the function gauleg declared in gauleg.hpp and defined in gauleg.cpp to compute nodes and weights in $[-1, 1]$. See Problem 1 for further explanations.
H INT: You can use the template laserquad_template.cpp.
Solution: See laserquad.cpp and CMakeLists.txt.
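A minimal sketch of evalgaussquad (assuming QuadRule stores the reference nodes and weights in EIGEN vectors called nodes and weights, as stated above) transforms the nodes affinely from $[-1, 1]$ to $[a, b]$ and scales the result by the Jacobian of the affine map:

template <class func>
double evalgaussquad(double a, double b, func&& f, const QuadRule & Q) {
    double I = 0.;
    for (int i = 0; i < Q.nodes.size(); ++i) {
        // node mapped from the reference interval [-1,1] to [a,b]
        const double x = a + 0.5 * (b - a) * (Q.nodes(i) + 1.);
        I += Q.weights(i) * f(x);
    }
    return 0.5 * (b - a) * I; // Jacobian of the affine transformation
}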
template <class func>
double gaussquadtriangle(func&& f, int N)
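A possible realization is sketched below; it reuses evalgaussquad and treats the inner integral over $x \in [0, 1-y]$ as the integrand of the outer integral over $y \in [0, 1]$ (the signature gauleg(N) for obtaining the reference rule is an assumption):

template <class func>
double gaussquadtriangle(func&& f, int N) {
    const QuadRule Q = gauleg(N); // reference Gauss rule on [-1,1] with N nodes
    // outer integral over y, inner integral over x in [0, 1-y]
    auto outer = [&] (double y) {
        auto inner = [&] (double x) { return f(x, y); };
        return evalgaussquad(0., 1. - y, inner, Q);
    };
    return evalgaussquad(0., 1., outer, Q);
}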
Problem 3 Weighted Gauss quadrature
The development of an alternative quadrature formula for (66) relies on the Chebyshev
polynomials of the second kind Un , defined as
$$U_n(t) = \frac{\sin((n+1)\arccos t)}{\sin(\arccos t)}, \qquad n \in \mathbb{N}_0.$$
Recall the role of the orthogonal Legendre polynomials in the derivation and definition of
Gauss-Legendre quadrature rules (see [1, 5.3.25]).
As regards the integral (66), this role is played by the $U_n$, which are orthogonal polynomials with respect to a weighted $L^2$ inner product, see [1, Eq. (4.2.20)], with weight given by $w(\tau) = \sqrt{1 - \tau^2}$.
(3a) Show that the $U_n$ satisfy the 3-term recursion $U_{n+1}(t) = 2t\,U_n(t) - U_{n-1}(t)$ for every $n \ge 1$.
Solution: The case $n = 0$ is trivial, since $U_0(t) = \frac{\sin(\arccos t)}{\sin(\arccos t)} = 1$, as desired. Using the trigonometric identity $\sin 2x = 2 \sin x \cos x$, we have $U_1(t) = \frac{\sin(2\arccos t)}{\sin(\arccos t)} = 2\cos(\arccos t) = 2t$, as desired. Finally, using the identity $\sin(x + y) = \sin x \cos y + \sin y \cos x$, we obtain for $n \ge 1$
$$U_{n+1}(t) = \frac{\sin((n+2)\arccos t)}{\sin(\arccos t)} = \frac{\sin((n+1)\arccos t)\, t + \cos((n+1)\arccos t)\,\sin(\arccos t)}{\sin(\arccos t)} = U_n(t)\, t + \cos((n+1)\arccos t).$$
Similarly, we have
$$U_{n-1}(t) = \frac{\sin((n+1-1)\arccos t)}{\sin(\arccos t)} = \frac{\sin((n+1)\arccos t)\, t - \cos((n+1)\arccos t)\,\sin(\arccos t)}{\sin(\arccos t)} = U_n(t)\, t - \cos((n+1)\arccos t).$$
Adding the last two equalities we obtain $U_{n+1}(t) + U_{n-1}(t) = 2t\,U_n(t)$, which is the desired 3-term recursion.
(3b) Show that $U_n \in \mathcal{P}_n$ with leading coefficient $2^n$.
Solution: Let us prove the claim by induction. The case n = 0 is trivial, since U0 (t) = 1.
Let us now assume that the statement is true for every k = 0, . . . , n and let us prove it for
$n + 1$. In view of $U_{n+1}(t) = 2t\,U_n(t) - U_{n-1}(t)$, since by inductive hypothesis $U_n \in \mathcal{P}_n$ and $U_{n-1} \in \mathcal{P}_{n-1}$, we have that $U_{n+1} \in \mathcal{P}_{n+1}$. Moreover, the leading coefficient of $U_{n+1}$ is 2 times the leading coefficient of $U_n$, namely $2 \cdot 2^n = 2^{n+1}$, as desired.
provides the exact value of (66) for every $f \in \mathcal{P}_{n-1}$ (assuming exact arithmetic).
H INT: Use all the previous subproblems.
Solution: Since $U_k$ is a polynomial of degree exactly $k$, the set $\{U_k \mid k = 0, \ldots, n-1\}$ is a basis of $\mathcal{P}_{n-1}$. Therefore, by linearity it suffices to prove the above identity for $f = U_k$ for every $k$. Fix $k = 0, \ldots, n-1$. Setting $x = \pi/(n+1)$, from (70) we readily derive
$$\sum_{j=1}^n w_j\, U_k(\xi_j^n) = \sum_{j=1}^n \frac{\pi}{n+1}\,\sin^2\!\Big(\frac{j\pi}{n+1}\Big)\,\frac{\sin((k+1)\arccos \xi_j^n)}{\sin(\arccos \xi_j^n)} = \sum_{j=1}^n x\,\sin(jx)\,\sin((k+1)jx)$$
$$= \frac{x}{2}\sum_{j=1}^n \big(\cos((k+1-1)jx) - \cos((k+1+1)jx)\big) = \frac{x}{2}\,\mathrm{Re}\sum_{j=0}^n \big(e^{ikxj} - e^{i(k+2)xj}\big)$$
$$= \frac{x}{2}\,\mathrm{Re}\Big(\sum_{j=0}^n e^{ikxj} - \frac{1 - e^{i(k+2)\pi}}{1 - e^{i(k+2)x}}\Big).$$
For $k = 0$ the first sum equals $n + 1$ and the second term vanishes, giving $\pi/2$. For even $k \ge 2$ both terms vanish. For odd $k$ both geometric sums take the form $2/(1 - e^{i\theta})$, and the two contributions cancel because of
$$\mathrm{Re}\Big(\frac{1}{1 - e^{ikx}}\Big) = \mathrm{Re}\Big(\frac{1}{1 - \cos(kx) - i\sin(kx)}\Big) = \frac{1 - \cos(kx)}{(1 - \cos(kx))^2 + \sin^2(kx)} = \frac{1}{2}.$$
To summarise, we have proved that
$$\sum_{j=1}^n w_j\, U_k(\xi_j^n) = \frac{\pi}{2}\,\delta_{k0}, \qquad k = 0, \ldots, n-1.$$
Finally, the claim follows from (3c), since U0 (t) = 1 and so the integral in (66) is nothing
else than the weighted scalar product between Uk and U0 .
(3f) Show that the quadrature formula (71) gives the exact value of (66) even for every $f \in \mathcal{P}_{2n-1}$.
H INT: See [1, Thm. 5.3.21].
Solution: The conclusion follows by applying the same argument given in [1, Thm. 5.3.21] with the weighted $L^2$ scalar product with weight $w$ defined above.
$Q_n^U(f) \approx W(f)$
template <typename Function>
double quadU(const Function &f, unsigned int n)
that gives $Q_n^U(f)$ as output, where f is an object with an evaluation operator, like a lambda function, representing $f$, e.g.
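The concrete example shown at this point is not recoverable; presumably it was a lambda function along these lines (using the integrand of (3i) as an illustration):

auto f = [] (double t) { return 1. / (2. + std::exp(3. * t)); };
double I = quadU(f, 10); // approximation of W(f)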
(3i) Test your implementation with the function $f(t) = \frac{1}{2 + e^{3t}}$ and $n = 1, \ldots, 25$. Tabulate the quadrature error $E_n(f) = |W(f) - Q_n^U(f)|$ using the exact value $W(f) = 0.483296828976607$. Estimate the parameter $0 \le q < 1$ in the asymptotic decay law $E_n(f) \approx C\, q^n$ characterizing (sharp) exponential convergence, see [1, Def. 4.1.31].
Solution: See file quadU.cpp. An approximation of $q$ is given by $E_n(f)/E_{n-1}(f)$.
$$B = \frac12 - C x_1, \qquad C = \frac{1}{3 x_1^2}, \qquad \frac14 = \frac{1}{3 x_1^2}\, x_1^3 = \frac13 x_1, \qquad A = \frac{11}{27},$$
i.e.
$$x_1 = \frac34, \qquad C = \frac{16}{27}, \qquad B = \frac{1}{18}, \qquad A = \frac{11}{27}. \tag{77}$$
Then
$$\frac15 = \int_0^1 x^4\, dx \;\neq\; A \cdot 0 + B \cdot 0 + C\, x_1^4 = \frac{16}{27} \cdot \frac{81}{256} = \frac{3}{16}. \tag{78}$$
Hence, the quadrature is exact for polynomials up to degree 3.
(4b)
Compute an approximation of z(2), where the function z is defined as the solution of the
initial value problem
$$z'(t) = \frac{t}{1 + t^2}, \qquad z(1) = 1. \tag{79}$$
Solution: We know that
$$z(2) - z(1) = \int_1^2 z'(x)\, dx. \tag{80}$$
Applying the quadrature rule (77) to $f(x) = z'(x+1)$ on $[0, 1]$ and computing
$$z(1) = 1, \qquad z'(1) = \frac{1}{1 + 1^2} = \frac12, \qquad z''(1) = \frac{1 - 1^2}{(1 + 1^2)^2} = 0, \qquad z'\!\Big(\frac74\Big) = \frac{7/4}{1 + (7/4)^2} = \frac{28}{65},$$
we obtain
$$z(2) = \int_0^1 z'(x + 1)\, dx + z(1) \approx \frac{11}{27} \cdot \frac12 + \frac{1}{18} \cdot 0 + \frac{16}{27} \cdot \frac{28}{65} + 1 = 1.45\ldots$$
For the sake of completeness, using the antiderivative of $z'$:
$$z(2) = \int_1^2 z'(x)\, dx + z(1) = \frac12 \log(x^2 + 1)\Big|_1^2 + 1 = 1.45\ldots$$
Problem Sheet 12
template <class State>
class RKIntegrator {
public:
    RKIntegrator(const Eigen::MatrixXd & A, const Eigen::VectorXd & b) {
        // TODO: given a Butcher scheme in A, b, initialize the RK method
        // for the solution of an IVP
    }

    template <class Function>
    std::vector<State> solve(const Function &f, double T, const State & y0,
                             unsigned int N) const {
        // TODO: compute N uniform time steps for the ODE y'(t) = f(y(t))
        // up to time T with the RK method and initial value y0; store all
        // steps (y_k) in the returned vector
    }

private:
    template <class Function>
    void step(const Function &f, double h, const State & y0, State & y1) const {
        // TODO: perform a single step of size h of the RK method, from y0
        // to y1, for the IVP with right hand side f
    }
};
which implements a generic RK method given by a Butcher scheme to solve the autonomous initial value problem $\dot y = f(y)$, $y(t_0) = y_0$.
H INT: See rkintegrator_template.hpp for more details about the implementa-
tion.
Solution: See rkintegrator.hpp.
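The heart of the implementation is the step method. For an explicit scheme such as (85) (strictly lower triangular A) the increments can be computed one after the other; a minimal sketch (the member names A_ and b_ for the Butcher data are assumptions) reads:

// single explicit RK step of size h (sketch)
template <class Function>
void step(const Function &f, double h, const State & y0, State & y1) const {
    int s = b_.size();                 // number of stages
    std::vector<State> k;              // increments k_i
    k.reserve(s);
    for (int i = 0; i < s; ++i) {
        State incr = y0;
        for (int j = 0; j < i; ++j) incr += h * A_(i, j) * k[j];
        k.push_back(f(incr));          // k_i = f(y0 + h * sum_j a_ij * k_j)
    }
    y1 = y0;
    for (int i = 0; i < s; ++i) y1 += h * b_(i) * k[i];
}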
(1b) Test your implementation of the RK methods with the following data. As
autonomous initial value problem, consider the predator/prey model (cf. [1, Ex. 11.1.9]):
$$\dot y_1(t) = (\alpha_1 - \beta_1 y_2(t))\, y_1(t), \tag{82}$$
$$\dot y_2(t) = (\beta_2 y_1(t) - \alpha_2)\, y_2(t), \tag{83}$$
$$y(0) = [100, 5]^T, \tag{84}$$
with coefficients $\alpha_1 = 3$, $\alpha_2 = 2$, $\beta_1 = \beta_2 = 0.1$.
Use a Runge-Kutta single step method described by the following Butcher scheme (cf. [1,
Def. 11.4.9]):
$$\begin{array}{c|ccc}
0 & & & \\
1/3 & 1/3 & & \\
2/3 & 0 & 2/3 & \\ \hline
 & 1/4 & 0 & 3/4
\end{array} \tag{85}$$
Problem 2 Order is not everything (core problem)
In [1, Section 11.3.2] we have seen that Runge-Kutta single step methods when applied to
initial value problems with sufficiently smooth solutions will converge algebraically (with
respect to the maximum error in the mesh points) with a rate given by their intrinsic order,
see [1, Def. 11.3.21].
In this problem we perform empirical investigations of orders of convergence of several
explicit Runge-Kutta single step methods. We rely on two IVPs, one of which has a
perfectly smooth solution, whereas the second has a solution that is merely piecewise
smooth. Thus in the second case the smoothness assumptions of the convergence theory
for RK-SSMs might be violated and it is interesting to study the consequences.
derive
$$\frac{E_{N_{k-1}}}{E_{N_k}} \approx \frac{C\, 2^{-(k-1) r_k}}{C\, 2^{-k r_k}} = 2^{r_k}, \qquad r_k \approx \log\!\Big(\frac{E_{N_{k-1}}}{E_{N_k}}\Big) \Big/ \log(2).$$
A reasonable approximation of the order of convergence is given by
$$r \approx \frac{1}{\#K} \sum_{k \in K} r_k, \qquad K = \{\, k = 1, \ldots, 15 \mid E_{N_k} > 5n \cdot 10^{-14} \,\}. \tag{87}$$
See file errors.hpp for the implementation.
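In code, (87) boils down to a few lines; a sketch (the function name is hypothetical; err[k] is the error obtained with $N_k = 2^k$ steps, n the dimension of the IVP):

#include <cmath>
#include <vector>

double estimateOrder(const std::vector<double> & err, unsigned n) {
    double r = 0.; unsigned cnt = 0;
    for (std::size_t k = 1; k < err.size(); ++k) {
        if (err[k] > 5. * n * 1e-14) {            // skip round-off dominated errors
            r += std::log2(err[k - 1] / err[k]);  // r_k as in (87)
            ++cnt;
        }
    }
    return cnt > 0 ? r / cnt : 0.;
}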
(2b) Calculate the analytical solutions of the logistic ODE (see [1, Ex. 11.1.5])
Solution: As far as (88) is concerned, the solution is $y(t) = (1 + e^{-t})^{-1}$ (see [1, Eq. (11.1.7)]).
Let us now consider (89). Because of the absolute value on the right hand side of the
differential equation, we have to distinguish two cases y(t) < 1.1 and y(t) > 1.1. Since
the initial condition is given by y(0) = 1 < 1.1, we start with the case y(t) < 1.1. For
$y(t) < 1.1$, the differential equation is $\dot y = 2.1 - y$. Separation of variables gives
$$\int_1^{y(t)} \frac{1}{2.1 - y}\, dy = \int_0^t ds \quad\Longrightarrow\quad y(t) = -\frac{11}{10}\, e^{-t} + 2.1,$$
valid as long as $y(t) < 1.1$.
(2c) Use the function errors from (2a) with the ODEs (88) and (89) and the
methods:
$$\begin{array}{c|ccc}
0 & & & \\
1/2 & 1/2 & & \\
1 & -1 & 2 & \\ \hline
 & 1/6 & 2/3 & 1/6
\end{array}$$
Solution: For the smooth ODE (88) we obtain the empirical orders
Eul: 1.06
RK2: 2.00
RK3: 2.84
RK4: 4.01
This corresponds to the expected orders. However, in the case of the ODE (89) we obtain
Eul: 1.09
RK2: 1.93
RK3: 1.94
RK4: 1.99
The convergence orders of the explicit Euler and Runge-Kutta 2 methods are as expected,
but we do not see any relevant improvement in the convergence orders of RK3 and RK4.
This is due to the fact that the right hand side of the IVP is not continuously differentiable:
the convergence theory breaks down.
See file order_not_all.cpp for the implementation.
Problem 3 Integrating ODEs using the Taylor expansion method
In [1, Chapter 11] of the course we studied single step methods for the integration of initial
value problems for ordinary differential equations $\dot y = f(y)$, [1, Def. 11.3.5]. Explicit
single step methods have the advantage that they only rely on point evaluations of the
right hand side f .
This problem examines another class of methods that is obtained by the following reasoning: if the right hand side $f : \mathbb{R}^n \to \mathbb{R}^n$ of an autonomous initial value problem
$$\dot y = f(y), \qquad y(0) = y_0, \tag{90}$$
with solution $y : \mathbb{R} \to \mathbb{R}^n$ is smooth, also the solution $y(t)$ will be regular and it is possible to expand it into a Taylor sum at $t = 0$, see [1, Thm. 2.2.15],
$$y(t) = \sum_{n=0}^m \frac{y^{(n)}(0)}{n!}\, t^n + R_m(t), \tag{91}$$
with remainder term $R_m(t) = O(t^{m+1})$ for $t \to 0$.
A single step method for the numerical integration of (90) can be obtained by choosing $m = 3$ in (91), neglecting the remainder term, and taking the remaining sum as an approximation of $y(h)$, that is,
$$y(h) \approx y_1 = y(0) + \frac{dy}{dt}(0)\, h + \frac12\, \frac{d^2 y}{dt^2}(0)\, h^2 + \frac16\, \frac{d^3 y}{dt^3}(0)\, h^3.$$
Subsequently, one uses the ODE and the initial condition to replace the temporal derivatives $\frac{d^l y}{dt^l}$ with expressions in terms of (derivatives of) $f$. This yields a single step integration method called the Taylor (expansion) method.
(3a) Express $\frac{dy}{dt}(t)$ and $\frac{d^2 y}{dt^2}(t)$ in terms of $f$ and its Jacobian $Df$.
HINT: Apply the chain rule, see [1, 2.4.5], then use the ODE (90).
Solution: For the first time derivative of $y$, we just use the differential equation:
$$\frac{dy}{dt}(t) = \dot y(t) = f(y(t)).$$
For the second derivative, we use the previous equation, apply the chain rule and then once again insert the ODE:
$$\frac{d^2 y}{dt^2}(t) = \frac{d}{dt} f(y(t)) = Df(y(t))\, \dot y(t) = Df(y(t))\, f(y(t)).$$
Here $Df(y(t))$ is the Jacobian of $f$ evaluated at $y(t)$.
(3b) Verify the formula
$$\frac{d^3 y}{dt^3}(0) = D^2 f(y_0)\big(f(y_0), f(y_0)\big) + Df(y_0)^2\, f(y_0). \tag{92}$$
HINT: this time we have to apply both the product rule [1, (2.4.9)] and the chain rule [1, (2.4.8)] to the expression derived in the previous sub-problem.
To gain confidence, it is advisable to consider the scalar case $d = 1$ first, where $f : \mathbb{R} \to \mathbb{R}$ is a real valued function.
Relevant for the case $d > 1$ is the fact that the first derivative of $f$ is a linear mapping $Df(y_0) : \mathbb{R}^n \to \mathbb{R}^n$. This linear mapping is applied by multiplying the argument with the Jacobian of $f$. Similarly, the second derivative is a bilinear mapping $D^2 f(y_0) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$. The $i$-th component of $D^2 f(y_0)(v, v)$ is given by $v^T H f_i(y_0)\, v$, where $H f_i$ denotes the Hessian of the $i$-th component of $f$.
Solution: Consider first the scalar case. Using the expression from (3a),
$$\frac{d^3 y}{dt^3}(t) = \frac{d}{dt}\big(f'(y(t))\, f(y(t))\big).$$
Product rule and chain rule give
$$\frac{d}{dt}\big(f'(y(t))\, f(y(t))\big) = f''(y(t))\, \dot y(t)\, f(y(t)) + f'(y(t))\, f'(y(t))\, \dot y(t).$$
Inserting the ODE $\dot y(t) = f(y(t))$ once again yields
$$\frac{d^3 y}{dt^3}(t) = f''(y(t))\, f(y(t))^2 + f'(y(t))^2\, f(y(t)).$$
This already resembles formula (92): the first term is quadratic in $f(y(t))$ and involves the second derivative of $f$, whereas the second term involves the first derivative of $f$ in quadratic form.
To understand the formula for higher dimensions, we verify it componentwise. For each
component $y_i(t)$ we have a function $f_i : \mathbb{R}^n \to \mathbb{R}$.
For the first derivative of $y_i(t)$, this is straightforward:
$$\dot y_i(t) = f_i(y(t)).$$
For the second derivative we obtain
$$\ddot y_i(t) = \big(\mathrm{grad}\, f_i(y(t))\big)^T \dot y(t),$$
consistent with the Jacobian $Df(y(t))$, which contains the gradients of the components of $f$ row-wise.
Now, we apply the product rule to $\ddot y_i(t)$ to obtain
$$\dddot y_i(t) = \Big(\frac{d}{dt}\, \mathrm{grad}\, f_i(y(t))\Big)^T \dot y(t) + \big(\mathrm{grad}\, f_i(y(t))\big)^T \ddot y(t).$$
The second term of the sum again builds up to $\big(Df(y(t))^2 f(y(t))\big)_i$, the $i$-th component of the second summand in (92).
For the first term, we first write the scalar product of the two vectors as a sum and then interchange the order of derivatives. This is possible as long as the functions are sufficiently differentiable:
$$\Big(\frac{d}{dt}\, \mathrm{grad}\, f_i(y(t))\Big)^T \dot y(t) = \sum_{j=1}^n \frac{d}{dt}\big(\partial_{y_j} f_i(y(t))\big)\, f_j(y(t)) = \sum_{j,k=1}^n \partial_{y_k}\partial_{y_j} f_i(y(t))\, f_k(y(t))\, f_j(y(t)),$$
which is the $i$-th component $f(y(t))^T H f_i(y(t))\, f(y(t))$ of the first summand in (92).
(3c) We now apply the Taylor expansion method introduced above to the predator-
prey model (97) introduced in Problem 1 and [1, Ex. 11.1.9].
To that end write a header-only C++ class TaylorIntegrator for the integration of
the autonomous ODE of (97) using the Taylor expansion method with uniform time steps
on the temporal interval [0, 10].
H INT: You can copy the implementation of Problem 1 and modify only the step method
to perform a single step of the Taylor expansion method.
H INT: Find a suitable way to pass the data for the derivatives of the r.h.s. function f to the
solve function. You may modify the signature of solve.
H INT: See taylorintegrator_template.hpp.
Solution: For our particular example, we have
$$y = \begin{pmatrix} u \\ v \end{pmatrix}, \qquad f(y) = \begin{pmatrix} (\alpha_1 - \beta_1 y_2)\, y_1 \\ (\beta_2 y_1 - \alpha_2)\, y_2 \end{pmatrix},$$
$$Df(y) = \begin{pmatrix} \alpha_1 - \beta_1 y_2 & -\beta_1 y_1 \\ \beta_2 y_2 & \beta_2 y_1 - \alpha_2 \end{pmatrix},$$
$$H f_1(y) = \begin{pmatrix} 0 & -\beta_1 \\ -\beta_1 & 0 \end{pmatrix}, \qquad H f_2(y) = \begin{pmatrix} 0 & \beta_2 \\ \beta_2 & 0 \end{pmatrix}.$$
See taylorintegrator.hpp.
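For this particular right hand side, a single Taylor step can be written out explicitly. The following sketch (the function name and the way the coefficients are passed are assumptions) uses the fact that the $i$-th component of $D^2 f(y)(f, f)$ equals $f^T H f_i\, f$ with the Hessians above:

// one step of the order-3 Taylor expansion method for (82)-(83) (sketch)
Eigen::Vector2d taylorStep(const Eigen::Vector2d & y, double h,
                           double a1, double a2, double b1, double b2) {
    Eigen::Vector2d f;
    f << (a1 - b1 * y(1)) * y(0), (b2 * y(0) - a2) * y(1);
    Eigen::Matrix2d Df;
    Df << a1 - b1 * y(1), -b1 * y(0),
          b2 * y(1),       b2 * y(0) - a2;
    Eigen::Vector2d d2f; // D^2 f(y)(f,f): components f^T Hf_i f
    d2f << -2. * b1 * f(0) * f(1), 2. * b2 * f(0) * f(1);
    return y + h * f + 0.5 * h * h * Df * f
             + h * h * h / 6. * (d2f + Df * (Df * f));
}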
(3e) What is the disadvantage of the Taylor method compared with a Runge-Kutta
method?
Solution: As we can see in the error table, the errors of the studied Runge-Kutta method and of Taylor's method are practically identical. The obvious disadvantage of Taylor's method in comparison with Runge-Kutta methods is that the former involves rather complicated higher derivatives of f. If we want higher order, those formulas get even more complicated, whereas explicit Runge-Kutta methods work with only a few evaluations of f itself, yielding comparable results. Moreover, the Taylor expansion method cannot be applied when f is given only in procedural form.
$$\begin{aligned}
2\ddot u_1 - \ddot u_2 &= u_1(u_2 + u_1), \\
-\ddot u_{i-1} + 2\ddot u_i - \ddot u_{i+1} &= u_i(u_{i-1} + u_{i+1}), \qquad i = 2, \ldots, n-1, \\
2\ddot u_n - \ddot u_{n-1} &= u_n(u_n + u_{n-1}), \\
u_i(0) &= u_{0,i}, \qquad i = 1, \ldots, n, \\
\dot u_i(0) &= v_{0,i}, \qquad i = 1, \ldots, n,
\end{aligned} \tag{93}$$
(4a) Write (93) as a first order IVP of the form $\dot y = f(y)$, $y(0) = y_0$ (see [1, Rem. 11.1.23]).
Solution: The second order IVP can be rewritten as a first order one by introducing $v = \dot u$:
$$\begin{aligned}
\dot u_i &= v_i, \qquad i = 1, \ldots, n, \\
2\dot v_1 - \dot v_2 &= u_1(u_2 + u_1), \\
-\dot v_{i-1} + 2\dot v_i - \dot v_{i+1} &= u_i(u_{i-1} + u_{i+1}), \qquad i = 2, \ldots, n-1, \\
2\dot v_n - \dot v_{n-1} &= u_n(u_n + u_{n-1}), \\
u_i(0) &= u_{0,i}, \quad v_i(0) = v_{0,i}, \qquad i = 1, \ldots, n.
\end{aligned}$$
In order to use standard interfaces to RK-SSM for first order ODEs, collect $u$ and $v$ in a $(2n)$-dimensional vector $y = [u; v]$. Then the system reads
$$\dot y = f(y) = \begin{bmatrix} y_{n+1} \\ \vdots \\ y_{2n} \\ C^{-1}\, g(y_1, \ldots, y_n) \end{bmatrix},$$
where $C$ is the tridiagonal matrix with diagonal entries $2$ and off-diagonal entries $-1$ appearing on the left hand side above, and $g(y_1, \ldots, y_n)$ collects the corresponding right hand sides $u_i(u_{i-1} + u_{i+1})$.
(4b) Apply the function errors constructed in Problem 2 to the IVP obtained in
the previous subproblem. Use
and the classical RK method of order 4. Construct any sparse matrix encountered as a sparse matrix in EIGEN. Comment on the order of convergence observed.
Solution: See file system.cpp for the implementation. We observe convergence of
order 4.00: this is expected since the function f is smooth.
Problem Sheet 13
H INT: Remember what property distinguishes an orthogonal matrix. Thus you see that
the assertion we want to verify boils down to showing that the bilinear expression $t \mapsto Y(t)^T Y(t)$ does not vary along trajectories, that is, its time derivative must vanish. This
can be established by means of the product rule [1, Eq. (2.4.9)] and using the differential
equation.
Solution: Let us consider the time derivative of $Y^T Y$, using $\dot Y = AY$ and the skew-symmetry $A^T = -A$:
$$\frac{d}{dt}\big(Y^T Y\big) = \dot Y^T Y + Y^T \dot Y = (AY)^T Y + Y^T AY = Y^T A^T Y + Y^T AY = -Y^T AY + Y^T AY = 0.$$
This implies that $Y(t)^T Y(t)$ is constant. From this, it follows that orthogonality is preserved, as claimed: in fact $I = Y(0)^T Y(0) = Y(t)^T Y(t)$.
(1b) Implement three C++ functions
which determine, for a given initial value Y(t0 ) = Y0 and for given step size h, approxi-
mations for Y(t0 + h) using one step of the corresponding method for the approximation
of the ODE (94)
Solution: The explicit Euler method is given by
$$Y_{k+1} = Y_k + h\, f(t_k, Y_k),$$
which for (94) reads
$$Y_{k+1} = Y_k + hAY_k.$$
The implicit Euler method is given by $Y_{k+1} = Y_k + hAY_{k+1}$, hence
$$Y_{k+1} = (I - hA)^{-1}\, Y_k.$$
Finally, the implicit mid-point method is given by
$$Y_{k+1} = Y_k + \tfrac h2 A(Y_k + Y_{k+1}), \qquad\text{hence}\qquad Y_{k+1} = \big(I - \tfrac h2 A\big)^{-1}\big(Y_k + \tfrac h2 AY_k\big).$$
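With EIGEN, each of the three one-step maps takes only a few lines; a sketch for the ODE (94) (the function names are hypothetical):

#include <Eigen/Dense>
using Eigen::MatrixXd;

// explicit Euler: Y1 = Y0 + h*A*Y0
MatrixXd eeulstep(const MatrixXd & A, const MatrixXd & Y0, double h) {
    return Y0 + h * A * Y0;
}
// implicit Euler: solve (I - h*A) Y1 = Y0
MatrixXd ieulstep(const MatrixXd & A, const MatrixXd & Y0, double h) {
    int n = A.rows();
    return (MatrixXd::Identity(n, n) - h * A).lu().solve(Y0);
}
// implicit mid-point: solve (I - h/2*A) Y1 = Y0 + h/2*A*Y0
MatrixXd impstep(const MatrixXd & A, const MatrixXd & Y0, double h) {
    int n = A.rows();
    return (MatrixXd::Identity(n, n) - 0.5 * h * A).lu()
           .solve(Y0 + 0.5 * h * A * Y0);
}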
(1c) Investigate numerically, which one of the implemented methods preserves or-
thogonality in the sense of sub-problem (1a) for the ODE (94) and which one doesn't. To that end, consider the matrix
$$M = \begin{pmatrix} 8 & 1 & 6 \\ 3 & 5 & 7 \\ 9 & 9 & 2 \end{pmatrix}$$
and use the matrix Q arising from the QR-decomposition of M as initial data $Y_0$. As matrix A, use the skew-symmetric matrix
$$A = \begin{pmatrix} 0 & 1 & 1 \\ -1 & 0 & 1 \\ -1 & -1 & 0 \end{pmatrix}.$$
To that end, perform n = 20 time steps of size h = 0.01 with each method and compute the Frobenius norm of $Y(T)^T Y(T) - I$. Use the functions from subproblem (1b).
Solution: Orthogonality is preserved only by the implicit midpoint rule. Explicit and
implicit Euler methods do not preserve orthogonality. See matrix_ode.cpp.
From now on we consider a non-linear ODE that is structurally similar to (94). We study the
initial value problem
$$\dot Y = -(Y - Y^T)\, Y, \qquad Y(0) = Y_0. \tag{95}$$
which solves (95) on $[0, T]$ using the C++ header-only class ode45 (in the file ode45.hpp). The initial value should be given by an $n \times n$ EIGEN matrix Y0. Set the absolute tolerance to $10^{-10}$ and the relative tolerance to $10^{-8}$. The output should be an approximation of $Y(T) \in \mathbb{R}^{n,n}$.
H INT: The ode45 class works as follows:
1. Call the constructor, and specify the r.h.s. function f and the type of the solution and of the initial data in RhsType, for example:
ode45<StateType> O(f);
2. (optional) Set custom options, modifying the struct options inside ode45, for
instance:
O.options.<option_you_want_to_change> = <value>;
Relative and absolute tolerances for ode45 are defined as rtol resp. atol variables in
the struct options. The return value is a sequence of states and times computed by
the adaptive single step method.
HINT: The type RhsType needs a vector space structure, implemented with the operators +, *, *=, += and assignment/copy operators. Moreover a norm method must be available. EIGEN vector and matrix types, as well as fundamental types, are eligible as RhsType.
H INT: Have a look at the public interface of ode45.hpp. Look at the template file
matrix_ode_template.cpp.
Solution: The class ode45 can take Eigen::MatrixXd as StateType. Alternatively, one can transform matrices to vectors (and vice versa) using Eigen::Map (similar to MATLAB's own reshape function). See matrix_ode.cpp.
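For example, an $n \times n$ matrix state can be reinterpreted as a vector of length $n^2$ (and back) roughly like this:

int n = 3;
Eigen::MatrixXd Y = Eigen::MatrixXd::Random(n, n);
// Map provides a view on the same data; assigning it to a new object copies
Eigen::VectorXd y = Eigen::Map<Eigen::VectorXd>(Y.data(), n * n);
Eigen::MatrixXd Y2 = Eigen::Map<Eigen::MatrixXd>(y.data(), n, n);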
(1e) Show that the function $t \mapsto Y^T(t)\, Y(t)$ is constant for the exact solution $Y(t)$ of (95).
H INT: Remember the general product rule [1, Eq. (2.4.9)].
Solution: By the product rule and using the fact that Y is a solution of the IVP (95), we obtain
$$\frac{d}{dt}\big(Y^T(t) Y(t)\big) = \dot Y^T(t)\, Y(t) + Y^T(t)\, \dot Y(t) = -\big((Y(t) - Y^T(t))\, Y(t)\big)^T Y(t) - Y^T(t)\big((Y(t) - Y^T(t))\, Y(t)\big) = 0,$$
where the last equality follows from $(Y - Y^T)^T = -(Y - Y^T)$.
(1f) Write a function checkinvariant which (numerically) determines if the statement from (1e) is true, for t = T and for the
output of matode from sub-problem (1d). You must take into account round-off errors.
The function's input should be the same as that of matode.
H INT: See matrix_ode_template.cpp.
Solution: Let $Y_k$ be the output of matode. We compute $Y_0^T Y_0 - Y_k^T Y_k$ and check whether its (Frobenius) norm is smaller than a constant times the machine epsilon. Even for an orthogonal matrix, we have to take into account round-off errors. See matrix_ode.cpp.
(1g) Use the function checkinvariant to test whether the invariant is preserved by ode45 or not. Use the matrix M defined above and T = 1.
Solution: The invariant is not preserved. See matrix_ode.cpp.
$$\begin{array}{c|ccc}
0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1/2 & 1/4 & 1/4 & 0 \\ \hline
 & 1/6 & 1/6 & 2/3
\end{array} \tag{96}$$
(2a) Consider the prey/predator model
$$\dot y_1(t) = (1 - y_2(t))\, y_1(t), \tag{97}$$
$$\dot y_2(t) = (y_1(t) - 1)\, y_2(t), \tag{98}$$
$$y(0) = [100, 1]^T. \tag{99}$$
Write a C++ code to approximate the solution of the IVP up to time T = 1. Use the RK-SSM defined by the Butcher scheme (96) above. Numerically determine the convergence order of the method for uniform steps of size $2^{-j}$, $j = 2, \ldots, 13$. Use, as a reference solution, an approximation with $2^{14}$ steps.
What do you notice for big step sizes? What is the maximum step size for the solution to
be stable?
H INT: You can use the rkintegrator.hpp implemented in Problem Sheet 12. See
stabrk_template.cpp.
Solution: The scheme is of order 3. With big step sizes the scheme is unstable. At least
64 steps are needed for the solution to be stable. See stabrk.cpp.
(2b) Calculate the stability function $S(z)$, $z = h\lambda$, $\lambda \in \mathbb{C}$, of the method given by the table (96).
Solution: We obtain the stability function $S(z)$ by applying our method to the model problem $\dot y = \lambda y$, $y(0) = y_0$, and by writing the result in the form $y_1 = S(z)\, y_0$, where $z = h\lambda$. For the increments, we obtain
$$k_1 = \lambda y_0, \qquad k_2 = \lambda(y_0 + h k_1) = \lambda y_0 (1 + z), \qquad k_3 = \lambda\big(y_0 + \tfrac h4 (k_1 + k_2)\big) = \lambda y_0 \big(1 + \tfrac12 z + \tfrac14 z^2\big),$$
and for the update
$$y_1 = y_0 + \frac{h\lambda y_0}{6}\Big(1 + (1 + z) + 4\big(1 + \tfrac12 z + \tfrac14 z^2\big)\Big) = y_0 \underbrace{\Big(1 + z + \frac12 z^2 + \frac16 z^3\Big)}_{=S(z)}.$$
Problem 3 Initial Condition for Lotka-Volterra ODE
Introduction. In this problem we will face a situation, where we need to compute the
derivative of the solution of an IVP with respect to the initial state. This paragraph will
show how this derivative can be obtained as the solution of another differential equation.
Please read this carefully and try to understand every single argument.
We consider IVPs for the autonomous ODE
y = f (y) (100)
with smooth right hand side $f : D \to \mathbb{R}^d$, where $D \subseteq \mathbb{R}^d$ is the state space. We take for granted that for all initial states, solutions exist for all times (global solutions, see [1, Ass. 11.1.38]).
By its very definition given in [1, Def. 11.1.39], the evolution operator
$$\Phi : \mathbb{R} \times D \to D, \qquad (t, y) \mapsto \Phi(t, y)$$
satisfies
$$\frac{\partial \Phi}{\partial t}(t, y) = f(\Phi(t, y)).$$
Next, we can differentiate this identity with respect to the state variable y. We assume that all derivatives can be interchanged, which can be justified by rigorous arguments (which we won't do here). Thus, by the chain rule, and after swapping the partial derivatives $\frac{\partial}{\partial t}$ and $D_y$, we obtain
$$\frac{\partial}{\partial t} D_y \Phi(t, y) = D_y \frac{\partial \Phi}{\partial t}(t, y) = D_y\big(f(\Phi(t, y))\big) = Df(\Phi(t, y))\, D_y \Phi(t, y).$$
Abbreviating $W(t, y) = D_y \Phi(t, y)$, we can rewrite this as the non-autonomous ODE
$$\dot W(t, y) = Df(\Phi(t, y))\, W(t, y). \tag{101}$$
Here, the state y can be regarded as a parameter. Since $\Phi(0, y) = y$, we also know $W(0, y) = I$ (identity matrix), which supplies an initial condition for (101). In fact, we can even merge (100) and (101) into the ODE
$$\frac{d}{dt}\big[y(t),\, W(t, y_0)\big] = \big[f(y(t)),\; Df(y(t))\, W(t, y_0)\big], \tag{102}$$
which is autonomous again.
Now let us apply (101)/(102). As in [1, Ex. 11.1.9], we consider the following autonomous Lotka-Volterra differential equation of a predator-prey model
$$\dot u = (2 - v)\, u, \qquad \dot v = (u - 1)\, v \tag{103}$$
on the state space $D = \mathbb{R}_+^2$, $\mathbb{R}_+ = \{\xi \in \mathbb{R} \mid \xi > 0\}$. All the solutions of (103) are periodic and
their period depends on the initial state [u(0), v(0)]T . In this exercise we want to develop
a numerical method which computes a suitable initial condition for a given period.
(3a) For fixed state $y \in D$, (101) represents an ODE. What is its state space?
Solution: By construction, $\Phi$ is a function with values in $\mathbb{R}^d$. Thus, $D_y \Phi$ has values in $\mathbb{R}^{d,d}$, and so the state space of (101) is $\mathbb{R}^{d,d}$, a space of matrices.
(3b) What is the right hand side function for the ODE (101), in the case of the $\dot y = f(y)$ given by the Lotka-Volterra ODE (103)? You may write $u(t)$, $v(t)$ for solutions of (103).
Solution: Writing $y = [u, v]^T$, the map f associated to (103) is $f(y) = [(2 - v)u,\ (u - 1)v]^T$. Therefore
$$Df(y) = \begin{bmatrix} 2 - v & -u \\ v & u - 1 \end{bmatrix}.$$
Thus, (101) becomes
$$\dot W = \begin{bmatrix} 2 - v(t) & -u(t) \\ v(t) & u(t) - 1 \end{bmatrix} W. \tag{104}$$
This is a non-autonomous ODE.
(3c) From now on we write $\Phi : \mathbb{R} \times \mathbb{R}_+^2 \to \mathbb{R}_+^2$ for the evolution operator associated with (103). Based on $\Phi$, derive a function $F : \mathbb{R}_+^2 \to \mathbb{R}^2$ which evaluates to zero for the input $y_0$ if the period of the solution of system (103) with initial value
$$y_0 = \begin{bmatrix} u(0) \\ v(0) \end{bmatrix}$$
is equal to a given value $T_P$.
Solution: Let F be defined by
$$F(y) = \Phi(T_P, y) - y.$$
If the solution of the system (103) with initial value $y_0$ has period $T_P$, then we have $\Phi(T_P, y_0) = y_0$, so $F(y_0) = 0$.
(3d) We write $W(T, y_0)$, $T \ge 0$, $y_0 \in \mathbb{R}_+^2$, for the solution of (101) for the underlying ODE (103). Express the Jacobian of F from (3c) by means of W.
Solution: By definition of W we immediately obtain
$$DF(y) = W(T_P, y) - I.$$
(3e) Argue why the solution of F(y) = 0 will, in general, not be unique. When will it be unique?
HINT: Study [1, 11.1.21] again. Also look at [1, Fig. 374].
Solution: If $y_0 \in \mathbb{R}_+^2$ is a solution, then every state on the trajectory through $y_0$ will also be a solution. Only if the trajectory collapses to a point, that is, if $y_0$ is a stationary point with $f(y_0) = 0$, can we expect uniqueness.
that computes $\Phi(T, [u_0, v_0]^T)$ and $W(T, [u_0, v_0]^T)$. The first component of the output pair should contain $\Phi(T, [u_0, v_0]^T)$ and the second component the matrix $W(T, [u_0, v_0]^T)$.
See LV_template.cpp.
HINT: As in (102), both ODEs (for $\Phi$ and W) must be combined into a single autonomous differential equation on the state space $D \times \mathbb{R}^{d,d}$.
H INT: The equation for W is a matrix differential equation. These cannot be solved
directly using ode45, because the solver expects the right hand side to return a vector.
Therefore, transform matrices into vectors (and vice versa).
Solution: Writing $w = [u, v, W_{11}, W_{21}, W_{12}, W_{22}]^T$, by (103) and (104) we have that
$$\dot w = \begin{bmatrix} (2 - w_2)\, w_1 \\ (w_1 - 1)\, w_2 \\ (2 - w_2)\, w_3 - w_1 w_4 \\ w_2 w_3 + (w_1 - 1)\, w_4 \\ (2 - w_2)\, w_5 - w_1 w_6 \\ w_2 w_5 + (w_1 - 1)\, w_6 \end{bmatrix},$$
with initial conditions $w(0) = [u_0, v_0, 1, 0, 0, 1]^T$.
(3g) Using PhiAndW, write a C++ routine that determines initial conditions u(0) and v(0) such that the solution of the system (103) has period T = 5. Use the multi-dimensional Newton method for F(y) = 0 with F from (3c). As your initial approximation, use $[3, 2]^T$. Terminate the Newton method as soon as $\|F(y)\| \le 10^{-5}$. Validate your implementation by comparing the obtained initial data y with $\Phi(100, y)$.
HINT: Set relative and absolute tolerances of ode45 to $10^{-14}$ and $10^{-12}$, respectively. See file LV_template.cpp.
HINT: The correct solutions are $u(0) \approx 3.110$ and $v(0) \approx 2.081$.
Solution: See file LV.cpp.
These methods fit the concept of single step methods as introduced in [1, Def. 11.3.5] and, usually, converge algebraically according to [1, (11.3.20)].
A step with size h of the so-called exponential Euler single step method for the ODE $\dot y = f(y)$ with continuously differentiable $f : \mathbb{R}^d \to \mathbb{R}^d$ reads
$$y_1 = y_0 + h\, \varphi\big(h\, Df(y_0)\big)\, f(y_0), \tag{105}$$
where $Df(y) \in \mathbb{R}^{d,d}$ is the Jacobian of f at $y \in \mathbb{R}^d$, and the matrix function $\varphi : \mathbb{R}^{d,d} \to \mathbb{R}^{d,d}$ is defined as $\varphi(Z) = (\exp(Z) - \mathrm{Id})\, Z^{-1}$. Here $\exp(Z)$ is the matrix exponential of Z, a special function $\exp : \mathbb{R}^{d,d} \to \mathbb{R}^{d,d}$, see [1, Eq. (12.1.32)].
The function $\varphi$ is implemented in the provided file ExpEul_template.cpp. When plugging in the exponential series, it is clear that the function $z \mapsto \varphi(z) = \frac{\exp(z) - 1}{z}$ is analytic on $\mathbb{C}$. Thus, $\varphi(Z)$ is well defined for all matrices $Z \in \mathbb{R}^{d,d}$.
(4a) Is the exponential Euler single step method defined in (105) consistent with the ODE $\dot y = f(y)$ (see [1, Def. 11.3.10])? Explain your answer.
Solution: In view of (105), consistency is equivalent to $\varphi(h\, Df(y_0))\, f(y_0) = f(y_0)$ for h = 0. Since $\varphi$ is not defined for the zero matrix, this should be intended in the limit as $h \to 0$. By definition of the matrix exponential we have $e^{hZ} = \mathrm{Id} + hZ + O(h^2)$ as $h \to 0$. Therefore
$$\varphi(hZ) = \big(e^{hZ} - \mathrm{Id}\big)(hZ)^{-1} = \mathrm{Id} + O(h), \qquad h \to 0,$$
whence
$$\lim_{h \to 0} \varphi(h\, Df(y_0))\, f(y_0) = f(y_0),$$
as desired.
(4b) Show that the exponential Euler single step method defined in (105) solves the linear initial value problem
$$\dot y = Ay, \qquad y(0) = y_0 \in \mathbb{R}^d, \qquad A \in \mathbb{R}^{d,d},$$
exactly.
HINT: Recall [1, Eq. (12.1.32)]; the solution of the IVP is $y(t) = \exp(At)\, y_0$. To facilitate formal calculations, you may assume that A is regular.
Solution: Given y(0), a step of size t of the method gives (using $f(y) = Ay$, $Df(y) = A$)
$$y_1 = y(0) + t\, \varphi(tA)\, A\, y(0) = y(0) + \big(\exp(tA) - \mathrm{Id}\big)(tA)^{-1}\, tA\, y(0) = \exp(tA)\, y(0) = y(t).$$
(4c) Determine the region of stability of the exponential Euler single step method defined in (105) (see [1, Def. 12.1.49]).
Solution: The discrete evolution $\Psi^h$ associated to (105) is given by $\Psi^h(y) = y + h\, \varphi(h\, Df(y))\, f(y)$. Thus, for a scalar linear ODE of the form $\dot y = \lambda y$ with $\lambda \in \mathbb{C}$ we have ($f(y) = \lambda y$, $Df(y) = \lambda$)
$$\Psi^h(y) = y + h\, \frac{e^{h\lambda} - 1}{h\lambda}\, \lambda\, y = e^{h\lambda}\, y.$$
Thus, the stability function $S : \mathbb{C} \to \mathbb{C}$ is given by $S(z) = e^z$. Therefore, the region of stability of this method is
$$\{z \in \mathbb{C} \mid |S(z)| < 1\} = \{z \in \mathbb{C} \mid \mathrm{Re}\, z < 0\},$$
the left half plane.
that implements (105). Here f and df are objects with evaluation operators representing the ODE right-hand side function $f : \mathbb{R}^d \to \mathbb{R}^d$ and its Jacobian, respectively.
H INT: Use the supplied template ExpEul_template.cpp.
Solution: See ExpEul.cpp.
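A single step (105) can be sketched using the matrix exponential from EIGEN's unsupported MatrixFunctions module. Evaluating $\varphi$ via a plain linear solve, as done below for brevity, assumes Z to be regular; this is a sketch, not a robust implementation:

#include <Eigen/Dense>
#include <unsupported/Eigen/MatrixFunctions> // provides the matrix exponential .exp()
using Eigen::MatrixXd; using Eigen::VectorXd;

// one exponential Euler step of size h (sketch, function name hypothetical)
template <class Function, class Jacobian>
VectorXd expEulStep(const VectorXd & y0, Function&& f, Jacobian&& df, double h) {
    const MatrixXd Z = h * df(y0);                    // Z = h * Df(y0)
    const MatrixXd I = MatrixXd::Identity(Z.rows(), Z.cols());
    const VectorXd w = Z.lu().solve(f(y0));           // w = Z^{-1} f(y0)
    return y0 + h * ((Z.exp() - I) * w);              // y1 = y0 + h*phi(Z)*f(y0)
}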
(4e) What is the order of the single step method (105)? To investigate it, write a C++ routine that applies the method to the scalar logistic ODE
$$\dot y = y(1 - y), \qquad y(0) = 0.1,$$
in the time interval [0, 1]. Show the error at the final time against the stepsize $h = T/N$, $N = 2^k$ for $k = 1, \ldots, 15$. As in Problem 2 of Problem Sheet 12, for each k compute and show an approximate order of convergence.
HINT: The exact solution is
$$y(t) = \frac{y(0)}{y(0) + (1 - y(0))\, e^{-t}}.$$
Solution: The error behaves as $O(h^2)$. See ExpEul.cpp.
Problem Sheet 14
(1c) Test your implementation implicit_RKIntegrator of general implicit
RK SSMs with the routine provided in the file implicit_rk3prey.cpp and comment
on the observed order of convergence.
Solution: As expected, the observed order of convergence is 3 (see [1, Ex. 12.3.44]).
$$x \times y = [x_2 y_3 - x_3 y_2,\; x_3 y_1 - x_1 y_3,\; x_1 y_2 - x_2 y_1]^T.$$
Solution: Since
$$\big(a \times y + c\,(y \times (a \times y))\big) \cdot y = 0,$$
we obtain
$$\frac{d}{dt}\|y(t)\|_2^2 = 2\,\dot y(t) \cdot y(t) = 0 \quad\Longrightarrow\quad \|y(t)\|_2 = \text{const.}\ \forall t.$$
(2b) Compute the Jacobian $Df(y)$. Compute also the spectrum $\sigma(Df(y))$ in the stationary state $y = a$, for which $f(y) = 0$. For simplicity, you may consider only the case $a = [1, 0, 0]^T$.
Solution: Using the definition of the cross product, a simple but tedious calculation shows that
$$Df(y) = \begin{bmatrix}
-c a_2 y_2 - c a_3 y_3 & -a_3 + 2c a_1 y_2 - c a_2 y_1 & a_2 - c a_3 y_1 + 2c a_1 y_3 \\
a_3 - c a_1 y_2 + 2c a_2 y_1 & -c a_3 y_3 - c a_1 y_1 & -a_1 + 2c a_2 y_3 - c a_3 y_2 \\
-a_2 + 2c a_3 y_1 - c a_1 y_3 & a_1 - c a_2 y_3 + 2c a_3 y_2 & -c a_1 y_1 - c a_2 y_2
\end{bmatrix}.$$
Thus for $y = a$ we obtain
$$Df(a) = \begin{bmatrix}
-c(a_2^2 + a_3^2) & -a_3 + c a_1 a_2 & a_2 + c a_1 a_3 \\
a_3 + c a_1 a_2 & -c(a_1^2 + a_3^2) & -a_1 + c a_2 a_3 \\
-a_2 + c a_1 a_3 & a_1 + c a_2 a_3 & -c(a_1^2 + a_2^2)
\end{bmatrix}.$$
A direct calculation gives that, for $a = [1, 0, 0]^T$, the spectrum is
$$\sigma(Df(a)) = \{0,\ -c\|a\|_2^2 \pm i\,\|a\|_2^2\} = \{0,\ -c \pm i\}.$$
Therefore, the problem is stiff for large c (see [1, 12.2.14]).
(2c) For $a = [1, 0, 0]^T$, (106) was solved with the standard MATLAB integrators ode45 and ode23s up to the time T = 10 (default tolerances). Explain the different dependence of the total number of steps on the parameter c observed in Figure 25.
Solution: In the plot, we see that, for the solver ode45, the number of steps rises with c. On the other hand, ode23s uses roughly the same number of steps, regardless of the chosen value of c.
As ode45 is an explicit solver, it suffers from stability-induced step-size restrictions for large c > 0. The implicit solver ode23s, however, does not have to take smaller step-sizes to satisfy the tolerance for big c. In other words: the problem becomes stiffer as the parameter c grows. This is expected from what we saw above.
(2d) Formulate the non-linear equation given by the implicit mid-point rule for the
initial value problem (106).
Solution: With the formula for the implicit mid-point rule
$$y_{k+1} = y_k + h\, f\Big(\frac{y_k + y_{k+1}}{2}\Big),$$
the formulation for (106) is given by
$$\frac{y_{k+1} - y_k}{h} = a \times \Big(\frac{y_k + y_{k+1}}{2}\Big) + c\,\Big(\frac{y_k + y_{k+1}}{2}\Big) \times \Big(a \times \Big(\frac{y_k + y_{k+1}}{2}\Big)\Big).$$
(2e) Solve (106) with $a = [1, 0, 0]^T$, c = 1 up to T = 10. Use the implicit mid-point rule and the class developed for Problem 1 with N = 128 timesteps (use the template cross_template.cpp). Tabulate $\|y_k\|_2$ for the sequence of approximate states generated by the implicit midpoint method. What do you observe?
Solution: See file cross.cpp. As expected from (2a), the norm of the approximate states is constant.
Figure 25: Subproblem (2c): number of steps used by standard MATLAB integrators (ode45, ode23s) in relation to the parameter c.
$\dot y = f(y)$ with smooth $f$.
Solution: The linear implicit mid-point rule is obtained by developing the increment $k_1$ of the implicit mid-point rule into its Taylor series
$$k_1 = f\Big(y_k + \frac h2 k_1\Big) = f(y_k) + \frac h2\, Df(y_k)\, k_1 + O(h^2)$$
and only keeping the linear terms. Since the non-linear method is given by $y_{k+1} = y_k + h k_1$, the linearization reads
$$y_{k+1} = y_k + h\, k_1^{\mathrm{lin}}, \qquad k_1^{\mathrm{lin}} = \Big(I - \frac h2\, Df(y_k)\Big)^{-1} f(y_k).$$
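One step of the linear implicit mid-point rule therefore costs a single linear solve; a sketch (the function name is hypothetical, f and df are passed as callables):

// one step of the linear implicit mid-point rule (sketch)
template <class Function, class Jacobian>
Eigen::VectorXd limpStep(const Eigen::VectorXd & y, Function&& f,
                         Jacobian&& df, double h) {
    const int d = y.size();
    const Eigen::MatrixXd M = Eigen::MatrixXd::Identity(d, d) - 0.5 * h * df(y);
    return y + h * M.lu().solve(f(y));  // k1_lin = (I - h/2 Df(y))^{-1} f(y)
}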
(2g) Implement the linear implicit midpoint rule using the template provided in cross_template.cpp. Use this method to solve (106) with $a = [1, 0, 0]^T$, c = 1 up to T = 10 and N = 128. Tabulate $\|y_k\|_2$ for the sequence of approximate states generated by the linear implicit midpoint method. What do you observe?
Solution: See file cross.cpp. The sequence of the norms is not exactly constant: this is due to the approximation introduced with the linearization.
$$\dot y = f(y) \tag{107}$$
$$\begin{aligned}
W k_1 &= f(y_0), \\
W k_2 &= f\big(y_0 + \tfrac12 h k_1\big) - a h J k_1, \\
y_1 &= y_0 + h k_2,
\end{aligned} \tag{108}$$
where
$$J = Df(y_0), \qquad W = I - a h J, \qquad a = \frac{1}{2 + \sqrt{2}}.$$
(3a) Compute the stability function S of the Rosenbrock method (108), that is, compute the (rational) function $S(z)$, such that
$$y_1 = S(z)\, y_0, \qquad z = h\lambda,$$
when we apply the method to perform one step of size h, starting from $y_0$, of the linear scalar model ODE $\dot y = \lambda y$, $\lambda \in \mathbb{C}$.
Solution: For a scalar ODE, the Jacobian is just the derivative w.r.t. y, whence $J = Df(y) = f'(y) = (\lambda y)' = \lambda$. The quantity W is, therefore, a scalar quantity as well: $W = 1 - ahJ = 1 - ah\lambda$. The defining equations of the increments then yield
$$k_1 = \frac{\lambda y_0}{1 - ah\lambda}, \qquad k_2 = \frac{\lambda\big(y_0 + \frac h2 k_1\big) - ah\lambda k_1}{1 - ah\lambda},$$
so that
$$y_1 = y_0 + h k_2 = y_0\, \frac{(1 - ah\lambda)^2 + h\lambda(1 - ah\lambda) + \frac12 h^2\lambda^2 - a h^2\lambda^2}{(1 - ah\lambda)^2} = y_0\, \frac{(1 + a^2 z^2 - 2az) + z(1 - az) + \frac12 z^2 - a z^2}{(1 - az)^2} = y_0\, \frac{1 + (1 - 2a)z + \big(\frac12 - 2a + a^2\big) z^2}{(1 - az)^2}.$$
Since $\frac12 - 2a + a^2 = 0$, it follows
$$S(z) = \frac{1 + (1 - 2a)z}{(1 - az)^2}.$$
(3b) Compute the first 4 terms of the Taylor expansion of S(z) around z = 0. What is the maximal $q \in \mathbb{N}$ such that $S(z) = e^z + O(z^{q+1})$ as $z \to 0$?
H INT: The idea behind this sub-problem is elucidated in [1, Rem. 12.1.19]. Apply [1,
Lemma 12.1.21].
Solution: We compute:
$$S(0) = 1, \qquad S'(0) = 1, \qquad S''(0) = 4a - 2a^2, \qquad S'''(0) = 18a^2 - 12a^3,$$
therefore
$$S(z) = 1 + z + (2a - a^2)\, z^2 + (3a^2 - 2a^3)\, z^3 + O(z^4) = 1 + z + \frac{z^2}{2} + (3a^2 - 2a^3)\, z^3 + O(z^4),$$
where we used $2a - a^2 = \frac12$ for $a = \frac{1}{2+\sqrt2}$, while $3a^2 - 2a^3 \neq \frac16$.
Using [1, Lemma 12.1.21], we deduce that the maximal order q of the scheme is q = 2.
taking as input function handles for f and Df (e.g. as lambda functions), initial data (vector or scalar) $y_0 = y(0)$, a number of steps N and a final time T. The function returns
the sequence of states generated by the single step method up to t = T , using N equidistant
steps of the Rosenbrock method (108).
H INT: See rosenbrock_template.cpp.
Solution: See rosenbrock.cpp.
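The core of the implementation consists of two linear solves with the same matrix W, so its factorization can be reused; a sketch (the function name is hypothetical):

#include <cmath>
#include <Eigen/Dense>

// one step of size h of the Rosenbrock method (108) (sketch)
template <class Function, class Jacobian>
Eigen::VectorXd rosenbrockStep(const Eigen::VectorXd & y0, Function&& f,
                               Jacobian&& df, double h) {
    const double a = 1. / (2. + std::sqrt(2.));
    const Eigen::MatrixXd J = df(y0);                   // J = Df(y0)
    const int d = y0.size();
    const auto Wlu = (Eigen::MatrixXd::Identity(d, d) - a * h * J).lu();
    const Eigen::VectorXd k1 = Wlu.solve(f(y0));
    const Eigen::VectorXd k2 = Wlu.solve(f(y0 + 0.5 * h * k1) - a * h * J * k1);
    return y0 + h * k2;
}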
(3d) Explore the order of the method (108) empirically by applying it to the IVP for the limit cycle [1, Ex. 12.2.5]:
$$f(y) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} y + \lambda\,(1 - \|y\|^2)\, y, \tag{109}$$
with $\lambda = 1$ and initial state $y_0 = [1, 1]^T$ on [0, 10]. Use fixed timesteps of size $h = 2^{-k}$, $k = 4, \ldots, 10$, and compute a reference solution with step size $h = 2^{-12}$. Monitor the maximal mesh error:
$$\max_j \|y_j - y(t_j)\|_2.$$
The Jacobian needed in (108) is (components of y indexed as $y_0$, $y_1$)
$$Df(y) = \begin{bmatrix} \lambda(1 - \|y\|^2) - 2\lambda y_0^2 & -1 - 2\lambda y_1 y_0 \\ 1 - 2\lambda y_1 y_0 & \lambda(1 - \|y\|^2) - 2\lambda y_1^2 \end{bmatrix}.$$
(3e) Show that the method (108) is L-stable (cf. [1, 12.3.37]).
HINT: To investigate the A-stability, calculate the complex norm of S(z) on the imaginary axis Re z = 0 and apply the maximum principle for holomorphic functions.
Solution: We start by proving that the method is A-stable [1, 12.3.30], meaning that $|S(z)| < 1$ for all $z \in \mathbb{C}$ with $\mathrm{Re}(z) < 0$. First of all, we can compute the complex norm of the stability function at $z = iy$, $y \in \mathbb{R}$, as:
$$|S(iy)|^2 = \frac{|1 + (1 - 2a)\, iy|^2}{|1 - a\, iy|^4} = \frac{1 + (1 - 4a + 4a^2)\, y^2}{(1 + a^2 y^2)^2}.$$
Notice that $1 - 4a + 4a^2 = 2a^2$, therefore:
$$|S(iy)|^2 = \frac{1 + 2a^2 y^2}{1 + 2a^2 y^2 + a^4 y^4} < 1.$$
The norm of the function S is thus bounded on the imaginary axis. Observe that $S(z) \to 0$ as $|z| \to \infty$, which follows from the fact that the degree of the denominator of S is bigger than the polynomial degree of the numerator; notice that $1 - 2a = 1 - \frac{2}{2 + \sqrt2} = \sqrt2 - 1 \neq 0$, so the numerator has degree exactly 1. The only pole of the function is at $z = 1/a > 0$, therefore the function is holomorphic on the left complex plane.
Applying the theorem of the hint, one concludes that the absolute value of the function is bounded by 1 on the left complex plane.
The L-stability then follows from the A-stability together with the fact that $S(z) \to 0$ as $\mathrm{Re}\, z \to -\infty$.
$$\begin{array}{c|cccc}
c_1 & \gamma & & & \\
c_2 & a_{21} & \gamma & & \\
\vdots & \vdots & & \ddots & \\
c_s & a_{s1} & \cdots & a_{s,s-1} & \gamma \\ \hline
 & b_1 & \cdots & b_{s-1} & b_s
\end{array} \;=\; \begin{array}{c|c} c & \mathfrak{A} \\ \hline & b^T \end{array}, \tag{110}$$
with $\gamma \neq 0$.
More concretely, in this problem the scalar linear initial value problem of second order
$$\ddot y(t) + \dot y(t) + y(t) = 0, \qquad y(0) = 1, \quad \dot y(0) = 0 \tag{111}$$
will be considered.
Kutta Method). It is a Runge-Kutta method described by the Butcher scheme
$$\begin{array}{c|cc}
\gamma & \gamma & 0 \\
1 - \gamma & 1 - 2\gamma & \gamma \\ \hline
 & 1/2 & 1/2
\end{array}. \tag{112}$$
(4a) Explain the benefit of using SDIRK-SSMs compared to using Gauss-Radau RK-
SSMs as introduced in [1, Ex. 12.3.44]. In what situations will this benefit matter much?
H INT: Recall that in every step of an implicit RK-SSM we have to solve a non-linear
system of equations for the increments, see [1, Rem. 12.3.24].
(4b) State the equations for the increments $k_1$ and $k_2$ of the Runge-Kutta method (112) applied to the initial value problem corresponding to the differential equation $\dot y = f(t, y)$.
Solution: The increments $k_i$ are given by
$$k_1 = f(t_0 + \gamma h,\; y_0 + \gamma h k_1), \qquad k_2 = f(t_0 + h(1 - \gamma),\; y_0 + h(1 - 2\gamma)\, k_1 + \gamma h k_2).$$
(4c) Show that the stability function S(z) of the SDIRK-method (112) is given by
$$S(z) = \frac{1 + z(1 - 2\gamma) + z^2(1/2 - 2\gamma + \gamma^2)}{(1 - \gamma z)^2}$$
and plot the stability domain using the template stabdomSDIRK.m.
For $\gamma = 1$: is this method
A-stable?
L-stable?
where S(z) is the stability function and $z = h\lambda$.
In the case of the SDIRK-method, we get
$$k_1 = \lambda(y_k + \gamma h k_1), \qquad k_2 = \lambda(y_k + h(1 - 2\gamma)\, k_1 + \gamma h k_2),$$
therefore
$$k_1 = \frac{\lambda}{1 - \gamma h\lambda}\, y_k, \qquad k_2 = \frac{\lambda}{1 - \gamma h\lambda}\big(y_k + h(1 - 2\gamma)\, k_1\big).$$
Furthermore
$$y_{k+1} = y_k + \frac h2 (k_1 + k_2),$$
and with $z = h\lambda$, by plugging in $k_1$ and $k_2$, we arrive at
$$y_{k+1} = \underbrace{\Big(1 + \frac{z}{2(1 - \gamma z)}\Big(2 + \frac{z(1 - 2\gamma)}{1 - \gamma z}\Big)\Big)}_{=S(z)}\, y_k.$$
Hence
$$S(z) = \frac{2(1 - \gamma z)^2 + 2z(1 - \gamma z) + z^2(1 - 2\gamma)}{2(1 - \gamma z)^2} = \frac{1 + z(1 - 2\gamma) + z^2\big(\gamma^2 - 2\gamma + \frac12\big)}{(1 - \gamma z)^2}.$$
Verification of the A-stability of (113): we have to show that $|S_1(z)| \le 1$ on $\{z \in \mathbb{C} \mid \mathrm{Re}\, z < 0\}$. In order to do this we consider the stability function on the imaginary axis:
$$|S_1(iy)|^2 = \frac{|1 - iy - (iy)^2/2|^2}{|1 - iy|^4} = \frac{|1 + y^2/2 - iy|^2}{|1 - iy|^4} = \frac{(1 + y^2/2)^2 + y^2}{(1 + y^2)^2} = \frac{1 + 2y^2 + y^4/4}{1 + 2y^2 + y^4} \le 1, \qquad y \in \mathbb{R}.$$
Since the only pole (z = 1) of the rational function $S_1(z)$ lies in the positive half plane of $\mathbb{C}$, the function $S_1$ is holomorphic in the left half plane. Furthermore $S_1$ is bounded by 1 on the boundary of this half plane (i.e. on the imaginary axis). So by the maximum principle for holomorphic functions (hint), $S_1$ is bounded on the entire left half plane by 1. This implies in particular that $S_1$ is A-stable.
Verification of the L-stability of (113):
$S_1$ is not L-stable (cf. definition [1, 12.3.37]), because
$$\lim_{\mathrm{Re}\, z \to -\infty} S_1(z) = \lim_{\mathrm{Re}\, z \to -\infty} \frac{1 - z - z^2/2}{1 - 2z + z^2} = -\frac12 \neq 0.$$
(4d) Formulate (111) as an initial value problem for a linear first order system for the function $z(t) = (y(t), \dot y(t))^T$.
Solution: Define $z_1 = y$, $z_2 = \dot y$; then the initial value problem
$$\ddot y + \dot y + y = 0, \qquad y(0) = 1, \quad \dot y(0) = 0 \tag{114}$$
is equivalent to the first order system
$$\dot z_1 = z_2, \qquad \dot z_2 = -z_1 - z_2, \tag{115}$$
with initial values $z_1(0) = 1$, $z_2(0) = 0$.
template <class StateType>
StateType sdirtkStep(const StateType & z0, double h, double gamma);
that realizes the numerical evolution of one step of the method (112) for the differential
equation determined in subsubsection (4d) starting from the value z0 and returning the
value of the next step of size h.
H INT: See sdirk_template.cpp.
Solution: Let
$$A = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}.$$
Then
$$k_1 = A z_0 + \gamma h A k_1, \qquad k_2 = A z_0 + h(1 - 2\gamma)\, A k_1 + \gamma h A k_2,$$
so
$$k_1 = (I - \gamma h A)^{-1} A z_0, \qquad k_2 = (I - \gamma h A)^{-1}\big(A z_0 + h(1 - 2\gamma)\, A k_1\big).$$
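In code, one SDIRK step then amounts to two solves with the same matrix $I - \gamma h A$; a sketch matching the declaration above (assuming StateType is an EIGEN vector type such as Eigen::Vector2d):

template <class StateType>
StateType sdirtkStep(const StateType & z0, double h, double gamma) {
    Eigen::Matrix2d A;
    A << 0., 1.,
         -1., -1.;                       // system matrix of (115)
    const auto Mlu = (Eigen::Matrix2d::Identity() - gamma * h * A).lu();
    const StateType k1 = Mlu.solve(A * z0);
    const StateType k2 = Mlu.solve(A * z0 + h * (1. - 2. * gamma) * (A * k1));
    return z0 + 0.5 * h * (k1 + k2);     // b = [1/2, 1/2]
}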
(4f) Use your C++ code to conduct a numerical experiment which gives an indication of the order of the method (with $\gamma = \frac{3 + \sqrt{3}}{6}$) for the initial value problem from subsubsection (4d). Choose $y_0 = [1, 0]^T$ as initial value, T = 10 as end time and N = 20, 40, 80, ..., 10240 as numbers of steps.
Solution: The numerically estimated order of convergence is 3, see sdirk.cpp.
References
[1] R. Hiptmair. Lecture slides for course "Numerical Methods for CSE".
http://www.sam.math.ethz.ch/~hiptmair/tmp/NumCSE/NumCSE15.pdf. 2015.