
Mathematics and Computational Methods

Contact: [email protected]
Department of Materials Science and Engineering
Indian Institute of Technology, Kanpur
Disclaimer: The document (under preparation) is not a mathematics textbook.
The main focus is on enhancing your learning experience by using computer
programs. The document starts with a brief introduction to Python, which will
be used for writing small programs and plotting in the rest of the chapters. You
are advised to read standard mathematics textbooks to understand the topics
and use this document only to improve your knowledge by learning to use simple
computer programs.
Contents

1 Brief Introduction to Python 1


1.1 Input, output and simple arithmetic . . . . . . . . . . . . . . . . . . . 1
1.1.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Mathematical functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Control statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3.1 IF statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3.2 WHILE statement . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3.3 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 FOR loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Lists and Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 User-defined function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.7 Simple plotting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.7.1 2D plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.7.2 3D plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.7.3 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.8 3D graphics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.8.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Numerical Methods 10
2.1 Solution of non-linear equations . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1 Relaxation method . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2 Bisection method . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.3 Newton-Raphson method . . . . . . . . . . . . . . . . . . . . . . 13
2.1.4 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Solution of linear equations . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Gaussian elimination and back-substitution . . . . . . . . . . . 14
2.2.2 Jacobi Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.3 Gauss-Seidel method . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.4 Systematic formulation of iterative solution of Ax=b . . . . . . 19
2.2.5 Convergence criteria for iterative methods . . . . . . . . . . . . 22
2.2.6 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25


2.4 Numerical Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25


2.4.1 Trapezoidal method . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4.2 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5 Numerical differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5.1 Forward difference . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5.2 Backward difference . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5.3 Central difference . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5.4 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6 Maxima and Minima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3 Partial Differentiation 29
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Total differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Some useful formulae . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.1 Chain rule 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2 Chain rule 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.3 Chain rule 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3.4 Reciprocal relation . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3.5 Cyclic relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.6 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 How to find the maximum and minimum . . . . . . . . . . . . . . . . 38
3.4.1 Second derivative test . . . . . . . . . . . . . . . . . . . . . . . . 38
3.4.2 Method of Lagrange multipliers . . . . . . . . . . . . . . . . . . 40
3.4.3 Geometrical interpretation . . . . . . . . . . . . . . . . . . . . . 42
3.4.4 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.5 Change of variables: Legendre transform . . . . . . . . . . . . . . . . . 43
3.5.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.6 Differentiation of integrals . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.6.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4 Multiple integrals 48
4.1 Double and triple integrals . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.1.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2 Change of variables: Jacobians . . . . . . . . . . . . . . . . . . . . . . 51
4.2.1 Cartesian coordinate system . . . . . . . . . . . . . . . . . . . . 51
4.2.2 Polar coordinate system . . . . . . . . . . . . . . . . . . . . . . . 51
4.2.3 Cylindrical coordinate system . . . . . . . . . . . . . . . . . . . 53
4.2.4 Spherical coordinate system . . . . . . . . . . . . . . . . . . . . 54
4.2.5 Jacobians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2.6 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.3 Octave files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5 Vector analysis 58
5.1 Triple products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1.1 Scalar triple product . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1.2 Vector triple product . . . . . . . . . . . . . . . . . . . . . . . . . 58

5.1.3 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2 First derivative of scalar and vector fields . . . . . . . . . . . . . . . . 59
5.2.1 Gradient and directional derivative . . . . . . . . . . . . . . . . 59
5.2.2 Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.2.3 Curl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2.4 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.3 Second derivative of scalar and vector fields . . . . . . . . . . . . . . . 62
5.3.1 Divergence of gradient or Laplacian . . . . . . . . . . . . . . . . 62
5.3.2 Laplacian of a vector field . . . . . . . . . . . . . . . . . . . . . . 62
5.3.3 Curl of gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.3.4 Divergence of curl . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.3.5 Curl of curl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.3.6 Gradient of divergence . . . . . . . . . . . . . . . . . . . . . . . . 62
5.3.7 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.4 Line integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.4.1 Exact Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.4.2 Scalar potential for a conservative force field . . . . . . . . . . . 65
5.4.3 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.5 Green’s theorem in plane . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.5.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.6 Divergence and divergence theorem . . . . . . . . . . . . . . . . . . . . 66
5.6.1 Physical significance of divergence . . . . . . . . . . . . . . . . . 66
5.6.2 Equation of continuity . . . . . . . . . . . . . . . . . . . . . . . . 66
5.6.3 Divergence theorem: volume and surface integral . . . . . . . . 67
5.6.4 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.7 Curl and Stokes’ theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.7.1 Physical significance of curl . . . . . . . . . . . . . . . . . . . . 68
5.7.2 Stokes’ theorem: surface and line integral . . . . . . . . . . 68
5.7.3 Vector potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.7.4 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

6 Coordinate transformation and Tensor analysis 72


6.1 Linear Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.2 Orthogonal Transformation . . . . . . . . . . . . . . . . . . . . . . . . 73
6.2.1 Rotation in 2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.2.2 Rotation in 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.2.3 Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.2.4 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.3 Matrix Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.3.1 Eigenvectors and eigenvalues . . . . . . . . . . . . . . . . . . . 76
6.3.2 Similarity transformation and matrix diagonalization . . . . . 77
6.3.3 Rotation matrix revisited . . . . . . . . . . . . . . . . . . . . . . 79
6.3.4 Quadratic form and principal axis theorem (optional) . . . . . 80
6.3.5 Degenerate eigenvalues: Gram-Schmidt orthogonalization . . . 81
6.3.6 Diagonalization of Hermitian matrices . . . . . . . . . . . . . . 83
6.3.7 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

7 Ordinary Differential Equations 86


7.1 Linear and non-linear equations . . . . . . . . . . . . . . . . . . . . . . 86
7.2 First order differential equations . . . . . . . . . . . . . . . . . . . . . 86
7.2.1 Separable equations (optional) . . . . . . . . . . . . . . . . . . . 87
7.2.2 Exact equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.2.3 Homogeneous equations (optional) . . . . . . . . . . . . . . . . 91
7.2.4 Linear first order equations . . . . . . . . . . . . . . . . . . . . . 93
7.2.5 Bernoulli equation . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.2.6 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7.3 Second order linear differential equations . . . . . . . . . . . . . . . . 99
7.3.1 Constant coefficients and zero right hand side . . . . . . . . . 99
7.3.2 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
7.3.3 Constant coefficients and non-zero right hand side . . . . . . . 105
7.3.4 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.4 Coupled first order differential equations . . . . . . . . . . . . . . . . . 106
7.4.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
7.5 Converting higher order to 1st order equations . . . . . . . . . . . . . . 108
7.5.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

8 Fourier Series and Fourier Transform 110


8.1 Fourier series of functions of period 2π . . . . . . . . . . . . . . . . . . 110
8.1.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.2 Fourier series of functions of period 2L . . . . . . . . . . . . . . . . . . 111
8.2.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
8.3 Fourier cosine/sine series . . . . . . . . . . . . . . . . . . . . . . . . . 113
8.3.1 Fourier series expansion of even functions . . . . . . . . . . . . 113
8.3.2 Fourier series expansion of odd functions . . . . . . . . . . . . 113
8.3.3 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
8.4 Half range expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
8.4.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

9 Series Solutions of Ordinary Differential Equations 115


9.1 Legendre equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
9.2 Bessel equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
9.3 Sturm-Liouville boundary value problems (SL-BVP) . . . . . . . . . . 115
9.3.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

10 Solution of Selected Partial Differential Equations 120


10.1 Boundary and initial conditions . . . . . . . . . . . . . . . . . . . . . . 120
10.2 Classification of PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
10.2.1 Linear PDE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
10.2.2 Non-linear PDE . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
10.2.3 Solutions of linear PDEs: principle of superposition . . . . . . 122
10.2.4 Further classification of second-order linear PDEs . . . . . . . 123
10.2.5 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
10.3 Laplace’s equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
10.3.1 Solution in one dimension . . . . . . . . . . . . . . . . . . . . . 124
10.3.2 Solution in two dimensions . . . . . . . . . . . . . . . . . . . . 125

10.3.3 Solving by separating variables . . . . . . . . . . . . . . . . . . 127


10.3.4 Numerical solution . . . . . . . . . . . . . . . . . . . . . . . . . 133
10.3.5 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
10.4 Diffusion or heat equation . . . . . . . . . . . . . . . . . . . . . . . . . 135
10.4.1 Solution in one dimension . . . . . . . . . . . . . . . . . . . . . 135
10.4.2 Solution in two dimensions . . . . . . . . . . . . . . . . . . . . 137
10.4.3 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
10.5 Wave equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
10.5.1 Solution in one dimension . . . . . . . . . . . . . . . . . . . . . 139
10.5.2 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
10.6 Schrödinger equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
10.7 Poisson’s equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

11 Probability and Statistics 143


11.1 Deterministic vs. stochastic process . . . . . . . . . . . . . . . . . . . 143
11.2 How to count? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
11.2.1 Permutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
11.2.2 Combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
11.2.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
11.2.4 Some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
11.2.5 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
11.3 Discrete probability functions . . . . . . . . . . . . . . . . . . . . . . . 150
11.3.1 Mean value and variance . . . . . . . . . . . . . . . . . . . . . 151
11.3.2 Cumulative distribution function . . . . . . . . . . . . . . . . . 152
11.3.3 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
11.4 Continuous probability functions . . . . . . . . . . . . . . . . . . . . . 153
11.4.1 Mean value and variance . . . . . . . . . . . . . . . . . . . . . 153
11.5 Binomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
11.6 Normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
11.7 Poisson distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

A Introduction to Partial Differential Equations 155


A.1 Classification of PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
A.1.1 Linear PDE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
A.1.2 Non-linear PDE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
A.1.3 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
A.2 Method of characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . 157
A.2.1 General solution . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
A.2.2 Initial condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
A.2.3 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
A.3 Canonical form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
A.3.1 Hyperbolic equation . . . . . . . . . . . . . . . . . . . . . . . . . 163
A.4 Boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
A.5 Elliptic PDE: Laplace equation . . . . . . . . . . . . . . . . . . . . . . . 165
A.6 Hyperbolic PDE: Wave equation . . . . . . . . . . . . . . . . . . . . . . 165
A.6.1 Homogeneous wave equation . . . . . . . . . . . . . . . . . . . . 165
A.6.2 Non-homogeneous wave equation . . . . . . . . . . . . . . . . . 167
List of Figures

2.1 Bisection method: given that f(x) is continuous between x1 and x2
and that f(x1) and f(x2) have opposite signs, there must be at least
one root between x1 and x2. The interval is bisected (xm being the
midpoint) to further narrow down the search, and the sign of f(xm)
tells whether the root is located in [x1, xm] or [xm, x2]. . . . . . . . . 11
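The bracketing-and-halving procedure in the caption translates directly into a few lines of Python. This is a minimal sketch; the test function x² − 2, the bracket and the tolerance are illustrative choices, not taken from the text:

```python
def bisect(f, x1, x2, tol=1e-10):
    """Find a root of f in [x1, x2], assuming f(x1) and f(x2) have opposite signs."""
    if f(x1) * f(x2) > 0:
        raise ValueError("f(x1) and f(x2) must have opposite signs")
    while x2 - x1 > tol:
        xm = 0.5 * (x1 + x2)        # bisect the current interval
        if f(x1) * f(xm) <= 0:      # sign change in [x1, xm]: root lies there
            x2 = xm
        else:                       # otherwise the root lies in [xm, x2]
            x1 = xm
    return 0.5 * (x1 + x2)

# Root of x^2 - 2 between 1 and 2, i.e. sqrt(2)
root = bisect(lambda x: x * x - 2.0, 1.0, 2.0)
```

Each pass halves the bracket, so the error shrinks by a factor of two per iteration regardless of the function.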
2.2 Newton-Raphson method: take some initial guess xn and draw a
tangent at that point. Find the location where the tangent crosses
the x−axis (xn+1 ) and draw a tangent there. Every time we do this,
we move closer to the root. . . . . . . . . . . . . . . . . . . . . . . . . . 13
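The tangent-following idea can be sketched in Python as well; the function, its derivative and the starting guess below are illustrative:

```python
def newton(f, fprime, x, tol=1e-12, max_iter=50):
    """Newton-Raphson: move from x to the point where the tangent at x
    crosses the x-axis, and repeat until the step becomes negligible."""
    for _ in range(max_iter):
        dx = f(x) / fprime(x)   # the tangent at x crosses the axis at x - dx
        x -= dx
        if abs(dx) < tol:
            break
    return x

# Root of x^2 - 2, starting from the guess x = 1
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
```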
2.3 Linear interpolation: we know the values of f(x1) and f(x2), but
would like to know f(x). Linear interpolation is the simplest method
of doing this. Since we know the slope of the line connecting the
points f(x1) and f(x2), we can get f(x) (red point) along this line.
However, it may differ from the actual value of the function at x. The
smaller the distance between x1 and x2, the better the match with
the actual value. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
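A minimal Python sketch of this estimate (using sin as an illustrative test function):

```python
import math

def lerp(x1, f1, x2, f2, x):
    """Linearly interpolate f(x) between the known points (x1, f1) and (x2, f2)."""
    slope = (f2 - f1) / (x2 - x1)   # slope of the connecting line
    return f1 + slope * (x - x1)    # move along the line from (x1, f1)

# Estimate sin(0.5) from the values at 0.4 and 0.6; shrinking the
# spacing between the two known points shrinks the error.
estimate = lerp(0.4, math.sin(0.4), 0.6, math.sin(0.6), 0.5)
```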
2.4 Estimating the area under the curve (a) by dividing the total area
into small rectangular slices of width h and (b) by dividing the area
into a set of trapezoids. Clearly, the second approximation is more accurate. 25
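Summing the trapezoids of panel (b) can be sketched as follows; the integrand x² and the slice count are illustrative choices:

```python
def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] using n trapezoids of width h."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))     # the two end points carry half weight
    for i in range(1, n):
        total += f(a + i * h)       # interior points carry full weight
    return h * total

# Integral of x^2 from 0 to 1 is exactly 1/3
approx = trapezoid(lambda x: x * x, 0.0, 1.0, 1000)
```

The rule is exact for straight-line segments, which is why it beats the rectangular slices of panel (a).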
2.5 Calculation of the first derivative using forward, backward and cen-
tral difference method. The filled circles are data points given at a
regular interval h. In case of the forward and backward difference,
we can calculate the derivative on the same set of points (filled cir-
cles). On the other hand, using the central difference, we can get
the derivative on a different set of points (open circles), lying in the
middle of the original set of data points. . . . . . . . . . . . . . . . . . 27
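The three difference formulas can be sketched as below; sin and the step size h are illustrative:

```python
import math

def forward_diff(f, x, h):
    return (f(x + h) - f(x)) / h              # error of order h

def backward_diff(f, x, h):
    return (f(x) - f(x - h)) / h              # error of order h

def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)  # error of order h^2

# Derivative of sin at x = 1 (exact value: cos(1))
h = 1e-3
d_fwd = forward_diff(math.sin, 1.0, h)
d_cen = central_diff(math.sin, 1.0, h)
```

For the same h, the central difference is markedly more accurate, at the cost of delivering the derivative at the midpoints rather than at the original data points.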

3.1 A function of two variables z(x, y). Along the red curve, y is constant
and along the blue curve, x is constant. Note that, along the red
curve, we can equivalently define x(y, z) with y constant. Similarly,
along the blue curve, we can equivalently define y(x, z) with x con-
stant. Along the dotted curve, we can define either x(y, z) or y(x, z)
with z constant. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Tangent or linear approximation of differential ∆y ≈ dy = y 0 dx. . . . . 32
3.3 Curve with a (a) maximum point, (b) minimum point and (c) inflec-
tion point. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38


3.4 Surface with a (a) maximum point, (b) minimum point and (c) saddle
point. Along the blue (red) curve, x (y) remains constant. The two
curves cross each other at the origin. . . . . . . . . . . . . . . . . . . 39
3.5 Finding the minimum distance d = √(x² + y²) from the origin, with the
constraint that the point lies on the curve. . . . . . . . . . . . . . . . 40
3.6 Tangent drawn at (a) minimum d and (b) any other d. See Fig 3.5 for
the definition of d. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.1 Various areas over which a double integral needs to be calculated. . . 49
4.2 Polar coordinate system. . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.3 Cylindrical coordinate system. . . . . . . . . . . . . . . . . . . . . . . . 53
4.4 Spherical coordinate system. . . . . . . . . . . . . . . . . . . . . . . . . 54

5.1 Example of a scalar field. . . . . . . . . . . . . . . . . . . . . . . . . . . 60


5.2 The amount of water crossing through area A′ is the same as the
amount of water crossing through area A. . . . . . . . . . . . . . . . 67
5.3 (left) Line integral over a closed path in the xy plane, such that the
normal points along k̂ at the given point. (center) We generalize the
area element and take it to be on the surface of a hemisphere.
(right) Flat view of the hemisphere. . . . . . . . . . . . . . . . . . . . 68
5.4 In a fishing net, the net forms the open surface and the rim (made
of metal or plastic) is the curve bounding the open surface. . . . . . . 69

6.1 Vector r⃗ can be transformed to vector r⃗′ in various ways. . . . . . . . 72


6.2 Anti-clockwise rotation by an angle θ: We can either rotate the vector
keeping the reference axes fixed (left) or rotate the reference system
keeping the vector fixed (right). . . . . . . . . . . . . . . . . . . . . . . 74
6.3 Reflection of a vector about a line making an angle θ with the x-axis. 75
6.4 (Left) When multiplied by some matrix A, an ordinary vector v⃗
transforms to v⃗′. (Right) When multiplied by some matrix A, an
eigenvector v⃗ transforms to λv⃗. . . . . . . . . . . . . . . . . . . . . . . 76
6.5 In the (x, y) coordinate system, vector r⃗ is transformed to vector R⃗ by
some transformation matrix A. If we rotate the coordinate system
(rotation matrix B) to go to a new coordinate system (x′, y′), then
r⃗′ is transformed to vector R⃗′ (same transformation). In the new
coordinate system, the transformation matrix from r⃗′ to R⃗′ is B⁻¹AB.
This is known as the similarity transformation. . . . . . . . . . . . . . 78
6.6 A quadratic equation 5x₁² − 4x₁x₂ + 5x₂² = 20, when represented with
respect to the principal axes, simplifies to 3y₁² + 7y₂² = 20. The principal
axes y₁ and y₂ are oriented along (î + ĵ) and (−î + ĵ), respectively. . . 81
6.7 Two vectors u⃗ and v⃗ are not orthogonal. We can define a set of
vectors, u⃗′ = u⃗ and v⃗′ = v⃗ − (û · v⃗)û, which are perpendicular to each
other. The blue vectors in the diagram just show the direction of u⃗′ and v⃗′. 82

7.1 Displacement as a function of time for damped harmonic motion. . . 103


7.2 Different systems having similar differential equations: (a) spring-
mass and (b) RLC circuit connected in series. Images are taken from
Wikipedia. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

10.1 Boundary conditions in one (left) and two (right) dimensions. Within
the domain, the unknown function f is determined by the differential
equation (Laplace's equation in this case, Δ = d²/dx² in 1D and
Δ = ∂²/∂x² + ∂²/∂y² in 2D). Along the boundaries, values of f are
given by the boundary conditions. Obviously, the function f must
vary smoothly as we move from the interior to the boundary of a
domain. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
10.2 Harmonic functions obey mean-value property – average value of the
function at the boundary is equal to its value at the center. Examples
shown for (a) one-dimensional domain, (b) and (c) two-dimensional
domain. In case of (b), f = 10 and f = 0 along the top and bottom
part of the perimeter, respectively, such that f = 5 along the line
lying in the middle. What would be the value of the function at the
center in case of (c), where the function is zero everywhere, except at
a single point at the boundary? . . . . . . . . . . . . . . . . . . . . . . 124
10.3 Plot of two harmonic functions (a) x2 − y 2 and (b) 2xy. They do not
have any maximum or minimum, but only a saddle point at (0, 0).
The third one (c) x2 + y 2 is not a harmonic function and it has a
minimum at (0, 0). We can easily verify that x2 + y 2 does not satisfy
Laplace’s equation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
10.4 A bar, having finite width in the y-direction and semi-infinite in the
x-direction. One side (along the y-axis) is held at 200° and the long
sides (along the x-axis) are held at 0°. The far end is also held at 0°.
What would be the temperature distribution within the bar? . . . . . 127
10.5 Eq. 10.21 plotted for n ranging from (a) 1–3, (b) 1–29, and (c) 1–299. . 129
10.6 Temperature distribution in a circular plate, with boundary condi-
tions shown in Figure 10.2(b). I have plotted Eq. 10.37 by including
99 terms in the sum. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
10.7 Finite difference method: the domain is divided in a square of rectan-
gular grid. Note that, grid points (red) are also placed at the bound-
ary. However, values of f (x, y) remain fixed (boundary conditions) at
these grid points. On the other hand, f (x, y) changes with each iter-
ation inside the domain (black points), until convergence is achieved. 133
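The iteration described in the caption can be sketched in Python. The grid size, the boundary values (top edge held at 100, the other edges at 0) and the iteration count below are illustrative choices, not the setup of the figure:

```python
# Finite-difference (Jacobi-style) relaxation for Laplace's equation:
# each interior point is repeatedly replaced by the average of its four
# neighbours, while the boundary points stay fixed.
n = 20
f = [[0.0] * n for _ in range(n)]
for j in range(n):
    f[0][j] = 100.0                 # fixed boundary condition on the top edge

for _ in range(2000):               # iterate towards convergence
    new = [row[:] for row in f]     # update from the previous sweep only
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (f[i - 1][j] + f[i + 1][j]
                                + f[i][j - 1] + f[i][j + 1])
    f = new
```

By symmetry, the converged solution averaged over the four central grid points should equal 100/4 = 25, which makes a handy sanity check for the iteration.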
10.8 (a) Boundary conditions in a square plate, (b) analytically calculated
temperature distribution (Eq. 10.25) and (c) numerically calculated
temperature distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . 135
10.9 (a) A bar is uniformly heated to 50° initially. Then, two of its faces
(red) are brought in contact with thermal reservoirs at 0° and the rest
of the faces (white) are insulated, such that heat flow is essentially
one-dimensional. (b) The temperature profile as a function of time,
as the bar cools down. . . . . . . . . . . . . . . . . . . . . . . . . . . 136
10.10 Let us imagine that a sound is created at the middle of a long tunnel.
At t = 0, the pulse has the form e^(−x²). Half of it travels to the right
and the rest travels to the left. I assume v = 1. . . . . . . . . . . . . . 138
10.11 Wave propagation for initial position G(x) = e^(−x²) and initial velocity
H(x) = 0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
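For zero initial velocity and v = 1, the split pulse of Figures 10.10 and 10.11 is d'Alembert's solution u(x, t) = [G(x − t) + G(x + t)]/2, which can be sketched directly:

```python
import math

# d'Alembert's solution for initial shape G(x) = exp(-x^2), zero initial
# velocity and wave speed v = 1: half the pulse travels each way.
def G(x):
    return math.exp(-x * x)

def u(x, t):
    return 0.5 * (G(x - t) + G(x + t))
```

At t = 0 the two halves coincide and reproduce G(x); at later times each half keeps its shape and moves with unit speed.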

10.12 Wave propagation for initial position G(x) = 0 and initial velocity
H(x) = 2x e^(−x²). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

11.1 (a) The number of all possible arrangements of 15 distinguishable
pool balls in 15 slots is a factorial. The first slot can be filled by one
of the 15 balls, the second slot by one of the remaining 14 balls, and
so on. Thus, there are 15 × 14 × 13 · · · 2 × 1 = 1307674368000 = 15!
arrangements possible. (b) If only 3 slots are available, the total
number of possible arrangements reduces to 15 × 14 × 13 = 2730.
Both (a) and (b) are examples of permutation when repetition is not
allowed. (c) A combination lock is an example of permutation when
repetition is allowed. There are 4 slots and each can be occupied by
one of the 10 single digits, ranging from 0–9. The correct combination is one of
the 10⁴ possible 4-digit numbers. In case of any permutation, order
does matter, i.e., (1234) is different from (4321). (d) In how many
ways can we make a committee of 3 out of 5? In this case, order
does not matter, i.e., a committee made of (X,Y,Z) is the same as (X,Z,Y)
or (Z,X,Y), etc. This is an example of combination, when repetition is
not allowed. (e) In how many ways can we arrange the balls when
some of them are indistinguishable: 7 red, 7 yellow and 1 black? (f)
In how many ways can you choose 3 scoops of ice cream out of the n
flavors available in the shop? Repetition is allowed in this
case, i.e., you can opt for all 3 chocolate, or 1 chocolate, 1 vanilla & 1
strawberry, or 2 chocolate & 1 strawberry, or any other combination.
Order does not matter in this case, i.e., an arrangement of chocolate
on top, followed by vanilla and strawberry, is the same as vanilla on top,
followed by strawberry and chocolate. . . . . . . . . . . . . . . . . . . 145
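The counts quoted in the caption can be checked directly with Python's math module (math.perm and math.comb need Python 3.8+; for case (f), n = 5 flavors is assumed here, as in Fig. 11.2):

```python
import math

factorial_15 = math.factorial(15)   # (a) arrangements of 15 balls in 15 slots
perm_15_3 = math.perm(15, 3)        # (b) 3 ordered slots filled from 15 balls
lock_codes = 10 ** 4                # (c) 4-digit lock, repetition allowed
committees = math.comb(5, 3)        # (d) unordered committees of 3 out of 5
# (e) arrangements of 7 red, 7 yellow and 1 black ball
#     (balls of the same colour are indistinguishable)
ball_rows = math.factorial(15) // (math.factorial(7) * math.factorial(7))
# (f) 3 scoops from n = 5 flavours, repetition allowed, order irrelevant
scoops = math.comb(5 + 3 - 1, 3)
```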
11.2 Three (r) scoops of ice cream (blue dots) chosen from five (n) different
flavors: (a) 3 chocolate, (b) 1 vanilla, 1 strawberry, 1 butterscotch,
(c) 2 chocolate, 1 mango and (d) 1 chocolate, 1 vanilla, 1 mango.
In order to find all possible combinations, we have to calculate the
number of ways 7 objects can be arranged, which can be divided into
two types: 3 blue dots and 4 red vertical lines. . . . . . . . . . . . . . 148
11.3 Plot of the probability functions: (a) throwing a single die and (b)
throwing two dice simultaneously (see Table 11.2). (c) Plot of cumu-
lative probability distribution function in case of throwing two dice. . 151
11.4 (a) Bar chart of the probability function shown in Fig. 11.3(b). Height
of each bar is proportional to the value of the data point and width
of each bar is equal to 1. (b) Width of each bar halved and additional
points added. We can keep halving the width, until the points start
touching each other (in the limit of bar width → 0) and we get a
continuous line, which represents a continuous probability distribution
function. (c) In this case, probability is given by the area under the
curve, ∫ₐᵇ f(x) dx. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

A.1 Graph of f(x, y): two possible solutions of Eq. A.15, plotted assuming
a = b = 1. Contour lines or level curves are shown at the base of
the plots. Along the contour lines, f(x, y) is constant. Contour lines
are parallel to y = x − constant. . . . . . . . . . . . . . . . . . . . . . . 157
A.2 Solution of Eq. A.15 (a = b = 1): f (x, y) = u(x − y) = constant and
initial condition is given along the x−axis as f (x, 0) = g(x). Values on
the x−axis will be “carried” or “transported” along the straight lines,
because f (x, y) is constant along the lines. Thus, the solution can be
written as, f (x, y) = g(x − y). . . . . . . . . . . . . . . . . . . . . . . . . 159
A.3 The data curve (solid line) is given along the x−axis: f (x, 0) = g(x).
The characteristic curves (dashed lines) originate from the data curve.161
A.4 (Left) Color map with contour lines for f(x, y) = x/(y+1) = r. (Right)
Contour lines are along x = r(y + 1) and f(x, y) = constant = r along
these lines. Clearly, there exists a singularity at the point (0, -1). . . . 162
List of Tables
2.1 Values of x obtained at different steps while solving x = e^(−x²) by re-
laxation method. We start with an initial guess of 1. . . . . . . . . . . 10
2.2 Values of x0 , x1 and x2 after successive iterations using the Jacobi
method, starting with the initial values of x0 = −2, x1 = 2 and x2 = 3. 18
2.3 Different iterative schemes for solving a set of linear equations Ax = b. 20
10.1 List of boundary conditions for a one-dimensional domain, extending
from 0 to l. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
10.2 List of important partial differential equations: ∇ is the gradient op-
erator and f (x, y, z) or f (r, θ, φ) is a function of multiple variables in
general, depending on whether we are using the Cartesian or spher-
ical coordinate system or some other coordinate system. In case of
Laplace, Poisson and Helmholtz equation, f(x, y) = X(x)Y(y) are so-
lutions for ∇² = ∂²/∂x² + ∂²/∂y². In case of diffusion, Schrödinger and wave
equation, f(x, t) = X(x)T(t) are solutions for ∇² = d²/dx². . . . . . . . . 123
10.3 Real and imaginary part of the function f (x, y) = (x + ιy)n . Interest-
ingly, all of them are harmonic functions. . . . . . . . . . . . . . . . . 125
11.1 List of all possible outcomes (known as the sample space) if we throw
two dice simultaneously. Each outcome is termed as a sample point
and there are 36 sample points in this case. . . . . . . . . . . . . . . . 144
11.2 A sample space (sum of two dice) and probability derived from Ta-
ble 11.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Chapter 1

Brief Introduction to Python

If you have ever done any introductory coding course, you must have started with
the “Hello world” code. First, create a file named “hello.py”, type the following and
save it.
print(’Hello world’)

In a Linux terminal, type the following and press enter, and you are on your way
to tame Python.
python hello.py

1.1 Input, output and simple arithmetic


Generally, in a computer program, we input some values; it calculates certain
things and gives us some output. Here is an example of a code calculating the
sum of two numbers.
x = float(input("Type the value of x: "))
y = float(input("Type the value of y: "))
print("Calculating the sum x+y",x+y)
Note that we have used float, but the input can be an integer or a complex number as
well. Do the following exercises to practice other simple arithmetic operations like
subtraction, multiplication, division, and raising to a power.

1.1.1 Exercise
1. Write a code to subtract y from x (Ans: x − y).
2. Write a code to multiply x with y (Ans: x * y).
3. Write a code to divide x by y (Ans: x / y).
4. Write a code for raising x to the power of y (Ans: x ** y).
5. Take two integers and find out what you get by doing x//y and x%y. The
second one is known as modulo.
6. Write a code to get the distance traveled s = ut + (1/2)at², where one has to input
the values of u, a, t to calculate the value of s.
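The last exercise can be sketched as follows; the numerical values of u, a and t are illustrative assumptions, and the exercise asks you to read them with input() instead.

```python
# Distance traveled under uniform acceleration: s = u*t + (1/2)*a*t**2.
# The values below are hard-coded for illustration; replace them with
# float(input(...)) calls to take user input as in the exercise.
u = 5.0   # initial velocity
a = 2.0   # acceleration
t = 3.0   # time
s = u * t + 0.5 * a * t**2
print("Distance traveled s =", s)   # 24.0 for these values
```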


1.2 Mathematical functions


Let us learn how to handle more complicated mathematical functions than just
doing simple arithmetic. First, one has to “import” the function, say exponential,
from the numpy package.¹
from numpy import exp
x = exp(2)
print("Value of e**2 is", x)

The numpy package contains several functions, like log (both natural and base
10), exponential, trigonometric, hyperbolic, positive square root, and constants
like e and pi. One can import more than one function at a time.

from numpy import exp,log,log10,sin,cos,arccos,arctan,sinh,cosh,tanh,sqrt,e,pi

The following command will import all the functions available in the math pack-
age, although it may not be a good practice.

from math import *

There is an alternate way of doing the above operation.


import numpy as np
x = np.exp(2)
print("Value of e**2 is", x)

1.2.1 Exercise
1. Write a code that will take r and θ values (polar coordinates) as input and
convert them to Cartesian coordinates.
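A minimal sketch of this exercise, using the standard conversion x = r cos θ, y = r sin θ; the sample values of r and θ are made-up assumptions for illustration.

```python
import numpy as np

# Convert polar coordinates (r, theta) to Cartesian (x, y).
# r and theta are hard-coded here; read them with input() as the exercise asks.
r = 2.0
theta = np.pi / 3          # 60 degrees, in radians
x = r * np.cos(theta)
y = r * np.sin(theta)
print("x =", x, "y =", y)
```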

1.3 Control statements


Now let us endow our codes with decision-making power. The code will decide
what to do next based on the conditions it checks. There are several ways of
executing this.

1.3.1 IF statement
Let us see an example where we want to check whether an integer is divisible by
13 or not.
m = int(input("Type any integer m: "))
if m%13==0 :
    print("m is divisible by 13")
else :
    print("m is not divisible by 13")
¹ We already have used some built-in functions like print, input, float, which are not part of any
package.


Note the space (indentation) at the beginning of the lines following the if and else
statements. Let us show another example where we need to check whether a
person is eligible to get a vaccine or not; the eligibility criterion being that the age
of the person should lie between 18 and 45.
m = int(input("Type the age in integer: "))
if m<=17 :
    print("not eligible, age should be at least 18")
elif m>=46:
    print("not eligible, age should be at most 45")
else:
    print("eligible")
Here elif stands for else if and we can have as many of them as we wish, each
one checking for a new condition. We could have combined the selection criteria
in a single line,
m = int(input("Type the age in integer: "))
if m<=17 or m>=46:
    print("not eligible, age should be at least 18 or at most 45")
else:
    print("eligible")

1.3.2 WHILE statement


The while statement is similar to the if statement, but it performs the operation
repetitively. Let us rewrite the code for identifying numbers divisible by 13, but this
time we use the while statement instead of the if statement.
print("Testing whether you know an integer divisible by 13")
print("The code will terminate if you are right")
m = int(input("Type any integer m: "))
while m%13 != 0:
    print("m is not divisible by 13, try again")
    m = int(input("Type any integer m: "))
Unlike the previous code (using the if statement), this one will keep asking for
user input until it finds a number divisible by 13. Suppose you are testing
whether someone is dumb or not and do not want to give more than ten chances.
print("Testing whether you know an integer divisible by 13")
print("The code will terminate if you are right")
m = int(input("Type any integer m: "))
s=0
while m%13 != 0:
    s += 1
    if s > 9:
        print("WRONG ATTEMPTS. 10 chances exhausted!!")
        break
    print("m is not divisible by 13, try again")
    m = int(input("Type any integer m: "))

1.3.3 Exercise
1. Write a code to find out whether an integer is even or odd.
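A possible sketch for this exercise, using the modulo operation introduced earlier; the value of m is hard-coded for illustration.

```python
# An integer is even exactly when dividing it by 2 leaves no remainder.
m = 7    # hard-coded here; use int(input("Type any integer m: ")) for input
if m % 2 == 0:
    print("m is even")
else:
    print("m is odd")
```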


1.4 FOR loop


We use loops to do something repetitively, like executing a portion of the code
again and again. We already have learned how to do it using the WHILE state-
ment. However, FOR loops are commonly used for this purpose. So, how do we
tell the code that we want to execute a portion of it, say 1000 times? Python pro-
vides us a built-in function named range to do this. For example, the following
code prints the integers, starting from 0 to 99.
for n in range(100):
    print(n)

Note that, by default, the loop starts from zero. However, we can start from any
number, say 90 to 100.
for n in range(90,100):
    print(n)

Also, note that, by default, the interval between two numbers is one, which we
can change according to need, for example, to 5.
for n in range(10,100,5):
    print(n)

We can even move backward by taking some negative interval, for example, 100
to 15.
for n in range(100,10,-5):
    print(n)

Let us do a meaningful exercise, rather than just printing some numbers; for
example, calculate the sum of all integers from 1 to 100, i.e., Σ_{n=1}^{100} n.

s=0
for n in range(101):
    s += n
print(s)

1.4.1 Exercise
1. Calculate the sum Σ_{n=1}^{m} 1/n (n is an integer), till the result converges to
the third decimal place.

2. Calculate the sum Σ_{n=1}^{m} 1/n² (n is an integer), till the result converges to
the third decimal place.

3. Calculate the sum Σ_{n=1}^{m} 1/n⁴ (n is an integer), till the result converges to
the third decimal place. Compare the values of m for various powers of n and
confirm that the sum converges faster for higher powers of n.
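One way to implement the convergence test in these exercises is to stop once the next term is smaller than 0.001, so that successive partial sums agree at the third decimal place. A sketch for the second sum; this stopping rule is one possible reading of "converges to the third decimal place".

```python
# Sum 1/n**2 until the next term would change the result by less than 0.001.
s = 0.0
n = 1
while 1.0 / n**2 >= 0.001:
    s += 1.0 / n**2
    n += 1
print("Stopped after", n - 1, "terms; sum =", s)
```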


1.5 Lists and Arrays


In science and engineering, we often have to deal with vectors and matrices. Let
us learn how to handle them in Python code. First, let us write a code to find the
length of a vector.
import numpy as np
v = [1.0, 2.0, 3.0]
length = np.sqrt( v[0]**2 + v[1]**2 + v[2]**2 )
print(length)
v = [1.0, 2.0, 3.0] defines a list containing three real numbers, representing coor-
dinates of a point in 3D space. We can do it in different ways, for example,
import numpy as np
x = 1.0
y = 2.0
z = 3.0
v = [x, y, z]
length = np.sqrt( v[0]**2 + v[1]**2 + v[2]**2 )
print(length)
Note that merely changing the values of x, y or z after the statement v = [x, y, z]
will not change the list. Instead, you have to redefine v for that to happen. One
way of doing this is,
v = [1.0, 2.0, 3.0]
print(v)
v[0] = 5.0
print(v)
v[1] = 6.0
print(v)
v[2] = 8.0
print(v)
There are several built-in functions like max, min, len, and sum. Find out yourself
how they act on a given list.
v = [1.0, 2.0, 3.0]
a = max(v)
print(a)
a = min(v)
print(a)
a = len(v)
print(a)
a = sum(v)
print(a)
Another handy function, map, applies some other mathematical function to each
element belonging to a list. Thus, for example, we can calculate the exponential
of all the numbers present in a list.
import numpy as np
v = [0.0, 1.0, 2.0, 3.0]
expval = list(map(np.exp,v))
print(expval)


Note that we have converted the values to another list by using another built-in
function list. We can add an element to the existing list of elements.
v = [1.0, 2.0, 3.0]
print(v)
v.append(4.0)
print(v)
We can also create an empty list by v = [ ] and keep adding elements by using
v.append(1.0) etc. Similarly, we can eliminate an element from a list by using
v.pop().
Lists are one-dimensional, and thus, they have limited scope. Therefore, you
have to use arrays, which can handle vectors, as well as matrices. We will use
some built-in functions from the NumPy package. For example, let us create a
vector containing three integer elements:
import numpy as np
v = np.zeros(3,int)
print(v)
v[0] = 1
v[1] = 2
v[2] = 3
print(v)
Similarly, we can create a 2 × 2 matrix by doing,
import numpy as np
v = np.zeros([2,2],int)
print(v)
v[0,0] = 1
v[0,1] = 2
v[1,0] = 3
v[1,1] = 4
print(v)
Note how we are referring to the individual elements of an array, like v[0], v[1] or
v[0, 1], v[1, 0] etc. Instead of all zeros, we can also create arrays with all ones, by
calling the function ones from the NumPy package, like v = ones(3, int). Similarly,
we can create an empty array using the function empty from the NumPy package,
like v = empty(3, int). We can take a list and convert it to an array by using the
function array from the NumPy package. Check various ways you can create
arrays from lists.
import numpy as np
r = [1.0, 2.0, 3.0]
v = np.array(r, float)
print(v)
v = np.array([4.0, 5.0, 6.0], float)
print(v)
v = np.array([[1, 2],[3,4]], int)
print(v)

1.5.1 Exercise
1. Define a list having five elements and find their mean value.
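A minimal sketch for this exercise; the five values in the list are arbitrary.

```python
# Mean value of a five-element list via the built-in sum and len functions.
v = [2.0, 4.0, 6.0, 8.0, 10.0]
mean = sum(v) / len(v)
print("Mean value:", mean)   # 6.0 for these values
```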


1.6 User-defined function


We already have learned to use functions from the NumPy package. Now, let
us learn to write user-defined functions. For example, the following program
calculates the square of the distance of a point from the origin.
def f(x,y):
return x**2 + y**2
x=float(input("Type the value of x: "))
y=float(input("Type the value of y: "))
print(f(x,y))

1.6.1 Exercise
1. Write a code to find whether f(x) = sin(x)/x has any root between x = 2 and x = 4.
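One possible approach to this exercise: check for a sign change of f between the two end points, which for a continuous function guarantees a root in between (here sin x flips sign at x = π). A sketch:

```python
import numpy as np

# f(x) = sin(x)/x; opposite signs of f(2) and f(4) imply a root in [2, 4],
# since f is continuous on that interval.
def f(x):
    return np.sin(x) / x

print("f(2) =", f(2.0))
print("f(4) =", f(4.0))
if f(2.0) * f(4.0) < 0:
    print("Sign change: a root exists between 2 and 4")
```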

1.7 Simple plotting


One of the reasons behind Python's popularity lies in handling everything
seamlessly, starting from coding to plotting.

1.7.1 2D plots
We can create ordinary graphs in Python by using the function plot from the
matplotlib.pyplot package.
import matplotlib.pyplot as plt
x = [-5.0, -4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [25.0, 16.0, 9.0, 4.0, 1.0, 0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
plt.plot(x,y)
plt.show( )

Note that values of the dependent and independent variables are inserted via two
lists. Instead of inputting the data directly in the program file (say you have 10000
data points, which makes it nearly impossible to do it this way), it is better to read
the data from some file.
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt("data1.dat", float)
x = data[:,0]
y = data[:,1]
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.xlim(-5,5)
plt.ylim(0,25)
plt.plot(x,y)
plt.show( )

The data file data1.dat consists of two columns like,


-5.0 25.0
-4.0 16.0
-3.0 9.0
-2.0 4.0
-1.0 1.0
0.0 0.0
1.0 1.0
2.0 4.0
3.0 9.0
4.0 16.0
5.0 25.0
We can also plot functions, for example a trigonometric function like cos x. In this
case, we have to create an array of x−values by using the linspace function from the
numpy package.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-6.28,6.28,100)
y = np.cos(x)
plt.plot(x,y)
plt.show( )

1.7.2 3D plots
The following code illustrates how to do 3D plotting (for example, a function
z(x, y) = x² + y²) in Python.
from mpl_toolkits import mplot3d
import matplotlib.pyplot as plt
import numpy as np
def f(x,y):
    value=x**2 + y**2
    return value
x = np.linspace(-2,2,100)
y = np.linspace(-2,2,100)
x, y = np.meshgrid(x,y)
z = f(x,y)
fig = plt.figure( )
ax = plt.axes(projection="3d")
ax.plot_surface(x,y,z)
plt.show( )

1.7.3 Exercise
1. Plot y(x) = sin(x)/x.

2. Plot z(x, y) = x² − y².
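For the first exercise, note that sin(x)/x is 0/0 at x = 0; a sketch that simply filters out such a sample point (the plotting range and number of points are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot y = sin(x)/x. The filter below guards against an exact x = 0 sample,
# where the formula would be 0/0 (the limiting value there is 1).
x = np.linspace(-10, 10, 201)
x = x[x != 0]
y = np.sin(x) / x
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("sin(x)/x")
plt.show()
```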

1.8 3D graphics
Python also provides us tools for generating 3D graphics. We have to import
vpython package for this purpose. The following code shows how to generate a


series of spheres connected by a line. You can think of it as a one-dimensional chain
of atoms.
import vpython as vpy
L=2
R=0.2
for i in range(-L,L+1):
    vpy.sphere(pos = vpy.vector(i,0,0), radius=R)
    vpy.curve(pos = [vpy.vector(i-1,0,0),vpy.vector(i,0,0)])

1.8.1 Exercise
1. Write a code to generate a square lattice.

2. Write a code to generate a centered rectangular lattice.

Chapter 2

Numerical Methods

While a problem’s analytical solution is the most accurate one, we often need to
find a numerical answer.¹ In this chapter, I am going to describe the algorithms
of several numerical methods, as well as some example Python codes. Read the
Appendix section for a brief introduction to Python programming.

2.1 Solution of non-linear equations


In this section, we shall learn to numerically find the roots of non-linear equations
of single variables, containing quadratic or even higher-order terms, logarithmic,
trigonometric terms, etc.

2.1.1 Relaxation method


Suppose we have to solve a non-linear equation x = e^(−x²). There is no analytical
method of solving this equation. We can either solve graphically, by plotting the
straight line y = x and the curve y = e^(−x²), and finding their point of intersection.
Or, we can solve numerically by starting with some initial guess and iterating to
the point of convergence. This is known as the relaxation method, the simplest
¹ Remember that computers were originally invented for this purpose only!

Iteration x
0 1
1 0.36787944
2 0.87342302
3 0.46632719
10 0.7086265
20 0.66416843
30 0.65520278
40 0.65252326

Table 2.1: Values of x obtained at different steps while solving x = e^(−x²) by relax-
ation method. We start with an initial guess of 1.


Figure 2.1: Bisection method: given f (x) is continuous between x1 and x2 and
if f (x1 ) and f (x2 ) have opposite signs, then there must be one root between x1
and x2 . The interval is bisected (xm being the midpoint) to further narrow down
the search and sign of f (xm ) tells whether the root is located between [x1 , xm ] or
[xm , x2 ].

way of numerically solving a non-linear equation. Let us write a Python code to do
it using a computer.

import math as ma
x = float(input("Enter initial guess:"))
for n in range(50):
    x1 = x
    x = ma.exp(-x*x)
    err = abs(x1 - x)
    if err < 0.001:
        break
print("The solution is:")
print(x)
Starting with an initial value of x0 = 1, we get x1 = e^(−x0²) = e^(−1) = 0.36787944 in the
next step. We repeat the process and get x2 = e^(−x1²) = e^(−0.13533528) = 0.87342302. We
have to keep doing this (see Table 2.1) till the values converge adequately; say the
difference between two successive x values become less than 0.001.

2.1.2 Bisection method


The working principle of the bisection method is graphically depicted in Fig-
ure 2.1. Given f (x) is a continuous function between x1 and x2 , a root exists
within this interval if the signs of f (x1 ) and f (x2 ) are different. However, the root
(f (x) crossing the zero line) can be anywhere between x1 and x2 . So how do we


find it precisely? First, we bisect the interval [x1, x2] into two halves [x1, xm] and
[xm, x2] (this being the origin of the name: bisection method). Now, the sign of
f(xm) is going to determine whether the root is in the interval [x1, xm] or [xm, x2].
We have to keep bisecting the interval in this manner until it becomes so small
that we can confidently say that the root exists at the midpoint. Let us discuss
the algorithm and write a Python code based on that.

• Step 1: Select the interval and check whether f (x1 )f (x2 ) > 0. If true, change
the initial guess of the interval, as the root does not exist within this interval.
• Step 2: Halve the interval at xm = (x1 + x2)/2.

• Step 3: If f (x1 )f (xm ) < 0, then the root lies between x1 and xm . In this case,
keep the lower boundary at x1 , but reset the upper boundary to x2 = xm .
Otherwise, if f (x1 )f (xm ) > 0, then the root lies between xm and x2 . In this
case, reset the lower boundary to x1 = xm , but keep the upper boundary at
x2 .

• Step 4: Check whether the lower and upper boundary distance is sufficiently
small (say less than 0.001). In that case, take the midpoint of the upper and
lower boundary to be the final answer. Otherwise, go back to step 2.

# Define the function
def f(x):
    return x**3 - 15*x - 4

# Bisection part
def bisect(x1,x2):
    if(f(x1) * f(x2) >= 0):
        print("Not a good choice of the interval")
        return
    # Loop starts and will run for 500 steps
    for i in range(500):
        xm = (x1 + x2) / 2
        if(abs(x2 - x1) <= 0.001):
            break
        # The middle point can be a root (by fortune!)
        if(f(xm) == 0.0):
            break
        # Find out whether the root is in the left or right of the midpoint
        if(f(x1) * f(xm) < 0.0):
            x2 = xm
        else:
            x1 = xm
    print("Root at:",xm)
    print("Value of the function:",f(xm))

# Main code
x1=float(input("Enter the value of x1:"))
x2=float(input("Enter the value of x2:"))
bisect(x1,x2)


Figure 2.2: Newton-Raphson method: take some initial guess xn and draw a
tangent at that point. Find the location where the tangent crosses the x−axis
(xn+1 ) and draw a tangent there. Every time we do this, we move closer to the
root.

2.1.3 Newton-Raphson method


The working principle of the Newton-Raphson method is graphically illustrated in
Figure 2.2. We select an initial guess at xn, where the function has a value of f(xn),
and draw a tangent at that point, having slope f′(xn). Clearly, Δx = xn − xn+1 and
since f(xn)/Δx = f′(xn), we can write the following,

xn+1 = xn − Δx = xn − f(xn)/f′(xn).    (2.1)

Next, we draw a tangent at xn+1 and get to the next point xn+2 and so on. Thus,
after each iteration, we get closer to the root and finally converge when two suc-
cessive points xn+1 and xn are very close to each other (less than some tolerance
limit). Finally, let us write our Python code for the Newton-Raphson method.

# Define the function
def f(x):
    return x**3 - 15*x - 4

def df(x):
    return 3*x**2 - 15

# Newton-Raphson part
def newton(x1):
    # Loop starts and will run for 500 steps
    for i in range(500):
        x2 = x1 - f(x1) / df(x1)
        if(abs(x2 - x1) <= 0.001):
            break
        x1 = x2
    print("Root at:",x2)
    print("Value of the function:",f(x2))

# Main code
x1=float(input("Enter the value of x1:"))
newton(x1)

2.1.4 Exercise
1. Write Python code to solve x = 2 − e^(−x²) using the relaxation method.
Answer: 1.98015456

2. Test the given bisection method program with different initial guesses to find
all the roots.
Answer: −2 − √3, −2 + √3, 4.

3. Repeat the above problem with Newton-Raphson method.



4. Write a bisection method program to find the roots of x³ − 3x + 2.
Answer: (−1 − √3)/2, (−1 + √3)/2, 2.

5. Repeat the above problem with Newton-Raphson method.
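A sketch for the first exercise, reusing the relaxation loop of the code above with the new iteration x ← 2 − e^(−x²); the tolerance and iteration count are arbitrary choices.

```python
import math

# Relaxation method for x = 2 - exp(-x**2), starting from the guess x = 1
# (same loop structure as the relaxation code in Section 2.1.1).
x = 1.0
for n in range(100):
    x_old = x
    x = 2.0 - math.exp(-x * x)
    if abs(x - x_old) < 1e-6:
        break
print("Converged to x =", x)   # close to the quoted answer 1.98015456
```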

2.2 Solution of linear equations


Given two linear equations, we can quickly solve for two unknowns. However, if
we have three or more unknowns, solving those many linear equations becomes
somewhat challenging. Linear algebra provides us several systematic ways of
solving a set of linear equations. When applied in conjunction with computer
programming, we can handle a very large system with good accuracy and speed.
Several methods for solving a system of linear equations can be classified mainly
into two categories: non-iterative (like Gaussian elimination) and iterative (like
the Jacobi method and the Gauss-Seidel method).

2.2.1 Gaussian elimination and back-substitution


Let us start with a set of linear equations,

3x0 + 3x1 + x2 = 12, (2.2)


2x0 + x1 + 2x2 = 10,
x0 + 2x1 + 3x2 = 14.

We can express this in the form of a matrix and two vectors,


    
[3 3 1] [x0]   [12]
[2 1 2] [x1] = [10] .    (2.3)
[1 2 3] [x2]   [14]


Equivalently, we can also use a shorthand notation, Ax = b, where A is a 3 × 3


matrix, and x and b are two vectors. In general, we can express matrix A (known
as the coefficient matrix) and vector b in a system of three linear equations in the
form of,

    [a00 a01 a02]       [b0]
A = [a10 a11 a12] , b = [b1] .    (2.4)
    [a20 a21 a22]       [b2]
Before trying to solve the actual problem, let us solve a more straightforward
system,

x0 + a01 x1 + a02 x2 = b0 , (2.5)


x1 + a12 x2 = b1 ,
x2 = b2 .

We can quickly solve by back-substituting, from the third to the second and finally,
to the first equation. Thus we should aim to convert any given set of three linear
equations in the form of Eq. 2.5. Equivalently, we need to convert the coefficient
matrix A to an upper triangular matrix. We will use the Gaussian elimination
method, which is based on the following principles.
• Multiplication of any row of the coefficient matrix A and the corresponding
row of vector b by any constant does not change the solution.
• Adding or subtracting any multiple of a row of the coefficient matrix A with
any other row and doing the same to the vector b does not change the solu-
tion.
First, let us write the A matrix and b vector in a compact form (known as the
augmented matrix),
[3  3  1   | 12]
[2  1  2   | 10] .    (2.6)
[1  2  3   | 14]
Keep in mind that our aim is to make the coefficient matrix A an upper triangular
matrix. First, we do: (second row) −2/3× (first row),

[3  3  1   | 12]
[0 −1  4/3 |  2] .    (2.7)
[1  2  3   | 14]

Next, we do: (third row) −1/3× (first row),



[3  3  1   | 12]
[0 −1  4/3 |  2] .    (2.8)
[0  1  8/3 | 10]

Finally, we do: (third row) + 1× (second row),



[3  3  1   | 12]
[0 −1  4/3 |  2] .    (2.9)
[0  0  4   | 12]


Now, we can solve: 4x2 = 12 ⇒ x2 = 3, −x1 + (4/3)x2 = 2 ⇒ x1 = 2 and 3x0 + 3x1 + x2 =
12 ⇒ x0 = 1. Let us write a Python code to solve a system of linear equations using
Gaussian elimination and back-substitution.

import numpy as np
import sys
# system size
n = int(input('Number of unknowns? '))
# initializing matrix and vector
a = np.zeros([n,n],float)
b = np.zeros(n,float)
d = np.zeros([n,n+1],float)
print('Enter the coefficient matrix row by row')
for i in range(n):
    for j in range(n):
        print('Enter row', i, 'column', j)
        a[i][j] = float(input())
print('Enter the b vector')
for i in range(n):
    b[i] = float(input())
# Constructing the augmented matrix
for i in range(n):
    for j in range(n+1):
        if j <= (n-1):
            d[i,j] = a[i,j]
        else:
            d[i,j] = b[i]
print("The augmented matrix is")
print(d)
# Checking for zeros at the diagonals, then eliminating
for i in range(n):
    if d[i][i] == 0.0:
        sys.exit('Error: division by zero!!')
    for j in range(i+1,n):
        r = d[j][i] / d[i][i]
        for k in range(n+1):
            d[j][k] = d[j][k] - r * d[i][k]
# back-substitution
b[n-1] = d[n-1][n]/d[n-1][n-1]
for i in range(n-2,-1,-1):
    b[i] = d[i][n]
    for j in range(i+1,n):
        b[i] = b[i] - d[i][j] * b[j]
    b[i] = b[i]/d[i][i]
print("Solution is:")
print(b)
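As a quick cross-check of the worked example (Eq. 2.2), NumPy's built-in linear solver gives the same answer directly; this is a library shortcut, not the elimination algorithm itself.

```python
import numpy as np

# Solve the system of Eq. 2.2 with NumPy's built-in solver.
A = np.array([[3.0, 3.0, 1.0],
              [2.0, 1.0, 2.0],
              [1.0, 2.0, 3.0]])
b = np.array([12.0, 10.0, 14.0])
x = np.linalg.solve(A, b)
print(x)   # [1. 2. 3.], matching the hand calculation above
```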


2.2.2 Jacobi Method


Unlike Gaussian elimination, the Jacobi method is an iterative technique. Let the
system of linear equation be given by,

a00 x0 + a01 x1 + · · · · +a0n xn = b0 , (2.10)


a10 x0 + a11 x1 + · · · · +a1n xn = b1 ,
·······························
an0 x0 + an1 x1 + · · · · +ann xn = bn .

We use the first equation to solve x0 , the second equation to solve x1 , and so on.
x0 = (1/a00)[b0 − a01x1 − a02x2 − ··· − a0nxn],    (2.11)
x1 = (1/a11)[b1 − a10x0 − a12x2 − ··· − a1nxn],
···
xn = (1/ann)[bn − an0x0 − an1x1 − ··· − an,n−1xn−1].
Note that, for the method to work, each of the diagonal entries of the coefficient
matrix must be non-zero. Otherwise, we must interchange rows or columns to
avoid any zero along the diagonal. For example, let us solve the following system
of linear equations,

4x0 + 2x1 + 3x2 = 8, (2.12)


3x0 − 5x1 + 2x2 = −14,
−2x0 + 3x1 + 8x2 = 27.

Since there is no zero along the diagonal of the coefficient matrix, it is straight-
forward to write the solution as
x0^(m+1) = (1/4)[8 − 2x1^m − 3x2^m],    (2.13)
x1^(m+1) = (1/(−5))[−14 − 3x0^m − 2x2^m],
x2^(m+1) = (1/8)[27 + 2x0^m − 3x1^m].
The superscripts denote successive iterations. We start with an initial guess of
x0 = −2, x1 = 2 and x2 = 3. After the first iteration, we get
x0 = (1/4)[8 − 2(2) − 3(3)] = −1.25,    (2.14)
x1 = (1/(−5))[−14 − 3(−2) − 2(3)] = 2.8,
x2 = (1/8)[27 + 2(−2) − 3(2)] = 2.125.


Iteration x0 x1 x2
1 -1.250 2.800 2.125
2 -0.994 2.900 2.013
3 -0.959 3.009 2.039
10 -0.986 2.993 2.004
11 -0.999 3.010 2.006
12 -1.009 3.003 1.996

Table 2.2: Values of x0 , x1 and x2 after successive iterations using the Jacobi
method, starting with the initial values of x0 = −2, x1 = 2 and x2 = 3.

In the second iteration, we use this set of values to get,


x0 = (1/4)[8 − 2(2.8) − 3(2.125)] = −0.994,    (2.15)
x1 = (1/(−5))[−14 − 3(−1.25) − 2(2.125)] = 2.9,
x2 = (1/8)[27 + 2(−1.25) − 3(2.8)] = 2.013.
We have to keep doing this until convergence is achieved, as shown in Table 2.2.
For example, after 12 iterations, the values are very close to the actual
solution x0 = −1, x1 = 3, and x2 = 2. A higher number of iterations (say 25 or
more) will further improve the solution. Moreover, we can also tune the level of
accuracy (i.e., up to what decimal place the answers should match, instead of the
third decimal place used here). Let me show a rather rudimentary Python code to
solve the problem.

N = 50
x0 = -2
x1 = 2
x2 = 3
for n in range(1,N):
    y0 = x0
    y1 = x1
    y2 = x2
    x0 = (8.0 - 2.0 * y1 - 3.0 * y2) / 4.0
    x1 = (-14.0 - 3.0 * y0 - 2.0 * y2) / (-5)
    x2 = (27.0 + 2.0 * y0 - 3.0 * y1) / 8.0
    er = (abs(x0 - y0) + abs(x1 - y1) + abs(x2 - y2))/3.0
    if er < 0.001:
        break
print ("Convergence achieved in",n,"steps")
print (x0,x1,x2)

It takes approximately 23 steps to get the answer on my computer, rounded off at
the third decimal place.


2.2.3 Gauss-Seidel method


Gauss-Seidel method is a minor modification of the Jacobi method and generally
improves the rate of convergence. Unlike the Jacobi method, the Gauss-Seidel
method uses the latest value of a variable whenever available.
x0^(m+1) = (1/4)[8 − 2x1^m − 3x2^m],    (2.16)
x1^(m+1) = (1/(−5))[−14 − 3x0^(m+1) − 2x2^m],
x2^(m+1) = (1/8)[27 + 2x0^(m+1) − 3x1^(m+1)].
Compare with Eq. 2.13 and understand the difference.² A few minor adjustments
convert the Jacobi code to the Gauss-Seidel.

N = 50
x0 = -2
x1 = 2
x2 = 3
for n in range(1,N):
    y0 = x0
    y1 = x1
    y2 = x2
    x0 = (8.0 - 2.0 * x1 - 3.0 * x2) / 4.0
    x1 = (-14.0 - 3.0 * x0 - 2.0 * x2) / (-5)
    x2 = (27.0 + 2.0 * x0 - 3.0 * x1) / 8.0
    er = (abs(x0 - y0) + abs(x1 - y1) + abs(x2 - y2))/3.0
    if er < 0.001:
        break
print ("Convergence achieved in",n,"steps")
print (x0,x1,x2)

It takes approximately 7 steps to get the answer on my computer, rounded off at
the third decimal place. Thus, we achieve significant improvement over the Jacobi
method, as Gauss-Seidel takes less than one-third of the steps to converge.

2.2.4 Systematic formulation of iterative solution of Ax=b


We have learned about two iterative methods to solve a set of linear equations.
Our approach was effective but rudimentary. However, there is an elegant way to
get similar outcomes. We keep working with a 3×3 system for simplicity, but it can
be generalized to any n × n system. First, we express the set of linear equations
Ax = b in matrix form.

[a00 a01 a02] [x0]   [b0]
[a10 a11 a12] [x1] = [b1] .    (2.17)
[a20 a21 a22] [x2]   [b2]
² The superscripts m and m + 1 denote successive iterations.


General:       x^(m+1) = −[D + ωL]^(−1) [(1 − ω)L + U] x^m + [D + ωL]^(−1) b
Jacobi:        x^(m+1) = −[D]^(−1) [L + U] x^m + [D]^(−1) b           (ω = 0)
Gauss-Seidel:  x^(m+1) = −[D + L]^(−1) [U] x^m + [D + L]^(−1) b       (ω = 1)

Table 2.3: Different iterative schemes for solving a set of linear equations Ax = b.

According to the Jacobi method, the iterative scheme is,

a00 x0^(m+1) = b0 − a01 x1^m − a02 x2^m,    (2.18)
a11 x1^(m+1) = b1 − a10 x0^m − a12 x2^m,
a22 x2^(m+1) = b2 − a20 x0^m − a21 x1^m.

We can write in matrix form as,

[a00  0   0 ] [x0^(m+1)]     [ 0  a01 a02] [x0^m]   [b0]
[ 0  a11  0 ] [x1^(m+1)] = − [a10  0  a12] [x1^m] + [b1] .    (2.19)
[ 0   0  a22] [x2^(m+1)]     [a20 a21  0 ] [x2^m]   [b2]

Note that we have a diagonal matrix (D) on the left-hand side, and the matrix
on the right-hand side is a sum of lower (L) and upper (U ) triangular matrices.
Finally, we can express the above equation in a compact form like,

x^(m+1) = P x^m + q,    (2.20)

where P = −D^(−1)(L + U) and q = D^(−1)b. Let us work out an example, say Eq. 2.12.

    [ 4  2 3]       [  8]
A = [ 3 −5 2] , b = [−14] ,    (2.21)
    [−2  3 8]       [ 27]
such that,

    [4  0 0]       [ 0 0 0]       [0 2 3]
D = [0 −5 0] , L = [ 3 0 0] , U = [0 0 2] .    (2.22)
    [0  0 8]       [−2 3 0]       [0 0 0]
Thus,

                      [0.25  0     0    ] [ 0 2 3]   [0    −0.50  −0.75]
P = −D^(−1)(L + U) = −[0    −0.20  0    ] [ 3 0 2] = [0.60  0      0.40]    (2.23)
                      [0     0    0.125] [−2 3 0]   [0.25 −0.375  0   ]
and

              [0.25  0     0    ] [  8]   [2    ]
q = D^(−1)b = [0    −0.20  0    ] [−14] = [2.8  ] .    (2.24)
              [0     0    0.125] [ 27]   [3.375]
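The matrices D, L + U, P and q above can be reproduced in a few lines of NumPy; a sketch, where the diag-based splitting is one convenient way to separate A:

```python
import numpy as np

# Build D, L+U, P = -D^(-1)(L+U) and q = D^(-1) b for the system of Eq. 2.12.
A = np.array([[ 4.0,  2.0, 3.0],
              [ 3.0, -5.0, 2.0],
              [-2.0,  3.0, 8.0]])
b = np.array([8.0, -14.0, 27.0])
D = np.diag(np.diag(A))     # diagonal part of A
LU = A - D                  # strictly lower + strictly upper part
P = -np.linalg.inv(D).dot(LU)
q = np.linalg.inv(D).dot(b)
print(P)
print(q)   # [2.    2.8   3.375]
```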
Finally, I leave it as an exercise to verify that P x^m + q yields Eq. 2.13. Let us write
a Python code for the Jacobi method.

import numpy as np
import sys
# system size
n = int(input('Number of unknowns? '))
# initializing matrix and vectors
a = np.zeros([n,n],float)
b = np.zeros(n,float)
x = np.zeros(n,float)
d = np.zeros([n,n],float)
p = np.zeros([n,n],float)
q = np.zeros(n,float)
print('Enter the coefficient matrix row by row')
for i in range(n):
    for j in range(n):
        print('Enter row', i, 'column', j)
        a[i][j] = float(input())
print('Enter the b vector')
for i in range(n):
    b[i] = float(input())
# Constructing the diagonal matrix (d holds the inverse of D)
for i in range(n):
    if a[i][i] == 0.0:
        sys.exit('Error: division by zero!!')
    else:
        d[i][i] = 1.0/a[i][i]
    # matrix a changed to L+U (diagonal=0)
    a[i][i] = 0.0
# Constructing the p matrix
p = -d.dot(a)
print("The p matrix is")
print(p)
# Constructing the q vector
q = d.dot(b)
print("The q vector is")
print(q)
# b is reused to store the current iterate
print("Enter the initial guess")
for i in range(n):
    b[i] = float(input())
# Jacobi iteration starts
for i in range(100):
    x = p.dot(b) + q
    er = 0.0
    for j in range(n):
        er += abs(b[j] - x[j])
        b[j] = x[j]
    if er < 0.001:
        print("Converged in", i, "steps")
        print(x)
        sys.exit("Well done!")
print("Did not converge in 100 iterations")
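To see the iteration in action without typing the entries by hand, the example system of Eq. 2.21 can be hard-coded; a sketch using NumPy, which builds P and q exactly as above and iterates from the zero initial guess:

```python
import numpy as np

# Example system from Eq. 2.21
A = np.array([[4.0, 2.0, 3.0],
              [3.0, -5.0, 2.0],
              [-2.0, 3.0, 8.0]])
b = np.array([8.0, -14.0, 27.0])

# Split A into D and (L + U), then build P = -D^{-1}(L + U) and q = D^{-1} b
D_inv = np.diag(1.0 / np.diag(A))
LU = A - np.diag(np.diag(A))
P = -D_inv.dot(LU)
q = D_inv.dot(b)

# Jacobi iteration, starting from the zero vector
x = np.zeros(3)
for step in range(100):
    x_new = P.dot(x) + q
    if np.abs(x_new - x).sum() < 1e-6:
        break
    x = x_new

print(x_new)
```

The iterate should approach the exact solution (−1, 3, 2) of the example system.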


2.2.5 Convergence criteria for iterative methods


Say we have set up an iterative scheme $x^{m+1} = Px^m + q$ to solve a set of linear
equations. It can be either the Jacobi or Gauss-Seidel or some other method,
depending on what P or q we select. We already have seen that different schemes
have different convergence rates. For example, Gauss-Seidel is generally faster
than the Jacobi method. But, can it fail to converge altogether? Again, we already
have seen an example where a set of linear equations solvable by the Gaussian
elimination (non-iterative) could not be solved by the Jacobi method (iterative). Let
us now try to understand under what conditions an iterative scheme converges
or fails to do so.
Let s be the exact solution. Once we reach the exact solution, any further
iteration yields the same result (see Exercise),

s = P s + q. (2.25)

Subtracting the above from $x^{m+1} = Px^m + q$, we can write,
$$x^{m+1} - s = P(x^m - s). \quad (2.26)$$

Defining the error (with respect to the actual solution) as $e^m = x^m - s$, we can rewrite the above equation as,
$$e^{m+1} = P e^m. \quad (2.27)$$
Thus, whether the error blows up or not after every iteration depends on the
matrix P . Without going into detail, I will indicate some general properties of P
that may tell you whether the iterative scheme is going to converge or not.

• The norm of P should be less than 1, i.e., ||P|| < 1.³

• All the eigenvalues of P should satisfy |λ| < 1 (the maximum |λ| is known as the spectral radius of P).

Although norms are somewhat easy to calculate, the spectral radius is a better
indicator. Is there a way to directly say something about the convergence from the
coefficient matrix A without calculating P? The matrix A is diagonally dominant
if $|a_{ii}| > \sum_{j \neq i} |a_{ij}|$ (the absolute value of the diagonal element is greater than the sum of
the absolute values of the other entries in the row). A diagonally dominant coefficient
matrix A ensures convergence (see exercise).
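Both indicators are easy to compute with NumPy; the sketch below (the helper function names are my own) checks diagonal dominance and the spectral radius of P = −D⁻¹(L + U) for the coefficient matrix of Eq. 2.21. Note that this matrix is not diagonally dominant (row 0 has 2 + 3 > 4), yet its spectral radius is below 1, so the Jacobi method still converges: diagonal dominance is sufficient but not necessary.

```python
import numpy as np

def is_diagonally_dominant(A):
    """Check |a_ii| > sum of |a_ij| over j != i for every row."""
    off_diag = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
    return bool(np.all(np.abs(np.diag(A)) > off_diag))

def jacobi_spectral_radius(A):
    """Spectral radius (largest |eigenvalue|) of P = -D^{-1}(L + U)."""
    D_inv = np.diag(1.0 / np.diag(A))
    P = -D_inv.dot(A - np.diag(np.diag(A)))
    return np.abs(np.linalg.eigvals(P)).max()

A = np.array([[4.0, 2.0, 3.0], [3.0, -5.0, 2.0], [-2.0, 3.0, 8.0]])
print(is_diagonally_dominant(A))   # False: row 0 has 2 + 3 > 4
print(jacobi_spectral_radius(A))   # below 1, so the Jacobi iteration converges
```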

2.2.6 Exercise
1. Solve Eq. 2.12 using the Gaussian elimination method (by hand).

2. What would happen if you try to solve the example problem (the Jacobi or
Gauss-Seidel method) with an initial guess, which is exactly equal to the
actual solution, i.e., x0 = −1, x1 = 3, and x2 = 2? Modify the given Python
code and check the answer.
³There are different types of norms. ||P||₁ is obtained by taking the sum of absolute values column-wise and picking the maximum one; ||P||∞ is obtained by taking the sum of absolute values row-wise and picking the maximum one.


3. Write a Python code to solve Eq. 2.2 using the Jacobi and Gauss-Seidel
method.

4. Is it guaranteed that if I can solve a set of linear equations by Gaussian


elimination (non-iterative), I must be able to solve the same by the Jacobi or
Gauss-Seidel (iterative) method?

5. Derive the form of P and q for the Gauss-Seidel method.
Answer: $P = -(D + L)^{-1}U$, $q = (D + L)^{-1}b$.

6. Alternate and quick derivation: express the coefficient matrix as A = D +


L + U and substitute in Ax = b. Rearrange the terms to get the Jacobi and
Gauss-Seidel equations.

7. A general equation for solving Ax = b using an iterative method: $[D + \omega L]x^{m+1} = -[(1 - \omega)L + U]x^m + b$. What is the value of ω for the Jacobi and Gauss-Seidel methods?

8. Run three iterations using Eq. 2.20, Eq. 2.23 and Eq. 2.24 and match with
Table 2.2 (round off at third decimal place).

9. Calculate ||P||₁ and ||P||∞ for the following matrix,
$$P = \begin{pmatrix} 4 & 2 & 3 \\ 3 & -5 & 2 \\ -1 & 3 & 7 \end{pmatrix}.$$
Answer: ||P||₁ = 12 and ||P||∞ = 11.

10. For the Jacobi method, prove that if the coefficient matrix is diagonally dom-
inant, then ||P ||∞ < 1.

11. Why do you think that the Jacobi method failed to converge for Eq. 2.2?

2.3 Interpolation
We know the value of some function at two points, x1 and x2 , but would like to
know its value at x, lying in-between. Linear interpolation would be the simplest
thing to do in this situation. The working principle is depicted in Figure 2.3. We
can easily find the slope of the straight line joining the points f (x1 ) and f (x2 ), and
then it is straightforward to get the value at x, lying along the line,

$$f(x) \approx f(x_1) + d = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1}(x - x_1). \quad (2.28)$$
Note that the value obtained along the straight line will most likely differ from
the actual value of the function, f (x) (unless the actual function is linear itself).
The larger the distance between x1 and x2 , the more will be the error (difference
between the actual and interpolated value).
Let us write a Python code for interpolation. In order to demonstrate various
concepts, we assume the function f(x) = √x. We take two points x1 and x2 on this



Figure 2.3: Linear interpolation: we know the values of f (x1 ) and f (x2 ), but would
like to know f (x). Linear interpolation would be the simplest method of doing
this. Since we know the slope of the line connecting the points f (x1 ) and f (x2 ),
we can get f (x) (red point) along this line. However, it may differ from the actual
value of the function at x. The smaller the distance between x1 and x2, the better
the match with the actual value.

curve and apply interpolation to get the value at the midpoint, i.e., x = (x1 + x2)/2.

Since we have started with a known function, we can accurately calculate the
difference between the actual and interpolated value.

import numpy as np
def interpolation(x1, x2, y1, y2, x):
    fx = y1 + (x - x1) * (y2 - y1) / (x2 - x1)
    return fx
# Driving program (assuming a function y=x**0.5)
x1 = float(input("Enter the value of x1:"))
y1 = np.sqrt(x1)
x2 = float(input("Enter the value of x2:"))
y2 = np.sqrt(x2)
x = (x1 + x2) / 2
nans = interpolation(x1, x2, y1, y2, x)
ans = np.sqrt(x)
err = abs(nans - ans)
print("Interpolated value at x=", x)
print(nans)
print("Absolute value of deviation from the actual value:", err)

The actual value of f(x) at x = 1.5 is 1.22474. If you start with x1 = 1 and x2 = 2, the
error turns out to be 0.01763. On the other hand, if you start with x1 = 1.25 and
x2 = 1.75, the error reduces to 0.00429.
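The same experiment can be repeated for a sequence of shrinking brackets around x = 1.5; a short sketch (the interpolation function is redefined here so the snippet is self-contained):

```python
import numpy as np

def interpolation(x1, x2, y1, y2, x):
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)

x = 1.5
for half_width in [0.5, 0.25, 0.125, 0.0625]:
    x1, x2 = x - half_width, x + half_width
    est = interpolation(x1, x2, np.sqrt(x1), np.sqrt(x2), x)
    print(half_width, abs(est - np.sqrt(x)))
```

Each halving of the bracket reduces the error by roughly a factor of four, i.e., the error of linear interpolation scales quadratically with the spacing between the known points.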

Figure 2.4: Estimating the area under the curve (a) by dividing the total area
in small rectangular slices of width h and (b) by dividing the area in a set of
trapezoids. Clearly, the second approximation is more accurate.

2.3.1 Exercise
1. Let’s assume that we want the value at x = 8.5 and start with x1 = 8 and
x2 = 9. Using the example code, find the error. Why is the error so small
compared to the case when we tried to calculate the value at x = 1.5, starting
with x1 = 1 and x2 = 2?

2.4 Numerical Integration


Let us start with evaluating the definite integrals for a given function f (x) with
respect to x, varying from x = a to x = b,
$$I = \int_a^b f(x)\,dx. \quad (2.29)$$

2.4.1 Trapezoidal method


There are several ways of evaluating the above integral, and the trapezoidal method
is the easiest and most commonly used. The concept behind this method is shown
in Figure 2.4. It is easy to understand that the area under the curve can be
approximated by dividing it into small rectangular slices of width h. However, it will
not be very accurate. We can improve by dividing the area into a set of
trapezoids. Let us assume that we divide the interval into N slices, each having width
of h = (b − a)/N . We can calculate the area of the nth slice,
$$A_n = \frac{h}{2}\left[f(a + (n-1)h) + f(a + nh)\right]. \quad (2.30)$$


Now we can add all such slices to get the total area under the curve,
$$I \approx \sum_{n=1}^{N} A_n = h\left[\frac{1}{2}f(a) + f(a+h) + f(a+2h) + \cdots + \frac{1}{2}f(b)\right] = h\left[\frac{f(a)+f(b)}{2} + \sum_{n=1}^{N-1} f(a+nh)\right]. \quad (2.31)$$
Using this algorithm, let us write a Python code to evaluate the integral
$I = \int_1^2 (5x^4 + 4x^3 + 3x^2 + 2x + 1)\,dx$ and match the answer with the analytical
result.

def f(x):
    return 5*x**4 + 4*x**3 + 3*x**2 + 2*x + 1
N = 20
a = 1.0
b = 2.0
h = (b-a)/N
s = 0.5 * f(a) + 0.5 * f(b)
for m in range(1,N):
    s += f(a + m * h)
print(h * s)

In this code, I have used a user-defined function and a for loop. Running the code,
you get an answer of 57.037915625, while the analytical result is 57. We can
improve the accuracy by reducing the width (h) of the trapezoids (see exercise).
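The exercise below can be anticipated with a quick sketch: wrapping the same algorithm in a function (my own refactoring) and doubling N shows the error falling by about a factor of four each time, consistent with the h² accuracy of the trapezoidal rule.

```python
def f(x):
    return 5*x**4 + 4*x**3 + 3*x**2 + 2*x + 1

def trapezoid(f, a, b, N):
    # Composite trapezoidal rule, Eq. 2.31
    h = (b - a) / N
    s = 0.5 * f(a) + 0.5 * f(b)
    for m in range(1, N):
        s += f(a + m * h)
    return h * s

exact = 57.0   # analytical value of the integral from 1 to 2
for N in [20, 40, 80, 160]:
    print(N, abs(trapezoid(f, 1.0, 2.0, N) - exact))
```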

2.4.2 Exercise
1. In the given example code, reduce the width of the trapezoids (h) and check
the difference with the analytical result.

2.5 Numerical differentiation


We will learn this topic as the primary step for numerical solutions of differential
equations. The textbook definition of a derivative is,

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}. \quad (2.32)$$
Practically it is impossible to take h → 0, but we can take h as small as possible.
The working principle is depicted in Figure 2.5. You will be given a set of values
(say from some experimental observation) at a regular interval h. You have to
calculate the difference between the adjacent values and divide by the width of
the interval (h). There are three ways of doing this, as discussed below.


Figure 2.5: Calculation of the first derivative using forward, backward and central
difference method. The filled circles are data points given at a regular interval h.
In case of the forward and backward difference, we can calculate the derivative
on the same set of points (filled circles). On the other hand, using the central
difference, we can get the derivative on a different set of points (open circles),
lying in the middle of the original set of data points.

2.5.1 Forward difference


In this case, we can calculate the derivative at x by calculating the difference with
the next point, located at x + h.

$$f'(x) \approx \frac{f(x+h) - f(x)}{h}. \quad (2.33)$$

2.5.2 Backward difference


In this case, we can calculate the derivative at x by calculating the difference with
the previous point, located at x − h.

$$f'(x) \approx \frac{f(x) - f(x-h)}{h}. \quad (2.34)$$

2.5.3 Central difference


In this case, we do not get the derivative on the same set of data points (x values)
provided but obtain the values on a different set of points, lying in the middle of
the original set of data points,

$$f'(x) \approx \frac{f(x+h/2) - f(x-h/2)}{h}. \quad (2.35)$$


The only way to get the derivative on the original set of data points is to double
the interval to 2h, which may worsen the numerical error. To get the second
derivative, we have to apply the central difference twice,

$$f''(x) \approx \frac{f'(x+h/2) - f'(x-h/2)}{h} = \frac{[f(x+h) - f(x)]/h - [f(x) - f(x-h)]/h}{h} = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}. \quad (2.36)$$
Note the difference of outcome in the case of the first and second derivative calculations.
In the case of the first derivative, we get the values in the middle of the
given data set. On the other hand, in the case of the second derivative, we get the
values at the given data points.
Finally, let us write a code for calculating the first derivative using the central
difference method. We will use the known function x³/3 to compare the
numerical and analytical results.

import numpy as np
def f(x):
    return x**3/3.0
n = 50
dx = 2.0/n
df = np.zeros(n,float)
err = 0.0
for i in range(1,n+1):
    xl = -1.0 + (i-1) * dx
    fxl = f(xl)
    xr = -1.0 + i * dx
    fxr = f(xr)
    # derivative at the midpoint of the interval [xl, xr]
    df[i-1] = (fxr - fxl) / (xr - xl)
    xm = (xr + xl) / 2.0
    # analytical derivative of x**3/3 is x**2
    err = max(err, abs(df[i-1] - xm*xm))
print("Maximum error with respect to the analytical result")
print(err)
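The difference in accuracy between the schemes can also be checked at a single point; a sketch comparing the forward and central differences for f(x) = sin x at x = 1 (my own test function, not from the text), where the exact derivative is cos 1:

```python
import math

f = math.sin
x, h = 1.0, 0.01
exact = math.cos(x)

forward = (f(x + h) - f(x)) / h            # Eq. 2.33, error of order h
central = (f(x + h/2) - f(x - h/2)) / h    # Eq. 2.35, error of order h^2

print(abs(forward - exact))
print(abs(central - exact))
```

The central difference error is smaller by roughly a factor of h, which is why it is usually preferred when points on either side are available.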

2.5.4 Exercise
1. Vary the interval (try n = 10, 20, 30, 40, 60) in the given code and check how the
error is changing.

2. Write a code for calculating the second derivative using the central difference
method. You can use the same function, y = x3 /3.

2.6 Maxima and Minima

Chapter 3

Partial Differentiation

This chapter is prepared based on Mathematical Methods in the Physical Sciences,
Mary L. Boas, Chapter 4.[1] The aim of the chapter is to start with a brief
discussion of one-variable functions, generalize it to two-variable functions and
finally to n-variable functions.

3.1 Introduction
First let us understand why we need partial differentiation. If I have a function
of a single variable, like y(x), then its derivative dy/dx can be interpreted (a) geometrically
as the slope of y(x) or (b) physically as the rate of change of y with respect to x.
For example, the rate of change of position is equal to velocity, the rate of change of
velocity is acceleration, etc. Derivatives are used in several problems like differential
equations or finding maxima and minima. Partial derivatives are used when
we are dealing with a function of more than one variable, i.e., we will generalize
what we have learned in single variable calculus and do multi-variable calculus.
As shown in Fig. 3.1, z(x, y) is a function of two variables. If we draw a plane
parallel to the xz plane, then y is constant on that plane and the plane will intersect
the surface z(x, y) along the red curve. Now, the rate of change of z with respect
to x (when y is constant) can be calculated along the red curve by the partial
derivative ∂z/∂x. Similarly, the rate of change of z with respect to y (when x is constant)
can be calculated along the blue curve by the partial derivative ∂z/∂y. Sometimes,
the variable kept constant is specified as a subscript, like $\left(\frac{\partial z}{\partial x}\right)_y$, meaning the partial
derivative of z with respect to x when y is held constant. We can also take second
(and higher) derivatives, like $\frac{\partial}{\partial x}\frac{\partial z}{\partial x} = \frac{\partial^2 z}{\partial x^2} \equiv z_{xx}$ and $\frac{\partial}{\partial y}\frac{\partial z}{\partial x} = \frac{\partial^2 z}{\partial y \partial x} \equiv z_{yx}$. For most
applied problems, $z_{xy} = z_{yx}$, and this is known as the reciprocity relation.
In simple words, it says that the order of differentiation does not matter.
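The reciprocity relation is easy to check numerically with nested central differences; a sketch using z(x, y) = x²y + sin(xy) as an arbitrary test function (my own choice):

```python
import math

def z(x, y):
    return x**2 * y + math.sin(x * y)

def z_xy(x, y, h=1e-4):
    # d/dy of dz/dx, both by central differences
    zx = lambda x, y: (z(x + h, y) - z(x - h, y)) / (2 * h)
    return (zx(x, y + h) - zx(x, y - h)) / (2 * h)

def z_yx(x, y, h=1e-4):
    # d/dx of dz/dy, in the opposite order
    zy = lambda x, y: (z(x, y + h) - z(x, y - h)) / (2 * h)
    return (zy(x + h, y) - zy(x - h, y)) / (2 * h)

print(z_xy(0.7, 1.3), z_yx(0.7, 1.3))  # the two orders agree to numerical precision
```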

Example: if we consider a “pure”, i.e., single component (C=1) and single phase
(P=1) system, which is subjected to only one form of work (say Newtonian or
“PV” work), then thermodynamic functions like internal energy (U(V, S)), enthalpy
(H(P, S)), Helmholtz free energy (A(V, T)) and Gibbs free energy (G(P, T)) are bivariate
functions. Thus, you should visualize U, H, A, G as surfaces.

Figure 3.1: A function of two variables z(x, y). Along the red curve, y is constant
and along the blue curve, x is constant. Note that, along the red curve, we can
equivalently define x(y, z) with y constant. Similarly, along the blue curve, we can
equivalently define y(x, z) with x constant. Along the dotted curve, we can define
either x(y, z) or y(x, z) with z constant.

Application: power series. It is often useful to write the power series of a function,
like
$$y(x) = c_0 + c_1(x - x_0) + c_2(x - x_0)^2 + c_3(x - x_0)^3 + c_4(x - x_0)^4 + \cdots \quad (3.1)$$

and one needs to determine the coefficients. By putting x = x₀ in the above
equation, it is easy to show that c₀ = y(x₀). Taking derivatives of Eq. 3.1, we find
$$y'(x) = c_1 + 2c_2(x - x_0) + 3c_3(x - x_0)^2 + 4c_4(x - x_0)^3 + \cdots \quad (3.2)$$
$$y''(x) = 2c_2 + 3 \cdot 2\, c_3(x - x_0) + 4 \cdot 3\, c_4(x - x_0)^2 + \cdots$$
$$y'''(x) = 3 \cdot 2\, c_3 + 4 \cdot 3 \cdot 2\, c_4(x - x_0) + \cdots$$

and it is easy to find the coefficients by putting x = x₀ in the above set of
equations, such that $c_1 = y'(x_0)$, $c_2 = \frac{1}{2}y''(x_0)$, $c_3 = \frac{1}{6}y'''(x_0)$ and in general
$c_n = \frac{1}{n!}\left.\frac{d^n y}{dx^n}\right|_{x=x_0}$. Thus, the power series can be expressed as,

$$y(x) = y(x_0) + y'(x_0)(x - x_0) + \frac{1}{2!}y''(x_0)(x - x_0)^2 + \frac{1}{3!}y'''(x_0)(x - x_0)^3 + \cdots \quad (3.3)$$
and in a more compact notation as,

$$y(x) = \sum_{n=0}^{\infty} \frac{1}{n!}\left(h\frac{d}{dx}\right)^n y(x_0), \quad (3.4)$$


where h = (x − x₀). This is known as the Taylor series expansion of the function
y(x). Now, it is straightforward to generalize and write the power series for a
two-variable function like,

$$z(x, y) = \sum_{n=0}^{\infty} \frac{1}{n!}\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^n z(x_0, y_0), \quad (3.5)$$

where h = (x − x0 ) and k = (y − y0 ).
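Eq. 3.4 can be tested numerically by truncating the sum; a sketch expanding y(x) = eˣ about x₀ = 0 (my own choice of function), where every derivative at x₀ equals 1:

```python
import math

x0, h = 0.0, 0.5          # expand about x0 and evaluate at x0 + h
partial_sum = 0.0
for n in range(10):
    # the n-th derivative of e^x at x0 = 0 is 1, so each term is h^n / n!
    partial_sum += h**n / math.factorial(n)
    print(n, abs(partial_sum - math.exp(x0 + h)))
```

The error drops rapidly with every term retained; ten terms already reproduce e^0.5 to about ten decimal places.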

3.1.1 Exercise
1. Derive Eq. 3.5, following the method shown for Eq. 3.4.

2. Replace x = (x0 + h) in Eq. 3.4 and write an alternative form of the Taylor
series expansion. Repeat it for Eq. 3.5.

3. Write the linear and quadratic approximation of a function, using Eq. 3.4
and Eq. 3.5.

4. We can write the quadratic approximation for a two-variable function (derived
in the last problem) in a more compact form, by considering two vectors
~x = (x, y) and ~x0 = (x0, y0) as,
$$z(\vec{x}) = z(\vec{x}_0) + (\vec{x} - \vec{x}_0)^T \cdot \nabla z(\vec{x}_0) + \frac{1}{2}(\vec{x} - \vec{x}_0)^T \cdot Hz(\vec{x}_0)(\vec{x} - \vec{x}_0), \quad (3.6)$$
where $Hz(\vec{x}_0)$ is a 2×2 Hessian matrix. Derive the components of the Hessian
matrix.

5. What would be the form of the Hessian matrix if you are dealing with a
function of n-variables?

6. Maclaurin series for two-variable functions (put x₀ = 0, y₀ = 0 in the Taylor series
expansion):

(a) f(x, y) = cos x sinh y
Answer: $y + \frac{y^3}{6} - \frac{x^2 y}{2}$
(b) f(x, y) = $\frac{\ln(1+x)}{1+y}$
Answer: $x - \frac{x^2}{2} - xy$
(c) f(x, y) = $e^{xy}$
(d) f(x, y) = ln(1 + x − y)

3.2 Total differential


Let us consider a function of a single variable, y(x). As shown in Fig. 3.2, the differential
is not equal to the derivative and we can write
$$\frac{\Delta y}{\Delta x} = \frac{dy}{dx} + \epsilon = y' + \epsilon. \quad (3.7)$$

Figure 3.2: Tangent or linear approximation of the differential: ∆y ≈ dy = y′dx.

Now, in the limit ε → 0 (which happens when dx → 0), dy is a good approximation
to ∆y. Thus, using the linear or tangent approximation, we can write the differential
as:
$$\Delta y = y(x + \Delta x) - y(x) \approx dy = y'\,dx. \quad (3.8)$$
Note that ∆y is the change of value along the curve y(x), while dy is the change of value
along the tangent line (Fig. 3.2); they are almost equal if the interval (∆x = dx)
is not too large. This simple equation has a very
important application in the numerical solution of initial value problems, because if
we know the value of the function and its first derivative, i.e., y(x) and y′(x) (the initial
condition), then we can determine the function at y(x + ∆x), and so on.
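A minimal sketch of this idea is Euler's method: repeatedly stepping along the tangent to integrate dy/dx = y from y(0) = 1, whose exact solution is eˣ (the function and step counts here are my own choices):

```python
import math

def euler(f, x0, y0, x_end, n_steps):
    """Advance y repeatedly via the tangent approximation y += f(x, y) * dx."""
    dx = (x_end - x0) / n_steps
    x, y = x0, y0
    for _ in range(n_steps):
        y += f(x, y) * dx
        x += dx
    return y

f = lambda x, y: y                     # dy/dx = y
print(euler(f, 0.0, 1.0, 1.0, 100))    # approaches e = 2.71828... as steps grow
```

Shrinking the step size (increasing `n_steps`) brings the result closer to the exact value, exactly as the tangent approximation predicts.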
Whatever we learned for the one-variable problem can be extended to a function of
two variables z(x, y), for which the total differential is given by:
$$\Delta z = z(x + \Delta x, y + \Delta y) - z(x, y) \approx dz = \frac{\partial z}{\partial x}dx + \frac{\partial z}{\partial y}dy. \quad (3.9)$$

A geometrical interpretation, very similar to the case of one variable function, can
be given in this case as well; ∆z is change along the surface, while dz is change
along the tangent plane (consult the text book Mary L. Boas for more detail).
We can also generalize to an n-variable function f(x₁, x₂, ···, xₙ) and write the
total differential as:
$$\Delta f \approx df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\,dx_i. \quad (3.10)$$

We can still think of a geometrical interpretation, although we cannot draw it; ∆f
is the change along an n-dimensional surface, while df is the change along the n-dimensional
tangent plane.


3.2.1 Exercise
1. Take a function y(x) = x³. Using the linear/tangent approximation, estimate
y(x + ∆x) for x = 1 and ∆x = 1, 0.1, 0.01, 0.001 and 0.0001. How much do
the approximate values differ from the actual values? What do you conclude
from this?

2. To the linear/tangent approximation y(x + dx) = y(x) + y′dx, add higher order
terms. Compare with the power series obtained in the previous section.

3. Take a function y(x) = x³. Using the quadratic approximation, estimate
y(x + ∆x) for x = 1 and ∆x = 1, 0.1, 0.01, 0.001 and 0.0001. How much do
the approximate values differ from the actual values? What do you conclude
from this?

4. Can you apply the linear/tangent approximation backward? How would you
modify the formula to do that?

5. Write the forward and backward expansions for y(x) up to the 3rd order term and
add them. What is the advantage of doing this? Something similar is done
in the case of molecular dynamics simulations, known as the Verlet algorithm.
Read about the Verlet algorithm.

6. Using differentials to evaluate approximate values and errors:

(a) For very large n, prove $\frac{1}{n^2} - \frac{1}{(n+1)^2} \approx \frac{2}{n^3}$.
(b) For very large n, prove $\frac{1}{n^3} - \frac{1}{(n+1)^3} \approx \frac{3}{n^4}$.
(c) For very large n, prove $\frac{1}{n^m} - \frac{1}{(n+1)^m} \approx \frac{m}{n^{m+1}}$.
(d) For very large n, prove $\sqrt{n + \Delta n} - \sqrt{n} \approx \frac{\Delta n}{2\sqrt{n}}$. Let n be the Avogadro
number ∼ 10²³ (ignore the 6.023 factor). Using the formula just derived,
find approximate values of $\sqrt{10^{23} + \Delta n} - \sqrt{10^{23}}$, for ∆n = 10, 10², ···, 10⁶.
(e) For an ideal gas, the internal energy is given by $U = \frac{3}{2}Nk_BT$ and the
standard deviation is given by $\Delta U^2 = \frac{3}{2}Nk_B^2T^2$, where N is the number of
atoms present in the gas (standard statistical mechanics result). Based
on what we have done so far, what can be concluded regarding the
fluctuations of internal energy in the thermodynamic limit?
(f) R1 = 20 ohms and R2 = 10 ohms are connected in parallel. If R1 is
changed to 20.1 ohms, what would be the value of R2 to keep the net resistance
unchanged?
(g) Electrical resistance $R = \frac{\rho l}{A}$. If the relative error in length and area measurement
is 5% and 2%, what would be the relative error in the R measurement?
(h) The gravitational force is given by $F = \frac{G m_1 m_2}{r^2}$. If the relative error in estimating
m1, r and F is 3%, 5% and 2%, respectively, find the relative error of m2
in the worst case.

7. Molar volume v(P, T) is a function of pressure and temperature. Write the
total differential of v and describe the physical significance of every term.


3.3 Some useful formulae


Generalization of total differential: We took a function z(x, y) of two independent
variables, x and y, and derived the formula for the total differential [Eq. 3.9] by
applying the tangent approximation. Let us generalize by saying that the total differential
of z(x, y) can be taken in the same manner when x and y are functions of some other
independent variable, like x(t) and y(t). We can generalize it further to any number
of functions and any number of independent variables; like f(w, x, y, z), where
w, x, y, z are functions of three independent variables w(r, s, t), x(r, s, t), y(r, s, t),
z(r, s, t). Let us write two chain rules to demonstrate the concept.

3.3.1 Chain rule 1


Given a function z(x, y), where x and y are (differentiable) functions of t, i.e., x(t)
and y(t), we can write a chain rule as:
$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}. \quad (3.11)$$

Example: Find dz/dt, given z = xy, where x = t³/3, y = sin t.

Method 1 (using differentiation): $z = \frac{t^3}{3}\sin t \Rightarrow \frac{dz}{dt} = t^2\sin t + \frac{t^3}{3}\cos t$.
Method 2 (using differentials): $dz = y\,dx + x\,dy = y(t^2\,dt) + x(\cos t\,dt)$, so $\frac{dz}{dt} = t^2\sin t + \frac{t^3}{3}\cos t$.

Example: Find dz/dt, given $z = xe^{-y}$, where x = cosh t, y = cos t.

Method 1 (using differentiation): $\frac{dz}{dt} = e^{-\cos t}\sinh t + \cosh t\, e^{-\cos t}\sin t$.
Method 2 (using differentials): $dz = e^{-y}\,dx - xe^{-y}\,dy$. Now, $dx = \sinh t\,dt$ and $dy = -\sin t\,dt$, and the answer is $\frac{dz}{dt} = e^{-\cos t}\sinh t + \cosh t\, e^{-\cos t}\sin t$.
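The first example can be cross-checked numerically by substituting x(t) and y(t) into z and differencing in t; a short sketch:

```python
import math

def z_of_t(t):
    # z = xy with x = t^3/3 and y = sin t substituted in
    return (t**3 / 3.0) * math.sin(t)

t = 1.2
# dz/dt from the chain rule worked out above
analytic = t**2 * math.sin(t) + (t**3 / 3.0) * math.cos(t)
# dz/dt from a central difference in t
h = 1e-6
numeric = (z_of_t(t + h) - z_of_t(t - h)) / (2 * h)
print(analytic, numeric)
```

The two values agree to many decimal places, confirming the chain-rule result.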

3.3.2 Chain rule 2


Let us take a function: z(x, y), where x, y are functions of two independent vari-
ables s and t; i.e., x(s, t), y(s, t). We can write another chain rule as,
 
∂z ∂z ∂x ∂z ∂y
= + , (3.12)
∂s t ∂x ∂s ∂y ∂s
 
∂z ∂z ∂x ∂z ∂y
= + .
∂t s ∂x ∂t ∂y ∂t

The above equation can be expressed in matrix form,
$$\begin{pmatrix} \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \end{pmatrix} = \begin{pmatrix} \frac{\partial z}{\partial x} & \frac{\partial z}{\partial y} \end{pmatrix} \begin{pmatrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \end{pmatrix}. \quad (3.13)$$

We can easily use the matrix form to further generalize for any number of variables
(see exercise).


3.3.3 Chain rule 3




Now we consider a case where x, y, z are functions of t. Applying $\left(\frac{\partial}{\partial t}\right)_y$
[which implies change along the curve such that y is held constant] to Eq. 3.9, we
get
$$\left(\frac{\partial z}{\partial x}\right)_y = \left(\frac{\partial z}{\partial t}\right)_y \Big/ \left(\frac{\partial x}{\partial t}\right)_y. \quad (3.14)$$
We get two more such relations (see exercise), for $\left(\frac{\partial y}{\partial z}\right)_x$ and $\left(\frac{\partial x}{\partial y}\right)_z$.

Caution!!!! Be careful when you are dealing with too many variables. Let us
take the example of the Cartesian to polar coordinate transformation:
$$x = r\cos\theta, \quad y = r\sin\theta, \quad r = \sqrt{x^2 + y^2}, \quad \theta = \arctan\frac{y}{x}. \quad (3.15)$$
Now, from the above equations, $\frac{\partial y}{\partial \theta} = r\cos\theta = x$ and $\frac{\partial \theta}{\partial y} = \frac{1/x}{1 + y^2/x^2} = \frac{x}{r^2}$. Why is one
not the reciprocal of the other? This is because we have actually calculated $\left(\frac{\partial y}{\partial \theta}\right)_r$
and $\left(\frac{\partial \theta}{\partial y}\right)_x$. If we take $y = x\tan\theta$ and then calculate $\left(\frac{\partial y}{\partial \theta}\right)_x = \frac{x}{\cos^2\theta} = \frac{r^2}{x}$, then it is
indeed the reciprocal of $\left(\frac{\partial \theta}{\partial y}\right)_x$. I present a simple general proof below.
x

3.3.4 Reciprocal relation


Given a function of two independent variables z(x, y), convince yourself that we
can define x(y, z) and y(x, z) [Fig. 3.1]. You can consider the surface of a unit
sphere; obviously, the choice of dependent and independent variables is in
your hand, i.e., you can take $z(x, y) = \sqrt{1 - x^2 - y^2}$ or $x(y, z) = \sqrt{1 - y^2 - z^2}$ or
$y(x, z) = \sqrt{1 - x^2 - z^2}$. Thus, similar to $\frac{\partial z}{\partial y}, \frac{\partial z}{\partial x}$, we can also define $\frac{\partial y}{\partial x}, \frac{\partial x}{\partial y}$ etc.

Now, applying $\left(\frac{\partial}{\partial x}\right)_z$ and $\left(\frac{\partial}{\partial y}\right)_z$ [which imply change along the curve such
that z is held constant] to Eq. 3.9, we get
$$0 = \left(\frac{\partial z}{\partial x}\right)_y \cdot 1 + \left(\frac{\partial z}{\partial y}\right)_x \left(\frac{\partial y}{\partial x}\right)_z,$$
$$0 = \left(\frac{\partial z}{\partial x}\right)_y \left(\frac{\partial x}{\partial y}\right)_z + \left(\frac{\partial z}{\partial y}\right)_x \cdot 1.$$
Multiplying the second equation with $\left(\frac{\partial y}{\partial x}\right)_z$ and using the first equation, we get
$$\left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial x}\right)_z = 1. \quad (3.16)$$


3.3.5 Cyclic relation


 
In the derivation shown above, multiply the second equation with $\left(\frac{\partial y}{\partial z}\right)_x$ to get the
following relation:
$$\left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x \left(\frac{\partial z}{\partial x}\right)_y = -1. \quad (3.17)$$
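For a concrete instance of Eq. 3.17, take one mole of an ideal gas, PV = RT (this physical example is my own; see also exercise 7(e) below). All three partial derivatives can be written down explicitly, and their product is indeed −1:

```python
R = 8.314                 # gas constant
P, T = 1.0e5, 300.0       # pick any state point
V = R * T / P

# From PV = RT:
dP_dV_T = -R * T / V**2   # (dP/dV) at constant T
dV_dT_P = R / P           # (dV/dT) at constant P
dT_dP_V = V / R           # (dT/dP) at constant V

print(dP_dV_T * dV_dT_P * dT_dP_V)   # cyclic relation gives approximately -1
```

Analytically the product is −RT/(PV) = −1, independent of the state point chosen.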

3.3.6 Exercise
1. Write chain rule 2 in matrix form for u(x, y, z) and x(s, t), y(s, t), z(s, t).

2. Jacobian matrix: consider Eq. 3.15, where x(r, θ) and y(r, θ). Write the total
differentials of x(r, θ) and y(r, θ) and show that they can be expressed in the form
of the following matrix equation,
$$\begin{pmatrix} dx \\ dy \end{pmatrix} = A \begin{pmatrix} dr \\ d\theta \end{pmatrix}.$$
Find the form of A (the Jacobian matrix). Repeat for r(x, y) and θ(x, y), and in this
case prove that the Jacobian matrix is $A^{-1}$. Note that, if we compare term
by term between A and $A^{-1}$, they are not reciprocals.

3. Geometrical interpretation of the elements of A and $A^{-1}$:

(a) What is the geometrical interpretation of $\left(\frac{\partial x}{\partial r}\right)_\theta$, $\left(\frac{\partial y}{\partial r}\right)_\theta$, $\left(\frac{\partial x}{\partial \theta}\right)_r$, $\left(\frac{\partial y}{\partial \theta}\right)_r$?
(b) What is the geometrical interpretation of $\left(\frac{\partial r}{\partial y}\right)_x$, $\left(\frac{\partial \theta}{\partial y}\right)_x$, $\left(\frac{\partial r}{\partial x}\right)_y$, $\left(\frac{\partial \theta}{\partial x}\right)_y$?

4. “Algebraic” method for finding partial derivatives:

(a) Calculate $\left(\frac{\partial r}{\partial x}\right)_\theta$, $\left(\frac{\partial r}{\partial y}\right)_\theta$, $\left(\frac{\partial \theta}{\partial x}\right)_r$, $\left(\frac{\partial \theta}{\partial y}\right)_r$. The simplest option is to write two
total differentials, of dx and of dy, and then solve by taking first dr = 0
and then dθ = 0. This is possibly the most robust method for finding
partial derivatives. Verify your answer with problem 2, using the reciprocal
relation.
(b) Calculate $\left(\frac{\partial y}{\partial r}\right)_x$, $\left(\frac{\partial y}{\partial \theta}\right)_x$, $\left(\frac{\partial x}{\partial r}\right)_y$, $\left(\frac{\partial x}{\partial \theta}\right)_y$. Verify your answer with problem 2,
using the reciprocal relation.

5. Use the “algebraic” method to solve the following problems. If there are fewer
variables, simple elimination/substitution works. Otherwise, you
have to apply Cramer’s rule in complicated cases.

(a) Given w = f(ax + by), find $b\left(\frac{\partial w}{\partial x}\right)_y - a\left(\frac{\partial w}{\partial y}\right)_x$.
Answer: 0
(b) Given $z = xe^{-y}$, x = cosh t, y = cos s, find $\left(\frac{\partial z}{\partial s}\right)_t$ and $\left(\frac{\partial z}{\partial t}\right)_s$.
Answer: $z\sin s$ and $e^{-y}\sinh t$.
(c) Given x = yz, y = sin(y + z), find dx/dy. Is this different from $\left(\frac{\partial x}{\partial y}\right)_z$ and
$\left(\frac{\partial x}{\partial z}\right)_y$? Give proper justification.
(d) Given m = pq, a sin p − p = q, b cos q + q = p, find $\left(\frac{\partial p}{\partial q}\right)_m$, $\left(\frac{\partial p}{\partial q}\right)_a$, $\left(\frac{\partial p}{\partial q}\right)_b$, $\left(\frac{\partial b}{\partial a}\right)_p$
and $\left(\frac{\partial a}{\partial q}\right)_m$. Note that, for the last two, you need to apply Cramer’s rule.
(e) Given z = r + s², x + y = s³ + r³ − 3, xy = s² − r², solve $\left(\frac{\partial x}{\partial z}\right)_s$, $\left(\frac{\partial x}{\partial z}\right)_r$ and
$\left(\frac{\partial x}{\partial z}\right)_y$ at (r, s, x, y, z) = (−1, 2, 3, 1, 3).
Answer: 7/2, 4, 3.
6. Given a function w(x, y), where x = r cos θ and y = r sin θ, calculate $\frac{\partial^2 w}{\partial r^2}$, $\frac{\partial^2 w}{\partial \theta^2}$,
$\frac{\partial^2 w}{\partial \theta \partial r}$ and $\frac{\partial^2 w}{\partial r \partial \theta}$.
∂θ∂r and ∂r∂θ .

7. In the following problems, explicit forms of the functions are not given.

(a) Given s(v, T), v(p, T), $c_p = T\left(\frac{\partial s}{\partial T}\right)_p$, $c_v = T\left(\frac{\partial s}{\partial T}\right)_v$, find $(c_p - c_v)$.
Answer: $T\left(\frac{\partial s}{\partial v}\right)_T \left(\frac{\partial v}{\partial T}\right)_p$
(b) Given u(x, y), y(x, z), find $\left(\frac{\partial u}{\partial x}\right)_z$.
(c) Given x(y, z), y(x, z), z(x, y), derive the following:
$$\left(\frac{\partial y}{\partial z}\right)_x = \left(\frac{\partial y}{\partial t}\right)_x \Big/ \left(\frac{\partial z}{\partial t}\right)_x, \qquad \left(\frac{\partial x}{\partial y}\right)_z = \left(\frac{\partial x}{\partial t}\right)_z \Big/ \left(\frac{\partial y}{\partial t}\right)_z.$$
(d) Derive the cyclic relation [Eq. 3.17] just by using the substitution/elimination
technique.
(e) Write the cyclic relation between v (molar volume), P (pressure) and T
(temperature). Is there any physical significance of each term?

8. Change of variables

(a) In the wave equation $\frac{\partial^2 F}{\partial x^2} = \frac{1}{v^2}\frac{\partial^2 F}{\partial t^2}$, substitute r = x + vt and s = x − vt
and express the equation in terms of s, r. Then, solve the equation.
(b) Express the Laplace equation $\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} = 0$ in polar coordinates r, θ.
(c) Express the Laplace equation $\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} = 0$ in cylindrical polar coordinates
r, θ, z.
(d) Express the Laplace equation $\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} = 0$ in spherical polar coordinates
r, θ, φ.
(e) Solve the partial differential equation $\frac{\partial^2 z}{\partial x^2} - 5\frac{\partial^2 z}{\partial x \partial y} + 6\frac{\partial^2 z}{\partial y^2} = 0$ by substituting
s = y + 2x and t = y + 3x.
Answer: z = f(y + 3x) + g(y + 2x)
(f) Solve the partial differential equation $2\frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial x \partial y} - 10\frac{\partial^2 z}{\partial y^2} = 0$ by substituting
s = 5x − 2y and t = 2x + y.
(g) Solve the partial differential equation $\frac{\partial^2 w}{\partial x^2} - \frac{\partial^2 w}{\partial y^2} = 1$ by substituting x =
s + t and y = s − t.


(The three panels plot the functions −x², x² and x³, respectively.)

Figure 3.3: Curve with a (a) maximum point, (b) minimum point and (c) inflection
point.

3.4 How to find the maximum and minimum


3.4.1 Second derivative test
Function of a single variable
We are all familiar with the problem of finding the maximum and minimum of a
function like y(x). At a point of maximum or minimum, the first derivative
of the function is zero, and depending on the sign of the second derivative, we
decide whether it is a point of maximum or minimum. We can use a quadratic
approximation and expand the function about the extremum point x₀ as,
$$y(x) = y(x_0) + \frac{(x - x_0)^2}{2!} y''(x_0), \quad (3.18)$$
since the first derivative y′(x₀) = 0. Now, if x₀ is a point of maximum, at any
other point y(x) should be less than y(x₀), i.e., y(x) − y(x₀) < 0 for any x. This will
be satisfied provided y″(x₀) < 0, which is the condition for a maximum point, and
also means a concave down curvature. On the other hand, if x₀ is a point of
minimum, y(x) − y(x₀) > 0 for any x. This will be satisfied provided y″(x₀) > 0,
which is the condition for a minimum point, and also means a concave up
curvature. Note that the first derivative changes sign as we move from the left to
the right of a maximum or minimum point, but the second derivative does not change
sign.
There can be a third case, if the curvature of a function changes from concave
down to concave up (or vice versa). In that case, the concavity (second derivative)
changes sign and must pass through a value of 0 at some point, which is
known as the inflection point. Thus, if x₀ is an inflection point, it must satisfy
y″(x₀) = 0. Such an example is shown in Fig. 3.3 (c). While the second derivative
changes sign as we move from the left to the right of an inflection point, the first derivative
does not change sign. Also see the problem set for further details.

Function of two variables


Let us generalize what we did for a function of one variable and try to get the
condition of maximum and minimum for a function of two variables. Again, we



Figure 3.4: Surface with a (a) maximum point, (b) minimum point and (c) saddle
point. Along the blue (red) curve, x (y) remains constant. The two curves are
crossing each other at the origin.

apply the quadratic approximation, setting the first order term to zero at the point
of extremum,
$$z(x, y) = z(x_0, y_0) + \frac{1}{2!}\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^2 z(x_0, y_0), \quad (3.19)$$
where h = (x − x₀) and k = (y − y₀). We can expand and write,
$$z(x, y) - z(x_0, y_0) = \frac{1}{2}\left(ah^2 + 2bhk + ck^2\right) = \frac{1}{2}\left[a\left(h + \frac{bk}{a}\right)^2 + \left(\frac{ac - b^2}{a}\right)k^2\right], \quad (3.20)$$
where $a = \frac{\partial^2 z(x_0,y_0)}{\partial x^2}$, $b = \frac{\partial^2 z(x_0,y_0)}{\partial x \partial y}$, $c = \frac{\partial^2 z(x_0,y_0)}{\partial y^2}$. For a minimum point, [z(x, y) −
z(x₀, y₀)] > 0, which is satisfied if a > 0 and (ac − b²)/a > 0. Since a > 0, we also need
(ac − b²) > 0, which further implies that c > 0. Thus, the condition for a minimum
point is,
$$z_{xx} > 0, \quad z_{yy} > 0, \quad z_{xx}z_{yy} > z_{xy}^2. \quad (3.21)$$

Similarly, in the case of a maximum point, $[z(x, y) - z(x_0, y_0)] < 0$, which is satisfied if $a < 0$ and $(ac - b^2)/a < 0$. Since $a < 0$, we again need $(ac - b^2) > 0$, which further implies that $c < 0$. Thus, the condition for a maximum point is,

$$z_{xx} < 0, \quad z_{yy} < 0, \quad z_{xx}z_{yy} > z_{xy}^{2}. \qquad (3.22)$$

Finally, if $a > 0$ and $(ac - b^2)/a < 0$, then $ac < b^2$. Similarly, if $a < 0$ and $(ac - b^2)/a > 0$, then again $ac < b^2$. In such a case we have neither a maximum nor a minimum, but a saddle point [Fig. 3.4] and the condition is,

$$z_{xx}z_{yy} < z_{xy}^{2}. \qquad (3.23)$$

Such a condition is obviously satisfied if $z_{xx}$ and $z_{yy}$ have opposite signs; such an example is shown in Fig. 3.4 (c).
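The conditions above are easy to check with a short program. The sketch below (assuming the sympy library is available) locates and classifies the critical point of the function from Exercise 1(a) below:

```python
import sympy as sp

x, y = sp.symbols('x y')
z = x**2 + y**2 + 2*x - 4*y + 10  # function from Exercise 1(a)

# Locate the critical points by solving z_x = 0 and z_y = 0
crit = sp.solve([sp.diff(z, x), sp.diff(z, y)], [x, y], dict=True)

# Second-derivative test (Eq. 3.21): zxx > 0, zyy > 0, zxx*zyy > zxy**2
pt = crit[0]
zxx = sp.diff(z, x, 2).subs(pt)
zyy = sp.diff(z, y, 2).subs(pt)
zxy = sp.diff(z, x, y).subs(pt)
is_min = zxx > 0 and zyy > 0 and zxx*zyy > zxy**2
print(pt, is_min)
```

The same three lines of second derivatives can classify any critical point: swapping the sign pattern detects a maximum, and $z_{xx}z_{yy} < z_{xy}^2$ signals a saddle.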


Figure 3.5: Finding the minimum distance $d = \sqrt{x^2 + y^2}$ from the origin, with the constraint that the point lies on the curve.

3.4.2 Method of Lagrange multipliers


We now discuss finding a maximum or minimum subject to a given constraint. As shown in Fig. 3.5, we have to minimize the distance of a point from the origin, $d = \sqrt{x^2 + y^2}$, with the constraint that the point must lie on the curve, say $y = x^2 - a$. We can solve it "directly" by substituting $y$, writing $d$ as a function of $x$ alone, and then minimizing $d$. Instead, we will learn a technique that is more efficient for solving such problems, since it works for any number of variables and any number of constraints.
We want to find the maximum or minimum of a function f (x, y), where x and
y are related by an equation φ(x, y) = c. Note that, the constraint implies that f is
a function of only one independent variable. To find the maximum/minimum, we
set df = 0 and since φ(x, y) = c, dφ = 0. Now, the differentials are:

$$df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy = 0,$$
$$d\phi = \frac{\partial \phi}{\partial x}dx + \frac{\partial \phi}{\partial y}dy = 0. \qquad (3.24)$$

Multiplying $d\phi$ by a constant $\lambda$ and adding it to $df$, we get

$$\left(\frac{\partial f}{\partial x} + \lambda\frac{\partial \phi}{\partial x}\right)dx + \left(\frac{\partial f}{\partial y} + \lambda\frac{\partial \phi}{\partial y}\right)dy = 0. \qquad (3.25)$$

40
3.4. HOW TO FIND THE MAXIMUM AND MINIMUM

Finally, we need to solve for x, y, λ from three equations:


 
$$\frac{\partial f}{\partial x} + \lambda\frac{\partial \phi}{\partial x} = 0, \qquad (3.26)$$
$$\frac{\partial f}{\partial y} + \lambda\frac{\partial \phi}{\partial y} = 0,$$
$$\phi(x, y) - c = 0.$$

Normally, first we define a new function from f (x, y) and φ(x, y):

F (x, y) = f (x, y) + λ[φ(x, y) − c] , (3.27)

where $\lambda$ is the Lagrange multiplier. Taking partial derivatives of the above function yields Eq. 3.26,

$$\frac{\partial F}{\partial x} = \frac{\partial f}{\partial x} + \lambda\frac{\partial \phi}{\partial x} = 0,$$
$$\frac{\partial F}{\partial y} = \frac{\partial f}{\partial y} + \lambda\frac{\partial \phi}{\partial y} = 0,$$
$$\frac{\partial F}{\partial \lambda} = \phi(x, y) - c = 0. \qquad (3.28)$$

From the above set of equations, we have to solve via substitution/elimination


to get the values of x and y at the extremum point. Note that, this method can
be extended for functions of any number of variables, as well as any number of
constraints.
Let us apply this method to solve the problem we stated in the beginning. In this case, $f(x, y) = d = \sqrt{x^2 + y^2}$, $\phi(x, y) = y - x^2$ and $c = -a$. First we define the function $F(x, y) = \sqrt{x^2 + y^2} + \lambda(y - x^2 + a)$. Now, we can write,

$$\frac{\partial F}{\partial x} = \frac{x}{\sqrt{x^2 + y^2}} - 2\lambda x = 0,$$
$$\frac{\partial F}{\partial y} = \frac{y}{\sqrt{x^2 + y^2}} + \lambda = 0,$$
$$\frac{\partial F}{\partial \lambda} = y - x^2 + a = 0.$$

The first equation implies that either $x = 0$ or $\lambda = \frac{1}{2\sqrt{x^2 + y^2}}$. Taking the second branch, the second equation gives $y = -\frac{1}{2}$, the third equation then gives $x = \pm\sqrt{a - \frac{1}{2}}$, and the minimum distance is $d = \sqrt{\frac{1}{4} + a - \frac{1}{2}}$. Compare the results with the "direct" method, as suggested previously.
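The same system of equations can be solved with a computer. In the sketch below (assuming sympy), the squared distance $x^2 + y^2$ is minimized instead of $d$ itself, which is equivalent and avoids the square roots; the value $a = 1$ is an arbitrary illustrative choice:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
a = 1  # concrete value of the parameter a, chosen for illustration

# Minimizing d is equivalent to minimizing d**2, which keeps the system polynomial
F = x**2 + y**2 + lam*(y - x**2 + a)

# Solve dF/dx = 0, dF/dy = 0, dF/dlam = 0
eqs = [sp.diff(F, v) for v in (x, y, lam)]
sols = sp.solve(eqs, [x, y, lam], dict=True)

# Pick the stationary point with the smallest distance from the origin
dmin = min(sp.sqrt(s[x]**2 + s[y]**2) for s in sols)
print(dmin)  # sqrt(3)/2
```

For $a = 1$ the text's formula gives $d = \sqrt{1/4 + 1 - 1/2} = \sqrt{3}/2$, matching the program's output.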



Figure 3.6: Tangent drawn at (a) minimum d and (b) any other d. See Fig 3.5 for
the definition of d.

3.4.3 Geometrical interpretation


Let us try to understand the geometrical interpretation of what we are doing. From Eq. 3.28, we find that $\frac{\partial f}{\partial x} = -\lambda\frac{\partial \phi}{\partial x}$ and $\frac{\partial f}{\partial y} = -\lambda\frac{\partial \phi}{\partial y}$. This implies that, if we take the two vectors $\vec{v}_f = \frac{\partial f}{\partial x}\hat{x} + \frac{\partial f}{\partial y}\hat{y}$ and $\vec{v}_\phi = \frac{\partial \phi}{\partial x}\hat{x} + \frac{\partial \phi}{\partial y}\hat{y}$, then $\vec{v}_f$ and $\vec{v}_\phi$ are parallel to each other. It is not too difficult to recognize the two vectors: $\vec{v}_f = \vec{\nabla}f$ and $\vec{v}_\phi = \vec{\nabla}\phi$, where $\vec{\nabla}$ is the gradient operator $\vec{\nabla} = \hat{x}\frac{\partial}{\partial x} + \hat{y}\frac{\partial}{\partial y}$. But gradients are normal to the curves $f(x, y)$ and $\phi(x, y)$ at a given point on the curve. The method of Lagrange multipliers thus implies that the curves $f(x, y)$ and $\phi(x, y)$ have a common normal, or in other words, a common tangent, at the point of maximum or minimum. See Fig. 3.6 for example; in panel (a) the function $f(x, y)$ and the constraint $\phi(x, y)$ have a common tangent at the point of minimum, while panel (b) shows that they do not have a common tangent at any other point. Thus, in summary, the point of extremum is the point where the function $f(x, y)$ and the constraint $\phi(x, y)$ have a common tangent line (or a common tangent plane, if we have more variables).

3.4.4 Exercise
1. Find the maximum and minimum points of the functions:

(a) z(x, y) = x2 + y 2 + 2x − 4y + 10
Answer: Minimum point at (-1,2)
(b) z(x, y) = x2 − y 2 + 2x − 4y + 10
Answer: Saddle point at (-1,-2)

2. A point can move along the curve xy = c. Find its minimum distance from
the origin. Plot and try to understand the geometrical interpretation of the
method of Lagrange multipliers.


3. You have to fit a box (rectangular parallelepiped) within an ellipsoid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$, such that the edges of the box are parallel to the axes. What would be the maximum possible volume of such a box?
Answer: $\frac{8abc}{3\sqrt{3}}$

4. Find the minimum distance from the origin to the intersection of xy = 6 with
7x + 24z = 0.

5. Consider a rectangle, with isosceles triangles at two of the ends. For a fixed perimeter, if I want to maximize the total area, what would be the value of θ (the angle between the sides of the triangle and the sides of the rectangle)? Also try to identify the shape.

6. A box has three of its faces in the coordinate planes and one vertex in the
plane ax + by + cz = d. Find the maximum volume of the box.

7. A point can move only along the line ax + by − c = 0. What would be the location of the point such that the sum of the squares of its distances from (1,0) and (−1,0) is minimum?

3.5 Change of variables: Legendre transform


We already have seen the use of partial differentiation for making change of vari-
ables, for example from rectangular to polar coordinates. Now we are going to
learn some other type of change of variables, known as the Legendre transform.
In the area of physical sciences, it is mainly used (a) in thermodynamics to show
the connection between different thermodynamic potentials like internal energy,
enthalpy, Helmholtz and Gibbs free energy; and (b) in classical mechanics to
switch between Lagrangian and Hamiltonian dynamics.
Let me first discuss the technical steps involved in this process. Let us take
the function f (x, y) and write its differential,

df = pdx + qdy, (3.29)

where p = ∂f /∂x and q = ∂f /∂y. Note that, (p, x) and (q, y) form a conjugate pair
of variables. Now, instead of x and y as independent variable, if I want x and q
to be the independent variable, what do I do? How to transform f (x, y) ⇒ g(x, q)?
Subtract d(qy) from df , such that,

df − d(qy) = pdx + qdy − qdy − ydq, (3.30)


d(f − qy) = pdx − ydq.

Thus, the Legendre transform is given by

g = f − qy , (3.31)

such that
dg = pdx − ydq , (3.32)

43
3.6. DIFFERENTIATION OF INTEGRALS

which clearly shows that g is a function of x and q. Thus, we have performed a


Legendre transform of a function f (x, y) ⇒ g(x, q) by switching from the variable
y to its conjugate q. We could have picked the other pair of conjugate variables
for this purpose, i.e., f (x, y) ⇒ h(p, y) or could have done it for both the pairs,
i.e., f (x, y) ⇒ k(p, q). Note that, if we consider 2 variables, then there are 2² = 4 such functions; if we consider 3 variables, then there are 2³ = 8 such functions, and so on.

Maxwell relations From the previous equation, we can write that

$$\frac{\partial g}{\partial x} = p, \qquad \frac{\partial g}{\partial q} = -y. \qquad (3.33)$$

Using the reciprocity relation $\frac{\partial^2 g}{\partial q \partial x} = \frac{\partial^2 g}{\partial x \partial q}$, we can further write that,

$$\left(\frac{\partial p}{\partial q}\right)_x = -\left(\frac{\partial y}{\partial x}\right)_q. \qquad (3.34)$$

See the problem set for further details on applications of the Legendre transformation and Maxwell relations in thermodynamics. You are also advised to read [2] for more details.
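These relations can be checked symbolically. The sketch below (assuming sympy; $f$ is an arbitrary illustrative function, chosen so that $q = \partial f/\partial y$ is easy to invert) performs the Legendre transform $f(x, y) \Rightarrow g(x, q)$ and verifies Eq. 3.33:

```python
import sympy as sp

x, y, q = sp.symbols('x y q')

# An illustrative function and its conjugate variable q = df/dy
f = y**2/2 + x*y
q_expr = sp.diff(f, y)                      # q = y + x
y_of_q = sp.solve(sp.Eq(q, q_expr), y)[0]   # invert: y = q - x

# Legendre transform g = f - q*y, expressed in the new variables (x, q)
g = sp.simplify((f - q_expr*y).subs(y, y_of_q))

# Check dg/dx = p = df/dx and dg/dq = -y (Eq. 3.33)
p = sp.diff(f, x).subs(y, y_of_q)
assert sp.simplify(sp.diff(g, x) - p) == 0
assert sp.simplify(sp.diff(g, q) + y_of_q) == 0
print(g)  # g as a function of x and q
```

The two assertions are exactly the statements $\partial g/\partial x = p$ and $\partial g/\partial q = -y$, from which the Maxwell relation of Eq. 3.34 follows by reciprocity.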

3.5.1 Exercise
1. Combined form of first and second law is: du = T ds − pdv, which tells us
that internal energy is a function of entropy and volume, i.e., u(s, v). Find
a Legendre transformation to get the following functions (and also find the
name of each of the functions):
(a) a function of (T, v)
(b) a function of (s, p)
(c) a function of (T, p)
2. Using du and other three functions in the previous problem, find all possible
Maxwell relations.

3.6 Differentiation of integrals


Let us start with the statement (somewhat obvious) that the derivative of an integral of a function is the function itself,

$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x). \qquad (3.35)$$

Note that, I have taken the lower limit to be a constant, while the upper limit is a variable $x$. If we interchange the limits, we get a negative sign,

$$\frac{d}{dx}\int_x^a f(t)\,dt = -f(x). \qquad (3.36)$$


Since one of the limits is a variable, the definite integral yields a function, instead of some constant value, i.e.,

$$\int_a^x f(t)\,dt = g(x), \qquad (3.37)$$

and its derivative is given by $g'(x) = f(x)$, according to Eq. 3.35. Starting from the last statement, we can prove Eq. 3.35. Since $f(t) = \frac{dg}{dt}$, we can write the integral of $f(t)$ as,

$$I = \int_a^x f(t)\,dt = [g(t)]_a^x = g(x) - g(a). \qquad (3.38)$$

Now, we can easily calculate the derivative of the integral as,

$$\frac{dI}{dx} = \frac{d}{dx}\int_a^x f(t)\,dt = \frac{dg}{dx} = f(x). \qquad (3.39)$$

Now, instead of $x$, if the upper limit is some function like $v(x)$, then we can write $\frac{dI}{dv} = f(v)$ and $\frac{dI}{dx} = \frac{dI}{dv}\frac{dv}{dx}$, which implies that,

$$\frac{d}{dx}\int_a^{v(x)} f(t)\,dt = f(v)\frac{dv}{dx}. \qquad (3.40)$$

We can further generalize and take both the upper and lower limits to be functions of $x$, such that

$$\frac{d}{dx}\int_{u(x)}^{v(x)} f(t)\,dt = f(v)\frac{dv}{dx} - f(u)\frac{du}{dx}. \qquad (3.41)$$

Integral representation of a function Functions are not always given in some explicit form like $g(x) = \sin x$, but are very often specified by an integral like,

$$g(x) = \int_a^b f(x, t)\,dt, \qquad (3.42)$$

or in a more general form like,

$$g(x) = \int_{u(x)}^{v(x)} f(x, t)\,dt. \qquad (3.43)$$

Now, if we try to calculate the derivative of $g(x)$ (or the derivative of the integral of $f(x, t)$) as given in Eq. 3.42, we can differentiate within the integral sign,

$$\frac{dg(x)}{dx} = \frac{d}{dx}\int_a^b f(x, t)\,dt = \int_a^b \frac{\partial f(x, t)}{\partial x}\,dt, \qquad (3.44)$$

because both limits of the integration are constants ($a$ and $b$). On the other hand, if the function is described by Eq. 3.43 (both limits are functions of $x$),


then we can combine both Eq. 3.41 and Eq. 3.44 to write (the Leibniz rule),

$$\frac{dg(x)}{dx} = \frac{d}{dx}\int_{u(x)}^{v(x)} f(x, t)\,dt = \int_{u(x)}^{v(x)} \frac{\partial f(x, t)}{\partial x}\,dt + f(x, v)\frac{dv}{dx} - f(x, u)\frac{du}{dx}. \qquad (3.45)$$
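The Leibniz rule is easy to verify on a concrete example. The sketch below (assuming sympy; the integrand and limits are arbitrary illustrative choices) compares both sides of Eq. 3.45:

```python
import sympy as sp

x, t = sp.symbols('x t')
f = x*t          # illustrative integrand f(x, t)
u, v = x, x**2   # illustrative x-dependent limits

# Left side: integrate first, then differentiate with respect to x
lhs = sp.diff(sp.integrate(f, (t, u, v)), x)

# Right side: the Leibniz rule, Eq. 3.45
rhs = (sp.integrate(sp.diff(f, x), (t, u, v))
       + f.subs(t, v)*sp.diff(v, x)
       - f.subs(t, u)*sp.diff(u, x))

print(sp.simplify(lhs - rhs))  # 0
```

Both sides evaluate to the same polynomial in $x$, so their difference simplifies to zero.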

3.6.1 Exercise

1. Find $\int_0^\infty t^n e^{-at^2}\,dt$, when $n$ is odd and $a > 0$.

2. Find $\int_0^\infty t^n e^{-at^2}\,dt$, when $n = 2, 4, 6, \ldots, 2m$.

Bibliography

[1] Boas, Mary L., “Mathematical Methods in the Physical Sciences” (Third Edi-
tion), WILEY.

[2] Zia, R. K. P. and Redish, Edward F. and McKay, Susan R., “Making sense of
the Legendre transform”, American Journal of Physics, 77, 614-622 (2009).

Chapter 4

Multiple integrals

This chapter is prepared based on Mathematical Methods in the Physical Sci-


ences, Mary L. Boas,[1] Chapter 5.

4.1 Double and triple integrals


We all know that $\int_a^b f(x)\,dx$ represents the area under the curve $f(x)$. The same concept can be extended to a double integral, $\int\!\int f(x, y)\,dx\,dy$, which represents the volume under the surface $f(x, y)$. The trickiest part during evaluation is to decide whether to integrate over $dx$ or $dy$ first. Most of the time it depends on convenience, and let me try to convince you by giving some examples.

Integral over a rectangular area: The simplest case is when both the $x$ and $y$ limits are constants and $f(x, y) = g(x)h(y)$, such that

$$\int\!\int f(x, y)\,dx\,dy = \int_{x=a}^{b}\int_{y=c}^{d} g(x)h(y)\,dy\,dx = \left(\int_a^b g(x)\,dx\right)\left(\int_c^d h(y)\,dy\right). \qquad (4.1)$$

For example, see the figure in the first row of column (i) in Fig. 4.1.

Repeated integral: y first Now let us consider the figure in the second row of column (i) in Fig. 4.1, where the area of integration is not the rectangle, but a triangle. In this case also, the maximum and minimum values of $x$ and $y$ range over $(a, b)$ and $(c, d)$, respectively. However, we cannot set the limits of $x$ from $a$ to $b$ and the limits of $y$ from $c$ to $d$, as we did for the rectangle. In this case we can solve in two ways; first let us do the $dy$ integration first (you can think of this as a "column-wise" process),

$$\int\!\int f(x, y)\,dx\,dy = \int_{x=a}^{b}\underbrace{\left(\int_{y=y_l(x)}^{y_h(x)} f(x, y)\,dy\right)}_{F(x)}dx. \qquad (4.2)$$

If we start at the origin, $y_l = 0$ and $y_h(x) = d - \frac{d}{b}x$. Assuming $f(x, y) = 1$, the integral over $dy$ yields a function of $x$, $F(x) = d - \frac{d}{b}x$, and the integral over $x$ (with limits $x = 0$



Figure 4.1: Various different areas over which a double integral needs to be calculated.

to $x = b$) produces the final answer, $db - \frac{db}{2} = \frac{db}{2}$. The red arrows in the figure indicate that the $y$-values are restricted between the horizontal line and the sloped line, as the heights of the columns change continuously.

Repeated integral: x first Alternately, as shown in the third row of column (i) in Fig. 4.1, we can do the $dx$ integration first (you can think of this as a "row-wise" process),

$$\int\!\int f(x, y)\,dx\,dy = \int_{y=c}^{d}\underbrace{\left(\int_{x=x_l(y)}^{x_h(y)} f(x, y)\,dx\right)}_{F(y)}dy. \qquad (4.3)$$

If we start at the origin, $x_l = 0$ and $x_h(y) = b - \frac{b}{d}y$. Assuming $f(x, y) = 1$, the integral over $dx$ yields a function of $y$, $F(y) = b - \frac{b}{d}y$, and the integral over $y$ (with limits $y = 0$ to $y = d$) produces the final answer, $db - \frac{db}{2} = \frac{db}{2}$. The red arrows in the figure indicate that the $x$-values are restricted between the vertical line and the sloped line, as the widths of the rows change continuously.

Repeated integral: either one first From the above discussion, it is clear that the outcome does not depend on which of the two integrals ($dx$ or $dy$) is evaluated first, i.e.,

$$\int\!\int f(x, y)\,dx\,dy = \int_{x=a}^{b}\underbrace{\left(\int_{y=y_l(x)}^{y_h(x)} f(x, y)\,dy\right)}_{F(x)}dx = \int_{y=c}^{d}\underbrace{\left(\int_{x=x_l(y)}^{x_h(y)} f(x, y)\,dx\right)}_{F(y)}dy. \qquad (4.4)$$

There is no specific rule, and we decide which integration to do first based on convenience. However, I show some examples in Fig. 4.1. In column (ii), the boundaries at $x = a$ or $x = b$ are either vertical straight lines or points, and the equations of the other two boundaries (curves) are known. In such cases, it is more convenient to evaluate the $dy$ integral first ("column-wise"). On the other hand, in column (iii), the boundaries at $y = c$ and $y = d$ are either horizontal lines or points, and the equations of the other two boundaries (curves) are known. In such cases, it is more convenient to evaluate the $dx$ integral first ("row-wise").
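The triangle example above can be verified in both orders with a short program (a sketch assuming sympy; the concrete values $b = 2$ and $d = 3$ are chosen for illustration):

```python
import sympy as sp

x, y = sp.symbols('x y')
b, d = 2, 3   # illustrative triangle dimensions; expected area is b*d/2 = 3

# "Column-wise": dy first, with y from 0 to d - (d/b)*x
col = sp.integrate(sp.integrate(1, (y, 0, d - sp.Rational(d, b)*x)), (x, 0, b))

# "Row-wise": dx first, with x from 0 to b - (b/d)*y
row = sp.integrate(sp.integrate(1, (x, 0, b - sp.Rational(b, d)*y)), (y, 0, d))

print(col, row)  # 3 3
```

Both orders of integration give the same value, $db/2$, as Eq. 4.4 promises.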

4.1.1 Exercise
1. Evaluate the following double integrals (it is always a good idea to draw the area over which you are integrating). Also try to solve the problems by interchanging the order of integration (whenever possible).

(a) $\int_{x=0}^{1}\int_{y=2}^{6} 3x\,dy\,dx$.
Answer: 6
(b) $\int_{y=0}^{2}\int_{x=2y}^{4} 2\,dx\,dy$.
Answer: 8
(c) $\int_{x=0}^{4}\int_{y=0}^{x/2} 3y\,dy\,dx$.
Answer: 8
(d) $\int_{x=0}^{1}\int_{y=x}^{e^x} 2y\,dy\,dx$.
Answer: $\frac{e^2}{2} - \frac{5}{6}$
(e) $\int_{y=0}^{3}\int_{x=1-y/3}^{1} dx\,dy$.
Answer: $\frac{3}{2}$
(f) $\int_{x=0}^{1}\int_{y=0}^{2x} 3(x+y)\,dy\,dx$
Answer: 4
(g) $\int_{y=0}^{2}\int_{x=y^2}^{4} 5y\sqrt{x}\,dx\,dy$
Answer: 32
(h) $\int_{x=0}^{1}\int_{y=0}^{\sqrt{1-x^2}} 3y\,dy\,dx$
Answer: 1
(i) $\int_{y=0}^{\pi}\int_{x=y}^{\pi} \frac{\sin x}{2x}\,dx\,dy$
Answer: 1
(j) $\int_{y=0}^{1}\int_{x=y}^{1} \frac{e^x}{2\sqrt{x}}\,dx\,dy$
(k) $\int_{x=0}^{2}\int_{y=x}^{2} \sqrt{2}\,e^{-y^2/2}\,dy\,dx$

2. Evaluate the following triple integrals

(a) $\int_{x=1}^{2}\int_{y=x}^{2x}\int_{z=0}^{y-x} 6\,dz\,dy\,dx$
Answer: 7
(b) $\int_{y=-2}^{3}\int_{z=1}^{2}\int_{x=y+z}^{2y+z} \frac{6}{7}y\,dx\,dz\,dy$
Answer: 10

4.2 Change of variables: Jacobians


Some commonly used coordinate systems are: Cartesian, polar, cylindrical and
spherical. We use them, depending on the symmetry. So far we have done the
double or triple integrals in Cartesian coordinates only. If we want to do double or
triple integrals in polar, cylindrical or spherical coordinates, we need to know the
area or volume element in each of the coordinate systems. Finally, we will learn
to generalize the methodology for any arbitrary coordinate system.

4.2.1 Cartesian coordinate system


Differential area (2D system) and volume (3D system) elements are given by,

dA = dxdy & dV = dxdydz . (4.5)

Differential length element is given by,

$$ds^2 = dx^2 + dy^2, \qquad (4.6)$$
$$ds = \sqrt{dx^2 + dy^2} = \left(\sqrt{1 + (dy/dx)^2}\right)dx = \left(\sqrt{(dx/dy)^2 + 1}\right)dy.$$

Double and triple integrals in the Cartesian coordinate system are expressed as,

$$\int\!\int f(x, y)\,dx\,dy \quad \& \quad \int\!\int\!\int f(x, y, z)\,dx\,dy\,dz. \qquad (4.7)$$

4.2.2 Polar coordinate system


We make the substitution:
x = r cos θ, y = r sin θ. (4.8)



Figure 4.2: Polar coordinate system.

Note that, in the Cartesian coordinate system, the basis vectors (orthogonal) are
taken to be (î, ĵ). In the polar coordinate system, the basis vectors (orthogonal)
are (êr , êθ ). How does one coordinate system transform into another?

êr = î cos θ + ĵ sin θ, (4.9)


êθ = −î sin θ + ĵ cos θ.

Note that, we have rotated the coordinate system $(\hat{i}, \hat{j})$ by $\theta$ to get $(\hat{e}_r, \hat{e}_\theta)$ and the rotation matrix is defined by,

$$R(\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}. \qquad (4.10)$$

I have just mentioned the rotation matrix here; we will learn about it in detail later.
Let us come back to the topic of interest in this chapter: what is the form
of differential area element in polar coordinates? As shown in Fig. 4.2, the area
element is:
dA = rdrdθ . (4.11)
The length element, equal to the line connecting the two points $(r, \theta)$ and $(r + dr, \theta + d\theta)$, is given by,

$$ds^2 = dr^2 + r^2 d\theta^2, \qquad (4.12)$$
$$ds = \left(\sqrt{(dr/d\theta)^2 + r^2}\right)d\theta = \left(\sqrt{1 + r^2(d\theta/dr)^2}\right)dr.$$

Thus, double integrals in the polar coordinate system are expressed as,

$$\int\!\int f(r, \theta)\,r\,dr\,d\theta. \qquad (4.13)$$
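As a quick check of the polar area element, the sketch below (assuming sympy) integrates $f(r, \theta) = 1$ over a disc of radius $R$ and recovers the familiar area $\pi R^2$:

```python
import sympy as sp

r, theta, R = sp.symbols('r theta R', positive=True)

# Area of a circle of radius R, using the polar area element dA = r dr dtheta
area = sp.integrate(1*r, (r, 0, R), (theta, 0, 2*sp.pi))
print(area)  # pi*R**2
```

Forgetting the extra factor of $r$ in the area element would instead give $2\pi R$, a common mistake worth checking for.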


Figure 4.3: Cylindrical coordinate system.

4.2.3 Cylindrical coordinate system


This is very similar to the polar coordinate system. We make the following substitution:

$$x = r\cos\theta, \quad y = r\sin\theta, \quad z = z. \qquad (4.14)$$

In the Cartesian coordinate system, the basis vectors (orthogonal) are taken to be $(\hat{i}, \hat{j}, \hat{k})$. In the cylindrical coordinate system, the basis vectors (orthogonal) are $(\hat{e}_r, \hat{e}_\theta, \hat{e}_z)$. How does one coordinate system transform into another?

$$\hat{e}_r = \hat{i}\cos\theta + \hat{j}\sin\theta + \hat{k}\,0, \qquad (4.15)$$
$$\hat{e}_\theta = -\hat{i}\sin\theta + \hat{j}\cos\theta + \hat{k}\,0,$$
$$\hat{e}_z = \hat{i}\,0 + \hat{j}\,0 + \hat{k}.$$

Note that, we have rotated the coordinate system $(\hat{i}, \hat{j}, \hat{k})$ by an angle $\theta$ about the $z$ axis to get $(\hat{e}_r, \hat{e}_\theta, \hat{e}_z)$ and the rotation matrix is defined by,

$$R_z(\theta) = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (4.16)$$

Again, the rotation matrix is just mentioned here, as we will learn about the
properties of rotation matrices later.
Let us come back to the topic of interest in this chapter: what is the form of
differential volume element in cylindrical coordinates? As shown in Fig. 4.3, the
volume element is:
dV = rdrdθdz . (4.17)


Figure 4.4: Spherical coordinate system.

The area element (on the surface of the cylinder of radius r0 ) is:

dA = r0 dθdz. (4.18)

The length element, equal to the line connecting the two points (r, θ, z) and (r +
dr, θ + dθ, z + dz) is given by,

ds2 = dr2 + r2 dθ2 + dz 2 . (4.19)

Thus, triple integrals in the cylindrical coordinate system are expressed as,

$$\int\!\int\!\int f(r, \theta, z)\,r\,dr\,d\theta\,dz. \qquad (4.20)$$

4.2.4 Spherical coordinate system


We make the following substitution:

x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ. (4.21)

The volume element is:


dV = r2 sin θdrdθdφ . (4.22)
The area element (on the surface of the sphere of radius r0 ) is:

dA = r02 sin θdθdφ. (4.23)


The length element, equal to the line connecting the two points (r, θ, φ) and (r +
dr, θ + dθ, φ + dφ) is given by,

ds2 = dr2 + r2 dθ2 + r2 sin2 θdφ2 . (4.24)

4.2.5 Jacobians
Say, I want to evaluate some area integral in polar coordinates, instead of Cartesian coordinates (because of symmetry, which can make life simpler). We have to make the substitution given in Eq. 4.8 and also write the area element $dx\,dy$ in terms of the variables of the polar coordinate system. We already have derived the area element in polar coordinates. Is there a general method of getting it, which can be used for any transformation $(x, y, z) \rightarrow (u, v, w)$?
Let us start by writing the differentials of Eq. 4.8,

$$dx = (\cos\theta)dr + (-r\sin\theta)d\theta = \frac{\partial x}{\partial r}dr + \frac{\partial x}{\partial \theta}d\theta, \qquad (4.25)$$
$$dy = (\sin\theta)dr + (r\cos\theta)d\theta = \frac{\partial y}{\partial r}dr + \frac{\partial y}{\partial \theta}d\theta.$$

Rearranging the above equations:

$$\begin{pmatrix} dx \\ dy \end{pmatrix} = \begin{pmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{pmatrix}\begin{pmatrix} dr \\ d\theta \end{pmatrix} = J\!\left(\frac{x, y}{r, \theta}\right)\begin{pmatrix} dr \\ d\theta \end{pmatrix} = \frac{\partial(x, y)}{\partial(r, \theta)}\begin{pmatrix} dr \\ d\theta \end{pmatrix}. \qquad (4.26)$$

Similarly, we can write the Jacobian for any transformation $(x, y, z)$ to $(u, v, w)$ as:

$$J\!\left(\frac{x, y, z}{u, v, w}\right) = \frac{\partial(x, y, z)}{\partial(u, v, w)} = \begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} & \frac{\partial x}{\partial w} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} & \frac{\partial y}{\partial w} \\ \frac{\partial z}{\partial u} & \frac{\partial z}{\partial v} & \frac{\partial z}{\partial w} \end{pmatrix}, \qquad (4.27)$$

and the differential volume element is given by:

$$dV = |J|\,du\,dv\,dw. \qquad (4.28)$$
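The Jacobian determinant of Eq. 4.27 is easy to compute symbolically. A minimal sketch (assuming sympy) for the spherical transformation of Eq. 4.21:

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)

# Spherical coordinates, Eq. 4.21
x = r*sp.sin(theta)*sp.cos(phi)
y = r*sp.sin(theta)*sp.sin(phi)
z = r*sp.cos(theta)

# Build the Jacobian matrix of Eq. 4.27 and take its determinant
J = sp.Matrix([x, y, z]).jacobian([r, theta, phi])
detJ = sp.simplify(J.det())
print(detJ)  # r**2*sin(theta)
```

The result reproduces the volume element $dV = r^2\sin\theta\,dr\,d\theta\,d\phi$ of Eq. 4.22; replacing the substitution with Eq. 4.14 gives $|J| = r$ for cylindrical coordinates in the same way.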

4.2.6 Exercise
1. Derive the length element ds in polar coordinates, using the differential of
Eq. 4.8.

2. Derive the length element ds in cylindrical coordinates, using the differential


of Eq. 4.14.

3. Derive the length element ds in spherical coordinates, using the differentials of Eq. 4.21.

4. Find the differential volume elements in cylindrical and spherical coordi-


nates using Jacobian.

5. Starting with dA and dV in spherical coordinates, find the surface area and
volume of a sphere by selecting appropriate limits.


6. Solve the same problem in different coordinate systems: write a triple integral for finding the volume inside the cone $z^2 = x^2 + y^2$ and between $z = 1$ and $z = 2$, using the
• Cartesian coordinate system
• cylindrical coordinate system
• spherical coordinate system

Answer: $\frac{7\pi}{3}$

7. Find the volume of a cone of height h, which is equal to the radius of the base r. (Hint: use cylindrical coordinates)
Answer: $\frac{\pi h^3}{3}$
8. Find the volume of the cone defined as: $\theta = \alpha < \frac{\pi}{2}$ and lying inside the sphere $r \le a$. (Hint: use spherical coordinates)
• What happens if $\alpha = \frac{\pi}{4}$?
9. Find the volume inside the cylinder x2 + y 2 = a2 ; and between the quadratic
surface z = ax2 + y 2 & the (x, y) plane.
10. Find the volume of the ellipsoid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$.
11. Either prove or verify that $\frac{\partial(u,v)}{\partial(x,y)}\frac{\partial(x,y)}{\partial(u,v)} = 1$, where $\frac{\partial(u,v)}{\partial(x,y)}$ is the Jacobian and $\frac{\partial(x,y)}{\partial(u,v)}$ is the inverse Jacobian.

12. Find the inverse Jacobians for polar, cylindrical and spherical coordinate
systems.
13. Find the Jacobians for the following transformations:
(a) Parabolic cylindrical coordinates: $x = \frac{1}{2}(u^2 - v^2)$, $y = uv$
(b) Elliptic cylindrical coordinates: x = a cosh u cos v, y = a sinh u sin v
14. Evaluate the following integrals:

(a) $I = \int_{-\infty}^{\infty} e^{-ax^2}\,dx$
(b) $I = \int_0^1 dx \int_0^{\sqrt{1-x^2}} e^{-(x^2+y^2)}\,dy$
Answer: $\frac{\pi}{4}(1 - e^{-1})$
(c) $I = \int_0^\infty\int_0^\infty \frac{x^2 + y^2}{1 + (x^2 - y^2)^2}\,e^{-2xy}\,dx\,dy$ (Hint: parabolic cylindrical)
(d) $I = \int_{x=0}^{1/2}\int_{y=x}^{1-x} \left(\frac{x-y}{x+y}\right)^2 dy\,dx$ (Hint: substitute $x = \frac{1}{2}(r - s)$, $y = \frac{1}{2}(r + s)$)
Answer: $\frac{1}{12}$

4.3 Octave files


Integrate rectangular area

Bibliography

[1] Boas, Mary L., “Mathematical Methods in the Physical Sciences” (Third Edi-
tion), WILEY.

Chapter 5

Vector analysis

This chapter is prepared based on Mathematical Methods in the Physical Sci-


ences, Mary L. Boas,[1] Chapter 6.

5.1 Triple products


5.1.1 Scalar triple product

$$\vec{A} \cdot (\vec{B} \times \vec{C}) = \begin{vmatrix} A_x & A_y & A_z \\ B_x & B_y & B_z \\ C_x & C_y & C_z \end{vmatrix} \qquad (5.1)$$

Note that, the scalar triple product represents the volume of a parallelepiped bounded by the three vectors $\vec{A}$, $\vec{B}$ and $\vec{C}$. The cross product is the area of a parallelogram, which is then multiplied by the height to get the volume. Clearly, we can use any two sides as the base, and the volume should not change. Using this, we can prove that $\vec{A}\cdot(\vec{B}\times\vec{C}) = (\vec{A}\times\vec{B})\cdot\vec{C}$, etc. Thus, it does not matter where we put the dot and the cross in a scalar triple product, and often it is represented as $(\vec{A}\vec{B}\vec{C})$. However, it might pick up a negative sign, like $\vec{A}\cdot(\vec{B}\times\vec{C}) = -\vec{A}\cdot(\vec{C}\times\vec{B})$, etc. It is most convenient to figure out all possible combinations (with appropriate signs) from the determinant, because interchanging two rows introduces a negative sign.
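The determinant form of Eq. 5.1 can be verified numerically for arbitrarily chosen vectors (a sketch assuming numpy):

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0])
B = np.array([4.0, 0.0, -1.0])
C = np.array([2.0, 5.0, 1.0])

# A . (B x C) computed directly, and as the determinant with A, B, C as rows
triple = np.dot(A, np.cross(B, C))
det = np.linalg.det(np.array([A, B, C]))

print(triple, det)  # both equal 53
```

Swapping any two rows of the matrix flips the sign of the determinant, which reproduces the sign rules stated above, e.g. $\vec{A}\cdot(\vec{B}\times\vec{C}) = -\vec{A}\cdot(\vec{C}\times\vec{B})$.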

5.1.2 Vector triple product

$$\vec{A} \times (\vec{B} \times \vec{C}) = \vec{B}(\vec{A} \cdot \vec{C}) - \vec{C}(\vec{A} \cdot \vec{B}) \qquad (5.2)$$

It is not very difficult to realize that the above triple product is a linear combination of $\vec{B}$ and $\vec{C}$, i.e., a vector lying in the same plane as the two vectors in the parenthesis. The middle vector has a positive sign, and the coefficient of each vector is a dot product of the other two.

5.1.3 Exercise
1. Derive the formula for the vector triple product, assuming $\vec{B}$ to be along the x axis and $\vec{C}$ to lie in the xy plane.


2. Let us change from rectangular to some general coordinate system (any three
non-coplanar vectors, not perpendicular to each other). Derive the Jacobian,
used in multiple integrals for changing variables.
3. Using reciprocal lattice vectors ~b1 , ~b2 and ~b3 , find the direction perpendicu-
lar to the plane with Miller index (hkl). Also find the inter-planar spacing
between (hkl) planes.
4. Differentiation of a vector: in Cartesian coordinates, a vector is represented as $\vec{A} = A_x\hat{i} + A_y\hat{j} + A_z\hat{k}$. Evaluate the following time derivatives.
(a) $\frac{d\vec{A}}{dt} = ?$
(b) $\frac{d}{dt}(a\vec{A}) = ?$
(c) $\frac{d}{dt}(\vec{A}\cdot\vec{B}) = ?$
(d) $\frac{d}{dt}(\vec{A}\times\vec{B}) = ?$

5. Let us take a vector in polar coordinates: $\vec{A} = A_r\hat{e}_r + A_\theta\hat{e}_\theta$. In this case, $\frac{d\vec{A}}{dt} = ?$
Answer: $\frac{d\vec{A}}{dt} = \left(\frac{dA_r}{dt} - A_\theta\frac{d\theta}{dt}\right)\hat{e}_r + \left(\frac{dA_\theta}{dt} + A_r\frac{d\theta}{dt}\right)\hat{e}_\theta$

6. Let us take a vector in cylindrical coordinates: $\vec{A} = A_r\hat{e}_r + A_\theta\hat{e}_\theta + A_z\hat{e}_z$. In this case, $\frac{d\vec{A}}{dt} = ?$

7. Let us take a vector in spherical coordinates: $\vec{A} = A_r\hat{e}_r + A_\theta\hat{e}_\theta + A_\phi\hat{e}_\phi$. In this case, $\frac{d\vec{A}}{dt} = ?$

8. Find $(\vec{A}\cdot\vec{B})^2 - [(\vec{A}\times\vec{B})\times\vec{B}]\cdot\vec{A} = ?$
Answer: $A^2B^2$

9. Prove the Jacobi identity: $\vec{A}\times(\vec{B}\times\vec{C}) + \vec{B}\times(\vec{C}\times\vec{A}) + \vec{C}\times(\vec{A}\times\vec{B}) = 0$.

10. Prove Lagrange's identity: $(\vec{A}\times\vec{B})\cdot(\vec{C}\times\vec{D}) = (\vec{A}\cdot\vec{C})(\vec{B}\cdot\vec{D}) - (\vec{A}\cdot\vec{D})(\vec{B}\cdot\vec{C})$.

11. Evaluate the scalar triple product of $(\vec{A}\times\vec{B})$, $(\vec{B}\times\vec{C})$ and $(\vec{C}\times\vec{A})$.

5.2 First derivative of scalar and vector fields


5.2.1 Gradient and directional derivative
Let $\phi(x, y, z)$ be a scalar field. The gradient of $\phi$ (read as "grad $\phi$" or "del $\phi$") is defined as:

$$\vec{\nabla}\phi = \hat{i}\frac{\partial \phi}{\partial x} + \hat{j}\frac{\partial \phi}{\partial y} + \hat{k}\frac{\partial \phi}{\partial z}. \qquad (5.3)$$

This is very useful for calculating the directional derivative, given by

$$\frac{d\phi}{du} = \vec{\nabla}\phi \cdot \hat{u}, \qquad (5.4)$$

where the unit vector $\hat{u}$ points in the direction along which the derivative is calculated.
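A minimal sketch (assuming sympy; the field, point and direction are arbitrary illustrative choices) of computing a directional derivative via Eq. 5.4:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
phi = x**2*y + z                            # illustrative scalar field

# Gradient of phi, Eq. 5.3
grad = sp.Matrix([sp.diff(phi, v) for v in (x, y, z)])

# Unit vector along i + j + k
u_hat = sp.Matrix([1, 1, 1]) / sp.sqrt(3)

# Directional derivative at the point (1, 2, 0), Eq. 5.4
dphi_du = grad.dot(u_hat).subs({x: 1, y: 2, z: 0})
print(sp.simplify(dphi_du))  # 2*sqrt(3)
```

Replacing `u_hat` with the normalized gradient itself gives $|\vec{\nabla}\phi|$, the maximum possible rate of change, which is the physical significance discussed next.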


Figure 5.1: Example of a scalar field.

Physical significance of gradient

Note that the rate of change of $\phi$ in some direction $u$ is maximum if $\hat{u}$ is in the direction of $\vec{\nabla}\phi$ itself. Thus, the gradient is the direction along which the rate of change (increase/decrease) of $\phi$ is maximum.

Geometrical significance of gradient


Note that, the value of φ does not change along a contour line. Thus, if we draw
a tangent at some point on the contour line, φ does not change along the line.
On the other hand, we know that, gradient is the direction along which the rate
of change of φ is maximum. Therefore, we conclude that gradient is the direction
normal to the surface at a given point.

5.2.2 Divergence

$$\mathrm{div}\,\vec{V} = \vec{\nabla}\cdot\vec{V} = \frac{\partial V_x}{\partial x} + \frac{\partial V_y}{\partial y} + \frac{\partial V_z}{\partial z}. \qquad (5.5)$$

$$\vec{\nabla}\cdot(\phi\vec{V}) = (\vec{\nabla}\phi)\cdot\vec{V} + \phi(\vec{\nabla}\cdot\vec{V}). \qquad (5.6)$$

The physical significance of divergence will be discussed later.


5.2.3 Curl

$$\mathrm{curl}\,\vec{V} = \hat{i}\left(\frac{\partial V_z}{\partial y} - \frac{\partial V_y}{\partial z}\right) + \hat{j}\left(\frac{\partial V_x}{\partial z} - \frac{\partial V_z}{\partial x}\right) + \hat{k}\left(\frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y}\right). \qquad (5.7)$$

$$\vec{\nabla}\times(\phi\vec{V}) = (\vec{\nabla}\phi)\times\vec{V} + \phi(\vec{\nabla}\times\vec{V}). \qquad (5.8)$$

The physical significance of curl will be discussed later.

5.2.4 Exercise
1. Using Lagrange multiplier, find the maximum value of the directional deriva-
tive dφ/du, subject to the constraint that a2 + b2 + c2 = 1, where û = aî + bĵ + ck̂.

Other coordinate systems

2. Write the gradient operator in (a) polar, (b) cylindrical and (c) spherical coor-
dinate system.
3. For the following functions, calculate $\vec{\nabla}f$ in the polar, as well as the Cartesian, coordinate system. Compare the answers and check whether you get the same answer or not.
• f (r) = r
• f (r, θ) = r cos θ
• f (r, θ) = r sin θ
• f (r) = r2

Equation of a line (normal to surface) and plane (tangent to surface)

4. Find a vector normal to the surface x2 + y 2 − z = 0 at the point (3,4,25). Find


the equation of the tangent plane and normal line at the point.
5. Find the directional derivative of f = x2 + sin y − xz in the direction î + 2ĵ − 2k̂
at the point (1, π/2, −3). Find the equation of the tangent plane and normal
line to f = 5 at the point (1, π/2, −3).

Finding the direction of heat flow

6. Temperature in the (x, y) plane is given by T = xy − x, sketch few isothermal


curves for T = 0, 1, 2, −1, −2.
• Find the direction in which the temperature changes most rapidly with
distance from the point (1,1) and the maximum rate of change.
• Find the directional derivative of T at (1,1) in the direction of the vector
3î − 4ĵ.
• Heat flows in the direction −∇T ~ (perpendicular to the isothermals).
Sketch a few curves along which heat would flow. Note that, you should
use a computer to plot for better understanding.


5.3 Second derivative of scalar and vector fields


We can treat ∇~ as a “vector” (you have to apply some common sense as well) and
then get the following results.

5.3.1 Divergence of gradient or Laplacian

$$\vec{\nabla}\cdot(\vec{\nabla}\phi) = (\vec{\nabla}\cdot\vec{\nabla})\phi = \nabla^2\phi. \qquad (5.9)$$

5.3.2 Laplacian of a vector field

$$(\vec{\nabla}\cdot\vec{\nabla})\vec{V} = \nabla^2\vec{V}. \qquad (5.10)$$

5.3.3 Curl of gradient

$$\vec{\nabla}\times\vec{\nabla}\phi = 0. \qquad (5.11)$$

Note that, this is just a consequence of $\frac{\partial^2\phi}{\partial x\partial y} = \frac{\partial^2\phi}{\partial y\partial x}$, etc. Using this, we can write a very important theorem. First note that $\vec{\nabla}\phi$ is a vector field (say $\vec{U}$).
Theorem: if (curl $\vec{U}$) = 0, then $\vec{U}$ must be the gradient of some scalar field $\phi$.¹

5.3.4 Divergence of curl

$$\vec{\nabla}\cdot(\vec{\nabla}\times\vec{V}) = 0. \qquad (5.12)$$

Using this, we can write another very important theorem. Again, first we should note that $\vec{\nabla}\times\vec{V}$ is a vector field (say $\vec{U}$).
Theorem: if (div $\vec{U}$) = 0, then $\vec{U}$ must be the curl of some vector field $\vec{V}$.
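Both identities (Eqs. 5.11 and 5.12) can be verified symbolically for arbitrarily chosen smooth fields, using sympy's vector module (a minimal sketch):

```python
import sympy as sp
from sympy.vector import CoordSys3D, Vector, gradient, divergence, curl

N = CoordSys3D('N')
x, y, z = N.x, N.y, N.z

phi = x**2*sp.sin(y)*z            # arbitrary smooth scalar field
V = x*y*N.i + y*z*N.j + z*x*N.k   # arbitrary smooth vector field

curl_grad = curl(gradient(phi))   # should be the zero vector, Eq. 5.11
div_curl = divergence(curl(V))    # should be 0, Eq. 5.12
print(curl_grad, div_curl)        # 0 0
```

The cancellations happen term by term, exactly because the mixed second partial derivatives are equal.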

5.3.5 Curl of curl

$$\vec{\nabla}\times(\vec{\nabla}\times\vec{V}) = \vec{\nabla}(\vec{\nabla}\cdot\vec{V}) - \nabla^2\vec{V}. \qquad (5.13)$$

5.3.6 Gradient of divergence

$$\vec{\nabla}(\vec{\nabla}\cdot\vec{V}) = \hat{i}\left(\frac{\partial^2 V_x}{\partial x^2} + \frac{\partial^2 V_y}{\partial x\partial y} + \frac{\partial^2 V_z}{\partial x\partial z}\right) + \hat{j}\left(\frac{\partial^2 V_x}{\partial x\partial y} + \frac{\partial^2 V_y}{\partial y^2} + \frac{\partial^2 V_z}{\partial y\partial z}\right) + \hat{k}\left(\frac{\partial^2 V_x}{\partial x\partial z} + \frac{\partial^2 V_y}{\partial y\partial z} + \frac{\partial^2 V_z}{\partial z^2}\right). \qquad (5.14)$$
¹Later, we will find that it is related to the Euler reciprocity relation and the definition of an exact differential, which corresponds to a state function in thermodynamics. A related concept is a conservative force field in classical mechanics and electrodynamics, where the work done is independent of the path.


5.3.7 Exercise
1. Write the Laplacian operator ($\nabla^2 = \vec{\nabla}\cdot\vec{\nabla}$) in the (a) polar, (b) cylindrical and (c) spherical coordinate systems.
2. Starting from the gradient, divergence and curl (first derivatives), derive some second derivatives like: Laplacian, (curl grad), (div curl) and (grad div).
3. In order to memorize, we treated the $\vec{\nabla}$ operator as a "vector" and it worked fine! Then, can we conclude that $(\vec{\nabla}\phi)\times(\vec{\nabla}\psi) = 0$?
4. What would be the expression for $\vec{\nabla}\cdot(\vec{\nabla}\phi\times\vec{\nabla}\psi)$?

5.4 Line integrals


We know that the work done by a force is dW = F · ds, and we have to calculate a line integral W = ∫ F · ds to get the total work done along a certain path. The first thing to keep in mind while calculating a line integral is the fact that there is only one independent variable along a curve. Thus, we first have to express F(x, y, z) and ds = î dx + ĵ dy + k̂ dz in terms of a single variable and then evaluate the integral (of one variable) to find the total work done by the force in moving an object from one point to another along the path.
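To make this concrete in Python (a numerical sketch using numpy, which is assumed available; the field and the path are arbitrary choices):

```python
import numpy as np

# Work done by F = (y, x, 0) along the curve y = x**2, z = 0, from (0,0) to (1,1).
# One independent variable: parametrize x = t, y = t**2, so dx/dt = 1, dy/dt = 2t.
t = np.linspace(0.0, 1.0, 100001)
Fx, Fy = t**2, t                       # F evaluated on the path: Fx = y, Fy = x
integrand = Fx * 1.0 + Fy * 2.0*t      # F · (ds/dt)

# composite trapezoidal rule for the one-variable integral
W = np.sum((integrand[1:] + integrand[:-1]) / 2.0 * np.diff(t))
print(W)  # ≈ 1.0: here F = ∇(xy), so W = (xy) at (1,1) minus (xy) at (0,0)
```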
Now, work required to move an object from one point to another may depend
on the path (for example, because of energy dissipated due to friction). Such a
field is known as a non-conservative force field. On the other hand, if the work
required to move an object from one point to another is independent of the path
taken, we call it a conservative force field.
Clearly, we can evaluate W along different paths and find out whether the force field is conservative or not. Can we do this without evaluating the integral? The answer is yes, and we have to think logically to recognize the following:

A vector field is conservative if ∇ × F = 0.

A similar statement is:

A vector field is conservative if F = ∇W.

The first two statements are related and can be stated as,

If F = ∇W, then curl F = 0.

This is not entirely new, as we already know the reverse statement. In order to prove this, let us write the components of F = ∇W: Fx = ∂W/∂x, Fy = ∂W/∂y and Fz = ∂W/∂z. Now, using the equality of second derivatives, i.e., ∂²W/∂x∂y = ∂²W/∂y∂x, we find that

∂Fx/∂y = ∂Fy/∂x,
∂Fy/∂z = ∂Fz/∂y,    (5.15)
∂Fz/∂x = ∂Fx/∂z.


Is there a way to express the above set of equations in a compact form? Using the definition of curl, we can write the compact equation ∇ × F = 0.
Finally, we want to prove that work done is independent of the path for a
conservative force field, i.e.,
∫ F · ds is independent of path if ∇ × F = 0 or F = ∇W.

Now, since F = ∇W, we can write F · ds = (∂W/∂x)dx + (∂W/∂y)dy + (∂W/∂z)dz = dW, where ds = î dx + ĵ dy + k̂ dz. Finally, the line integral

∫_A^B F · ds = ∫_A^B dW = W(B) − W(A),    (5.16)

is found to depend only on the value of W at the end points and is independent of the path along which the integration is carried out. It is obvious that for a conservative force field, the integral over a closed path vanishes:

∮_C F · ds = ∮_C dW = 0.    (5.17)
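Eq. 5.17 can be illustrated numerically in Python (a numpy sketch; the two test fields and the unit-circle path are arbitrary choices):

```python
import numpy as np

theta = np.linspace(0.0, 2*np.pi, 200001)
x, y = np.cos(theta), np.sin(theta)        # unit circle, traversed counterclockwise
dx, dy = -np.sin(theta), np.cos(theta)     # dx/dθ and dy/dθ

def loop_integral(Fx, Fy):
    # trapezoidal rule for ∮ F·ds = ∮ (Fx dx/dθ + Fy dy/dθ) dθ
    f = Fx*dx + Fy*dy
    return np.sum((f[1:] + f[:-1]) / 2.0 * np.diff(theta))

print(loop_integral(y, x))    # conservative, F = ∇(xy): result ≈ 0
print(loop_integral(-y, x))   # non-conservative: result ≈ 2π, not zero
```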

Next, we see two important applications of what we have learnt just now.

5.4.1 Exact Differential


In thermodynamics, we often see the terms exact and inexact differential. Let us understand what they mean. Let dW be the infinitesimal difference between two adjacent values of W, i.e., dW = W(x + dx, y + dy, z + dz) − W(x, y, z). Let us assume that we can express the differential (often termed the total differential; it is nothing but the tangent approximation) as:

dW = (∂W/∂x)dx + (∂W/∂y)dy + (∂W/∂z)dz = Fx(x, y, z)dx + Fy(x, y, z)dy + Fz(x, y, z)dz.    (5.18)

This particular differential is an example of an exact differential.² One can easily verify that Eq. 5.15 is satisfied for an exact differential.³ This is known as the Euler reciprocity relation and it is simply based on the equality of the second derivatives.

State and path functions in thermodynamics:


Do you see a connection between conservative force fields and exact differentials? Note that Eq. 5.15 implies that ∇ × F = 0. Thus, we can also state that dW = F · ds is an exact differential if ∇ × F = 0. The last statement is true for a conservative force field, for which the work done is independent of path. Thus, we conclude that,
² Every differential dw = P(x, y, z)dx + Q(x, y, z)dy + R(x, y, z)dz need not be an exact differential. Only if the differential is exact can we write it as dw = (∂w/∂x)dx + (∂w/∂y)dy + (∂w/∂z)dz. See the problem set for examples.
³ This is the test to check whether a given differential is exact or inexact.


line integrals of exact differentials are path independent, such that Eq. 5.16
and Eq. 5.17 are valid. In thermodynamics, exact differentials are related to state
functions, while inexact differentials are related to path functions.
One should note the connection between an exact differential and a conservative vector field. If I give you a conservative vector field F, you can find an exact differential dW = F · ds. On the other hand, if I give you an exact differential, dW = X dx + Y dy + Z dz, then you can define a conservative vector field F = X î + Y ĵ + Z k̂. You will find problems related to differentials in thermodynamics, while problems related to force fields are important in mechanics or electrodynamics.

5.4.2 Scalar potential for a conservative force field


In mechanics, we define a scalar potential for a conservative force field. Note that F = ∇W implies that W is the work done by the force F. For example, if we lift a mass, the work done by the gravitational force is W = −mgh. But we are increasing the potential energy by φ = +mgh, and we conclude that W = −φ. Thus, we can write

F = −∇φ.    (5.19)

5.4.3 Exercise

Evaluation of line integrals


1. Mary L. Boas, chapter 6, section 8, problem 1-7.

Given a conservative force field, find the scalar potential

2. Mary L. Boas, chapter 6, section 8, problem 8-15.

Given the differential, determine the function


3. Test whether dz = (1/x²) dx − (y/x³) dy is an exact or inexact differential. If it is exact, find z(x, y).

4. Test if dz = (2x + y) dx + (x + y) dy is exact or not. If exact, then find z(x, y).

5. Given dP = [R/(V − b)] dT + [RT/(V − b)² − a/(TV²)] dV, find out the function P(T, V).
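Problem 4, for instance, can be solved with sympy (an illustrative sketch; the same exactness test and integration steps apply to the other differentials):

```python
import sympy as sp

x, y = sp.symbols('x y')
P, Q = 2*x + y, x + y          # dz = P dx + Q dy  (problem 4)

# Exactness test (Euler reciprocity): ∂P/∂y must equal ∂Q/∂x
assert sp.simplify(sp.diff(P, y) - sp.diff(Q, x)) == 0

# Recover z: integrate P over x, then fix the remaining y-dependent part from Q
z = sp.integrate(P, x)
z = z + sp.integrate(sp.simplify(Q - sp.diff(z, y)), y)
print(sp.expand(z))            # → x**2 + x*y + y**2/2 (up to a constant)
```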

5.5 Green’s theorem in plane


The line integral around a closed path is equal to the double integral over the area A enclosed by the path:

∮_c [P(x, y) dx + Q(x, y) dy] = ∬_A (∂Q/∂x − ∂P/∂y) dx dy    (5.20)


The line integral should be evaluated in the counterclockwise direction around


the boundary of A. Consult the textbook for detailed proof. There is a simple way
of memorizing the formula. If we consider the differential in the L.H.S. to be an
exact differential, then the line integral should be zero. Now, the condition for
exact differential is: ∂Q/∂x = ∂P/∂y and these two are present in the R.H.S. in
such a combination that the integral is going to be zero if the differential in the
L.H.S. is an exact differential.
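Here is a Python (sympy) sketch that verifies Eq. 5.20 on the unit square for one arbitrary choice of P and Q:

```python
import sympy as sp

x, y, t = sp.symbols('x y t')
P, Q = x**2*y, x + y**2          # an arbitrary (inexact) differential P dx + Q dy

# R.H.S.: double integral of (∂Q/∂x − ∂P/∂y) over the unit square
rhs = sp.integrate(sp.diff(Q, x) - sp.diff(P, y), (x, 0, 1), (y, 0, 1))

# L.H.S.: line integral counterclockwise around the square, side by side
bottom = sp.integrate(P.subs({x: t, y: 0}), (t, 0, 1))    # along y = 0, dy = 0
right  = sp.integrate(Q.subs({x: 1, y: t}), (t, 0, 1))    # along x = 1, dx = 0
top    = -sp.integrate(P.subs({x: t, y: 1}), (t, 0, 1))   # y = 1, right to left
left   = -sp.integrate(Q.subs({x: 0, y: t}), (t, 0, 1))   # x = 0, top to bottom
lhs = bottom + right + top + left

print(lhs, rhs)  # → 2/3 2/3
```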

5.5.1 Exercise
1. Using Green’s theorem, prove that:
∬_A (∇ · V) dx dy = ∮_{∂A} (V · n̂) ds    (5.21)

Note that this can be generalized to write the divergence theorem.

2. Using Green’s theorem, prove that:


∬_A (∇ × V) · k̂ dx dy = ∮_{∂A} V · dr    (5.22)

Note that this can be generalized to write Stokes' theorem.

3. Mary L. Boas, chapter 6, section 9, problems 2-12.

5.6 Divergence and divergence theorem


5.6.1 Physical significance of divergence
Let us develop our understanding of divergence using the mass flux J [velocity × density]. Note that whatever we discuss is true for any flux and, in general, for any vector field. We have already defined divergence as:

div J = ∇ · J = ∂Jx/∂x + ∂Jy/∂y + ∂Jz/∂z.    (5.23)

Divergence represents net outflow per unit volume. See class notes/text book for
a simple proof for a cubic volume element.

5.6.2 Equation of continuity


By net outflow, we mean outgoing minus incoming; in general these are not equal, such that the equation of continuity is:

∇ · J + ∂ρ/∂t = 0.    (5.24)

In steady state,

∇ · J = 0.    (5.25)


[Figure]

Figure 5.2: The amount of water crossing the area A′ is the same as the amount of water crossing the area A.

Note that the above equations are correct only if there is no source or sink. Otherwise, we have to add a term ψ to take the source minus sink part into account:

∇ · J + ∂ρ/∂t = ψ.    (5.26)

5.6.3 Divergence theorem: volume and surface integral


As shown in Fig. 5.2, imagine that water is flowing through the cylinder. The amount of water crossing the area A′ in time t is (vt)A′ρ, and this is equal to the amount crossing the area A in time t. Since A′ = A cos θ, we can write the following:

(ρ)(vt)(A′) = (ρv)A′t = (ρv)(A cos θ)t = (ρv cos θ)At = (J · n̂)At.    (5.27)

Note that θ is the angle between the direction of v and n̂ (the unit normal to the surface A). Thus, the net amount of water crossing per unit area per unit time is given by J · n̂. Now, we can take an area element da on any surface enclosing some volume (for example, the surface of a sphere), with unit normal n̂ to the surface. The mass of water flowing out through the area element is (J · n̂)da, and the total outflow from the volume enclosed by the surface is

∬ (J · n̂) da = ∬ J · da.    (5.28)

We already know that divergence is the net outflow per unit volume. We can easily argue (consult textbook or class notes) that the net outflow from the volume enclosed by the surface must be equal to the net outflow through the surface enclosing the volume, which leads to the divergence theorem:

∭_V (∇ · J) dV = ∬_A (J · n̂) da.    (5.29)


[Figure]

Figure 5.3: (left) Line integral over a closed path in the xy plane, such that the normal at the given point points along k̂. (center) We generalize the area element and take it to be on the surface of a hemisphere. (right) Flat view of the hemisphere.

Note that, the L.H.S. is a triple integral over the entire volume enclosed by the
surface A and R.H.S. is a double integral over the entire surface enclosing the
volume V .
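A quick sympy check of Eq. 5.29 for the simple field J = (x, y, z) over the unit sphere (illustrative sketch; this field is an arbitrary choice):

```python
import sympy as sp

x, y, z, r, th, ph = sp.symbols('x y z r theta phi')
J = sp.Matrix([x, y, z])
div_J = sp.diff(J[0], x) + sp.diff(J[1], y) + sp.diff(J[2], z)   # = 3

# volume integral over the unit ball (spherical volume element r² sinθ dr dθ dφ)
lhs = sp.integrate(div_J * r**2 * sp.sin(th),
                   (r, 0, 1), (th, 0, sp.pi), (ph, 0, 2*sp.pi))

# surface integral: on the unit sphere J points along n̂ with |J| = 1, so J·n̂ = 1
rhs = sp.integrate(sp.sin(th), (th, 0, sp.pi), (ph, 0, 2*sp.pi))

print(lhs, rhs)  # → 4*pi 4*pi
```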

5.6.4 Exercise
1. Given that B = ∇ × A, use the divergence theorem to prove that ∮ B · n̂ da over any closed surface is zero. Can you justify this with simple arguments?

2. Mary L. Boas, chapter-6, section-10, problem 1-10 and 15-16.

5.7 Curl and Stokes’ theorem


5.7.1 Physical significance of curl
Again we consider fluid flow, and let v be the vector field representing the velocity. Now, ∇ × v represents the angular velocity of the fluid in the neighborhood of a given point. If ∇ × v = 0 in some region, then the flow is said to be irrotational in that region. Interestingly, this is the same mathematical condition as for a force field F (which is a vector field) to be conservative.

5.7.2 Stoke’s theorem: surface and line integral


Let V be a vector field. For example, V can be a force field F, or it can be V = ρv for fluid flow. We want to evaluate the line integral ∮_c V · dr over the closed path shown in Fig. 5.3. We have already evaluated such integrals (for example, the work done ∮_c F · dr over a closed path). Let the vector field be V = Vx î + Vy ĵ = P î + Qĵ.


Figure 5.4: In a fishing net, the net forms the open surface and the rim (made of
metal or plastic) is the curve bounding the open surface.

You can easily verify that,

P dx + Q dy = V · dr,    (5.30)

∂Q/∂x − ∂P/∂y = ∂Vy/∂x − ∂Vx/∂y = (∇ × V) · k̂.

Now, using Green's theorem, ∮_c (P dx + Q dy) = ∬ (∂Q/∂x − ∂P/∂y) dx dy, we can write that,

∮_c V · dr = ∬ (∇ × V) · k̂ dx dy.    (5.31)
Now, there is nothing special about the xy plane, and we can easily generalize the above to write:

∮_c V · dr = ∬ (∇ × V) · n̂ da,    (5.32)

where n̂ is the normal to the area element da and c is the curve surrounding the area element da (see Fig. 5.3).
Now, imagine a surface which is not flat, for example, the surface of a hemisphere. We can divide the entire surface of the hemisphere into small area elements da and add all the terms obtained from the above equation. As shown in Fig. 5.3, all the interior line integrals cancel each other, because along a shared border, two adjacent integrals run in opposite directions. However, the line integrals around the curve bounding the hemisphere (the outermost circle) do not cancel. Thus, the surface integral over the entire surface of the hemisphere is equal to the line integral around the curve (the circle) bounding the hemisphere. This is the statement of Stokes' theorem, which relates an integral over an open surface to a line integral


around a curve bounding the surface:

∮_{boundary} V · dr = ∬_{surface} (∇ × V) · n̂ da.    (5.33)

Let us think of a small fishing net, as shown in Fig. 5.4. The net forms the open surface, while the rim is the curve bounding the open surface. Note that we can deform the net easily, but the rim does not change. Now, let us think of a net in the shape of a hemisphere. We can deform the net to any other shape, keeping the rim unchanged. If we do this, whatever we argued to get Eq. 5.33 still remains valid. Let us further assume that the net is made of a stretchable material, which looks like a fishing net when stretched, but becomes something like a badminton racket when unstretched. According to our logic, the integral over the surface should be the same for the stretched and unstretched net. Thus, we conclude that what matters is the curve bounding the surface, not the surface itself.
This further implies that all we need is to calculate a surface integral over a flat surface (like the badminton racket), instead of a curved surface (like the fishing net). The result is going to be the same as long as the rim (the curve bounding the net) remains the same. For example, if we take the bounding curve to be a circle, it does not matter whether we have a perfect hemisphere or a deformed hemisphere on top of the circle. We need not even try to calculate the surface integral over the deformed (or perfect) hemisphere. All we need to calculate is a surface integral over the flat disk bounded by the circle (which bounds the "hemisphere" of whatever shape).

5.7.3 Vector potential


A vector field is solenoidal if ∇ · V = 0. Using the facts that div(curl) and curl(grad) are zero, we can write,

V = ∇ × A + ∇u,    (5.34)

where A is a vector field (the vector potential) and u is a scalar field.

5.7.4 Exercise
1. Let us verify the fact that the integral over a hemisphere is the same as the integral over a circle bounding the hemisphere. Assume V = 4y î + x ĵ + 2z k̂.

(a) Find ∬ (∇ × V) · n̂ da over the hemisphere x² + y² + z² = a², z ≥ 0.

(b) Verify that the result will be the same if we evaluate the integral over the circle bounding the hemisphere.
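Part (b) can be checked quickly in Python (a sympy sketch, taking a = 1 for concreteness; the value −3 comes from ∇ × V = (0, 0, −3) for this field):

```python
import sympy as sp

th = sp.symbols('theta')

# curl of V = (4y, x, 2z) is (0, 0, -3); over the flat disk z = 0 bounded by
# the unit circle, the surface integral is -3 times the disk area pi
surface = -3 * sp.pi

# line integral of V·dr around the circle x = cosθ, y = sinθ, z = 0
x, y = sp.cos(th), sp.sin(th)
line = sp.integrate(4*y*sp.diff(x, th) + x*sp.diff(y, th), (th, 0, 2*sp.pi))

print(surface, line)  # → -3*pi -3*pi
```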

Apply Stokes’ theorem to evaluate the integrals

2. Mary L. Boas, chapter-8, section-11, problems 1-15.

Find vector potential, given the vector field

3. Mary L. Boas, chapter-8, section-11, problems 18-22.

Bibliography

[1] Boas, Mary L., “Mathematical Methods in the Physical Sciences” (Third Edi-
tion), WILEY.

Chapter 6

Coordinate transformation and Tensor analysis

6.1 Linear Transformation


Consider multiplication of a column vector by a matrix:

\[
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{6.1}
\]

We are taking a vector r = (x, y), multiplying it by a matrix M and getting a new vector r′ = Mr. Note that, by taking various different M, we can reach any other point on the plane, starting from the point (x, y). Thus, the matrix M is an operator that acts on column vectors and maps the plane into itself. Why do we call it a linear transformation? Because of the following:

M(r₁ + r₂) = Mr₁ + Mr₂  &  M(kr) = k(Mr).    (6.2)

[Figure]

Figure 6.1: Vector r can be transformed to vector r′ in various ways.


6.2 Orthogonal Transformation


As shown in Fig. 6.1, several such transformations are possible. Some of them do not change the length of the vector. Such transformations are known as orthogonal transformations (examples: rotation and reflection of a vector). If the length is not changing, we must have:

x′² + y′² = (a² + c²)x² + 2(ab + cd)xy + (b² + d²)y² = x² + y²,    (6.3)

which requires a² + c² = 1, ab + cd = 0 and b² + d² = 1.

Using the matrix M, we can write that:

\[
M^T M = \begin{pmatrix} a & c \\ b & d \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a^2 + c^2 & ab + cd \\ ab + cd & b^2 + d^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{6.4}
\]

Since MᵀM is the unit matrix, we can conclude that for an orthogonal transformation, the matrix must satisfy,

Mᵀ = M⁻¹.    (6.5)

Now, since MᵀM = I, we can also write that,¹

det(MᵀM) = det(Mᵀ)det(M) = (det M)² = det I = 1.    (6.6)

Thus, we can also conclude that for an orthogonal transformation, the determinant of the transformation matrix must satisfy,

det M = ±1.    (6.7)

Later, we will see that det M = +1 for a rotation operation and det M = −1 for a reflection operation.

6.2.1 Rotation in 2D
As shown in Fig. 6.2, we can either rotate the vector keeping the reference axes
fixed or rotate the reference system keeping the vector fixed.

Active transformation
If we keep the reference system fixed, we can write x′ = r cos(θ + α) = x cos θ − y sin θ and y′ = r sin(θ + α) = x sin θ + y cos θ. This can be expressed in matrix notation as,

\[
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{6.8}
\]
¹ In case of a (3 × 3) determinant, it represents a scalar triple product, i.e., the volume bounded by three vectors. (a) By transposing, we are just writing row vectors as column vectors, but the volume does not change ⇒ det(Mᵀ) = det(M). (b) Interchanging two rows/columns does not change the magnitude of the determinant, but it picks up a negative sign. (c) If two rows/columns are the same, all the vectors are coplanar and the determinant is 0.


[Figure]

Figure 6.2: Anti-clockwise rotation by an angle θ: we can either rotate the vector keeping the reference axes fixed (left) or rotate the reference system keeping the vector fixed (right).

Passive transformation
On the other hand, if we keep the vector fixed and rotate the reference axes (change of basis), then the components of the vector in the new reference system can be written as x′ = r cos(α − θ) = x cos θ + y sin θ and y′ = r sin(α − θ) = −x sin θ + y cos θ. This can be expressed in matrix notation as:

\[
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{6.9}
\]

We can easily verify that the rotation matrix is orthogonal, because Rᵀ(θ) = R⁻¹(θ). We also find that R⁻¹(θ) = R(−θ). This makes sense, because the inverse of an anti-clockwise rotation is a clockwise rotation. We also see that the matrices in Eq. 6.8 and Eq. 6.9 are inverses of each other. This implies that rotation of a vector in the anti-clockwise direction is equivalent to rotation of the reference axes in the opposite (clockwise) direction. We can also verify that det R = +1. This is true for any rotation matrix.
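These properties are easy to confirm numerically in Python (a numpy sketch; θ = 0.7 rad is an arbitrary choice):

```python
import numpy as np

def R(theta):
    """Active rotation matrix: rotates a vector counterclockwise by theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

th = 0.7
assert np.allclose(R(th).T, np.linalg.inv(R(th)))   # orthogonal: R^T = R^-1
assert np.allclose(np.linalg.inv(R(th)), R(-th))    # inverse = clockwise rotation
print(np.linalg.det(R(th)))                         # ≈ 1.0, as for any rotation
```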

6.2.2 Rotation in 3D
Matrix corresponding to an anti-clockwise rotation about the z-axis is:
 
\[
\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{6.10}
\]

Again, we can easily verify that det R = +1 and R(θ)ᵀ = R(θ)⁻¹ = R(−θ).

6.2.3 Reflection
Reflection of a vector about a line making an angle θ with the x-axis is shown in
Fig. 6.3. We can write: x0 = r cos(2θ − α) = x cos 2θ + y sin 2θ and y 0 = r sin(2θ − α) =


[Figure]

Figure 6.3: Reflection of a vector about a line making an angle θ with the x-axis.

x sin 2θ − y cos 2θ. This can be written in matrix form as:

\[
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{6.11}
\]

Again, we can easily verify that det M = −1 and M(θ)ᵀ = M(θ)⁻¹.
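Similarly, a numpy sketch for the reflection matrix of Eq. 6.11 (θ = π/6 is an arbitrary choice):

```python
import numpy as np

def reflect(theta):
    """Reflection about a line through the origin at angle theta to the x-axis."""
    c, s = np.cos(2*theta), np.sin(2*theta)
    return np.array([[c, s], [s, -c]])

M = reflect(np.pi/6)
assert np.allclose(M, M.T) and np.allclose(M @ M, np.eye(2))  # M = M^T = M^-1
print(round(np.linalg.det(M)))                                # → -1, as for any reflection
```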

6.2.4 Exercise

Rotation and reflection matrix


1. Mary L. Boas, chapter-3, section-7, problems 22-35.

Group multiplication table (optional)

2. Read Mary L. Boas, chapter-3, section-13 (a brief introduction to groups).

3. A cube has three fold rotational symmetry along the body diagonals. Find
the orthogonal matrices, corresponding to each of the three fold axes.

4. Find the orthogonal matrices corresponding to each of the symmetry elements present in a rectangle. These elements form the point group 2mm. Make the group multiplication table for this point group. Also draw the stereographic projection for this point group and compare with the group multiplication table.

5. Find the orthogonal matrices corresponding to each of the symmetry elements present in an equilateral triangle. These elements form the point group 3m. Make the group multiplication table for this point group. Also draw the stereographic projection for this point group and compare with the group multiplication table.


[Figure]

Figure 6.4: (Left) When multiplied by some matrix A, an ordinary vector v transforms to v′. (Right) When multiplied by some matrix A, an eigenvector v transforms to λv.

6. Find the orthogonal matrices corresponding to each of the symmetry elements present in a square. These elements form the point group 4mm. Make the group multiplication table for this point group (some patience needed). Also draw the stereographic projection for this point group and compare with the group multiplication table.

7. Find the orthogonal matrices corresponding to each of the symmetry elements present in a regular hexagon. These elements form the point group 6mm. Draw the stereographic projection for this point group.

6.3 Matrix Diagonalization


6.3.1 Eigenvectors and eigenvalues
If we take some transformation matrix, we know that Av = v′; i.e., in general a transformation changes the length as well as the direction of a vector. Now, there are some special vectors for which Av = λv; i.e., only the length of the vector changes (the direction can change only by 180°). Such vectors are known as the eigenvectors (characteristic vectors) of the transformation A, and the corresponding λ is called the eigenvalue. In a 2D space, we have 2 eigenvectors; in a 3D space, we have 3 eigenvectors, etc. Let us try to find the eigenvalues and eigenvectors of a 2 × 2 matrix:

\[
\begin{pmatrix} 5 & -2 \\ -2 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \lambda x \\ \lambda y \end{pmatrix} \tag{6.12}
\]


We get two equations:

(5 − λ)x − 2y = 0,    (6.13)
−2x + (2 − λ)y = 0,

and for a non-trivial solution, the following determinant must be zero:

\[
\begin{vmatrix} 5-\lambda & -2 \\ -2 & 2-\lambda \end{vmatrix} = 0, \tag{6.14}
\]

which gives us two eigenvalues, λ = 1 and λ = 6. Putting these values in Eq. 6.13, we get two lines:

2x − y = 0  &  x + 2y = 0.    (6.15)

Now, let us take two vectors along these two lines, v₁ = î + 2ĵ and v₂ = −2î + ĵ; these are the eigenvectors of A, with eigenvalues λ = 1 and λ = 6, respectively.
Note that, if we take an ordinary vector, say î + ĵ, and multiply with the matrix A,
the vector is transformed to another vector 3î, which has a different length and
also oriented in some other direction [see Fig. 6.4]. On the other hand, if we take
an eigenvector and multiply with the matrix A, we get a vector in the same (or
opposite) direction and length of the new vector is stretched/compressed (by a
factor of corresponding eigenvalue) with respect to the length of the eigenvector.
For example, in this case, ~v1 remains ~v1 , while ~v2 becomes 6~v2 after transformation
[see Fig. 6.4].
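numpy can confirm this example directly (a sketch; `numpy.linalg.eig` returns the eigenvalues together with normalized eigenvectors):

```python
import numpy as np

A = np.array([[5.0, -2.0], [-2.0, 2.0]])
vals, vecs = np.linalg.eig(A)
print(np.sort(vals))                 # the eigenvalues 1 and 6

# each eigenvector is only stretched by A, never rotated
for lam, v in zip(vals, vecs.T):
    assert np.allclose(A @ v, lam * v)
```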

6.3.2 Similarity transformation and matrix diagonalization


Now, let us take two unit vectors v̂₁ = (1/√5)(î + 2ĵ) and v̂₂ = (1/√5)(−2î + ĵ) and construct a 2 × 2 matrix:

\[
B = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}. \tag{6.16}
\]

We can easily verify that AB = BD,² where D is a diagonal matrix consisting of the eigenvalues of matrix A, i.e.,

\[
D = \begin{pmatrix} 1 & 0 \\ 0 & 6 \end{pmatrix}. \tag{6.17}
\]

Multiplying both sides by B⁻¹, we get the following equation:

D = B⁻¹AB.    (6.18)

This is known as the similarity transformation. In the following discussion, we will find that matrix D and matrix A are similar in some sense, and that is why we call it a similarity transformation. One important thing you should notice is the fact
² We can do matrix multiplication column-wise. Let us assume that A is a 2 × 2 matrix, and we want to multiply it with another 2 × 2 matrix B, which can be written as a combination of two column vectors v₁ and v₂. Then the outcome of the matrix multiplication is A × B = A × (v₁ v₂) = (Av₁ Av₂), where Av₁ and Av₂ are the two columns of the product matrix B′. Now, if v₁ and v₂ are eigenvectors of A, then we get A × B = A × (v₁ v₂) = (λ₁v₁ λ₂v₂) = (v₁ v₂) × D = B × D, where D = diag(λ₁, λ₂).

[Figure]

Figure 6.5: In the (x, y) coordinate system, vector r is transformed to vector R by some transformation matrix A. If we rotate the coordinate system (rotation matrix B) to go to a new coordinate system (x′, y′), then r′ is transformed to vector R′ (the same transformation). In the new coordinate system, the transformation matrix from r′ to R′ is B⁻¹AB. This is known as the similarity transformation.

that the trace (sum of diagonal elements) and the determinant of a matrix are conserved under a similarity transformation. As we see in the above example, the trace and determinant of A and of B⁻¹AB are the same.
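A numpy sketch of the similarity transformation for this example:

```python
import numpy as np

A = np.array([[5.0, -2.0], [-2.0, 2.0]])
# columns of B: the normalized eigenvectors v1 = (1, 2) and v2 = (-2, 1)
B = np.array([[1.0, -2.0], [2.0, 1.0]]) / np.sqrt(5.0)

D = np.linalg.inv(B) @ A @ B
assert np.allclose(D, np.diag([1.0, 6.0]))             # Eq. 6.18

# trace and determinant are conserved by the similarity transformation
assert np.isclose(np.trace(D), np.trace(A))            # both equal 7
assert np.isclose(np.linalg.det(D), np.linalg.det(A))  # both equal 6
print(np.round(D, 12))
```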
Thus, we know how to diagonalize a matrix via similarity transformation. I
would like to draw your attention to the fact that det(B) = 1, which implies that
B is a rotation matrix! Let us try to understand in detail. As shown in Fig. 6.5,
we can define a new coordinate system (x0 , y 0 ), by rotating the existing coordinate
system (x, y) by an angle θ (anti-clockwise). Due to the rotation, (1, 0) is aligned
along the x0 and (0, 1) is aligned along the y 0 axis. In terms of rotation matrix, we
can write,
         
\[
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} \quad \& \quad \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}. \tag{6.19}
\]

Now, we can decide to take x′ and y′ along v̂₁ and v̂₂, respectively. Thus, matrix B (see Eq. 6.16) is nothing but a rotation matrix that orients (1, 0) and (0, 1) along the eigenvectors v̂₁ and v̂₂, respectively.³ This is equivalent to a rotation of the coordinate system by an angle θ (anti-clockwise).
We also know from Eq. 6.18 that B⁻¹AB is a diagonal matrix. Let us try to understand the meaning of this matrix more clearly. First we define a transformation of r to R in the (x, y) system, given by R = Ar. As shown in Fig. 6.5, the same transformation in the (x′, y′) system is given by R′ = A′r′. Let us see how A [the transformation matrix in the (x, y) system] is related to A′ [the transformation matrix in the (x′, y′) system]. Now, since the (x, y) coordinate system is rotated clockwise with respect to the (x′, y′) coordinate system, we can write,

R = BR′  &  r = Br′.    (6.20)

Substituting these in R = Ar, we find that BR′ = ABr′ and finally R′ = B⁻¹AB r′, so that A′ = B⁻¹AB.

³ We can easily check that B î = v̂₁ and B ĵ = v̂₂.
Thus, B⁻¹AB describes the same transformation in the (x′, y′) coordinate system that was described by A in the (x, y) coordinate system. Since we have discovered that B is a rotation (orthogonal) matrix, we further write Eq. 6.18 as an orthogonal similarity transformation,

D = B⁻¹AB = BᵀAB,    (6.21)

where A is a symmetric matrix and B is an orthogonal matrix.
Note that our discussion so far has been based on a symmetric matrix A. If A is not symmetric, then matrix B⁻¹AB still describes in the (x′, y′) coordinate system the deformation that is described by the matrix A in the (x, y) coordinate system. However, the x′ and y′ axes are then not perpendicular to each other. Obviously, B is no longer an orthogonal (rotation) matrix, which means that the transformation from the (x, y) to the (x′, y′) coordinate system is not a rotation operation. However, we can still do a non-orthogonal similarity transformation D = B⁻¹AB.

6.3.3 Rotation matrix revisited


We have already discussed the rotation matrix in detail. We are going to take another look at it, with the additional knowledge about eigenvalues, eigenvectors and similarity transformations. We know that a rotation operation does not change the rotation axis itself. For example, if we are rotating about the z axis, then the transformation matrix multiplied by the k̂ vector should yield the k̂ vector only. This implies that if Ar = r, then the vector r lies along the rotation axis, about which the rotation takes place. We can equivalently state that, given a rotation matrix A, if we can find an eigenvector with eigenvalue = 1, then that particular eigenvector is the rotation axis.
Let us solve a problem, where the rotation matrix is given by the following:⁴

\[
A = \frac{1}{2}\begin{pmatrix} 1 & \sqrt{2} & 1 \\ -\sqrt{2} & 0 & \sqrt{2} \\ 1 & -\sqrt{2} & 1 \end{pmatrix}. \tag{6.22}
\]

We will not try to find all the eigenvalues and eigenvectors. Rather, we will try to find the eigenvector for eigenvalue = 1.⁵ The following equations need to be

⁴ We can check that det A = 1, which implies that this is definitely a rotation matrix.
⁵ Note that there has to be one such eigenvalue and eigenvector if A is a rotation matrix.


solved for λ = 1:

(1/2 − λ)x + y/√2 + z/2 = 0,
−x/√2 − λy + z/√2 = 0,    (6.23)
x/2 − y/√2 + (1/2 − λ)z = 0.
Adding the first two equations, we get x = z and y = 0, and thus the eigenvector corresponding to the eigenvalue λ = 1 is [1, 0, 1]. Hence, the above rotation matrix A describes a rotation about the [1, 0, 1] axis [i.e., the î + k̂ axis].
Instead of (î, ĵ, k̂), it would be interesting to define a coordinate system where w = î + k̂ is one of the reference axes.⁶ Just by observation, we select a vector perpendicular to w = [1, 0, 1], and this turns out to be u = [1, 0, −1]. Now we get the third axis v = w × u = 2ĵ. Thus, we get a new right-handed coordinate system (û, v̂, ŵ), where û = (1/√2)[1, 0, −1], v̂ = [0, 1, 0] and ŵ = (1/√2)[1, 0, 1].
Similar to the previous section, we construct a B matrix as,

\[
B = \begin{pmatrix} 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1 & 0 \\ -1/\sqrt{2} & 0 & 1/\sqrt{2} \end{pmatrix}. \tag{6.24}
\]

Interestingly, this is the rotation matrix from the (î, ĵ, k̂) to the (û, v̂, ŵ) coordinate system.⁷ Thus, the similarity transformation B⁻¹AB describes the same transformation in the (û, v̂, ŵ) coordinate system as described by A in the (î, ĵ, k̂) coordinate system, and we get,⁸

\[
B^{-1}AB = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{6.25}
\]

If we take a closer look at matrix B, we realize that w (the rotation axis) plays the role of the z-axis in the new coordinate system. Comparing with Eq. 6.10, we can easily recognize the rotation angle as −90°.
Finally, can we find the rotation matrix representing a −90° rotation about the (î + k̂) axis? We first have to write the rotation matrix A in some known form, for example, with respect to the z-axis using Eq. 6.10. Then, we have to define a rotation matrix B that takes (î + k̂) to k̂. The similarity transform B⁻¹AB will then give the answer.
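The whole example can be replayed numerically in Python (a numpy sketch):

```python
import numpy as np

s = np.sqrt(2.0)
A = np.array([[1, s, 1], [-s, 0, s], [1, -s, 1]]) / 2.0   # Eq. 6.22
assert np.isclose(np.linalg.det(A), 1.0)                  # a proper rotation

w = np.array([1.0, 0.0, 1.0])
assert np.allclose(A @ w, w)          # [1, 0, 1] is the rotation axis (eigenvalue 1)

B = np.array([[1/s, 0, 1/s], [0, 1, 0], [-1/s, 0, 1/s]])  # Eq. 6.24
print(np.round(np.linalg.inv(B) @ A @ B, 12))             # the matrix of Eq. 6.25
```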

6.3.4 Quadratic form and principal axis theorem (optional)


Let us take a vector x and a matrix A. We can easily verify that,

\[
Q = \vec{x}^{\,T} A \vec{x} = \begin{pmatrix} x_1 & x_2 \end{pmatrix}\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = a_{11}x_1^2 + (a_{12} + a_{21})x_1 x_2 + a_{22}x_2^2. \tag{6.26}
\]
⁶ We could have chosen the other two eigenvectors of A, but we will avoid that.
⁷ We can easily verify that B î = û, B ĵ = v̂, B k̂ = ŵ.
⁸ Not surprisingly, B⁻¹AB is not a diagonal matrix, as B is not made exclusively from the eigenvectors of A.


[Figure]

Figure 6.6: The quadratic equation 5x₁² − 4x₁x₂ + 5x₂² = 20, when represented with respect to the principal axes, simplifies to 3y₁² + 7y₂² = 20. The principal axes y₁ and y₂ are oriented along (î + ĵ) and (−î + ĵ), respectively.

Note that this is a quadratic function.⁹ In general, the coefficients of a quadratic function can be written in the form of a symmetric matrix.¹⁰ Now, we find the eigenvalues and eigenvectors of A and construct the matrix B using the eigenvectors of A.¹¹ We know that the similarity transform B⁻¹AB yields a diagonal matrix D, so we can write BDB⁻¹ = A.¹² Substituting this in the above equation, we get,

Q = xᵀBDB⁻¹x = yᵀDy = λ₁y₁² + λ₂y₂²,  where y = B⁻¹x = Bᵀx.    (6.27)

Let us solve a problem: express the quadratic equation 5x₁² − 4x₁x₂ + 5x₂² = 20 in terms of its principal axes.
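This problem is conveniently solved with numpy's `eigh` routine for symmetric matrices (a sketch):

```python
import numpy as np

# 5 x1² − 4 x1 x2 + 5 x2²  is  xᵀ A x  with a symmetric A
A = np.array([[5.0, -2.0], [-2.0, 5.0]])
vals, B = np.linalg.eigh(A)        # columns of B: orthonormal eigenvectors
print(vals)                        # the eigenvalues 3 and 7

# with y = Bᵀ x, the cross term disappears: the curve becomes 3 y1² + 7 y2² = 20
assert np.allclose(B.T @ A @ B, np.diag(vals))
```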

6.3.5 Degenerate eigenvalues: Gram-Schmidt orthogonalization


Let us consider the following matrix:

\[
A = \begin{pmatrix} 1 & -4 & 2 \\ -4 & 1 & -2 \\ 2 & -2 & -2 \end{pmatrix}. \tag{6.28}
\]
⁹ Any rank-2 tensor can be geometrically interpreted as a quadratic function.
¹⁰ We are selecting a symmetric matrix for convenience.
¹¹ Since A is symmetric, B must be a rotation matrix.
¹² D = diag(λ₁, λ₂).


[Figure]

Figure 6.7: Two vectors u and v are not orthogonal. We can define a set of vectors, u′ = u and v′ = v − (û · v)û, which are perpendicular to each other. The blue vectors in the diagram just show the directions of u′ and v′.

We need to solve the following set of linear equations to get the eigenvalues and
eigenvectors:

(1 − λ)x − 4y + 2z = 0, (6.29)
−4x + (1 − λ)y − 2z = 0,
2x − 2y − (2 + λ)z = 0.

Setting the determinant equal to zero, we get λ^3 − 27λ − 54 = 0 ⇒ (λ − 6)(λ + 3)^2 = 0.


Thus two eigenvalues are equal and we call them degenerate eigenvalues. For λ = 6, we must fulfill −5x − 4y + 2z = 0 and ~v1 = (2, −2, 1) can be an eigenvector.
For λ = −3, we must fulfill 2x − 2y + z = 0 and two possible eigenvectors are
~v2 = (1, 1, 0) and ~v3 = (−1, 0, 2). Now, we note that while ~v1 is perpendicular to both
~v2 and ~v3 (which means that ~v1 is perpendicular to the plane containing ~v2 and ~v3 ),
~v2 and ~v3 are not mutually perpendicular.
As shown in Fig. 6.7, starting from two non-collinear vectors ~u and ~v in a plane, we can always define a set of two orthogonal vectors ~u' and ~v'. Let us take ~u' in the direction of ~u. Now, ~AB is the vector (in the direction of ~v') we are looking for, and ~AB = ~v − ~OA. Since ~OA is the projection of ~v along û, we get,

~v' = ~v − (û · ~v)û .  (6.30)

This is Gram-Schmidt orthogonalization and can be extended to any number of


vectors.
Let us go back to the problem we started with. We have ~v2 = (1, 1, 0) and
~v3 = (−1, 0, 2) and we need to orthogonalize. Using the above equation, we can


write

~v3' = (−1, 0, 2) − (−1/√2)(1/√2)(1, 1, 0),  (6.31)
and thus, get a vector (−1/2, 1/2, 2) or (−1, 1, 4). Note that, this vector is perpendicular to both (1, 1, 0) and (2, −2, 1). However, we need to check whether the vector we derived is an eigenvector of matrix A. You can easily verify that,
   
A (−1, 1, 4)^T = −3 (−1, 1, 4)^T .  (6.32)
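The Gram-Schmidt step and the eigenvector check above are easy to reproduce with a few lines of NumPy. A sketch using the vectors from this example:

```python
import numpy as np

A = np.array([[ 1.0, -4.0,  2.0],
              [-4.0,  1.0, -2.0],
              [ 2.0, -2.0, -2.0]])

v2 = np.array([1.0, 1.0, 0.0])   # eigenvector for lambda = -3
v3 = np.array([-1.0, 0.0, 2.0])  # another one, not orthogonal to v2

# Gram-Schmidt: subtract from v3 its projection along the unit vector of v2
u2 = v2 / np.linalg.norm(v2)
v3_orth = v3 - (u2 @ v3) * u2    # gives (-1/2, 1/2, 2), parallel to (-1, 1, 4)
```

One can then confirm that `v3_orth` is orthogonal to v2 and still an eigenvector of A with eigenvalue −3, exactly as derived above.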

6.3.6 Diagonalization of Hermitian matrices


So far we have considered only matrices with real components and found a special category, called symmetric matrices, satisfying S = S^T or Sij = Sji. Now, we generalize to complex matrices, where a similar special category is known as Hermitian matrices, satisfying H = H† or Hij = H*ji. Similarly, the complex analog of an orthogonal matrix (O^T = O^-1) is a unitary matrix (U† = U^-1). If A is a Hermitian matrix, then we can diagonalize A via a unitary similarity transformation,

D = B^-1 AB = B† AB ,  (6.33)

where A is a Hermitian and B is a unitary matrix.
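NumPy handles the Hermitian case with the same `np.linalg.eigh` routine, which returns real eigenvalues and a unitary eigenvector matrix. A small sketch, with a 2×2 Hermitian matrix chosen by us purely for illustration:

```python
import numpy as np

# A Hermitian matrix: H equals its own conjugate transpose
H = np.array([[2.0, 1j],
              [-1j, 2.0]])

w, U = np.linalg.eigh(H)   # real eigenvalues, unitary U

# Unitary similarity transform U^dagger H U diagonalizes H
D = U.conj().T @ H @ U
```

Even though H is complex, its eigenvalues (here 1 and 3) are real, as expected for a Hermitian matrix.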

6.3.7 Exercise

Finding eigenvalues and eigenvectors


1. Mary L. Boas, chapter-3, section-11, problems 12-26.

Practice columnwise matrix multiplication


   
2. Take a matrix A = [[1, 2], [3, 4]] and multiply it with a vector (5, 6)^T. Show that the resulting vector is a linear combination of (1, 3)^T and (2, 4)^T, with constants 5 and 6, respectively.
   
3. Take a matrix A = [[1, 2], [3, 4]] and multiply it with another matrix B = [[5, 6], [7, 8]]. Show that the first column of the product matrix is a linear combination of (1, 3)^T and (2, 4)^T, with constants 5 and 7, respectively. Thus, we can write the first column of the product matrix to be equal to A (5, 7)^T. Similarly, the second column of the product matrix is a linear combination of (1, 3)^T and (2, 4)^T, with constants 6 and 8, respectively. Thus, we can write the second column of the product matrix to be equal to A (6, 8)^T.
   
4. Take a matrix A = [[2, 1], [1, 2]] and multiply it with another matrix B = [[1, −1], [1, 1]]. (a) Check that the columns of B = (b1 b2) are made of eigenvectors of A, such that Ab1 = 3b1 and Ab2 = b2. (b) Using the above, we can write AB = (Ab1 Ab2) = (3b1 b2). (c) Note that, we can further rewrite this as (3b1 b2) = (b1 b2) [[3, 0], [0, 1]] = BD, where D is a diagonal matrix, made of eigenvalues of A. (d) Finally, confirm that AB = BD.
   
5. Repeat the previous problem with A = [[3, 2], [6, −1]] and B = [[1, 1], [−3, 1]].

Similarity transformation
 
6. Given A = [[2, 1], [1, 2]]. Find the matrix B satisfying the similarity transformation B^-1 AB = D, where D is a diagonal matrix. Find out whether B is a rotation matrix or not and give proper justification to support your answer.
 
7. Given A = [[3, 2], [6, −1]]. Find the matrix B satisfying the similarity transformation B^-1 AB = D, where D is a diagonal matrix. Find out whether B is a rotation matrix or not and give proper justification to support your answer.

8. The first problem is an example of orthogonal similarity transformation.


Check whether B −1 AB = D = B T AB is satisfied.

9. The second problem is not an orthogonal similarity transformation. Check


whether B −1 AB = D = B T AB is satisfied.

10. In the first two problems, check whether the trace and determinant are conserved after the similarity transformation.

11. Mary L. Boas, chapter-3, section-11, problem 27-32.

12. Mary L. Boas, chapter-3, section-11, problem 9-10.

Finding the rotation axis and rotation angle

13. Mary L. Boas, chapter-3, section-11, problem 51-56.


Finding the rotation matrix

14. Find the rotation matrix corresponding to 90◦ rotation about î + k̂.

15. Find the rotation matrix corresponding to 180◦ rotation about î + ĵ.

16. Find the rotation matrix corresponding to 120◦ rotation about î + ĵ + k̂.

Quadratic form (optional)

17. Express the quadratic equation 5x1^2 − 4x1x2 + 5x2^2 = 5 in terms of its principal
axes. Make a plot, clearly showing the principal axes. You have to specify
the vectors along the principal axes.

18. Express the quadratic equation x1^2 − x1x2 + x2^2 = 1 in terms of its principal
axes. Make a plot, clearly showing the principal axes. You have to specify
the vectors along the principal axes.

19. Mary L. Boas, chapter-3, section-12, problems 2-9.

Unitary transformation

20. Mary L. Boas, chapter-3, section-11, problem 41-44.

Chapter 7

Ordinary Differential Equations

7.1 Linear and non-linear equations


Assuming x and y to be the independent and dependent variable, respectively, a linear differential equation of order n is given by

a0 y + a1 dy/dx + a2 d^2y/dx^2 + a3 d^3y/dx^3 + · · · + an d^ny/dx^n = b,  (7.1)
where a’s and b are functions of x (or constants). Some examples of linear equa-
tions are,

x y' + x^2 y = e^x, (order 1)  (7.2)
x^3 y'' + e^x y' + (ln x) y = cos x, (order 2)
y''' − 2y'' + y' = 2 sin x. (order 3)

Note that, in each of the above equations, the dependent variable y and all its derivatives occur linearly. The order of the differential equation is decided according to the order of the highest derivative included in the equation. The general solution of a linear differential equation of order n has n independent arbitrary constants and we can get a particular solution by assigning particular values to the constants, based on a boundary condition or initial condition.
Some examples of the non-linear equations are,

y' − ln y = 0, (order 1)  (7.3)
x^3 y'' + y' − y^3 = sin x, (order 2)
y''' − 2y'' + (y')^2 + x^2 y = 2 sin y. (order 3)

Note that, in each of the above equations, either the dependent variable y or some of its derivatives do not occur linearly.

7.2 First order differential equations


Differential equations of first order contain only the first derivative of y, i.e., y'. We are going to discuss different types of first order equations in this section.


7.2.1 Separable equations (optional)


Separable equations are of the form y' = f(x)/g(y), such that all the terms containing y can be written on one side of the equation and all the terms containing x can be written on the other,

g(y)dy = f (x)dx. (7.4)

Note that, we can solve linear, as well as non-linear equations using this method.

Example 1: Solve xy' = y, given boundary condition y = 3, when x = 2.

We can write,

∫ dy/y = ∫ dx/x ⇒ ln(y) = ln(ax) ⇒ y = ax.

Using the boundary condition, we get a = 3/2.
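SymPy can solve such separable equations directly with `dsolve`. A sketch for this example, with the boundary condition passed via `ics`:

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# Solve x*y' = y with the boundary condition y(2) = 3
sol = sp.dsolve(sp.Eq(x * y(x).diff(x), y(x)), y(x), ics={y(2): 3})
# sol is Eq(y(x), 3*x/2), matching the constant a = 3/2 found above
```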


Example 2: Solve x√(1 − y^2) dx + y√(1 − x^2) dy = 0, given boundary condition y = 0.5, when x = 0.5.
We can write,

∫ y dy/√(1 − y^2) = ∫ −x dx/√(1 − x^2).

Let us put 1 − y^2 = u^2 and 1 − x^2 = v^2, such that the above equation is converted to

−∫ du = ∫ dv ⇒ (1 − y^2)^{1/2} + (1 − x^2)^{1/2} = c,

where c = √3.

Example 3: Solve y' sin x = y ln y, given boundary condition y = e, when x = π/3.

We can write,

∫ dy/(y ln y) = ∫ dx/sin x = ∫ csc x dx.

The left hand side is easy to integrate, if we substitute ln y = u, such that,

∫ dy/(y ln y) = ∫ du/u = ln u = ln ln y.

In order to integrate the right hand side, we multiply and divide by (csc x − cot x), and substitute (csc x − cot x) = v, such that,

∫ csc x dx = ∫ csc x (csc x − cot x)/(csc x − cot x) dx = ∫ dv/v = ln v = ln(csc x − cot x).

Using the boundary condition, the answer is ln y = c(csc x − cot x), where c = √3.


7.2.2 Exact equations


Say we have to solve a first order differential equation of the form,

P (x, y)dx + Q(x, y)dy = 0. (7.5)

We know that, if the above expression is an exact differential, then we can define
a function F (x, y), such that P = ∂F/∂x and Q = ∂F/∂y.1 Thus, we can write

P dx + Qdy = dF = 0 ⇒ F (x, y) = c. (7.6)

Often, an inexact differential can be converted to an exact equation by multiplying it by an appropriate integrating factor. For example, x dy − y dx = 0 is not an exact differential, because P = −y, Q = x and ∂P/∂y ≠ ∂Q/∂x. But we can make it exact by dividing it with x^2 and thus, 1/x^2 is the integrating factor. Let us verify this,

(x dy − y dx)/x^2 = 0 ⇒ (1/x) dy − (y/x^2) dx = 0 ⇒ P1(x, y) = −y/x^2 & Q1(x, y) = 1/x.

Now, we satisfy the condition ∂P1/∂y = ∂Q1/∂x = −1/x^2.
In general, by multiplying the given inexact equation with the integrating factor U(x, y), we get an equation of the form U(x, y)P(x, y) dx + U(x, y)Q(x, y) dy = 0, with P1 = UP and Q1 = UQ, which is an exact equation, i.e., ∂P1(x, y)/∂y = ∂Q1(x, y)/∂x. However, it might not be a trivial exercise to find the integrating factor by inspection. We will learn a few tricks to do this via some examples.

Example 1: Solve (3x^2 y^3 − 5x^4) dx + (y + 3x^3 y^2) dy = 0

Since P(x, y) = 3x^2 y^3 − 5x^4 and Q(x, y) = y + 3x^3 y^2, you can easily verify that ∂P/∂y = ∂Q/∂x = 9x^2 y^2 and thus, this is an exact equation. Now, we have to find a function F(x, y), such that P = ∂F/∂x and Q = ∂F/∂y. We can write,

F(x, y) = ∫ P(x, y) dx = ∫ (3x^2 y^3 − 5x^4) dx = x^3 y^3 − x^5 + f(y).

Thus, we have to find f(y) to get the solution. Using the other equation, we can write,

y + 3x^3 y^2 = ∂F/∂y = 3x^3 y^2 + f'(y) ⇒ f(y) = y^2/2 + c1.

Thus, F(x, y) = x^3 y^3 − x^5 + y^2/2 + c1 and the general solution of the given differential equation is

F(x, y) = c2 ⇒ x^3 y^3 − x^5 + y^2/2 = c,

where the constant c replaces both c1 and c2. You should differentiate the answer
1. Check for an exact differential: ∂P/∂y = ∂Q/∂x, because ∂^2F/∂y∂x = ∂^2F/∂x∂y.


and check whether you get the equation given in the question.
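The exactness test and the recovered potential F(x, y) can both be checked symbolically with SymPy. A sketch for this example:

```python
import sympy as sp

x, y = sp.symbols('x y')

P = 3*x**2*y**3 - 5*x**4
Q = y + 3*x**3*y**2

# Exactness test: dP/dy must equal dQ/dx
exactness_gap = sp.simplify(sp.diff(P, y) - sp.diff(Q, x))  # zero means exact

# The potential found above; its partial derivatives must reproduce P and Q
F = x**3*y**3 - x**5 + y**2/2
```

Checking `diff(F, x) == P` and `diff(F, y) == Q` is exactly the differentiation test suggested in the text.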

Example 2: Solve (2x e^{3y} + e^x) dx + (3x^2 e^{3y} − y^2) dy = 0

Since P(x, y) = 2x e^{3y} + e^x and Q(x, y) = 3x^2 e^{3y} − y^2, you can easily verify that ∂P/∂y = ∂Q/∂x = 6x e^{3y} and thus, this is an exact equation. Now, we have to find a function F(x, y), such that P = ∂F/∂x and Q = ∂F/∂y. We can write,

F(x, y) = ∫ P(x, y) dx = ∫ (2x e^{3y} + e^x) dx = x^2 e^{3y} + e^x + f(y).

Thus, we have to find f(y) to get the solution. Using the other equation, we can write,

3x^2 e^{3y} − y^2 = ∂F/∂y = 3x^2 e^{3y} + f'(y) ⇒ f(y) = −y^3/3 + c1.

Thus, F(x, y) = x^2 e^{3y} + e^x − y^3/3 + c1 and the general solution of the given differential equation is

F(x, y) = c2 ⇒ x^2 e^{3y} + e^x − y^3/3 = c,

where the constant c replaces both c1 and c2.

Example 3: Solve (x − y) dy + (y + x + 1) dx = 0

Since P(x, y) = y + x + 1 and Q(x, y) = x − y, you can easily verify that ∂P/∂y = ∂Q/∂x = 1, and thus, this is an exact equation. Now, we have to find a function F(x, y), such that P = ∂F/∂x and Q = ∂F/∂y. We can write,

F(x, y) = ∫ P(x, y) dx = ∫ (y + x + 1) dx = xy + x^2/2 + x + f(y).

Thus, we have to find f(y) to get the solution. Using the other equation, we can write,

x − y = ∂F/∂y = x + f'(y) ⇒ f(y) = −y^2/2 + c1.

Thus, F(x, y) = xy + x^2/2 + x − y^2/2 + c1 and the general solution of the given differential equation is

F(x, y) = c2 ⇒ xy + x^2/2 + x − y^2/2 = c,

where the constant c replaces both c1 and c2.

Example 4: Solve (y^2 + 3xy^3) dx + (1 − xy) dy = 0.

Since P(x, y) = y^2 + 3xy^3 and ∂P/∂y = 2y + 9xy^2; Q(x, y) = 1 − xy and ∂Q/∂x = −y, the equation is not an exact equation. We are going to use an integrating factor U(x, y) = x^m y^n. Then, the above equation is converted to

(x^m y^{n+2} + 3x^{m+1} y^{n+3}) dx + (x^m y^n − x^{m+1} y^{n+1}) dy = 0.

For the above equation to be exact, we must have,

(n + 2) x^m y^{n+1} + 3(n + 3) x^{m+1} y^{n+2} = m x^{m−1} y^n − (m + 1) x^m y^{n+1}.

Rearranging, we can write,

[(n + 2) + (m + 1)] x^m y^{n+1} + 3(n + 3) x^{m+1} y^{n+2} − m x^{m−1} y^n = 0.

Since the right hand side is zero, every coefficient must be equal to zero,

(n + 2) + (m + 1) = 0,
(n + 3) = 0,
m = 0.

Thus, the solution is m = 0 and n = −3 and the integrating factor is U(x, y) = y^{−3}. The given equation is converted to,

[(y^2 + 3xy^3)/y^3] dx + [(1 − xy)/y^3] dy = 0,

with P1(x, y) = 1/y + 3x and Q1(x, y) = 1/y^3 − x/y^2. We can easily verify that the above equation is exact, ∂P1/∂y = ∂Q1/∂x = −1/y^2. Now, we have to find a function F(x, y), such that P1 = ∂F/∂x and Q1 = ∂F/∂y. We can write,

F(x, y) = ∫ P1(x, y) dx = ∫ (1/y + 3x) dx = x/y + 3x^2/2 + f(y).

Thus, we have to find f(y) to get the solution. Using the other equation, we can write,

1/y^3 − x/y^2 = ∂F/∂y = −x/y^2 + f'(y) ⇒ f(y) = −1/(2y^2) + c1.

Thus, F(x, y) = x/y + 3x^2/2 − 1/(2y^2) + c1 and the general solution of the given differential equation is

F(x, y) = c2 ⇒ x/y + 3x^2/2 − 1/(2y^2) = c,

where the constant c replaces both c1 and c2.
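We can let SymPy confirm that the factor y^{−3} really repairs the exactness; a sketch using the equation from this example:

```python
import sympy as sp

x, y = sp.symbols('x y')

P = y**2 + 3*x*y**3
Q = 1 - x*y
U = y**-3                        # integrating factor found above

# Before applying U, the exactness condition fails...
gap_before = sp.simplify(sp.diff(P, y) - sp.diff(Q, x))
# ...after applying U, dP1/dy - dQ1/dx vanishes identically
gap_after = sp.simplify(sp.diff(U * P, y) - sp.diff(U * Q, x))
```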

Example 5: Solve (3xy − y^2) dx + x(x − y) dy = 0.

Since P(x, y) = 3xy − y^2 and ∂P/∂y = 3x − 2y; Q(x, y) = x(x − y) and ∂Q/∂x = 2x − y, the given equation is not an exact equation. We further see that,

(1/Q)(∂P/∂y − ∂Q/∂x) = 1/x = f(x).

When such a condition (left hand side is a function of x only) is satisfied, I claim that the integrating factor is U(x, y) = U(x) = e^{∫ f(x) dx} = e^I (see problem set). In this particular case, the integrating factor is,

U(x) = e^{∫ dx/x} = e^{ln x} = x.  (7.7)

Multiplying the given equation with the integrating factor, we obtain,

(3x^2 y − y^2 x) dx + (x^3 − x^2 y) dy = 0,  (7.8)

with P1(x, y) = 3x^2 y − y^2 x and Q1(x, y) = x^3 − x^2 y. We can verify that the above equation is exact, as ∂P1/∂y = ∂Q1/∂x = 3x^2 − 2xy. Now, we have to find a function F(x, y), such that P1 = ∂F/∂x and Q1 = ∂F/∂y. We can write,

F(x, y) = ∫ P1(x, y) dx = ∫ (3x^2 y − y^2 x) dx = x^3 y − y^2 x^2/2 + f(y).

Thus, we have to find f(y) to get the solution. Using the other equation, we can write,

x^3 − x^2 y = ∂F/∂y = x^3 − x^2 y + f'(y) ⇒ f(y) = c1.

Thus, F(x, y) = x^3 y − y^2 x^2/2 + c1 and the general solution of the given differential equation is

F(x, y) = c2 ⇒ x^3 y − y^2 x^2/2 = c,

where the constant c replaces both c1 and c2.

7.2.3 Homogeneous equations (optional)


A first order differential equation of the form

P (x, y)dx + Q(x, y)dy = 0, (7.9)

is homogeneous if both P and Q are homogeneous functions of the same degree.2 Note that, an nth degree homogeneous function of x and y can be expressed as
2. A homogeneous function has multiplicative scaling behavior. That means, if we multiply each variable by the same factor, then the function is multiplied by some integral power of this factor. For example, f(x, y) is a homogeneous function of degree n, if f(tx, ty) = t^n f(x, y). Example: x^2 + xy + y^2 + y^3/x is a homogeneous function of degree 2. Example: x^2 y + x^4/y + y^3 is a homogeneous function of degree 3. Example: x^2 y + xy + x is not a homogeneous function.


x^n f(y/x).3 Since P and Q are homogeneous functions of the same degree, the factor x^n gets canceled and we can write,

y' = dy/dx = −P(x, y)/Q(x, y) = f(y/x).  (7.10)

Thus, a homogeneous equation can always be expressed in the form y' = f(y/x). Now, we can solve this equation by substituting y = xv, which gives us a separable equation in x and v.4 Some examples are given below.

Example 1: Solve x^2 dy − (3y^2 + xy) dx = 0.

Q(x, y) = x^2 and P(x, y) = −(3y^2 + xy) = −x^2 (3y^2/x^2 + y/x) are homogeneous functions of degree 2. We can write,

dy/dx = −P(x, y)/Q(x, y) = 3y^2/x^2 + y/x = f(y/x).

In order to solve the above equation, we substitute y = xv and get a separable equation in x and v,

v + x dv/dx = 3v^2 + v ⇒ dv/(3v^2) = dx/x.

Integrating both sides, we get

−1/(3v) = ln |x| + ln |c| ⇒ x/y = −3 ln |cx| ⇒ y = −x/(3 ln |cx|).

In order to verify, you can differentiate the last equation and check whether you get the differential equation given in the question.
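That verification is easy to do symbolically. A SymPy sketch substituting the answer y = −x/(3 ln cx) back into the equation (positivity assumptions keep the logarithm single-branched):

```python
import sympy as sp

x, c = sp.symbols('x c', positive=True)

# Claimed solution of x^2 y' = 3y^2 + x*y from the worked example
y = -x / (3 * sp.log(c * x))

# The residual must vanish identically if the solution is correct
residual = sp.simplify(x**2 * sp.diff(y, x) - (3 * y**2 + x * y))
```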

Example 2: Solve x^2 dy + (y^2 − xy) dx = 0.

Q(x, y) = x^2 and P(x, y) = y^2 − xy = x^2 (y^2/x^2 − y/x) are homogeneous functions of degree 2. We can write,

dy/dx = −P(x, y)/Q(x, y) = y/x − y^2/x^2 = f(y/x).

In order to solve the above equation, we substitute y = xv and get a separable equation in x and v,

v + x dv/dx = v − v^2 ⇒ −dv/v^2 = dx/x.
3. Example: a homogeneous function of degree 2, x^2 + xy can be expressed as x^2 (1 + y/x) = x^2 f(y/x).
4. There is an alternate method of solving homogeneous differential equations. We can prove that 1/(Px + Qy) is an integrating factor for Eq. 7.9.


Integrating both sides, we get

1/v = ln |x| + ln |c| ⇒ x/y = ln |cx| ⇒ y = x/ln |cx| .  (7.11)

Example 3: Solve (y^2 − xy) dx + (x^2 + xy) dy = 0.

P(x, y) = y^2 − xy = x^2 (y^2/x^2 − y/x) and Q(x, y) = x^2 + xy = x^2 (1 + y/x) are homogeneous functions of degree 2. We can write,

dy/dx = −P(x, y)/Q(x, y) = (y/x − y^2/x^2)/(1 + y/x) = f(y/x).

In order to solve the above equation, we substitute y = xv and get a separable equation in x and v,

v + x dv/dx = (v − v^2)/(1 + v) ⇒ x dv/dx = −2v^2/(1 + v) ⇒ (−1/v^2 − 1/v) dv = 2 dx/x.

Integrating both sides, we get

1/v − ln |v| = ln |cx^2| ⇒ x/y = ln |cxy| ⇒ e^{x/y} = cxy .

7.2.4 Linear first order equations


A linear first order equation can be written in the form of

y' + P(x) y = Q(x),  (7.12)

where P and Q are functions of x (or can be constants). If Q = 0, then we can easily separate the variables and write,

dy/y = −P dx ⇒ ln y = −∫ P dx + c.  (7.13)

Assuming I = ∫ P dx (equivalently, dI/dx = P), we can write the solution in the form

y = c e^{−I} .  (7.14)

Now, let us solve for non-zero Q. In order to do this, first let us calculate the first derivative of y e^I:

d(y e^I)/dx = y' e^I + y e^I dI/dx = y' e^I + y e^I P = e^I (y' + yP) = e^I Q.  (7.15)

Since both e^I and Q are functions of x only, we can integrate to get,5

y e^I = ∫ e^I Q dx + c ⇒ y = e^{−I} ∫ e^I Q dx + c e^{−I} = yp + yc , where I = ∫ P dx.  (7.16)

5. There is an alternate way to solve linear equations, to be shown in the examples.


Note that, we have only one arbitrary constant, as expected for a linear first order equation. Also, yc is the solution of Eq. 7.12 with Q = 0 and yp is known as the particular solution.6,7 Some examples are given below.

Example 1: Solve y' − y/x = 1.
Method 1:
This is a linear equation, with P(x) = −1/x and Q(x) = 1. Thus,

I = ∫ P(x) dx = −∫ dx/x = −ln x ⇒ e^I = e^{−ln x} = 1/x,

y e^I = ∫ e^I Q(x) dx + ln c ⇒ y/x = ∫ dx/x + ln c = ln(cx) ⇒ y = x ln(cx).

Method 2:
Let y = uv and dy/dx = u dv/dx + v du/dx. Thus, the above equation is converted to

u dv/dx + v du/dx − uv/x = 1 ⇒ u dv/dx + v (du/dx − u/x) = 1.

Setting the term involving v equal to zero, we get,

du/dx = u/x ⇒ ln u = ln(c1 x) ⇒ u = c1 x.

Let us replace u = c1 x in the equation above (term involving v is still equal to zero),

c1 x dv/dx = 1 ⇒ c1 v = ln(cx).

Finally, using y = uv, we get,

y = c1 x (1/c1) ln(cx) ⇒ y = x ln(cx) .
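SymPy's `dsolve` reproduces this result; `checkodesol` then confirms the returned solution satisfies the equation. A sketch:

```python
import sympy as sp

x = sp.symbols('x', positive=True)
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x) - y(x)/x, 1)   # the linear equation y' - y/x = 1
sol = sp.dsolve(ode, y(x))
# The general solution is equivalent to y = x ln(cx), with one arbitrary constant

ok, residual = sp.checkodesol(ode, sol)
```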

Example 2: Solve y' + y = e^x.
Method 1:
This is a linear equation with P(x) = 1 and Q(x) = e^x. Thus,

I = ∫ P dx = x ⇒ e^I = e^x,

y e^I = ∫ e^I Q dx + c = ∫ e^{2x} dx + c = e^{2x}/2 + c ⇒ y = e^x/2 + c e^{−x}.

Method 2:
6. From Eq. 7.16, yp e^I = ∫ e^I Q dx and yc e^I = c, such that (yp + yc) e^I = y e^I = ∫ e^I Q dx + c.
7. Note that, the general solution of Eq. 7.12, i.e., y = yp + yc, is not unique.


Let y = uv and dy/dx = u dv/dx + v du/dx. Thus, the above equation is converted to

u dv/dx + v du/dx + uv = e^x ⇒ u dv/dx + v (du/dx + u) = e^x.

Setting the term involving v equal to zero, we get,

du/dx = −u ⇒ ln u = −x + c1 ⇒ u = c2 e^{−x}.

Let us replace u = c2 e^{−x} in the equation above (term involving v is still equal to zero),

c2 e^{−x} dv/dx = e^x ⇒ c2 v = e^{2x}/2 + c3.

Finally, using y = uv, we get,

y = c2 e^{−x} (e^{2x}/(2c2) + c3/c2) ⇒ y = e^x/2 + c e^{−x} .

Example 3: Solve x^2 y' + 3xy = 1.

Method 1:
We can rewrite the given equation as y' + (3/x) y = 1/x^2. This is a linear equation with P(x) = 3/x and Q(x) = 1/x^2. Thus,

I = ∫ P dx = 3 ln x ⇒ e^I = x^3,

y e^I = ∫ e^I Q dx + c = ∫ x dx + c = x^2/2 + c ⇒ y = 1/(2x) + c x^{−3}.

Method 2:
Let y = uv and dy/dx = u dv/dx + v du/dx. Thus, the above equation is converted to

u dv/dx + v du/dx + 3uv/x = 1/x^2 ⇒ u dv/dx + v (du/dx + 3u/x) = 1/x^2.

Setting the term involving v equal to zero, we get,

du/dx = −3u/x ⇒ ln u = −3 ln x + ln c1 ⇒ u = c1 x^{−3}.

Let us replace u = c1 x^{−3} in the equation above (term involving v is still equal to zero),

c1 x^{−3} dv/dx = 1/x^2 ⇒ c1 v = x^2/2 + c2.


Finally, using y = uv, we get,

y = c1 x^{−3} (x^2/(2c1) + c2/c1) ⇒ y = 1/(2x) + c x^{−3} .

7.2.5 Bernoulli equation


Bernoulli equations can be written in the form of

y' + P y = Q y^n,  (7.17)

where P and Q are functions of x (or can be constants). Clearly, it is not a linear equation, but can easily be converted to a linear equation, by making a change of variable,

z = y^{1−n} ⇒ z' = (1 − n) y^{−n} y'.  (7.18)

Multiplying Eq. 7.17 with (1 − n) y^{−n} and then making the above substitution, we get

(1 − n) y^{−n} y' + (1 − n) P y^{1−n} = (1 − n) Q ⇒ z' + (1 − n) P z = (1 − n) Q .  (7.19)

Thus, we have converted the non-linear equation 7.17 to a linear equation and we already know how to solve this. Some examples are given below.

Example 1: Solve y' + y = x y^{2/3}.

Substitute z = y^{1/3} ⇒ z' = (1/3) y^{−2/3} y'. Multiplying both sides of the given equation with (1/3) y^{−2/3}, we get,

(1/3) y^{−2/3} y' + (1/3) y^{1/3} = x/3 ⇒ z' + z/3 = x/3.

Thus, we have converted the non-linear equation to a linear equation in x and z, with P(x) = 1/3 and Q(x) = x/3. Thus,

I = ∫ P dx = x/3 ⇒ e^I = e^{x/3},

z e^I = ∫ e^I Q dx + c = ∫ (x/3) e^{x/3} dx + c = x e^{x/3} − 3 e^{x/3} + c ⇒ z = x − 3 + c e^{−x/3}.

Replacing z = y^{1/3}, the answer is y^{1/3} = x − 3 + c e^{−x/3}.
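The substitution reduced the problem to the linear equation z' + z/3 = x/3, and the claimed z is easy to verify symbolically. A SymPy sketch:

```python
import sympy as sp

x, c = sp.symbols('x c')

# Claimed solution of the transformed linear equation z' + z/3 = x/3
z = x - 3 + c * sp.exp(-x / 3)

# The residual must vanish identically if z is correct
residual = sp.simplify(z.diff(x) + z/3 - x/3)
```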

Example 2: Solve y' + y/x = 2x^{3/2} y^{1/2}.

Substitute z = y^{1/2} ⇒ z' = (1/2) y^{−1/2} y'. Multiplying both sides of the given equation with (1/2) y^{−1/2}, we get,

(1/2) y^{−1/2} y' + (1/(2x)) y^{1/2} = x^{3/2} ⇒ z' + z/(2x) = x^{3/2}.

Thus, we have converted the non-linear equation to a linear equation in x and z, with P(x) = 1/(2x) and Q(x) = x^{3/2}. Thus,

I = ∫ P dx = (1/2) ln x ⇒ e^I = x^{1/2},

z e^I = ∫ e^I Q dx + c = ∫ x^{1/2} x^{3/2} dx + c = x^3/3 + c ⇒ z = x^{5/2}/3 + c x^{−1/2}.

Replacing z = y^{1/2}, the answer is y^{1/2} = x^{5/2}/3 + c x^{−1/2}.

7.2.6 Exercise

Separable equations (optional)


1. Solve (1 + y^2) dx + xy dy = 0, given the boundary condition y = 0, when x = 5.
Answer: x^2 (1 + y^2) = c, c = 25

2. Solve xy' − xy = y, given the boundary condition y = 1, when x = 1.
Answer: y = c x e^x, where c = 1/e

3. Solve (y − x^2 y) dy + (2xy^2 + x) dx = 0, given the boundary condition y = 0, when x = √2.
Answer: (2y^2 + 1) = c(x^2 − 1)^2, where c = 1

4. Solve y dy + (xy^2 − 8x) dx = 0, given the boundary condition y = 3, when x = 1.
Answer: y^2 = 8 + e^{c − x^2}, where c = 1

5. Solve y' = cos(x + y). [Hint: substitute u = x + y]
Answer: tan[(x + y)/2] = x + c

6. Solve xy' + y = e^{xy}. [Hint: substitute u = xy]
Answer: y = −x^{−1} ln(c − x)

Exact equations

7. (cos x cos y + sin^2 x) dx − (sin x sin y + cos^2 y) dy = 0.
Answer: 4 sin x cos y + 2x − sin 2x − 2y − sin 2y = c

8. (1 + y^2) dx + xy dy = 0.
Answer: x^2/2 + x^2 y^2/2 + c = 0

9. (x − cos y) dx − sin y dy = 0.
Answer: e^{−x} (cos y − x − 1) = c

10. (xy^2 − 2y^3) dx + (3 − 2xy^2) dy = 0.
Answer: x e^{xy} + c = 0

11. y dx + (x^2 + y^2 − x) dy = 0.
Answer: tan^{−1}(x/y) + y = c

12. (x − 1) y' + y − x^{−2} + 2x^{−3} = 0.
Answer: y = −1/x^2 + c/(x − 1)

13. For an inexact equation, P(x, y) dx + Q(x, y) dy = 0, it is given that (1/Q)[∂P/∂y − ∂Q/∂x] = f(x). Prove that e^I is an integrating factor for the given equation, where I = ∫ f(x) dx. [Hint: note that dI/dx = f(x). You have to prove that e^I P(x, y) dx + e^I Q(x, y) dy = 0 is an exact differential equation.]

14. For an inexact equation, P(x, y) dx + Q(x, y) dy = 0, it is given that (1/P)[∂Q/∂x − ∂P/∂y] = f(y). Prove that e^I is an integrating factor for the given equation, where I = ∫ f(y) dy. [Hint: note that dI/dy = f(y). You have to prove that e^I P(x, y) dx + e^I Q(x, y) dy = 0 is an exact differential equation.]

Homogeneous equations (optional)

15. Check whether the following functions are homogeneous and if yes, find the degree.

(a) 4x^2 + y^2
(b) x^2 − 5xy + y^3/x
(c) xy sin(x/y)
(d) (y^4 − x^3 y)/x − xy^2 sin(x/y)
(e) x sin(xy)
(f) x^2 y^3 + x^5 ln(y/x) − y^6/√(x^2 + y^2)
(g) x^3 + x^2 y + xy^2 + y^3
(h) x^2 + y
(i) x^2 + xy + y^3
(j) x + cos y

16. Solve y dy = (−x + √(x^2 + y^2)) dx.
Answer: y^2 = 2cx + c^2

17. Solve xy dx + (y^2 − x^2) dy = 0.
Answer: y^2 = c e^{−x^2/y^2}

18. Solve (x^2 + y^2) dx − xy dy = 0.
Answer: y = ± x √(2 ln(cx))

19. Solve (y − x) dx + (x + y) dy = 0.
Answer: y^2 + 2xy − x^2 = c, which can be further written as y = ± √(2x^2 + c) − x.


20. Solve y' = y/x − tan(y/x).
Answer: x sin(y/x) = c

21. Prove that 1/(Px + Qy) is an integrating factor for Eq. 7.9. [Hint: you have to prove that (P dx + Q dy)/(Px + Qy) is an exact differential, provided P and Q are homogeneous functions of the same degree.]

Linear equations

22. Prove that e^I is the integrating factor for Eq. 7.12, i.e., e^I (Py − Q) dx + e^I dy = 0 is an exact equation. Following the technique of solving an exact equation, prove that y e^I = ∫ e^I Q dx + c.

23. Solve dy + (2xy − x e^{−x^2}) dx = 0.
Answer: y = (x^2/2) e^{−x^2} + c e^{−x^2}

24. Solve 2xy' + y = 2x^{5/2}.
Answer: y = (1/3) x^{5/2} + c x^{−1/2}

25. Solve y' cos x + y = cos^2 x.
Answer: y (sec x + tan x) = x − cos x + c

26. Solve y' + y/√(x^2 + 1) = 1/(x + √(x^2 + 1)).
Answer: y = (x + c)/(x + √(x^2 + 1))

Bernoulli equations

27. 3xy^2 y' + 3y^3 = 1.
Answer: y^3 = 1/3 + c x^{−3}

28. y y' − 2y^2 cot x = sin x cos x.
Answer: y^2 = sin^2 x (−1 + c sin^2 x)

7.3 Second order linear differential equations


Differential equations of second order contain only the first and second derivatives of y, i.e., y' and y''. We are going to discuss two types of second order equations in this section.

7.3.1 Constant coefficients and zero right hand side


Equations of the form

a2 y'' + a1 y' + a0 y = 0,  (7.20)

where a2, a1, a0 are constants, are known as homogeneous equations. Expressing D = d/dx, the above equation is converted to

(a2 D^2 + a1 D + a0) y = 0,  (7.21)

where a2 D^2 + a1 D + a0 = 0 is known as the auxiliary equation. We could also have substituted y = e^{cx} in Eq. 7.20 and arrived at the same auxiliary equation,

a2 c^2 + a1 c + a0 = 0.  (7.22)
Now, let us consider three possible cases.

Case 1: auxiliary equation having two distinct real roots


Expressing the auxiliary equation as (D − c1)(D − c2),8 we can rewrite Eq. 7.21 as,

(D − c1)(D − c2) y = 0.  (7.23)

Thus, in order to solve Eq. 7.21, we need to solve two first order equations,

(D − c1) y = 0 & (D − c2) y = 0.  (7.24)

These are separable equations, with solutions y1 = e^{c1 x} and y2 = e^{c2 x}, and the general solution is a linear combination of the two.9 Thus, if c1 and c2 are two roots of the auxiliary equation, the general solution is,

y = A e^{c1 x} + B e^{c2 x} .  (7.25)

Case 2: auxiliary equation having complex conjugate roots


Let the roots of the auxiliary equation be c1 = α + ιβ and c2 = α − ιβ. Thus, we have two solutions: y1 = e^{(α+ιβ)x} and y2 = e^{(α−ιβ)x}, which are also complex conjugates of each other. By taking a linear combination, we get a complex solution of the form,

y = e^{αx} (A e^{ιβx} + B e^{−ιβx}),  (7.26)

where A and B are arbitrary complex constants. Since e^{±ιβx} = cos βx ± ι sin βx, we can rewrite the above equation as, y = e^{αx} (C1 cos βx + C2 sin βx), where C1 = (A + B) and C2 = ι(A − B). Note that, by selecting appropriate constants, we can get real, as well as imaginary solutions. For example, if we take A = B = 1/2, we get a real solution y = e^{αx} cos βx. Similarly, if we take A = 1/2ι and B = −1/2ι, we get another real solution y = e^{αx} sin βx. Interestingly, since cos βx and sin βx are linearly independent functions, we can get a series of real solutions by taking

8. c1 and c2 are the roots of the auxiliary equation a2 D^2 + a1 D + a0 = 0, given by [−a1 ± √(a1^2 − 4 a2 a0)]/(2 a2).
9. We can do this as the two solutions are linearly independent. Two functions f1(x) and f2(x) are linearly independent if the Wronskian is not equal to zero. The Wronskian is given by: W = f1(x) f2'(x) − f2(x) f1'(x).


linear combinations of them, i.e.,

y = e^{αx} (C1 cos βx + C2 sin βx) ,  (7.27)

where C1 and C2 are real arbitrary constants. We can further express this as,

y = C e^{αx} sin(βx + γ) ,  (7.28)

where C and γ are arbitrary constants.

Case 3: auxiliary equation having same roots


There exists one more possibility, that both the roots of the auxiliary equation are the same. Then, Eq. 7.20 takes the following form:

(D − c)(D − c) y = 0.  (7.29)

Obviously, solving (D − c) y = 0, we get one solution to be y = A e^{cx}. In order to get the other solution, we note that (D − c) y is going to be some function u(x), such that we can write the above equation as,

(D − c) u = 0 ⇒ u = A e^{cx}.  (7.30)

Finally, we can solve for y from,

(D − c) y = A e^{cx} ⇒ y' − cy = A e^{cx}.  (7.31)

This is a linear first order equation, having non-zero right hand side, with P = −c and Q = A e^{cx}. The solution is given by Eq. 7.16, where I = ∫ P dx = −cx. We can write the solution as,

y e^I = ∫ e^I Q dx = ∫ e^{−cx} A e^{cx} dx = Ax + B ⇒ y = (Ax + B) e^{cx} .  (7.32)
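All three cases can be checked with SymPy's `dsolve`; a sketch with one representative equation per case (the equations are our own illustrative choices):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# Case 1, distinct real roots: y'' - 3y' + 2y = 0 (roots 1 and 2)
ode1 = y(x).diff(x, 2) - 3*y(x).diff(x) + 2*y(x)
# Case 2, complex conjugate roots: y'' + 4y = 0 (roots +/- 2i)
ode2 = y(x).diff(x, 2) + 4*y(x)
# Case 3, repeated root: y'' - 2y' + y = 0 (double root 1)
ode3 = y(x).diff(x, 2) - 2*y(x).diff(x) + y(x)

sols = [sp.dsolve(ode, y(x)) for ode in (ode1, ode2, ode3)]
```

The returned solutions carry two arbitrary constants each, matching the general forms in Eqs. 7.25, 7.27 and 7.32.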

Example 1: Simple harmonic motion.


Such periodic or oscillatory motion happens when the restoring force is proportional to the displacement and acts in the opposite direction. There are several examples, like the spring-mass system, pendulum, vibration of a structure (like a bridge), vibration of atoms in a crystal etc.
Let us consider a spring-mass system. Assuming no friction, we can write Newton's second law of motion as m d^2x/dt^2 = −kx = −mω^2 x.10 Thus, we have to solve a differential equation of the form d^2x/dt^2 + ω^2 x = 0. Writing D = d/dt, we get,

(D^2 + ω^2) x = 0.
10. The potential energy of a spring is given by U(x) = (1/2) k x^2 = (1/2) m ω^2 x^2 and the force = −dU/dx. Such a force is known as a conservative force and we already know that work done is independent of the path in such a force field.


Thus, the auxiliary equation we have to solve is,

D^2 + ω^2 = 0,

and the roots are D = ±ιω. Thus, the general solution can be expressed in any of the three forms given in Eq. 7.26, Eq. 7.27 or Eq. 7.28,

x(t) = A e^{ιωt} + B e^{−ιωt},
x(t) = C1 cos ωt + C2 sin ωt,
x(t) = C sin(ωt + γ).
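We can also integrate the equation of motion numerically and compare with the closed-form solution. A minimal sketch using a simple Euler-Cromer scheme (the step size and parameters are our own illustrative choices):

```python
import numpy as np

w = 2.0                    # angular frequency omega (assumed)
dt = 1e-4                  # time step (assumed, for illustration)
t_final = 1.0

x, v = 1.0, 0.0            # initial displacement and velocity
for _ in range(int(t_final / dt)):
    v -= w**2 * x * dt     # dv/dt = -omega^2 x
    x += v * dt            # dx/dt = v (Euler-Cromer update)

# With x(0) = 1, v(0) = 0 the exact solution is x(t) = cos(omega t)
exact = np.cos(w * t_final)
```

For this step size the numerical and exact values agree to better than 10^-3; refining dt reduces the error further.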

Example 2: Damped harmonic motion.

If energy of an oscillator is dissipated, leading to gradual decrease of amplitude or preventing it from oscillating, such a motion is termed as damped harmonic motion. Damping happens because of various reasons, like the presence of friction or viscous drag etc.11
Again, let us consider a spring-mass system. The damping force (due to friction) is linearly dependent on velocity and it acts opposite to the direction of the velocity, i.e., Fd = −c dx/dt. Thus, the equation of motion is given by,

m d^2x/dt^2 = −kx − c dx/dt ⇒ m d^2x/dt^2 + c dx/dt + kx = 0 .

The auxiliary equation is mD^2 + cD + k = 0, having roots D = [−c ± √(c^2 − 4mk)]/(2m) = −c/(2m) ± √(c^2/(4m^2) − k/m) = −γ ± √(γ^2 − ω^2). Note that, γ = c/(2m) is known as the damping coefficient and the reason is going to be obvious when we discuss the solutions of the equation of motion. Let us discuss three possible cases.

Case 1: Overdamped oscillator: c² − 4mk > 0

In this case, the roots of the auxiliary equation are −γ ± β, where β = √(γ² − ω²). The general solution is given by x(t) = e^{−γt}(Ae^{βt} + Be^{−βt}).

Case 2: Critically damped oscillator: c² − 4mk = 0

In this case, both the roots are equal to −γ. Thus, the general solution is x(t) = e^{−γt}(At + B).

Case 3: Underdamped oscillator: c² − 4mk < 0

In this case, we have complex roots −γ ± ιβ, where β = √(ω² − γ²). The general solution is given by x(t) = e^{−γt}(C1 cos βt + C2 sin βt).
Displacement is plotted as a function of time for all three cases in Fig. 7.1.
Note that, there are other systems for which we need to solve a similar differential
equation and not surprisingly, we will get similar solutions. One famous example
11
Consider a child playing on a swing. If we do not apply any force, the amplitude of oscillation
of the swing decreases gradually.


Figure 7.1: Displacement as a function of time for damped harmonic motion.
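A figure like Fig. 7.1 can be generated with a short script. The numerical values of ω, γ and the initial conditions below are my own illustrative choices (the actual parameters behind the figure are not stated in the text):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")           # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

t = np.linspace(0, 5, 500)
omega = 4.0                     # assumed natural frequency
x0 = 1.0                        # assumed x(0); x'(0) is taken as 0 below

# Critically damped: gamma = omega, x(t) = e^{-gamma t}(A t + B)
g = omega
crit = np.exp(-g * t) * (g * x0 * t + x0)

# Overdamped: gamma > omega, x(t) = e^{-gamma t}(A e^{beta t} + B e^{-beta t})
g = 1.5 * omega
beta = np.sqrt(g**2 - omega**2)
A = x0 * (g + beta) / (2 * beta)        # constants fixed by x(0) = x0, x'(0) = 0
B = x0 - A
over = np.exp(-g * t) * (A * np.exp(beta * t) + B * np.exp(-beta * t))

# Underdamped: gamma < omega, x(t) = e^{-gamma t}(C1 cos(beta t) + C2 sin(beta t))
g = 0.25 * omega
beta = np.sqrt(omega**2 - g**2)
under = np.exp(-g * t) * (x0 * np.cos(beta * t) + (g * x0 / beta) * np.sin(beta * t))

for x, label in [(crit, "Critically damped"), (over, "Overdamped"), (under, "Underdamped")]:
    plt.plot(t, x, label=label)
plt.xlabel("t"); plt.ylabel("x(t)"); plt.legend()
plt.savefig("damped.png")
```

All three curves start from the same displacement; the underdamped one oscillates inside a decaying envelope, while the other two decay monotonically.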

Figure 7.2: Different systems having a similar differential equation: (a) spring-mass and (b) RLC circuit connected in series. Images are taken from Wikipedia.


is an RLC circuit, where the components are connected in series (see Fig. 7.2). In this case, the governing equation is¹²

L(d²I/dt²) + R(dI/dt) + I/C = dV/dt,

and if we set the right hand side equal to zero, we solve an equation similar to damped harmonic motion. In this case, resistance plays a role similar to that of friction in the spring-mass system.

7.3.2 Exercise
1. Re-derive Eq. 7.25: write the auxiliary equation as (D − c1)(D − c2)y = 0. (D − c2)y must be some function of x, say u(x). Now, first solve for u(x) from (D − c1)u = 0. Then, solve (D − c2)y = u and check whether you get the same answer as Eq. 7.25.

2. Consider the solution in case of overdamped harmonic motion. Can β and γ take any value for the general solution to be stable [i.e., x(t) does not go to ±∞ with increasing time], or is there some restriction?

Solve the differential equations

3. y″ + y′ − 2y = 0
Answer: y = Ae^x + Be^{−2x}

4. y″ + 9y = 0
Answer: y = Ae^{3ιx} + Be^{−3ιx}

5. y″ − 2y′ + y = 0
Answer: y = (Ax + B)e^x

6. y″ − 5y′ + 6y = 0
Answer: y = Ae^{3x} + Be^{2x}

7. y″ − 4y′ + 13y = 0
Answer: y = Ae^{2x} sin(3x + γ)

8. 4y″ + 12y′ + 9y = 0
Answer: y = (A + Bx)e^{−3x/2}
¹² We know that V = RI, V = Q/C, V = L(dI/dt). Combining, we get L(dI/dt) + RI + Q/C = V. Taking the time derivative and noting that I = dQ/dt, we get L(d²I/dt²) + R(dI/dt) + I/C = dV/dt.


Check for linear independence (by calculating Wronskian)

9. e^{−x}, e^{−4x}

10. e^{ax}, e^{bx}, a ≠ b (a, b real or imaginary)

11. e^{ax}, xe^{ax}

12. sin βx, cos βx

13. 1, x, x²

14. e^{ax}, xe^{ax}, x²e^{ax}
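These independence checks can be automated; the sketch below evaluates the Wronskian of exercise 9 symbolically with sympy (the helper `wronskian2` is written here for two functions only):

```python
import sympy as sp

x = sp.symbols('x')

def wronskian2(f, g):
    # Wronskian of two functions: W(f, g) = f g' - g f'
    return sp.simplify(f * sp.diff(g, x) - g * sp.diff(f, x))

# Exercise 9: e^{-x} and e^{-4x}
W = wronskian2(sp.exp(-x), sp.exp(-4 * x))
print(W)   # a non-zero multiple of e^{-5x}, hence linearly independent
```

A Wronskian that is not identically zero establishes linear independence; the remaining exercises can be checked the same way (for three functions, use the 3 × 3 determinant).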

7.3.3 Constant coefficients and non-zero right hand side


Equations of the form

a2y″ + a1y′ + a0y = f(x), (7.33)
where a2 , a1 , a0 are constants, are known as inhomogeneous equations. The func-
tion f (x) is often termed as a forcing function, which represents an applied force
or emf (electromotive force). If we set the right hand side equal to zero, we get a
complementary function yc , which is the solution of the homogeneous equation.
For non-zero right hand side, we get a particular solution yp and the general solu-
tion is given by,
y = yc + yp . (7.34)
We already know how to solve for yc . Let us learn a few tricks to solve for yp .

Method 1: Via inspection


Let us consider the equation y″ + y′ − 2y = −3. You can check that the complementary function is yc = Ae^x + Be^{−2x}. Via inspection, it is easy to find the particular solution to be yp = 3/2 and the general solution is y = Ae^x + Be^{−2x} + 3/2.

Method 2: Solve two successive first order linear equations


Instead of a constant, if the right hand side is some function, then the method of inspection to find yp is most likely going to fail. For example, let us consider the equation y″ + y′ − 2y = e^x. The complementary function is the same as in the previous problem. First we write the differential equation as,

(D − 1)(D + 2)y = e^x.

Now, let (D + 2)y = u, such that we get a first order linear differential equation,

(D − 1)u = e^x ⇒ u′ − u = e^x,


with P = −1 and Q = e^x. The solution is,

I = ∫ P dx = −x
ue^I = ∫ e^I Q dx + c = ∫ dx + c = x + c ⇒ u = xe^x + ce^x.

Thus, the first order linear differential equation for y is,

(D + 2)y = xe^x + ce^x ⇒ y′ + 2y = xe^x + ce^x,

where P = 2 and Q = xe^x + ce^x. The solution is,

I = ∫ P dx = 2x
ye^I = ∫ e^I Q dx + c1 = ∫ (xe^{3x} + ce^{3x}) dx + c1 = (1/3)xe^{3x} − (1/9)e^{3x} + (c/3)e^{3x} + c1
y = (1/3)xe^x − (1/9)e^x + (c/3)e^x + c1e^{−2x},

where the first two terms form yp and the last two form yc. I would like to draw attention to the fact that we have obtained yc from the arbitrary constants at every step. If we omit the arbitrary constants, we can quickly get the particular solution. Finally, we can beautify the final answer by writing c/3 − 1/9 = c2, such that the general solution is y = (1/3)xe^x + c2e^x + c1e^{−2x}.
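The result can be cross-checked with sympy (a sketch; sympy names the arbitrary constants C1, C2, so the answer may look superficially different):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# Solve y'' + y' - 2y = e^x symbolically
ode = sp.Eq(y(x).diff(x, 2) + y(x).diff(x) - 2 * y(x), sp.exp(x))
print(sp.dsolve(ode, y(x)))

# Verify that the particular solution y_p = (1/3) x e^x satisfies the equation
yp = sp.Rational(1, 3) * x * sp.exp(x)
residual = sp.simplify(yp.diff(x, 2) + yp.diff(x) - 2 * yp - sp.exp(x))
print(residual)   # 0
```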

7.3.4 Exercise
1. y″ − 4y = 10
Answer: y = Ae^{2x} + Be^{−2x} − 5/2

2. y″ + y′ − 2y = e^{2x}
Answer: y = Ae^x + Be^{−2x} + (1/4)e^{2x}

3. y″ + y = 2e^x
Answer: y = Ae^{ιx} + Be^{−ιx} + e^x

4. y″ − y′ − 2y = 3e^{2x}
Answer: y = Ae^{−x} + Be^{2x} + xe^{2x}

5. y″ + 2y′ + y = 2e^{−x}
Answer: y = (Ax + B + x²)e^{−x}

7.4 Coupled first order differential equations


I include this as an application of eigenvalues and eigenvectors. Let y1(t) and y2(t) both be functions of t, having first derivatives y1′ = dy1/dt and y2′ = dy2/dt. We


have to solve for y1 and y2 by solving two differential equations, given by

y1′ = ay1 + by2, (7.35)
y2′ = cy1 + dy2.

Note that, we can express the above equations in the matrix form as,

(y1′, y2′)^T = [a, b; c, d] (y1, y2)^T, (7.36)

where [a, b; c, d] denotes the matrix with rows (a, b) and (c, d), and T denotes the transpose.

The two column vectors y′ = (y1′, y2′)^T and y = (y1, y2)^T are related by the matrix A, such that y′ = Ay. Now, let us assume that y = Bx, such that y′ = ABx and we get,

B⁻¹y′ = B⁻¹ABx ⇒ x′ = B⁻¹ABx. (7.37)

If the matrix B is made of eigenvectors of A, then we know that B⁻¹AB is a diagonal matrix D, such that

x′ = Dx. (7.38)
 
It is easy to solve for the vector x = (x1, x2)^T from the above equation, because D is a diagonal matrix, made of the eigenvalues of A, say λ1 and λ2. Thus, we solve for

x1′ = λ1x1 ⇒ x1 = c1e^{λ1 t}, (7.39)
x2′ = λ2x2 ⇒ x2 = c2e^{λ2 t}.

Finally, we use y = Bx to get the solution for y1(t) and y2(t).
Let us solve for,

y1′ = 3y1 + 2y2, (7.40)
y2′ = 6y1 − y2.

The eigenvalues and eigenvectors for A = [3, 2; 6, −1] are,

λ1 = −3 with eigenvector (1, −3)^T, and λ2 = 5 with eigenvector (1, 1)^T. (7.41)
Let us construct the matrices from the eigenvalues and eigenvectors: D = [−3, 0; 0, 5] and B = [1, 1; −3, 1]. Now, using Eq. 7.38, we can write,

(x1′, x2′)^T = [−3, 0; 0, 5] (x1, x2)^T, (7.42)

and the solutions are x1 = c1e^{−3t} and x2 = c2e^{5t}. Now, we get the final solution from y = Bx ⇒ (y1, y2)^T = [1, 1; −3, 1] (x1, x2)^T,

y1(t) = c1e^{−3t} + c2e^{5t}, (7.43)
y2(t) = −3c1e^{−3t} + c2e^{5t}.
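The same steps can be carried out numerically with numpy (a sketch; note that `numpy.linalg.eig` may return the eigenvectors scaled and ordered differently from the hand calculation):

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [6.0, -1.0]])
evals, B = np.linalg.eig(A)      # columns of B are eigenvectors of A
print(np.sort(evals))            # eigenvalues -3 and 5

# B^{-1} A B should be the diagonal matrix D (Eq. 7.37)
D = np.linalg.inv(B) @ A @ B
print(np.round(D, 10))

# Check the final solution (7.43) by substituting it back: y' should equal A y
c1, c2, t = 1.0, 2.0, 0.3        # arbitrary constants and time
y = np.array([c1 * np.exp(-3 * t) + c2 * np.exp(5 * t),
              -3 * c1 * np.exp(-3 * t) + c2 * np.exp(5 * t)])
dy = np.array([-3 * c1 * np.exp(-3 * t) + 5 * c2 * np.exp(5 * t),
               9 * c1 * np.exp(-3 * t) + 5 * c2 * np.exp(5 * t)])
print(np.allclose(dy, A @ y))    # True
```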

7.4.1 Exercise
1. Solve y1′ = y1 + y2, y2′ = 4y1 + y2.
Answer: y1 = c1e^{3t} + c2e^{−t}, y2 = 2c1e^{3t} − 2c2e^{−t}

7.5 Converting higher order to 1st order equations


We can convert a linear differential equation of order n to n first order linear
differential equations and then use the above technique to get a solution. Let us
solve,
y 00 + a1 y 0 + a0 y = 0 ⇒ y 00 = −a0 y − a1 y 0 . (7.44)
We do the following change of variables,

x1 = y & x2 = y 0 , (7.45)
x01 0
= y = x2 & x02 =y . 00

Thus, we can write two coupled first order linear equation,

x01 = 0x1 + 1x2 , (7.46)


x02 = −a0 x1 − a1 x2 .

Thus, we have converted a 2nd order equation to two coupled 1st order equations, which we can solve following the method shown in the previous section. We can do it for even higher order equations, like a 3rd order equation,

y‴ + a2y″ + a1y′ + a0y = 0 ⇒ y‴ = −a0y − a1y′ − a2y″. (7.47)

Using the following substitution,

x1 = y, x2 = y′, x3 = y″, (7.48)

the above equation can be converted to three 1st order equations,

x1′ = 0x1 + 1x2 + 0x3, (7.49)
x2′ = 0x1 + 0x2 + 1x3,
x3′ = −a0x1 − a1x2 − a2x3.

Let us see an example, where we solve the following 2nd order linear equation using this method,

y″ + 5y′ − 6y = 0. (7.50)


Using the substitution x1 = y, x2 = y′, we get two 1st order equations to solve,

x1′ = 0 + x2, (7.51)
x2′ = 6x1 − 5x2.

The eigenvalues and eigenvectors for the matrix A = [0, 1; 6, −5] are,

λ1 = 1 with eigenvector (1, 1)^T, and λ2 = −6 with eigenvector (1, −6)^T. (7.52)

The D matrix is given by D = [1, 0; 0, −6] and the B matrix is given by B = [1, 1; 1, −6]. Using Eq. 7.38 we can write,

(z1′, z2′)^T = [1, 0; 0, −6] (z1, z2)^T, (7.53)

and the solutions are z1 = c1e^t and z2 = c2e^{−6t}. Thus, the solutions for x1 and x2 are,

(x1, x2)^T = [1, 1; 1, −6] (z1, z2)^T. (7.54)

Now, since y = x1, we write the final solution as, y = c1e^t + c2e^{−6t}.
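Numerically, the matrix built from the substitution (the so-called companion matrix) gives the exponents of the general solution directly (a sketch with numpy):

```python
import numpy as np

# y'' + 5y' - 6y = 0  =>  x1' = x2, x2' = 6 x1 - 5 x2
A = np.array([[0.0, 1.0],
              [6.0, -5.0]])
evals = np.linalg.eig(A)[0]
print(np.sort(evals))    # -6 and 1, so y = c1 e^t + c2 e^{-6t}
```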

7.5.1 Exercise
1. y″ + y′ − 2y = 0
Answer: y = c1e^t + c2e^{−2t}

Chapter 8

Fourier Series and Fourier Transform

8.1 Fourier series of functions of period 2π


Let us consider a periodic function f : R → R of period 2π. Such a function satisfies f(x + 2π) = f(x) ∀x ∈ R. We can expand f(x) in an infinite series in terms of cos nx and sin nx as,

f(x) = a0 + Σ_{n=1}^{∞} (an cos nx + bn sin nx). (8.1)

What is so special about cos nx and sin nx? Obviously, they also have a period of 2π, i.e., sin n(x + 2π) = sin nx and cos n(x + 2π) = cos nx. You may think that sin nx and cos nx have a shorter period (2π/n), but they still repeat every 2π, which makes them suitable for the purpose of expanding f(x).¹
The second advantage of expanding in terms of sin nx and cos nx is their orthogonality: for m, n ≥ 1,

(1/π) ∫_{−π}^{π} sin mx cos nx dx = 0, (8.2)
(1/π) ∫_{−π}^{π} sin mx sin nx dx = δmn,
(1/π) ∫_{−π}^{π} cos mx cos nx dx = δmn.

This is of great advantage when we try to find the coefficients an and bn. In order to find the coefficient an (bn), we have to multiply both sides of Eq. 8.1 by (1/π) cos mx ((1/π) sin mx) and integrate each term over a period −π to π. Because of the orthogonality, only one term is going to be non-zero, which gives us the
¹ We cannot expand f(x) in terms of only two terms, sin x and cos x, having a period exactly equal to 2π. We need many of them, like sin nx and cos nx with n = 1, 2, 3, ....


coefficients,

an = (1/π) ∫_{−π}^{π} f(x) cos nx dx, (8.3)
bn = (1/π) ∫_{−π}^{π} f(x) sin nx dx.

To get the constant a0, we just need to integrate each term of Eq. 8.1 from −π to π. Since ∫_{−π}^{π} sin nx dx = ∫_{−π}^{π} cos nx dx = 0,

a0 = (1/2π) ∫_{−π}^{π} f(x) dx. (8.4)

Continuity of f(x):
There are functions which are periodic, but piecewise continuous in the interval [−π, π].² We can still write the Fourier series for such a function. Say f(x) is continuous everywhere, except at x0. The Fourier series (Eq. 8.1) converges to f(x) everywhere except at x0. At the point of discontinuity, the series converges to the average of the left and right hand limits of f at x0, i.e., [f(x0⁺) + f(x0⁻)]/2.

8.1.1 Exercise
1. Can we use sin nx and cos nx in Fourier series to expand a function of period
2π if n is non-integer?

2. Give examples of some non-periodic functions.

3. Give examples of some functions which are periodic, but not defined for all
x ∈ R.

8.2 Fourier series of functions of period 2L


Instead of a function of period 2π, you are given a function of period 2L, i.e., f(x + 2L) = f(x) ∀x ∈ R. How does one write the Fourier series expansion? All we need to do is to redefine the range from [−π, π] ⇒ [−L, L] and the rest of the method remains the same as before. One should not get confused by the phrase "redefine the range". We are not talking about the range of the function, as it is defined over the entire R. However, it is sufficient to specify the function only within [−π, π] or [−L, L], as it is repeated over the rest of the space because of periodicity.
One can easily switch between [−π, π] and [−L, L] by a suitable change of variable. Let us define a variable z = πx/L, such that −π ≤ z ≤ π as −L ≤ x ≤ L. Since sin z and cos z have a period of 2π, sin(πx/L) and cos(πx/L) have a period of 2L. One can easily verify that,

sin[(nπ/L)(x + 2L)] = sin(nπx/L + 2nπ) = sin(nπx/L). (8.5)
² Discontinuous at some points; for example, f(x) = tan x is piecewise continuous.


Thus, the Fourier series expansion is,

f(x) = a0 + Σ_{n=1}^{∞} [an cos(nπx/L) + bn sin(nπx/L)], (8.6)

and the coefficients are,

an = (1/L) ∫_{−L}^{L} f(x) cos(nπx/L) dx, (8.7)
bn = (1/L) ∫_{−L}^{L} f(x) sin(nπx/L) dx,
a0 = (1/2L) ∫_{−L}^{L} f(x) dx.

Example
Find the Fourier series expansion of,

f(x) = 0 for −2 < x < −1,
f(x) = 2 for −1 < x < 1, (8.8)
f(x) = 0 for 1 < x < 2.

In this case, 2L = 4 and the coefficients are,

a0 = (1/4) ∫_{−2}^{2} f(x) dx = (1/2) ∫_{−1}^{1} dx = 1, (8.9)
an = (1/2) ∫_{−2}^{2} f(x) cos(nπx/2) dx = ∫_{−1}^{1} cos(nπx/2) dx = (4/(nπ)) sin(nπ/2), (8.10)
bn = (1/2) ∫_{−2}^{2} f(x) sin(nπx/2) dx = ∫_{−1}^{1} sin(nπx/2) dx = 0. (8.11)

The second expression can further be expressed as,

an = 0 for n even,
an = 4/(nπ) for n = 1, 5, ..., (8.12)
an = −4/(nπ) for n = 3, 7, ....

Thus, the Fourier series expansion of f(x) is,

f(x) = 1 + (4/π)[cos(πx/2) − (1/3) cos(3πx/2) + (1/5) cos(5πx/2) − ···]. (8.13)

As discussed previously, since f (x) is discontinuous at x = ±1, the Fourier series


converges to 1 at those points.
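The behaviour of the series in Eq. 8.13, including the value 1 at the discontinuities x = ±1, can be checked by summing it numerically (a sketch; the cutoff of 999 terms is an arbitrary choice):

```python
import numpy as np

def partial_sum(x, nmax):
    # Partial sum of Eq. 8.13; only odd n contribute, since sin(n pi / 2) = 0 for even n
    s = np.ones_like(x)
    for n in range(1, nmax + 1, 2):
        a_n = (4 / (n * np.pi)) * np.sin(n * np.pi / 2)
        s += a_n * np.cos(n * np.pi * x / 2)
    return s

x = np.array([0.0, 1.0, 1.5])    # interior point, discontinuity, and a point where f = 0
ps = partial_sum(x, 999)
print(ps)                        # close to [2, 1, 0]
```

Plotting the partial sums for increasing nmax also shows the characteristic overshoot (Gibbs phenomenon) near x = ±1.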

8.2.1 Exercise
1. If f (x) has a period of w, prove that f (x + nw) = f (x), where n is an integer.


8.3 Fourier cosine/sine series


8.3.1 Fourier series expansion of even functions
In case of an even function, f(−x) = f(x) and the Fourier series expansion is given by,

f(x) = a0 + Σ_{n=1}^{∞} an cos(nπx/L). (8.14)

Since f(x) is an even function, bn = 0 ∀n ∈ N and,

an = (2/L) ∫_0^L f(x) cos(nπx/L) dx, (8.15)
a0 = (1/L) ∫_0^L f(x) dx.

8.3.2 Fourier series expansion of odd functions

In case of an odd function, f(−x) = −f(x) and the Fourier series expansion is given by,

f(x) = Σ_{n=1}^{∞} bn sin(nπx/L). (8.16)

Since f(x) is an odd function, an = 0 ∀n ∈ N and,

bn = (2/L) ∫_0^L f(x) sin(nπx/L) dx. (8.17)

8.3.3 Exercise
1. Derive the coefficients of Fourier cosine and sine series.

8.4 Half range expansion


If we have a function f (x) defined on [0, L],3 how can we write the Fourier series
expansion? We have to extend the function to make it 2L periodic. There are two
ways of extending f (x).

Case 1: even extension

f̃(x) = f(x) for 0 < x < L,
f̃(x) = f(−x) for −L < x < 0. (8.18)

Since f̃(x) is an even function, we can write a Fourier cosine expansion,

f̃(x) = ã0 + Σ_{n=1}^{∞} ãn cos(nπx/L), (8.19)
3
Unlike the previous cases, where f (x) was defined over entire R.


where

ãn = (2/L) ∫_0^L f̃(x) cos(nπx/L) dx = (2/L) ∫_0^L f(x) cos(nπx/L) dx, (8.20)
ã0 = (1/L) ∫_0^L f̃(x) dx = (1/L) ∫_0^L f(x) dx.

Case 2: odd extension

f̃(x) = f(x) for 0 < x < L,
f̃(x) = −f(−x) for −L < x < 0. (8.21)

Since f̃(x) is an odd function, we can write a Fourier sine expansion,

f̃(x) = Σ_{n=1}^{∞} b̃n sin(nπx/L), (8.22)

where

b̃n = (2/L) ∫_0^L f̃(x) sin(nπx/L) dx = (2/L) ∫_0^L f(x) sin(nπx/L) dx. (8.23)

8.4.1 Exercise

Chapter 9

Series Solutions of Ordinary Differential Equations

9.1 Legendre equation

(1 − x²)y″ − 2xy′ + n(n + 1)y = 0. (9.1)

9.2 Bessel equation

x²y″ + xy′ + (x² − n²)y = 0. (9.2)
Sometimes, it is preferred to have an additional parameter k in the Bessel equation. Substituting x = kz in the above equation, (dy/dx) = (dy/dz)(dz/dx) = (1/k)(dy/dz) and (d²y/dx²) = (1/k²)(d²y/dz²). Substituting in the above equation, we get the Bessel equation with z as the independent variable,

z²y″ + zy′ + (k²z² − n²)y = 0. (9.3)

9.3 Sturm-Liouville boundary value problems (SL-BVP)


The equation
Legendre, Bessel and other important ODEs can be expressed as a Sturm-Liouville
equation,
[p(x)y′]′ + [q(x) + λr(x)]y = 0, (9.4)

where p(x), p′(x), q(x), r(x) are real-valued and continuous in the interval a ≤ x ≤ b, and r(x) > 0 throughout the interval. The Sturm-Liouville boundary conditions are given by,

k1y(a) + k2y′(a) = 0, (9.5)
l1y(b) + l2y′(b) = 0,


where k1, k2 are constants, at least one of them non-zero, and so are l1, l2. One can also have a periodic boundary condition given by,

y(a) = y(b), (9.6)
y′(a) = y′(b).

The solutions
The solutions of Eq. 9.4 are called the eigenfunctions. We have to find the eigen-
function y(x) corresponding to an eigenvalue λ. Several eigenvalues and eigen-
functions are possible for a given problem. If all the conditions mentioned above
are satisfied, then all the eigenvalues are real.1

Theorem on orthogonality of eigenfunctions


If ym (x) and yn (x) are two eigenfunctions of Eq. 9.4, corresponding to two different
eigenvalues λm and λn , then ym (x) and yn (x) are orthogonal with respect to the
weight function r(x), such that,
∫_a^b r(x)ym(x)yn(x) dx = c²δmn, (9.7)

where c is a constant, δmn = 0 for m ≠ n and δmn = 1 for m = n. Let us prove the orthogonality of the eigenfunctions. Since ym and yn satisfy the Sturm-Liouville equation, we can write,

[pym′]′ + qym = −λm r ym, (9.8)
[pyn′]′ + qyn = −λn r yn.

Multiplying the first and second equation with yn and ym, respectively, and subtracting,

yn[pym′]′ − ym[pyn′]′ = (λn − λm) r ym yn. (9.9)

Adding and subtracting pyn′ym′ to the left-hand side of the above equation, we can express the above equation in a more compact form as,

[(pym′)yn − (pyn′)ym]′ = (λn − λm) r ym yn. (9.10)
[(pym )yn − (pyn0 )ym ]0 = (λn − λm )rym yn . (9.10)

Since p, p′ and r are continuous in the interval a ≤ x ≤ b, we can integrate both sides of the equation,

[p(ym′yn − yn′ym)]_a^b = (λn − λm) ∫_a^b r ym yn dx. (9.11)

Eigenfunctions are orthogonal if the left-hand side is equal to zero. Do you notice that q(x) got eliminated, and p(x), as well as the boundary conditions [Eq. 9.5],
1
Eigenvalues generally correspond to physical quantities like energies and frequencies, which
are real.


determine the orthogonality. From the above equation, we get,

p(b)[ym′(b)yn(b) − yn′(b)ym(b)] − p(a)[ym′(a)yn(a) − yn′(a)ym(a)]. (9.12)

The above expression must be equal to zero for the eigenfunctions to be orthogonal to each other. Let us survey all possible scenarios and satisfy ourselves that Eq. 9.12 is zero in each of them.
• If p(a) = p(b) = 0, Eq. 9.12 is zero and we do not need boundary conditions
given in Eq. 9.5. Such a problem is known as singular, as opposed to a
regular problem, where we must use boundary conditions given in Eq. 9.5.
• If p(a) = 0 and p(b) ≠ 0, then the second term in Eq. 9.12 is zero. We do not need the first boundary condition in Eq. 9.5 and the second boundary condition gives us,

l1ym(b) + l2ym′(b) = 0, (9.13)
l1yn(b) + l2yn′(b) = 0.

– If l1 = 0 and l2 ≠ 0,² we must have y′(b) = 0 for Eq. 9.12 to be zero.
– If l1 ≠ 0 and l2 = 0, we must have y(b) = 0 for Eq. 9.12 to be zero.
– If both l1 and l2 are non-zero, Eq. 9.13 is a homogeneous system with the non-trivial solution (l1, l2), which requires ym(b)yn′(b) − ym′(b)yn(b) = 0, so Eq. 9.12 is zero.
• If p(a) ≠ 0 and p(b) = 0, the proof is very similar. In this case, we have to use the first boundary condition of Eq. 9.5 and do not need the second one.

k1ym(a) + k2ym′(a) = 0, (9.14)
k1yn(a) + k2yn′(a) = 0.

– If k1 = 0 and k2 ≠ 0, we must have y′(a) = 0 for Eq. 9.12 to be zero.
– If k1 ≠ 0 and k2 = 0, we must have y(a) = 0 for Eq. 9.12 to be zero.
– If both k1 and k2 are non-zero, Eq. 9.14 is a homogeneous system with the non-trivial solution (k1, k2), which requires ym(a)yn′(a) − ym′(a)yn(a) = 0, so Eq. 9.12 is zero.
• If p(a) ≠ 0 and p(b) ≠ 0, we have to use both the boundary conditions of Eq. 9.5.
• If p(a) = p(b) ≠ 0, Eq. 9.12 becomes,

p(b)[ym′(b)yn(b) − yn′(b)ym(b) − ym′(a)yn(a) + yn′(a)ym(a)]. (9.15)

The above expression is zero if y(a) = y(b) and y′(a) = y′(b), which is the same as the periodic boundary condition given in Eq. 9.6.
Using Eq. 9.7, we can also define the norm of yn as,

||yn|| = √(∫_a^b r(x)yn²(x) dx) = c. (9.16)
² Both of them cannot be zero by definition.


A set of eigenfunctions y1, y2, y3, ··· is called orthonormal if they are orthogonal and c = 1 (i.e., norm equal to 1).

Example: Legendre equation


We can write the Legendre equation [Eq. 9.1] in the form of,

[(1 − x²)y′]′ + λy = 0, (9.17)

such that p = (1 − x²), q = 0, r = 1, λ = n(n + 1). Since p(1) = p(−1) = 0, we have a singular SL problem on the interval −1 ≤ x ≤ 1 and do not need boundary conditions. The eigenvalues are λ = 0 × 1, 1 × 2, 2 × 3, ···, since n = 0, 1, 2, ···. Since the Legendre polynomials [Pn(x)] are solutions of the problem, we have the orthogonality relation,

∫_{−1}^{1} Pm(x)Pn(x) dx = 0 for m ≠ n, (9.18)

as r = 1.
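Relation 9.18 is easy to verify numerically; the sketch below uses numpy's Legendre module and Gauss-Legendre quadrature (which is exact for the polynomial integrands involved here), with m = 2, n = 3 as an arbitrary test pair:

```python
import numpy as np
from numpy.polynomial import legendre as leg

xq, wq = np.polynomial.legendre.leggauss(20)   # 20 nodes: exact up to degree 39

def P(n, x):
    # Legendre polynomial P_n via its coefficient vector (1 in position n)
    return leg.legval(x, [0] * n + [1])

I_23 = np.sum(wq * P(2, xq) * P(3, xq))   # m != n: vanishes
I_33 = np.sum(wq * P(3, xq) * P(3, xq))   # m == n: equals 2/(2n+1) = 2/7
print(I_23, I_33)
```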

Example: Bessel equation


Multiplying each term of Eq. 9.3 by 1/z, we can express it in the form of an SL equation as,

[zy′]′ + (−n²/z + λz)y = 0, (9.19)

in the interval 0 ≤ z ≤ R, with p = z, q = −n²/z, r = z and λ = k². Hence, at z = 0 we have p(0) = 0 and we do not need the first of the boundary conditions given in Eq. 9.5. Using the second of the boundary conditions (with l1 ≠ 0 and l2 = 0), for orthogonality of the eigenfunctions (Bessel functions) we must have,

Jn(kR) = 0. (9.20)

For a given value of n, Jn has infinitely many zeros, located at the values a_{n,i}, such that,

k_{n,i} = a_{n,i}/R. (9.21)

Thus, the Bessel functions Jn(k_{n,1}x), Jn(k_{n,2}x), Jn(k_{n,3}x), ··· form an orthogonal set on the interval 0 ≤ x ≤ R with respect to the weight function r(x) = x,

∫_0^R x Jn(k_{n,i}x) Jn(k_{n,j}x) dx = c²δij. (9.22)

9.3.1 Exercise
1. How many boundary conditions do you need for Legendre equation? Specify
the boundary conditions.

2. How many boundary conditions do you need for Bessel equation? Specify
the boundary conditions.


3. You have to solve y″ + λy = 0 in the range a ≤ x ≤ b. Suggest a suitable boundary condition.

4. Express the Chebyshev equation,

(1 − x²)y″ − xy′ + n²y = 0, (9.23)

in the form of an SL equation. How many boundary conditions do you need?

Chapter 10

Solution of Selected Partial Differential Equations

In this chapter, we will learn some of the most important problems in science and
engineering, represented in the form of partial differential equations (PDEs). Un-
like ordinary differential equations (ODEs), PDEs involve multivariable functions.
For example, we have to deal with functions in higher spatial dimensions (two or
three dimensions, involving two or three spatial variables) or functions of both
space and time.

10.1 Boundary and initial conditions


Instead of the differential equations, let me start the discussion with boundary
conditions! There is a good reason for doing this. As we progress, it will become
apparent to you that getting the general solutions is relatively easy. In fact, you
can get all possible general solutions listed in different coordinate systems, in
textbooks, or internet. But, then, you may wonder what we are supposed to do?
Well, what makes every problem unique are the boundary conditions. As shown
in Figure 10.1, we generally need to solve a differential equation within a domain.
Inside the domain, the solution of the differential equation determines the value
of the unknown function f . Along the boundaries, values of f are given as the
boundary conditions. Obviously, f must vary smoothly from the interior to the
boundary of a domain.
Note that how many boundary conditions we need for a problem depends on
the dimensionality and shape of the domain. For example, we need two boundary

Boundary condition    Form
Dirichlet or fixed    f(0) = a, f(l) = b
Neumann               f′(0) = a, f′(l) = b
Robin or mixed        f(0) = a, f′(l) = b
Periodic              f(0) = f(l)

Table 10.1: List of boundary conditions for a one-dimensional domain, extending from 0 to l.


Figure 10.1: Boundary conditions in one (left) and two (right) dimensions. Within the domain, the unknown function f is determined by the differential equation (Laplace's equation in this case, with ∆ = d²/dx² in 1D and ∆ = ∂²/∂x² + ∂²/∂y² in 2D, respectively). Along the boundaries, values of f are given by the boundary conditions. Obviously, the function f must vary smoothly as we move from the interior to the boundary of a domain.

conditions at the two ends of the domain for a one-dimensional problem. On


the other hand, we require four boundary conditions for a square-shaped two-
dimensional domain. Several boundary conditions are listed in Table 10.1 for a
one-dimensional domain. The boundary condition may involve the value of the
function (Dirichlet), the value of the derivative of the function (Neumann), or both
(Robin), at the domain boundary.
Let us try to develop some intuition about what the boundary conditions mean.
Let’s assume that we are interested in temperature distribution within a one-
dimensional bar. Dirichlet boundary condition implies that the two ends are sub-
merged in some thermal reservoir to hold them at a constant temperature. On
the other hand, the Neumann boundary condition means maintaining some tem-
perature gradient at the boundaries, which can be done by appropriate heating
or cooling at the ends. Robin boundary condition is a combination of Dirichlet
and Neumann and can take several forms, like Dirichlet at one end Neumann
at the other (see Table 10.1). Another example of Robin boundary condition is
f (0) − af 0 (0) = 0 and f (l) − bf 0 (l) = 0. I have also listed another boundary condition
frequently used in science and engineering, known as the periodic boundary con-
dition. Imagine that I take a one-dimensional wire and bend to give it a circular
shape, such that the two ends x = 0 and x = l join with each other. In that case,
I need a periodic boundary condition. More generally, such a boundary condition
is used when the domain is infinite; however, it has a regular pattern, such that
a small part of the domain can be repeated to generate the entire space.
In general, we can have functions of both spatial and temporal variables. In
such cases, we need to have boundary conditions for the spatial part and initial
conditions for the temporal part. Initial conditions generally occur in the form of
f (t = 0) = a, if the time derivative is of the first order. Obviously, for a second-
order time derivative, we need two initial conditions.


10.2 Classification of PDEs


Let us start with simple examples. The order of the highest partial derivative in a PDE is known as its order. The following is an example of a second-order PDE,

∂f/∂y + ∂²f/∂x² = 0. (10.1)

10.2.1 Linear PDE


The above equation is also linear because f and its derivatives appear linearly. How about the following equation?

∂f/∂y + (1 + x²) ∂²f/∂x² = 0. (10.2)

This is an example of a variable coefficient second-order linear PDE, because f and its derivatives appear linearly and are multiplied by functions of the independent variables (x in this case). Instead, if we take an equation of the form

∂f/∂y + k ∂²f/∂x² = 0, (10.3)

where k is a constant, this is an example of a constant coefficient second-order linear PDE. Note that all the examples shown so far are homogeneous PDEs, as they contain only the dependent variable f or its partial derivatives. If we have an equation of the form,

∂f/∂y + k ∂²f/∂x² = u, (10.4)

where u is some function like u(x, y) or u(x) or u(y) or u = constant (non-zero), then it is called a non-homogeneous PDE.

10.2.2 Non-linear PDE


If there is a non-linear term involving f and its derivatives, then we call it a non-linear PDE. For example,

(∂f/∂y)² + ∂²f/∂x² = 0, (10.5)

is a non-linear PDE, because the first order term is quadratic. There are several sub-classes of non-linear PDEs. A detailed discussion is given in the appendix.

10.2.3 Solutions of linear PDEs: principle of superposition


In this chapter we are going to deal with only linear PDEs. Let us learn about an
important principle, obeyed by the solutions of the linear PDEs. For example, if
we are dealing with a linear equation of order 2, and if f1 and f2 are two solutions,
then by principle of superposition, c1 f1 + c2 f2 will also be a solution for any choice
of constants c1 and c2 . We will use this principle so many times throughout the
chapter that you will realize how profound it is.


Equation        Form                               General solution
Laplace         ∇²f = 0                            X = {sin kx, cos kx}, Y = {e^{ky}, e^{−ky}}
Diffusion
or Heat         ∇²f = (1/α²) ∂f/∂t                 X = {sin kx, cos kx}, T = e^{−k²α²t}
Wave            ∇²f = (1/v²) ∂²f/∂t²               X = {sin kx, cos kx}, T = {sin ωt, cos ωt}
Schrödinger     −(ℏ²/2m)∇²f + V f = ιℏ ∂f/∂t
Helmholtz       ∇²f = −k²f
Poisson         ∇²f = g

Table 10.2: List of important partial differential equations: ∇ is the gradient operator and f(x, y, z) or f(r, θ, φ) is a function of multiple variables in general, depending on whether we are using the Cartesian or spherical coordinate system or some other coordinate system. In the Poisson equation, g is a given source function. In case of the Laplace, Poisson and Helmholtz equations, f(x, y) = X(x)Y(y) are solutions for ∇² = ∂²/∂x² + ∂²/∂y². In case of the diffusion, Schrödinger and wave equations, f(x, t) = X(x)T(t) are solutions for ∇² = d²/dx².

10.2.4 Further classification of second-order linear PDEs

We are mainly going to deal with second-order PDEs in this chapter, with a general form of,

a(x, y) ∂²f/∂x² + b(x, y) ∂²f/∂x∂y + c(x, y) ∂²f/∂y² + d(x, y) ∂f/∂x + e(x, y) ∂f/∂y + g(x, y)f = h(x, y). (10.6)

Let us club the lower order terms together, and rewrite the above equation as,

a(x, y) ∂²f/∂x² + b(x, y) ∂²f/∂x∂y + c(x, y) ∂²f/∂y² = L(∂f/∂x, ∂f/∂y, f, x, y), (10.7)

because the classification is based on the coefficients of the second order terms only. The PDE is classified as parabolic if b² − 4ac = 0, hyperbolic if b² − 4ac > 0 and elliptic if b² − 4ac < 0.
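The discriminant test can be wrapped in a small helper function (a sketch; constant coefficients are assumed, so it checks the sign of b² − 4ac once):

```python
def classify(a, b, c):
    """Classify a f_xx + b f_xy + c f_yy = L(f_x, f_y, f, x, y)."""
    disc = b**2 - 4 * a * c
    if disc == 0:
        return "parabolic"
    return "hyperbolic" if disc > 0 else "elliptic"

print(classify(1, 0, 1))    # Laplace's equation: elliptic
print(classify(1, 0, 0))    # diffusion equation (no f_tt term): parabolic
print(classify(1, 0, -1))   # wave equation: hyperbolic
```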

10.2.5 Exercise
1. Identify whether the following equations are parabolic, hyperbolic or elliptic.

(a) ∂²f/∂x² + 4 ∂²f/∂x∂y + ∂²f/∂y² + ∂f/∂x = 0.
(b) ∂²f/∂x² + 2 ∂²f/∂x∂y + ∂²f/∂y² + ∂f/∂y = 0.
(c) ∂²f/∂x² + ∂²f/∂x∂y + ∂²f/∂y² = 2.
(d) Prove that the class of Eq. 10.6 does not change under a change of variables from (x, y) → (χ(x, y), η(x, y)).


Figure 10.2: Harmonic functions obey the mean-value property – the average value of the function at the boundary is equal to its value at the center. Examples are shown for (a) a one-dimensional domain, and (b), (c) two-dimensional domains. In case of (b), f = 10 and f = 0 along the top and bottom parts of the perimeter, respectively, such that f = 5 along the line lying in the middle. What would be the value of the function at the center in case of (c), where the function is zero everywhere, except at a single point at the boundary?

10.3 Laplace’s equation


Laplace’s equation is a second-order partial differential equation named after
French scientist Pierre-Simon Laplace,

∇2 f = 0. (10.8)

Laplace’s equation arises in different context, like the study of heat flow, grav-
ity, electrostatics etc. A solution of Laplace’s equation is known as a harmonic
function. It has specific properties, which will become clear as we progress.

10.3.1 Solution in one dimension


Let us start with a one-dimensional equation. The solution of Laplace’s equation
in one dimension is very straightforward,

\[ \frac{d^2 f}{dx^2} = 0 \;\Rightarrow\; f(x) = mx + c. \quad (10.9) \]

I would like to mention two interesting features of the solution of Laplace's equation.

• Straight lines are monotonically increasing (or decreasing), having no maximum (or minimum). This is not surprising, as a second derivative equal to zero implies that the function can not have any maximum or minimum point anywhere within the domain. We will see that this holds good for solutions of Laplace's equation in higher dimensions as well.


n    Real (Re) part       Imaginary (Im) part
0    1                    0
1    x                    y
2    x² − y²              2xy
3    x³ − 3xy²            3x²y − y³
4    x⁴ − 6x²y² + y⁴      4x³y − 4xy³

Table 10.3: Real and imaginary parts of the function f(x, y) = (x + ιy)ⁿ. Interestingly, all of them are harmonic functions.

[Figure 10.3: three wireframe surface plots of z over the (x, y) plane, panels (a)–(c).]

Figure 10.3: Plot of two harmonic functions, (a) x² − y² and (b) 2xy. They do not
have any maximum or minimum, but only a saddle point at (0, 0). The third one,
(c) x² + y², is not a harmonic function and it has a minimum at (0, 0). We can easily
verify that x² + y² does not satisfy Laplace's equation.

• The value of the function (a straight line) at any point x is the average of its values at x − a and x + a,

\[ f(x) = \frac{1}{2}\left[f(x-a) + f(x+a)\right]. \quad (10.10) \]
Thus, if we know the values at the boundaries, we can find the value at
the midpoint of the domain (see Figure 10.2). You may think that what
I just mentioned is very trivial. You will be amazed to know that the same
thing holds good if the domain is two-dimensional, say a circle, as shown
in Figure 10.2. For example, if we have the temperature at +10 and −10 at
the circle's upper and lower boundary, at the center it will be zero. What
we discussed just now is known as the mean value property, which every
harmonic function satisfies. Since any function satisfying Laplace's equation
is a harmonic function, all of them follow the mean value property.
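The mean value property is easy to test numerically: average a harmonic function over a circle and compare with its value at the centre. The function, centre and radius below are arbitrary choices of mine:

```python
import numpy as np

# Average the harmonic function f(x, y) = x**2 - y**2 over a circle and
# compare with its value at the centre (the mean value property).
def f(x, y):
    return x**2 - y**2

theta = np.linspace(0.0, 2.0*np.pi, 100000, endpoint=False)
xc, yc, r = 1.5, -0.7, 2.0   # arbitrary centre and radius
avg = np.mean(f(xc + r*np.cos(theta), yc + r*np.sin(theta)))
print(avg, f(xc, yc))        # the two numbers agree
```

Repeating the experiment with x² + y², which is not harmonic, breaks the agreement.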

10.3.2 Solution in two dimensions


In two dimensions and rectangular coordinates, Laplace's equation reads,

\[ \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0. \quad (10.11) \]


This is an elliptic, homogeneous, and linear equation. We have to find harmonic
functions of up to two variables, e.g., 1, x, y, (x² − y²), 2xy, etc. There is a nice
pattern, and I leave it as an exercise to verify that the real and imaginary parts of
the function f(x, y) = (x + ιy)ⁿ satisfy Laplace's equation for any value of n
(see Table 10.3). I have plotted a few harmonic functions in Figure 10.3. Clearly,
they do not have any maximum or minimum, but only a saddle point. You can
use the following Python code to plot harmonic functions and get a feel for them.

from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return x**2 - y**2

x = np.linspace(-4, 4, 30)
y = np.linspace(-4, 4, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
fig = plt.figure()
ax = plt.axes(projection="3d")
ax.plot_wireframe(X, Y, Z, color='green')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()

The general solution is a linear combination of an infinite number of harmonic functions,

\[ f(x,y) = \sum_{n=0}^{\infty} a_n\,\mathrm{Re}(x+\iota y)^n + \sum_{n=0}^{\infty} b_n\,\mathrm{Im}(x+\iota y)^n = a_0 + a_1 x + b_1 y + a_2(x^2-y^2) + b_2(2xy) + \cdots. \quad (10.12) \]

Thus we are looking for an infinite number of solutions, not a few of them. Remember that we can combine the individual solutions in this fashion because we are dealing with a linear equation.
We can now explore polar coordinates by considering the two-dimensional
domain to be circular. We have to substitute x = r cos θ and y = r sin θ. Then
(x + ιy)ⁿ = rⁿe^{ιnθ}, and the real and imaginary parts are rⁿ cos nθ and rⁿ sin nθ, respectively.
In this case, the general solution is,

\[ f(r,\theta) = \sum_{n=0}^{\infty} a_n r^n\cos n\theta + \sum_{n=0}^{\infty} b_n r^n\sin n\theta = \sum_{n=0}^{\infty} r^n(a_n\cos n\theta + b_n\sin n\theta). \quad (10.13) \]

This is known as the Fourier series solution of Laplace’s equation.


In section 10.3.3, we will learn about the separation of variables method, which is
generally used for solving Laplace's equation in two dimensions. Writing Laplace's
equation in polar coordinates and applying the separation of variables method, one
can also get the Fourier series as the solution.


[Figure 10.4: sketch of a semi-infinite bar along x with width w along y; the left edge is labelled T = 200 and the remaining sides T = 0.]

Figure 10.4: A bar, having finite width in the y−direction and semi-infinite in the
x−direction. One side (along the y−axis) is held at 200° and the long sides (along
the x−axis) are held at 0°. The far end is also held at 0°. What would be the
temperature distribution within the bar?

10.3.3 Solving by separating variables


Rectangular or square domain
We generally solve Laplace's equation by separating the variables. We write f(x, y) =
X(x)Y(y), and the equation converts to,

\[ \frac{1}{X}\frac{d^2 X}{dx^2} = -\frac{1}{Y}\frac{d^2 Y}{dy^2}. \quad (10.14) \]

Now, the left-hand side is a function of x alone, and the right-hand side is a function
of y alone; they can be equal only if both are equal to some constant. This is the principle
behind the method of separation of variables. Finally, we have to solve two
eigenvalue problems,

\[ \frac{d^2 X}{dx^2} = -k^2 X \quad \& \quad \frac{d^2 Y}{dy^2} = k^2 Y, \quad (10.15) \]

where the constant k² is known as the separation constant. Possible solutions of
the first equation are X(x) = sin kx or cos kx, or a linear combination of them.
Similarly, possible solutions of the second equation are Y(y) = e^{ky} or e^{−ky}, or a
linear combination of them. We can write a general solution of the form f(x, y) =
(A sin kx + B cos kx)(Ce^{ky} + De^{−ky}) and feel very happy that we have solved the
problem! However, our work starts after we have written down the general solution.
We have to match the boundary conditions and determine k for a specific
problem. Thus, our actual job is to find the values of k from a set of boundary
conditions, rather than finding the general solution (which remains the same as
long as it is Laplace's equation). Let us solve a problem to understand what I
mean.

Example: temperature distribution in a semi-infinite bar


Consider the problem depicted in Figure 10.4. The governing differential equation
is ∇2 T = 0, where T (x, y) is the temperature distribution within the bar. Looking
at the solutions of the Laplace equation, given in Table 10.2, we face a problem;


none of the solutions can satisfy the boundary conditions. For example, in the
x direction T → 0 as x → ∞. However, neither sin kx, nor cos kx can satisfy this
condition. Note that, in Equation 10.15 the negative sign was arbitrarily assigned
to one of the equations. If we reverse our choice, the solutions are,
\[ Y = \begin{Bmatrix}\sin ky\\ \cos ky\end{Bmatrix}, \qquad X = \begin{Bmatrix}e^{kx}\\ e^{-kx}\end{Bmatrix}. \quad (10.16) \]

For the given problem, this makes sense because T → 0 as x → ∞ if we take
the temperature distribution along the x−axis to be X(x) ∼ e^{−kx}. Moreover, we
can not use X(x) ∼ e^{kx}, as the temperature blows up at x = ∞. In the case of
Y(y), we can not choose cos ky, because the temperature must be zero at y = 0.
Thus the solution looks like T(x, y) ∼ e^{−kx} sin ky. Hence, you must be careful while
solving, because getting the general solution (which is relatively trivial) is not all.
You must ensure that the solution satisfies all the boundary conditions. These
are known as boundary value problems, and one ought to pay more attention
to the boundary conditions than to the differential equation itself. Let us finish the
problem by applying the boundary conditions T = 0 at y = 0 and y = w. The
second condition implies sin(kw) = 0 = sin(nπ) ⇒ kₙ = nπ/w, where n = 1, 2, 3, ···.
Thus, for any integer n, the solution is

\[ T_n(x,y) = e^{-n\pi x/w}\sin\frac{n\pi y}{w}. \quad (10.17) \]
But we also need T = 200 at x = 0, which is impossible for any single value of
n. How do we achieve this? First, note that any linear combination of Tₙ is also a
solution, and we can write,

\[ T(x,y) = \sum_{n=1}^{\infty} a_n e^{-n\pi x/w}\sin\frac{n\pi y}{w}. \quad (10.18) \]

Setting x = 0, we can write,

\[ T = 200 = \sum_{n=1}^{\infty} a_n\sin\frac{n\pi y}{w}, \quad (10.19) \]

which is nothing but a Fourier sine series for f(y) = 200. We can find the coefficients (say for w = 10),

\[ a_n = \frac{2}{w}\int_0^w f(y)\sin\frac{n\pi y}{w}\,dy = \begin{cases}\dfrac{800}{n\pi}, & \text{odd } n,\\[4pt] 0, & \text{even } n.\end{cases} \quad (10.20) \]
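The coefficients in Eq. 10.20 can be cross-checked by carrying out the integral numerically; the sketch below uses trapezoidal quadrature with w = 10:

```python
import numpy as np

# Numerical check of the Fourier sine coefficients of f(y) = 200 on [0, w]:
# a_n = (2/w) * integral of 200*sin(n*pi*y/w) dy (Eq. 10.20).
w = 10.0
y = np.linspace(0.0, w, 100001)

def a_n(n):
    return (2.0/w) * np.trapz(200.0*np.sin(n*np.pi*y/w), y)

print(a_n(1), 800/np.pi)   # odd n: close to 800/(n*pi)
print(a_n(2))              # even n: close to zero
```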

Thus, the temperature distribution in the bar can be written as,

\[ T(x,y) = \frac{800}{\pi}\left[e^{-\pi x/10}\sin\frac{\pi y}{10} + \frac{1}{3}e^{-3\pi x/10}\sin\frac{3\pi y}{10} + \cdots\right]. \quad (10.21) \]

Practically, it is not possible to take an infinite number of terms. So it is natural
to ask: how many do we need? One can guess that, if x is not too small (such that
the term πx/10 is large enough), convergence in the x−direction will be rapid because


[Figure 10.5: three wireframe surface plots of the temperature over the 10 × 10 domain, panels (a)–(c).]
Figure 10.5: Eq. 10.21 plotted for n ranging from (a) 1 − 3, (b) 1 − 29, and (c) 1 − 299.

of the exponential term. However, in the y−direction, the boundary condition requires
a constant temperature along the edge. This requires many sine functions;
just a few of them are not sufficient. One can verify this with the following Python
code.

from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    s = 0.0
    for n in range(1, 300, 2):
        s += np.exp(-n*np.pi*x/10.0) * np.sin(n*np.pi*y/10) / n
    return 800 * s / np.pi

x = np.linspace(0, 10, 30)
y = np.linspace(0, 10, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
fig = plt.figure()
ax = plt.axes(projection="3d")
ax.plot_wireframe(X, Y, Z, color='green')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()

Calculated temperature profiles, obtained by varying the range of n, are shown
in Figure 10.5. Clearly, to match the boundary condition along the y-axis,
we need to take a very large value of n.

Example: temperature distribution in a square plate


Now we take a square plate (10 × 10), and use the boundary conditions T = 200
at x = 0 and T = 0 at every other boundary. This is very similar to the semi-infinite
bar problem, the only difference being that we can not discard the
solution e^{kx}, as x is finite. Thus, in the x−direction the temperature profile looks
like ce^{kx} + de^{−kx}, and we have to choose c and d to ensure that T = 0 at x = 10. I
leave it as an exercise to show that ½[e^{k(10−x)} − e^{−k(10−x)}] = sinh k(10 − x) satisfies
what we are looking for. The rest of the problem is very similar to what we did for the
semi-infinite bar. Thus, we can write the solution as,

\[ T(x,y) = \sum_{n=1}^{\infty} a_n\sinh\frac{n\pi(10-x)}{10}\,\sin\frac{n\pi y}{10}. \quad (10.22) \]

The function also has to satisfy T = 200 at x = 0,

\[ T = 200 = \sum_{n=1}^{\infty} a_n\sinh(n\pi)\sin\frac{n\pi y}{10} = \sum_{n=1}^{\infty} A_n\sin\frac{n\pi y}{10}, \quad (10.23) \]

which is a Fourier sine series for f(y) = 200. After finding Aₙ (same as before), we
can derive the values of aₙ as,

\[ a_n = \begin{cases}\dfrac{800}{n\pi\sinh n\pi}, & \text{odd } n,\\[4pt] 0, & \text{even } n.\end{cases} \quad (10.24) \]

Thus, the temperature distribution in the square plate can be written as,

\[ T(x,y) = \frac{800}{\pi}\left[\frac{1}{\sinh\pi}\sinh\frac{\pi(10-x)}{10}\sin\frac{\pi y}{10} + \frac{1}{3\sinh 3\pi}\sinh\frac{3\pi(10-x)}{10}\sin\frac{3\pi y}{10} + \cdots\right]. \quad (10.25) \]

I have shown a plot of the temperature distribution in Figure 10.8(b).
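One can convince oneself of Eq. 10.25 by summing the series numerically and checking the boundary values; the truncation at n = 199 below is an arbitrary choice of mine:

```python
import numpy as np

# Partial sum of Eq. 10.25 for the 10 x 10 plate (odd n up to 199).
def T(x, y, nmax=199):
    s = 0.0
    for n in range(1, nmax + 1, 2):
        s += np.sinh(n*np.pi*(10.0 - x)/10.0) * np.sin(n*np.pi*y/10.0) \
             / (n*np.sinh(n*np.pi))
    return 800.0*s/np.pi

print(T(0.0, 5.0))    # close to the boundary value 200
print(T(10.0, 5.0))   # exactly 0 on the opposite edge
```

T(0, 5) misses 200 by a degree or so, which is just the truncation error of the sine series along the hot edge.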

Circular domain
In polar coordinates, Laplace’s equation reads (see exercise for derivation),

\[ \frac{\partial^2 f}{\partial r^2} + \frac{1}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial\theta^2} = 0. \quad (10.26) \]
Writing f(r, θ) = R(r)Θ(θ), we get two ordinary differential equations (eigenvalue
problems) to solve,

\[ r^2\frac{d^2 R}{dr^2} + r\frac{dR}{dr} = k^2 R, \qquad \frac{d^2\Theta}{d\theta^2} = -k^2\Theta. \quad (10.27) \]

Solutions of the first and second equations give the radial and angular parts of the
function, respectively. The angular solution looks like Θ(θ) = A sin kθ + B cos kθ.
Convince yourself that Θ must satisfy Θ(θ) = Θ(θ + 2π), which requires k to be
some integer n = 0, 1, 2, 3, ···.¹ The eigenfunctions are

Θn (θ) = an cos nθ + bn sin nθ, (10.28)


¹Plot sin nθ and cos nθ for integer n (like sin θ, sin 2θ, sin 3θ) and non-integer n (like sin 1.2θ, sin 1.5θ, sin 1.9θ) from 0 to 2π. Identify the functions with a periodicity of 2π.


[Figure 10.6: filled polar contour plot of T(r, θ) on the unit disc, with a colorbar running from −1.5 to 10.5.]

Figure 10.6: Temperature distribution in a circular plate, with boundary conditions
shown in Figure 10.2(b). I have plotted Eq. 10.37 by including 99 terms in
the sum.

with eigenvalues n². To solve for the radial part, we substitute R(r) = r^α in
Eq. 10.27,

\[ \alpha^2 - n^2 = 0. \quad (10.29) \]

Since α = ±n, we can write the general solution as,

\[ R_n(r) = c_n r^n + d_n r^{-n} \quad \text{for } n = 1, 2, 3, \cdots, \quad (10.30) \]
\[ R_0(r) = c_0 + d_0\ln r \quad \text{for } n = 0. \]

Since r = 0 lies inside the domain, we have to discard ln r and r^{−n}, as they are
undefined at the origin. Thus, we can write the general solution as,

\[ f(r,\theta) = \sum_{n=0}^{\infty} r^n(a_n\cos n\theta + b_n\sin n\theta) = a_0 + \sum_{n=1}^{\infty} r^n(a_n\cos n\theta + b_n\sin n\theta). \quad (10.31) \]

If the boundary condition is f(a, θ) = h(θ), where a is the radius of the circular
domain, we can get the coefficients using,

\[ a_0 = \frac{1}{2\pi}\int_0^{2\pi} h(\theta)\,d\theta \quad \text{for } n = 0, \quad (10.32) \]
\[ a_n = \frac{1}{a^n\pi}\int_0^{2\pi} h(\theta)\cos n\theta\,d\theta \quad \text{for } n \geq 1, \]
\[ b_n = \frac{1}{a^n\pi}\int_0^{2\pi} h(\theta)\sin n\theta\,d\theta \quad \text{for } n \geq 1. \]
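When h(θ) does not yield to integration by hand, the formulas in Eq. 10.32 can be evaluated numerically. The boundary function h(θ) = 3 + 2 cos θ below is my own test case, for which a₀ = 3, a₁ = 2 and b₁ = 0 on a unit circle:

```python
import numpy as np

# Numerical version of Eq. 10.32 on a circle of radius a = 1, using the fact
# that (1/pi) * integral over [0, 2*pi) equals 2 * mean on a uniform grid.
a = 1.0
theta = np.linspace(0.0, 2.0*np.pi, 200000, endpoint=False)
h = 3.0 + 2.0*np.cos(theta)

a0 = np.mean(h)                           # (1/2pi) * integral of h
a1 = 2.0*np.mean(h*np.cos(theta)) / a**1  # (1/(a^n * pi)) * integral
b1 = 2.0*np.mean(h*np.sin(theta)) / a**1
print(a0, a1, b1)
```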


Example: temperature distribution in a circular plate


Let us solve a problem similar to the one shown in Figure 10.2(b). We assume the
radius of the circular domain to be a = 1. The boundary conditions are,

\[ f(a,\theta) = h(\theta) = 10 \quad \text{for } 0 \leq \theta \leq \pi, \quad (10.33) \]
\[ f(a,\theta) = h(\theta) = 0 \quad \text{for } \pi \leq \theta \leq 2\pi. \]

First, for n = 0,

\[ a_0 = \frac{1}{2\pi}\int_0^{2\pi} h(\theta)\,d\theta = \frac{1}{2\pi}\left[\int_0^{\pi} 10\,d\theta + \int_{\pi}^{2\pi} 0\,d\theta\right] = 5. \quad (10.34) \]

Next, for n ≥ 1,

\[ a_n = \frac{1}{\pi}\int_0^{2\pi} h(\theta)\cos n\theta\,d\theta = \frac{1}{\pi}\left[\int_0^{\pi} 10\cos n\theta\,d\theta + \int_{\pi}^{2\pi} 0\,d\theta\right] = 0. \quad (10.35) \]

Finally,

\[ b_n = \frac{1}{\pi}\int_0^{2\pi} h(\theta)\sin n\theta\,d\theta = \frac{1}{\pi}\left[\int_0^{\pi} 10\sin n\theta\,d\theta + \int_{\pi}^{2\pi} 0\,d\theta\right] = \frac{10}{n\pi}\left[1-(-1)^n\right]. \quad (10.36) \]
Thus, the temperature distribution in the circular plate is,

\[ T(r,\theta) = 5 + \sum_{n=1}^{\infty}\frac{10}{n\pi}\left[1-(-1)^n\right] r^n\sin n\theta. \quad (10.37) \]

I have plotted T(r, θ) using the following Python code (see Fig. 10.6).

import numpy as np
import matplotlib.pyplot as plt

def f(r, theta):
    s = 0.0
    for n in range(1, 100):
        s += 10.0 * (1.0 - (-1.0)**n) * r**n * np.sin(n*theta) / (n*np.pi)
    return s + 5.0

radius = np.linspace(0, 1, 50)
angle = np.linspace(0, 2.0*np.pi, 50)
r, theta = np.meshgrid(radius, angle)
Z = f(r, theta)
fig, ax = plt.subplots(subplot_kw=dict(projection='polar'))
cf = ax.contourf(theta, r, Z, cmap='afmhot')
fig.colorbar(cf)
plt.show()


[Figure 10.7: sketch of a square grid showing a point (x, y) and its four neighbours (x + h, y), (x − h, y), (x, y + h) and (x, y − h).]

Figure 10.7: Finite difference method: the domain is divided into a square or rectangular
grid. Note that grid points (red) are also placed at the boundary. However,
values of f(x, y) remain fixed (boundary conditions) at these grid points. On the
other hand, f(x, y) changes with each iteration inside the domain (black points),
until convergence is achieved.

10.3.4 Numerical solution


The finite difference method is commonly used to numerically solve Laplace's equation.
First, we have to divide the domain into a suitable grid (see Figure 10.7).
Using the central difference method, we can get the numerical derivatives of f with
respect to x and y as,

\[ \frac{\partial^2 f}{\partial x^2} = \frac{f(x+h,y) - 2f(x,y) + f(x-h,y)}{h^2}, \quad (10.38) \]
\[ \frac{\partial^2 f}{\partial y^2} = \frac{f(x,y+h) - 2f(x,y) + f(x,y-h)}{h^2}. \]

Combining, we get a finite difference formula in two dimensions as,

\[ \frac{f(x+h,y) + f(x-h,y) + f(x,y+h) + f(x,y-h) - 4f(x,y)}{h^2} = \nabla^2 f. \quad (10.39) \]

Since Laplace's equation is ∇²f = 0, we can write,

\[ f(x,y) = \frac{1}{4}\left[f(x+h,y) + f(x-h,y) + f(x,y+h) + f(x,y-h)\right]. \quad (10.40) \]

We can use the relaxation method to get the solution,

\[ f^{n+1}(x,y) = \frac{1}{4}\left[f^n(x+h,y) + f^n(x-h,y) + f^n(x,y+h) + f^n(x,y-h)\right], \quad (10.41) \]


where the superscript denotes the iteration sequence. We start with some guess
value at every grid point, f⁰(x, y) (zeroth step), and hope to achieve convergence
quickly. Note that, grid points located at the boundary always have fixed values
(because of the boundary condition) and they do not change during the iteration
process. Finally, let us write a Python code to solve the problem of temperature
distribution in a square plate and compare with the analytical results.

from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt

# Grid size
n = 50
# Boundary conditions
Tb = 0.0    # bottom
Tt = 0.0    # top
Tl = 200.0  # left
Tr = 0.0    # right
# Target precision
tp = 0.001
# Arrays
T = np.zeros([n+1, n+1], float)
T[:, 0] = Tl
T1 = np.zeros([n+1, n+1], float)
# Main loop
for k in range(10000):
    for i in range(n+1):
        for j in range(n+1):
            if i == 0 or i == n or j == 0 or j == n:
                T1[i, j] = T[i, j]  # boundary condition
            else:
                T1[i, j] = (T[i+1, j] + T[i-1, j] + T[i, j+1] + T[i, j-1])/4.0
    # Calculate error
    err = np.max(abs(T - T1))
    if err < tp:
        break
    else:
        for i in range(n+1):
            for j in range(n+1):
                T[i, j] = T1[i, j]
x = np.linspace(0, 10, n+1)
y = np.linspace(0, 10, n+1)
X, Y = np.meshgrid(x, y)
fig = plt.figure()
ax = plt.axes(projection="3d")
ax.plot_surface(X, Y, T, cmap='afmhot')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('T(x,y)')
plt.show()
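The triple loop above is easy to read but slow in pure Python. A vectorized variant of the same relaxation update, using NumPy slicing (a sketch with the same grid and boundary conditions, not the code used for Figure 10.8), is:

```python
import numpy as np

# Jacobi relaxation with array slicing: every interior point is replaced by
# the average of its four neighbours in one vectorized step (Eq. 10.41).
n = 50
T = np.zeros((n + 1, n + 1))
T[:, 0] = 200.0                         # left edge held at T = 200

for k in range(100000):
    T_new = T.copy()                    # boundary rows/columns are preserved
    T_new[1:-1, 1:-1] = 0.25*(T[2:, 1:-1] + T[:-2, 1:-1]
                              + T[1:-1, 2:] + T[1:-1, :-2])
    if np.max(np.abs(T_new - T)) < 0.001:
        T = T_new
        break
    T = T_new

print(T[n//2, n//2])   # roughly 50 at the centre, as symmetry suggests
```

By superposition, rotating the hot edge around all four sides and adding the four solutions gives 200 everywhere, so the centre value must be close to 200/4 = 50.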

134
10.4. DIFFUSION OR HEAT EQUATION

[Figure 10.8: (a) sketch of the square plate with T = 200 on the left edge and T = 0 on the other three edges; (b) and (c) surface plots of T(x, y) over the 10 × 10 plate.]

Figure 10.8: (a) Boundary conditions in a square plate, (b) analytically calculated
temperature distribution (Eq. 10.25) and (c) numerically calculated temperature
distribution.


10.3.5 Exercise
1. Starting with x = r cos θ and y = r sin θ, derive Eq. 10.26.

2. Analytically solve the problem shown in Figure 10.2(c), assuming the boundary condition f(r, θ) = δ(θ) = 1.

10.4 Diffusion or heat equation


We have a function f, which depends on space and time and follows,

\[ \nabla^2 f = \frac{1}{\alpha}\frac{\partial f}{\partial t}, \quad (10.42) \]
where α is a positive coefficient. As the name suggests, this equation arises in
the context of diffusion or heat flow. In diffusion, f and α are concentration and
diffusivity (a material property), respectively. Similarly, in heat flow, f and α are
temperature and thermal diffusivity (a material property), respectively. We will
use α², instead of α, in the rest of the discussion. This minor adjustment helps
us with the notation when we write down the final solution.

10.4.1 Solution in one dimension


In one dimension, the diffusion or heat equation reads,

\[ \frac{\partial^2 f}{\partial x^2} = \frac{1}{\alpha^2}\frac{\partial f}{\partial t}. \quad (10.43) \]


[Figure 10.9: (a) sketch of a bar of length l with two faces held at T = 0; (b) temperature profiles across the bar at t = 0, 0.01, 0.1, 0.2, 0.3 and 0.4.]

Figure 10.9: (a) A bar is uniformly heated to 50° initially. Then, two of its faces
(red) are brought in contact with thermal reservoirs at 0° and the rest of the faces
(white) are insulated, such that heat flow is essentially one-dimensional. (b) The
temperature profile as a function of time, as the bar cools down.

Using separation of variables, we can split the time- and space-dependent parts as f(x, t) =
X(x)T(t). But we can avoid that, as the answer is not too difficult to guess,

\[ f(x,t) = e^{-\lambda t}X(x), \quad (10.44) \]

where λ is positive and real. If λ were negative, the function would become infinite as
t → ∞. On the other hand, if λ were imaginary, f(x, t) would oscillate with time and
not decay. However, we are looking for a solution that decays with time, and thus
we must choose a positive and real λ. Substituting Eq. 10.44 in Eq. 10.43 and
canceling e^{−λt} from both sides, we get,

\[ \frac{d^2 X}{dx^2} = -\frac{\lambda}{\alpha^2}X = -k^2 X, \quad (10.45) \]

such that λ = k²α². This is an eigenvalue problem, having a general solution
X(x) = A cos kx + B sin kx. Thus, we can write,

\[ f(x,t) = Ae^{-k^2\alpha^2 t}\cos kx + Be^{-k^2\alpha^2 t}\sin kx. \quad (10.46) \]

Example: one-dimensional heat flow in a bar


I have defined the one-dimensional heat flow problem in Figure 10.9(a). The bar
is uniformly heated to a temperature of 50°. Thus, the initial condition is T(x, t =
0) = 50°. Then two of its ends are brought in contact with thermal reservoirs at
0°, i.e., T = 0° at x = 0 and x = l. What would be the temperature distribution
T(x, t) inside the bar as a function of time?
Note that, for the given boundary condition (T = 0° at x = 0), cos kx can not be
a solution. Thus, we are left with sin kx, and sin kl = 0 ⇒ kₙ = nπ/l. One can write


the general solution as a linear combination of all such functions,



\[ T(x,t) = \sum_{n=1}^{\infty} b_n e^{-(n\pi\alpha/l)^2 t}\sin\frac{n\pi x}{l}. \quad (10.47) \]

Applying the initial condition, we can write,

\[ T(x,0) = 50 = \sum_{n=1}^{\infty} b_n\sin\frac{n\pi x}{l}. \quad (10.48) \]

This is a Fourier sine series and the coefficients are,

\[ b_n = \frac{2}{l}\int_0^l T(x,0)\sin\frac{n\pi x}{l}\,dx = \begin{cases}\dfrac{200}{n\pi}, & \text{odd } n,\\[4pt] 0, & \text{even } n.\end{cases} \quad (10.49) \]

Finally, the temperature distribution in the bar as a function of time is,

\[ T(x,t) = \frac{200}{\pi}\left[e^{-(\pi\alpha/l)^2 t}\sin\frac{\pi x}{l} + \frac{1}{3}e^{-(3\pi\alpha/l)^2 t}\sin\frac{3\pi x}{l} + \cdots\right]. \quad (10.50) \]

I have shown the temperature profile (using l = α = 10) as a function of time in
Figure 10.9(b). At t = 0, the temperature distribution in the bar looks like a step
function, with T = 50° inside and T = 0° outside. As expected, the
temperature drops rapidly inside the bar as heat is sucked out from the two faces.
One can use the following Python code to plot the temperature distribution.

import numpy as np
import matplotlib.pyplot as plt

def f(x, t):
    s = 0.0
    for n in range(1, 400, 2):
        s += np.exp(-n**2 * np.pi**2 * t) * np.sin(n * np.pi * x / 10) / n
    return 200 * s / np.pi

x = np.linspace(0, 10, 30)
t = float(input("Enter the value of time: "))
Temp = f(x, t)
plt.plot(x, Temp)
plt.xlabel('l')
plt.ylabel('T')
plt.show()

10.4.2 Solution in two dimensions


Assuming Cartesian coordinates, we apply separation of variables,

f (x, y, t) = F (x, y)G(t). (10.51)

137
10.5. WAVE EQUATION

[Figure 10.10: the initial pulse f(x, 0) = e^{−x²} at t = 0, together with the two half-pulses h(x + vt) = 0.5e^{−(x+t)²} and g(x − vt) = 0.5e^{−(x−t)²} at t = 10.]

Figure 10.10: Let us imagine that a sound is created at the middle of a long
tunnel. At t = 0, the pulse has the form e^{−x²}. Half of it travels to the right and the
rest travels to the left. I assume v = 1.

Substituting Eq. 10.51 in Eq. 10.42, we can write,

\[ G\nabla^2 F = \frac{1}{\alpha^2}F\frac{dG}{dt}. \quad (10.52) \]

Dividing both sides by FG, we get,

\[ \frac{1}{F}\nabla^2 F = \frac{1}{\alpha^2}\frac{1}{G}\frac{dG}{dt}. \quad (10.53) \]

Since the left side of the identity depends only on the space variables and the right
side only on the time variable, both sides must be equal to some constant,

\[ \nabla^2 F = -k^2 F, \qquad \frac{dG}{dt} = -k^2\alpha^2 G. \quad (10.54) \]
The space equation is the Helmholtz equation. As discussed before, k is chosen
to be a real number, because we want the function to decay with time.

10.4.3 Exercise

10.5 Wave equation


\[ \nabla^2 f = \frac{1}{v^2}\frac{\partial^2 f}{\partial t^2}. \quad (10.55) \]


10.5.1 Solution in one dimension


In one dimension, the equation is

\[ \frac{\partial^2 f}{\partial x^2} = \frac{1}{v^2}\frac{\partial^2 f}{\partial t^2}, \quad (10.56) \]

where the constant v is the wave velocity. I claim that the solution is,

\[ f(x,t) = \underbrace{g(x-vt)}_{\text{traveling right}} + \underbrace{h(x+vt)}_{\text{traveling left}}. \quad (10.57) \]

Before attempting a derivation, let us try to understand what the solution
means. Imagine a long tunnel with someone stranded in the middle of it. The
person shouts for help (at t = 0). Whatever sound he produces, half of it travels
to the right and the rest travels to the left of the tunnel. As shown in Figure 10.10,
the initial form of the pulse is a Gaussian.² At time t = 10, the function is still
a Gaussian, but it has moved to some other place as the wave is traveling. Note
that Figure 10.10 is a special case, and we can actually write the solution as,

\[ f(x,t) = \frac{1}{2}g(x-vt) + \frac{1}{2}g(x+vt). \quad (10.58) \]
We can prove Eq. 10.57 by substituting ξ = x − vt, η = x + vt and rewriting
Eq. 10.56 in terms of the new variables ξ and η. Using the chain rule,

\[ \frac{\partial f}{\partial t} = \frac{\partial f}{\partial\xi}\frac{\partial\xi}{\partial t} + \frac{\partial f}{\partial\eta}\frac{\partial\eta}{\partial t} = -v\frac{\partial f}{\partial\xi} + v\frac{\partial f}{\partial\eta}, \quad (10.59) \]
\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial\xi}\frac{\partial\xi}{\partial x} + \frac{\partial f}{\partial\eta}\frac{\partial\eta}{\partial x} = \frac{\partial f}{\partial\xi} + \frac{\partial f}{\partial\eta}. \]
Similarly,

\[ \frac{\partial^2 f}{\partial t^2} = \left(-v\frac{\partial}{\partial\xi} + v\frac{\partial}{\partial\eta}\right)\left(-v\frac{\partial f}{\partial\xi} + v\frac{\partial f}{\partial\eta}\right) = v^2\frac{\partial^2 f}{\partial\xi^2} - 2v^2\frac{\partial^2 f}{\partial\xi\partial\eta} + v^2\frac{\partial^2 f}{\partial\eta^2}, \quad (10.60) \]
\[ \frac{\partial^2 f}{\partial x^2} = \left(\frac{\partial}{\partial\xi} + \frac{\partial}{\partial\eta}\right)\left(\frac{\partial f}{\partial\xi} + \frac{\partial f}{\partial\eta}\right) = \frac{\partial^2 f}{\partial\xi^2} + 2\frac{\partial^2 f}{\partial\xi\partial\eta} + \frac{\partial^2 f}{\partial\eta^2}. \]

Finally, substituting in Eq. 10.56, we can write the one-dimensional wave equation
in terms of the new variables as,

\[ \frac{\partial^2 f}{\partial\eta\partial\xi} = 0, \quad (10.61) \]

which has the solution,

\[ f(\xi,\eta) = g(\xi) + h(\eta). \quad (10.62) \]

Writing ξ and η in terms of the old variables x and t, we get Eq. 10.57.
²The form of the function does not matter. I could have chosen any other form, like a δ-function or a Lorentzian. However, from our experience we know it should be a localized function. If someone claps for a second, we hear it for a second only, not for an hour!


Example: wave equation for the infinite string


Let us imagine an infinitely long veena string, obeying Eq. 10.56 for −∞ < x < ∞.
We need two initial conditions to solve the equation,

\[ f(x,0) = G(x) \quad \& \quad f_t(x,0) = H(x). \quad (10.63) \]

The first one is the initial position and the second one is the initial velocity of the
string. At t = 0, the initial velocity and position are,

\[ f_t(x,0) = H(x) = -vg'(x) + vh'(x), \quad (10.64) \]
\[ f(x,0) = G(x) = g(x) + h(x). \]

Integrating the first of the two equations with respect to position, we can write,³

\[ \frac{1}{v}\int_0^x H(y)\,dy = -g(x) + h(x). \quad (10.65) \]
Solving for g(x) and h(x) from the above equations and substituting x + vt and x − vt
in place of x, we get,

\[ h(x+vt) = \frac{1}{2}G(x+vt) + \frac{1}{2v}\int_0^{x+vt} H(y)\,dy, \quad (10.66) \]
\[ g(x-vt) = \frac{1}{2}G(x-vt) - \frac{1}{2v}\int_0^{x-vt} H(y)\,dy. \]

Combining, we can write the solution (first obtained by d'Alembert) as,

\[ f(x,t) = \frac{1}{2}\left[G(x-vt) + G(x+vt)\right] + \frac{1}{2v}\int_{x-vt}^{x+vt} H(y)\,dy. \quad (10.67) \]

Let us try to understand the solution. First, note that if both G(x) and H(x) are
zero, there is no wave! There is no surprise in it, because if you take a string and
do nothing to it, you can not set off any wave motion. Either you have to pull
the string (G(x) ≠ 0) and release it, or you have to hold the string at one end and
shake your hand to give it some initial velocity (H(x) ≠ 0).
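d'Alembert's formula can be evaluated directly on a computer; in the sketch below (the helper name `dalembert` is my own) the integral of H is done with trapezoidal quadrature:

```python
import numpy as np

# d'Alembert's solution, Eq. 10.67: half the sum of the shifted initial
# positions plus the integral of the initial velocity over [x - vt, x + vt].
def dalembert(x, t, G, H, v=1.0, npts=2001):
    y = np.linspace(x - v*t, x + v*t, npts)
    return 0.5*(G(x - v*t) + G(x + v*t)) + np.trapz(H(y), y)/(2.0*v)

G = lambda x: 2.0*np.exp(-x**2)   # an initial position (used again below)
H = lambda x: 0.0*x               # zero initial velocity
print(dalembert(0.0, 1.0, G, H))  # the two half-pulses overlapping at x = 0
```

With H = 0 the quadrature term vanishes and the function reduces to ½[G(x − vt) + G(x + vt)], the case treated next.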
Case 1: Assume the initial velocity H(x) = 0. We can write the solution as,

\[ f(x,t) = \frac{1}{2}\left[G(x-vt) + G(x+vt)\right]. \quad (10.68) \]

Do you notice that the right- and left-moving waves go symmetrically away from
their initial position as time progresses? Let us take G(x) = 2e^{−x²} and v = 1, such
that,

\[ f(x,t) = e^{-(x-t)^2} + e^{-(x+t)^2}. \quad (10.69) \]

I have plotted different snapshots of the wave in Figure 10.11.
³The integration yields a function, not a number.


[Figure 10.11: six snapshots of the wave, at t = 0 (initial position and initial velocity), t = 0.4, 0.8, 1.6, 3 and 6.]

Figure 10.11: Wave propagation for initial position G(x) = 2e^{−x²} and initial velocity
H(x) = 0.

[Figure 10.12: nine snapshots of the wave, at t = 0 (initial position and initial velocity), t = 0.1, 0.2, 0.4, 1.0, 2.0, 3.0, 4.0 and 5.0.]

Figure 10.12: Wave propagation for initial position G(x) = 0 and initial velocity
H(x) = 2xe^{−x²}.

141
10.6. SCHRÖDINGER EQUATION

Case 2: Assume the initial position G(x) = 0. We can write the solution as,

\[ f(x,t) = \frac{1}{2v}\int_{x-vt}^{x+vt} H(y)\,dy = \frac{1}{2v}\left[\tilde{H}(x+vt) - \tilde{H}(x-vt)\right]. \quad (10.70) \]

Let us take H(x) = 2xe^{−x²} and v = 0.5, such that,

\[ f(x,t) = e^{-(x-0.5t)^2} - e^{-(x+0.5t)^2}. \quad (10.71) \]

I have plotted different snapshots of the wave in Figure 10.12.
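Eq. 10.71 can be checked against the d'Alembert integral it was derived from; the quadrature below should agree with the closed form at any (x, t):

```python
import numpy as np

# Compare the closed form of Eq. 10.71 with the d'Alembert integral
# (1/2v) * int_{x-vt}^{x+vt} H(y) dy for H(x) = 2*x*exp(-x**2), v = 0.5.
def f_closed(x, t):
    return np.exp(-(x - 0.5*t)**2) - np.exp(-(x + 0.5*t)**2)

def f_integral(x, t, v=0.5, npts=20001):
    y = np.linspace(x - v*t, x + v*t, npts)
    return np.trapz(2.0*y*np.exp(-y**2), y)/(2.0*v)

for x, t in [(1.0, 2.0), (-0.5, 4.0)]:
    print(f_closed(x, t), f_integral(x, t))   # the two columns agree
```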

Example: wave equation for the finite string

10.5.2 Exercise

10.6 Schrödinger equation


\[ -\frac{\hbar^2}{2m}\nabla^2 f + Vf = \iota\hbar\frac{\partial f}{\partial t} \quad (10.72) \]

10.7 Poisson’s equation


\[ \nabla^2 f = h(x,y), \quad (10.73) \]

where h(x, y) is a given source term.

Chapter 11

Probability and Statistics

11.1 Deterministic vs. stochastic process


First, let us understand the difference between a deterministic and a stochastic
process. When we are playing a game of football, if we know the force on the
ball and its initial position and velocity, we can exactly predict its trajectory by
solving Newton's second law of motion. This is known as a deterministic process.
However, while playing a game of ludo, we can not predict what the outcome
will be when we throw a die. This is known as a stochastic or probabilistic
process. In this case one can ask the following: what is the probability of getting
a 3 if you throw a die? The first thing that comes to your mind is that there are 6
possible outcomes (all of them equally likely) and, based on this information,
you get the answer 1/6. Now, let us consider throwing two dice simultaneously
and ask the question: what is the total number of possible outcomes? The
answer is 6 × 6 = 36. This is based on the following principle: if there are N1
possible outcomes of one event and N2 possible outcomes of another event
and the two events occur simultaneously (or in succession), then there are
N1 × N2 possible outcomes. This principle can be extended to any number of
events.
Let us explore the game of throwing two dice in more detail. The most important
step is to make a table of all possible outcomes (known as the sample space)
of throwing two dice simultaneously, as given in Table 11.1. As mentioned previously,
there are 36 possible outcomes or sample points in the sample space.¹
Once the table of all possible outcomes or the sample space is ready, we can start
asking questions regarding the probability of certain events. For example, the probability
that the sum of the two dice equals 12 is 1/36. Note that each outcome or each
sample point listed in Table 11.1 is assumed to be equally probable. Next, let us
find out the probability that the sum of the two dice equals 7. There are
6 possible outcomes satisfying this condition: (6,1), (5,2), (4,3), (3,4), (2,5), (1,6).
Thus the probability is given by 6 divided by the total number of outcomes, i.e.,
6/36 = 1/6. Alternately, we can say that there are 6 possible outcomes (having
sum equal to 7), each having probability 1/36, such that the answer is
6 × 1/36 = 1/6.
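The counting above is easy to verify by brute-force enumeration of the sample space:

```python
from itertools import product

# Enumerate the 36-point sample space of two dice and count the outcomes
# whose sum is 7; the probability is 6/36 = 1/6.
space = list(product(range(1, 7), repeat=2))
favourable = [p for p in space if sum(p) == 7]
print(len(space), len(favourable), len(favourable)/len(space))
```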
It is clear from the above discussion that, if we want to know the probability of
¹In the case of a single die, there are 6 sample points (1, 2, 3, 4, 5, 6) in the sample space.

143
11.2. HOW TO COUNT?

1 2 3 4 5 6
1 1,1 1,2 1,3 1,4 1,5 1,6
2 2,1 2,2 2,3 2,4 2,5 2,6
3 3,1 3,2 3,3 3,4 3,5 3,6
4 4,1 4,2 4,3 4,4 4,5 4,6
5 5,1 5,2 5,3 5,4 5,5 5,6
6 6,1 6,2 6,3 6,4 6,5 6,6

Table 11.1: List of all possible outcomes (known as the sample space) if we throw
two dice simultaneously. Each outcome is termed as a sample point and there
are 36 sample points in this case.

certain outcome in some experiment, first we need to count all possible outcomes
in that experiment. Thus, we need to learn how to count in a systematic way.

11.2 How to count?


11.2.1 Permutation
Case 1: Let us consider a case, where we have to arrange n balls, all distinguish-
able, i.e., no two are alike, as shown in Fig. 11.1(a). You can think of n slots, each
having capacity to hold only one ball. Now, the first slot can be occupied by any
one of the balls, or in other words, there are n possible ways of filling the first
slot. The second slot can be filled by any one of the remaining (n − 1) balls, or in
other words, there are (n − 1) possible ways of filling the second slot. We can keep
doing this, and finally try to find the total number of all possible arrangements.
Using the principle described in the previous section, we can say that there are
n(n − 1)(n − 2) · · · 1 ways of arranging n distinguishable balls in n slots. This is
known as factorial,2

P(n, n) = n(n − 1)(n − 2) · · · 1 = n!   (11.1)
One should keep in mind that the above formula holds good as long as no rep-
etition is allowed while filling the slots.3 Moreover, order does matter, i.e., an
arrangement in which ‘ball 1’ is placed after ‘ball 2’ is different from another ar-
rangement in which ‘ball 1’ is placed before ‘ball 2’.
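Eq. 11.1 can be checked by brute force for a small n, since n! grows very fast; a minimal sketch:

```python
# P(n, n) = n!: every ordering of n distinguishable balls, checked by enumeration.
from itertools import permutations
from math import factorial

n = 5  # keep n small for the brute-force check
arrangements = list(permutations(range(n)))
assert len(arrangements) == factorial(n)  # 5! = 120

print(factorial(15))  # 1307674368000, the 15-ball count quoted in the text
```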

Case 2: In the above problem, if we have 15 slots to arrange 15 balls, then there
are 15! = 1307674368000 possible arrangements. As shown in Fig. 11.1(b), let us
consider a case when we have only 3 slots to arrange the 15 balls, and clearly,
the number of possible arrangements is much smaller in this case. As argued in
the previous section, the first slot can be occupied in 15 ways, the second slot
2. Some cases where we need to calculate permutations are: the number of ways (a) n people can be seated in n chairs, (b) n cards can be arranged on a table, (c) 5 single-digit numbers (say 0-4) can be arranged to give 5-digit numbers, etc. Note that order does matter in all these cases.
3. You have only one ball of each type in the reservoir, using which you have to fill the slots. So, if you decide to put 'ball 1' in the first slot, you cannot put it in any other slot, and since there is only one 'ball 1' available, it cannot be repeated in any other slot.


Figure 11.1: (a) All possible arrangements of 15 distinguishable pool balls in 15
slots is known as factorial. The first slot can be filled by one of the 15 balls.
The second slot can be filled by one of the remaining 14 balls. Thus, there are
15 × 14 × 13 · · · 2 × 1 = 1307674368000 = 15! arrangements possible. (b) If only 3 slots
are available, total number of possible arrangements reduces to 15×14×13 = 2730.
Both (a) and (b) are examples of permutation when repetition is not allowed. (c)
Combination lock is an example of permutation when repetition is allowed. There
are 4 slots and each can be occupied by one of the 10 single digits, ranging from
0-9. The correct combination is one of the 10^4 possible 4-digit numbers. In case
of any permutation, order does matter, i.e., (1234) is different from (4321). (d) In
how many ways can we make a committee of 3 out of 5? In this case, order does
not matter, i.e., a committee made of (X,Y,Z) is the same as (X,Z,Y) or (Z,X,Y) etc. This
is an example of combination, when repetition is not allowed. (e) In how many
ways can we arrange the balls when some of them are indistinguishable: 7 red,
7 yellow and 1 black? (f) In how many ways can you choose 3 scoops of ice cream
out of the n flavors available in the shop? Repetition is allowed in this case,
i.e., you can opt for all 3 chocolate, or 1 chocolate, 1 vanilla & 1 strawberry, or 2
chocolate & 1 strawberry, or any other combination. Order does not matter in this
case, i.e., an arrangement of chocolate on top, followed by vanilla and strawberry,
is the same as vanilla on top, followed by strawberry and chocolate.


can be occupied in 14 ways, etc., and the total number of possible arrangements is
15 × 14 × 13 = 2730. However, it can be expressed in a smart way like,

15 × 14 × 13 × (12!/12!) = 15!/12! = 15!/(15 − 3)!.   (11.2)

Generalizing the above discussion, the total number of ways n distinguishable balls
can be arranged in r < n slots is equal to

P(n, r) = n(n − 1) · · · (n − r + 1) = n(n − 1) · · · (n − r + 1) × (n − r)!/(n − r)! = n!/(n − r)!.   (11.3)

Note that, in this case also, there is no repetition and order does matter.
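`math.perm` (Python ≥ 3.8) computes P(n, r) directly, and `itertools.permutations` lets us verify Eq. 11.3 by enumeration; a quick sketch:

```python
# P(n, r) = n!/(n - r)!: arrangements of r out of n distinguishable balls.
from itertools import permutations
from math import perm, factorial

n, r = 15, 3
assert perm(n, r) == factorial(n) // factorial(n - r) == 2730  # 15 x 14 x 13
assert len(list(permutations(range(n), r))) == 2730            # brute force
```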

Case 3: Let us consider a combination lock, as shown in Fig. 11.1(c). Let the
combination to open the lock be 0279. Imagine that you forget it and you must
open it without breaking the lock. Note that the first slot can be occupied by
any one of the 10 digits, ranging from 0-9. Thus, there are 10 ways in which the
first slot can be filled. The same is true for the rest of the slots as well. Since there are 4
slots in this particular lock, you may have to try 10 × 10 × 10 × 10 = 10^4 possible
combinations before you can open the combination lock.4 Thus, if there are n
things to choose from and we can choose r at a time, then there are
n^r ways of choosing them. Note that, unlike the previous two cases, repetition
is allowed, i.e., we can have combinations like 0000, 1112, 2234 etc. However,
similar to the previous two cases, order does matter in this case as well, i.e., 1234
is not the same as 4321.
Before we move on to the next section, let us appreciate the fact that there
is one important factor that is common among all three cases discussed above:
order does matter. In the following section, we will learn to count when order does
not matter.
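The n^r count for the combination lock can likewise be checked with `itertools.product`, which enumerates permutations with repetition:

```python
# Permutation with repetition: r slots, n symbols each, n**r possibilities.
from itertools import product

codes = list(product(range(10), repeat=4))  # every setting of the 4-digit lock
assert len(codes) == 10**4
assert (0, 2, 7, 9) in codes  # the combination 0279 is one of the 10000
```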

11.2.2 Combination
Case 1: First, let us discuss the case when repetition is not allowed. If you think
carefully, then you will realize that derivation of P (n, r) involves two steps. First,
we find out in how many ways one can select a set of r balls out of n. We denote
it as C(n, r),5 which is equal to all possible combinations of r objects chosen from
a set of n. Second, we find out the number of ways r balls can be arranged in r
boxes (one ball per box), and it is given by P (r, r). Thus, we can write

P(n, r) = C(n, r) × P(r, r) ⇒ C(n, r) = n!/((n − r)! r!).   (11.4)

The whole argument leading to the derivation of Eq. 11.4 can be rephrased in
the following manner. We know n = 15 distinguishable pool balls can be arranged
in r = 3 slots in P (15, 3) = 15 × 14 × 13 ways. If we consider a particular set of three
4. We should actually call it a permutation lock!
5. This is also represented by the binomial coefficient symbol \binom{n}{r}.


pool balls (say number 1,2,3), then there are 6 possible arrangements (if order
does matter)
    1 2 3
    1 3 2
    2 1 3
    2 3 1        (11.5)
    3 1 2
    3 2 1
However, if order does not matter (which we call a combination), then we have
only one possibility: (123). Thus, we have to adjust the permutation formula and
divide it by the number of ways in which three numbers can be ordered, i.e.,
C(15, 3) = P(15, 3)/3!. Generalizing in terms of n and r, we get C(n, r) = P(n, r)/r!,
which is the same as Eq. 11.4.
Interestingly, from Eq. 11.4, we find that C(n, r) = C(n, n − r). In words, the number of
possible combinations of selecting r objects from a set of n is exactly equal to that
of selecting (n − r) objects from a set of n. This is obvious, because every time we
select r balls out of n, (n − r) balls are left out. Thus, the number of possible combinations of
selecting r objects out of n must be the same as that of selecting (n − r) objects out of
n.
Let us discuss an example to understand when we need to calculate a combination. Say we have 5 people and we want to make a committee of 3 [see
Fig. 11.1(d)]. In how many ways can we select 3 out of 5 people? In this case, we
need to calculate C(n, r). Say one particular composition is a committee consisting of Mr. X, Y and Z. Note that, in this case order does not matter, because (X,Y,Z)
is the same committee as (X,Z,Y), (Z,X,Y) etc. This is the main difference between
permutation (order does matter) and combination (order does not matter).
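`math.comb` and `itertools.combinations` cover the combination case; the committee example becomes:

```python
# C(n, r) = n!/((n - r)! r!): committees of 3 chosen from 5 people.
from itertools import combinations
from math import comb

people = ["V", "W", "X", "Y", "Z"]
committees = list(combinations(people, 3))
assert len(committees) == comb(5, 3) == 10
assert comb(5, 3) == comb(5, 2)  # C(n, r) = C(n, n - r)
```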

Case 2: Now consider the case when we have indistinguishable objects. As
shown in Fig. 11.1(e), there are 7 red balls, 7 yellow balls and 1 black ball. In how
many ways can we arrange them in 15 slots? In this case, let us focus on the slots
and, for your convenience, they are marked from 1-15 in Fig. 11.1(e). We need 7
slots to put the 7 red balls. Now, let us ask the question: in how many ways can we
select 7 slots out of 15?6 The answer depends on whether order does matter or
not. As we are going to put indistinguishable balls (say all red) in 7 slots, order
does not matter and the answer is C(15, 7). Among the remaining 8 slots, 7 are going
to be occupied by yellow balls and there are C(8, 7) ways of selecting 7 slots out of
the remaining 8. Obviously, the remaining slot is going to be occupied by the only black ball and
there are C(1, 1) ways of selecting 1 object out of 1. Thus, the total number of ways in
which 7 red, 7 yellow and 1 black ball can be arranged in 15 slots is,
which 7 red, 7 yellow and 1 black ball can be arranged in 15 slots are,

C(15, 7) × C(15 − 7, 7) × C(15 − 7 − 7, 1) = (15!/(8! 7!)) × (8!/(1! 7!)) × (1!/(0! 1!)) = 15!/(7! 7! 1!).   (11.6)
Thus, we can generalize by stating that: if there are in total n = n1 + n2 + · · · + nk
objects, among which n1, n2, ..., nk are indistinguishable, then the total number of
6. Some examples are (a) slots 1234567, (b) slots 123458, (c) slots 1346789 etc.



Figure 11.2: Three (r) scoops of ice cream (blue dots) chosen from five (n) different
flavors: (a) 3 chocolate, (b) 1 vanilla, 1 strawberry, 1 butterscotch, (c) 2 chocolate,
1 mango and (d) 1 chocolate, 1 vanilla, 1 mango. In order to find all possible
combinations, we have to calculate the number of ways 7 objects can be arranged,
which can be divided in two types, 3 blue dots and 4 red vertical lines.

ways in which they can be arranged is,

C(n, n1) × C(n − n1, n2) × C(n − n1 − n2, n3) × · · · = n!/(n1! × n2! × n3! · · · nk!).   (11.7)
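Eq. 11.7 can be checked by brute force for a small multiset (a sketch with 3 red, 2 yellow and 1 black ball instead of the 15-ball example, to keep the enumeration manageable):

```python
# Arrangements with indistinguishable objects: n!/(n1! n2! ... nk!).
from itertools import permutations
from math import factorial

balls = "RRRYYB"  # 3 red, 2 yellow, 1 black in 6 slots
distinct = set(permutations(balls))  # duplicate orderings collapse in the set
assert len(distinct) == factorial(6) // (factorial(3) * factorial(2) * factorial(1))
print(len(distinct))  # 60
```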

Case 3: Finally, let us discuss the case when repetition is allowed. As shown
in Fig. 11.1(f), in an ice-cream parlor, you are allowed to take 3 scoops of ice
cream out of chocolate (C), vanilla (V), strawberry (S), mango (M) and butterscotch
(B). Repetition is allowed in this case, i.e., you can opt for all 3 chocolate, or 1
butterscotch, 1 vanilla & 1 strawberry, or 2 chocolate & 1 mango, or any other
combination. Obviously, order does not matter, i.e., an arrangement of chocolate
on top, followed by vanilla and strawberry, is the same as vanilla on top, followed by
strawberry and chocolate. Let us find out in how many ways we can select 3
scoops out of 5 flavors.
Some examples are shown in Fig. 11.2. The scoops are shown by blue dots
(r = 3) and the partitions between the different flavors (n = 5) are shown by red vertical
lines (there are n − 1 of them). Remember that we are looking for all possible
combinations of r scoops of ice cream (repetition allowed) from n different flavors.
From Fig. 11.2, it is clear that the answer is nothing but all possible arrangements
of (n − 1) red lines and r blue dots, which is given by,

(r + n − 1)!/(r! × (n − 1)!) = C(r + n − 1, r).   (11.8)
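`itertools` provides `combinations_with_replacement`, so Eq. 11.8 and the ice-cream example can be checked directly:

```python
# Combination with repetition: C(r + n - 1, r) ways to pick r scoops from n flavors.
from itertools import combinations_with_replacement
from math import comb

flavors = ["C", "V", "S", "M", "B"]  # n = 5 flavors
orders = list(combinations_with_replacement(flavors, 3))  # r = 3 scoops
assert len(orders) == comb(3 + 5 - 1, 3) == 35
assert ("C", "C", "C") in orders  # all three scoops chocolate is allowed
```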

11.2.3 Summary
• If there are N1 possible outcomes of event 1, N2 possible outcomes of event
  2, · · ·, Nn possible outcomes of event n, and all the events occur simultaneously (or in succession), then there are N1 × N2 × · · · × Nn possible outcomes.


• In how many ways can we arrange n objects in n slots?

  – n! if all the objects are distinguishable.
  – n!/(n1! × n2! × n3! · · · nk!) if there are n1, n2, · ·, nk indistinguishable objects.

• In how many ways can r objects be chosen out of n possibilities?

  – P(n, r) = n!/(n − r)! if repetition is not allowed and order does matter.
  – C(n, r) = n!/(r! × (n − r)!) if repetition is not allowed and order does not matter.
  – n^r if repetition is allowed and order does matter.
  – C(r + n − 1, r) = (r + n − 1)!/(r! × (n − 1)!) if repetition is allowed and order does not matter.

11.2.4 Some examples


Example 1: From your class of 30 students, in how many ways can you choose
a president, vice-president, secretary and treasurer for a particular program?
Since I am specifying someone as president, vice-president etc., order does
matter. For example, A: president, B: vice-president, C: secretary and D: treasurer
is different from A: vice-president, B: president, C: treasurer and D: secretary.
Thus, the answer is P(30, 4) = 30!/(30 − 4)! = 30 × 29 × 28 × 27.

Example 2: Now, imagine that you all believe in equality and you plan to make
a committee, where every member is equal. In that case, in how many ways can a 4
member committee be chosen from a class of 30?
Note that, in this case order does not matter, i.e., a committee comprising A,
B, C and D is the same as the committee comprising A, C, D and B. Thus, the answer
is C(30, 4) = 30!/(26! × 4!) = (30 × 29 × 28 × 27)/24.
I would like to draw your attention to the fact that we can solve the above
problems without thinking too much about which formula to apply. For example,
if we want to make a committee of 4 from a class of 30, the first post can be filled
in 30 ways, the second post can be filled in 29 ways, · · · and thus, the answer is
30 × 29 × 28 × 27. Now, 4 members can be selected for 4 posts in 4! = 24 ways. Thus,
if order does not matter, then there are (30 × 29 × 28 × 27)/24 ways of forming a 4 member
committee.

Example 3: You have 14 indistinguishable balls, which need to be arranged in
4 boxes. A box can be empty or can hold any number of balls. In how many ways
can you distribute the balls?
You can draw a figure similar to Fig. 11.2. In this case, we have 14 balls and
(4 − 1) partitions. The total number of ways these objects can be arranged is
(14 + 4 − 1)!/(14! × (4 − 1)!) = (17 × 16 × 15)/6.

Example 4: You have 14 distinguishable balls, which need to be arranged in 4
boxes. It is given that box 1 can hold 2 balls, box 2 can hold 3 balls, box 3 can
hold 4 balls and box 4 can hold 5 balls. In how many ways can you distribute the
balls if order does not matter?


Sample space (x)    2     3     4     5     6     7     8     9    10    11    12
Probability        1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Table 11.2: A sample space (sum of two dice) and probability derived from Table 11.1.

For filling box 1, we can select 2 balls out of 14 in C(14, 2) ways (as order does
not matter). For box 2, we can select 3 balls out of the remaining 12 in C(12, 3) ways.
Similarly, we can select 4 balls for box 3 in C(9, 4) ways and 5 balls for box 4 in
C(5, 5) ways. Thus, the answer is C(14, 2) × C(12, 3) × C(9, 4) × C(5, 5) = 14!/(2! × 3! × 4! × 5!).
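All four worked examples can be checked in a few lines with `math.perm` and `math.comb` (Python ≥ 3.8):

```python
# Checking Examples 1-4 with the standard library.
from math import comb, perm

assert perm(30, 4) == 30 * 29 * 28 * 27                    # Example 1
assert comb(30, 4) == 30 * 29 * 28 * 27 // 24              # Example 2
assert comb(14 + 4 - 1, 14) == 17 * 16 * 15 // 6           # Example 3
assert comb(14, 2) * comb(12, 3) * comb(9, 4) * comb(5, 5) == 2522520  # Example 4
```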

11.2.5 Exercise
1. In a paramagnet, there are N atoms and each of them can have either ↑ or ↓
spin.

• Find the number of all possible distributions of ↑ or ↓ spins among the N atoms.

• Let us consider a specific case, such that N1 atoms have ↑ spin and N2
  atoms have ↓ spin, with N = N1 + N2. Find the number of
  possible arrangements for this specific case.

• Starting from the second problem, how do you get the result obtained
  in the first problem? Hint: there are several possibilities like (N ↑, 0 ↓
  ), (N − 1 ↑, 1 ↓), (N − 2 ↑, 2 ↓), · · ·, (2 ↑, N − 2 ↓), (1 ↑, N − 1 ↓), (0 ↑, N ↓). Using the
  second problem, calculate the number of possible arrangements for each and
  then add all of them. The answer should be equal to what you obtained
  in the first problem.
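One way to check your answers to this exercise for a small N (a sketch; the identity being tested is the binomial theorem):

```python
# Summing C(N, N1) over all splits N = N1 + N2 should recover the 2**N total.
from math import comb

N = 10
total = sum(comb(N, n_up) for n_up in range(N + 1))
assert total == 2**N  # 1024 for N = 10
```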

11.3 Discrete probability functions


Let us come back to the problem of throwing two dice. We are not interested in
the outcome of an individual die but in the sum of the two dice. Let us call the sum
x; it has a particular value for each point in the sample space (the total number
of sample points is 36, as shown in Table 11.1). Possible values of x span
from 2 to 12. Note that, other than for 2 and 12, there is more than one sample
point for a given value of x. Such a variable x, having a specific value for each
sample point, is termed a random variable.7
Since we are not interested in the outcome of an individual die but in the sum
of the two dice, it would be a good idea to redefine the sample space, in which each
sample point corresponds to a value of x. This is shown in Table 11.2. For each
value of x (say x_i), we know the probability p_i. Now let us define a function f(x),
such that p_i = f(x_i) = probability that x has a value of x_i. This function is known
as the probability function8 and in this case, it can take only discrete values. A
7. It is a variable because it can take several values. Moreover, we are talking about a random or stochastic process and thus, it is called a random variable.
8. Also known as probability distribution function or frequency function or probability distribution.



Figure 11.3: Plot of the probability functions: (a) throwing a single die and (b)
throwing two dice simultaneously (see Table 11.2). (c) Plot of the cumulative
probability distribution function in the case of throwing two dice.

plot of the probability function is shown in Fig. 11.3(b). A plot of the probability
function in the case of throwing a single die is shown in Fig. 11.3(a). Since all the
outcomes are equally probable, f(x) is a constant (equal to 1/6) in this case.
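The probability function of Table 11.2 can be built by enumeration; the resulting dictionary is exactly what one would pass to, e.g., matplotlib to reproduce Fig. 11.3(b) (a standard-library sketch):

```python
# Build f(x), the probability function of the two-dice sum.
from itertools import product
from collections import Counter
from fractions import Fraction

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
f = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

assert f[7] == Fraction(6, 36) and f[2] == f[12] == Fraction(1, 36)
assert sum(f.values()) == 1  # the probabilities add up to one
```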
How is the above discussion useful to a scientist or engineer? Say you are
doing some experiment, like measuring the electrical conductivity of some material.
You generally repeat the experiment several times and measure a value x_i for N_i
times out of a total N = Σ_i N_i, and the probability of x_i is calculated as p_i = f(x_i) =
N_i/N. We are generally interested in the average value of all measurements and
the spread of the data about the average value, as discussed in the next section.

11.3.1 Mean value and variance


How do we compute the mean or average of a set of N numbers: by adding them
and then dividing by N. For example, we want to know the average age of the
students of your class. Say, there are in total 20 students, out of which 6 are 23
years old, 5 are 24 years old, 4 are 25 years old and 5 are 26 years old. Thus, the
average age is (6 × 23 + 5 × 24 + 4 × 25 + 5 × 26)/20 = (x1 × N1 + x2 × N2 + x3 × N3 + x4 × N4)/N. We can express this
in a compact form: the average or mean or expectation value of x is given by:9

x̄ = (1/N) Σ_i x_i N_i = Σ_i x_i (N_i/N) = Σ_i x_i p_i = Σ_i x_i f(x_i).   (11.9)

For example, if we throw a die, the mean or expectation value is 1 × (1/6) + 2 × (1/6) + 3 × (1/6) +
4 × (1/6) + 5 × (1/6) + 6 × (1/6) = 3.5. Interestingly, the mean or expectation value need not be
one of the possible outcomes. You never expect a 3.5 when you throw a die. The
term expectation value should just be interpreted as average.
The spread of data is a measure of how much the individual values x_i differ from the
mean value x̄. If we just calculate the difference ∆x_i = (x_i − x̄), some values will
be positive and some will be negative, such that when we calculate the average of
all ∆x_i, we may get zero. Instead, we calculate the average of ∆x_i², which is termed the
variance or dispersion,

Var(x) = ⟨∆x²⟩ = Σ_i ∆x_i² f(x_i) = Σ_i (x_i − x̄)² f(x_i).   (11.10)
9. This is represented by the symbol x̄ or µ or ⟨x⟩.



Figure 11.4: (a) Bar chart of the probability function shown in Fig. 11.3(b). The height
of each bar is proportional to the value of the data point and the width of each bar
is equal to 1. (b) The width of each bar is halved and additional points are added. We can
keep halving the width, until the points start touching each other (in the limit
of bar width → 0) and we get a continuous line, which represents a continuous
probability distribution function. (c) In this case, the probability is given by the area
under the curve, ∫_a^b f(x) dx.

The expression shown above can further be written as,10

⟨∆x²⟩ = Σ_i [x_i² f(x_i) − 2x̄ x_i f(x_i) + x̄² f(x_i)] = ⟨x²⟩ − 2x̄² + x̄² = ⟨x²⟩ − x̄².   (11.11)

The square root of the variance, ∆x = √⟨∆x²⟩, is termed the standard deviation.11
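Both Eq. 11.10 and the shortcut of Eq. 11.11 are easy to check numerically for the two-dice sum (a sketch using exact fractions):

```python
# Mean and variance of the two-dice sum from its probability function.
from itertools import product
from collections import Counter
from fractions import Fraction

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
f = {x: Fraction(n, 36) for x, n in counts.items()}

mean = sum(x * p for x, p in f.items())
var = sum((x - mean) ** 2 * p for x, p in f.items())          # Eq. 11.10
var_shortcut = sum(x**2 * p for x, p in f.items()) - mean**2  # Eq. 11.11
assert mean == 7 and var == var_shortcut == Fraction(35, 6)
```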

11.3.2 Cumulative distribution function


Now, let us ask the following question: what would be the probability that the
sum of two dice is less than or equal to 5? The answer is 1/36 + 2/36 + 3/36 + 4/36 = 10/36.
Similarly, we can calculate the probability that the sum of two dice
is less than or equal to any given number. This is plotted in Fig. 11.3(c) and the
function is known as the cumulative probability distribution function, defined as,

F(x_i) = Σ_{x_j ≤ x_i} f(x_j),   (11.12)

where F (xi ) is the probability that x ≤ xi .
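The cumulative distribution of Eq. 11.12 is a running sum of f(x); `itertools.accumulate` does this in one line (a sketch):

```python
# F(x): cumulative probability that the two-dice sum is <= x.
from itertools import product, accumulate
from collections import Counter
from fractions import Fraction

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
xs = sorted(counts)
F = dict(zip(xs, accumulate(Fraction(counts[x], 36) for x in xs)))

assert F[5] == Fraction(10, 36)  # probability that the sum is <= 5
assert F[12] == 1                # the distribution exhausts the sample space
```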

11.3.3 Exercise
1. Let the random variable x be the number of heads when three coins are
tossed. Make a table of x and the probability function f (x) = p.

10. Since f(x_i) = p_i is a probability function, it satisfies Σ_i f(x_i) = 1.
11. This is also represented by the symbol σ.


11.4 Continuous probability functions


Let us represent the probability distribution function shown in Fig. 11.3(b) in a
different way. Around each point, we draw a bar of constant width (1 in this case),
as shown in Fig. 11.4(a). Then, we can represent the probability as the area of
the rectangle, i.e., pi = f (xi ) × 1.12 Now, let us halve the width, as shown in
Fig. 11.4(b). In this process we will get additional points in-between the original
points we started with, and the probability is given by the area of the bar p_i = f(x_i) ×
0.5. We can keep halving the width and keep generating additional points, with
probability given by the area of the bar pi = f (xi ) × ∆x, ∆x being the width of the
bar. In the limit ∆x → 0, we get a continuous probability distribution function,
which can be represented by some smooth curve f (x),13 as shown in Fig. 11.4(c).
Now, let us ask the following question: what is the probability that the random
variable lies within some interval a to b? Remember that, in the case of the bar-diagram
representation, the probability was given by the area of the bar. So, if we have to
find the probability in some interval, we just have to find the sum of the areas of all
the bars lying within the specified interval. Similarly, in the case of a continuous
probability distribution, the probability is given by the area under the curve, ∫_a^b f(x) dx,14
as shown in Fig. 11.4(c). This is the only change between the discrete
and continuous probability distribution functions: the sum has to be replaced by an
integral.

11.4.1 Mean value and variance


As discussed in the previous section, let us replace the sums with integrals. The
average is calculated as,

x̄ = ∫_{−∞}^{∞} x f(x) dx.   (11.13)
Compare with Eq. 11.9 and note that the sum is replaced by an integral. Similarly,
the variance or dispersion is calculated as,

Var(x) = ⟨∆x²⟩ = σ_x² = ∫_{−∞}^{∞} (x − x̄)² f(x) dx.   (11.14)

Compare with Eq. 11.10 and note that the sum is replaced by an integral. The
cumulative probability function is given by,

F(x) = ∫_{−∞}^{x} f(u) du.   (11.15)

Compare with Eq. 11.12 and note that the sum is replaced by an integral. It is
obvious that F (∞) = 1.
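Eqs. 11.13-11.15 can be checked numerically for any simple density; the sketch below uses the uniform density f(x) = 1 on [0, 1] (exact mean 1/2, variance 1/12) and a plain midpoint-rule integrator, so no external libraries are assumed:

```python
# Numerical check of the continuous mean and variance formulas.
def integrate(g, a, b, n=100000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 1.0  # uniform density on [0, 1]; zero elsewhere
mean = integrate(lambda x: x * f(x), 0, 1)               # Eq. 11.13
var = integrate(lambda x: (x - mean) ** 2 * f(x), 0, 1)  # Eq. 11.14
assert abs(mean - 0.5) < 1e-6 and abs(var - 1/12) < 1e-6
```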
12. If we add all the areas, we should get 1.
13. Instead of a discrete random variable x_i, we now have a continuous random variable x.
14. Obviously, ∫_{−∞}^{∞} f(x) dx = 1.


11.5 Binomial distribution


Let us do an experiment of tossing a coin 5 times in succession. There can
be several outcomes like HTHTH, HTTTH etc. The total number of possible outcomes
(the sample space) is 2^5.15 Thus, the probability of one of the possible outcomes
(like HHHTT) is 1/2^5.
Now let us find out the probability that 3 out of 5 outcomes
are H. We have to count all the outcomes like HHHTT, HTHTH, TTHHH etc.16
Thus, H needs to be distributed in 3 different slots, like 123, 135, 345 etc.17
Hence, this is a problem of selecting 3 numbers from a set of 5 (1,2,3,4,5), when
repetition is not allowed and order does not matter, and the answer is C(5, 3).
Therefore, if we toss a coin 5 times in succession, there are in total 2^5 possible
outcomes, out of which there are C(5, 3) outcomes with 3 H, and each of the outcomes
with 3 H has a probability of 1/2^5. Hence, the probability of 3 H in 5 tosses is equal
to C(5, 3) × 1/2^5. Now we can define a probability density function, which gives the
probability of r heads in n successive tosses as,

f(r) = C(n, r) (1/2)^n.   (11.16)

What we just discussed is an example of successive independent experiments,
each trial having two possible outcomes, like head (H) and tail (T) in the case of a coin
toss. Let us consider a practical example. Products manufactured in a factory can
either be perfect or defective. What would be the probability that r out of n products
are perfect?18 Unlike the coin toss, where H and T have equal probability, it is
very unlikely that the probabilities of a product being perfect and defective are equal.
Let us assume that the probabilities of a product being perfect and defective are p and q,
respectively.19 Now, we generalize the above formula as,

f(r) = C(n, r) p^r q^(n−r).   (11.17)

A related question would be the probability of not more than r perfect products
out of n.20
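Eq. 11.17 translates directly into a small function; in the sketch below, the n = 20 trials and the p = 0.9 "perfect product" rate are illustrative assumptions, not values from the text:

```python
# The binomial probability of Eq. 11.17.
from math import comb

def binom_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials (Eq. 11.17)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# 3 heads in 5 tosses of a fair coin: C(5,3)/2**5
assert binom_pmf(3, 5, 0.5) == comb(5, 3) / 2**5
# the probabilities over all r add up to 1
assert abs(sum(binom_pmf(r, 20, 0.9) for r in range(21)) - 1) < 1e-12
# "not more than r perfect products" is a cumulative sum over 0..r
p_at_most_17 = sum(binom_pmf(r, 20, 0.9) for r in range(18))
assert 0 < p_at_most_17 < 1
```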

11.6 Normal distribution

11.7 Poisson distribution

15. There are 2 possible outcomes of the 1st toss, 2 possible outcomes of the 2nd toss and so on.
16. If it is one of the specific outcomes like HTTHH, then the answer is 1/2^5.
17. Equivalently, T needs to be distributed in 2 different slots, like 45, 24, 12 etc.
18. Or equivalently, n − r out of n products are defective.
19. q = 1 − p.
20. Equivalently, less than or equal to r perfect products out of n.

Appendix A

Introduction to Partial
Differential Equations

In this chapter, we shall learn about the basics of partial differential equations.

A.1 Classification of PDEs


We already know that PDEs can be linear or non-linear. We shall further learn in
this chapter that non-linearity can take different forms, some more complicated
than others. Let us first learn how to express PDEs using a compact notation,
starting with linear PDEs.1

A.1.1 Linear PDE


We start with a simple linear PDE of second order,

∂²f/∂x² + ∂²f/∂x∂y + ∂²f/∂y² + ∂f/∂x + ∂f/∂y + f + u(x, y) = 0,   (A.1)

where u(x, y) is a known function. We can rewrite the above equation as,

D^(2,0)f + D^(1,1)f + D^(0,2)f + D^(1,0)f + D^(0,1)f + D^0 f + u = 0,   (A.2)

where the first three (second-order) terms together constitute D²f and the next two (first-order) terms constitute D¹f.

The term D²f is a (2 × 2) matrix,

D²f = ( D^(2,0)f  D^(1,1)f )   ( f_xx  f_xy )   ( ∂²f/∂x²   ∂²f/∂x∂y )
      ( D^(1,1)f  D^(0,2)f ) = ( f_yx  f_yy ) = ( ∂²f/∂y∂x  ∂²f/∂y²  ).   (A.3)

Similarly, the term D¹f is a vector (the gradient of f),

D¹f = ∇f = (D^(1,0)f, D^(0,1)f) = (∂f/∂x, ∂f/∂y).   (A.4)
1. We need this notation for classifying non-linear PDEs, which have many sub-classes.


Now, we express Eq. A.2 in a compact form like,

Σ_{α=0}^{2} a_α D^α f + u = 0,   (A.5)

where all the coefficients a_α = 1. In general, for a PDE to be linear, a_α can be any
function of the independent variables, x = (x, y, z, · · ·), and we can write a k-th order
linear PDE as,

Σ_{α=0}^{k} a_α(x) D^α f + u(x) = 0.   (A.6)

A.1.2 Non-linear PDE


Non-linearity can be introduced in Eq. A.6 in various ways:

• The coefficients depend on f or some derivative of f, like a_α(x, f), a_α(x, ∂f/∂x) etc.

• The derivatives appear non-linearly, like (∂f/∂x)², (∂f/∂x)(∂²f/∂y²) etc.

• u is a function of f , i.e., u(x, f ).

Let us learn about several sub-classes of non-linear PDEs. Non-linear PDEs can
appear very complicated. Fortunately, their classification is based only on the
highest order terms, and the rest of the lower order terms can be ignored for classifying
non-linear PDEs.

Semi-linear PDE
If the highest order term is linear, we call it a semi-linear PDE. Let us first separate
the highest order term from the rest of the terms and write a k-th order PDE as,

a_k(x) D^k f + a(D^(k−1)f, · · ·, Df, f, x) = 0,   (A.7)
  (linear)      (non-linear)

where a is a function which contains anything other than the highest order term.
Note that, we have to care only about the linearity of the highest order term and
the rest can contain non-linearity of any form. For example, the equation

x f_xx + y f_yy + f f_y = 0,   (A.8)

is a semi-linear equation, because it is a second-order equation and the highest
order terms (f_xx and f_yy) are linear. The non-linearity occurs in the first-order
term (f f_y).

Quasi-linear PDE
In this case, the highest order term is also non-linear. However, the coefficient of
the highest order term contains terms one order less than the highest order. For


[Surface plots of f(x, y) = (x − y)² and f(x, y) = sin(x − y)]

Figure A.1: Graphs of f(x, y): two possible solutions of Eq. A.15, plotted assuming
a = b = 1. Contour lines or level curves are shown at the base of the
plots. Along the contour lines, f(x, y) is constant. The contour lines are parallel to
y = x − constant.

example, a k-th order quasi-linear PDE has the general form,

a_k(x, D^(k−1)f, · · ·, Df, f) D^k f + a(D^(k−1)f, · · ·, Df, f, x) = 0,   (A.9)
  (non-linear)                   (non-linear)

Note that a_k can be a function of x, D^(k−1)f, · · ·, Df, f, but not a function of D^k f.


For example, the equation

f_x f_xx + f_y f_yy + f_x + f_y + f = 0,   (A.10)

is a quasi-linear equation, because it is a second-order equation and the highest
order terms are non-linear. However, the highest order terms (f_xx and f_yy) have
coefficients which are one order less (f_x and f_y).

Fully non-linear PDE


If a k-th order PDE contains a term like (D^k f)^n, where n > 1, then it is a fully non-linear PDE. For example,

f_xx² + 2f_xy + f_yy² + f_x + f_y = 0   (A.11)

is a fully non-linear equation, because it is a second-order equation and the
highest order terms (f_xx and f_yy) are quadratic.

A.1.3 Exercise

A.2 Method of characteristics


A.2.1 General solution
Let us solve the following first order and linear PDE,

a(x, y)fx + b(x, y)fy = c(x, y). (A.12)


Since f(x, y) is a solution, the graph of f(x, y)2 is going to be a smooth surface S.3 We
get a nice geometric interpretation if we rewrite the equation as a "dot product",

[a(x, y), b(x, y), c(x, y)] · [f_x(x, y), f_y(x, y), −1] = 0.   (A.13)
        (tangent)                    (normal)

Do you recognize that the second vector is normal to the surface S at (x, y, f(x, y))
or (x, y, z)?4 Thus, the first vector lies on the plane tangent to the surface S at
(x, y, f(x, y)) or (x, y, z). Based on this information, if we can construct the surface
S, then we get the solution f(x, y).5
Let C(s) = (x(s), y(s), z(s)) be a parameterized curve on the surface S. The tangent
to the curve, C′(s) = (x′(s), y′(s), z′(s)), should lie on the tangent plane, such that,

x′(s) = dx/ds = a(x(s), y(s)),   (A.14)
y′(s) = dy/ds = b(x(s), y(s)),
z′(s) = dz/ds = c(x(s), y(s)).
The above system of ODEs is called the characteristic equations of the PDE
given in Eq. A.12, and C(s) is called the integral curve of the PDE. If we get all
such curves, then we can construct the surface S.
Example 1: Homogeneous equations are the simplest ones to solve. So, let us
start with,

a f_x + b f_y = 0,   (A.15)
where a, b are constants. The characteristic equations and their solutions are,

dx/ds = a ⇒ x = as + c_1,   (A.16)
dy/ds = b ⇒ y = bs + c_2,
dz/ds = 0 ⇒ z = c_3.
Eliminating s, we can write the integral curves as,

bx − ay = c4 & z = c3 . (A.17)

Thus, z = f (x, y) is constant along the lines bx − ay = c4 . If we draw all such lines,
then we can construct the surface S. Thus, we can write the general solution of
Eq. A.15 as,
f (x, y) = u(bx − ay). (A.18)
2. The graph of f(x, y) is the set of points (x, y, f(x, y)), where z = f(x, y) is the height of the graph at the point (x, y). Try to plot the graph of f(x, y) = x² + y² and see how the surface looks.
3. Such that we can draw a tangent plane at every point.
4. Let us write F(x, y, z) = f(x, y) − z = 0 and ∇F = (f_x, f_y, −1).
5. Because S is the graph of f(x, y).



Figure A.2: Solution of Eq. A.15 (a = b = 1): f (x, y) = u(x − y) = constant and initial
condition is given along the x−axis as f (x, 0) = g(x). Values on the x−axis will
be “carried” or “transported” along the straight lines, because f (x, y) is constant
along the lines. Thus, the solution can be written as, f (x, y) = g(x − y).

You must keep in mind that, while f is a function of the two variables (x, y) ∈ R^2,
u is a function of the single variable bx − ay ∈ R. We can verify,

fx = bu', fy = −au' ⇒ afx + bfy = abu' − abu' = 0. (A.19)

Since u is an arbitrary function, there exist infinitely many solutions.
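We can also check this numerically. The sketch below (not from the text; the sample points, the constants a, b and the step size h are arbitrary choices of mine) approximates fx and fy by central differences for several choices of u and confirms that afx + bfy vanishes:

```python
import math

# Central-difference check that f(x, y) = u(bx - ay) satisfies
# a*f_x + b*f_y = 0 for several choices of the arbitrary function u.
a, b, h = 2.0, 3.0, 1e-5

def check(u):
    f = lambda x, y: u(b * x - a * y)
    for (x, y) in [(0.3, -0.7), (1.2, 0.5), (-2.0, 1.1)]:
        fx = (f(x + h, y) - f(x - h, y)) / (2 * h)   # approximate f_x
        fy = (f(x, y + h) - f(x, y - h)) / (2 * h)   # approximate f_y
        assert abs(a * fx + b * fy) < 1e-6
    return True

for u in (lambda s: s, lambda s: s**2, math.sin, math.exp):
    check(u)
print("a*f_x + b*f_y = 0 holds for every sampled u")
```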

A.2.2 Initial condition


Assuming a = b = 1, some specific examples satisfying Eq. A.15 are: f(x, y) =
x − y, f(x, y) = (x − y)^2, f(x, y) = sin(x − y) and f(x, y) = e^(x−y). Some examples
are illustrated in Figure A.1. Note that the contour lines, or level curves, shown
in the figure are the integral curves given in Eq. A.17. You may wonder how we
can get a specific f(x, y) starting from Eq. A.18.
Let us set our initial conditions along the x−axis, for example,

f (x, 0) = g(x), (A.20)

and try to solve Eq. A.15. Without solving anything, we can predict that the value
of the function on the x−axis will be “carried” or “transported” along the contour
lines bx − ay = constant (see Figure A.2). Based on this fact, convince yourself
that, if we take the initial condition (assuming a = b = 1) to be,

f(x, 0) = g(x) = x^2 or f(x, 0) = g(x) = sin(x), (A.21)

then we get the particular solutions plotted in Figure A.1. Thus, a particular
solution to Eq. A.15 is obtained from the initial condition g(x) given in Eq. A.20


and can be written (for b ≠ 0) as,

f(x, y) = g(x − (a/b) y). (A.22)

For a = b = 1 this reduces to f(x, y) = g(x − y), consistent with Figure A.2.
Whatever we have discussed so far works for homogeneous equations (c = 0); their
general solutions are constant along certain straight lines or curves, and take
forms like g(x + y), g(x^2 + y^2), etc.
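For a = b = 1 the particular solution is f(x, y) = g(x − y). The sketch below (choices of g and sample points are mine) checks numerically that this function matches the initial condition and is indeed constant along the lines x − y = constant:

```python
import math

# For a = b = 1 the particular solution is f(x, y) = g(x - y).
g = math.sin
f = lambda x, y: g(x - y)

# Initial condition: f(x, 0) = g(x).
for x in (-1.0, 0.0, 2.5):
    assert f(x, 0.0) == g(x)

# Transport: f is constant along each line x - y = s.
for s in (0.0, 1.0, -0.7):
    vals = [f(s + y, y) for y in (0.0, 0.5, 2.0)]   # points on x - y = s
    assert max(vals) - min(vals) < 1e-12

print("initial data g is transported unchanged along x - y = constant")
```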
Now, let us take a non-homogeneous, first order, linear PDE with a given initial
condition,

a(x, y)fx + b(x, y)fy = c(x, y, f),   (A.23)
f(x, 0) = g(x),

and try to solve it. We must find the surface S, which is the graph of f, and
contains the data curve,
Γ(r) = (r, 0, g(r)). (A.24)
Note that, the data curve has been parameterized by r and constructed from the
initial condition. The integral curves C(r, s) = (x(r, s), y(r, s), z(r, s)), also lying on
S, originate from the data curve Γ(r) (see Figure A.3), and satisfy,

x'(r, s) = dx/ds = a(x(r, s), y(r, s)),   (A.25)
y'(r, s) = dy/ds = b(x(r, s), y(r, s)),
z'(r, s) = dz/ds = c(x(r, s), y(r, s), z(r, s)).
This is the same as before, with the additional restriction that the integral
curves must originate from the data curve Γ(r). Note that the integral curves are
parameterized by r and s. Since s = 0 on the data curve, we can write
x(r, 0) = r, y(r, 0) = 0, z(r, 0) = g(r), and we have to solve,

x'(r, s) = ∂x/∂s = a(x(r, s), y(r, s));  x(r, 0) = r,   (A.26)
y'(r, s) = ∂y/∂s = b(x(r, s), y(r, s));  y(r, 0) = 0,
z'(r, s) = ∂z/∂s = c(x(r, s), y(r, s), z(r, s));  z(r, 0) = g(r).
Example 2: Let us solve the following linear, first order, non-homogeneous
PDE,

fx + fy = f, (A.27)
f (x, 0) = cos x.

The data curve is Γ(r) = (r, 0, cos r). The characteristic equations, with initial



Figure A.3: The data curve (solid line) is given along the x−axis: f (x, 0) = g(x).
The characteristic curves (dashed lines) originate from the data curve.

conditions are,

x'(r, s) = 1;  x(r, 0) = r,   (A.28)
y'(r, s) = 1;  y(r, 0) = 0,
z'(r, s) = z;  z(r, 0) = cos r.

Integrating and using the initial conditions, we can write,

x(r, s) = s + φ(r) ⇒ x(r, s) = s + r,   (A.29)
y(r, s) = s + φ(r) ⇒ y(r, s) = s,
ln|z(r, s)| = s + ln|φ(r)| ⇒ z(r, s) = e^s cos r.

Thus, eliminating s and r, we can write the solution as,

f(x, y) = e^y cos(x − y). (A.30)

It is left as an exercise for you to verify that f (x, y) satisfies Eq. A.27.
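One way to carry out that check is symbolically. The sketch below (an optional aid using sympy, not part of the derivation) confirms both the PDE and the initial condition:

```python
import sympy as sp

# Symbolic check that f(x, y) = e^y cos(x - y) satisfies
# f_x + f_y = f (Eq. A.27) and f(x, 0) = cos(x).
x, y = sp.symbols('x y')
f = sp.exp(y) * sp.cos(x - y)

residual = sp.simplify(sp.diff(f, x) + sp.diff(f, y) - f)
print(residual)        # 0: the PDE is satisfied
print(f.subs(y, 0))    # cos(x): the initial condition is satisfied
```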
Example 3: Let us now solve a quasi-linear equation, given by,

f fx + fy = 0, (A.31)
f (x, 0) = x.

The data curve is given by Γ(r) = (r, 0, r). The characteristic equations, with initial


Figure A.4: (Left) Color map with contour lines for f(x, y) = x/(y + 1) = r.
(Right) Contour lines along x = r(y + 1) for r = 0.2, 0.4, 0.6, 0.8, 1.0;
f(x, y) = constant = r along these lines. Clearly, there exists a singularity at
the point (0, −1).

conditions are,

x'(r, s) = z;  x(r, 0) = r,   (A.32)
y'(r, s) = 1;  y(r, 0) = 0,
z'(r, s) = 0;  z(r, 0) = r.

Integrating and using the initial conditions, we can write,

z(r, s) = φ(r) ⇒ z(r, s) = r,   (A.33)
y(r, s) = s + φ(r) ⇒ y(r, s) = s,
x(r, s) = rs + φ(r) ⇒ x(r, s) = rs + r = r(y + 1).

Eliminating r and s, we can write the solution as,

z = f(x, y) = x/(y + 1). (A.34)

Thus, f(x, y) = constant = r along the lines x = r(y + 1). Let us assume that r ∈ [0, 1].
Then x lies in a region bounded by the y−axis (when r = 0) and the line x = y + 1
(when r = 1). Along the y−axis, f(x, y) = 0, and along the line y = x − 1, f(x, y) = 1.
A few more contour lines, or level curves, are shown in Figure A.4. As shown in the
figure, f(x, y) = constant = r along the contour lines. However, all the lines emerge
from the point (0, −1). How can this be possible? Clearly, there exists a singularity
at the point (0, −1).
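Both claims can be confirmed with a short symbolic computation (a sketch using sympy; not part of the text):

```python
import sympy as sp

# Check that f(x, y) = x/(y + 1) satisfies the quasi-linear PDE
# f*f_x + f_y = 0 (Eq. A.31), and that every characteristic line
# x = r(y + 1) passes through the singular point (0, -1).
x, y, r = sp.symbols('x y r')
f = x / (y + 1)

residual = sp.simplify(f * sp.diff(f, x) + sp.diff(f, y))
print(residual)                      # 0: the PDE is satisfied

# At y = -1, the line x = r(y + 1) gives x = 0 for every r,
# so all characteristic lines meet at (0, -1).
print((r * (y + 1)).subs(y, -1))     # 0
```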

A.2.3 Exercise
1. Solve yfx − xfy = 0. Make a diagram like Figure A.2.

2. Solve the above equation with initial condition f (x, 0) = sin x.

3. What is the difference between Figure A.2 and Figure A.3?


Answer: In the case of a homogeneous, linear, first order PDE, the solution is
f(x, y) = constant along straight lines (or some curves). Thus, if we know the
value of f(x, 0) = g(x), we know the value of f(x, y), as shown in Figure A.2.
If the PDE is non-homogeneous, we need to solve for f(x, y), originating from
the data curve f(x, 0) = g(x), as shown in Figure A.3.

A.3 Canonical form


A general form of a second order, linear PDE is,

a(x, y)fxx + b(x, y)fxy + c(x, y)fyy + d(x, y)fx + e(x, y)fy + g(x, y)f + h(x, y) = 0. (A.35)

We can rewrite the above equation as,

a(x, y)fxx + b(x, y)fxy + c(x, y)fyy + L[fx , fy , f ] + h(x, y) = 0. (A.36)

It is classified into one of three categories depending on the value of ∆ = b^2 − 4ac:

• Parabolic, if ∆ = 0.

• Hyperbolic, if ∆ > 0.

• Elliptic, if ∆ < 0.
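This classification rule is easy to encode. The helper below is a sketch (the function name and the test cases are mine, not from the text):

```python
# Classify a second order linear PDE a*f_xx + b*f_xy + c*f_yy + ... = 0
# by the sign of the discriminant b^2 - 4ac.
def classify(a, b, c):
    disc = b * b - 4 * a * c
    if disc > 0:
        return "hyperbolic"
    if disc < 0:
        return "elliptic"
    return "parabolic"

print(classify(1, 0, 1))    # Laplace equation f_xx + f_yy = 0: elliptic
print(classify(-4, 0, 1))   # wave equation f_tt - 4 f_xx = 0: hyperbolic
print(classify(1, 0, 0))    # heat equation f_xx - f_t = 0: parabolic
```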

Let us change the independent variables from (x, y) ⇒ (θ(x, y), η(x, y)). Let us
define a new function,

f (x(θ, η), y(θ, η)) = u(θ(x, y), η(x, y)). (A.37)

Applying chain rule to get fx , fy , fxx , fyy , fxy , we can write,

A(θ, η)uθθ + B(θ, η)uθη + C(θ, η)uηη + L[uθ , uη , u] + H(θ, η) = 0, (A.38)

where,

A = a θx^2 + b θx θy + c θy^2,   (A.39)
B = 2a θx ηx + b(θx ηy + θy ηx) + 2c θy ηy,
C = a ηx^2 + b ηx ηy + c ηy^2.

It can be proved that ∆̃ = B^2 − 4AC = ∆J^2, where J = ∂(θ, η)/∂(x, y) is the
Jacobian.6 This proves that a change of variables does not change the category
of the PDE. Using Eq. A.39, we can get the canonical forms of the second order
linear PDEs.

A.3.1 Hyperbolic equation


The canonical form7 of the hyperbolic equation is,

uθη + L[uθ , uη , u] + H(θ, η) = 0. (A.40)


6 Note that J must not be equal to zero.
7 You can think of it as the simplest form.

163
A.4. BOUNDARY CONDITIONS

We shall see that any hyperbolic equation of the form Eq. A.35 can be transformed
to the canonical form (Eq. A.40) by a suitable change of variables. We need to find
θ(x, y) and η(x, y) such that,

A(θ, η) = a θx^2 + b θx θy + c θy^2 = 0,   (A.41)
C(θ, η) = a ηx^2 + b ηx ηy + c ηy^2 = 0.

Note that θ(x, y) and η(x, y) are roots of the same equation. Up to an overall
factor of 1/a, we can rewrite it as a product of two factors (call them I and II),

[a θx + (b/2 + √(b^2/4 − ac)) θy] [a θx + (b/2 − √(b^2/4 − ac)) θy] = 0. (A.42)

Now we have two first order linear PDEs, and we know how to solve them using the
method of characteristics. For factor I, the characteristic equations are,

dx/ds = a,   (A.43)
dy/ds = b/2 + √(b^2/4 − ac),
dθ/ds = 0.
Thus, θ(x, y) is constant along the characteristic curves, which are given by,

dy/dx = (dy/ds)/(dx/ds) = (b/2 + √(b^2/4 − ac))/a. (A.44)

Similarly, η(x, y) is constant along its characteristic curves, which are given by,

dy/dx = (dy/ds)/(dx/ds) = (b/2 − √(b^2/4 − ac))/a. (A.45)
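As a quick check (a sympy sketch; the symbol c0 and the substitution are mine), applying these slope formulas to the wave equation ftt − c0^2 fxx = 0, written in the form of Eq. A.35 with a = −c0^2, b = 0, c = 1 and y playing the role of t, recovers the familiar characteristics x ± c0 t = constant:

```python
import sympy as sp

# Characteristic slopes dy/dx = (b/2 +- sqrt(b^2/4 - ac))/a (Eqs. A.44, A.45),
# evaluated for the wave equation f_tt - c0^2 f_xx = 0 (a = -c0^2, b = 0, c = 1).
a, b, c = sp.symbols('a b c')
c0 = sp.symbols('c0', positive=True)

slope_plus = (b / 2 + sp.sqrt(b**2 / 4 - a * c)) / a
slope_minus = (b / 2 - sp.sqrt(b**2 / 4 - a * c)) / a

subs = {a: -c0**2, b: 0, c: 1}
print(sp.simplify(slope_plus.subs(subs)), sp.simplify(slope_minus.subs(subs)))
# -1/c0 and 1/c0: the characteristics are the lines x +- c0*t = constant
```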

A.4 Boundary conditions


The function f(x, y) satisfies some PDE in a given domain Ω. We can call it a
solution if it also satisfies one of the three following conditions along the
domain boundary ∂Ω. According to the boundary condition, we classify the problem
as a Dirichlet boundary value problem (BVP) if,

(f )∂Ω = u, (A.46)

a Neumann BVP if,

(∂f/∂η)∂Ω = v,   (A.47)

and a mixed BVP if,

(αf + β ∂f/∂η)∂Ω = w,   (A.48)

where u, v, w are some given functions, and ∂f/∂η denotes the derivative along
the outward normal to ∂Ω.

A.5 Elliptic PDE: Laplace equation


∆f = ∇^2 f = fxx + fyy = 0. (A.49)

The above equation is defined in some domain Ω, which is smooth and bounded
in R^2.8

Solutions: We already know that any harmonic function (examples: x, y, x^2 − y^2,
etc.) satisfies the Laplace equation. Now, there are infinitely many harmonic
functions. Which one do we choose as the solution? The solution is the harmonic
function that also satisfies the boundary conditions. Let us take the Dirichlet
BVP and try to understand the nature of well behaved solutions.
(a) Uniqueness of solution: If f1 and f2 satisfy Eq. A.49 with the same boundary
data, then f1 ≡ f2.
(b) Continuous dependence on boundary data: If

∆f1 = 0 & f1 = u1 along ∂Ω, (A.50)


∆f2 = 0 & f2 = u2 along ∂Ω,

then,
||u1 − u2 || < δ ⇒ ||f1 − f2 || < ε. (A.51)
This implies that, if the boundary condition changes by a small amount, the
solution also changes by a small amount.9

Harmonic functions and maximum principle: We know that f is a harmonic
function if it satisfies ∆f = 0 in a domain Ω. Then, the maximum of f is achieved
on the boundary ∂Ω.
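The maximum principle can be observed numerically. The sketch below (the grid size, boundary data, and number of sweeps are arbitrary choices of mine) solves the Laplace equation on the unit square by Jacobi relaxation with Dirichlet data and checks that the interior never exceeds the boundary maximum:

```python
# Jacobi relaxation for the Laplace equation on [0,1] x [0,1] with
# Dirichlet data g on the boundary; each interior value is repeatedly
# replaced by the average of its four neighbours.
n = 21
g = lambda x, y: x * x - y * y       # boundary data (itself harmonic)
h = 1.0 / (n - 1)
f = [[g(i * h, j * h) for j in range(n)] for i in range(n)]

for _ in range(500):                 # Jacobi sweeps on interior points only
    new = [row[:] for row in f]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (f[i+1][j] + f[i-1][j] + f[i][j+1] + f[i][j-1])
    f = new

interior_max = max(f[i][j] for i in range(1, n - 1) for j in range(1, n - 1))
boundary_max = max(max(f[0]), max(f[-1]),
                   max(f[i][0] for i in range(n)),
                   max(f[i][-1] for i in range(n)))
print(interior_max < boundary_max)   # True: the maximum sits on the boundary
```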

Poisson equation: If we add a non-homogeneous term to the Laplace equation,
we get the Poisson equation,

∆f = ∇^2 f = fxx + fyy = g(x, y). (A.52)

A.6 Hyperbolic PDE: Wave equation


A.6.1 Homogeneous wave equation
The one dimensional, homogeneous wave equation is,

ftt − c2 fxx = 0, (A.53)


8 Generalization to R^n is straightforward.
9 If the solution changes drastically due to a small change in the boundary
condition, obviously something is wrong!


where x ∈ R and t ∈ (0, ∞). The initial conditions are,

f (x, 0) = g(x), (A.54)


ft (x, 0) = h(x).

From Eq. A.44 and Eq. A.45, the characteristic curves are,

dt/dx = ±1/c ⇒ x ± ct = constant. (A.55)
Let us define the following change of variable,

θ(x, t) = x + ct and η(x, t) = x − ct, (A.56)

and assume that f(x, t) = u(θ(x, t), η(x, t)). Applying the chain rule, it can be
shown that,

fxx = uθθ + 2uθη + uηη,   (A.57)
ftt = c^2 (uθθ − 2uθη + uηη).

Substituting in Eq. A.53, we get the canonical form of 1D wave equation and its
solution as,
uθη = 0 ⇒ u(θ, η) = φ(θ) + ψ(η). (A.58)
Thus, the solution of 1D wave equation is,

f(x, t) = φ(x + ct) + ψ(x − ct). (A.59)
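This form can be verified symbolically for arbitrary φ and ψ. A sketch (using sympy; an optional aid, not part of the derivation):

```python
import sympy as sp

# Check that f(x, t) = phi(x + c t) + psi(x - c t) satisfies
# f_tt - c^2 f_xx = 0 for arbitrary functions phi and psi (Eq. A.53).
x, t, c = sp.symbols('x t c')
phi, psi = sp.Function('phi'), sp.Function('psi')
f = phi(x + c * t) + psi(x - c * t)

residual = sp.simplify(sp.diff(f, t, 2) - c**2 * sp.diff(f, x, 2))
print(residual)   # 0
```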

It is left as an exercise to verify that the above equation satisfies Eq. A.53.
Applying the initial conditions,

f(x, 0) = g(x) = φ(x) + ψ(x),   (A.60)
ft(x, 0) = h(x) = cφ'(x) − cψ'(x).

Integrating the last equation, we can write,

φ(x) − ψ(x) = (1/c) ∫_{x0}^{x} h(τ) dτ + constant. (A.61)

Solving for φ and ψ,

φ(x) = (1/2) g(x) + (1/2c) ∫_{x0}^{x} h(τ) dτ + constant/2,   (A.62)
ψ(x) = (1/2) g(x) − (1/2c) ∫_{x0}^{x} h(τ) dτ − constant/2.


Thus, we can write,


φ(x + ct) = (1/2) g(x + ct) + (1/2c) ∫_{x0}^{x+ct} h(τ) dτ + constant/2,   (A.63)
ψ(x − ct) = (1/2) g(x − ct) − (1/2c) ∫_{x0}^{x−ct} h(τ) dτ − constant/2.

D'Alembert's solution: Combining the above equations, we can write the solution
of the 1D wave equation as,

f(x, t) = (1/2)[g(x + ct) + g(x − ct)] + (1/2c) ∫_{x−ct}^{x+ct} h(τ) dτ. (A.64)

Thus, at the point (x0, t0), the solution of the 1D wave equation is,

f(x0, t0) = (1/2)[g(x0 + ct0) + g(x0 − ct0)] + (1/2c) ∫_{x0−ct0}^{x0+ct0} h(τ) dτ. (A.65)

A.6.2 Non-homogeneous wave equation


The one dimensional, non-homogeneous wave equation is,

fxx − ftt = p(x, t),   (A.66)
f(x, 0) = g(x),
ft(x, 0) = h(x).

The solution is,

f(x, t) = (1/2)[g(x + t) + g(x − t)] + (1/2) ∫_{x−t}^{x+t} h(τ) dτ − (1/2) ∬_R p dR, (A.67)

where R is the characteristic triangle with vertices (x − t, 0), (x + t, 0) and (x, t).

This is known as Duhamel’s principle.
