
QUANTUM THEORY

R. M. Potvliege

Contents
1 Introduction 5
1.1 What this course is about . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Revision: The hydrogen atom and the Stern-Gerlach experiment 5
1.3 Quantum states and vector spaces . . . . . . . . . . . . . . . . . 11
1.4 Mathematical note . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2 Vector spaces and Hilbert spaces 12


2.1 What is a vector space? . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Linear combinations . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4 The span of a set of vectors . . . . . . . . . . . . . . . . . . . . . 23
2.5 Linear independence . . . . . . . . . . . . . . . . . . . . . . . . 25
2.6 Dimension of a vector space . . . . . . . . . . . . . . . . . . . . 27
2.7 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.8 Inner product . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.9 Norm of a vector . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.10 Orthogonal vectors . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.11 Hilbert spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.12 Isomorphic vector spaces . . . . . . . . . . . . . . . . . . . . . . 43

3 Operators (I) 46
3.1 Linear operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2 Matrix representation of an operator . . . . . . . . . . . . . . . 49
3.3 Adding and multiplying operators . . . . . . . . . . . . . . . . . 56
3.4 The inverse of an operator . . . . . . . . . . . . . . . . . . . . . 59
3.5 Commutators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.6 Eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . 62
3.7 The adjoint of an operator . . . . . . . . . . . . . . . . . . . . . 64

4 Quantum states and the Dirac notation 66


4.1 Quantum states and ket vectors . . . . . . . . . . . . . . . . . . 66
4.2 Bra vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.3 Operators and the Dirac notation . . . . . . . . . . . . . . . . . 71
4.4 The Principle of Superposition . . . . . . . . . . . . . . . . . . . 73

5 Operators (II) 76
5.1 Hermitian operators . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.2 Projectors and the completeness relation . . . . . . . . . . . . . 80
5.3 Bases of eigenvectors: I. Finite-dimensional spaces . . . . . . . . 84
5.4 Bases of eigenvectors: II. Infinite-dimensional spaces . . . . . . 91

6 Measurements and uncertainties 94


6.1 Probabilities and measurements . . . . . . . . . . . . . . . . . . 94
6.2 Dynamical variables and observables . . . . . . . . . . . . . . . 98
6.3 Expectation value of an observable . . . . . . . . . . . . . . . . 103
6.4 Probability distributions . . . . . . . . . . . . . . . . . . . . . . 104
6.5 Uncertainty relations . . . . . . . . . . . . . . . . . . . . . . . . 106
6.6 The state of the system after a measurement . . . . . . . . . . . 108

7 Wave functions, position and momentum 113
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7.2 The Fourier transform and the Dirac delta function . . . . . . . 114
7.3 Eigenfunctions of the momentum operator . . . . . . . . . . . . 116
7.4 Normalization to a delta function . . . . . . . . . . . . . . . . . 119
7.5 Probability densities . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.6 Eigenfunctions of the position operator . . . . . . . . . . . . . . 123
7.7 The position representation and the momentum representation 124
7.8 The commutator of Q and P . . . . . . . . . . . . . . . . . . . . 127
7.9 Position and momentum operators in 3D space . . . . . . . . . . 127
7.10 Continua of energy levels . . . . . . . . . . . . . . . . . . . . . . 129
7.11 Probabilities and wave functions . . . . . . . . . . . . . . . . . . 132
7.12 The parity operator . . . . . . . . . . . . . . . . . . . . . . . . . 133

8 Quantum harmonic oscillators 136


8.1 The Hamiltonian and the energy levels of a linear harmonic os-
cillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
8.2 The ladder operators . . . . . . . . . . . . . . . . . . . . . . . . 137
8.3 The coherent states of a simple harmonic oscillator . . . . . . . 140

9 Tensor products of Hilbert spaces 143


9.1 Bipartite and multipartite quantum systems . . . . . . . . . . . 143
9.2 Tensor products . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

10 Unitary transformations 157


10.1 Unitary operators . . . . . . . . . . . . . . . . . . . . . . . . . . 157
10.2 Transformed operators . . . . . . . . . . . . . . . . . . . . . . . 158

11 Time evolution 163
11.1 The Schrödinger equation . . . . . . . . . . . . . . . . . . . . . 163
11.2 The evolution operator . . . . . . . . . . . . . . . . . . . . . . . 166
11.3 The Schrödinger picture and the Heisenberg picture . . . . . . . 169
11.4 Constants of motion . . . . . . . . . . . . . . . . . . . . . . . . . 172

12 Rotations and angular momentum 174


12.1 Orbital angular momentum . . . . . . . . . . . . . . . . . . . . . 174
12.2 Spin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
12.3 Rotations and rotation operators . . . . . . . . . . . . . . . . . . 178
12.4 Symmetries and conservation laws . . . . . . . . . . . . . . . . 186
12.5 Angular momentum operators . . . . . . . . . . . . . . . . . . . 188
12.6 Matrix representation of angular momentum operators . . . . . 191
12.7 The Clebsch-Gordan coefficients . . . . . . . . . . . . . . . . . . 193

Supplement to Chapter 12 197


S.1 Translations and momentum . . . . . . . . . . . . . . . . . . . . . 197
S.2 Rotations and orbital angular momentum . . . . . . . . . . . . . . 199
S.3 Commutation relations and angular momentum . . . . . . . . . . 201

Appendix A: Multiplying matrices, column vectors and row vectors 205

Appendix B: Complex numbers 206

1 Introduction

1.1 What this course is about


In brief, this course is a survey of the mathematical apparatus of Quantum Me-
chanics. More specifically, a survey of its general mathematical underpinning
(the mathematical theory of Hilbert spaces) and of its formalism (bra and ket
vectors, operators, matrix representation, wave mechanics and the position and
momentum representations, unitary transformations, the Schrödinger and the
Heisenberg pictures, creation and annihilation operators, rotations and angu-
lar momentum). The course also examines how this mathematical apparatus
relates to what can be observed (i.e., how quantum states and physical observ-
ables are represented in the theory, and how predictions on the outcomes of
experiments can be extracted from the formalism).

1.2 Revision: The hydrogen atom and the Stern-Gerlach


experiment
As an introduction to the theoretical concepts addressed in this course, we start
by revisiting two examples of quantum systems you have studied in Term 1.
Example 1: The hydrogen atom
Hydrogen molecules may be dissociated into individual hydrogen atoms by an
electric discharge or some other interaction. The resulting atoms may be left
in an excited state by this process, in which case they will decay to a state of
lower energy sooner or later, usually by emitting a photon. Analysing the light
emitted by an ensemble of such atoms reveals that the energy of these photons
is narrowly distributed about discrete values. (An example of the resulting line
spectrum was shown in the first lecture of the course and can be found in the
Lecture 1 folder.)
This experimental fact can be largely explained by calculations based on the
following Hamiltonian:

H = -\frac{\hbar^2}{2\mu}\,\nabla^2 - \frac{e^2}{4\pi\epsilon_0}\,\frac{1}{r},    (1.1)
where µ is the reduced mass of the electron–nucleus system, e is the charge of
the electron in absolute value (e > 0), r is its distance to the nucleus, ε0 is the
vacuum permittivity, ħ is the reduced Planck constant, and ∇2 is the square of
the gradient operator with respect to the coordinates of the electron. The latter
can be taken to be x, y and z, in which case
\nabla^2 \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}.    (1.2)
Alternatively, one can specify the position of the electron by the spherical polar
coordinates r, θ and φ, in which case ∇2 is a more complicated combination of
first and second order partial derivatives.
As defined, H is an operator which acts on functions and transforms a func-
tion of x, y and z (or a function of r, θ, φ) into another function of the same
variables. H is actually a combination of several operators: ∇2 is an operator
which, when acting on a function φ(x, y, z), transforms this function into the
sum of its second order derivatives with respect to x, y and z. The 1/r term in
the Hamiltonian is also an operator, which simply transforms φ(x, y, z) into the
product of φ(x, y, z) with the potential energy. In particular, H acts on wave
functions ψ(r, θ, φ) representing bound states of the electron – nucleus system,
i.e., states in which there is a vanishingly small probability that the electron is
arbitrarily far from the nucleus. These wave functions can be normalized in
such a way that
\int_0^\infty dr\, r^2 \int_0^\pi d\theta\, \sin\theta \int_0^{2\pi} d\phi\, |\psi(r, \theta, \phi)|^2 = 1.    (1.3)

As you have seen in the Term 1 Quantum Mechanics course, this Hamilto-
nian has discrete eigenvalues En (n = 1, 2, . . .) corresponding to bound energy
eigenstates (the eigenvalues of the Hamiltonian are often called eigenenergies).
I.e., in the notation used in that course, there exist bound state wave functions
ψnlm (r, θ, φ) such that the function obtained by letting H act on ψnlm (r, θ, φ)
is simply ψnlm (r, θ, φ) multiplied by a constant En :

Hψnlm (r, θ, φ) = En ψnlm (r, θ, φ). (1.4)

As their name of eigenenergies indicates, the eigenvalues En have the physical


dimensions of an energy. Calculations show that En ∝ −1/n2 , n = 1, 2, . . . for

the above Hamiltonian. These eigenenergies thus form a discrete distribution
of infinitely many energy levels (discrete meaning that these energy levels are
separated by a gap from each other).1
The eigenvalues of the Hamiltonian of Eq. (1.1) (the eigenenergies En ) are re-
lated in a very simple way to the energy of the photons emitted by excited hy-
drogen atoms: each photon is emitted in a transition from one energy eigenstate
to another, and the photon energy is almost equal to the difference between
the respective eigenenergies. The photon energy is not exactly equal to this dif-
ference for a variety of reasons — e.g., because describing the atom by way of
this Hamiltonian amounts to neglecting spin-orbit coupling and other relativis-
tic effects. However, what is important here is that there is a very close relation
between the eigenvalues of the Hamiltonian and the results which would be
found in an actual measurement of the energy of the photons.
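To make the connection between eigenenergies and measured photon energies concrete, here is a small illustrative sketch (not part of the original notes). It assumes the familiar closed-form result E_n ≈ −E_R/n² with the Rydberg energy E_R ≈ 13.6 eV, and prints the approximate energies of a few Balmer-series photons; the fine-structure and other corrections mentioned above are neglected.

import numpy as np  # not strictly needed here, kept for consistency with later sketches

E_R = 13.6  # Rydberg energy in eV (approximate)

def E(n):
    # Bound-state eigenenergy E_n of the Hamiltonian (1.1), in eV
    return -E_R / n**2

def photon_energy(n_upper, n_lower):
    # Energy of the photon emitted in the transition n_upper -> n_lower
    return E(n_upper) - E(n_lower)

for n in (3, 4, 5):
    print(f"{n} -> 2 : {photon_energy(n, 2):.2f} eV")
# prints approximately 1.89, 2.55 and 2.86 eV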
Example 2: The Stern-Gerlach experiment
We refer here to an experiment of historical importance done by the two physi-
cists whose names have remained associated with this type of measurements
ever since, Otto Stern and Walther Gerlach, in the early 1920s. The aim of Stern
and Gerlach was to test the predictions of the Bohr model of atomic structure in
regards to the magnetic moment of atoms (modern Quantum Mechanics had not
yet been developed). The principle of the experiment was simple: since particles
with a magnetic moment µ in a magnetic field B experience a force F equal to
the gradient of µ·B, the magnetic moment of an atom can be inferred from how
its trajectory is deflected when it passes through an inhomogeneous magnetic
field. Stern and Gerlach directed a beam of silver atoms through a specially de-
signed magnet producing a magnetic field B = Bz ẑ such that ∇Bz ≠ 0. (ẑ is a
unit vector in the z-direction.) Each of these atoms was therefore submitted to a
force equal to µz ∇Bz , with µz the z-component of its magnetic moment. They
observed that this beam was split into two when passing through this magnet,
from which they concluded that only two values of µz were possible for these
atoms. (The splitting is very visible in the image of the spatial distribution of
the atoms shown in the first lecture of the course. A copy of this image can be
found in the Lecture 1 folder.)

¹ Besides being eigenfunctions of H, the energy eigenfunctions ψnlm (r, θ, φ), as defined,
are also eigenfunctions of the angular momentum operators L² and Lz. The quantum numbers
l and m identify the corresponding eigenvalues. In the case of the Hamiltonian defined by
Eq. (1.1), energy eigenfunctions with different values of l or m but the same value of the
principal quantum number n correspond to the same eigenenergy En.
At the time of this experiment Stern and Gerlach would not have been able to
formulate these results in terms of modern concepts of Quantum Mechanics.
However, we now know (1) that the magnetic moment of a silver atom orig-
inates almost entirely from the magnetic moment of the electrons it contains
(the magnetic moment of the nucleus is comparatively much smaller and can
be ignored in good approximation); (2) that for silver atoms in their ground state
the contribution of the electrons to the total magnetic moment can be written
in terms of a spin operator; and (3) that the two values of µz found in the ex-
periment correspond to two different eigenvalues of this spin operator.
More specifically, one can say that observing whether an atom is deflected
in one direction or the other amounts to a measurement of its spin in the z-
direction. We can represent the spin state of the atoms deflected in one direc-
tion by a column vector χ+ and the spin state of the atoms deflected in the other
direction by a column vector χ−, and set
\chi_+ = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \chi_- = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.    (1.5)

As can be easily checked, these two column vectors are eigenvectors of the
matrix
S_z = \begin{pmatrix} \hbar/2 & 0 \\ 0 & -\hbar/2 \end{pmatrix}.    (1.6)
In this formulation of the problem, this matrix is the spin operator mentioned
in the previous paragraph. (A 2 × 2 matrix is an operator which transforms 2-
component column vectors into 2-component column vectors since multiplying
a 2-component column vector by a 2 × 2 matrix gives a 2-component column
vector as a result. See Appendix A of these course notes for a reminder of the
rules of matrix multiplication.) The matrix Sz has two eigenvalues, ħ/2 and
−ħ/2. Accordingly, in the experiment of Stern and Gerlach, the only possible
values of µz were γħ/2 and −γħ/2, where γ is a certain constant whose value
is not important for this discussion.
Before it enters the magnet, an individual atom could be in the spin state repre-
sented by the column vector χ+ or in the spin state represented by the column
vector χ− . More generally, it could also be in a superposition state represented

by a column vector of the form c+ χ+ + c− χ− , where c+ and c− are two complex
numbers such that
|c+ |2 + |c− |2 = 1. (1.7)
In the latter case, there would be a probability |c+|2 that it would be found to
have µz = γħ/2 and a probability |c−|2 that it would be found to have µz =
−γħ/2: only ±γħ/2 could be found for µz, even if the atom is initially in a
superposition state. Supposing that neither |c+ |2 nor |c− |2 is zero, then one
cannot predict the value of µz which would be found for that individual atom.
Only the probability of each of the two possible values of µz can be predicted.
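The numbers in this discussion are easy to check. The following is a small illustrative sketch (not part of the original notes), written in Python with NumPy and using units in which ħ = 1, verifying that χ+ and χ− are eigenvectors of the matrix Sz of Eq. (1.6) and that the measurement probabilities of a superposition state are |c+|² and |c−|²; the particular values chosen for c+ and c− are arbitrary.

import numpy as np

hbar = 1.0  # work in units where hbar = 1
Sz = np.array([[hbar / 2, 0.0],
               [0.0, -hbar / 2]])
chi_plus = np.array([1.0, 0.0])
chi_minus = np.array([0.0, 1.0])

# Eigenvector check: Sz chi_± = ±(hbar/2) chi_±
print(np.allclose(Sz @ chi_plus, (hbar / 2) * chi_plus))     # True
print(np.allclose(Sz @ chi_minus, -(hbar / 2) * chi_minus))  # True

# A normalized superposition c_+ chi_+ + c_- chi_-
c_plus, c_minus = 0.6, 0.8j             # |0.6|^2 + |0.8j|^2 = 1
state = c_plus * chi_plus + c_minus * chi_minus
print(abs(c_plus)**2, abs(c_minus)**2)  # probabilities 0.36 and 0.64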
Important observations
The quantum systems considered in these two examples are obviously quite
different, both physically and mathematically — e.g., the mathematical objects
representing the state of the system are functions of position co-ordinates in the
first example and column vectors of numbers in the second example. Nonethe-
less, they are similar in regards to key aspects of their theoretical description:

• In both cases, the physical quantities measured in an experiment corre-


spond to mathematical operators [the Hamiltonian of Eq. (1.1) in the first
example, the spin matrix of Eq. (1.6) in the second example].

• The values these physical quantities could be found to have are given by
the eigenvalues of the corresponding operator.

• Each of these values has a certain probability to be found, and these prob-
abilities can be calculated from the wave function or column vector rep-
resenting the state of the system.

• Whether a wave function or a column vector, the mathematical object
representing the state of the system encapsulates all that can be known
about the results of future measurements.

This relationship between measurements, operators and mathematical objects


used to represent quantum states is actually very general and applies to any
quantum system.

+ Quantum systems can often be mathematically described in several differ-


ent but equivalent ways. For instance, when discussing the Stern-Gerlach

experiment, we have set the column vectors χ+ and χ− and the matrix Sz
to be as given by Eqs. (1.5) and (1.6). However, we could equally well have
decided to represent the two spin states by the column vectors
\chi'_+ = \begin{pmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix} \quad\text{and}\quad \chi'_- = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}    (1.8)

and the relevant spin operator by the matrix
S'_z = \begin{pmatrix} 0 & -\hbar/2 \\ -\hbar/2 & 0 \end{pmatrix}.    (1.9)

As can be checked without difficulty, χ'+ and χ'− are eigenvectors of S'z
corresponding, respectively, to the eigenvalues ħ/2 and −ħ/2, exactly as
χ+ and χ− are eigenvectors of Sz corresponding to the same eigenvalues.
Formulating the problem as per Eqs. (1.8) and (1.9) instead of (1.5) and
(1.6) changes the details of the mathematics involved in the calculations.
However, these two formulations are equivalent from a Physics point of
view.
As another example, consider the ground state wave function of a linear
harmonic oscillator, ψ0 (x). You have seen in the Term 1 Quantum Me-
chanics course that
\psi_0(x) = \left( \frac{m\omega}{\pi\hbar} \right)^{1/4} \exp(-m\omega x^2 / 2\hbar),    (1.10)
where m is the mass of the oscillator and ω its angular frequency. Clearly,
this function is continuous and is such that the integral
\int_{-\infty}^{\infty} |\psi_0(x)|\, dx

exists. You may remember from one of your maths courses that these two
mathematical facts guarantee that ψ0 (x) can be written in the form of a
Fourier integral. I.e., there exists a function φ0 (p) such that
\psi_0(x) = \frac{1}{(2\pi\hbar)^{1/2}} \int_{-\infty}^{\infty} \phi_0(p) \exp(ipx/\hbar)\, dp.    (1.11)
In fact, the function φ0 (p) can be obtained by taking the inverse transfor-
mation:
\phi_0(p) = \frac{1}{(2\pi\hbar)^{1/2}} \int_{-\infty}^{\infty} \psi_0(x) \exp(-ipx/\hbar)\, dx.    (1.12)
Thus knowing φ0 (p) is knowing ψ0 (x) and knowing ψ0 (x) is knowing
φ0 (p). In other words, the ground state is represented by the function
φ0 (p) as well as by the function ψ0 (x): it is possible to use φ0 (p) rather
than ψ0 (x) if this would be convenient in some calculations, and the two
formulations are completely equivalent from a Physics point of view. This
topic will be discussed further in the course. [We just note at this stage
that ψ0 (x) is the “ground state wave function in position space” whereas
φ0 (p) is the “ground state wave function in momentum space”.]
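As an illustration of this equivalence, note (an added aside, not derived in the notes above) that the integral (1.12) can be evaluated in closed form for the Gaussian (1.10), using the standard result \int_{-\infty}^{\infty} \exp(-a x^2 - i k x)\, dx = \sqrt{\pi/a}\, \exp(-k^2/4a). One finds that the momentum-space wave function of the ground state is again a Gaussian,

\phi_0(p) = \left( \frac{1}{\pi m \omega \hbar} \right)^{1/4} \exp(-p^2 / 2 m \omega \hbar),

so that in this particular case the two descriptions even have the same functional form.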

1.3 Quantum states and vector spaces


Of central importance in Quantum Mechanics is the mathematical theory of
vector spaces, particularly of a kind of vector spaces called Hilbert spaces. The
reason is that states of quantum systems correspond to vectors belonging to
certain Hilbert spaces (e.g., wave functions, column vectors of numbers, or ab-
stract vectors called ket vectors), and that physical quantities like the energy or
the angular momentum correspond to operators acting on these vectors.

1.4 Mathematical note


As seen in the example above, quantum mechanical probabilities are often given
in terms of the square modulus of a complex number. In fact, Quantum Mechan-
ics is based on complex numbers. It is thus essential that you can use complex
numbers confidently. All the important concepts are summarised in Appendix B
of these notes, and it is recommended that you read through this appendix be-
fore proceeding further. Make sure that you are also familiar with the rules of
matrix multiplications stated in Appendix A.

2 Vector spaces and Hilbert spaces

2.1 What is a vector space?


In a nutshell, the vector spaces used in Physics are sets of mathematical objects
which can be combined with each other and multiplied by numbers, pretty much
in the same way as vectors representing velocities, forces or accelerations can
be added to each other and multiplied by numbers. In this context, the math-
ematical objects belonging to such sets are called vectors, whether or not they
can be represented by arrows in ordinary 3D space, and the numbers by which
they are multiplied are called scalars. One talks about a real vector space when
these numbers are real (rational or irrational numbers without an imaginary
part), and a complex vector space when they are complex numbers. Quantum
Mechanics is based on complex vector spaces.

More precisely, a real vector space is a set V in which are defined a vector
addition and a multiplication by a scalar subject to the following axioms.

1. The vector addition associates one and only one element of V with
each pair v 1 , v 2 of elements of V . This element is called the sum of
v 1 and v 2 and is denoted by v 1 + v 2 . (The elements of V are called
vectors. One says that V is closed under vector addition, meaning that
the sum of two elements of V is always an element of V .)

2. The vector addition is associative. I.e., for any three elements v 1 , v 2 ,


v 3 of V , adding the sum of v 1 and v 2 to v 3 gives the same result as
adding v 1 to the sum of v 2 and v 3 :

(v 1 + v 2 ) + v 3 = v 1 + (v 2 + v 3 ). (2.1)

3. One of the elements of V , 0, and only this element, is such that

v+0=v (2.2)

for any element v of V . The vector 0 which has this property is called
the zero vector (or the null vector).

4. Every element v of V has one and only one inverse element in V ,
namely a vector −v such that

v + (−v) = 0. (2.3)

(If there is a risk of confusion with other meanings of the word inverse,
one can say that the vector −v is the additive inverse of v.)

5. The vector addition is commutative. I.e., for any two elements v 1 and
v 2 of V ,
v1 + v2 = v2 + v1. (2.4)

6. The multiplication by a scalar associates one and only one element of


V with each real number α and each element v of V . This element is
called the product of v by α and is denoted by α v. (Real numbers are
called scalars in this context. One says that V is closed under multipli-
cation by a scalar, meaning that the product of an element of V by a
scalar is always an element of V . We stress that the operation we are
talking about here gives a vector as a result; it should not be confused
with the scalar product or dot product of two vectors, which gives a
scalar as result.)

7. This operation is distributive. I.e., for any real numbers α and β and
any elements v, v 1 and v 2 of V ,

(α + β) v = α v + β v (2.5)

and
α (v 1 + v 2 ) = α v 1 + α v 2 . (2.6)

8. Multiplication by a scalar is also associative. I.e., for any real numbers


α and β and any element v of V ,

(αβ) v = α (β v). (2.7)

9. For every element v of V ,

1 v = v. (2.8)

The definition of a complex vector space is identical, except that the scalars
are taken to be complex numbers, not real numbers.

+ Vector spaces are also called linear spaces.

+ These axioms ensure that whatever the elements of a vector space are,
calculations involving these elements follow all the rules you routinely
use when adding vectors representing positions, forces or velocities and
multiplying those by numbers. For example, these axioms imply that
for any vector v, (1) 0 v = 0 and (2) (−1) v = −v.
Proof: (1) Let w = 0 v. Setting α = β = 0 in Eq. (2.5) gives 0 v =
0 v + 0 v, i.e., w = w + w. Adding the vector −w to each side of this
equation gives w + (−w) = (w + w) + (−w). By virtue of Eqs. (2.1),
(2.2) and (2.3), this last equation simplifies to 0 = w, which shows that
indeed 0 v = 0 for any vector v. (2) Setting α = 1 and β = −1 in
Eq. (2.5) gives 0 v = 1 v + (−1) v, i.e., 0 = v + (−1) v. Hence (−1) v
is an additive inverse of v. As, by axiom 4, −v is the only vector
which fulfills this equation, (−1) v is necessarily −v. □

+ A mathematical concept related to vector spaces, and also very impor-


tant in Theoretical Physics, is that of groups. In fact, the first five ax-
ioms in the list above mean that the elements of a vector space form
an Abelian group under vector addition (i.e., a group for which the op-
eration associating the elements of the set is commutative). Further
information about this topic and examples of applications in Physics
can be found, e.g., in the Mathematics textbook by Riley, Hobson and
Bence (group theory as such is outside the scope of this course).

+ One can also define more general vector spaces in which the scalars
multiplying the vectors are not specifically real or complex numbers.
Instead, these scalars are taken to be the elements of what mathemati-
cians call a field (not to be confused with a vector field), namely a set of
numbers or other mathematical objects endowed with two operations
following the same rules as the ordinary addition and multiplication
between real numbers.

Notation
We will normally represent 3D geometric vectors (e.g., position vectors, velocity
vectors, angular momentum vectors, etc.) by boldface upright letters (e.g., v),
and elements of other vector spaces (e.g., vectors representing quantum states)
by normal fonts or some other symbols (e.g., χ+ or |ψi). Also, we will generally
use the symbol 0 to represent the zero vector, unless it would be desirable to
emphasize the difference between this vector and the number 0.
Examples of vector spaces

• 3D geometric vectors
By 3D geometric vectors we mean the “arrow vectors" you have often
used to describe physical quantities which have both a magnitude and a
direction. Suppose that v1 and v2 are two such vectors (e.g., two different
forces acting on a same particle). You are familiar with the fact that v1
can be summed to v2 and that the result is also a geometric vector (the
vector given by the parallelogram rule). One can also multiply a geomet-
ric vector by a real number: by definition of this operation, the product
of a vector v by a real number α is a vector of the same direction as v (if
α > 0) or the opposite direction (if α < 0) and whose length is α times
the length of v. Geometric vectors form a real vector space under these
two operations.

+ As an exercise, check that these two operations have all the properties
required by the definition of a vector space. The zero vector here is
the vector whose length is zero (and whose direction is therefore un-
defined). Moreover, any geometric vector v has an additive inverse,
−v, which can be obtained by multiplying v by −1. (If v is not the
zero vector, then −v is a vector of same length as v but of opposite
direction.)

• Two-component column vectors of complex numbers


These vectors are single-column arrays of two complex numbers, for ex-
ample the vectors χ+ and χ− mentioned in Section 1.1 in regards to the
Stern-Gerlach experiment. As we will see later in the course, two-component
complex column vectors can be used to represent quantum states of a

spin-1/2 particle. Such column vectors can be added together to give a
column vector of the same form: by definition of this operation, if a, a', b
and b' are complex numbers,
\begin{pmatrix} a \\ b \end{pmatrix} + \begin{pmatrix} a' \\ b' \end{pmatrix} = \begin{pmatrix} a + a' \\ b + b' \end{pmatrix}.    (2.9)
One can also multiply a column vector of complex numbers by a com-
plex number, the result being a column vector of complex numbers. For
example,
(2 + 3i) \begin{pmatrix} a \\ b \end{pmatrix} \equiv \begin{pmatrix} 2a + 3ia \\ 2b + 3ib \end{pmatrix}.    (2.10)

+ It is easy to see that these two operations have all the properties re-
quired by the definition of a vector space. The zero vector, here, is a
column vector of zeros, since adding a column vector of zeros to an-
other column vector does not change the latter:
\begin{pmatrix} a \\ b \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix}.    (2.11)

Moreover, any column vector has an additive inverse vector, which is


obtained by multiplying each of its components by −1:
\begin{pmatrix} a \\ b \end{pmatrix} + \begin{pmatrix} -a \\ -b \end{pmatrix} = \begin{pmatrix} a - a \\ b - b \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.    (2.12)

• N -component column vectors of real or complex numbers


All that is said above in regards to 2-component column vectors of com-
plex numbers also applies to column vectors of more than two compo-
nents and to column vectors of real rather than complex numbers: For
any N ≥ 1, the set of N -component column vectors of real numbers is a
real vector space and the set of N -component column vectors of complex
numbers is a complex vector space.

• Functions of a real variable


Here we consider the set of the functions f (x) defined for all real x. This
set is a vector space, defining vector addition as the ordinary sum of two

such functions and multiplication by a scalar as the ordinary product by
a number. At the risk of being pedantic:

– The sum of a function f1 (x) and a function f2 (x) is the function


(f1 + f2 )(x) such that

(f1 + f2 )(x) ≡ f1 (x) + f2 (x). (2.13)

That is to say, (f1 + f2 )(x) is the function whose value is given by


the sum of the values of f1 (x) and f2 (x) at any x.
– The product of a function f (x) by a number α is the function (αf )(x)
such that
(αf )(x) ≡ αf (x). (2.14)

This vector space is real or complex according to whether the scalars α


are real numbers or complex numbers.

+ Again, check, as an exercise, that this set and these two operations
fulfill the definition of a vector space. The function 0(x) whose value
is zero for all values of x plays the role of the zero vector for this vector
space, since
f (x) + 0(x) ≡ f (x) (2.15)
for any function f (x) if 0(x) ≡ 0. Moreover, it is also the case that
to any function f (x) corresponds a function −f (x) ≡ (−1)f (x) such
that
f (x) + [−f (x)] ≡ 0(x). (2.16)
[Don’t be confused by the terminology: If f (x) is regarded as an ele-
ment of this vector space, then −f (x) is its inverse element (its additive
inverse), not its inverse function. The latter is the function f −1 (y) such
that if f (x) = y then f −1 (y) = x.]

• Square-integrable functions of a real variable
By a square-integrable function of a real variable we mean a function f (x)
(possibly taking complex values) such that the integral
\int_a^b |f(x)|^2\, dx

exists and is finite (for functions square-integrable on a finite interval


[a, b]) or such that the integral
\int_{-\infty}^{\infty} |f(x)|^2\, dx

exists and is finite (for functions square-integrable on the infinite inter-


val (−∞, ∞)). Such functions also form a vector space, defining vector
addition and multiplication by a scalar in the same way as in the previous
example. They play an important role in Quantum Mechanics.

+ The integral introduced in elementary calculus course is the Riemann in-


tegral, which is properly defined only for finite integration intervals. Inte-
grals on infinite intervals are instances of “improper integrals". The stan-
dard definition of the Riemann integral can be extended to such cases by
a limiting process: integrals on infinite intervals are defined as limits of
integrals on a finite interval. E.g.,
\int_{-\infty}^{\infty} |f(x)|^2\, dx \equiv \lim_{\substack{a \to -\infty \\ b \to \infty}} \int_a^b |f(x)|^2\, dx.    (2.17)

For this integral to be finite it is necessary that |f (x)|2 goes to 0 faster


than 1/|x| when |x| → ∞.

For example,
\int_1^b \frac{dx}{x^\alpha} = \begin{cases} (b^{1-\alpha} - 1)/(1 - \alpha) & \alpha \neq 1, \\ \log b & \alpha = 1. \end{cases}    (2.18)
Therefore this integral diverges (goes to infinity) when b → ∞ unless α > 1.
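As a small numerical illustration (not part of the original notes), the sketch below evaluates the closed form of Eq. (2.18) for increasing values of b: for α = 2 the integral approaches a finite limit, whereas for α = 1 it keeps growing logarithmically.

import numpy as np

def integral(alpha, b):
    # Closed-form value of int_1^b dx / x^alpha, as given by Eq. (2.18)
    return np.log(b) if alpha == 1 else (b**(1 - alpha) - 1) / (1 - alpha)

for b in (1e2, 1e4, 1e6):
    print(b, integral(2, b), integral(1, b))
# the alpha = 2 column tends to 1; the alpha = 1 column grows without bound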

+ The question of whether the set of all square-integrable functions forms a
vector space is rather subtle and cannot be fully addressed without using
mathematical concepts outside the scope of this course. Recall that the
axioms of a vector space require that the sum of any two elements of a
vector space V is also an element of V , and likewise for the product of
any element of V by any number. In other words, they require that V is
closed under vector addition and multiplication by a scalar. It is clear that
multiplying a square-integrable function f (x) by a real or complex finite
number α always results in a square-integrable function, in agreement
with the axioms of a vector space: since
\int_{-\infty}^{\infty} |\alpha f(x)|^2\, dx = |\alpha|^2 \int_{-\infty}^{\infty} |f(x)|^2\, dx,    (2.19)

the integral of |αf (x)|2 exists and is finite if the integral of |f (x)|2 exists
and is finite.
For the set of all square-integrable functions to be a vector space, it is also
necessary that the sum of any two square-integrable functions is a square-
integrable function. This is not difficult to prove for the case where we
would only consider functions that are continuous everywhere (which is
not restrictive for us, as wave functions used in Quantum Mechanics are
normally continuous).
Proof: The sum of two continuous functions f (x) and g(x) is a continuous
function and so are the functions |f (x)|2 , |g(x)|2 and |f (x) + g(x)|2 . The
case where these functions are defined and continuous on a closed interval
[a, b] is simple, as continuity on a closed interval implies integrability on
that interval; hence, f (x) + g(x) is square-integrable on [a, b]. The case
of functions defined and continuous on the infinite interval (−∞, ∞)
is not as simple, however, since not all functions that are continuous on
(−∞, ∞) are also integrable on (−∞, ∞). Let us assume that f (x) and
g(x) are both square-integrable and continuous. Hence, the integrals
\int_{-\infty}^{\infty} |f(x)|^2\, dx \quad\text{and}\quad \int_{-\infty}^{\infty} |g(x)|^2\, dx

exist and are finite. We note that at any x,

|f(x) + g(x)|^2 = [f(x) + g(x)][f^*(x) + g^*(x)]
               = |f(x)|^2 + |g(x)|^2 + f^*(x) g(x) + f(x) g^*(x).    (2.20)

Similarly,

|f (x) − g(x)|2 = |f (x)|2 + |g(x)|2 − [f ∗ (x)g(x) + f (x)g ∗ (x)]. (2.21)

From this last equation, and from the fact that |f (x) − g(x)|2 ≥ 0, we
deduce that

f ∗ (x)g(x) + f (x)g ∗ (x) ≤ |f (x)|2 + |g(x)|2 . (2.22)

Coming back to Eq. (2.20), we thus have that

|f (x) + g(x)|2 ≤ 2|f (x)|2 + 2|g(x)|2 . (2.23)

Hence
\int_{-\infty}^{\infty} |f(x) + g(x)|^2\, dx \le 2 \int_{-\infty}^{\infty} |f(x)|^2\, dx + 2 \int_{-\infty}^{\infty} |g(x)|^2\, dx.    (2.24)

The integral of |f (x) + g(x)|2 is therefore finite (it is bounded above by


the sum of two finite numbers). 
The matter turns out to be more complicated for more general functions,
however. In fact, the mathematical theory of vector spaces of square-
integrable functions is based not on the Riemann integral (the familiar
integral you have learned at school) but on a generalization of this opera-
tion called the Lebesgue integral. However, the functions we are interested in
are normally square-integrable in terms of the familiar Riemann integral,
and any function which is square-integrable in terms of the Riemann in-
tegral is also square-integrable in terms of the Lebesgue integral. While
important for a rigorous mathematical study of functional spaces, the dif-
ference between these two types of integrals can thus be ignored at the
level of this course. Square-integrable functions in the Lebesgue sense are
often referred to as square-summable functions or as L2 functions.

2.2 Subspaces
A subspace of a vector space V is a subset of V which itself forms a vector space
under the same operations of vector addition and scalar multiplication as in V .
Examples

• We have seen that the set of all N -component column vectors of real
numbers is a real vector space. For N = 3, these column vectors take the
following form,
\begin{pmatrix} a \\ b \\ c \end{pmatrix},
where a, b and c are three real numbers. Amongst these vectors are those
whose third component is zero, i.e., column vectors of the form
\begin{pmatrix} a \\ b \\ 0 \end{pmatrix}.

These particular column vectors form a subspace of the larger vector


space formed by all 3-component real column vectors.

+ It is clear that summing any two column vectors whose third compo-
nent is zero gives a column vector whose third component is zero, and
likewise, that multiplying any of them by a number also gives a col-
umn vector whose third component is zero. This set is therefore closed
under vector addition and under multiplication by a scalar, as required
by the axioms of a vector space.
By contrast, the set of all 3-component column vectors whose third
component is 1 is not closed under these two operations, and is there-
fore not a vector space.

• As seen at the end of the previous section, the set of all functions of a real
variable is a vector space and the set of all square-integrable functions
of a real variable is also a vector space. The set of all square-integrable
functions is a subset of the set of all functions since all square-integrable
functions are functions but not all functions are square-integrable. Corre-
spondingly, the set formed by all square-integrable functions is a subspace
of the vector space formed by all functions.
Many other subspaces of this vector space can be considered, e.g., the sub-
space formed by all continuous functions, the subspace of that subspace
formed by all functions which are both continuous and have a continuous

first order derivative, the subspace formed by the periodic functions with
period 2π, etc.

+ Suppose that V1 and V2 are two subspaces of a certain vector space V ,


and that any vector v of V can be written in one and only one way as
the sum of a vector v1 of V1 and a vector v2 of V2 . The vector space V
is then said to be the direct sum of V1 and V2 . Direct sums of vectors
spaces are denoted by the symbol ⊕: V = V1 ⊕ V2 .
For example, the space formed by the 3-component column vectors
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
is the direct sum of the space formed by the 3-component column vectors
\begin{pmatrix} a \\ b \\ 0 \end{pmatrix}
and the space formed by the 3-component column vectors
\begin{pmatrix} 0 \\ 0 \\ c \end{pmatrix}.

2.3 Linear combinations


A linear combination of a finite number of vectors is a mathematical expression
linear in the vectors involved. For example,

a+b

is a linear combination of the two geometric vectors a and b;


   
(2 + 3i) \begin{pmatrix} 1 \\ 1 \end{pmatrix} + (2 - 3i) \begin{pmatrix} 1 \\ -1 \end{pmatrix}
is a linear combination of two-component complex column vectors;
2 sin x + cos 3x
is a linear combination of the functions sin x and cos 3x; etc.
The general form of a linear combination of N vectors is
c1 v1 + c2 v2 + c3 v3 + · · · + cN vN
where v1 , v2 , v3 , . . . , vN are vectors and c1 , c2 , c3 , . . . , cN are scalars.
+ Extending this definition to a linear combination of an infinite number
of vectors is fraught with mathematical difficulties. However, the ex-
tension is not impossible. For instance, a convergent Fourier series is
in effect an infinite linear combination of vectors, each vector being a
complex exponential (or a sine or cosine function).

2.4 The span of a set of vectors


The span of a set of N vectors is the set of all linear combinations of these
vectors.

The span of N vectors belonging to a vector space V is either V itself or a


subspace of V not equal to V .

Proof: By definition of this set, the span of N vectors is closed under


vector addition and multiplication by a scalar. It is either V itself or a
subspace of V not equal to V because any linear combination of ele-
ments of V must be an element of V since V is also closed under these
two operations. 

Examples

• The span of the two column vectors


   
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
is the vector space formed by all 3-component column vectors of the form
\begin{pmatrix} a \\ b \\ 0 \end{pmatrix}.

+ The two numbers a and b are real if we exclude linear combinations in-
volving complex numbers. In this case, we can take these two numbers
to be the x- and y-coordinates of a point in 3D space, the z-coordinate
of this point being 0. If we do so, the column vectors
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}

then represent unit vectors starting at the origin and oriented, respec-
tively, in the x- and y-directions, and saying that these two column
vectors span the space formed by all column vectors of the form
\begin{pmatrix} x \\ y \\ 0 \end{pmatrix}

is the same as saying that these two unit vectors span the whole xy-
plane.

• As you have seen in a previous course, the spherical harmonics Y_l^m(θ, φ)
are certain functions of the polar angles θ and φ labelled by the quantum
numbers l and m. You may remember that the quantum number m can
only take the values −1, 0 and 1 for l = 1. The three functions Y_1^{-1},
Y_1^0 and Y_1^1 span the set of all the functions of the form
f(θ, φ) = c_{-1} Y_1^{-1}(θ, φ) + c_0 Y_1^0(θ, φ) + c_1 Y_1^1(θ, φ),    (2.25)
where c_{-1}, c_0 and c_1 are three complex numbers.

2.5 Linear independence
A set formed by N vectors v1 , v2 , . . . , vN , and these N vectors themselves, are
said to be linearly independent if none of these vectors can be written as a linear
combination of the other vectors of the set. In the opposite case, one says that
the set is linearly dependent (or that these N vectors are linearly dependent).
Examples

• The three column vectors
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
are linearly independent.

• By contrast, the three column vectors
\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 5 \\ 4 \\ 3 \end{pmatrix}
are linearly dependent since
\begin{pmatrix} 5 \\ 4 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + 2 \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}.    (2.26)

+ A necessary and sufficient condition for N non-zero vectors v1 , v2 , . . . ,


vN to be linearly independent is that the equation

c1 v1 + c2 v2 + · · · + cN vN = 0 (2.27)

implies that the scalars c1 , c2 , . . . , cN are all zero.


Proof: Suppose that the equation was possible with one of the coeffi-
cients cn non-zero, although these N vectors were both non-zero and
linearly independent. Let us suppose that this non-zero coefficient was
c1 (if not c1 , we can always relabel the coefficients cn and the vectors
vn so that n = 1 for this non-zero coefficient). We could then rewrite
this equation as

v1 = −(c2 v2 + · · · + cN vN )/c1 , (2.28)

which shows that v1 must then be the zero vector or a non-zero lin-
ear combination of the vectors v2 , . . . , vN . These two possibilities are
in contradiction with the hypothesis that the N vectors v1 , v2 , . . . , vN
are both non-zero and linearly independent. The condition is thus nec-
essary. The condition is also sufficient, as if, e.g., the vector v1 was a
linear combination of the vectors v2 , . . . , vN we could write

v1 = a2 v2 + · · · + aN vN (2.29)

with at least one of the coefficients a2 , . . . , aN being non-zero. However,


since this equation is the same as Eq. (2.27) with c1 = 1 and cn = −an
(n ≠ 1), this would mean that the coefficients c1 , . . . , cN are not all zero.


+ The definition of linear independence given above applies to the case of


a finite set of vectors. One says that an infinite set of vectors is linearly
independent when every finite subset of this set is linearly independent.

The Exchange Theorem


A vector space spanned by N vectors does not contain linearly independent
subsets of more than N vectors.
(Although this theorem is fairly intuitive, it cannot be proven in a few lines.
The name of the theorem refers to an important step in its standard proof.)

+ The Exchange Theorem implies, for example, that the three column
vectors
\begin{pmatrix} a \\ b \\ 0 \end{pmatrix}, \quad \begin{pmatrix} a' \\ b' \\ 0 \end{pmatrix}, \quad \begin{pmatrix} a'' \\ b'' \\ 0 \end{pmatrix}
are always linearly dependent, irrespective of the values of a, a', a'', b,
b' and b'', since these three vectors belong to a vector space spanned by
fewer than three vectors (see Section 2.4).
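Linear (in)dependence can also be checked numerically: N vectors are linearly independent exactly when the matrix having them as columns has rank N. The following is an illustrative sketch (not part of the original notes) applying this to the vectors of Eq. (2.26) and to three vectors with vanishing third component; the particular numbers in the second example are arbitrary.

import numpy as np

# The three vectors of Eq. (2.26): rank 2 < 3, hence linearly dependent
A = np.column_stack(([1, 2, 1], [2, 1, 1], [5, 4, 3]))
print(np.linalg.matrix_rank(A))   # 2

# Three vectors with vanishing third component are always linearly dependent,
# as implied by the Exchange Theorem (arbitrary values of a, a', ..., b'')
B = np.column_stack(([1.3, -0.2, 0.0], [4.0, 2.5, 0.0], [-7.1, 0.8, 0.0]))
print(np.linalg.matrix_rank(B))   # at most 2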

2.6 Dimension of a vector space
A vector space may be finite-dimensional or infinite-dimensional. A vector
space is finite-dimensional and has a dimension N if it contains a linearly inde-
pendent set of N vectors but no linearly independent set of more than N vec-
tors. It is infinite-dimensional if it contains an arbitrarily large set of linearly
independent vectors. (Note that the dimension of a vector space is unrelated to
the number of elements this vector space contains. Finite-dimensional or not,
vector spaces always contain an infinite number of vectors.2 )
A vector space spanned by N linearly independent vectors is finite-dimensional
and its dimension is N .

Proof: The dimension of this vector space cannot be less than N since by
construction it contains a linearly independent set of N vectors. Moreover, it
cannot be larger than N since any set of more than N vectors belonging to
this vector space is necessarily linearly dependent by virtue of the Exchange
Theorem. 
A corollary of this theorem is that a vector space spanned by N linearly in-
dependent vectors cannot also be spanned by a set of fewer than N linearly
independent vectors.

Examples

• The vector space formed by all 3D geometric vectors is finite-dimensional


and its dimension is 3 (unsurprisingly), since it is spanned by, e.g., the
three unit vectors in the x-, y- and z-direction.

• The vector space spanned by the two column vectors
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
is finite-dimensional and its dimension is 2.
² This sweeping statement has an exception: the single-element set formed by the zero vector
alone is a vector space in itself. This space contains only one vector. . .

• As mentioned above, the set of all 2π-periodic functions is a vector space.
This set includes, in particular, the complex exponentials exp(inx) (n =
0, ±1, ±2, . . .). [These functions are 2π-periodic since, for any real x and
any integer n, exp[in(x + 2π)] = exp(inx) exp(2nπi) = exp(inx).] As
we will see later, the set formed by the complex exponentials exp(inx)
(n = 0, ±1, ±2, . . . , ±nmax ) is linearly independent. This set is arbitrarily
large since nmax can be as large as one wants. Therefore the vector space
of all 2π-periodic functions is infinite dimensional.

• The vector space of all square-integrable functions of a real variable is also


infinite-dimensional. (You have seen in the Term 1 course that the Hamil-
tonian of a linear harmonic oscillator has infinitely many orthonormal
eigenfunctions. You know that these eigenfunctions are square-integrable
functions, as otherwise they could not be normalized in the usual way. We
will see later that these eigenfunctions are linearly independent. Hence
this vector space contains an arbitrarily large set of linearly independent
vectors.)

2.7 Bases
A basis of a finite-dimensional vector space is a set of linearly independent
vectors such that any vector belonging to this vector space can be written as a
linear combination of these basis vectors. (The concept of basis for an infinite-
dimensional vector space is more complicated; it is addressed in Section 2.11.)

Although certain bases are more convenient than others, the choice of basis
vectors is arbitrary as long as these vectors are linearly independent. In fact,
any set of N linearly independent vectors belonging to a vector space of
dimension N is a basis for this vector space. Infinitely many such bases can
therefore be constructed.

Proof: By definition of the dimension of a vector space, a vector space


of dimension N cannot contain a linearly independent set of more than
N vectors. Suppose that a vector space V of dimension N contains a
linearly independent set of N vectors but that this set would not be a
basis for V . I.e., suppose that at least one of the elements of V could
not be written as a linear combination of these N vectors. Then
that element could be joined to these N vectors to form a linearly
independent set of N + 1 vectors, which is in contradiction with the
hypothesis that the dimension of V is N . 

Given a basis of a vector space, there is one and only way of writing each
element of that space as a linear combination of these basis vectors.

Proof: Suppose that there would be more than one way of writing a
vector a as a linear combination of basis vectors v_1, v_2, . . . , v_N. I.e., one
could write
a = c_1 v_1 + c_2 v_2 + · · · + c_N v_N    (2.30)
and also
a = c'_1 v_1 + c'_2 v_2 + · · · + c'_N v_N    (2.31)
with c_n ≠ c'_n for at least one value of n. However, subtracting these
two equations gives
0 = (c_1 − c'_1) v_1 + (c_2 − c'_2) v_2 + · · · + (c_N − c'_N) v_N.    (2.32)
Since the vectors v_1, v_2, . . . , v_N form a basis, they must be linearly inde-
pendent, and therefore this last equation is possible only if c_n − c'_n = 0
for all n. Hence there is only one way of writing the vector a as a linear
combination of these basis vectors. □

Examples

• Any 3D geometric vector can be written as a linear combination of a unit


vector oriented in the positive x-direction, a unit vector oriented in the
positive y-direction and a unit vector oriented in the positive z-direction.
These three unit vectors thus form a basis of this vector space. (Note that
they are linearly independent: none of these three unit vectors is a linear
combination of the other two.)

• The two column vectors
\begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 \\ 1 \end{pmatrix}
form a basis for the vector space of the two-component column vectors
since
\begin{pmatrix} a \\ b \end{pmatrix} = a \begin{pmatrix} 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} 0 \\ 1 \end{pmatrix}    (2.33)
for any complex numbers a and b.
This basis is not unique. For instance, the two column vectors
\begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 3 \\ 4 \end{pmatrix}
also form a basis for this vector space since for any complex numbers a
and b one can always find a number α and a number β such that
\begin{pmatrix} a \\ b \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \beta \begin{pmatrix} 3 \\ 4 \end{pmatrix}.    (2.34)

(In fact, α = (3b − 4a)/2 and β = (2a − b)/2.)


In each of these two cases there is only one way of writing a vector as a
linear combination of the basis vectors. In contrast, there are infinitely
many ways of writing a vector as a linear combination of linearly depen-
dent vectors. For example, take the three column vectors
\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
These vectors are linearly dependent since
\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}.    (2.35)

Any 2-component column vector can be written as a linear combination


of these three column vectors, as if they formed a basis; however, there
are infinitely many ways of doing so since
       
a 1 0 1
= (a − λ) + (b − λ) +λ (2.36)
b 0 1 1

for any value of λ.
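The coefficients of Eq. (2.34) can be obtained by solving a 2 × 2 linear system, which is one way of seeing that the expansion on a basis is unique. The following illustrative sketch (not part of the original notes) does this for an arbitrarily chosen a and b and compares the result with the closed-form expressions quoted above.

import numpy as np

a, b = 1.0 + 2.0j, -3.0j                   # an arbitrary 2-component vector (a, b)
basis = np.array([[1, 3],
                  [2, 4]], dtype=complex)  # columns are the two basis vectors
alpha, beta = np.linalg.solve(basis, np.array([a, b]))
print(alpha, (3 * b - 4 * a) / 2)          # same number
print(beta, (2 * a - b) / 2)               # same number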

2.8 Inner product
The vector spaces most relevant to Quantum Mechanics are inner product spaces,
namely vector spaces equipped with an operation called inner product (or scalar
product) which extends the familiar dot product between geometric vectors to
more general vector spaces.
We will denote the dot product of two geometric vectors v and w by the usual
symbol v · w, and the inner product of two other vectors v and w by (v, w) or
a similar symbol. Other notations are sometimes used in Mathematics.

More precisely, for a complex vector space, an inner product is an operation


which associates a complex number (v, w) to any pair of vectors v, w subject
to the following axioms.

1. For any vectors v and w,

(v, w) = (w, v)∗ . (2.37)

(The order of the vectors in the pair thus matters.)

2. For any vectors v1 , v2 and w and for any complex numbers α and β,

(α v1 + β v2 , w) = α∗ (v1 , w) + β ∗ (v2 , w). (2.38)

(As can be seen if one sets α = β = 0 in this equation, this axiom


implies that the inner product of any vector with the zero vector is 0:
(0, w) = (w, 0) = 0 for any vector w.)

3. (v, v) = 0 if and only if v is the zero vector, and (v, v) > 0 if v is not
the zero vector.

Note the complex conjugations in Eq. (2.38). There is no complex conjugation


if the vector on the right of the inner product is multiplied by a complex
number:
(v, α w1 + β w2 ) = α(v, w1 ) + β(v, w2 ) (2.39)
for any vectors v, w1 and w2 and any complex numbers α and β.

Proof: In view of Eqs. (2.37) and (2.38),
(v, \alpha w_1 + \beta w_2) = (\alpha w_1 + \beta w_2, v)^*    (2.40)
                     = [\alpha^* (w_1, v) + \beta^* (w_2, v)]^*    (2.41)
                     = \alpha (w_1, v)^* + \beta (w_2, v)^*.    (2.42)
Eq. (2.39) follows. □

The inner product is defined in the same way for real vector spaces, the only
difference being that (v, w) and the scalars α and β are real numbers in the
case of a real vector space.
Warning: The definition of the inner product used in this course differs
from a similar notation widely used in Mathematics. As defined here, the
inner product (v, w) is antilinear in the vector written on the left whereas
in Mathematics (v, w) is usually taken to be antilinear in the vector writ-
ten on the right. I.e., for us, (α v1 + β v2 , w) = α∗ (v1 , w) + β ∗ (v2 , w) and
(v, α w1 + β w2 ) = α(v, w1 ) + β(v, w2 ), whereas mathematicians would in-
stead write (α v1 + β v2 , w) = α(v1 , w) + β(v2 , w) and (v, α w1 + β w2 ) =
α∗ (v, w1 ) + β ∗ (v, w2 ).

Examples

• The familiar dot product between geometric vectors is an inner product:


For any two 3D geometric vectors, v and w, (v, w) = v · w. (There is no
complex conjugation because this vector space is real.)

• For the 2-component complex column vectors used to represent spin states,
the inner product is defined in the following way: If
v = \begin{pmatrix} a \\ b \end{pmatrix}    (2.43)
and
w = \begin{pmatrix} a' \\ b' \end{pmatrix},    (2.44)
then
(v, w) = \begin{pmatrix} a^* & b^* \end{pmatrix} \begin{pmatrix} a' \\ b' \end{pmatrix} = a^* a' + b^* b'.    (2.45)
See Appendix A for a reminder of how to multiply a column vector by a
row vector. Note that the components of the column vector v, which is
on the left of (v, w), are complex conjugated when this inner product is
calculated, and also that this column vector is written as a row vector in
the product.
The rule is the same for column vectors of any number of components: We
will always calculate the inner product (v, w) by transforming the com-
plex conjugate of the column vector v into a row vector and multiplying
the column vector w by this row vector. (Other ways of calculating the
inner product are possible in principle but are not often used in Quantum
Mechanics.)
Changing the column vector appearing on the left into a row vector can
be taken as a convention. It differs from the one you probably follow
when calculating the dot product of two geometric vectors, in which this
calculation is written as a dot product of two column vectors formed by
the x-, y- and z-components of the respective geometric vectors. These
two conventions are entirely equivalent, although we will see that there
are mathematical reasons for preferring the one adopted in this course.
• The inner product of two square-integrable functions f (x) and g(x) is
defined as follows:
(f, g) = \int_{-\infty}^{\infty} f^*(x)\, g(x)\, dx,    (2.46)
or, for functions which are square-integrable on a finite interval [a, b],
(f, g) = \int_a^b f^*(x)\, g(x)\, dx.    (2.47)

• The inner product of two square-integrable functions f (θ, φ) and g(θ, φ)


of the polar angles θ and φ is defined as follows:
(f, g) = \int_0^\pi d\theta\, \sin\theta \int_0^{2\pi} d\phi\, f^*(\theta, \phi)\, g(\theta, \phi).    (2.48)
For functions f(r, θ, φ) and g(r, θ, φ) of the spherical polar coordinates r, θ and φ,
(f, g) = \int_0^\infty dr\, r^2 \int_0^\pi d\theta\, \sin\theta \int_0^{2\pi} d\phi\, f^*(r, \theta, \phi)\, g(r, \theta, \phi).    (2.49)
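As a small numerical illustration of the convention adopted here (an illustrative sketch, not part of the original notes): NumPy's vdot complex-conjugates its first argument, so it computes exactly the inner product (2.45), antilinear in the vector written on the left; the vectors chosen below are arbitrary.

import numpy as np

v = np.array([1.0, 2.0j])
w = np.array([3.0j, 4.0])

print(np.vdot(v, w))            # (v, w) = 1*(3j) + (-2j)*4 = -5j
print(np.conj(np.vdot(w, v)))   # equals (v, w), as required by Eq. (2.37)

# Antilinearity in the first slot, Eq. (2.38):
print(np.vdot(2j * v, w), np.conj(2j) * np.vdot(v, w))   # both equal -10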

2.9 Norm of a vector
The norm of a vector v, which we will represent by the symbol ||v|| (or |v| for
3D geometric vectors) is the real number defined by the following equation:

||v|| = \sqrt{(v, v)}    (2.50)
(or |v| = \sqrt{v \cdot v} for geometric vectors).
A vector is said to be normalized if it has unit norm. (A normalized vector is
also called a unit vector.)
Any non-zero vector can be normalized by multiplying it by the inverse of its
norm: If u = v/||v||, then,

||u|| = \sqrt{(v, v)}\, / \, ||v|| = 1.    (2.51)

(Clearly, the zero vector has a zero norm and cannot be normalized. By defi-
nition of an inner product, the zero vector is the only vector which has a zero
norm.)
Example
The vector
v = \begin{pmatrix} 1/\sqrt{2} \\ i/\sqrt{2} \end{pmatrix}    (2.52)
is a unit vector since
(v, v) = \begin{pmatrix} 1/\sqrt{2} & -i/\sqrt{2} \end{pmatrix} \begin{pmatrix} 1/\sqrt{2} \\ i/\sqrt{2} \end{pmatrix} = 1/2 + (-i)(i)/2 = 1.    (2.53)

+ As you know, v · w = |v| |w| cos θ where θ is the angle between the
vectors v and w. Since cos θ is a number between −1 and 1, its absolute
value is never larger than 1. Hence |v · w| ≤ |v| |w|. This result is a
particular case of the more general inequality

|(v, w)| ≤ ||v|| ||w||, (2.54)

called the Schwarz inequality (or Cauchy-Schwarz inequality), which


applies to any inner product space.

Proof: If v = 0 or w = 0 (i.e., v or w is the zero vector), then (v, w) = 0,
||v|| = 0 or ||w|| = 0, and the inequality reduces to 0 ≤ 0, which is true.
If neither v nor w is the zero vector, then we can define the normalized
vectors v' = v/||v|| and w' = w/||w|| such that (v', v') = (w', w') = 1,
and set u' = v' − (w', v') w'. Now, since (w', v') = (v', w')^*,

      (u', u') = (v' − (w', v') w', \; v' − (w', v') w')                              (2.55)
               = 1 − (w', v')^*(w', v') − (w', v')(v', w') + (w', v')^*(w', v')       (2.56)
               = 1 − |(v', w')|^2.                                                    (2.57)

However, (u', u') ≥ 0 by virtue of the third axiom of the inner product.
Thus 1 − |(v', w')|^2 ≥ 0, and therefore ||v|| ||w|| ≥ |(v, w)|. □
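The Schwarz inequality (2.54) can also be checked numerically for randomly chosen complex column vectors. The following short sketch (Python with NumPy; the helper name norm and the random vectors are our own choices) does this for the inner product of Eq. (2.45).

    import numpy as np

    rng = np.random.default_rng(0)

    def norm(v):
        return np.sqrt(np.vdot(v, v).real)   # ||v|| = sqrt((v, v)); (v, v) is real and non-negative

    for _ in range(5):
        v = rng.normal(size=4) + 1j * rng.normal(size=4)
        w = rng.normal(size=4) + 1j * rng.normal(size=4)
        lhs = abs(np.vdot(v, w))             # |(v, w)|
        rhs = norm(v) * norm(w)              # ||v|| ||w||
        print(lhs <= rhs + 1e-12)            # always True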

2.10 Orthogonal vectors


Two vectors are said to be orthogonal if their inner product is zero. Two or-
thogonal unit vectors are said to be orthonormal.
Examples

• The two vectors

      v = \begin{pmatrix} 1 \\ i \end{pmatrix}  \quad and \quad  w = \begin{pmatrix} 1 \\ -i \end{pmatrix}      (2.58)

  are orthogonal since

      (v, w) = \begin{pmatrix} 1 & -i \end{pmatrix} \begin{pmatrix} 1 \\ -i \end{pmatrix}
             = 1 + (-i)(-i) = 1 − 1 = 0.                                              (2.59)

• The two complex exponentials exp(inx) and exp(imx) are orthogonal on
  the interval [0, 2π] for any integers n and m ≠ n. Indeed, if n and m are
  two integers and m ≠ n, then m − n is a non-zero integer and therefore

      \int_0^{2\pi} [\exp(inx)]^* \exp(imx)\, dx
            = \int_0^{2\pi} \exp[i(m-n)x]\, dx                                        (2.60)
            = \left. \frac{\exp[i(m-n)x]}{i(m-n)} \right|_0^{2\pi}                    (2.61)
            = \frac{\exp[2(m-n)\pi i] − 1}{i(m-n)}                                    (2.62)
            = \frac{1 − 1}{i(m-n)}                                                    (2.63)
            = 0.                                                                      (2.64)
(We used the fact that exp(2kπi) = 1 for any integer k.)
• The zero vector is orthogonal to every other vector.

Gram-Schmidt orthogonalization
A set of linearly independent vectors can always be transformed into a set of
vectors orthogonal to each other by using a method known as Gram-Schmidt
orthogonalization.
+ The orthogonalization method we are talking about differs from the
similar method of Gram-Schmidt orthonormalization you may have
studied in other courses.

The method is perhaps best explained by way of examples. First, let us consider
just two geometric vectors (arrow vectors), a and b, say. We want to make b
orthogonal to a. To this effect, we write b as the sum of a vector b_∥ parallel to a
and a vector b_⊥ orthogonal to a. Clearly, b_∥ is the unit vector in the direction of a,
â, multiplied by b · â. Since â = a/(a · a)^{1/2}, altogether b_∥ = [(b · a)/(a · a)] a.
Therefore b_⊥ = b − [(b · a)/(a · a)] a.
As a second example, suppose that we want to form three vectors a0 , b0 and
c0 orthogonal to each other, starting from three linearly independent non-zero
column vectors a, b and c. For the sake of the illustration, let us imagine that

      a = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad
      b = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad
      c = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.                                  (2.65)

We can decide to include one of the latter amongst our set of orthogonal vectors.
For instance, let us take a' to be the vector a:

      a' = a = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.                             (2.66)

Then we form a vector b' orthogonal to a' by subtracting from b the vector a'
multiplied by (a', b) and divided by (a', a'):

      b' = b − [(a', b)/(a', a')] a'.                                                 (2.67)

Here

      (a', b) = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = 2
      \quad and \quad
      (a', a') = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = 3.    (2.68)

Thus, in our case,

      b' = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} − \frac{2}{3} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
         = \begin{pmatrix} 1/3 \\ 1/3 \\ −2/3 \end{pmatrix}.                          (2.69)

The vector b' so defined is always orthogonal to a' since

      (a', b') = (a', b) − [(a', b)/(a', a')] (a', a') = (a', b) − (a', b) = 0.       (2.70)

Note that b' cannot be the zero vector, as otherwise the vectors a and b would
not be linearly independent. We then form a vector c' orthogonal to both a' and
b' by the same process. I.e., we set

      c' = c − [(a', c)/(a', a')] a' − [(b', c)/(b', b')] b'.                         (2.71)

Here (a', c) = 2, (a', a') = 3, (b', c) = −1/3 and (b', b') = 6/9 = 2/3, and thus

      c' = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} − \frac{2}{3} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
         + \frac{1}{2} \begin{pmatrix} 1/3 \\ 1/3 \\ −2/3 \end{pmatrix}
         = \begin{pmatrix} 1/2 \\ −1/2 \\ 0 \end{pmatrix}.                            (2.72)

As can be checked easily, the three vectors so obtained,

      a' = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad
      b' = \begin{pmatrix} 1/3 \\ 1/3 \\ −2/3 \end{pmatrix} \quad and \quad
      c' = \begin{pmatrix} 1/2 \\ −1/2 \\ 0 \end{pmatrix},                            (2.73)

are orthogonal to each other.


This orthogonalization process can be iterated further to orthogonalize as many
vectors as one wants, as long as these vectors are linearly independent.
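The iteration is easy to code. The sketch below (Python with NumPy; the function name gram_schmidt is our own) orthogonalizes a list of linearly independent column vectors exactly as in Eqs. (2.66)-(2.72), and reproduces the vectors a', b' and c' of Eq. (2.73) when applied to the vectors of Eq. (2.65).

    import numpy as np

    def gram_schmidt(vectors):
        """Orthogonalize (without normalizing) a list of linearly independent vectors."""
        orthogonal = []
        for v in vectors:
            u = v.astype(complex)
            for q in orthogonal:
                # subtract the component of v along each vector already obtained, Eq. (2.71)
                u = u - (np.vdot(q, v) / np.vdot(q, q)) * q
            orthogonal.append(u)
        return orthogonal

    a = np.array([1.0, 1.0, 1.0])
    b = np.array([1.0, 1.0, 0.0])
    c = np.array([1.0, 0.0, 1.0])

    for vec in gram_schmidt([a, b, c]):
        print(vec)     # (1, 1, 1), (1/3, 1/3, -2/3), (1/2, -1/2, 0)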
Orthonormal bases
Bases formed of normalized vectors orthogonal to each other are particularly
convenient. Suppose that the vectors u1 , u2 ,. . . , uN form a basis. Then, for any
vectors v and w there exist a set of scalars c1 , c2 ,. . . , cN and a set of scalars d1 ,
d_2,. . . , d_N such that

      v = \sum_{j=1}^{N} c_j u_j  \quad and \quad  w = \sum_{j=1}^{N} d_j u_j.        (2.74)

Suppose further that the vectors uj are orthonormal — i.e., that (ui , uj ) = δij
where δ_ij is the Kronecker delta:

      \delta_{ij} = \begin{cases} 1 & i = j, \\ 0 & i \neq j. \end{cases}             (2.75)

The coefficients cj and dj can then be obtained as the inner product of the re-
spective vector with the corresponding basis vector: cj = (uj , v) and dj =
(uj , w). Moreover,
      (v, w) = \sum_{j=1}^{N} c_j^* d_j                                               (2.76)

and therefore

      ||v||^2 = (v, v) = \sum_{j=1}^{N} |c_j|^2  \quad and \quad
      ||w||^2 = (w, w) = \sum_{j=1}^{N} |d_j|^2.                                      (2.77)

Proof: Taking the inner product of v with u_1, we see that

      (u_1, v) = \sum_{j=1}^{N} c_j (u_1, u_j) = \sum_{j=1}^{N} c_j \delta_{1j} = c_1.    (2.78)

Similarly (u_2, v) = c_2, and in general c_j = (u_j, v) and d_j = (u_j, w). Moreover,

      (v, w) = \sum_{i=1}^{N} \sum_{j=1}^{N} c_i^* d_j (u_i, u_j) = \sum_{i=1}^{N} \sum_{j=1}^{N} c_i^* d_j \delta_{ij}.    (2.79)

Eqs. (2.76) and (2.77) follow. □
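These relations can be illustrated with a quick numerical sketch (Python with NumPy; the orthonormal basis used below is simply the set of columns of a random unitary matrix, an assumption of ours chosen only for the demonstration).

    import numpy as np

    rng = np.random.default_rng(1)

    # An orthonormal basis of C^4: the columns of a unitary matrix Q.
    Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
    basis = [Q[:, j] for j in range(4)]

    v = rng.normal(size=4) + 1j * rng.normal(size=4)

    # Expansion coefficients c_j = (u_j, v).
    c = np.array([np.vdot(u, v) for u in basis])

    # Reconstruct v from the coefficients and check ||v||^2 = sum_j |c_j|^2, Eq. (2.77).
    v_rebuilt = sum(cj * u for cj, u in zip(c, basis))
    print(np.allclose(v_rebuilt, v))                                  # True
    print(np.isclose(np.vdot(v, v).real, np.sum(np.abs(c) ** 2)))     # True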

Orthogonal subspaces
A subspace V 0 of a vector space V is said to be orthogonal to another subspace
V 00 of V if all the vectors of V 0 are orthogonal to all the vectors of V 00 .
For example, the space spanned by the two column vectors

      \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}  \quad and \quad  \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}

is orthogonal to the space spanned by the column vector

      \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}

since

      \begin{pmatrix} a^* & b^* & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ c \end{pmatrix} = 0      (2.80)

for any complex numbers a, b and c.

+ We have seen, in Section 2.4, that the space spanned by these first two
column vectors can be represented geometrically by the xy-plane. Sim-
ilarly, the one-dimensional space spanned by the third column vector

can be represented by the z-axis, and saying that these two spaces are
orthogonal is the same as saying that the z-axis is orthogonal to the
xy-plane.

Orthogonality and linear independence


A set of non-zero orthogonal vectors is always linearly independent.

Proof: As seen in Section 2.5, we can be sure that N vectors are linearly
independent if the equation

c1 v1 + c2 v2 + · · · + cN vN = 0 (2.81)

is possible only if all the coefficients cn are zero. Suppose that the N
vectors vn are mutually orthogonal and non-zero. Suppose further that
the equation would be possible with some of the coefficients being non-
zero. Let cj be a non-zero coefficient. Taking the inner product of the
vector vj with each side of the equation gives

c1 (vj , v1 ) + c2 (vj , v2 ) + · · · + cN (vj , vN ) = (vj , 0). (2.82)

However, (v_j, v_n) = 0 if j ≠ n and (v_j, 0) = 0. Eq. (2.82) thus reduces to

      c_j (v_j, v_j) = 0.                                                             (2.83)
Since vj is not the zero vector, (vj , vj ) 6= 0. Thus cj must be zero, in
contradiction with the hypothesis that cj is non-zero. Hence all the
coefficients must be zero for the equation to be possible. 

For example, taken as functions defined on the interval [0, 2π], the complex
exponentials exp(inx) and exp(imx) are linearly independent for any integers
n and m ≠ n since, as established above, these functions are orthogonal
on that interval.
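This orthogonality is also easy to check numerically, for instance with a simple Riemann sum over [0, 2π]; the sketch below (Python with NumPy; the helper name inner is ours) is one way of doing so, and any quadrature rule would serve equally well.

    import numpy as np

    x = np.linspace(0.0, 2.0 * np.pi, 100000, endpoint=False)
    dx = x[1] - x[0]

    def inner(n, m):
        # (exp(inx), exp(imx)) on [0, 2*pi], approximated by a Riemann sum
        return np.sum(np.conj(np.exp(1j * n * x)) * np.exp(1j * m * x)) * dx

    print(abs(inner(2, 5)))    # approximately 0 for m different from n
    print(inner(3, 3).real)    # approximately 2*pi for m = n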

2.11 Hilbert spaces


The vector spaces of greatest importance in Quantum Mechanics are inner prod-
uct spaces which have a further mathematical property called completeness.

Such vector spaces are called Hilbert spaces. (Recall that inner product spaces
are vector spaces in which an inner product is defined. For the mathematical
property called completeness, see below.)
We have already encountered several Hilbert spaces: The vector spaces formed
by N-component column vectors of complex numbers (e.g., column vectors rep-
resenting spin states) are Hilbert spaces, and so are the vector spaces formed by
square-integrable functions (e.g., wave functions).

The precise definition of completeness is not important for this course. Briefly, if one
takes an infinite set of vectors belonging to a vector space V and this infinite set forms
a convergent sequence (in a precise mathematical sense, see the example in the
note below for more information), then for V to be complete the limit of this
sequence must also be a vector belonging to V. This property plays an impor-
tant role in the mathematical theory of infinite-dimensional vector spaces.
One can show that any finite-dimensional inner product space is complete
and is therefore a Hilbert space. However, not all infinite-dimensional inner
product spaces are Hilbert spaces.

+ An example of an incomplete inner product space is the space formed
  by all the functions that are continuous on the closed interval [0, 1],
  defining the inner product as per Eq. (2.47):

      (f, g) = \int_0^1 f^*(x)\, g(x)\, dx.                                           (2.84)

  Accordingly, the norm ||f|| of a function f(x) belonging to that space
  is given by the equation

      ||f||^2 = \int_0^1 |f(x)|^2\, dx.                                               (2.85)

  Now, consider the following sequence of functions: f_1(x) = x, f_2(x) = x^{1/2},
  f_3(x) = x^{1/3}, f_4(x) = x^{1/4},. . . , f_n(x) = x^{1/n},. . . All these func-
  tions are continuous on [0, 1]. A short calculation shows that

      ||f_n − f_m||^2 = \left(\frac{2}{n}+1\right)^{-1} + \left(\frac{2}{m}+1\right)^{-1}
                        − 2\left(\frac{1}{n}+\frac{1}{m}+1\right)^{-1}.               (2.86)

Thus ||fn − fm || goes to zero for n and m → ∞, which means that
the sequence converges (loosely speaking, the difference between the
functions fn (x) and fm (x) becomes vanishingly small when n and m
increase). If you have taken a course in Analysis, you may have rec-
ognized that these functions actually form a Cauchy sequence, which
is the mathematically rigorous way convergence is defined in this con-
  text: for any positive number ε one can find an integer N such that
  ||f_m − f_n|| < ε for all m and n > N. Although this sequence converges
  and only contains functions continuous everywhere on [0, 1], the function
  it converges to is not continuous on [0, 1]: the limit of x^{1/n} for n → ∞
  is the discontinuous function

      f(x) = \begin{cases} 1 & 0 < x \leq 1, \\ 0 & x = 0. \end{cases}                (2.87)

Therefore the vector space of all the functions continuous on [0, 1] and
equipped with this inner product is not complete and does not qualify
as a Hilbert space.
One needs to enlarge the space in order to complete it, in particular by
including functions that are not continuous everywhere. How to do this
is well outside the scope of the course, but the result can be stated rela-
tively simply: the space of all square-integrable functions on an interval
  [a, b] or on the infinite interval (−∞, ∞) is complete with respect to the
  inner products defined by Eqs. (2.46) and (2.47). (For this result to hold,
however, these integrals must be understood as being Lebesgue inte-
grals — see page 20. Recall that square-integrable functions are called
L2 functions.) The mathematical theory of the corresponding Hilbert
spaces underpins the whole of wave quantum mechanics.

+ As seen in previous sections, in a finite-dimensional vector space V


any linear combination of vectors of V is an element of V and any
element of V can be written in a unique way as a linear combination
of given basis vectors. This is also the case in many important infinite-
dimensional Hilbert spaces, although with significant differences.
Let us take, for example, the Hilbert space of L2 functions on the in-
terval [0, 2π]. We have seen that the complex exponentials exp(inx)
(n = 0, ±1, ±2,. . . ) are mutually orthogonal on that interval. We can
thus form sequences of linear combinations of an increasingly large

number of such functions, e.g., s0 (x), s1 (x), s2 (x),. . . , with
      s_N(x) = \sum_{n=-N}^{N} c_n \exp(inx).                                         (2.88)

  Each of these functions is a linear combination of a finite number
  of linearly-independent L2 functions, and is therefore an L2 function.
  However, this sequence of s_N(x) functions converges to a well defined
  L2 function s(x) only if the coefficients c_n make this possible (e.g., these
  coefficients should go to zero for n → ±∞). Furthermore, converging
  to s(x) here means that ||s_N(x) − s(x)|| → 0 for N → ∞, not that
  s_N(x) → s(x) at every value of x. However, within this definition of
  convergence, we write
convergence, we write
      s(x) = \sum_{n} c_n \exp(inx),                                                  (2.89)

  it being understood that this equality does not necessarily hold at all val-
ues of x. It turns out that for any L2 function on [0, 2π] one can find
a set of coefficients cn such that the function can be expanded in such
a way. In this sense, the complex exponentials exp(inx) form a basis
for this Hilbert space. (There would be much more to say about the
concept of basis on infinite-dimensional vector space, but saying much
more would go well beyond the scope of this course.)

2.12 Isomorphic vector spaces


You are familiar with the fact that 3D geometric vectors representing, e.g., ve-
locities, can be written as linear combinations of three orthogonal unit vectors.
For example,
v = vx x̂ + vy ŷ + vz ẑ, (2.90)
where x̂, ŷ and ẑ are the unit vectors in the x-, y- and z-directions. Given these
three unit vectors, the components vx , vy and vz are in one-to-one correspon-
dence with the vector v: Each geometric vector corresponds to one and only
one set of components, and each set of three components corresponds to one
and only one geometric vector. Since these sets of components can be arranged

in column vectors, e.g.,

      \begin{pmatrix} v_x \\ v_y \\ v_z \end{pmatrix},
Eq. (2.90) defines a one-to-one correspondence between the elements of the vec-
tor space of 3D geometric vectors and the vector space of 3-component column
vectors of real numbers. In fact, adding geometric vectors or multiplying them
by a number is equivalent to adding the corresponding column vectors or mul-
tiplying these by the same number. Also, the dot product of any two geometric
vectors can be calculated as the inner product of the corresponding column
vectors: If v = v_x x̂ + v_y ŷ + v_z ẑ and w = w_x x̂ + w_y ŷ + w_z ẑ, then

      v \cdot w = v_x w_x + v_y w_y + v_z w_z
                = \begin{pmatrix} v_x & v_y & v_z \end{pmatrix}
                  \begin{pmatrix} w_x \\ w_y \\ w_z \end{pmatrix}.                    (2.91)
(In writing the row vector, we have assumed that the components vx , vy and vz
are real numbers and therefore equal to their complex conjugate.)
In recognition of these facts, one says that the vector space of 3D geometric
vectors and the vector space of 3-component column vectors of real numbers
are isomorphic (which means, literally, that they have the same form). In a
sense, there is only one such vector space, and its elements can be represented
equally well by arrow vectors as by column vectors.
As another example, take the vector space spanned by the three spherical har-
monics Y1−1 (θ, φ), Y10 (θ, φ) and Y11 (θ, φ) (see Section 2.4). You have seen in
the Term 1 course that these three functions are orthonormal; in fact, for any l,
l', m and m',

      \int_0^{\pi} d\theta \sin\theta \int_0^{2\pi} d\phi\; Y_{lm}^*(\theta,\phi)\, Y_{l'm'}(\theta,\phi) = \delta_{ll'}\, \delta_{mm'}.    (2.92)

These three spherical harmonics are thus linearly independent. Any element
f (θ, φ) of this vector space is a linear combination of the form

f (θ, φ) = c−1 Y1−1 (θ, φ) + c0 Y10 (θ, φ) + c1 Y11 (θ, φ), (2.93)

where c−1 , c0 and c1 are three complex numbers. Since the three spherical har-
monics are linearly independent, each of these functions can be written in only

one way as a linear combination of that form. These functions are therefore in
one-to-one correspondence with the column vectors

      \begin{pmatrix} c_{-1} \\ c_0 \\ c_1 \end{pmatrix}

and the vector space spanned by these three spherical harmonics is isomorphic
to the vector space of 3-component column vectors of complex numbers.

Exercise: Let

f (θ, φ) = c−1 Y1−1 (θ, φ) + c0 Y10 (θ, φ) + c1 Y11 (θ, φ), (2.94)
g(θ, φ) = d−1 Y1−1 (θ, φ) + d0 Y10 (θ, φ) + d1 Y11 (θ, φ). (2.95)

Show that the inner product of these two functions, defined as the integral
      \int_0^{\pi} d\theta \sin\theta \int_0^{2\pi} d\phi\; f^*(\theta,\phi)\, g(\theta,\phi),

is c∗−1 d−1 + c∗0 d0 + c∗1 d1 . Show that the same result is also obtained by
taking the inner product of the corresponding column vectors,

      \begin{pmatrix} c_{-1} \\ c_0 \\ c_1 \end{pmatrix}  \quad and \quad  \begin{pmatrix} d_{-1} \\ d_0 \\ d_1 \end{pmatrix}.

3 Operators (I)

3.1 Linear operators


As was mentioned in Sections 1.2 and 1.3, measurable physical quantities cor-
respond to certain mathematical operators in Quantum Mechanics. You have
encountered a number of such operators in the Term 1 course, including the
Hamiltonian operator, the position operator, the momentum operator and var-
ious angular momentum operators. You have also encountered operators such
as ladder operators, which do not correspond to measurable quantities.
These operators are mathematical objects which transform elements of a certain
vector space into elements of the same vector space. In other words, they map
vectors to vectors.
Examples

• As we will see later in the course, the matrix

      S_y = \frac{\hbar}{2} \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},            (3.1)
is an operator related to spin measurements. This operator transforms
2-component column vectors into 2-component column vectors. For in-
stance, it transforms the column vector

      \chi = \begin{pmatrix} a \\ b \end{pmatrix}                                     (3.2)
into the column vector

      S_y \chi = \frac{\hbar}{2} \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}
                 \begin{pmatrix} a \\ b \end{pmatrix}
               = \frac{\hbar}{2} \begin{pmatrix} -ib \\ ia \end{pmatrix}.             (3.3)

• As another example, take the Hamiltonian of a linear harmonic oscillator
  of mass m,

      H = −\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + V(x),                                 (3.4)
where V (x) is the potential energy. In terms of the oscillator’s angular
frequency ω,
V (x) = mω 2 x2 /2. (3.5)

This operator transforms functions into functions. For instance, it trans-
forms the function
ψ(x) = exp(−α2 x2 /2), (3.6)
  where α is a real constant, into the function

      H\psi(x) = −\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + V(x)\psi(x)                (3.7)
               = \left[ −\frac{\hbar^2}{2m}\left(\alpha^4 x^2 − \alpha^2\right)
                 + \frac{m\omega^2}{2}\, x^2 \right] \exp(−\alpha^2 x^2/2).           (3.8)
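The bracket of Eq. (3.8) can be verified with a computer algebra system. The following sketch (Python with SymPy; the symbol names are our own) differentiates ψ(x) twice and checks that Hψ(x) has the stated form.

    import sympy as sp

    x, alpha, m, omega, hbar = sp.symbols('x alpha m omega hbar', positive=True)

    psi = sp.exp(-alpha**2 * x**2 / 2)
    V = m * omega**2 * x**2 / 2

    # H psi = -(hbar^2 / 2m) psi'' + V(x) psi
    H_psi = -hbar**2 / (2 * m) * sp.diff(psi, x, 2) + V * psi

    expected = (-hbar**2 / (2 * m) * (alpha**4 * x**2 - alpha**2)
                + m * omega**2 * x**2 / 2) * psi

    print(sp.simplify(H_psi - expected))   # prints 0: Eq. (3.8) is reproduced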

Domain of an operator
This last example illustrates the need for a careful definition of the vector
space in which an operator acts. Since it includes a term in d2 /dx2 , the
Hamiltonian operator of Eq. (3.4) is defined only when acting on functions
which can be differentiated twice. Such functions form a subspace of the vec-
tor space of all functions of x. Thus H maps twice-differentiable functions
to functions (the latter may or may not be twice-differentiable themselves,
hence the mapping is not from the space of all twice-differentiable functions
to the space of all twice-differentiable functions).
More generally, an operator is a mapping from a subspace V 0 of a vector space
V to the vector space V itself. The vector space V 0 in which the operator acts
is called the domain of this operator.
It might be that V 0 = V , in which case the domain of the operator is the
whole of V . (By definition of a subspace, V 0 is a subset of V ; however, as
a subset of a set can be this set itself, V 0 can be the whole of V .) This is
the case of the operator Sy defined in the first example: any 2-component
column vector can be multiplied by a 2 × 2 matrix, hence the domain of this
operator is the whole of the vector space of 2-component column vectors.
However, Quantum Mechanics also makes use of operators whose domain
is smaller than the vector space they map to — typically, these are operators
acting on wave functions, such as the Hamiltonian operator H of Eq. (3.4).

Linear operators
Operator representing measurable physical quantities and most of the other
operators used in Quantum Mechanics have the important mathematical prop-
erty to be linear. An operator A is said to be linear if it fulfils the following two
conditions:

1. If w is the sum of the vectors v1 and v2 , then Aw = Av1 + Av2 .


2. If w is the product of a vector v by a scalar c, then Aw = cAv. (The right-
hand side is meant to represent the product of the vector Av by the scalar
c.)

These two conditions can be summarized into the single condition that for any
vector v1 and v2 and any scalar c1 and c2 ,
A(c1 v1 + c2 v2 ) = c1 Av1 + c2 Av2 . (3.9)
For example, the differential operator d/dx is linear since, if c1 and c2 are con-
stants,
d df1 df2
[c1 f1 (x) + c2 f2 (x)] = c1 + c2 . (3.10)
dx dx dx
Throughout the rest of the course we will always assume that the operators we
are talking about are linear operators.

+ Not all operators are linear. For example, consider an operator O which
would multiply vectors by their norm, i.e., such that
Ov = ||v|| v (3.11)
for any vector v of the space in which this operator acts. This operator
violates the conditions operators must fulfil to qualify as linear opera-
tors. In particular, it is not the case that O(cv) = c Ov for any scalar c
and any vector v, since
O(cv) = ||cv|| cv, (3.12)
and ||cv|| cv ≠ c ||v|| v unless c = 0, |c| = 1 or v = 0. Hence O is not
a linear operator.
+ Linear operators are particular instances of more general mappings
called linear transformations.

The identity operator
The identity operator, which is usually denoted by the letter I, is the operator
which maps any vector into itself:

Iv = v (3.13)

for any vector v. We will use this operator from time to time.

3.2 Matrix representation of an operator


We start by the case of operators acting in a finite-dimensional Hilbert space,
e.g., spin operators. Recall that an operator A acting on a vector v transforms
this vector into a vector denoted Av. Let us set

w = Av. (3.14)

Let us also assume that these vectors belong to a finite-dimensional Hilbert


space of dimension N . We can therefore write them as linear combinations of
N orthonormal basis vectors un :

v = c1 u1 + c2 u2 + · · · + cN uN , (3.15)
w = d1 u1 + d2 u2 + · · · + dN uN , (3.16)

where the coefficients c1 ,. . . , cN and d1 ,. . . , dN are in general complex numbers.


Recall that saying that the vectors un are orthornormal means that

(ui , uj ) = δij , (3.17)

and that cn = (un , v) and dn = (un , w), n = 1, 2,. . . , N . (See Section 2.10 for
further information about orthonormal bases.) Eq. (3.14) can be directly written
as a relation between the coefficients cn and the the coefficients dn : One can
organise these two sets of coefficients into two column vectors, c and d, such
that

      c = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{pmatrix}  \quad and \quad
      d = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_N \end{pmatrix},                  (3.18)
and in terms of these two column vectors Eq. (3.14) reads
d = Ac (3.19)
where A is the N × N matrix

      A = \begin{pmatrix}
            (u_1, Au_1) & (u_1, Au_2) & \cdots & (u_1, Au_N) \\
            (u_2, Au_1) & (u_2, Au_2) & \cdots & (u_2, Au_N) \\
            \vdots      & \vdots      & \ddots & \vdots      \\
            (u_N, Au_1) & (u_N, Au_2) & \cdots & (u_N, Au_N)
          \end{pmatrix}.                                                              (3.20)

Proof: Written in terms of the basis set expansions of the vectors v and
w, Eq. (3.14) reads

      \sum_{j=1}^{N} d_j u_j = \sum_{j=1}^{N} c_j A u_j.                              (3.21)

Taking the inner product of this equation with u_1, we see that

      \sum_{j=1}^{N} d_j (u_1, u_j) = \sum_{j=1}^{N} c_j (u_1, Au_j).                 (3.22)

In view of Eq. (3.17), this equation reduces to

      d_1 = \sum_{j=1}^{N} c_j (u_1, Au_j).                                           (3.23)

Likewise, taking the inner product of Eq. (3.21) with u_2 yields

      d_2 = \sum_{j=1}^{N} c_j (u_2, Au_j),                                           (3.24)

and similarly for all the other coefficients d_n. In general,

      d_i = \sum_{j=1}^{N} A_{ij} c_j, \quad i = 1, 2, \ldots, N,                     (3.25)

with A_{ij} = (u_i, Au_j). Eq. (3.19) follows. □

The matrix A is said to represent the operator A in the basis {un }. Its elements
— i.e., the inner products (ui , Auj ) — are called the matrix elements of A in that
basis.
It is clear that the column vectors c and d representing the vectors v and w and
the matrix A representing the operator A all depend on the basis: changing the
basis from a set of orthonormal vectors {u_n} to a set of orthonormal vectors
{u'_n} changes the column vectors c and d into column vectors c' and d' of ele-
ments c'_n = (u'_n, v) and d'_n = (u'_n, w) and changes the matrix A into a matrix
A' of elements (u'_i, Au'_j). As long as the set {u'_n} is also an orthonormal basis,
however, the column vector d' is related to A' and to c' in the same way as d is
related to A and to c:

      d' = A' c'.                                                                     (3.26)
Many different sets of unit vectors can form an orthonormal basis; in fact, infinitely
many different sets can do so if the dimension of the vector space is 2 or higher (there
is no choice of basis set possible in spaces of dimension 1). Therefore, the same operator
can be represented by many different matrices.
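In a finite-dimensional space this construction is completely mechanical. The sketch below (Python with NumPy; the random operator, the random orthonormal basis and all names are our own choices for the demonstration) builds the matrix of elements A_ij = (u_i, A u_j) in an arbitrary orthonormal basis and checks Eq. (3.19).

    import numpy as np

    rng = np.random.default_rng(2)
    N = 3

    A_op = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))    # an operator on C^3
    Q, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
    basis = [Q[:, j] for j in range(N)]                              # an orthonormal basis {u_n}

    # Matrix elements A_ij = (u_i, A u_j) in that basis.
    A_matrix = np.array([[np.vdot(ui, A_op @ uj) for uj in basis] for ui in basis])

    # Check Eq. (3.19): if c and d represent v and w = A v, then d = A c.
    v = rng.normal(size=N) + 1j * rng.normal(size=N)
    w = A_op @ v
    c = np.array([np.vdot(u, v) for u in basis])
    d = np.array([np.vdot(u, w) for u in basis])
    print(np.allclose(d, A_matrix @ c))     # True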
Examples

• The spherical harmonics Ylm (θ, φ) are orthonormal in the sense that
      \int_0^{\pi} d\theta \sin\theta \int_0^{2\pi} d\phi\; Y_{lm}^*(\theta,\phi)\, Y_{l'm'}(\theta,\phi) = \delta_{ll'}\, \delta_{mm'}.    (3.27)

Therefore the three l = 1 spherical harmonics Y1−1 (θ, φ), Y10 (θ, φ) and
Y11 (θ, φ), constitute an orthonormal basis for the vector space of all linear
combinations of the form

c−1 Y1−1 (θ, φ) + c0 Y10 (θ, φ) + c1 Y11 (θ, φ),

where c−1 , c0 and c1 are complex numbers (see Section 2.4 for this vector
space). You may remember from the Term 1 course that the spherical har-
monics are eigenfunctions of the angular momentum operator Lz . More
precisely,

Lz = −i~ (3.28)
∂φ
and
Lz Ylm (θ, φ) = m~Ylm (θ, φ). (3.29)
Hence, when Lz acts on a linear combination of spherical harmonics with
l = 1, the result is also a linear combination of spherical harmonics with
l = 1; in fact,

Lz [c−1 Y1−1 (θ, φ)+c0 Y10 (θ, φ) + c1 Y11 (θ, φ)]


= [−~ c−1 Y1−1 (θ, φ) + ~ c1 Y11 (θ, φ)]. (3.30)

One can therefore represent the operator Lz by a 3×3 matrix Lz in the ba-
sis formed by the spherical harmonics Y1−1 (θ, φ), Y10 (θ, φ) and Y11 (θ, φ),
the elements of this matrix being the integrals
      \int_0^{\pi} d\theta \sin\theta \int_0^{2\pi} d\phi\; Y_{lm}^*(\theta,\phi)\, L_z\, Y_{l'm'}(\theta,\phi).    (3.31)

These integrals are easy to calculate in view of Eqs. (3.27) and (3.29). The
calculation gives

      L_z = \begin{pmatrix} −\hbar & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \hbar \end{pmatrix}.    (3.32)
Written in terms of this matrix, Eq. (3.30) reads

      \begin{pmatrix} −\hbar & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \hbar \end{pmatrix}
      \begin{pmatrix} c_{-1} \\ c_0 \\ c_1 \end{pmatrix}
      = \begin{pmatrix} −\hbar c_{-1} \\ 0 \\ \hbar c_1 \end{pmatrix}.                (3.33)

It is worth noting that the column vectors representing the linear combi-
nations of spherical harmonics and the matrix representing the operator
Lz depend on the order of the basis functions in the basis set. Eqs. (3.32)
and (3.33) apply to the case where the basis is the ordered set

{Y1−1 (θ, φ), Y10 (θ, φ), Y11 (θ, φ)}.

If we had taken the basis to be the ordered set

{Y10 (θ, φ), Y11 (θ, φ), Y1−1 (θ, φ)},

these two equations would have been

      L_z = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \hbar & 0 \\ 0 & 0 & −\hbar \end{pmatrix}    (3.34)

and

      \begin{pmatrix} 0 & 0 & 0 \\ 0 & \hbar & 0 \\ 0 & 0 & −\hbar \end{pmatrix}
      \begin{pmatrix} c_0 \\ c_1 \\ c_{-1} \end{pmatrix}
      = \begin{pmatrix} 0 \\ \hbar c_1 \\ −\hbar c_{-1} \end{pmatrix}.                (3.35)

Other choices of basis functions are also possible. For instance, we can
work with the functions Y1x (θ, φ), Y1y (θ, φ) and Y1z (θ, φ) defined as fol-
lows:

Y1x (θ, φ) = [Y1−1 (θ, φ) − Y11 (θ, φ)]/ 2, (3.36)

Y1y (θ, φ) = i[Y1−1 (θ, φ) + Y11 (θ, φ)]/ 2, (3.37)
Y1z (θ, φ) = Y10 (θ, φ). (3.38)

It is not particularly difficult to show that these three functions also form
an orthonormal basis for the vector space spanned by the spherical har-
monics Y1−1 (θ, φ), Y10 (θ, φ) and Y11 (θ, φ), and that in the basis {Y1x (θ, φ), Y1y (θ, φ), Y1z (θ, φ)}
the operator Lz is represented by the matrix

      L'_z = \begin{pmatrix} 0 & i\hbar & 0 \\ −i\hbar & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.    (3.39)

(The calculation is left as an exercise.)

• The five spherical harmonics Y2−2 (θ, φ), Y2−1 (θ, φ), Y20 (θ, φ), Y21 (θ, φ)
and Y22 (θ, φ) constitute an orthonormal basis for the vector space of all
linear combinations of the form

c−2 Y2−2 (θ, φ) + c−1 Y2−1 (θ, φ) + c0 Y20 (θ, φ) + c1 Y21 (θ, φ) + c2 Y22 (θ, φ),

where the coefficients cn are complex numbers. In the basis formed by


these five spherical harmonics, the angular momentum operator is repre-

sented by the matrix

      L_z = \begin{pmatrix}
              −2\hbar & 0 & 0 & 0 & 0 \\
              0 & −\hbar & 0 & 0 & 0 \\
              0 & 0 & 0 & 0 & 0 \\
              0 & 0 & 0 & \hbar & 0 \\
              0 & 0 & 0 & 0 & 2\hbar
            \end{pmatrix}.                                                            (3.40)

+ Although its analytical form [Eq. (3.28)] is the same as in the previous
example, this operator is represented by a 5 × 5 matrix here, not by
a 3 × 3 matrix. It may seem bizarre that the same operator can be rep-
resented by matrices of different sizes. However, technically, the Lz
operator considered here is not the same operator as the Lz operator
considered above: The mathematical definition of an operator includes
a specification of the vector space in which the operator acts, and this
vector space differs between these two examples.

• Since (ui , Iuj ) = (ui , uj ) = δij if I is the identity operator and the vectors
ui and uj are orthonormal, this operator is always represented by the unit
matrix in any orthonormal basis (specifically, by the N × N unit matrix
if the space is of dimension N). Recall that the unit matrix is the diagonal
matrix whose diagonal elements are 1 and whose off-diagonal elements are 0.

Operators in infinite-dimensional vector spaces


All that we have seen above for the case of operators acting in finite-
dimensional vector spaces generalizes to the case of infinite-dimensional vec-
tor spaces, although the rigorous mathematical theory of the latter is more
difficult. Ignoring (as usual) various mathematical subtleties, any linear oper-
ator acting on vectors belonging to an infinite-dimensional Hilbert space can
be represented, at least formally, by “square" matrices of an infinite number
of elements. Such matrices can be constructed as in the finite-dimensional
case: given an orthonormal basis spanning the space, {u1 , u2 , u3 , . . .}, an op-
erator A is represented in that basis by a matrix A whose elements Aij are
the inner products (ui , Auj ). (Aij is the element of A located on the i-th row
and in the j-th column.) Likewise, a vector v of this Hilbert space is repre-
sented by a column vector c whose i-th component (i = 1, 2, 3,. . . ) is the
inner product (ui , v). In this representation, the vector Av is calculated as
the product of the column vector c by the matrix A.
Take, for example, the Hilbert space of all square-integrable functions of the
polar angles θ and φ, which is infinite-dimensional. (We are talking about
the Hilbert space of all square-integrable functions of these two angles, not
about a finite-dimensional Hilbert space of functions that can be written as
a linear combination of spherical harmonics with same l values as in the
previous examples.) One can show that the spherical harmonics Ylm (θ, φ)
form an orthonormal basis for this Hilbert space. In the basis
{Y00 (θ, φ), Y1−1 (θ, φ), Y10 (θ, φ), Y11 (θ, φ), Y2−2 (θ, φ), Y2−1 (θ, φ), . . .},
the angular momentum operator L_z is represented by the infinite matrix

      L_z = \begin{pmatrix}
              0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
              0 & −\hbar & 0 & 0 & 0 & 0 & \cdots \\
              0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
              0 & 0 & 0 & \hbar & 0 & 0 & \cdots \\
              0 & 0 & 0 & 0 & −2\hbar & 0 & \cdots \\
              0 & 0 & 0 & 0 & 0 & −\hbar & \cdots \\
              \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
            \end{pmatrix}.                                                            (3.41)
Likewise, a function f (θ, φ) which can be expanded as
      f(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} c_{lm} Y_{lm}(\theta, \phi)      (3.42)

is represented, in that basis, by the column vector

      c = \begin{pmatrix} c_{00} \\ c_{1,-1} \\ c_{10} \\ c_{11} \\ c_{2,-2} \\ c_{2,-1} \\ \vdots \end{pmatrix}.    (3.43)

Warning: The mathematical theory of finite matrices does not extend
straightforwardly to infinite matrices. For example, a column vector of in-
finitely many components can be multiplied by a row vector of infinitely
many components only if the resulting sum of products of components con-
verges; the issue does not arise in the finite-dimensional case.

3.3 Adding and multiplying operators


Sum of two operators
Operators acting on the same vectors can be added. The sum of an operator A
and an operator B is the operator A + B such that

(A + B)v = Av + Bv (3.44)

for any vector v the operators A and B can both act on.

+ For example, the linear harmonic oscillator Hamiltonian given by Eq. (3.4)
can be seen as being the sum of the operator

      −\frac{\hbar^2}{2m}\frac{d^2}{dx^2}
and the operator V (x), representing, respectively, the kinetic energy and
the potential energy of the oscillator. The latter is a multiplicative opera-
tor: seen as an operator, V (x) transforms a function ψ(x) into the function
V (x)ψ(x) [e.g., transforms the function exp(−α2 x2 /2) into the function
(mω 2 x2 /2) exp(−α2 x2 /2)].

Not surprisingly, the matrix representing the sum of two operators is the sum
of the corresponding matrices: E.g., if the operators A and A' are represented
by the matrices

      A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}  \quad and \quad
      A' = \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix},                          (3.45)

then the operator A + A' is represented by the matrix

      A + A' = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
             + \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix}
             = \begin{pmatrix} a + a' & b + b' \\ c + c' & d + d' \end{pmatrix}.      (3.46)

Multiplication by a scalar
Operators can also be multiplied by scalars. To state the obvious, if α is a num-
ber, the operator αA is defined as the operator such that

(αA)v = α(Av) (3.47)

for any vector v the operator A can act on. Therefore one can make linear
combinations of operators in the same way as one can make linear combinations
of vectors. Examples of such linear combinations are the ladder operators Jˆ+
and Jˆ− used in the theory of the angular momentum, which are defined in terms
of the angular momentum operators Jˆx and Jˆy as Jˆ± = Jˆx ± iJˆy (you may have
encountered these operators in Term 1, and we will come back to them later in
this course).
In terms of matrix representations, multiplying an operator by a number amounts
to multiplying the corresponding matrix by this number, which also amounts
to multiplying each of the elements of that matrix by this number:

      \alpha \begin{pmatrix} a & b \\ c & d \end{pmatrix}
      = \begin{pmatrix} \alpha a & \alpha b \\ \alpha c & \alpha d \end{pmatrix}.     (3.48)

Products of operators
Less obvious perhaps is that operators acting in the same space can also be
multiplied: If A and B are two operators, then, by definition, their product AB
is the operator such that
(AB)v = A(Bv) (3.49)
for any vector v (or more precisely, for any vector v for which the right-hand
side of this equation is defined). I.e., operating on a vector v with the product
operator AB is operating on v with B and then operating on the resulting vector
with A. Recall that the order of the operators in such products often matters:
in many cases, AB and BA are different operators (see Section 3.5 for further
information about non-commuting operators).
Clearly, an operator can be multiplied by itself to form the square of that opera-
tor, and this process can be iterated to form higher powers. For example, if A is
an operator, the operator A2 is the product AA, A3 is the product AA2 (which
can also be written AAA and A2 A), etc.

In terms of matrix representations, the matrix representing a product operator
AB in a given basis is the product of the matrix representing A with the matrix
representing B in the same basis. (These matrices do not commute if the oper-
ators A and B do not commute, in which case they must be multiplied in the
same order as the corresponding operators.)

Proof: The elements of the matrix representing a product operator
AB in an orthonormal basis {u_1, u_2, . . . , u_N} are the inner products
(u_n, ABu_m), n, m = 1, . . . , N. Now, by definition of the product of
two operators, ABu_m = A(Bu_m). Since the vectors {u_1, u_2, . . . , u_N}
form a basis, it is always possible to write Bu_m as a linear combination
of these basis vectors, for each u_m:

      Bu_m = \sum_{i=1}^{N} c_i^{(m)} u_i,                                            (3.50)

where the coefficients c_i^{(m)} are scalars. Thus

      (u_n, ABu_m) = \sum_{i=1}^{N} c_i^{(m)} (u_n, Au_i).                            (3.51)

However, c_i^{(m)} = (u_i, Bu_m) since the basis is orthonormal. Therefore

      (u_n, ABu_m) = \sum_{i=1}^{N} (u_i, Bu_m)(u_n, Au_i).                           (3.52)

Since the matrix elements (u_n, Au_i) and (u_i, Bu_m) are numbers and
numbers commute, Eq. (3.52) can also be written as

      (u_n, ABu_m) = \sum_{i=1}^{N} (u_n, Au_i)(u_i, Bu_m).                           (3.53)

Eq. (3.53) says that the element on the n-th row and m-th column of
the matrix representing AB is obtained by multiplying the n-th row of
the matrix representing A by the m-th column of the matrix represent-
ing B. Hence, the matrix representing AB is the product of these two
matrices. □

+ You may have noticed that what is written above is a bit imprecise in
regards to the domain of the operators concerned. A product operator
AB may act only on vectors v which are in the domain of B and such
that Bv is in the domain of A.

Exponentials of operators
More complicated functions of operators can also be defined. One which often
crops up in Quantum Mechanics is the exponential of an operator. Recall that
if z is a number,

      \exp(z) = 1 + z + \frac{1}{2!} z^2 + \frac{1}{3!} z^3 + \cdots = \sum_{n=0}^{\infty} \frac{1}{n!} z^n.    (3.54)

Similarly, the exponential of an operator A is defined by the following equation:



      \exp(A) = I + A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \cdots = \sum_{n=0}^{\infty} \frac{1}{n!} A^n.    (3.55)

+ Warning: Whereas exp(a + b) = exp(a) exp(b) if a and b are two numbers,
  it is in general not the case that exp(A + B) = exp(A) exp(B) if A and B
  are two operators (this equation is correct if A and B commute, though).
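This warning is easy to illustrate numerically with SciPy's matrix exponential; the small sketch below uses two non-commuting 2 × 2 matrices of our own choosing.

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0, 0.0], [1.0, 0.0]])

    print(np.allclose(expm(A + B), expm(A) @ expm(B)))            # False: A and B do not commute
    print(np.allclose(expm(A + 2 * A), expm(A) @ expm(2 * A)))    # True: A commutes with 2A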

3.4 The inverse of an operator


If A is an operator and there exists an operator B such that AB = I and BA = I,
where I is the identity operator, then one says that A is invertible and that B is its
inverse. The inverse of an operator is usually denoted by the superscript −1 . Thus A−1
is the inverse of A if and only if

AA−1 = A−1 A = I. (3.56)

+ In finite-dimensional spaces, AA^{-1} = I implies that A^{-1}A = I, and
  the other way round. However, the two conditions AA^{-1} = I and
  A^{-1}A = I are not equivalent in infinite-dimensional spaces.

The following theorems are easily proven:

1. If the operators A and B are both invertible, then the product AB is also invert-
ible and
(AB)−1 = B −1 A−1 . (3.57)

2. The inverse of the inverse of an operator is the operator itself:

(A−1 )−1 = A. (3.58)

3. If the operator A is invertible, then Av is the zero vector only if v is the zero
vector.

4. If the operator A is invertible and is represented by a matrix A, then this ma-


trix has an inverse, A−1 , and this inverse matrix represents the inverse of the
operator A.

Not all operators are invertible. Recall, for example, that a matrix whose deter-
minant is zero is not invertible. One says that an operator is singular if it not invertible.

3.5 Commutators
Two operators A and B are said to commute if ABv = BAv for any vector v these
operators may act on. (Recall from the previous section that ABv is the vector obtained
by first transforming v with B and then with A, while BAv is the vector obtained by
first transforming v with A and then with B.) If A and B commute then AB = BA
and AB − BA = 0. (The right-hand side of this last equation is the zero operator,
i.e., the operator which transforms any vector into the zero vector. Acting with this
operator on a vector amounts to a multiplication by the scalar 0.)

+ More precisely, A and B commute if ABv = BAv for all the vectors
v which are in the domain of AB as well as in the domain of BA. It is
possible for these different operators to have different domains if they
act in an infinite-dimensional vector space.

The commutator of two operators A and B is the operator AB − BA. This operator is
usually represented by the symbol [A, B]:

[A, B] = AB − BA. (3.59)

Therefore
[A, B]v = ABv − BAv. (3.60)

Clearly, the commutator of two commuting operators is zero.
For example, take the matrix operators

      \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}  \quad and \quad
      \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},

which are used to represent spin operators (see later in the course). These two operators
do not commute since \sigma_z \sigma_x and \sigma_x \sigma_z are different matrices:

      \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
      = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}                                 (3.61)

whereas

      \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
      = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.                                (3.62)

Clearly,

      [\sigma_z, \sigma_x] = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
                           − \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
                           = \begin{pmatrix} 0 & 2 \\ -2 & 0 \end{pmatrix}.           (3.63)
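The same computation takes one line in NumPy; below is a minimal sketch (the helper name commutator is ours).

    import numpy as np

    def commutator(A, B):
        # [A, B] = AB - BA, Eq. (3.59)
        return A @ B - B @ A

    sigma_z = np.array([[1, 0], [0, -1]])
    sigma_x = np.array([[0, 1], [1, 0]])

    print(commutator(sigma_z, sigma_x))   # [[0, 2], [-2, 0]], in agreement with Eq. (3.63)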

+ Note: You may remember to have seen the equation [x, px ] = i~, where
x and px are, respectively, the position and momentum operators for the
x-direction. Since the left-hand side of this equation is an operator, the
correct (but pedantic) way of writing its right-hand side is i~I, where I
is the identity operator. Writing [x, px ] = i~ is completely acceptable,
though, unless there would be a risk of confusion.

A few properties of commutators worth remembering: For any A, B, C,

• [B, A] = −[A, B];

• [A, I] = 0, where I is the identity operator;

• [A, A] = 0 (an operator always commutes with itself);

• [A, A2 ] = 0, and more generally [A, f (A)] = 0 for any function f (A) of the
operator A [e.g., exp(A)].

+ The “Jacobi identity for commutators" is also worth noting:

[A, [B, C]] + [C, [A, B]] + [B, [C, A]] = 0. (3.64)

3.6 Eigenvalues and eigenvectors
Suppose that v is a non-zero vector such that
Av = λv (3.65)
for a certain scalar λ. One then says that λ is an eigenvalue and v an eigenvector of the
operator A. Or, more specifically, one says that v is an “eigenvector of A with eigenvalue
λ" or an “eigenvector of A belonging to the eigenvalue λ". (Don’t be confused by this
equation: the left-hand side represents the vector resulting from the action of A on
v while the right-hand side represents the vector obtained by multiplying v by the
number λ.)
Clearly, if the operator A and the vector v appearing in Eq. (3.65) are represented by a
matrix A and a column vector c, then one also has
Ac = λc. (3.66)
I.e., the column vector c is an eigenvector of the matrix A.
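For matrices, eigenvalues and eigenvectors can be computed numerically. The sketch below (Python with NumPy) does this for the matrix σ_x introduced in Section 3.5 and checks Eq. (3.66) for each eigenpair.

    import numpy as np

    sigma_x = np.array([[0.0, 1.0], [1.0, 0.0]])

    eigenvalues, eigenvectors = np.linalg.eig(sigma_x)
    print(eigenvalues)                           # 1 and -1
    for lam, v in zip(eigenvalues, eigenvectors.T):
        # each column of 'eigenvectors' is an eigenvector: check A v = lambda v
        print(np.allclose(sigma_x @ v, lam * v))  # True, True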
One often uses the term eigenfunction instead of eigenvector if the operator considered
acts on functions, as in the case, for example, of Eq. (3.29) of Section 3.2. It may be
worth stressing that the words “eigenfunction" and “wave function" do not mean the
same thing. An eigenfunction is what we just defined. A wave function is a function
representing a quantum state. An eigenfunction may or may not represent a certain
quantum state, and therefore may or may not also be a wave function. Similarly, a
wave function may or may not be an eigenfunction of some interesting operator. For
example, the time-dependent function

      \Psi(r, \theta, \phi, t) = [\psi_{100}(r, \theta, \phi) \exp(−iE_1 t/\hbar) + \psi_{211}(r, \theta, \phi) \exp(−iE_2 t/\hbar)]/\sqrt{2}    (3.67)
is a wave function representing a linear superposition of the 1s and 2pm=1 states of
atomic hydrogen. Although a valid wave function, solution of the time-dependent
Schrödinger equation, Ψ(r, θ, φ, t) is not an eigenfunction of the Hamiltonian, the an-
gular momentum operator L2 or the angular momentum operator Lz .
+ It should be noted that only vectors belonging to the space in which the
operator is defined are regarded as being eigenvectors of this operator
(eigenvectors in the sense normally given to this term in Mathemat-
ics). For example, the differential operator d/dx has infinitely many
eigenfunctions if regarded as an operator acting on any differentiable
function, since
d
exp(λx) = λ exp(λx) (3.68)
dx

for any real or complex λ. But d/dx has no eigenfunction if regarded
as an operator acting only on differentiable square-integrable functions
on (−∞, ∞), since all the solutions of the equation

dy
= λy(x) (3.69)
dx
are of the form C exp(λx), where C is a constant, and none of these
solutions is square-integrable on (−∞, ∞). We will come back to this
issue in a later part of these notes.

Degenerate eigenvalues
It may happen that several linearly-independent vectors belong to a same eigenvalue.
This eigenvalue is said to be degenerate in that case.
In particular, one says that the eigenvalue λ is M -fold degenerate (or that its degree of
degeneracy is M ) if there exist M linearly-independent vectors v1 , v2 ,. . . , vM such that

Avn = λvn , n = 1, 2, . . . , M, (3.70)

and if any other eigenvector belonging to that eigenvalue is necessarily a linear com-
bination of these M linearly-independent vectors.
For example, ignoring spin, the n = 2 eigenenergy of the non-relativistic Hamiltonian
of atomic hydrogen is 4-fold degenerate since (1) the 2s, 2pm=0 , 2pm=1 and 2pm=−1
wave functions all belong to this eigenenergy, (2) these four wave functions are mutu-
ally orthogonal and therefore linearly independent, and (3) it is not possible to find a
fifth n = 2 energy eigenfunction that would be orthogonal to all these four functions.
The words “linearly independent" are an important part of the definition above. If v is
an eigenvector of an operator A with eigenvalue λ, then any multiple of v is also an
eigenvector of A with that same eigenvalue since, for any scalar c,

A(cv) = cAv = cλv = λ(cv). (3.71)

Therefore it is always the case that infinitely many eigenvectors belong to a same eigen-
value. However, vectors multiple of each other are not linearly independent. (As we will
see, in Quantum Mechanics vectors multiple of each other represent the same quan-
tum state, while linearly independent vectors necessarily represent different states. An
eigenvalue is degenerate if it corresponds to several different quantum states.)

+ Suppose that the operator A appearing in Eq. (3.70) acts in a vector
space V . The M linearly-independent eigenvectors v1 , v2 ,. . . , vM then
span a M -dimensional subspace of V , called an invariant subspace,
whose elements are transformed by A into elements of the same sub-
space.

The spectrum of an operator


In Physics, the set of all the eigenvalues of an operator is usually called the spectrum
(or the eigenvalue spectrum) of that operator.

+ In Mathematics, however, the spectrum of an operator is defined as the


set of the scalars λ for which the operator A − λI is singular, where
I is the identity operator. One can show that these two definitions are
equivalent for finite-dimensional vector spaces. However, for operators
acting in infinite-dimensional spaces, it is possible that the operator
A − λI is singular at values of λ which are not eigenvalues of A. (This
operator is always singular at the eigenvalues of A, since, if v is an
eigenvector of A with eigenvalue λ, (A − λI)v = 0 although v ≠ 0.)

3.7 The adjoint of an operator


The adjoint of an operator A is the operator A† such that

(v, Aw) = (w, A† v)∗ (3.72)

for any vector v and w. The symbol †, pronounced “dagger", is traditionally used in
Physics to denote the adjoint of an operator.

+ Saying, as above, that Eq. (3.72) must apply to any vector v and w of the
Hilbert space in which the operator A acts is unproblematic for finite-
dimensional Hilbert spaces. For infinite-dimensional spaces, however,
a mathematically sound definition of the adjoint requires a careful spec-
ification of the domains of the operators A and A† . One says that A† is
the adjoint of A if (v, Aw) = (w, A† v)∗ for any vector w in the domain
of A and any vector v in the domain of A† , the latter being defined as
the set of all the vectors v such that there exists a vector vA for which
(v, Aw) = (vA , w) for any vector w in the domain of A.

+ It can be shown that any operator has one and only one adjoint.

+ The adjoint of an operator is also called the Hermitian conjugate of this


operator.

The following theorems are easily proven:

1. The adjoint of a sum is the sum of the adjoints:

(A + B)† = A† + B † . (3.73)

2. Scalars multiplying operators get complex conjugated in the adjoints: if α is a


complex number,
(αA)† = α∗ A† . (3.74)

3. The adjoint of a product of two operators is the product of their adjoints in reverse
order:
(AB)† = B † A† . (3.75)

4. The adjoint of the adjoint of an operator is this operator:

(A† )† = A. (3.76)

+ The ladder operators a+ and a− used in the Term 1 course for calculat-
ing the energy levels of a linear harmonic oscillator are examples of an
operator and its adjoint: As we will see later on, a− = a†+ and a+ = a†− .

+ If the operator A is represented by the matrix A in an orthonormal basis
  {|u_1⟩, |u_2⟩, . . . , |u_N⟩}, its adjoint is represented in this basis by the conju-
  gate transpose matrix, A†. (The proof of this assertion is left as an exercise.)
  For example, if an operator is represented by the matrix

      \begin{pmatrix} a & b \\ c & d \end{pmatrix},

  its adjoint is represented by the matrix

      \begin{pmatrix} a^* & c^* \\ b^* & d^* \end{pmatrix}.
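Numerically, the matrix representing the adjoint is simply the conjugate transpose, and the defining property (3.72) can be checked directly; the sketch below uses an arbitrary random matrix and random vectors of our own choosing (Python with NumPy).

    import numpy as np

    rng = np.random.default_rng(3)

    A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    A_dagger = A.conj().T                       # conjugate transpose

    v = rng.normal(size=3) + 1j * rng.normal(size=3)
    w = rng.normal(size=3) + 1j * rng.normal(size=3)

    # Eq. (3.72): (v, A w) = (w, A† v)*
    lhs = np.vdot(v, A @ w)
    rhs = np.conj(np.vdot(w, A_dagger @ v))
    print(np.allclose(lhs, rhs))                # True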

Of particular importance in Quantum Mechanics are operators that are identical to their
adjoint. We will explore their properties later in these notes.

4 Quantum states and the Dirac notation

4.1 Quantum states and ket vectors


The whole of Quantum Mechanics is based on a few fundamental rules. The first of
them can be expressed as follows:

Each state of a quantum system can be described by a vector belonging to a


Hilbert space.

Recall that Hilbert spaces are vector spaces and that their elements are called vectors.
At the beginning of the course we have seen an example of system in which the vectors
describing the states of interest were square-integrable functions of r, θ and φ, and an
example where they were 2-component column vectors. States of quantum systems can
be represented in a more general way by what Dirac called ket vectors. These vectors
are usually denoted by the symbol |...⟩, with ... standing for whatever label would
be used to identify the particular vector (e.g., |ψ⟩, |n⟩, |↑⟩, |x, +⟩, etc.). Ket vectors
are often called kets, in short, or state vectors. Depending on the system, they can in
turn be represented by a function of r, θ and φ, or by a 2-component column vector, or
by some other mathematical object appropriate for the problem at hand. (The zero ket
vector is usually represented by the numeral 0 rather than by a |...⟩ symbol: e.g., the
combination |ψ⟩ + 0 represents the sum of the vector |ψ⟩ with the zero vector, not the
sum of the vector |ψ⟩ with the number 0, and |ψ⟩ + 0 = |ψ⟩ for any |ψ⟩.)
We stress that ket vectors themselves do not depend on specific coordinates. Only their
representations in terms of wave functions do. They do not depend on the choice of
basis vectors either, contrary to the column vectors representing them. Ket vectors
are vectors in their own right, however vectors belonging to an abstract Hilbert space
rather than a Hilbert space of functions or column vectors.
Take, for example, the non-relativistic Hamiltonian of atomic hydrogen given by Eq. (1.1)
of Section 1.2 of these notes,

      H = −\frac{\hbar^2}{2\mu} \nabla^2 − \frac{e^2}{4\pi\epsilon_0}\frac{1}{r},     (4.1)
and the equation
Hψnlm (r, θ, φ) = En ψnlm (r, θ, φ) (4.2)
defining the energy levels and energy eigenfunctions of that Hamiltonian. This oper-
ator H acting on wave functions representing possible quantum states of a hydrogen
atom corresponds to an operator Ĥ acting on ket vectors representing the same quan-
tum states. Passing to this other formulation, Eq. (4.2) becomes

      Ĥ |n, l, m⟩ = E_n |n, l, m⟩,                                                    (4.3)

where Ĥ is a certain operator acting on the vectors |n, l, m⟩ (operators are discussed
in Chapter 3). Neither |n, l, m⟩ nor Ĥ are given by some combinations of the vari-
ables r, θ and φ. However, they can be represented by such combinations — i.e., by
the wave functions ψnlm(r, θ, φ) and the operator H of Eq. (4.1) — in the sense that
there is a one-to-one correspondence between the kets |n, l, m⟩ and the wave functions
ψnlm(r, θ, φ). (In the mathematical terminology of Section 2.12, one would say that the
Hilbert space inhabited by these ket vectors is isomorphic to the one inhabited by these
wave functions.) In particular, if the ket vectors |ψa⟩ and |ψb⟩ are represented by the
wave functions ψa(r, θ, φ) and ψb(r, θ, φ), then their linear combination α|ψa⟩ + β|ψb⟩
is represented by the wave function αψa(r, θ, φ) + βψb(r, θ, φ).
The same also applies to inner products: Inner products of ket vectors can be calcu-
lated as inner products of the wave functions or column vectors representing these ket
vectors. Suppose, for instance, that the ket vectors |ψa⟩ and |ψb⟩ describe certain states
of a linear harmonic oscillator and correspond to the wave functions ψa(x) and ψb(x).
The inner product of these two wave functions is the integral

      \int_{-\infty}^{\infty} \psi_a^*(x)\, \psi_b(x)\, dx.

Because of the correspondence between the ket vectors |ψa⟩ and |ψb⟩ and the wave
functions ψa(x) and ψb(x), the inner product of the latter, (ψa, ψb), is equal to the
inner product of the former, ⟨ψa|ψb⟩:

      ⟨ψa|ψb⟩ = \int_{-\infty}^{\infty} \psi_a^*(x)\, \psi_b(x)\, dx.                 (4.4)

(The notation introduced in Chapter 2 for denoting the inner product of vectors is not
used for ket vectors: We denote the inner product of a ket |ψa⟩ with a ket |ψb⟩ by
the symbol ⟨ψa|ψb⟩, not by (|ψa⟩, |ψb⟩).) If |ψa⟩ and |ψb⟩ described quantum states
of atomic hydrogen instead, and corresponded to the wave functions ψa(r, θ, φ) and
ψb(r, θ, φ), we would have had, similarly,

      ⟨ψa|ψb⟩ = (ψa, ψb) = \int_0^{\infty} dr\, r^2 \int_0^{\pi} d\theta \sin\theta \int_0^{2\pi} d\phi\; \psi_a^*(r, \theta, \phi)\, \psi_b(r, \theta, \phi).    (4.5)

The same applies to column vectors: If the ket vectors |χ⟩ and |χ'⟩ describe spin states
and are represented by the column vectors

      \chi = \begin{pmatrix} a \\ b \end{pmatrix}  \quad and \quad  \chi' = \begin{pmatrix} a' \\ b' \end{pmatrix},    (4.6)

then

      ⟨χ|χ'⟩ = (χ, χ') = \begin{pmatrix} a^* & b^* \end{pmatrix} \begin{pmatrix} a' \\ b' \end{pmatrix}.    (4.7)

The correspondence between ket vectors and wave functions or column vectors goes
both ways. The discussion above is phrased in terms of ket vectors being represented
by wave functions or column vectors, but one can say equally well that wave functions
or column vectors are represented by the corresponding ket vectors. (Seen in that way,
symbols such as |ψ⟩ and |χ⟩ may be thought of as being merely a simplified notation
for the corresponding wave functions or column vectors. However, it is useful to keep
in mind that these symbols actually refer to vectors in their own right.)

+ The rule stated at the beginning of this section relates the states of a
quantum system to wave functions or some other vectors. But what does
one mean by “states" in this context?
The meaning may seem almost obvious at first sight. For example, it
is an extremely well established experimental fact that an atom may
behave quite differently when exposed to a laser beam than when left
alone, although it is still the same atom (same electrons, same nucleus).
It is natural to say that the atom is in a different state when exposed to
the laser beam than when left alone. The rule says that each of these
states can be described by a wave function or some other vector, and
the same for any other state the atom could be in.
Digging a little deeper, however, one hits a difficulty with defining pre-
cisely what a “state" is in this context. What the issue is is perhaps
best explained by a simple example. Take a classical system consisting
of a mass hanging from a spring, and suppose that this mass moves
only in the vertical direction (i.e., that it does not swing laterally like a
pendulum). One could say that the position and the momentum of this
mass define its state of motion: to know them is to know the ampli-
tude, frequency and phase of its oscillation, and in fact anything that
one might want to know about how the mass moves. The trajectory of
this mass can be represented by the function z(t) describing how its z-
coordinate varies as a function of time. Someone knowing this function
could predict, with complete accuracy, where the mass is at any given
time. Position measurements on several identical oscillators in exactly
the same state of motion would return exactly the same results if done
at exactly the same time (within experimental error, of course, but this
limitation is not fundamental). The same can be said for a system of

many particles. In Classical Mechanics, in general, a state is defined by
the positions and momenta of all its constituents.
The situation is quite different in the case of a quantum oscillator. A
measurement of the position of the mass would only return a random
result in this case (random within the distribution of probability de-
termined by the wave function). If position measurements were made
simultaneously on several identical oscillators, all prepared in exactly
the same way, then a different result would normally be found for
each of these oscillators (even if the measurements were so accurate
and precise that the experimental error was negligible). In these cir-
cumstances, it would make little sense to say the mass follows a cer-
tain trajectory. Whether its position and momentum could be taken
as defining its state is altogether questionable, too, since its position
and momentum cannot be both assigned precise values at the same
time, neither experimentally nor theoretically (recall the uncertainty
relation).
Hence, what the rule stated at the beginning really means becomes
rather unclear if one goes beyond the intuitive notion of state men-
tioned above. In fact, this issue touches on the philosophy and interpre-
tation of Quantum Mechanics and is still a matter of controversy.

+ What is meant by the word system in “quantum system" also deserves


some scrutiny. Briefly, in this context, a system is something which is
a self-contained entity as far as describing its state is concerned. For
example, an atom may often be regarded as an independent quantum
system — hence one can talk about its ground state, etc. — but an atom
forming part of a molecule can’t. It is rarely possible to assign an indi-
vidual wave function or state vector to each of the different parts of a
multipartite system.

4.2 Bra vectors


The symbol ⟨ψ|φ⟩ representing the inner product of the kets |ψ⟩ and |φ⟩ is called a
bracket. This inner product can be viewed as a combination of a ket vector |φ⟩ with
what Dirac called a “bra vector", ⟨ψ|. (Bra and ket vectors are so named, of course,
because their combinations are represented by brackets.) For consistency, bra and ket
vectors must obey the following rules:

• To each ket vector |ψi there must be a bra vector hψ|, and to each bra vector
hψ| there must be a ket vector |ψi (in other words, bra and ket vectors are in
one-to-one correspondence).

• Ket vectors correspond to wave functions or to column vectors:

|\psi\rangle \;\longleftrightarrow\; \psi(x), \quad \begin{pmatrix} a \\ b \end{pmatrix}.

• Bra vectors correspond to complex conjugated wave functions or to complex
conjugated row vectors:

\langle\psi| \;\longleftrightarrow\; \psi^*(x), \quad \begin{pmatrix} a^* & b^* \end{pmatrix}.


• If the ket vectors |ψi and |φi correspond to the bra vectors hψ| and hφ|, then the
ket vector |ψi + |φi must correspond to the bra vector hψ| + hφ|.

• If c is a complex number and the ket vector |ψi corresponds to the bra vector
hψ|, then the ket vector c|ψi corresponds to the bra vector c∗ hψ|. (Note that the
factor multiplying hψ| is the complex conjugate of c.)

+ A linear combination c1 |ψ1 i + c2 |ψ2 i of ket vectors thus corresponds


to a linear combination c∗1 hψ1 | + c∗2 hψ2 | of bra vectors for any kets |ψ1 i
and |ψ2 i and any complex numbers c1 and c2 . Such correspondences
are said to be antilinear (anti because of the complex conjugation of c1
and c2 ).

+ A bra vector hψ| defines a mapping from ket vectors to complex num-
bers such that any ket vector |φi is mapped to the complex number
hψ|φi and that any linear combination c1 |φ1 i + c2 |φ2 i is mapped to
the complex number c1 hψ|φ1 i + c2 hψ|φ2 i. Such mappings are called
linear functionals. Mathematically, bra vectors are best seen as being
linear functionals on the vector space of ket vectors. The set of all lin-
ear functionals on a vector space V is also a vector space, called the
dual of V . The one-to-one correspondence between ket and bra vectors
mentioned above is guaranteed by a theorem of functional analysis, the
Riesz Representation Theorem.
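
+ As a rough numerical aside (not part of the formal development), the correspondence between kets, bras and column/row vectors can be mimicked with a few lines of Python/NumPy; the particular vectors and coefficients below are arbitrary choices made only for illustration.

    import numpy as np

    # |psi> and |phi> represented by column vectors (arbitrary entries)
    ket_psi = np.array([[1.0 + 2.0j],
                        [0.5 - 1.0j]])
    ket_phi = np.array([[2.0j],
                        [1.0]])

    # The bra <psi| corresponds to the complex conjugated row vector
    bra_psi = ket_psi.conj().T

    # Antilinearity: the bra of c1|psi> + c2|phi> is c1* <psi| + c2* <phi|
    c1, c2 = 0.3 + 0.4j, -1.0j
    lhs = (c1 * ket_psi + c2 * ket_phi).conj().T
    rhs = np.conj(c1) * bra_psi + np.conj(c2) * ket_phi.conj().T
    print(np.allclose(lhs, rhs))             # True

    # A bra acting on a ket returns a complex number, the bracket <psi|phi>
    print((bra_psi @ ket_phi).item())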

4.3 Operators and the Dirac notation
As we have just seen, quantum states can be represented not only by wave functions or
column vectors of complex numbers, as befits the problem considered, but also by “ket
vectors" belonging to an abstract Hilbert space. We have also seen that each operator
acting on wave functions or column vectors has a counterpart in terms of an operator
acting on ket vectors. To take the same example as in Section 4.1, the Hamiltonian
operator
H = -\frac{\hbar^2}{2\mu}\,\nabla^2 - \frac{e^2}{4\pi\epsilon_0}\,\frac{1}{r},    (4.8)
which acts on wave functions depending on the space coordinates r, θ and φ corre-
sponds to an operator Ĥ acting on ket vectors. In particular, the eigenvalue equation

Hψnlm (r, θ, φ) = En ψnlm (r, θ, φ) (4.9)

becomes
Ĥ |n, l, mi = En |n, l, mi (4.10)
when expressed in terms of ket vectors rather than wave functions. One can say that the
wave functions ψnlm (r, θ, φ) represent the ket vectors |n, l, mi and that the differential
operator H represents the operator Ĥ. More generally, if a ket vector |ψi is represented
by, e.g., a wave function ψ(r, θ, φ) and an operator  acting on |ψi is represented by an
operator A acting on ψ(r, θ, φ), then the ket vector Â|ψi is represented by the function
Aψ(r, θ, φ). For example, Eq. (3.71) defining the adjoint of an operator would read

hφ| Â|ψi = hψ| Â† |φi∗ (4.11)

if written in terms of ket vectors |φi and |ψi rather than in terms of generic vectors v
and w.
Given this correspondence, the matrix elements of an operator acting on ket vectors
can be calculated as the matrix elements of the corresponding operator acting on wave
functions or column vectors. That is to say, if the two orthonormal vectors ui and uj
(wave functions or column vectors) represent the orthonormal ket vectors |ui i and |uj i
and the operator A acting on ui and uj represents the operator  acting on |ui i and
|uj i, then the inner product of |ui i with Â|uj i — i.e., hui | Â|uj i — is nothing else than
the inner product of ui with Auj :

hui | Â|uj i = (ui , Auj ). (4.12)

For example, when written in terms of ket vectors, Eq. (3.20) reads

A = \begin{pmatrix}
\langle u_1|\hat A|u_1\rangle & \langle u_1|\hat A|u_2\rangle & \cdots & \langle u_1|\hat A|u_N\rangle \\
\langle u_2|\hat A|u_1\rangle & \langle u_2|\hat A|u_2\rangle & \cdots & \langle u_2|\hat A|u_N\rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle u_N|\hat A|u_1\rangle & \langle u_N|\hat A|u_2\rangle & \cdots & \langle u_N|\hat A|u_N\rangle
\end{pmatrix}.    (4.13)
Thus calculations involving operators acting on ket vectors can generally be reduced
to calculations involving operators acting on wave functions or column vectors if this
would be necessary for obtaining the result.
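
+ As an illustration of this reduction (a sketch only; the 3 x 3 array below and the use of the standard basis of C^3 as the orthonormal kets are arbitrary choices), building each matrix element as the inner product of Eq. (4.12) simply reproduces the array representing the operator, as Eq. (4.13) says it should.

    import numpy as np

    N = 3
    A_op = np.array([[1.0, 2.0j, 0.0],
                     [-2.0j, 3.0, 1.0],
                     [0.0, 1.0, -1.0]])              # arbitrary operator
    basis = [np.eye(N)[:, [i]] for i in range(N)]    # |u_1>, |u_2>, |u_3>

    A = np.zeros((N, N), dtype=complex)
    for i in range(N):
        for j in range(N):
            # <u_i| A |u_j> = inner product of u_i with A u_j
            A[i, j] = (basis[i].conj().T @ (A_op @ basis[j])).item()

    print(np.allclose(A, A_op))   # True: the matrix representation is recovered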
The reduction is often not necessary, though. Consider, for example, the following
problem of quantum optics: calculate the matrix element hα|â+ |αi, where α is a com-
plex number, the ket vector |αi is a normalized eigenvector of the operator â− with
eigenvalue α (â− |αi = α|αi and hα|αi = 1), and â+ and â− are ladder operators
(â+ = â†− ). This matrix element can be calculated immediately without passing to
a representation in terms of wave functions or column vectors: In view of Eq. (4.11),
of the fact that â†+ = (â†− )† = â− and of the assumptions that â− |αi = α|αi and
hα|αi = 1,

hα|â+ |αi = hα|â†+ |αi∗ = hα|â− |αi∗ = (αhα|αi)∗ = α∗ . (4.14)

Two important notes:

1. The inner product hφ|ψi is in general a complex number, and hφ|ψi = hψ|φi∗ .

2. The bra vector conjugate to the ket vector Â|ψi can be written as hψ| Â† . (Recall
that any ket vector |ψi has a conjugate bra vector hψ|, see Section 4.2 above.)

+ That hφ|ψi = hψ|φi∗ simply follows from the axioms of the inner product.
To understand why the bra conjugate to Â|ψi can be written as hψ| Â† ,
consider the ket vector |Aψi = Â|ψi and the inner product hφ|Aψi,
where |φi is arbitrary. Note that

hφ|Aψi = hφ| Â|ψi = hψ| Â† |φi∗ (4.15)

and also that


hφ|Aψi = hAψ|φi∗ . (4.16)
Thus hAψ|φi = hψ| Â† |φi for any |φi, which shows that the bra hAψ| can
be replaced by hψ| Â† in any calculation.

+ Take, for example, hα|â†− . (hα| is the bra conjugate to the ket |αi and as
above â− |αi = α|αi.) Since α∗ hα| is the bra conjugate to the ket α|αi,
we see that hα|â†− = α∗ hα|. In other words, hα| is a left eigenvector of
â†− , in the same way as |αi is a right eigenvector of â− . The words left and
right can be taken as defining the direction in which the operator acts:
in â− |αi = α|αi, the operator â− acts “on the right" on |αi, whereas in
hα|â†− = α∗ hα| the operator â†− acts “on the left" on hα|.
The above calculation of the matrix element hα|â+ |αi could thus be done
quickly by letting â+ act “on the left" on hα|: since â+ = â†− ,

hα|â+ = hα|â†− = α∗ hα| (4.17)

and therefore hα|â+ |αi = α∗ hα|αi = α∗ .


There is nothing unusual with operators having left eigenvectors. For ex-
ample, it is easy to verify that

\begin{pmatrix} i & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = i\,\begin{pmatrix} 1 \\ 0 \end{pmatrix},    (4.18)

and also that

\begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} -i & 0 \\ 1 & 1 \end{pmatrix} = -i\,\begin{pmatrix} 1 & 0 \end{pmatrix}.    (4.19)
The first of these two equations shows that the column vector appearing in it
is a right eigenvector of this square matrix, and the second that the corre-
sponding row vector is a left eigenvector of the conjugate transpose of this
matrix (its adjoint). Note that the left and right eigenvectors are different
kinds of vectors (the former is a row vector, the latter a column vector),
in the same way as the left eigenvector of â† and the right eigenvector of
â are different kinds of vectors (the former is a bra vector, the latter a ket
vector).
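
+ A quick numerical check of Eqs. (4.18) and (4.19), included only as an illustration of the left/right eigenvector statement (NumPy does the matrix arithmetic):

    import numpy as np

    M = np.array([[1j, 1.0],
                  [0.0, 1.0]])
    col = np.array([[1.0], [0.0]])      # candidate right eigenvector of M
    row = np.array([[1.0, 0.0]])        # candidate left eigenvector of M^dagger

    print(np.allclose(M @ col, 1j * col))             # True, Eq. (4.18)
    print(np.allclose(row @ M.conj().T, -1j * row))   # True, Eq. (4.19)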

We will use the Dirac notation from now on, for simplicity, unless we talk specif-
ically about operators acting on wave functions or column vectors. However, all the
results stated normally apply to any Hilbert space, not just spaces of ket vectors.

4.4 The Principle of Superposition


The Principle of Superposition is a fundamental principle of Quantum Mechanics. It
postulates that if the vectors |ψa i and |ψb i represent physically possible states of a

quantum system, then any linear combination of these two vectors also represents a
physically possible state of that system.
Take, for example, an atom of hydrogen. This atom can be in the ground state, whose
time-dependent wave function is ψ100 (r, θ, φ) exp(−iE1 t/~). It can also be in the 2s
state, whose time-dependent wave function is ψ200 (r, θ, φ) exp(−iE2 t/~). Therefore
an atom of hydrogen can also (at least in principle) be in a linear superposition of these
two states, e.g., in a state whose wave function is given by the equation
\Psi(r,\theta,\phi,t) = \frac{1}{\sqrt{2}}\,\psi_{100}(r,\theta,\phi)\exp(-iE_1 t/\hbar) + \frac{1}{\sqrt{2}}\,\psi_{200}(r,\theta,\phi)\exp(-iE_2 t/\hbar).    (4.20)
An atom in this state is, in a sense, both in the 1s state and in the 2s state. If checked,
this atom could be found to be in the ground state or, with the same probability, to
be in the 2s state. Note that the state described by Ψ(r, θ, φ, t) is neither the 1s state
nor the 2s state. Some of its physical properties are quite different; for instance, it is
not difficult to see that the probability density |Ψ(r, θ, φ, t)|2 varies in time whereas
|ψ100 (r, θ, φ) exp(−iE1 t/~)|2 and |ψ200 (r, θ, φ) exp(−iE2 t/~)|2 don’t.
As a second example, let us consider the spin states of the silver atoms in Stern’s and
Gerlach’s experiment. As mentioned in Section 1.2 of these notes, these spin states can
be represented by 2-component column vectors, e.g.,

\chi_+ = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \chi_- = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.    (4.21)

Later in the course, we will see that conventionally these two column vectors repre-
sent a state of spin up (χ+ ) or spin down (χ− ) in the z-direction. We can make linear
combinations of χ+ and χ− to form new column vectors, for instance

\chi_a = \frac{1}{\sqrt{2}}\,\chi_+ + \frac{1}{\sqrt{2}}\,\chi_- = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \end{pmatrix} + \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}.    (4.22)

Since χ+ and χ− represent possible spin states of an atom of silver, by the Principle of
Superposition χa also represents a possible spin state of that system. In fact, χa repre-
sents a state of “spin up" in the x-direction (not the z-direction, this will be explained
later in the course). Moreover, introducing vectors representing states of “spin up" and
“spin down" in the y-direction, respectively

\chi_{y+} = \begin{pmatrix} 1/\sqrt{2} \\ i/\sqrt{2} \end{pmatrix} \quad\text{and}\quad \chi_{y-} = \begin{pmatrix} i/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix},    (4.23)

we can write χa not only as a linear combination of the vectors χ+ and χ− but also as
a linear combination of the vectors χy+ and χy− : as can be checked easily,

\chi_a = \frac{1}{2}(1-i)\,\chi_{y+} + \frac{1}{2}(1-i)\,\chi_{y-}.    (4.24)
An atom of silver in the spin state described by χa can thus be understood as being in
a state of “spin up" in the x-direction, or as being in both the state of spin up and the
state of spin down in the z-direction, or also in both the state of “spin up" and the state
of “spin down" in the y-direction. These three descriptions are equivalent, and each
one is as good as the other two, even though they may seem contradictory.
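
+ The decomposition (4.24) is easy to re-check numerically; the short sketch below only verifies the arithmetic of this example.

    import numpy as np

    chi_plus = np.array([1.0, 0.0])
    chi_minus = np.array([0.0, 1.0])
    chi_a = (chi_plus + chi_minus) / np.sqrt(2)

    chi_yplus = np.array([1.0, 1j]) / np.sqrt(2)
    chi_yminus = np.array([1j, 1.0]) / np.sqrt(2)

    # Eq. (4.24): chi_a expressed in terms of chi_y+ and chi_y-
    recombined = 0.5 * (1 - 1j) * chi_yplus + 0.5 * (1 - 1j) * chi_yminus
    print(np.allclose(chi_a, recombined))   # True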

+ It is because the Schrödinger equation is linear and homogeneous that


a linear combination of state vectors can also be a valid state vector. In
this respect, the situation is no different than that encountered, e.g., in
many problems of classical mechanics or electromagnetism also governed
by equations of motion that are linear and homogeneous. For example, the
simple wave equation describing the oscillation of a string you have stud-
ied at Level 1 has this property; hence, the motion of, say, a guitar string
can be understood as being the sum of a number of different modes
of oscillation (harmonics), all present at the same time. However, making
linear superpositions of quantum mechanical states is more intriguing and
leads to more surprising results than making linear superposition of dif-
ferent modes of oscillations of a guitar string, even though mathematically
the two are analogous.

5 Operators (II)

5.1 Hermitian operators


An operator  is said to be Hermitian when

hφ| Â|ψi = hψ| Â|φi∗ (5.1)

for any vectors |φi and |ψi this operator may act on. Many of the important operators
of Quantum Mechanics are Hermitian, although not quite all of them.

+ It should be noted that this definition does not require an operator to


be equal to its adjoint in order to be Hermitian — i.e., it does not say,
and does not necessarily imply, that

 = † . (5.2)

Eq. (5.1) refers only to the action of  on the vectors in the domain of Â,
irrespective of the domain of † . Eq. (5.2) says that  not only satisfies
Eq. (5.1) but also that  and † have the same domain. An operator
satisfying Eq. (5.2) is said to be self-adjoint. An operator which is self-
adjoint is also Hermitian but the converse is not true; some operators
are Hermitian but not self-adjoint (an example is given below). The dis-
tinction between Hermiticity and self-adjointness is of key importance
in Mathematics. Physical quantities such as the energy, the linear mo-
mentum and the orbital angular momentum correspond to operators
that are self-adjoint, not merely Hermitian. Why this needs to be so
will be addressed later in the course.
Let us take an example. The z-component of the angular momentum of
a particle (specifically, the “orbital angular momentum" of that particle)
can be shown to correspond to an operator


L_z = -i\hbar\,\frac{\partial}{\partial\phi},    (5.3)
where φ is the azimuth angle of the particle in spherical polar co-
ordinates. Suppose that we take the domain of Lz to be the space of all
differentiable square-integrable functions y(φ) defined on [0, 2π] and
such that y(0) = y(2π) = 0. Lz is Hermitian in that space, since, if

f (φ) and g(φ) are two such functions,
\int_0^{2\pi} f^*(\phi)\, L_z\, g(\phi)\, d\phi = -i\hbar \int_0^{2\pi} f^*(\phi)\,\frac{\partial g}{\partial\phi}\, d\phi
= -i\hbar \Big[ f^*(\phi)\, g(\phi) \Big]_0^{2\pi} + i\hbar \int_0^{2\pi} \frac{\partial f^*}{\partial\phi}\, g(\phi)\, d\phi
= \left( -i\hbar \int_0^{2\pi} g^*(\phi)\,\frac{\partial f}{\partial\phi}\, d\phi \right)^*,    (5.4)

and therefore
\int_0^{2\pi} f^*(\phi)\, L_z\, g(\phi)\, d\phi = \left( \int_0^{2\pi} g^*(\phi)\, L_z\, f(\phi)\, d\phi \right)^*.    (5.5)

(The boundary term vanishes since g(0) = g(2π) = 0 if g(φ) is in the


domain of Lz .) However, so defined, Lz is not self-adjoint: The domain
of the adjoint of Lz would indeed be the set of all the functions f (φ)
such that, for any function g(φ) in the domain of Lz ,
-i\hbar \int_0^{2\pi} f^*(\phi)\,\frac{\partial g}{\partial\phi}\, d\phi = \left( -i\hbar \int_0^{2\pi} g^*(\phi)\,\frac{\partial f}{\partial\phi}\, d\phi \right)^*,    (5.6)

and this set includes functions f (φ) which are finite but non-zero at
φ = 0 or at φ = 2π and therefore do not belong to the domain of Lz
(the boundary term vanishes for any function f (φ) finite at φ = 0 and
φ = 2π, as long as g(0) = g(2π) = 0).
By contrast, Lz is not only Hermitian but also self-adjoint if we take its
domain to be the space of all differentiable square-integrable functions
y(φ) defined on [0, 2π] and such that y(0) = y(2π) (i.e., we no longer
require that y(φ) vanishes at φ = 0 and φ = 2π). Indeed, in
order for the boundary term

−i~ [f ∗ (2π)g(2π) − f ∗ (0)g(0)]

to be zero for any function g(φ) such that g(0) = g(2π), it is necessary
that f (0) = f (2π). The domain of the adjoint of Lz coincides with the
domain of Lz in this case.
That the domain of Lz matters is illustrated by the fact that this op-
erator has no eigenfunctions if we require that y(0) = y(2π) = 0,
and that it has infinitely many eigenfunctions if we only require that

y(0) = y(2π). (Explanation: any eigenfunction of Lz must be of the
form exp[i(λ/~)φ]. There is no value of λ for which an exponential
function of this form vanishes both at φ = 0 and at φ = 2π; however,
exp[i(λ/~)0] = exp[i(λ/~)2π] for λ = 0, ±~, ±2~,. . . )

+ The terminology is a bit confused. In Mathematics, an operator satisfy-


ing Eq. (5.1) is said to be symmetric. The term “Hermitian" is generally
used in Physics for the same. It is also used in Mathematics, often as
a synonym of “symmetric" but sometimes to refer only to a particular
kind of symmetric operators. Many physicists use the word “Hermi-
tian" as a synonym of “self-adjoint" (which, as noted above, is fraught
mathematically).

+ Although it is worth keeping in mind that, mathematically speaking,


“Hermitian" is not the same as “self-adjoint", there is actually no dif-
ference between the two in finite-dimensional spaces: any Hermitian
operator defined in a finite-dimensional space is also self-adjoint. This
is not the case for operators defined in an infinite-dimensional space.

+ The word Hermitian is less commonly spelled Hermitean.

Examples

• The spin operator represented by the matrix

S_y = \frac{\hbar}{2} \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}    (5.7)

is Hermitian since, for any 2-component column vectors \chi = \begin{pmatrix} a \\ b \end{pmatrix} and \chi' = \begin{pmatrix} a' \\ b' \end{pmatrix},
(\chi, S_y \chi') = (\chi', S_y \chi)^*:

\begin{pmatrix} a^* & b^* \end{pmatrix} \frac{\hbar}{2} \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \begin{pmatrix} a' \\ b' \end{pmatrix}
= \begin{pmatrix} a^* & b^* \end{pmatrix} \frac{\hbar}{2} \begin{pmatrix} -ib' \\ ia' \end{pmatrix}
= -i\hbar\,(a^* b' - a' b^*)/2
= \left[ -i\hbar\,(a'^* b - a b'^*)/2 \right]^*
= \left[ \begin{pmatrix} a'^* & b'^* \end{pmatrix} \frac{\hbar}{2} \begin{pmatrix} -ib \\ ia \end{pmatrix} \right]^*
= \left[ \begin{pmatrix} a'^* & b'^* \end{pmatrix} \frac{\hbar}{2} \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} \right]^*.    (5.8)

• One of the systems you have studied in your level 1 Quantum Mechanics course
was that formed by a particle of mass m submitted to no force between x = −a

and x = a but confined to that interval by impenetrable potential barriers at
x = ±a. The Hamiltonian for this problem can be taken to be
H = -\frac{\hbar^2}{2m}\,\frac{d^2}{dx^2},    (5.9)
considered as an operator acting on twice-differentiable, square-integrable func-
tions of x which vanish at x = −a and x = a. This operator is Hermitian since
for any such functions,
\int_{-a}^{a} \phi^*(x)\, H\, \psi(x)\, dx = -\frac{\hbar^2}{2m} \int_{-a}^{a} \phi^*(x)\,\frac{d^2\psi}{dx^2}\, dx
= -\frac{\hbar^2}{2m} \left\{ \left[ \phi^*(x)\,\frac{d\psi}{dx} \right]_{-a}^{a} - \int_{-a}^{a} \frac{d\phi^*}{dx}\,\frac{d\psi}{dx}\, dx \right\}
= -\frac{\hbar^2}{2m} \left\{ \left[ \phi^*(x)\,\frac{d\psi}{dx} \right]_{-a}^{a} - \left[ \frac{d\phi^*}{dx}\,\psi(x) \right]_{-a}^{a} + \int_{-a}^{a} \frac{d^2\phi^*}{dx^2}\,\psi(x)\, dx \right\}
= -\frac{\hbar^2}{2m} \int_{-a}^{a} \frac{d^2\phi^*}{dx^2}\,\psi(x)\, dx
= \left( -\frac{\hbar^2}{2m} \int_{-a}^{a} \psi^*(x)\,\frac{d^2\phi}{dx^2}\, dx \right)^* = \left( \int_{-a}^{a} \psi^*(x)\, H\, \phi(x)\, dx \right)^*.    (5.10)

(The boundary terms vanish since φ∗ (±a) = ψ(±a) = 0.)

• The ladder operators a+ and a− are not Hermitian. (The proof of this assertion
is left as an exercise; a quick numerical illustration of these examples is sketched below.)
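
+ A rough numerical illustration of these examples (the 6-dimensional truncation of the ladder operator below, with the usual matrix elements √n on the first superdiagonal, is an assumption made purely for the sketch):

    import numpy as np

    hbar = 1.0
    Sy = (hbar / 2) * np.array([[0.0, -1j],
                                [1j, 0.0]])
    print(np.allclose(Sy, Sy.conj().T))        # True: S_y equals its conjugate transpose
    print(np.linalg.eigvalsh(Sy))              # [-0.5  0.5]: real eigenvalues

    n = 6
    a_minus = np.diag(np.sqrt(np.arange(1, n)), k=1)   # truncated lowering operator
    print(np.allclose(a_minus, a_minus.conj().T))      # False: not Hermitian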

Real eigenvalues and orthogonal eigenvectors


The eigenvalues of a Hermitian operator are always real.

Proof: Let us suppose that  is a Hermitian operator and that there


exists a non-zero vector |ψi such that Â|ψi = λ|ψi. (We assume that
|ψi is non-zero because the zero vector is never considered to be an
eigenvector.) Since  is Hermitian, we must have, from Eq. (5.1), that
hψ| Â|ψi = hψ| Â|ψi∗ . But hψ| Â|ψi = λhψ|ψi since Â|ψi = λ|ψi.
Thus λhψ|ψi = λ∗ hψ|ψi∗ , which implies that λ = λ∗ since hψ|ψi∗ =
hψ|ψi ≠ 0. (Recall the axioms of the inner product and our assumption
that |ψi is not the zero vector). 

Eigenvectors of a Hermitian operator corresponding to different eigenvalues are always


orthogonal.

Proof: Suppose that  is a Hermitian operator, and also that Â|ψ1 i =
λ1 |ψ1 i and Â|ψ2 i = λ2 |ψ2 i with λ1 ≠ λ2 . Thus

hψ2 | Â|ψ1 i = λ1 hψ2 |ψ1 i, (5.11)


hψ1 | Â|ψ2 i = λ2 hψ1 |ψ2 i. (5.12)

But hψ1 | Â|ψ2 i = hψ2 | Â|ψ1 i∗ since  is Hermitian, and λ2 = λ∗2


since the eigenvalues of a Hermitian operator are real. Moreover,
hψ2 |ψ1 i = hψ2 |ψ1 i∗ from the axioms of the inner product. Complex
conjugating Eq. (5.12) thus gives hψ2 | Â|ψ1 i = λ2 hψ2 |ψ1 i. Subtracting
this last equation from Eq. (5.11) yields (λ1 − λ2 )hψ2 |ψ1 i = 0. There-
fore hψ2 |ψ1 i = 0. (Remember that we assume that λ1 ≠ λ2 .) 

It is worth noting that not all Hermitian operators have eigenvalues and eigenvectors
(see Section 5.4) and that non-Hermitian operators may also have real eigenvalues or
orthogonal eigenvectors. However, if an operator is Hermitian, one can be sure that its
eigenvalues and eigenvectors (if any) have these two important properties.
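
+ These two properties are easy to see at work numerically for finite matrices. The sketch below uses an arbitrary randomly generated Hermitian matrix; numpy.linalg.eigh returns its (real) eigenvalues and an orthonormal set of eigenvectors.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    A = (X + X.conj().T) / 2                   # Hermitian by construction

    eigvals, eigvecs = np.linalg.eigh(A)
    print(np.allclose(eigvals.imag, 0.0))      # True: real eigenvalues
    print(np.allclose(eigvecs.conj().T @ eigvecs, np.eye(4)))   # True: orthonormal eigenvectors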
Hermitian matrices
Matrices representing Hermitian operators are Hermitian. Recall that a Hermitian ma-
trix is one equal to its conjugate transpose — e.g., for 2 × 2 matrices, a matrix such
that

\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a^* & c^* \\ b^* & d^* \end{pmatrix}.    (5.13)
(The proof of this assertion is left as a short exercise.)

5.2 Projectors and the completeness relation


We start with a definition: An operator  is said to be idempotent if Â2 =  (i.e., if
Â(Â|ψi) ≡ Â|ψi for any vector |ψi the operator  acts on).
Now, suppose that you are interested in a certain subspace V 0 of the whole vector
space, V . Specifying V 0 is also specifying another subspace of V , namely the space V 0⊥
containing all the vectors of V that are orthogonal to every vector of V 0 . It is intuitively
clear, and can be shown rigorously, that a vector v belonging to V can always be written
as the sum of a vector v 0 belonging to V 0 and a vector v 0⊥ belonging to V 0⊥ , and that
there is only one way of doing so. For example, arrow vectors parallel to the x-axis form
a subspace of the whole vector space of 3D geometric vectors, and any 3D geometric

vector v can be written as the sum of a vector v0 parallel the x-direction and a vector
v0⊥ orthogonal to that direction (thus parallel to the yz-plane). In terms of Cartesian
components, if
v = vx x̂ + vy ŷ + vz ẑ (5.14)
with x̂, ŷ and ẑ unit vectors in the respective directions, then v0 is necessarily vx x̂
and v0⊥ is necessarily vy ŷ + vz ẑ. Clearly, v0 is the projection of v onto the x-axis.
More generally, if we write each vector v of V as the sum of a vector v 0 belonging to V 0
and a vector v 0⊥ belonging to V 0⊥ , mapping v to v 0 amounts to projecting v onto the
subspace V 0 . Operators effecting this transformation are called projection operators, or
projectors. In the example above, projecting v onto the x-axis amounts to transforming
this vector into the vector x̂ multiplied by the inner product of x̂ and v (since vx = x̂·v).
Let us consider, for example, a ket vector of unit norm |φi and the operator P̂φ whose
action on any ket vector |ψi is to project it onto the 1-dimensional subspace spanned
by |φi. Thus P̂φ transforms |ψi into the ket vector |φi multiplied by the inner product
of |φi and |ψi:
P̂φ |ψi = hφ|ψi|φi. (5.15)
Since hφ|ψi|φi is just a scalar (a number), and scalars commute with vectors, this equa-
tion can also be written in the form

P̂φ |ψi = |φihφ|ψi. (5.16)

Eq. (5.16) suggests an alternative notation for P̂φ :

P̂φ ≡ |φihφ|, (5.17)

in the sense that the action of |φihφ| on a ket vector |ψi is to transform it into |φihφ|ψi.
The operator P̂φ so defined is a projector.
It is easy to show that P̂φ is idempotent:

P̂φ2 = P̂φ P̂φ = |φihφ|φihφ| = |φihφ| = P̂φ , (5.18)

where we have used our assumption that hφ|φi = 1. It is also possible to show that P̂φ
is Hermitian.

Proof: P̂φ is Hermitian since hψa | P̂φ |ψb i = hψb | P̂φ |ψa i∗ for any ket |ψa i,
|ψb i:

hψa | P̂φ |ψb i = hψa |φihφ|ψb i


= hφ|ψb ihψa |φi
= hψb |φi∗ hφ|ψa i∗
∗
= hψb |φihφ|ψa i
= hψb | P̂φ |ψa i∗ .

Hence, P̂φ fulfils the definition of a Hermitian operator. 

It is not extremely difficult to show that any projection operator is idempo-


tent and Hermitian, and that the converse is also true: any operator which
is both idempotent and Hermitian is a projection operator.
To put the above in a mathematically rigorous framework, it is necessary
for the subspace V 0 to have the property of being closed; what this means
cannot be explained within the scope of the course, is relevant only for
infinite-dimensional spaces, and can normally be ignored in applications
of this formalism to Quantum Mechanics. In the jargon of Linear Algebra,
the set V 0⊥ of all the vectors orthogonal to a closed subspace V 0 of a Hilbert
space V is called the orthogonal complement of V 0 , and V is said to be the
direct sum of the two subspaces V 0 and V 0⊥ (in the sense that any vector of
V can be written in one and only one way as the sum of a vector of V 0 and
a vector of V 0⊥ ). This relation is expressed by the equation V = V 0 ⊕ V 0⊥ ,
where the symbol ⊕ denotes the direct sum.

+ Projection operators projecting ket vectors onto a higher-dimensional sub-


space are also used. For example, let us consider the unit vectors |φ1 i, |φ2 i,. . . ,
|φM i, with hφi |φj i = δij . The operator
\hat P = \sum_{i=1}^{M} |\phi_i\rangle\langle\phi_i|    (5.19)

projects any vector it acts on onto the M -dimensional subspace spanned by


the M unit vectors |φn i.

+ Suppose that we would use wave functions instead of ket vectors. E.g., |ψi
and |φi would correspond, respectively, to a wave function ψ(r) and a wave

function φ(r). In this case, the operator P̂φ would correspond to an operator
Pφ defined by the following equation:
P_\phi\, \psi(\mathbf{r}) = \phi(\mathbf{r}) \int d^3r'\, \phi^*(\mathbf{r}')\,\psi(\mathbf{r}').    (5.20)

We can therefore write the operator Pφ as φ(r)φ∗ (r0 ) in this representation,


being understood that φ(r)φ∗ (r0 ) is actually an operator, not a mere product
of two functions, and that this operator transforms any ψ(r) it acts on into
φ(r) times the inner product of φ(r) and ψ(r).
+ Suppose that in a certain basis the ket vector |φi is represented by the column
vector c. Then, in that basis, the operator |φihφ| is represented by the matrix
Pφ formed by taking the outer product of c with itself:
Pφ = cc† , (5.21)
where c† denotes the row vector obtained by transposing the column vector
c and complex-conjugating all its elements. (The proof of this assertion is left
as an exercise for the reader. See Appendix A of these notes for the calculation
of the outer product of two column vectors.)
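
+ A minimal numerical sketch of the last point (the unit-norm column vector chosen below is arbitrary): the outer product c c† is idempotent and Hermitian, and it maps any column vector onto the one-dimensional subspace spanned by c.

    import numpy as np

    c = np.array([[1.0], [1j], [0.0]]) / np.sqrt(2)    # unit-norm column vector
    P = c @ c.conj().T                                  # projector |phi><phi| as c c^dagger

    print(np.allclose(P @ P, P))              # True: idempotent
    print(np.allclose(P, P.conj().T))         # True: Hermitian

    psi = np.array([[1.0], [2.0], [3.0]])
    # P|psi> = <phi|psi> |phi>
    print(np.allclose(P @ psi, (c.conj().T @ psi).item() * c))   # True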

The completeness relation


By a “complete set" of vectors one means a set of vectors spanning the whole of the
vector space considered. By definition, a basis set is always a complete set.
Consider a finite-dimensional Hilbert space of dimension N , spanned by the orthonor-
mal basis {|φ1 i, |φ2 i, . . . , |φN i}. Any vector of that space can be written as a linear
superposition of these |φn i’s; e.g.,
|\psi\rangle = \sum_{n=1}^{N} c_n\, |\phi_n\rangle.    (5.22)
Since the basis is orthonormal, each coefficient cn can be calculated as the inner product
of |ψi with the respective basis vector: cn = hφn |ψi. Therefore one can also write
|\psi\rangle = \sum_{n=1}^{N} \langle\phi_n|\psi\rangle\, |\phi_n\rangle,    (5.23)

or, writing the scalar hφn |ψi on the right rather than on the left of the vector |φn i (this
is just a change of notation),
|\psi\rangle = \sum_{n=1}^{N} |\phi_n\rangle\langle\phi_n|\psi\rangle.    (5.24)

Hence
|\psi\rangle = \sum_{n=1}^{N} \hat P_n\, |\psi\rangle    (5.25)

where P̂n = |φn ihφn |. Since Eq. (5.25) must be true for any vector |ψi, the sum of the
projectors P̂n over all the vectors in the basis can only be the identity operator:
\sum_{n=1}^{N} |\phi_n\rangle\langle\phi_n| = \hat I.    (5.26)

This last equation is an important result called the completeness relation, or closure
relation. (We stress that the |φn i’s must be orthonormal and form a complete set for it
to hold.)
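
+ The completeness relation is easily illustrated numerically: for any orthonormal basis of C^N (below, the eigenvector basis of an arbitrary Hermitian matrix), the projectors onto the basis vectors sum to the identity matrix.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    A = (X + X.conj().T) / 2
    _, U = np.linalg.eigh(A)        # the columns of U form an orthonormal basis

    identity = sum(U[:, [n]] @ U[:, [n]].conj().T for n in range(3))
    print(np.allclose(identity, np.eye(3)))   # True: Eq. (5.26)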

5.3 Bases of eigenvectors: I. Finite-dimensional spaces


Not all Hermitian operators acting in infinite-dimensional spaces — e.g., in spaces of
functions— have eigenvectors and eigenvalues. However, Hermitian operators acting in
finite-dimensional spaces, such as spin operators, always have eigenvectors and eigen-
values. In fact, if  is a Hermitian operator acting in a finite-dimensional Hilbert space,
then it is always possible to form an orthonormal basis of eigenvectors of  spanning
the whole of this Hilbert space.

Proof: We start by showing that the eigenvectors of a Hermitian opera-


tor acting in a finite-dimensional Hilbert space always span that Hilbert
space.
First, let us recall the general result that any square matrix has at
least one eigenvalue (possibly complex), and therefore has eigenvec-
tors. (You may have seen this theorem in a previous course; if not, see,
e.g., “Advanced Engineering Mathematics" by E. Kreyszig.)
Suppose that the operator  acts in a finite-dimensional Hilbert space
H of dimension N . Let HA be the subspace of H spanned by the eigen-
vectors of  and H⊥ the subspace of H formed by the vectors orthog-
onal to all the vectors of HA . The dimension of HA is a certain number
NA ≤ N . This subspace can thus be spanned by a basis of NA vectors.
Let {|φn i} (n = 1, . . . , NA ) be such a basis.
We first consider the possibility that NA < N . If N ≠ NA , it is always
possible to find a linearly independent set of N − NA non-zero vectors

of H orthogonal to all these NA vectors (recall that linearly independent
vectors can always be made orthogonal to each other using the Gram-
Schmidt method). These N − NA vectors span H⊥ . Let us denote them
by |φn i, n = NA + 1, . . . , N . Joining them to the NA basis vectors
spanning HA gives a set of N basis vectors spanning the whole of H,
{|φn i}, n = 1, . . . , N .
By definition, any vector |γi in HA can be written as a linear combina-
tion of eigenvectors of Â:
|\gamma\rangle = \sum_i c_i\, |\psi_i\rangle,    (5.27)

where each vector |ψi i is such that Â|ψi i = λi |ψi i for some number
λi . But then

\hat A\, |\gamma\rangle = \sum_i c_i\, \hat A\, |\psi_i\rangle = \sum_i c_i\, \lambda_i\, |\psi_i\rangle.    (5.28)
The operator  thus transforms vectors belonging to HA into vectors
belonging to HA .
Next we show that  also transforms vectors belonging to H⊥ into
vectors belonging to H⊥ : Suppose, as above, that |γi is in HA . Then,
as just shown, |γA i = Â|γi is also in HA . Now, suppose that |βi is in
H⊥ and consider |βA i = Â|βi. Since  is Hermitian,
hγ |βA i = hγ | Â|βi = hβ | Â|γi∗ = hβ |γA i∗ . (5.29)
However, since |γA i is in HA and |βi is in H⊥ , hβ |γA i∗ = 0 and there-
fore hγ |βA i = 0. Since this is true for any vector |γi belonging to HA ,
Â|βi must be in H⊥ if |βi is in H⊥ . Thus  transforms vectors in H⊥
into vectors in H⊥ .
The upshot is that hφi | Â|φj i = 0 if i ≤ NA and j ≥ NA + 1 or if i ≥ NA + 1
and j ≤ NA . The matrix A representing the operator  in the {|φn i}
basis is therefore block-diagonal:

A = \begin{pmatrix}
\times & \times & \cdots & \times & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
\times & \times & \cdots & \times & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & \times & \cdots & \times \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & \times & \cdots & \times
\end{pmatrix},    (5.30)

where the crosses indicate which elements of A may be non-zero. The
upper and lower diagonal blocks of A are, respectively, NA × NA and
(N − NA ) × (N − NA ) square matrices. Any eigenvector of A corre-
sponds to an eigenvector of Â; indeed, if the numbers cn , n = 1, . . . , N
are the components of a vector c such that Ac = λc, then Â|ψi = λ|ψi
for
|\psi\rangle = \sum_{n=1}^{N} c_n\, |\phi_n\rangle.    (5.31)

Now, that HA is spanned by the eigenvectors of  implies that any


eigenvector of the matrix A is such that cn = 0 for all n ≥ NA + 1 and
cn ≠ 0 for at least one value of n ≤ NA . However, as any square matrix
has at least one eigenvalue, the lower diagonal block of A must have
eigenvectors. Hence A must have eigenvectors such that cn ≠ 0 for at
least one value of n ≥ NA + 1 and cn = 0 for all n ≤ NA .
The contradiction implies that it is impossible that NA < N . Instead,
NA must be equal to N , which means that HA is the whole Hilbert
space H in which the operator  acts.
Let us now show that an orthonormal basis of eigenvectors of  can
always be found. First, recall that if an eigenvalue λ of an operator  is
M -fold degenerate, then, (1) it is possible to find M linearly indepen-
dent non-zero vectors |ψ1 i, |ψ2 i,. . . , |ψM i such that Â|ψn i = λ|ψn i,
(n = 1, 2, . . . , M ), and (2) any eigenvector belonging to that eigen-
value can be written as a linear combination of these M linearly in-
dependent vectors. As the latter can always be orthonormalized, it
is always possible to form a basis of M orthogonal unit eigenvectors
spanning the subspace formed by all the eigenvectors belonging to that
eigenvalue.
The same also applies to non-degenerate eigenvalues, i.e., to “1-fold de-
generate" eigenvalues (M = 1): If the eigenvalue λ is non-degenerate,
then any of its eigenvectors can be written as the linear combination
c|ψi, where c is a non-zero complex number and |ψi is a unit vector
such that Â|ψi = λ|ψi.
Thus any eigenvector of a Hermitian operator  can be written as a lin-
ear combination of unit eigenvectors of Â. These unit eigenvectors are
mutually orthogonal (both by construction, in the case of degenerate
eigenvalues, and because eigenvectors of Hermitian operators corre-

sponding to different eigenvalues are always orthogonal). As any vec-
tor of the Hilbert space in which this operator acts can be written as
a linear combination of its eigenvectors, this set of orthonormal eigen-
vectors is a basis for that space. 

Matrix representation in a basis of eigenvectors


Suppose that {|ψ1 i, |ψ2 i, . . . , |ψN i} is an orthonormal basis formed by eigenvectors
of a Hermitian operator Â. Let us denote by λj the eigenvalue the eigenvector |ψj i
corresponds to. We then have that

hψi | Â|ψj i = λj hψi |ψj i = λj δij . (5.32)

(We replaced hψi |ψj i by δij in the last step since the ket vectors |ψj i are assumed to
be orthonormal.) Therefore, the off-diagonal elements of the matrix A representing Â
in that basis are all zero, and the diagonal elements are the eigenvalues λj :

A = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_N
\end{pmatrix}    (5.33)

A matrix representing an operator in an orthonormal basis of its own eigenvectors


is always diagonal, and each diagonal element is the eigenvalue the respective basis
vector corresponds to.
Spectral decomposition of an operator
As just seen, given a Hermitian operator  acting in a finite-dimensional Hilbert space,
it is always possible to form an orthonormal basis spanning this space with eigenvectors
of that operator. Let {|ψ1 i, |ψ2 i, . . . , |ψN i} be such a basis, with Â|ψn i = λn |ψn i. As
this set is complete, we can invoke the completeness relation (Section 5.2) and write
\hat I = \sum_{n=1}^{N} |\psi_n\rangle\langle\psi_n|. However, since \hat I is the identity operator, \hat A \equiv \hat A \hat I and thus
\hat A = \sum_{n=1}^{N} \hat A\, |\psi_n\rangle\langle\psi_n|. Therefore, given that Â|ψn i = λn |ψn i,
\hat A = \sum_{n=1}^{N} \lambda_n\, |\psi_n\rangle\langle\psi_n|.    (5.34)

This last equation expresses the operator  in terms of its eigenvalues and of the projec-
tors |ψn ihψn |. The right-hand side of this equation is called the spectral decomposition
of Â.

+ We have seen in Section 3.3 that the exponential of an operator can be
defined by a series of powers of that operator. More general functions
can also be defined in terms of the spectral decomposition of that op-
erator. I.e., given Eq. (5.34), a function f (Â) of the operator  can be
taken to be the operator defined by the equation
f(\hat A) = \sum_{n=1}^{N} f(\lambda_n)\, |\psi_n\rangle\langle\psi_n|.    (5.35)

For example,
\frac{1}{\hat A - \lambda \hat I} = \sum_{n=1}^{N} \frac{1}{\lambda_n - \lambda}\, |\psi_n\rangle\langle\psi_n|.    (5.36)

(It is clear from this last equation that the operator  − λIˆ is not in-
vertible when λ is an eigenvalue of Â.)
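
+ The spectral decomposition can be checked numerically for a finite Hermitian matrix. The sketch below (an arbitrary random matrix) reconstructs the matrix from Eq. (5.34) and builds exp(Â) from Eq. (5.35), comparing the latter with a truncated power series of the kind mentioned in Section 3.3.

    import math
    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    A = (X + X.conj().T) / 2

    lam, U = np.linalg.eigh(A)
    projectors = [U[:, [n]] @ U[:, [n]].conj().T for n in range(4)]

    A_rebuilt = sum(lam[n] * projectors[n] for n in range(4))
    print(np.allclose(A, A_rebuilt))                          # True: Eq. (5.34)

    expA = sum(np.exp(lam[n]) * projectors[n] for n in range(4))
    series = sum(np.linalg.matrix_power(A, k) / math.factorial(k) for k in range(30))
    print(np.allclose(expA, series))                          # True: Eq. (5.35) with f = exp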

Eigenvalues and eigenvectors of commuting operators


Consider two Hermitian operators, Â and B̂, acting in the same finite-dimensional
Hilbert space, and suppose that  and B̂ commute. That they commute has important
implications:

1. If  and B̂ commute and |ψn i is an eigenvector of Â, then the vector B̂ |ψn i is
also an eigenvector of  corresponding to the same eigenvalue.

Proof: If Â|ψn i = λn |ψn i and  commutes with B̂, then


ÂB̂ |ψn i = B̂ Â|ψn i = λn B̂ |ψn i and therefore B̂ |ψn i is an
eigenvector of  with eigenvalue λn (the same eigenvalue as that
|ψn i corresponds to). 

As an example, take the case of an atom of hydrogen exposed to an external


time-varying electric field oriented in the z-direction. The wave function of this
system evolves according to the time-dependent Schrödinger equation,
i\hbar\,\frac{\partial\Psi}{\partial t} = H(t)\,\Psi(r,\theta,\phi,t),    (5.37)
where H(t) is the Hamiltonian of the atom+field system. Thus, given Ψ(r, θ, φ, t)
at a time t = t0 ,
\Psi(r,\theta,\phi,t=t_0+dt) = \Psi(r,\theta,\phi,t=t_0) + \frac{1}{i\hbar}\, H(t_0)\,\Psi(r,\theta,\phi,t=t_0)\, dt.    (5.38)

One can show that H(t) commutes with the angular momentum operator Lz .
Thus, if Ψ(r, θ, φ, t = t0 ) is an eigenfunction of Lz , then Ψ(r, θ, φ, t = t0 + dt)
is also an eigenfunction of Lz corresponding to the same eigenvalue. The upshot
is that the wave function Ψ(r, θ, φ, t) then remains an eigenfunction of Lz at all
times, even though it may change in a very complicated way under the effect of
this time-varying electric field.

+ Suppose that  and B̂ commute and that Â|ψn i = λn |ψn i. The eigen-
value λn may or may not be degenerate.
It follows from the fact that |ψn i and B̂ |ψn i are both eigenvectors of Â
corresponding to a same eigenvalue that this eigenvalue is degenerate
if these two vectors are linearly independent. In fact, most eigenvalue
degeneracies arise from the fact that the operator of interest commutes
with another operator.
On the other hand, if λn is not degenerate, then |ψn i is an eigenvector
of B̂ as well as of Â. Indeed, |ψn i and B̂ |ψn i cannot be linearly inde-
pendent if λn is not degenerate; instead, these two vectors must differ
at most by a scalar factor. Hence, if λn is not degenerate, there must
exist a number µn such that B̂ |ψn i = µn |ψn i, which means that |ψn i
is an eigenvector of B̂.

2. One can find a basis constructed from vectors which are eigenvectors both of
 and of B̂ if and only if  and B̂ commute. I.e., there exists a basis set {|ψ1 i,
|ψ2 i, . . . , |ψN i} and scalars λn and µn , n = 1, 2 . . . , N , such that

Â|ψn i = λn |ψn i and B̂ |ψn i = µn |ψn i.

Proof: Suppose that  and B̂ commute and that λ is an eigen-


value of Â. Let us suppose that λ is M -fold degenerate (M could
be any number between 1 and N ). This means that one can
find up to M linearly independent eigenvectors of Â, all cor-
responding to this eigenvalue. M such eigenvectors span a M -
dimensional subspace of the Hilbert space in which  and B̂ are
defined. The vectors belonging to that subspace are all eigenvec-
tors of  corresponding to the eigenvalue λ. Moreover, since Â
and B̂ commute, the action of B̂ on any of these eigenvectors
results in an eigenvector of  also corresponding to this eigen-
value. Thus B̂ transforms vectors of that subspace into vectors

of the same subspace. Within that subspace B̂ is therefore equiv-
alent to a Hermitian operator acting in a Hilbert space of dimen-
sion M . Hence it is always possible to form an orthonormal basis
of eigenvectors of B̂ spanning this M -dimensional subspace, and
each vector in that basis is also an eigenvector of  correspond-
ing to the eigenvalue λ. This process can be repeated for each
eigenvalue of  in turn, resulting in a basis of N vectors that are
eigenvectors of both  and of B̂.
We now prove the converse, that  and B̂ commute if one can
find a basis constructed from vectors which are eigenvectors
both of  and of B̂. Suppose that there exists a basis set {|ψ1 i,
|ψ2 i, . . . , |ψN i} and scalars λn and µn , n = 1, 2 . . . , N , such
that

Â|ψn i = λn |ψn i and B̂ |ψn i = µn |ψn i.

Because {|ψ1 i, |ψ2 i, . . . , |ψN i} is a basis, any vector |ψi can


be written as a linear combination of these vectors; e.g., |\psi\rangle = \sum_{n=1}^{N} c_n\, |\psi_n\rangle. Now,

\hat A\hat B\, |\psi\rangle = \sum_{n=1}^{N} c_n\, \hat A\hat B\, |\psi_n\rangle = \sum_{n=1}^{N} c_n\, \lambda_n\mu_n\, |\psi_n\rangle
= \sum_{n=1}^{N} c_n\, \mu_n\lambda_n\, |\psi_n\rangle
= \sum_{n=1}^{N} c_n\, \hat B\hat A\, |\psi_n\rangle = \hat B\hat A\, |\psi\rangle.    (5.39)

Since ÂB̂|ψi = B̂ Â|ψi for any |ψi, Â and B̂ commute. 

If each pair of eigenvalues (λn , µn ) corresponds to a unique |ψn i (up to a scalar factor),
one says that the operators  and B̂ form a complete set of commuting operators. This
means, in practice, that any joint eigenvector of  and B̂ can be unambiguously defined
by a pair of quantum numbers.
This definition generalizes to sets of more than two operators and to infinite-dimensional
spaces. For example, the non-relativistic Hamiltonian of an atom of hydrogen, Ĥ, com-
mutes with the angular momentum operators L̂2 and L̂z and the latter two also com-

mute with each other:
[Ĥ, L̂2 ] = [Ĥ, L̂z ] = [L̂2 , L̂z ] = 0. (5.40)
The wave functions Ψnlm (r, θ, φ) you have studied in Term 1 are simultaneous eigen-
functions of Ĥ, L̂2 and L̂z . The corresponding eigenvalues completely define these
joint eigenfunctions (up to a constant factor), and therefore Ĥ, L̂2 and L̂z form a com-
plete set of commuting operators.
As we will see in Chapter 6, that two Hermitian operators representing measurable
physical quantities commute means that there is no uncertainty relation limiting the
predictions one can make about what values these quantities could be found to have if
measured jointly.
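
+ A small numerical illustration of commuting operators and joint eigenvectors (the construction below, two matrices diagonal in the same randomly chosen orthonormal basis, is an assumption made purely for the sketch; note the degenerate eigenvalue of A):

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    U, _ = np.linalg.qr(X)                     # a random unitary matrix

    a_vals = np.array([1.0, 2.0, 2.0])         # eigenvalue 2 of A is two-fold degenerate
    b_vals = np.array([-1.0, 0.5, 3.0])
    A = U @ np.diag(a_vals) @ U.conj().T
    B = U @ np.diag(b_vals) @ U.conj().T

    print(np.allclose(A @ B, B @ A))           # True: A and B commute
    v = U[:, [1]]                              # one of the joint eigenvectors
    print(np.allclose(A @ v, a_vals[1] * v))   # True
    print(np.allclose(B @ v, b_vals[1] * v))   # True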

5.4 Bases of eigenvectors: II. Infinite-dimensional spaces


The mathematical theory of Hermitian operators acting in an infinite-dimensional Hilbert
space is considerably more difficult than that of finite matrices and other operators
acting in finite-dimensional spaces. Stating (let alone proving) general results for the
infinite-dimensional case would go much beyond the scope of this course. However,
the following is worth bearing in mind:

1. As seen in Section 5.1, the Hamiltonian operator defined by Eq. (5.9) is Hermitian
if taken as acting in the Hilbert space of the square-integrable functions which
vanish at x = ±a. It is possible to form an orthonormal basis set of eigenfunc-
tions of that operator, such that any element φ(x) of that Hilbert space can be
written in the form of an expansion on that set of eigenfunctions. I.e., denoting
the orthonormal basis functions by ψn (x), n = 0, 1, 2, . . ., one can write φ(x)
in the form
\phi(x) = \sum_{n=0}^{\infty} c_n\, \psi_n(x),    (5.41)
where the ψn (x) functions are eigenfunctions of H of unit norm and are such
that

\int_{-a}^{a} \psi_i^*(x)\, \psi_j(x)\, dx = \delta_{ij}.    (5.42)

+ As can be checked easily,


\psi_n(x) = \begin{cases} A_n \cos(k_n x) & n = 0, 2, 4, \ldots \\ A_n \sin(k_n x) & n = 1, 3, 5, \ldots \end{cases}    (5.43)

with kn = (n + 1)π/(2a) and An a normalization factor. These
functions satisfy the differential equation

-\frac{\hbar^2}{2m}\,\frac{d^2\psi_n}{dx^2} = E_n\, \psi_n(x),    (5.44)
where En is a constant (in fact, En = ~2 kn2 /2m). Since they are
also square-integrable on [−a, a] and zero at x = ±a, they qualify
as eigenvectors of this operator. These functions are orthogonal to
each other since they all correspond to different eigenvalues and H
is Hermitian. Moreover, they can be made to have unit norm by
an appropriate choice of the normalization factors An . That any
function φ(x) belonging to that Hilbert space can be written as an
expansion in these ψn (x) functions is a standard result of Fourier
analysis.

2. Consider a particle of mass m confined to the x-axis but submitted to no force in


the x-direction and free to move to arbitrarily large distances from the origin The
Hamiltonian of this system is also given by Eq. (5.9), as in the previous example,
but here this operator is taken as acting in the Hilbert space of square-integrable
functions on (−∞, ∞). This operator is still Hermitian. However, contrary to
the previous example, it has no eigenfunctions in the mathematical definition of
the term.
+ To be an eigenfunction of H, a function ψ(x) should be a non-trivial
solution of the differential equation

-\frac{\hbar^2}{2m}\,\frac{d^2\psi}{dx^2} = E\, \psi(x)    (5.45)
for a certain value of the constant E and should also be square-
integrable on (−∞, ∞). (By non-trivial solution one means a solu-
tion other than ψ(x) ≡ 0.) Let k = (2mE/~2 )1/2 . Any non-trivial
solution of Eq. (5.45) is of the form

ψ(x) = c+ exp(ikx) + c− exp(−ikx) (5.46)

with c+ and c− two arbitrary complex numbers (not both zero), and
such solutions exist for any real or complex value of E. However,
none of these solutions go to zero both for x → ∞ and x → −∞.
Hence, no function of that form is square-integrable on (−∞, ∞).

3. Adding a potential energy term mω 2 x2 /2 to the Hamiltonian of the previous
example changes it into the Hamiltonian of a linear harmonic oscillator,
H = -\frac{\hbar^2}{2m}\,\frac{d^2}{dx^2} + \frac{m}{2}\,\omega^2 x^2.    (5.47)
Taken as acting in the Hilbert space of the square-integrable functions on (−∞, ∞),
this Hamiltonian is Hermitian and has infinitely many eigenvalues. It is possible
to form an orthonormal basis set of eigenfunctions of that operator, such that
any square-integrable function can be written in the form of an expansion on
that set of eigenfunctions. (You have studied this system in the Term 1 Quantum
Mechanics course, and we will come back to it a little later in these notes.)
+ Advanced mathematical concepts are necessary to prove the impor-
tant result, stated above, that any square-integrable function can be
written in terms of eigenfunctions of that Hamiltonian.

4. We have already mentioned the angular momentum operator Lz (actually, the z-


component of the orbital angular momentum operator L̂, about which there will be much
more later in the course). This operator takes on the very simple form given in
Eq. (5.3) when written in terms of the polar angle φ of a system of spherical polar
co-ordinates. Recall that this polar angle varies between 0 and 2π. As we have
seen in Section 5.1, Lz is a Hermitian operator in the space of all differentiable
square-integrable functions y(φ) defined on [0, 2π] and such that y(0) = y(2π).
The (normalized) eigenfunctions of Lz are the complex exponentials
\psi_m(\phi) = \frac{1}{\sqrt{2\pi}}\,\exp(im\phi), \qquad m = 0, \pm 1, \pm 2, \ldots    (5.48)

These eigenfunctions form an orthonormal basis set, in the sense that any func-
tion belonging to that Hilbert space can be written in the form of an expansion
on this set of eigenfunctions.

+ The functions ψm (φ) are orthonormal since (check this as an exercise!)


\int_0^{2\pi} \psi_m^*(\phi)\, \psi_n(\phi)\, d\phi = \frac{1}{2\pi} \int_0^{2\pi} \exp[i(n-m)\phi]\, d\phi = \delta_{nm}.    (5.49)
These functions are also eigenfunctions of Lz since
L_z\, \psi_m(\phi) = -i\hbar\, \frac{d}{d\phi}\, \frac{\exp(im\phi)}{\sqrt{2\pi}} = m\hbar\, \frac{\exp(im\phi)}{\sqrt{2\pi}} = m\hbar\, \psi_m(\phi).    (5.50)
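
+ The orthonormality integral (5.49) can also be checked numerically; the simple equally spaced quadrature below is only an illustration, not a replacement for the analytical result.

    import numpy as np

    N = 4096
    phi = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
    dphi = 2 * np.pi / N

    def psi(m):
        return np.exp(1j * m * phi) / np.sqrt(2 * np.pi)

    for m, n in [(0, 0), (1, 1), (2, -1), (3, 1)]:
        overlap = np.sum(np.conj(psi(m)) * psi(n)) * dphi
        print(m, n, np.round(overlap, 6))      # approximately 1 when m == n, 0 otherwise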

We will come back to this issue later in the course, when studying the position and
momentum operators and other operators with a continuous spectrum.

6 Measurements and uncertainties

6.1 Probabilities and measurements


In many respects, Quantum Mechanics is a mere mathematical algorithm for predicting
the outcome of experiments. (Remember from the first lecture that the word “exper-
iment" should be understood in a broad sense, here: e.g., detecting the arrival of a
bunch of photons with the naked eye is an experiment. Remember, also, that a the-
oretical prediction is what a theory says about the results of an experiment, whether
this experiment will be done in the future, or has been done in the past, or is a thought
experiment which will never be done.)
In practice, a well controlled experiment would typically consist in preparing a system
in a certain state (e.g., exciting an atom in a certain way), making a measurement or
some test on this system (e.g., testing whether the atom is in a state of spin up or a state
of spin down), and recording the result of this experiment (a number, or a yes/no result).
From a Quantum Theory standpoint, the initial state of this system would typically be
described by a vector belonging to a certain Hilbert space,6 the measurement would
be described by a theoretical model formulated in terms of operators acting in that
Hilbert space, and the theoretical predictions about the outcome of the measurement
would normally be probabilistic. There is certainty about a specific outcome only if the
probability of this outcome is 0 or 1. E.g., if the probability for a certain result is zero,
this result will never be found — it is impossible. A very small but non-zero probability
only makes the result unlikely, it does not make it impossible.
We have already seen that, depending on the system, the vector describing the state
of a quantum system could be, e.g., a function or one or several variables, a column
vector, or a ket vector. Functions and ket vectors describing quantum states are usually
referred to as wave functions and state vectors, respectively.
The probability of each possible outcome of the experiment is entirely determined by
the wave function or state vector describing the state of the system at the time of the
experiment. Hence, all that one can say about what can be observed on a given quan-
tum system is entirely determined by its wave function or state vector. In this sense,
the wave function or state vector contains all the information that can be known about the
state of the system. In fact, often no distinction is made between the state of a system
and the state vector describing it. E.g., one would say “the atom is in the state |ψi"
6
See Section 4.1 of these notes. Alternative formulations exist but are not addressed in this
course.

instead of “the atom is in the state described by the state vector |ψi".
It is important to understand that what is measured in experiments is not the wave
function or state vector describing a quantum state (although the results obtained in
measurements may of course reveal a lot about the state of a system). Namely, there
is no “spin-meter" which would return the values of the coefficients a and b in the ket
vector
|χi = a| ↑ i + b| ↓ i (6.1)
representing the spin state of a given atom of silver. A measurement of the spin in the
z-direction would only say whether each atom measured is found to be in a state of spin
up (| ↑ i) or a state of spin down (| ↓ i). It may be that information about the values of
a and b could be obtained by making well-chosen measurements on many atoms all
prepared in the same state, but no single measurement will return values for a and b.
The wave functions and state vectors are not measurable as such; they are theoretical
constructs which are used to calculate quantities that can be measured.7
The Born rule
Suppose that one would consider checking whether or not a quantum system prepared
in a state |ψi is in a state |φi (for example, checking whether or not an atom prepared
in the spin state of Eq. (6.1) is in the state of spin up, | ↑ i.) It might seem bizarre
that a system in a state |ψi could be found to be in a different state |φi in a careful
experiment; however, see Section 4.4, about the Principle of Superposition, and note
that we can always write |ψi as a linear combination of |φi and another vector:

|ψi = |φi + (|ψi − |φi). (6.2)

Whether or not a measurement would find the system to be in state |φi cannot (nor-
mally) be predicted with certainty, but it is possible to calculate the probability Pr(|φi; |ψi)
of finding it in that state, given that it was in state |ψi just before the measurement. It
is a fundamental principle of Quantum Mechanics, often referred to as the Born rule or
Born postulate, that this probability is given by the following equation:

\Pr(|\phi\rangle; |\psi\rangle) = \frac{|\langle\phi|\psi\rangle|^2}{\langle\phi|\phi\rangle\,\langle\psi|\psi\rangle}.    (6.3)

[To keep the notation simple, Pr (|φi; |ψi) will usually be written as Pr (|φi) in the
lectures.]
7
This situation is not unique to Quantum Mechanics: e.g., in Classical Mechanics, it is not
possible to measure the Lagrangian of a system of particles; the Lagrangian is merely an auxiliary
function introduced for helping with the calculation of positions and velocities.

Note that the numerator and denominator of Eq. (6.3) would both be zero if |ψi or |φi
was the zero vector (the zero vector has zero norm and its inner product with any other
vector is zero). As Eq. (6.3) would then be meaningless, the zero vector never describes
a possible state of a quantum system.
The probability Pr (|φi; |ψi) is exactly the same whether the state of the system is
described by the vector |ψi or by the vector |cψi = c|ψi, where c is a non-zero complex
number. Indeed, |hφ|cψi|2 = |c|2 |hφ|ψi|2 since hφ|cψi = chφ|ψi, and hcψ|cψi =
|c|2 hψ|ψi since |cψi = c|ψi and hcψ| = c∗ hψ|. Thus

\Pr(|\phi\rangle; |c\psi\rangle) = \frac{|\langle\phi|c\psi\rangle|^2}{\langle\phi|\phi\rangle\,\langle c\psi|c\psi\rangle} = \frac{|c|^2\, |\langle\phi|\psi\rangle|^2}{|c|^2\, \langle\phi|\phi\rangle\,\langle\psi|\psi\rangle} = \frac{|\langle\phi|\psi\rangle|^2}{\langle\phi|\phi\rangle\,\langle\psi|\psi\rangle} = \Pr(|\phi\rangle; |\psi\rangle).    (6.4)
The same applies for the calculation of any probability, and therefore the vectors |ψi
and |cψi describe the same state, in the sense that there is no difference in the predic-
tions from the theory whether we describe the state by |ψi or by |cψi: Multiplying a
state vector by a non-zero complex number never changes its physical content.
Since the norm of the state vectors is irrelevant as far as calculating probabilities is
concerned, the choice of this norm is arbitrary. It is convenient to work only with
normalized state vectors — i.e., with state vectors |ψi and |φi such that hψ|ψi = 1 and
hφ|φi = 1, as Eq. (6.3) then takes the simpler form

Pr(|φi; |ψi) = |hφ|ψi|2 . (6.5)

From now on we will work only with normalized state vectors and eigenvec-
tors.
The inner product hφ|ψi is called the probability amplitude of finding the system in the
state |φi when it is initially prepared in the state |ψi. (Don’t be confused: A probability
amplitude is not a probability! It is, in general, a complex number. The probability
Pr(|φi; |ψi) is a real number, equal to the square of the modulus of the probability
amplitude.)

+ Even though they describe the same quantum state, |ψi and c|ψi are nonethe-
less two different vectors (unless of course c = 1). The value of c may thus
matter. For example, the linear combinations |ψ1 i + |ψ2 i and |ψ1 i − |ψ2 i
may describe very different states although the kets |ψ2 i and −|ψ2 i de-
scribe the same state.

+ Since multiplying the state vector by a non-zero overall factor does not
change anything to the predictions of the theory, as long as probabilities
are correctly calculated, it is sometimes said that quantum states are repre-
sented by rays rather than by vectors (given a vector |ψi, the correspond-
ing ray is the 1-dimensional subspace spanned by |ψi, excluding the zero
vector).

We make three important observations at this point:

1. The probability Pr (|φi; |ψi) is zero if, and only if, the two states |ψi and |φi
are orthogonal (i.e., hφ|ψi = 0). Hence, aside from experimental errors, it is
impossible that a quantum system be found in a state orthogonal to that in which
it was immediately prior to the measurement.

2. The probability Pr(|φi; |ψi) is 1 if, and only if, |φi = c|ψi with c ≠ 0 (see proof
below). The two ket vectors |φi and |ψi then describe the same quantum state.
As would be expected, aside from experimental errors, an experiment aiming
at finding whether a quantum system is in the state in which it was prepared
immediately before the measurement will certainly find it in that state.

3. In any other case, 0 < Pr(|φi; |ψi) < 1: the measurement may or may not find
the system to be in the state |φi, and whether it will find it to be in the state
|φi cannot be predicted with certainty. Referring to Eq. (6.2) above, there is a
non-zero probability that it be found in the state |φi and a non-zero probability
that it be found in the state |ψi − |φi; in that sense, one can say that the system
is simultaneously in the states |ψi, |φi and |ψi − |φi immediately before the
measurement (although one should be careful not to put too much meaning in
this interpretation: the theory only predicts probabilities, nothing more).

For example, if we take the spin state given by Eq. (6.1), the probability that the atom is
found to be in the state of spin up with respect to the z-direction is |h ↑ |χi|2 , assuming
that |χi and | ↑ i are normalized (hχ|χi = h ↑ | ↑ i = 1). Since h ↑ | ↓ i = h ↓ | ↑ i = 0
and h ↓ | ↓ i = 1, this probability is |a|2 , and furthermore |b|2 = 1 − |a|2 . (The latter
equation follows from the normalization condition hχ|χi = 1 and from the fact that
hχ|χi = |a|2 h ↑ | ↑ i + a∗ bh ↑ | ↓ i + ab∗ h ↓ | ↑ i + |b|2 h ↓ | ↓ i = |a|2 + |b|2 .) Thus,
in the case where a = 0, the probability to find the atom in the state | ↑ i is zero (in
fact, the atom is then in the state | ↓ i, which is orthogonal to the state | ↑ i). In the case
where a = 1, then b = 0 and |χi = | ↑ i: the atom will certainly be found in the state

| ↑ i. Otherwise, there is a non-zero probability |a|2 to find it in the state | ↑ i and a
non-zero probability |b|2 = 1 − |a|2 not to find it in that state.
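
+ The arithmetic of this spin example is easy to reproduce numerically (the values of a and b below are arbitrary, chosen so that |a|^2 + |b|^2 = 1):

    import numpy as np

    up = np.array([1.0, 0.0])
    down = np.array([0.0, 1.0])

    a, b = 0.6, 0.8j
    chi = a * up + b * down                    # normalized spin state

    # Born rule, Eq. (6.5): probability = |<up|chi>|^2
    amplitude = np.vdot(up, chi)               # np.vdot conjugates its first argument
    print(abs(amplitude) ** 2)                 # 0.36 = |a|^2
    print(abs(np.vdot(down, chi)) ** 2)        # 0.64 = |b|^2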

+ Being the square of the real number |hφ|ψi|, it is obvious that


Pr(|φi; |ψi) ≥ 0. Moreover, it is clear from Schwarz inequality (Section
2.9 of these notes) that Pr(|φi; |ψi) ≤ hφ|φihψ|ψi, hence, on account
of hφ|φi = hψ|ψi = 1, that Pr(|φi; |ψi) ≤ 1, as befits a probability.
Suppose that |φi = c|ψi with |c| = 1. Then hφ|ψi = c∗ hψ|ψi = c∗
and therefore Pr(|φi; |ψi) = |c|2 = 1. Proving the converse is slightly
longer: Suppose that Pr (|φi; |ψi) = 1. Then, this means that hφ|ψi
must be a number of modulus 1. Thus hψ|φi must also be a number
of modulus 1. Let |ηi = |φi − hψ|φi|ψi. Proceeding as in the proof
of the Schwarz inequality given in Section 2.9, one obtains hη|ηi =
1 − |hφ|ψi|2 , which is zero. Thus |ηi can only be the zero vector and
|φi can only be |ψi times a number of modulus 1. 

6.2 Dynamical variables and observables


By dynamical variable, in Quantum Mechanics, one means a variable such as the po-
sition, momentum, energy or angular momentum of a particle or a whole quantum
system — i.e., physical quantities whose value has a probability distribution which may
vary in time. Constant quantities which have only one possible values for a given sys-
tem, such as the charge and the mass of the electron, are not dynamical variables.
Much of Quantum Mechanics is based on a small number of fundamental principles,
two of which we have already seen: the Born rule, and that each state of a quantum
system can be described by a vector belonging to a Hilbert space. To those, we add the
following ones:

• Given the Hilbert space of the state vectors or wave functions describing the
possible states of the system of interest, each dynamical variable of this system
is associated with a linear operator acting in this Hilbert space.

• The only values a dynamical variable may be found to have, when measured, are
the eigenvalues of the operator with which this variable is associated.

For example, the operator associated with the internal energy of an atom of hydrogen
is the Hamiltonian Ĥ of this system. This operator acts on ket vectors |ψi describing

the state of the atom. A measurement of the energy on an atom in a state |ψi would
return one of the eigenvalues of Ĥ as a result (within the experimental uncertainties,
of course).
As noted in Section 1.2, what is directly measured in an actual experiment is often not
the dynamical variable of interest but rather some other quantities the value of this dy-
namical variable can be inferred from. For example, what was directly measured in the
experiment of Stern and Gerlach was the position of each of the silver atoms recorded
after the magnet; however, measuring each of these positions amounted to a measure-
ment of the component of the spin of the respective atom in the direction of the mag-
netic field. Hence, the operator associated with this measurement can be taken to be the
relevant spin operator rather than an operator whose eigenvalues would correspond to
the possible positions at which atoms could have been found in the experiment.
We will soon see that it is important for the consistency of the theory that the opera-
tors associated with dynamical variables each have a complete set of eigenvectors. It
is also important that the eigenvalues of these operators are real (not complex) since
physical quantities such as the position, momentum, etc, are real. For these two rea-
sons, dynamical variables are represented by Hermitian operators whose eigenvectors
form a complete set. Recall from Section 5.3 that Hermitian operators acting in a
finite-dimensional space always have a complete set of eigenvectors. Hermitian oper-
ators acting in an infinite-dimensional space may or may not have a complete set of
eigenvectors (here, by eigenvectors, we mean ordinary eigenvectors as well as the gen-
eralized eigenvectors associated with the continuous spectrum, see Chapter 8 of these
notes).
As the term “dynamical variable" is a bit heavy and a bit undescriptive, in the follow-
ing we will use the word “observable" to refer to a dynamical variable amenable (in
principle) to a measurement.

+ In this we follow the textbook recommended for the Term 1 Quantum


Mechanics course.8 However, in the context of Quantum Mechanics, the
word “observable" is more commonly taken to mean a Hermitian opera-
tor whose eigenvectors form a complete set spanning the relevant Hilbert
space, whether or not this operator corresponds to a physical quantity
accessible to experiment. Within this definition, any Hermitian operator
acting in a finite-dimensional Hilbert space is an observable.
[Footnote 8: D J Griffiths and D F Schroeter, “Introduction to Quantum Mechanics", 3rd ed., Cambridge University Press, Cambridge (2018).]

The question we are thus concerned with is calculating the probability that the outcome
of the measurement of a certain observable is a given eigenvalue of the Hermitian oper-
ator representing this observable in the theory. We will assume that the wave function
or state vector representing the state of the system is known. How such probabilities
are calculated depends on whether the eigenvalues of interest form part of the discrete spectrum
of the relevant operator or part of its continuum spectrum (if there is one). The
latter situation may arise only in infinite-dimensional Hilbert spaces and is more fully
addressed in Chapter 8. At this stage we exclusively consider the case of dis-
crete eigenvalues. Much of what we will say for the discrete case also applies to the
continuous case, with the replacement of discrete summations by integrals. There is,
however, an important difference: in the case of a continuum spectrum the calculation
gives densities of probability of finding specific eigenvalues, whereas in the case of a
discrete spectrum it gives probabilities of finding specific eigenvalues.
Suppose that a measurement of an observable A is made on a system in the state |ψi.
Suppose, also, that this observable is associated with a Hermitian operator Â. Consider
a discrete eigenvalue λn of Â, which for the time being we will assume to be non-
degenerate, and an eigenvector |ψn i of  corresponding to that eigenvalue (i.e., the
vector |ψn i is such that Â|ψn i = λn |ψn i.) We also assume that the state vector |ψi
and the eigenvector |ψn i are normalized: hψ|ψi = hψn |ψn i = 1. Then, the probability
Pr(λn ; |ψi) that the observable A is found to have the value λn is given by the equation

Pr(λn ; |ψi) = |hψn |ψi|2 . (6.6)

We note the following:

1. Because  is a Hermitian operator, there is no possibility that some of its eigen-


values would be complex rather than real (which would be problematic since
these eigenvalues are meant to represent results of measurements).

2. Suppose that the system is in an eigenstate |ψi i of  corresponding to an eigen-


value λi (i.e., |ψi = |ψi i). Then there is a zero probability that any other eigen-
value of  would be found in the measurement (because eigenvectors of a Her-
mitian operator corresponding to different eigenvalues are always orthogonal).

3. Eq. (6.6) applies only if the eigenvalue λn is non-degenerate. If λn is M -fold de-


generate, then one can find a set of M orthonormal vectors |ψn1 i, |ψn2 i, . . . , |ψnM i
such that Â|ψnr i = λn |ψnr i, r = 1, . . . , M , and Eq. (6.6) becomes
Pr(λn ; |ψi) = Σ_{r=1}^{M} |hψnr |ψi|² .    (6.7)

4. Obtaining one eigenvalue or another are mutually exclusive possibilities. There-
fore the probabilities Pr(λn ; |ψi) should sum to 1:
Σ_n Pr(λn ; |ψi) = 1,    (6.8)

where the summation runs over all the eigenvalues of Â. To keep the notation
simple, let us assume that the eigenvalues λn are all non-degenerate and that the
corresponding eigenvectors |ψn i are orthonormal. Then Eq. (6.8) says that

1 = Σ_n |hψn |ψi|² = Σ_n hψn |ψi∗ hψn |ψi = Σ_n hψ|ψn ihψn |ψi.

Since this equation must hold for any state |ψi, we conclude that
Σ_n |ψn ihψn | = Î ,    (6.9)

where Iˆ is the identity operator. Eq. (6.9) is the completeness relation, Eq. (5.26)
of Section 5.2 of these notes. For the probabilistic interpretation of the inner
products hψn |ψi to make sense, it is thus necessary that the eigenvectors |ψn i
form a complete set.
We know from Section 5.3 that if  is defined in a finite-dimensional space it is
always possible to form an orthonormal basis of eigenvectors of  spanning that
space; hence, in spaces of dimension N , the requirement that the eigenvectors
of  form a complete set does not add to the requirement that  is Hermitian.
The case where the space is infinite-dimensional is more complicated, though.
A Hermitian operator  can represent an observable only if its has a complete
set of eigenvectors (including generalized eigenvectors in the sense of Chapter
8 of these notes).
+ That the theory would not be consistent if these eigenvectors did
not form a complete set can also be understood from the following
argument. Suppose that the eigenvectors of  would not form a
complete set — i.e., that there would exist a vector |ψi orthogonal
to all the eigenvectors of Â. If the system was prepared in that state,
there would then be a zero probability of obtaining any of the eigen-
values of  in a measurement of the dynamical variable associated
with Â, since the inner products hψn |ψi would be zero for all n; this
would be inconsistent with the rule that the only possible values of
this variable are the eigenvalues of Â.

+ Important note. The above considerations generalize to the case
of quantum states described by elements of an infinite-dimensional
Hilbert space; however, there are many subtleties and the mathe-
matically rigorous theory is far from straightforward. For example,
we will see, in Chapter 8, that in the case of quantum states de-
scribed by wave functions defined in 3D space, Eq. (6.9) may take
on the form

Σ_n ψn (r)ψn∗ (r′) = δ(r − r′),    (6.10)

or the form

∫ φk (r)φk∗ (r′) d³k = δ(r − r′),    (6.11)

or even the form

Σ_n ψn (r)ψn∗ (r′) + ∫ φk (r)φk∗ (r′) d³k = δ(r − r′),    (6.12)

depending on the Hamiltonian. In these equations, r and r′ represent
position vectors and δ(r − r′) is a “3D delta function".
functions ψn (r) are eigenfunctions of the Hamiltonian in the ordi-
nary sense of the word (they belong to the Hilbert space of square-
integrable functions and are such that Hψn (r) = En ψn (r), where
H is the Hamiltonian and the En ’s are the eigenenergies). They
correspond to discrete energy levels. Eq. (6.10) has exactly the same
form as Eq. (6.9), except that the eigenfunctions ψn (r) and ψn∗ (r′)
stand for the ket and bra eigenvectors |ψn i and hψn | and that the
delta function δ(r − r′) stands for the identity operator Î. How-
ever, in Eqs. (6.11) and (6.12), the functions φk (r) do not belong
to that Hilbert space, and are eigenfunctions of the Hamiltonian
only in a generalized sense (they correspond to continuously dis-
tributed value of the energy and of the three components of a wave
vector, hence they are summed over through an integration rather
than a discrete summation). For this result to be possible, though,
it is important that the Hamiltonian is not just Hermitian but also
self-adjoint.
We have seen, in Section 5.1, that an operator is self-adjoint when
identical to its adjoint. All self-adjoint operators are Hermitian,
and in finite-dimensional spaces all Hermitian operators are self-
adjoint. However, in infinite-dimensional spaces self-adjointness is

a stronger condition than Hermiticity: an operator may be Hermi-
tian but not self-adjoint. Whether a Hermitian operator is or is not
self-adjoint depends on whether the domain of the adjoint of this op-
erator is or is not the same as the domain the operator itself (which
is something that can be difficult to establish). In Quantum Me-
chanics, observables are always described by self-adjoint operators.
Contrary to mere Hermiticity, self-adjointness guarantees that the
probabilities derived from the theory as described above always sum
up to 1, and also that the operator has a spectral decomposition, i.e.,
can be expressed in terms of a (discrete or continuous) sum of pro-
jectors as seen in Section 5.3 in the finite-dimensional case. Both
are essential for the consistency of the theory.
Throughout the rest of this course, we will always assume that the
observables of interest correspond to self-adjoint operators.

6.3 Expectation value of an observable


The expectation value of an observable A in a state |ψi is the matrix element hψ| Â|ψi,
where  is the operator representing this observable. The matrix element hψ| Â|ψi is
also called the expectation value of the operator  in the state |ψi. If  has p distinct
eigenvalues λ1 , λ2 ,. . . , λp , it follows from Eq. (5.34) that
hψ| Â|ψi = Σ_{j=1}^{p} λj Pr(λj ; |ψi),    (6.13)

where Pr (λj ; |ψi) is the probability that the value λj is found if the corresponding
physical quantity was measured on a system in the state |ψi. (The proof is left as an
exercise.)
The reason why hψ| Â|ψi is called an “expectation value" is perhaps best grasped from
the following example: Suppose that you toss a coin with someone, repeatedly, and at
each toss bet one pound that you will get head. Thus at each toss you gain one pound
if you get head and lose one pound (i.e., “gain" minus one pound) if you get tail. In
probabilistic terms, the “expectation value" of your gain at each toss is the amount you
gain if you get head times the probability of that outcome, plus the (negative) amount
you gain if you get tail times the probability of that outcome — i.e., (1 pound) × 0.5 +
(−1 pound) × 0.5 (obviously, this amounts to 0 pound, assuming that the coin is fair).
Eq. (6.13) says just the same in a Quantum Mechanical context: The expectation value

of the observable  is the eigenvalue λ1 of that operator times the probability that the
outcome of the experiment is λ1 , plus the eigenvalue λ2 times the probability that the
outcome is λ2 , etc.
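+ As an illustrative aside (assuming NumPy; the 4×4 matrix below is random and purely hypothetical), the following sketch checks Eq. (6.13) numerically: the expectation value hψ| Â|ψi computed directly agrees with the eigenvalues weighted by the Born-rule probabilities.

    import numpy as np

    rng = np.random.default_rng(0)

    # A random 4x4 Hermitian matrix A and a random normalized state |psi>.
    M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    A = (M + M.conj().T) / 2
    psi = rng.normal(size=4) + 1j * rng.normal(size=4)
    psi = psi / np.linalg.norm(psi)

    # Eigenvalues lambda_j and orthonormal eigenvectors (columns of V).
    lam, V = np.linalg.eigh(A)

    # Born-rule probabilities Pr(lambda_j; |psi>) = |<psi_j|psi>|^2.
    probs = np.abs(V.conj().T @ psi)**2

    # Left- and right-hand sides of Eq. (6.13).
    lhs = np.vdot(psi, A @ psi).real
    rhs = np.sum(lam * probs)
    print(np.isclose(lhs, rhs), np.isclose(probs.sum(), 1.0))   # True True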

+ Since observables are represented by Hermitian operators (in fact, self-


adjoint operators, as noted above), it it clear that hψ| Â|ψi is real: by def-
inition of a Hermitian operator, hψ| Â|ψi = hψ| Â|ψi∗ if  is Hermitian.
It can be shown that the converse is true for complex vector spaces (but
is false for real vector spaces): for complex vector spaces, hψ| Â|ψi is real
for any ket vector |ψi only if  is Hermitian.

6.4 Probability distributions


To summarize the previous sections, Quantum Mechanics predicts that the only values
a dynamical variable A can be found to have are the eigenvalues λn of the operator
 associated with A, and that each of these eigenvalues is obtained with a probability
Pr(λn ; |ψi) given by Eq. (6.6) or Eq. (6.7) if the system is in the state |ψi at the time of
the measurement.
The measured value of A is thus a random variable. Following a well-established tradition, we will
denote the mean of this random variable by hAi and its variance by (∆A)2 :
hAi = Σ_n λn Pr(λn ; |ψi),    (6.14)

(∆A)² = Σ_n (λn − hAi)² Pr(λn ; |ψi).    (6.15)

(It is clear that the values of hAi and (∆A)2 depend on the state of the system, |ψi; for
simplicity, we do not specify this dependence in the notation.) hAi can also be written
in terms of the operator  as

hAi = hψ| Â|ψi (6.16)

As we have seen in the previous section, hψ| Â|ψi is called the expectation value of Â
in the state |ψi. The variance (∆A)2 can be written in two equivalent ways in terms
of Â:

(∆A)² = hψ|(Â − hAi)² |ψi    (6.17)
       = hψ| Â² |ψi − hψ| Â|ψi² .    (6.18)

+ Proof: First, note that hψ|(Â − hAi)² |ψi should really be written as
hψ|(Â − hAiÎ)² |ψi, since hAi is a scalar, not an operator. Since hψ|ψi = 1,

hψ|(Â − hAiÎ)² |ψi = hψ| Â² − 2hAiÂ + hAi²Î |ψi
                   = hψ| Â² |ψi − 2hAihψ| Â|ψi + hAi²
                   = hψ| Â² |ψi − hψ| Â|ψi² .    (6.19)

Eqs. (6.17) and (6.18) are therefore equivalent. 


Showing that Eqs. (6.17) and (6.18) are also equivalent to Eq. (6.15) is left
as an exercise.

∆A, the square root of the variance (∆A)2 , is often referred to as the uncertainty in
the variable A. Eqs. (6.17) and (6.18) provide a precise definition of this quantity.
Suppose we have N identical copies of the quantum system of interest (e.g., N atoms
of hydrogen), that all these copies are prepared in the same state |ψi, and that we
make the same measurement of the observable A on each of them. Experimental errors
put aside, the value found for A in each of these N measurements will be one of the
eigenvalues λn of Â. If the theoretical description of these measurements is correct,
the probability distribution Pr (λn ; |ψi) predicts the frequency distribution of these
experimental results.
Let λ(1) be the value of A found for system 1, λ(2) the value found for system 2, etc.,
and λ(N ) the value found for system N . These N results form a statistical sample of
mean λ̄ and standard deviation σ, with
λ̄ = (1/N) Σ_{j=1}^{N} λ(j)    and    σ = [ (1/(N − 1)) Σ_{j=1}^{N} (λ(j) − λ̄)² ]^{1/2} .

Quantum Mechanics predicts that λ̄ → hAi and σ → ∆A for N → ∞ if experimental


errors can be ignored.
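+ An illustrative numerical sketch of this statement (assuming NumPy; the observable and state below are random and hypothetical): draw N simulated measurement results from the distribution Pr(λn ; |ψi) and compare the sample mean and standard deviation with hAi and ∆A.

    import numpy as np

    rng = np.random.default_rng(1)

    # Random Hermitian "observable" and normalized state.
    M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    A = (M + M.conj().T) / 2
    psi = rng.normal(size=3) + 1j * rng.normal(size=3)
    psi /= np.linalg.norm(psi)

    lam, V = np.linalg.eigh(A)
    probs = np.abs(V.conj().T @ psi)**2

    # Quantum-mechanical mean <A> and uncertainty Delta A.
    mean_A = np.sum(lam * probs)
    delta_A = np.sqrt(np.sum((lam - mean_A)**2 * probs))

    # Simulate N identical measurements on identically prepared systems.
    N = 100_000
    results = rng.choice(lam, size=N, p=probs)
    print(mean_A, results.mean())          # sample mean  -> <A>
    print(delta_A, results.std(ddof=1))    # sample sigma -> Delta A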
The dispersion of the data about their mean λ̄ is characterized by the standard devi-
ation σ. How close to hAi we may expect each of the measured values of A to be is
characterized by the uncertainty ∆A. We stress that ∆A has nothing to do with an
experimental error. A non-zero value of ∆A sets a fundamental limit on what can be
predicted about the value found for A if this observable was measured, irrespective of
any experimental considerations.

Apart from the experimental error, there is certainty about what value A would be found
to have if and only if ∆A = 0. This is the case if the state vector |ψi is an eigenvector
of Â, and only if it is an eigenvector of  (see proof below); obviously, the value found
for A would then be the eigenvalue of  this eigenvector corresponds to.

+ Let us first show that ∆A = 0 if |ψi is an eigenvector of Â. Suppose


that Â|ψi = λ|ψi. Then hAi = hψ| Â|ψi = λhψ|ψi = λ. Moreover,
Â2 |ψi = λÂ|ψi = λ2 |ψi. Therefore hψ| Â2 |ψi = λ2 hψ|ψi = λ2 , and
(∆A)2 = hψ| Â2 |ψi − hψ| Â|ψi2 = λ2 − λ2 = 0.
We now prove the converse, which is that |ψi is necessarily an eigenvector
of  if ∆A = 0. Let us assume that ∆A = 0. We note that (∆A)² =
hψ|(Â − hAiÎ)² |ψi. Let |ηi = (Â − hAiÎ)|ψi. Since  and Î are self-
adjoint and hAi is real, (Â − hAiÎ) = (Â − hAiÎ)† . Hence (∆A)² =
hψ|(Â − hAiÎ)† (Â − hAiÎ)|ψi = hη|ηi. Thus hη|ηi = 0 since ∆A = 0,
which means that |ηi is the zero vector, therefore that Â|ψi = hAi|ψi.
Hence |ψi is an eigenvector of Â. 

6.5 Uncertainty relations


Suppose that A and B are observables represented, respectively, by the operators Â
and B̂. One can show that the product of the uncertainties ∆A and ∆B always obeys
the following inequality, where [Â, B̂] is the commutator of  and B̂:
(∆A)² (∆B)² ≥ −(1/4) (hψ|[Â, B̂]|ψi)² .    (6.20)

+ The proof of Eq. (6.20) is similar to the one you have seen in the Term 1
QM course for a particular case of this uncertainty relation. Before starting,
we recall that  and B̂ are self-adjoint operators (see the important note
at the end of Section 6.2). Let Â′ = Â − hAiÎ and B̂′ = B̂ − hBiÎ.
By definition of the uncertainties ∆A and ∆B, (∆A)² = hψ| Â′² |ψi and
(∆B)² = hψ| B̂′² |ψi. Introducing the real number λ, we observe that
(Â′ + iλB̂′)|ψi is a vector and that the square of the norm of this vector
is necessarily non-negative (as is the case for any vector). Thus

hψ|(Â′† − iλB̂′†)(Â′ + iλB̂′)|ψi ≥ 0.    (6.21)

Given that the operators  and B̂ are self-adjoint and that hAi and hBi
are real, it is also true that

hψ|(Â′ − iλB̂′)(Â′ + iλB̂′)|ψi ≥ 0.    (6.22)

That is,

hψ| Â′² |ψi + iλhψ| Â′B̂′ − B̂′Â′ |ψi + λ²hψ| B̂′² |ψi ≥ 0.    (6.23)

It is easy to see that Â′B̂′ − B̂′Â′ = [Â, B̂] and, as shown in the note
below, hψ|[Â, B̂]|ψi is an imaginary number. Let us denote hψ|[Â, B̂]|ψi
by iC, where C is real, and rewrite Eq. (6.23) in the form

λ²(∆B)² − λC + (∆A)² ≥ 0.    (6.24)

This inequality is fulfilled for λ = 0 and must be fulfilled for any real
value of λ. Thus the quadratic equation λ²(∆B)² − λC + (∆A)² = 0,
as an equation for λ, cannot have two distinct real solutions, which is the
case only if C² − 4(∆A)²(∆B)² ≤ 0. Eq. (6.20) follows. 

+ Despite the minus sign, the right-hand side of this equation is always a
non-negative real number.
Proof: Since  and B̂ are self-adjoint, hψ| ÂB̂ |ψi = hψ| B̂†Â† |ψi∗ =
hψ| B̂ Â|ψi∗ . Thus hψ|[Â, B̂]|ψi = hψ| ÂB̂ − B̂ Â|ψi = hψ| B̂ Â −
ÂB̂ |ψi∗ = hψ|[B̂, Â]|ψi∗ = −hψ|[Â, B̂]|ψi∗ . Now, hψ|[Â, B̂]|ψi is a
number, and we have shown that this number is equal to the negative of
its complex conjugate. This number must therefore be imaginary (i times
a real number), which implies that (hψ|[Â, B̂]|ψi)² is a non-positive real number. 

The following is worth noting:

1. This inequality is a relation between the widths of two different probability dis-
tributions, under the assumption that each one is based on the same state |ψi.
There is no assumption that the variables A and B are measured simultaneously
or even measured in a same experiment.

2. A state |ψi in which the product ∆A∆B is zero may exist if the right-hand side
of this inequality is zero. This is the case, in particular, if  commutes with B̂.
(In fact, if  commutes with B̂, these two operators have common eigenstates,

and for such states the uncertainties ∆A and ∆B are both zero.) There is no
theoretical limit on how small ∆A and ∆B may both be in a same quantum
state when  commutes with B̂.
3. The time-energy uncertainty relation is not amenable to this formulation. Time
is a parameter in Quantum Mechanics, not a dynamical variable associated with
a Hermitian operator.

+ One can also show that in finite-dimensional spaces, hψ|[Â, B̂]|ψi = 0 if |ψi
is an eigenvector of  or B̂. This result is consistent with the fact previously
mentioned that ∆A = 0 if |ψi is an eigenvector of Â. However, it does not
apply to the case where one would work in an infinite-dimensional space and
take |ψi to be a generalized eigenvector of  or B̂ (i.e., an eigenvector belong-
ing to the continuous spectrum of one of these operators, in the “physicists’
definition" of these, as discussed in Chapter 8).

6.6 The state of the system after a measurement


This discussion of measurements in Quantum Mechanics would not be complete
without a mention of the influence of a measurement on the quantum state of the
system. There is an enormous amount that could be said about this issue; however, it is outside
the syllabus of the course and therefore we discuss it but briefly here.
The first thing to say is that it is not infrequent that a measurement made on a system
destroys this system completely. In their historical experiment, Stern and Gerlach
detected the atoms of silver they studied by the trace they left when absorbed on a
screen they impacted on. Clearly, it would be a moot point to discuss the spin state
of the atoms after such a destructive measurement. However, not all measurements
are of this nature, and experiments can be done in which the system under study is
measured repeatedly. The question then is: if the state of the system was described
by a certain state vector |ψi before the first measurement, what can then one predict
about the outcome of a second measurement?
The question is answered concisely by what is often regarded as a fundamental prin-
ciple of Quantum Mechanics, called the “collapse postulate" or an equivalent name.
In its simplest form, this principle can be stated as follows:

• If a system is initially in a state |ψi and is found to be in a state |φi in a


measurement, then immediately after the measurement this system is in the
state |φi.

If a second measurement is made on this system, the outcome of this second mea-
surement therefore needs to be calculated from the state vector |φi, in the same way
as the outcome of the first measurement needs to be calculated from the state vector
|ψi. Thus, if the system is found to be in the state |φi and an identical measurement
would be made immediately after the first, this second measurement would also find
that the system is in state |φi.
In some respects, this rule may seem completely intuitive: if the system is found to
be in state |φi, why would it not be found to be still in the same state if checked and
the state has not changed between the two measurements? Note, however, that the
collapse postulate implies that the state vector of the system changes abruptly upon
a measurement. Suppose that an atom is in the spin state |χi = a| ↑ i + b| ↓ i
and that one measures the z-component of its spin and finds it to be up;
then at the point of this measurement the spin state of this atom changes from |χi
to | ↑ i. It is often said that the measurement “collapses" the wave function (or here
the state vector) from a linear combination of several states to the state found in
the measurement. However, the situation is normally rather more complicated than
just stated and requires a detailed analysis of what a measurement really entails.
There has been much discussion about the status of the collapse postulate and the
interpretation of the abrupt change over from one state to another it describes — e.g.,
whether it corresponds to something “real" in the physical properties of the atom, or
whether it merely reflects a change in what we know about the state of the atom. In
many respects, however, the issue of how the collapse postulate is best interpreted
is a problem of Philosophy rather than of Quantum Theory, and is outside the scope
of this course.
Let us illustrate the above by yet another thought experiment on spin states. Suppose
that the spin state of a spin-1/2 atom (e.g., an atom of silver) is initially represented
by the column vector

χ = (1/√3) (1, √2)ᵀ .    (6.25)
As we will see later in the course, the column vectors

α = (1, 0)ᵀ    and    α(x) = (1/√2) (1, 1)ᵀ

are eigenvectors of, respectively, the z-component and the x-component of the spin
operator, and both correspond to a state of spin “up" in the respective direction (i.e.,
they both correspond to an eigenvalue of ~/2 rather than −~/2). Note that these
three column vectors are normalized. Now, imagine an experiment in which one

checks whether an atom initially in the state χ is or is not in the state α (thus in the
state of spin up with respect to the z-direction). The probability of finding it in that
state is given by Eq. (6.5) as

| (1  0) (√(1/3), √(2/3))ᵀ |² = | 1 × √(1/3) + 0 × √(2/3) |² = 1/3.    (6.26)

Suppose that the atom is found to be in the state α. Upon this finding, its state
changes from χ to α. If this state is not perturbed by some interaction or another
measurement, a subsequent check of whether it is in the state α will find it to be in
that state with a probability of 1 since

| (1  0) (1, 0)ᵀ |² = 1.    (6.27)

Instead of checking whether it is in the state α after the first measurement, let us
imagine that it would be checked whether it is in the state α(x) . It would be found
to be in that state with a probability of

| (√(1/2)  √(1/2)) (1, 0)ᵀ |² = 1/2.    (6.28)

Assuming that it is found to be in that state, its state would then be α(x) after this
second measurement. If now a third measurement is made, on whether this atom is
or is not in the state α, the probability of finding it in that state is now 1/2, rather
than 1, since

| (1  0) (√(1/2), √(1/2))ᵀ |² = 1/2.    (6.29)
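+ As an illustrative aside (assuming NumPy), the sequence of measurements just described can be replayed numerically; the four probabilities of Eqs. (6.26)-(6.29) follow from the Born rule applied to the successive (collapsed) states.

    import numpy as np

    chi = np.array([1.0, np.sqrt(2.0)]) / np.sqrt(3.0)   # initial state, Eq. (6.25)
    alpha = np.array([1.0, 0.0])                         # spin up along z
    alpha_x = np.array([1.0, 1.0]) / np.sqrt(2.0)        # spin up along x

    def prob(target, state):
        # Born rule: probability of finding 'state' in 'target'.
        return abs(np.vdot(target, state))**2

    print(prob(alpha, chi))        # 1/3,  Eq. (6.26)
    # The atom is found in alpha: its state collapses to alpha.
    print(prob(alpha, alpha))      # 1,    Eq. (6.27)
    print(prob(alpha_x, alpha))    # 1/2,  Eq. (6.28)
    # It is found in alpha_x: its state collapses to alpha_x.
    print(prob(alpha, alpha_x))    # 1/2,  Eq. (6.29)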

The collapse postulate can be recast in terms of the projector operator P̂φ introduced
in Section 3.10, as we will now see. A system initially in a state |ψi can be found
to be in a state |φi only if hφ|ψi ≠ 0. If so, and in view of Eq. (3.103), P̂φ |ψi ≠ 0.
Given that c|φi represents the same quantum state as |φi for any non-zero value of
the number c, we see that the collapse postulate can also be phrased as follows: “If
a system is initially in a state |ψi and is found to be in a state |φi in a measurement,
then immediately after the measurement this system is in a quantum state described
by the state vector P̂φ |ψi." Or, in other words,

• If a system is initially in a state |ψi, finding it to be in a state |φi projects its


state vector onto the 1-dimensional subspace spanned by |φi.

The same applies to the case where what is measured is the value of a certain observ-
able. Suppose that a particular eigenvalue λn of the Hermitian operator  represent-
ing this observable is obtained as a result and that this eigenvalue is non-degenerate.
Then, if |ψn i is a normalized eigenvector of  corresponding to this eigenvalue, the
collapse postulate sets that the measurement leaves the system in the state |ψn i.
This principle can be generalized to the case where the eigenvalue λn is degenerate.
Recall that one can find a set of M orthonormal eigenvectors of  all belonging to the
eigenvalue λn if this eigenvalue is M -fold degenerate. Let |ψn1 i, |ψn2 i, . . . , |ψnM i
be such a set. These M orthonormal vectors span a M -dimensional subspace of the
whole Hilbert space, and the operator
P̂λn = Σ_{r=1}^{M} |ψnr ihψnr |    (6.30)

projects any vector of this Hilbert space onto that M -dimensional subspace. In gen-
eral, the collapse postulate sets that if a system is initially in a state |ψi, finding
the eigenvalue λn of  in a measurement of the corresponding dynamical variable
projects |ψi onto the subspace of this eigenvalue. Correspondingly, the measure-
ment transforms the state vector of the system from |ψi to P̂λn |ψi (or, in terms of
a normalized vector, to P̂λn |ψi/||P̂λn |ψi||).
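+ As an illustrative aside (assuming NumPy; the 3×3 matrix below is invented for the example), the projector of Eq. (6.30) and the collapsed, renormalized state P̂λn |ψi/||P̂λn |ψi|| can be constructed as follows for a doubly degenerate eigenvalue.

    import numpy as np

    rng = np.random.default_rng(3)
    # Hermitian matrix with a doubly degenerate eigenvalue 1 and a
    # non-degenerate eigenvalue 2, written in a rotated (random) basis.
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
    A = Q @ np.diag([1.0, 1.0, 2.0]) @ Q.T

    lam, V = np.linalg.eigh(A)
    # Columns of V whose eigenvalue equals 1 span the degenerate subspace.
    mask = np.isclose(lam, 1.0)
    P = V[:, mask] @ V[:, mask].T.conj()           # projector, Eq. (6.30)

    psi = rng.normal(size=3)
    psi /= np.linalg.norm(psi)

    prob = np.vdot(psi, P @ psi).real              # Pr(lambda_n; |psi>), Eq. (6.7)
    collapsed = P @ psi / np.linalg.norm(P @ psi)  # state after the measurement
    print(prob, np.allclose(A @ collapsed, collapsed))   # probability, True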
The case of a pair of observables represented by commuting operators is particularly
noteworthy. Suppose that the observables A and B are represented by the opera-
tors  and B̂, respectively, with [Â, B̂] = 0, and suppose that A is measured. The
measurement transforms the initial state vector into a superposition of eigenvectors
of  belonging to the eigenvalue found in the measurement (λn , say):
|ψi → P̂λn |ψi = Σ_{r=1}^{M} cr |ψnr i    (6.31)

with cr = hψnr |ψi. A subsequent measurement of B resulting in the eigenvalue µm


of B̂ then further transforms this superposition into a superposition of eigenvectors
of B̂ belonging to µm :
Σ_{r=1}^{M} cr |ψnr i → Σ_{r=1}^{M} cr P̂µm |ψnr i.    (6.32)

However (the proof is left as an exercise), the resulting superposition is still an eigen-
vector of  belonging to the eigenvalue λn . Therefore, if A is measured again, after

the measurement of B, λn is found with probability 1. In other words, a measure-
ment of B would not affect the result of a measurement of A. In view of this fact,
the observables A and B are said to be compatible. Two observables are compatible
when, and only when, they are represented by operators commuting with each
other.
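+ An illustrative numerical sketch of this compatibility property (assuming NumPy; the commuting matrices below are invented for the example): projecting onto an eigenspace of  and then onto an eigenspace of B̂ leaves a vector that is still an eigenvector of Â, so a repeated measurement of A returns the same eigenvalue with probability 1.

    import numpy as np

    rng = np.random.default_rng(4)
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))    # common eigenbasis

    # Two commuting Hermitian matrices (diagonal in the same basis).
    A = Q @ np.diag([1.0, 1.0, 2.0]) @ Q.T
    B = Q @ np.diag([3.0, 4.0, 4.0]) @ Q.T
    print(np.allclose(A @ B, B @ A))                # True: [A, B] = 0

    def projector(op, value):
        lam, V = np.linalg.eigh(op)
        cols = V[:, np.isclose(lam, value)]
        return cols @ cols.T.conj()

    psi = rng.normal(size=3)
    psi /= np.linalg.norm(psi)

    # Measure A and find 1, then measure B and find 4.
    state = projector(A, 1.0) @ psi
    state = projector(B, 4.0) @ state
    state /= np.linalg.norm(state)

    # The state is still an eigenvector of A with eigenvalue 1:
    print(np.allclose(A @ state, state))            # True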

7 Wave functions, position and momentum

7.1 Introduction
Wave functions are commonly used to describe the quantum state of atoms and molecules,
or more generally of systems characterized by the positions or the momenta of their
constituents. These wave functions are vectors belonging to a Hilbert space of square-
integrable functions. As noted previously in this course, such spaces are infinite di-
mensional, and the mathematics of infinite dimensional spaces is considerably more
complicated than that of finite dimensional spaces.
In regards to Quantum Mechanics, the most important difference between finite and
infinite dimensional Hilbert spaces concerns the eigenvalues and eigenvectors of the
Hermitian operators representing dynamical variables. For example, spin states can
be described by vectors belonging to a finite dimensional space, e.g., by N -component
column vectors where N is a finite number (N = 2 for spin-1/2 particles). Spin observ-
ables can be represented by N × N Hermitian matrices acting on these column vectors.
The eigenvectors of these matrices are themselves N -component column vectors and
the corresponding eigenvalues are finite in number and form a discrete distribution
(different eigenvalues are separated by a gap). Moreover, these sets of eigenvectors are
complete and span the whole of the Hilbert space in which these matrices act, which is
essential for the consistency of the probability calculations outlined in Chapter 4.
Because they act on vectors belonging to an infinite-dimensional space, however, the
operators representing such dynamical variables as the position, the momentum and
the energy do not have, or do not always have, a complete set of eigenvectors. By
eigenvectors, here, we mean square-integrable eigenfunctions f such that Af = λf ,
where A is the operator and λ is a constant. Hence, it is often not possible to calculate
probabilities for the corresponding measurements using the same method as for finite-
dimensional Hilbert spaces. A mathematically rigorous way round this difficulty, based
on projector operators, has been known since the early days of Quantum Mechanics.
However, the probabilities of interest can often be calculated by following a differ-
ent approach, in which the concept of eigenfunction is generalized to encompass non
square-integrable functions. The possible results of a measurement are then taken to
be the generalized eigenvalues associated with these generalized eigenfunctions. These
“eigenvalues" are normally infinite in number and may form a continuous rather than
a discrete distribution. This approach is illustrated in Section 7.3 by the example of
the momentum operator (the operator corresponding to the momentum of a particle,
p = mv in Classical Mechanics).

7.2 The Fourier transform and the Dirac delta function
We start with a review of important mathematical facts particularly relevant in this
context. In principle, this section contains nothing that you have not already seen pre-
viously. Please refer to your maths courses, to the term 1 Quantum Mechanics course,
to a maths or Quantum Mechanics textbook and/or to reliable online material if you
are unfamiliar with any of the results stated below.
The Fourier transform
Let ψ(x) be a function integrable on (−∞, ∞). The Fourier transform of ψ(x) is a
function φ(k) defined by the following equation:
φ(k) = (1/(2π)^{1/2}) ∫_{−∞}^{∞} exp(−ikx)ψ(x) dx.    (7.1)

If φ(k) is the Fourier transform of ψ(x) then ψ(x) is the inverse Fourier transform of
φ(k) and

ψ(x) = (1/(2π)^{1/2}) ∫_{−∞}^{∞} exp(ikx)φ(k) dk.    (7.2)
Not all functions have a Fourier transform or an inverse Fourier transform. The con-
ditions under which a function can be Fourier-transformed have been studied in great
detail by mathematicians and it would be outside the scope of the course to state them
in full generality. A good understanding of this question would be necessary for an
in-depth knowledge of the maths of wave mechanics but is not required at the level
of this course. We just quote a useful result you have probably seen before: if ψ(x)
is a continuous function and |ψ(x)| can be integrated on (−∞, ∞), then ψ(x) has a
Fourier transform φ(k) defined by Eq. (7.1).
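+ As a concrete illustration (assuming NumPy; the Gaussian below is just a convenient test function), the wave packet ψ(x) = π^{−1/4} exp(−x²/2) has the Fourier transform φ(k) = π^{−1/4} exp(−k²/2). The sketch evaluates Eq. (7.1) by straightforward numerical quadrature and compares the result with this known transform.

    import numpy as np

    x = np.linspace(-20.0, 20.0, 4001)           # grid wide enough for a Gaussian
    dx = x[1] - x[0]
    psi = np.pi**(-0.25) * np.exp(-x**2 / 2)     # normalized Gaussian wave packet

    def fourier_transform(k):
        # Direct quadrature of Eq. (7.1) on the grid.
        return np.sum(np.exp(-1j * k * x) * psi) * dx / np.sqrt(2 * np.pi)

    k_values = np.linspace(-3.0, 3.0, 7)
    phi_numeric = np.array([fourier_transform(k) for k in k_values])
    phi_exact = np.pi**(-0.25) * np.exp(-k_values**2 / 2)
    print(np.allclose(phi_numeric, phi_exact, atol=1e-6))   # True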
The Dirac delta function
The Dirac delta function δ(x − x′) is a mathematical object such that

∫_{−∞}^{∞} δ(x − x′)f (x′) dx′ = f (x)    (7.3)

for any integrable function f (x) of a real variable x. (The Dirac delta function is not
defined for complex arguments.) There is no difference between δ(x − x′) and δ(x′ − x),
so that Eq. (7.3) can also be written

∫_{−∞}^{∞} δ(x′ − x)f (x′) dx′ = f (x).

It is customary to refer to δ(x − x0 ) as a function. However, the mathematical symbol
δ(x − x0 ) does not represent a function. There is no function δ(x − x0 ), in the usual
sense of the word function, such that Eq. (7.3) could hold for any x and any f (x). The
delta “function" belongs to a different class of mathematical objects called distributions
or generalized functions.
The delta function δ(x − x′) can be represented by various mathematical expressions.
In particular,

(1/2π) ∫_{−∞}^{∞} exp[ik(x − x′)] dk = δ(x − x′).    (7.4)

+ Eq. (7.4) can be justified by the following argument: In view of Eq. (7.1),
Eq. (7.2) can also be written as

ψ(x) = (1/(2π)^{1/2}) ∫_{−∞}^{∞} exp(ikx) [ (1/(2π)^{1/2}) ∫_{−∞}^{∞} exp(−ikx′)ψ(x′) dx′ ] dk.

Permuting the two integrals appearing in this equation yields

ψ(x) = ∫_{−∞}^{∞} [ (1/2π) ∫_{−∞}^{∞} exp(ikx) exp(−ikx′) dk ] ψ(x′) dx′
     = ∫_{−∞}^{∞} [ (1/2π) ∫_{−∞}^{∞} exp[ik(x − x′)] dk ] ψ(x′) dx′.    (7.5)

Eq. (7.4) follows.

+ The following results are explored in the fifth set of workshops and are
worth remembering. Please refer to the corresponding worksheet for proofs
and examples of applications.

1. For any x0,
   ∫_{−∞}^{∞} δ(x − x0) dx = 1    (7.6)
   and
   ∫_{a}^{b} δ(x − x0) f (x) dx = f (x0) if x0 ∈ (a, b),  and  = 0 if x0 ∉ [a, b].
   (The left-hand side of this last equation is not defined if x0 = a or x0 = b.)
2. δ(αx) = δ(x)/|α| if α ≠ 0.

3. If F (x) is a differentiable function, then
   δ[F (x)] = Σ_n (1/|F ′(xn)|) δ(x − xn),    (7.7)
   where the xn's are the zeros of F (x) (i.e., the values of x at which F (x) = 0) and
   F ′(xn) = dF/dx evaluated at x = xn.    (7.8)
   δ[F (x)] has no mathematical meaning if it happens that F (x) and F ′(x) are
   simultaneously zero. For example, δ(x²) has no meaning.
4. For any q1 and q2 (q1 ≠ q2),
   ∫_{−∞}^{∞} δ(x − q1) δ(q2 − x) dx = δ(q2 − q1).    (7.9)

7.3 Eigenfunctions of the momentum operator


As you have seen in Term 1, the x-component of the momentum of a particle is asso-
ciated with the operator

P = −i~ , (7.10)
∂x
in the formulation where the quantum state of the particle is described not by a ket
vector but by a wave function [e.g., ψ(x, y, z)]. For simplicity, let us suppose that the
particle is confined to the x-axis, so that the wave function has no dependence in y and
z and Eq. (7.10) can be written as
P = −i~ d/dx.    (7.11)
This operator acts on wave functions ψ(x) belonging to the Hilbert space of the func-
tions square-integrable on (−∞, ∞).
According to the principles of Quantum Mechanics, the result of a measurement of
the momentum of this particle should be an eigenvalue of P . However, here we hit the
difficulty mentioned in the previous section: this operator has no eigenvalue within the
normal definition of an eigenvalue. Indeed, for P to have an eigenvalue p, there should
exist a square-integrable function ψp (x) such that P ψp (x) = p ψp (x) with ψp (x) not
identically zero. I.e., ψp (x) must be a non-trivial solution of the equation
−i~ dψp/dx = p ψp (x)    (7.12)

such that the integral ∫_{−∞}^{∞} |ψp (x)|² dx
exists and is finite. (By non-trivial solution one means a solution ψp (x) which is not
zero for all values of x.) We can deduce from Eq. (7.12) that

ψp (x) = C exp(ipx/~), (7.13)

with C a constant.9 Such solutions exist for any value of p, real or complex. However,
they never belong to the Hilbert space of square-integrable functions on (−∞, ∞), and
therefore they do not qualify as eigenvectors of the operator P in the mathematical
sense of this term.
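+ As an illustrative aside (assuming SymPy is available), Eq. (7.12) and its solution (7.13) can be checked symbolically.

    import sympy as sp

    x, p, hbar, C = sp.symbols('x p hbar C', nonzero=True)
    psi = sp.Function('psi')

    # Solve -i*hbar * dpsi/dx = p * psi  (Eq. (7.12)).
    solution = sp.dsolve(sp.Eq(-sp.I * hbar * psi(x).diff(x), p * psi(x)), psi(x))
    print(solution)   # psi(x) = C1*exp(I*p*x/hbar), up to the arbitrary constant C1

    # Check that C*exp(I*p*x/hbar) satisfies the equation for any p:
    candidate = C * sp.exp(sp.I * p * x / hbar)
    print(sp.simplify(-sp.I * hbar * sp.diff(candidate, x) - p * candidate))  # 0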

+ As defined in Section 3.6, the eigenvectors of an operator are vectors


belonging to the vector space in which this operator acts, here the space
of functions square-integrable on (−∞, ∞). It is easy to see that there
is no value of p for which ψp (x) belongs to that space. Indeed, note that

|ψp (x)|2 = | (2π~)−1/2 exp[i(Re p + iIm p)x/~] |2


= | (2π~)−1/2 exp[i(Re p)x/~] exp[i2 (Im p)x/~] |2
= (2π~)−1 | exp[i(Re p)x/~] |2 | exp[−(Im p)x/~] |2 , (7.14)

and since | exp[i(Re p)x/~] |2 = 1 (recall that | exp(iz)| = 1 if z is real),

|ψp (x)|2 = (2π~)−1 exp[−2(Im p)x/~].

Therefore |ψp (x)|2 explodes exponentially when x → ∞ if Im p < 0 or


when x → −∞ if Im p > 0, and remains equal to (2π~)−1 at all values
of x if Im p = 0. Hence there is no value of p for which the integral
∫_{−∞}^{∞} |ψp (x)|² dx

exists and is finite.

+ The mathematical definition of the spectrum of an operator was also


mentioned in Section 3.6. In the Dirac notation, the spectrum of an op-
erator  acting in a Hilbert space H is the set of the scalars λ such that
[Footnote 9: Check: −i~ dψp/dx = (−i~)(ip/~)C exp(ipx/~) = p ψp (x). Eq. (7.12) is readily solved using the methods you have learned for dealing with separable equations or with linear differential equations with constant coefficients.]

the operator  − λIˆ is not invertible, where Iˆ is the identity operator.
This set always contains all the eigenvalues of this operator (i.e., all the
scalars λ such that Â|ψi = λ|ψi for some vector |ψi belonging to H).
However, if H is infinite-dimensional, the spectrum may also contain
values of λ for which there is no vector |ψi such that Â|ψi = λ|ψi.
This is the case for the momentum operator: mathematicians would say
that the spectrum of P is R, the set of all real numbers, even though this
operator has no eigenvalues. The spectrum of P is thus a continuous
distribution of numbers. That Eq. (7.12) has no square-integrable solu-
tions is not an accident, as it can be shown that the continuous part of
the spectrum of an operator is never associated with square-integrable
eigenfunctions.

Following the approach usually adopted in elementary Quantum Mechanics, let us


broaden the concept of eigenvalue and eigenfunction, consider that the functions ψp (x)
are eigenfunctions of P in a generalized sense, and use these functions to calculate
probabilities as if they were square-integrable. We only consider real values of p to
be eigenvalues, though, because the momentum of a particle is a real number (not a
complex number).
+ Mathematical notes:

1. The true mathematical reason for excluding complex values of p,


however, is that they do not belong to the continuous spectrum
of P in the mathematical definition of the continuous spectrum
of an operator.
2. A function f (x) is said to be absolutely integrable on (−∞, ∞)
[to be L1 (−∞, ∞)] if the integral
∫_{−∞}^{∞} |f (x)| dx

is a finite number. Wave functions are normally continuous and


absolutely integrable, and therefore can be written as a Fourier
transform (or inverse Fourier transform, see Section 7.2). In par-
ticular, if ψ(x) is a continuous and absolutely integrable wave
function, there exists a function φ(p) such that
ψ(x) = (1/(2π~)^{1/2}) ∫_{−∞}^{∞} exp(ipx/~)φ(p) dp.    (7.15)

In that sense, any continuous and absolutely integrable wave
function can be expanded on the set of the functions ψp (x) de-
fined by Eq. (7.13) with C = (2π~)−1/2 and p real, and there is
no need to include functions ψp (x) with complex values of p in
this set to make it complete.
Absolute integrability is not the same as square-integrability,
though. One can find functions that are absolutely integrable but
not square-integrable. Likewise, one can imagine wave func-
tions that are square-integrable but not absolutely integrable. The
above result, about the existence of a Fourier transform, extends
to the more general case of any square-integrable wave function,
although the definition of the Fourier transform must then be
suitably modified (the straightforward definition in terms of ordi-
nary Riemann integrals you have probably seen in a maths course
does not apply to functions that are square-integrable but not ab-
solutely integrable).

7.4 Normalization to a delta function


In the case of a finite-dimensional Hilbert space, it is possible to construct an orthonor-
mal basis for that space from amongst the eigenvectors of any Hermitian operator act-
ing in that space. Denoting these basis vectors by |ψn i, the orthonormality condition
reads hψn |ψn′ i = δnn′ . That is,

∫_{−∞}^{∞} ψn∗ (x)ψn′ (x) dx = δnn′    (7.16)

if each of the vectors |ψn i is represented by a square-integrable function ψn (x) with


−∞ < x < ∞.
Eq. (7.16) does not make sense for the generalized eigenfunctions of the momentum
operators defined by Eq. (7.13), both because p varies continuously and because these
functions are not square-integrable on (−∞, ∞). We note, however, that
∫_{−∞}^{∞} ψp∗ (x)ψp′ (x) dx = |C|² ∫_{−∞}^{∞} exp[i(p′ − p)x/~] dx
                             = |C|² ~ ∫_{−∞}^{∞} exp[i(p′ − p)ξ] dξ
                             = 2π|C|² ~ δ(p′ − p).    (7.17)

(We have passed from the first to the second equation by changing variable from x to
ξ = x/~, and from the second to the last by using Eq. (7.4) with x − x′ replaced by
p′ − p and k replaced by ξ.) It is convenient in the applications to choose the constant
C such that this integral is exactly δ(p′ − p). Setting C = (2π~)−1/2 and writing
ψp (x) = (1/(2π~)^{1/2}) exp(ipx/~)    (7.18)

ensures that

∫_{−∞}^{∞} ψp∗ (x)ψp′ (x) dx = δ(p′ − p).    (7.19)

Functions ψp (x) satisfying this equation are said to be “normalized to a delta function
in momentum space".
Instead of the functions ψp (x), it is often more convenient to introduce the wave num-
ber, k (k = p/~ and p = ~k), and use the functions
ψk (x) = (1/(2π)^{1/2}) exp(ikx).    (7.20)

These functions are such that


∫_{−∞}^{∞} ψk∗ (x)ψk′ (x) dx = (1/2π) ∫_{−∞}^{∞} exp[i(k′ − k)x] dx,    (7.21)

and therefore, in view of Eq. (7.4),


∫_{−∞}^{∞} ψk∗ (x)ψk′ (x) dx = δ(k′ − k).    (7.22)

Functions such as ψk (x) satisfying this last equation are said to be “normalized to a delta
function in k-space". Eqs. (7.19) and (7.22) replace Eq. (7.16) for non-square-integrable
functions such as ψp (x) and ψk (x).

+ Remember that here p is the x-component of the momentum of the particle


and k = p/~. Positive values of p and k correspond to a propagation in the
positive x-direction and negative values to a propagation in the negative
x-direction. The scalars p and k are the 1D analogues of the 3D momentum
p and wave vector k.

7.5 Probability densities
The fact that the possible values of p are continuously distributed also affects how
probabilities are defined. In regards to predicting the result of a measurement of the
x-component of the momentum, the quantity of interest is not the probability Pr(p) of
obtaining a given value p, but rather the probability Pr([p1 , p2 ]) of obtaining a result
between a certain value p1 and a certain value p2 . There are two reasons for this: (1) The
probability of obtaining a value specified to infinitely many digits in a measurement of a
continuously distributed variable is zero, in the same way as the probability of drawing
(at random) one particular ball from an urn containing infinitely many balls would
be zero. (2) Since no detectors have an infinite resolution, an actual measurement of a
variable distributed continuously can only determine a range of values for this variable.
The probability Pr([p1 , p2 ]) can be written as the integral of a certain density of prob-
ability (or probability density function) P(p):
Pr([p1 , p2 ]) = ∫_{p1}^{p2} P(p) dp.    (7.23)

I.e., P(p) dp is the probability of obtaining a value of the momentum between p and
p+dp (or, which is equivalent because dp is an infinitesimal, the probability of obtaining
a value of the momentum between p − dp/2 and p + dp/2). We stress that P(p)
is a density of probability, not a probability. Pr ([p1 , p2 ]) is a probability and has no
physical dimensions. By contrast, P(p) has the physical dimensions of the inverse of
a momentum.
Suppose that the particle is in a state described by a normalized wave function ψ(x).
According to the rules of Quantum Mechanics, and assuming that the eigenfunctions
ψp (x) are normalized as per Eq. (7.19),
P(p) = | ∫_{−∞}^{∞} ψp∗ (x)ψ(x) dx |² = | (1/(2π~)^{1/2}) ∫_{−∞}^{∞} exp(−ipx/~)ψ(x) dx |² .    (7.24)

Rather than working with the functions ψp (x), it is often convenient to express the
momentum p in terms of the wavenumber k and work with the ψk (x) functions defined
by Eq. (7.20). These two formulations are equivalent. In particular, the probability
Pr([k1 , k2 ]) of obtaining a value between k1 = p1 /~ and k2 = p2 /~ in a measurement
of the particle’s wave number in the x-direction is
∫_{k1}^{k2} P(k) dk,

where P(k) = |φ(k)|2 with φ(k) being the Fourier transform of ψ(x):
φ(k) = ∫_{−∞}^{∞} ψk∗ (x)ψ(x) dx = (1/(2π)^{1/2}) ∫_{−∞}^{∞} exp(−ikx)ψ(x) dx.    (7.25)
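+ As an illustrative aside (assuming NumPy; the Gaussian packet and the grids below are chosen only for the example), the sketch computes φ(k) from Eq. (7.25) for the wave packet π^{−1/4} exp(ik0 x − x²/2) and checks that P(k) = |φ(k)|² integrates to 1 and peaks near k0.

    import numpy as np

    k0 = 2.0                                    # mean wave number of the packet
    x = np.linspace(-20.0, 20.0, 4001)
    dx = x[1] - x[0]
    psi = np.pi**(-0.25) * np.exp(1j * k0 * x - x**2 / 2)   # normalized in x

    k = np.linspace(-8.0, 12.0, 2001)
    dk = k[1] - k[0]
    # Eq. (7.25): phi(k) = (2*pi)^(-1/2) * integral of exp(-i*k*x) psi(x) dx.
    phi = np.array([np.sum(np.exp(-1j * kk * x) * psi) * dx for kk in k]) / np.sqrt(2 * np.pi)

    density = np.abs(phi)**2                    # P(k) = |phi(k)|^2
    print(np.sum(density) * dk)                 # ~1: the probabilities integrate to 1
    print(k[np.argmax(density)])                # ~k0: the density peaks near k0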

+ The normalization to a delta function in p- or k-space ensures that the


probability densities P(p) and P(k) can be calculated as stated above.
To see that this is indeed the case, recall that
P(k) = | ∫_{−∞}^{∞} ψk∗ (x)ψ(x) dx |²    (7.26)

and note that it is necessary that

∫_{−∞}^{∞} P(k) dk = 1    (7.27)

in order for P(k) to be a probability density. Now, suppose that we take
ψk (x) to be C exp(ikx), where C is a constant factor not necessarily
equal to (2π)−1/2 . Then

P(k) = |C|² | ∫_{−∞}^{∞} exp(−ikx)ψ(x) dx |²
     = |C|² [ ∫_{−∞}^{∞} exp(−ikx)ψ(x) dx ]∗ [ ∫_{−∞}^{∞} exp(−ikx′)ψ(x′) dx′ ] .    (7.28)
Therefore

∫_{−∞}^{∞} P(k) dk = |C|² ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} exp(−ikx)ψ(x) dx ]∗ [ ∫_{−∞}^{∞} exp(−ikx′)ψ(x′) dx′ ] dk.    (7.29)
Let us now change the order in which these three integrals are done, so
as to write this last equation as

∫_{−∞}^{∞} P(k) dk = |C|² ∫_{−∞}^{∞} dx ∫_{−∞}^{∞} dx′ ψ∗ (x)ψ(x′) ∫_{−∞}^{∞} dk exp(ikx) exp(−ikx′).    (7.30)

We can now do the integral over k easily, since

∫_{−∞}^{∞} dk exp(ikx) exp(−ikx′) = ∫_{−∞}^{∞} exp[ik(x − x′)] dk = 2π δ(x − x′).    (7.31)
Therefore

∫_{−∞}^{∞} P(k) dk = 2π|C|² ∫_{−∞}^{∞} dx ∫_{−∞}^{∞} dx′ ψ∗ (x)ψ(x′) δ(x − x′)
                  = 2π|C|² ∫_{−∞}^{∞} dx ψ∗ (x)ψ(x)
                  = 2π|C|² .    (7.32)

Requiring that P(k) integrates to 1 means that C should be such that


2π|C|2 = 1, hence that |C| = (2π)−1/2 . Therefore Eq. (7.26) would
not be correct if |C| was not equal to (2π)−1/2 . This relation does not
define the sign of C; however, it is convenient to choose C to be positive
and therefore define ψk (x) as per Eq. (7.20).

7.6 Eigenfunctions of the position operator


In 1D, for a particle confined to the x-axis, the position operator, Q, is the operator
which transforms any function ψ(x) into the function xψ(x). Like the momentum
operator, the position operator has no eigenfunction in the usual sense of the word.
Indeed, an eigenfunction ψq (x) of Q should be a function which is not everywhere
zero and such that Qψq (x) = qψq (x) where q is a constant. I.e., ψq (x) should be such
that xψq (x) = qψq (x). Since q is a constant and x isn’t, the only solution of this
equation is ψq (x) ≡ 0, and this function does not qualify as an eigenfunction.
It is possible to go round this difficulty by broadening the definition of an eigenfunction
even further than in the case of the momentum operator and take ψq (x) to be the delta
function δ(x − q). (Remember that despite its name, a delta function is not a function
at all.) Doing so has the merit that the probability of finding the particle in a particular
region of space can then be calculated in terms of these generalized eigenfunctions
ψq (x), in the same way as the probability of obtaining a value between p1 and p2 in a
measurement of the particle’s momentum can be calculated in terms of the generalized
eigenfunctions ψk (x). I.e., if the particle is in a state described by the wave function

ψ(x), the probability Pr([x1 , x2 ]) of finding it between x1 and x2 can be written as
∫_{x1}^{x2} P(q) dq,

where

P(q) = | ∫_{−∞}^{∞} ψq∗ (x)ψ(x) dx |²    (7.33)

with ψq (x) = δ(x − q). Indeed, replacing ψq (x) by δ(x − q) in this last equation gives
P(q) = | ∫_{−∞}^{∞} δ(x − q)ψ(x) dx |² = |ψ(q)|² ,    (7.34)

in complete agreement with the probabilistic interpretation of the wave function.

+ The generalized eigenfunctions ψq (x) can be said to be normalized to a


delta-function since, in view of Eq. (7.9),
∫_{−∞}^{∞} ψq∗ (x)ψq′ (x) dx = ∫_{−∞}^{∞} δ(x − q)δ(x − q′) dx = δ(q′ − q).    (7.35)

7.7 The position representation and the momentum rep-


resentation
We have seen that any state vector can be written as the expansion
|ψi = Σ_n cn |ψn i    (7.36)

with cn = hψn |ψi if {|ψn i, n = 1, 2, . . .} is an orthonormal basis. Taken together as
a single set, the coefficients cn represent the state |ψi in the basis {|ψn i}. Moreover,
the probability that the system is found to be in a state |ψn i in a measurement made on the
state |ψi is |hψn |ψi|2 .
The position operator Q and the momentum operator P we have considered so far have
been defined as operators acting on functions of x, not on ket vectors. However, one can
also define a position operator Q̂ and a momentum operator P̂ acting on ket vectors.
The eigenvalue equations Qψq (x) = qψq (x) and P ψp (x) = p ψp (x) then correspond to
the equations Q̂|qi = q|qi and P̂ |pi = p|pi, where q and p are constants distributed
continuously. These two equations have no solutions in the Hilbert space of the ket
vectors |ψi, in the same way as the equations Qψq (x) = qψq (x) and P ψp (x) = p ψp (x)

have no solutions in the Hilbert space of the square-integrable functions on (−∞, ∞).
However, let us proceed as if the vectors |qi and |pi existed, formed a complete set, and
were orthonormal in the sense that
hq|q′ i = δ(q′ − q) and hp|p′ i = δ(p′ − p).
Then any vector |ψi could be written both as
|ψi = ∫_{−∞}^{∞} ψ(q)|qi dq    (7.37)

and as

|ψi = ∫_{−∞}^{∞} φ(p)|pi dp,    (7.38)
with, respectively,
ψ(q) = hq|ψi and φ(p) = hp|ψi. (7.39)
Eqs. (7.37) and (7.38) are formally equivalent to Eq. (7.36), apart that their right-hand
sides are integrals rather than discrete sums. The coefficients ψ(q) and φ(p) of these
two expansions are wave functions. The functions ψ(q) (or ψ(x) if instead of q we
use the letter x to denote the position) are “wave functions in position space", and the
functions φ(p) are “wave functions in momentum space". The former are nothing else
than the wave functions you have already encountered in Year 1. We have already
mentioned their probabilistic interpretation: the density of probability that the particle
is found to be at a position q is |ψ(q)|2 (i.e, |hq|ψi|2 ). Likewise, the density of probability
that the particle is found to have a momentum p is |φ(p)|2 (i.e., |hp|ψi|2 ).
Both ψ(q) and φ(p) are sets of numbers representing the ket |ψi in the bases of the
“eigenvectors" of Q̂ and of those of P̂ , in the same way as {cn } is a set of numbers
representing |ψi in the {|ψn i} basis used to write Eq. (7.36). Working in the {|qi} basis
is working in the “position representation", and working in the {|pi} basis is working
in the “momentum representation".

+ In the same way as ket vectors are represented by wave functions, oper-
ators acting on ket vectors are represented by operators acting on wave
functions. For example, in a basis {|uj i, j = 1, 2, . . . , N } of orthonor-
mal ordinary ket vectors, the identity operator would be represented by
the matrix of elements hui | Î |uj i, i.e., by the identity matrix (hui | Î |uj i =
hui |uj i = δij since the basis is orthonormal). In the position representa-
tion, the identity operator would be represented by the operator of “ele-
ments" hq| Î |q′ i, i.e., by the delta function δ(q′ − q) (note that hq| Î |q′ i =
hq|q′ i = δ(q′ − q)).

Let us call the position co-ordinate x now, rather than q. In the position representa-
tion, ket vectors are represented by functions of x, Q̂ by the operator multiplying any
function ψ(x) by x, and P̂ by the operator −i~ d/dx. Conversely, in the momentum
representation ket vectors are represented by functions of p, the momentum operator
P̂ by the operator multiplying any function φ(p) by p, and the position operator Q̂ by
the operator i~ d/dp (this last equation will be obtained in a workshop). Moreover, the
wave function in momentum space is the Fourier transform of the wave function in po-
sition space, and conversely the wave function in position space is the inverse Fourier
transform of the wave function in momentum space [see Eqs. (7.1) and (7.2)]:
φ(p) = (2π~)^{−1/2} ∫_{−∞}^{∞} exp(−ipx/~)ψ(x) dx,    (7.40)

ψ(x) = (2π~)^{−1/2} ∫_{−∞}^{∞} exp(ipx/~)φ(p) dp.    (7.41)

+ Let us justify these last two equations. We have seen that the functions
ψp (x) defined by Eq. (7.18) are normalized “eigenfunctions" of this last
operator. Hence we can write ψp (x) as hx|pi and ψp∗ (x) as hp|xi:
hx|pi = (1/(2π~)^{1/2}) exp(ipx/~),    hp|xi = (1/(2π~)^{1/2}) exp(−ipx/~).    (7.42)
Combining Eq. (7.37) with the identity φ(p) = hp|ψi yields
φ(p) = ∫_{−∞}^{∞} ψ(x)hp|xi dx,    (7.43)

which in view of Eq. (7.42) is nothing else than Eq. (7.40). Similarly,
Eq. (7.41) follows from the equation
ψ(x) = ∫_{−∞}^{∞} φ(p)hx|pi dp,    (7.44)

which is obtained by combining Eq. (7.38) with the identity ψ(x) = hx|ψi.

+ The approach followed in these last paragraphs may appear to be purely


formal since the eigenvalue equations Q̂|qi = q|qi and P̂ |pi = p|pi
have no solutions in the Hilbert space. However, it can be given a pre-
cise mathematical meaning, even though the symbols hq|ψi and hp|ψi
do not represent inner products of vectors in the usual sense of that
term.

7.8 The commutator of Q and P
Recall that in the position representation the position operator Q and the momentum
operator P transform any wave function ψ(x) into, respectively, xψ(x) and −i~ dψ/dx.
As you have seen in the Term 1 course, these two operators do not commute; instead,

[Q, P ] = i~. (7.45)

It follows from Eq. (6.20) that measurements of the corresponding dynamical variables
(x and p) are subject to the uncertainty relation ∆x∆p ≥ ~/2.
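+ As an illustrative aside (assuming SymPy is available), the commutation relation (7.45) can be verified symbolically by applying Q and P in both orders to an arbitrary wave function ψ(x).

    import sympy as sp

    x, hbar = sp.symbols('x hbar')
    psi = sp.Function('psi')(x)

    # In the position representation: Q multiplies by x, P = -i*hbar*d/dx.
    QP_psi = x * (-sp.I * hbar * sp.diff(psi, x))   # Q applied after P
    PQ_psi = -sp.I * hbar * sp.diff(x * psi, x)     # P applied after Q
    print(sp.simplify(QP_psi - PQ_psi))             # i*hbar*psi(x), i.e. [Q, P] = i*hbar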

7.9 Position and momentum operators in 3D space


A position operator and a momentum operator can be defined for each direction of
space. In particular, the operators x̂ and p̂x for the x-direction, the operators ŷ and p̂y
for the y-direction and the operators ẑ and p̂z for the z-direction (x̂ and p̂x are the same
operators as those we denoted by Q̂ and P̂ in the previous sections). In the position
representation, p̂x , p̂y and p̂z are represented by the operators
−i~ ∂/∂x,    −i~ ∂/∂y,    and    −i~ ∂/∂z.
These operators obey the commutation relations

[x̂, p̂x ] = [ŷ, p̂y ] = [ẑ, p̂z ] = i~. (7.46)

By contrast, operators pertaining to orthogonal directions always commute with each


other (we stress the word orthogonal here):

[x̂, p̂y ] = [x̂, p̂z ] = 0, [ŷ, p̂x ] = [ŷ, p̂z ] = 0, [ẑ, p̂x ] = [ẑ, p̂y ] = 0. (7.47)

Moreover, position operators commute with position operators and momentum oper-
ators with momentum operators:

[x̂, ŷ] = [x̂, ẑ] = [ŷ, ẑ] = [p̂x , p̂y ] = [p̂x , p̂z ] = [p̂y , p̂z ] = 0. (7.48)

+ Why a position operator always commutes with a momentum operator


for an orthogonal direction is easily understood. For example,

[x̂, p̂y ]ψ(x, y, z) = −i~ [ x ∂ψ(x, y, z)/∂y − ∂(xψ(x, y, z))/∂y ]
                   = −i~ [ x ∂ψ(x, y, z)/∂y − x ∂ψ(x, y, z)/∂y ] = 0.    (7.49)

Everything we have seen for the 1D case generalizes to the 3D case. For example, in posi-
tion representation, the 1D momentum operator −i~ d/dx becomes the 3D momentum
operator −i~∇, where

∇ = x̂ ∂/∂x + ŷ ∂/∂y + ẑ ∂/∂z,    (7.50)
with x̂, ŷ and ẑ the unit vectors in the x-, y- and z-directions (the hats here indicate
that these vectors have unit norm, not that they are operators). Likewise, the functions
ψk (x) of Eq. (7.20) become the “plane waves"

ψk (r) = (2π)−3/2 exp(i k · r), (7.51)

where r = x x̂ + y ŷ + z ẑ and k = kx x̂ + ky ŷ + kz ẑ. (Note that k is a wave vector.)


Since k · r = kx x + ky y + kz z, each of these functions factorizes into a product of three
exponentials:

ψk (r) = (2π)−3/2 exp(ikx x) exp(iky y) exp(ikz z). (7.52)

Therefore
∫ ψk∗ (r)ψk′ (r) d³r = (1/2π) ∫_{−∞}^{∞} exp[i(kx′ − kx )x] dx
                       × (1/2π) ∫_{−∞}^{∞} exp[i(ky′ − ky )y] dy
                       × (1/2π) ∫_{−∞}^{∞} exp[i(kz′ − kz )z] dz.    (7.53)

Hence,

∫ ψk∗ (r)ψk′ (r) d³r = δ(kx′ − kx )δ(ky′ − ky )δ(kz′ − kz ).    (7.54)

More succinctly,
∫ ψk∗ (r)ψk′ (r) d³r = δ(k′ − k),    (7.55)

with the “3D delta function" δ(k′ − k) defined by the equation

δ(k′ − k) = δ(kx′ − kx )δ(ky′ − ky )δ(kz′ − kz ).    (7.56)

7.10 Continua of energy levels
For systems described by state vectors belonging to a finite dimensional space, the
eigenvalues of the Hamiltonian (the eigenenergies of the system) are finite in number
and form a discrete distribution (i.e., each energy level is separated from the adjacent
levels by an energy gap). This may or may not be the case in infinite dimensional spaces.
For example, harmonic oscillators and infinite square wells only have discrete energy
levels. However, there are also systems for which the Hamiltonian has no eigenvalue
in the mathematical definition of the term but has generalized eigenvalues forming
a continuous distribution, and systems which have both discrete eigenvalues and a
continuous distribution of generalized eigenvalues.
A free particle in 3D is an example of a system with a continuous distribution of gener-
alized eigenvalues. That a particle of mass m is free means that its Hamiltonian can
be written as −(~2 /2m)∇2 , without a potential energy term. The functions ψk (r)
of Eq. (7.51) are generalized eigenfunctions of this Hamiltonian and the correspond-
ing generalized eigenenergies are ~²k²/(2m) with k = |k|: To see this, note that
∇ exp(i k · r) = ik exp(i k · r) and therefore ∇ · ∇ exp(i k · r) = (i)² k · k exp(i k · r),
whence

−(~²/2m) ∇²ψk (r) = [~²k²/(2m)] ψk (r).    (7.57)
Since the wave number k varies continuously, these generalized eigenenergies are con-
tinuously distributed and form a continuum of energy levels.
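+ A quick symbolic check of Eq. (7.57), assuming Python with sympy is available (a sketch; the normalization factor of the plane wave is irrelevant here):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
kx, ky, kz = sp.symbols('k_x k_y k_z', real=True)
hbar, m = sp.symbols('hbar m', positive=True)

psi = sp.exp(sp.I*(kx*x + ky*y + kz*z))      # plane wave exp(i k.r)
Hpsi = -(hbar**2/(2*m))*(sp.diff(psi, x, 2) + sp.diff(psi, y, 2) + sp.diff(psi, z, 2))

print(sp.simplify(Hpsi/psi))                 # -> hbar**2*(k_x**2 + k_y**2 + k_z**2)/(2*m)
```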
Typically systems such as finite square wells, atoms, molecules and nuclei have both
discrete energy levels and a continuum of energy levels. Atomic hydrogen is a good
example of such systems. You have studied the bound states of that atom in Term 1.
The corresponding energy levels are discrete and correspond to wave functions which
go to zero for r → ∞, where r is the distance between the electron and the nucleus.
This means that the electron has a vanishingly small probability to be arbitrarily far
away from the nucleus (which is why we can say that in such states the electron is
bound to the nucleus).
The energy of these bound states is negative, which can be understood from the fol-
lowing argument: Suppose for an instant that the electron is a classical particle with a
well defined trajectory obeying the rules of Classical Mechanics. Its energy, E, would
then be the sum of its potential energy,

V (r) = −e²/(4πε₀ r),    (7.58)
which is negative for all values of r, and its kinetic energy, T , which cannot be negative

(mv 2 /2 ≥ 0 since m > 0 and v 2 ≥ 0). Classically, an electron with a total energy E =
T +V (r) can be located at any distance r from the nucleus at which T = E −V (r) ≥ 0.
I.e., when E < 0 the electron cannot go beyond the distance rE at which V (rE ) = E,
according to the rules of Classical Mechanics.
According to the rules of Quantum Mechanics, however, the electron may go beyond
rE by “tunnelling" through the potential barrier, but the probability of finding it at a
distance r must then go to zero for r → ∞. Mathematically, the Schrödinger equation
has a solution finite everywhere and going to zero for r → ∞ only for certain values
of the energy; these values are the bound state eigenenergies you have found in Term
1.
There is no classical potential barrier for E > 0, though: Since V (r) < 0, the kinetic
energy, E − V (r), is positive at all values of r when E > 0. Therefore a classical
electron can go arbitrarily far from the nucleus if its total energy is positive. Corre-
spondingly, in Quantum Mechanics, the atom can be in an unbound state of positive
energy. As the electron can be arbitrarily far in such states, its wave function does not
need to go to zero for r → ∞. Therefore the boundary condition which restricts the
energy of bound states to discrete values does not apply for unbound states, with the
consequence that such states exist for any positive values of E. The corresponding
eigenenergies (in the sense of generalized eigenvalues of the Hamiltonian) form a con-
tinuous distribution. The corresponding wave functions can be obtained analytically,
but they are considerably more complicated than the bound state wave functions you
have studied in Term 1.
More generally, the energy eigenfunctions of a Hamiltonian H can correspond to bound
states or to continuum states. The eigenenergies of bound states form a discrete distri-
bution of energy levels and the corresponding eigenfunctions (ψi (r), say) are square-
integrable:
Hψi (r) = Ei ψi (r),   with   ∫ ψi* (r) ψj (r) d³r = δij .    (7.59)

(The index used here to label these eigenfunctions stands for the set of all the quantum
numbers necessary to identify each of the functions unambiguously. E.g., for atomic
hydrogen, i ≡ {n, l, m} where n is the principal quantum number, l the orbital angular
momentum quantum number and m the magnetic quantum number.)
The eigenenergies of continuum states are distributed continuously. The corresponding
eigenfunctions (ψk (r), say) are not square-integrable but can be normalized to a delta
function:
Hψk (r) = Ek ψk (r),   with   ∫ ψk* (r) ψk′ (r) d³r = δ(k − k′).    (7.60)

(For simplicity, we label these continuum wave functions by a wave vector k; in some
applications of this formalism one may need additional quantum numbers or several
wave numbers to uniquely identify each of these eigenfunctions.)
Bound-state wave functions are always orthogonal to continuum-state wave functions
since these states correspond to different eigenenergies:

∫ ψn* (r) ψk (r) d³r = 0.    (7.61)

Depending on the Hamiltonian, its eigenenergy spectrum may be entirely discrete, or


entirely continuous, or include both discrete energy levels and a continuum of energy
levels. In the latter case, a complete set of energy eigenstates always includes both
bound eigenstates and continuum eigenstates. Expanding a wave function ψ(r) on a
basis of energy eigenfunctions then involves both a summation on the former and an
integral on the latter:
ψ(r) = Σ_n cn ψn (r) + ∫ ck ψk (r) d³k.    (7.62)

Since the eigenfunctions ψn (r) and ψk (r) are orthonormal,


cn = ∫ ψn* (r′) ψ(r′) d³r′,    ck = ∫ ψk* (r′) ψ(r′) d³r′.    (7.63)

Replacing cn and ck by these integrals in Eq. (7.62) yields


ψ(r) = ∫ [ Σ_n ψn (r) ψn* (r′) + ∫ ψk (r) ψk* (r′) d³k ] ψ(r′) d³r′.    (7.64)

Since this relation must hold for any wave function ψ(r) and at any position vector r,
we see that

Σ_n ψn (r) ψn* (r′) + ∫ ψk (r) ψk* (r′) d³k = δ(r′ − r).    (7.65)

This equation generalizes the completeness relation we have derived for the case of
a finite-dimensional Hilbert space in Section 5.2 of these notes. Here we work in an
infinite-dimensional Hilbert space, with the consequences that the summation over n
may encompass an infinite number of terms and that continuum eigenstates may need
to be included. Clearly, Eq. (7.65) includes an integral over k only if H has a continuous
spectrum and a sum over n only if H has a discrete spectrum.
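+ The discrete part of such an expansion is easy to explore numerically. The sketch below (assuming Python with numpy and scipy, and setting ~ = m = ω = 1) expands a test wave function in the first thirty eigenfunctions of a 1D harmonic oscillator, whose spectrum is entirely discrete, and checks that the analogues of Eqs. (7.62) and (7.63) reconstruct the original function; the names used are arbitrary.

```python
import numpy as np
from scipy.special import eval_hermite, factorial

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

def psi_n(n, x):
    """Normalized 1D harmonic-oscillator eigenfunction for hbar = m = omega = 1."""
    norm = 1.0/np.sqrt(2.0**n * factorial(n) * np.sqrt(np.pi))
    return norm * eval_hermite(n, x) * np.exp(-x**2/2)

psi = np.exp(-(x - 1.0)**2)                  # a square-integrable test wave function

nmax = 30
c = [np.sum(psi_n(n, x)*psi)*dx for n in range(nmax)]    # c_n of Eq. (7.63) (real functions here)
psi_rebuilt = sum(c[n]*psi_n(n, x) for n in range(nmax))

print(np.max(np.abs(psi - psi_rebuilt)))     # small: the discrete sum reproduces psi(x)
```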

7.11 Probabilities and wave functions
To recap, one can represent the quantum state of a particle by way of a wave function.
(We ignore spin here — the case of a particle with a non-zero spin is similar but slightly
more complicated as it involves column vectors of wave functions rather than a single
wave function.) One may choose to work in the position representation, in which case
the wave function is a function of the position of the particle (e.g., ψ(x) in 1D or ψ(r) in
3D). Alternatively one may choose to work in the momentum representation, in which
case the wave function is a function of the momentum of the particle (e.g., φ(p) in 1D
or φ(p) in 3D).
The squared modulus of the wave function is a density of probability. Integrating this
density of probability over a range of positions or momenta gives the probability that
the particle has a position or a momentum in that range. This fact is often expressed in
the following ways:

• |ψ(x)|² dx is the probability that the particle is between x and x + dx.

• |ψ(r)|² d³r is the probability that the particle is in a volume d³r centered at the
point of position r.

• |φ(p)|² dp is the probability that the particle has a momentum between p and
p + dp.

• |φ(p)|² d³p is the probability that the particle’s momentum is in a momentum-space
volume d³p centered at the momentum p.
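+ As a small numerical illustration of the first of these statements (a sketch assuming Python with scipy; the Gaussian wave function used here is an arbitrary choice), the probability of finding the particle in a finite interval is obtained by integrating |ψ(x)|² over that interval:

```python
import numpy as np
from scipy.integrate import quad

sigma = 1.0
psi = lambda x: (2*np.pi*sigma**2)**(-0.25)*np.exp(-x**2/(4*sigma**2))   # normalized Gaussian

prob, _ = quad(lambda x: abs(psi(x))**2, -sigma, sigma)    # P(-sigma < x < sigma)
norm, _ = quad(lambda x: abs(psi(x))**2, -np.inf, np.inf)  # total probability

print(prob)   # ~0.68
print(norm)   # ~1.0, as required for a normalized wave function
```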

Systems of several particles


Pretty much any interesting quantum mechanical system contains several particles,
sometimes a huge number of particles, not just one. Think, e.g., of atoms other than
hydrogen, of molecules, of nuclei, of condensed-matter systems, etc. Even atomic hy-
drogen is a system of two particles — the proton and the electron — although it is often
reduced to a system containing only one particle by separating the motion of the centre
of mass of the atom from the motion of the electron relative to the proton and consid-
ering only the latter. The wave function formalism generalizes to these more complex
cases. The wave functions are then functions of the coordinates or momenta of all the
particles of the system. For example, the state of a system consisting of a particle A of
position vector rA and a particle B of position vector rB may be described by a wave
function ψ(rA , rB ). In this case, |ψ(rA , rB )|2 d3 rA d3 rB would be the joint probability
that particle A is in a volume d3 rA centered at the point of position rA and particle B
in a volume d3 rB centered at the point of position rB .

7.12 The parity operator
You may have come across the following terminology in a previous course: a func-
tion f (x) is said to be of even parity (or more simply, to be even) if f (−x) = f (x)
and of odd parity if f (−x) = −f (x). For example, cos x is an even function
since cos(−x) = cos x, whereas sin x is an odd function since sin(−x) = − sin x.
The function exp(x/a), with a a constant, is neither even nor odd, since changing
x into −x changes this function into exp(−x/a), which is neither exp(x/a) nor
− exp(x/a). The function xn is even (odd) for even (odd) integer values of n.
We are interested in the operator which transforms any function ψ(x) into the func-
tion ψ(−x). It is called the parity operator and we will denote it by Π:

Πψ(x) = ψ(−x). (7.66)

For example, if ψ(x) = exp(−x/a − x2 /a2 ) with a a constant, then Πψ(x) =


exp(x/a − x2 /a2 ).
We note that Π2 is the identity operator: for any ψ(x), Π2 ψ(x) = Π[Πψ(x)] =
Πψ(−x) = ψ(x). This operator has thus two eigenvalues, 1 and −1, corresponding
respectively to eigenfunctions of even parity and to eigenfunctions of odd parity.

Proof: Suppose that


Πψ(x) = λψ(x), (7.67)
with λ a real or complex number. Then Π2 ψ(x) = λ2 ψ(x). Since Π2 is the
identity operator, λ2 must be equal to 1, hence λ is either 1 or −1. Since
Πψ(x) = ψ(−x), Eq. (7.67) says that ψ(−x) = ψ(x) when λ = 1, in which
case the eigenfunction ψ(x) is even, and that ψ(−x) = −ψ(x) when λ = −1,
in which case the eigenfunction ψ(x) is odd.
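+ The splitting of a wave function into even and odd parts used further below can be checked symbolically (a sketch, assuming Python with sympy; the example function is the one considered above):

```python
import sympy as sp

x = sp.symbols('x', real=True)
a = sp.symbols('a', positive=True)
psi = sp.exp(-x/a - x**2/a**2)            # neither even nor odd

Pi = lambda f: f.subs(x, -x)              # the parity operator in the position representation

psi_plus  = psi + Pi(psi)                 # even part (up to a factor 1/2)
psi_minus = psi - Pi(psi)                 # odd part (up to a factor 1/2)

print(sp.simplify(Pi(psi_plus) - psi_plus))          # 0:  Pi psi_+ = +psi_+
print(sp.simplify(Pi(psi_minus) + psi_minus))        # 0:  Pi psi_- = -psi_-
print(sp.simplify((psi_plus + psi_minus)/2 - psi))   # 0:  the two parts add up to psi
```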

These properties generalize to 3D and to systems of N particles: the parity


operator Π transforms any wave function ψ(x1 , y1 , z1 , . . . , xN , yN , zN ) into the
wave function ψ(−x1 , −y1 , −z1 , . . . , −xN , −yN , −zN ), and as above, the eigen-
values of Π are 1 and −1. It is worth noting that the point of coordinates
(−x, −y, −z) is the image of the point of coordinates (x, y, z) by a reflec-
tion through the origin: the parity operation is a reflection through the ori-
gin, and changes any position vector r into −r. A function may be even

or odd under this operation: As in 1D, ψ(x1 , y1 , z1 , . . . , xN , yN , zN ) is even if
ψ(−x1 , −y1 , −z1 , . . . , −xN , −yN , −zN ) = ψ(x1 , y1 , z1 , . . . , xN , yN , zN ) and is
odd if ψ(−x1 , −y1 , −z1 , . . . , −xN , −yN , −zN ) = −ψ(x1 , y1 , z1 , . . . , xN , yN , zN ).

+ In 1D, the eigenfunctions of the Hamiltonian are either even or odd under a
parity transformation if the potential energy is even, i.e., if V (x) = V (−x)
for all x. In 3D, they are either even or odd, or may be chosen to be either
even or odd, if the potential energy is even, i.e., if V (r) = V (−r) for any
r.
See Fig. 6 of Professor Cole’s notes for an example: the eigenfunctions of
the 1D harmonic oscillator with potential energy V (x) = (1/2)kx2 are
alternatively even and odd, as should be expected since V (x) = V (−x)
for this system.
Proof: For simplicity, we will only consider the case of a single particle
in 3D. The proof generalizes immediately to the case of N particles. Sup-
pose that ψ(x, y, z) is an eigenfunction of the Hamiltonian: Hψ(x, y, z) =
Eψ(x, y, z), with

H = −(~²/2m) (∂²/∂x² + ∂²/∂y² + ∂²/∂z²) + V (x, y, z)    (7.68)

where m is the mass and V (x, y, z) is an interaction potential. First, we


note that the parity operator commutes with the kinetic energy term. E.g.,
for any φ(x, y, z),

Π [∂²φ(x, y, z)/∂x²] = [∂/∂(−x)] [∂/∂(−x)] φ(−x, −y, −z)
                     = (−1)² (∂/∂x)(∂/∂x) φ(−x, −y, −z)
                     = (∂²/∂x²) Πφ(x, y, z).    (7.69)
We note that Π also commutes with V (x, y, z) if V (x, y, z) =
V (−x, −y, −z). Π then commutes with H, and it follows from gen-
eral theorems, (1) that Πψ(x, y, z) is also an eigenfunction of H, and
(2) that ψ(x, y, z) and Πψ(x, y, z) belong to the same eigenenergy (see
page 88).
Let us introduce the new functions ψ+ (x, y, z) = ψ(x, y, z) +
ψ(−x, −y, −z) and ψ− (x, y, z) = ψ(x, y, z) − ψ(−x, −y, −z). Clearly,

ψ+ (x, y, z) is of even parity and ψ− (x, y, z) is of odd parity, and
ψ+ (x, y, z) is orthogonal to ψ− (x, y, z). Given that Hψ = Eψ and
HΠψ = EΠψ, we also have that Hψ± = Eψ± .
If the eigenenergy E is not degenerate, ψ+ (x, y, z) and ψ− (x, y, z) cannot
both be non-zero, since they would then be two linearly independent eigen-
functions belonging to the same eigenenergy E. Thus either ψ+ (x, y, z) ≡ 0
or ψ− (x, y, z) ≡ 0, in which case either ψ(x, y, z) = −ψ(−x, −y, −z) or
ψ(x, y, z) = ψ(−x, −y, −z).
If the eigenenergy E is degenerate, then any pair of linearly independent
eigenfunctions ψ(x, y, z) and Πψ(x, y, z) can always be replaced by the
linear combinations ψ(x, y, z) + Πψ(x, y, z) and ψ(x, y, z) − Πψ(x, y, z),
which are linearly independent and either even or odd.
The eigenenergies of a 3D system may or may not be degenerate (e.g.,
ignoring spin and relativistic effects, the ground state energy of atomic
hydrogen is not degenerate but all the other energy levels are). However,
in 1D the eigenenergies are never degenerate (proving this would require
a discussion) and as a consequence the energy eigenfunctions are always
of a well defined parity if the potential is even. 

8 Quantum harmonic oscillators
This part of the course is essentially a brief revision of what you have seen in the
Michaelmas Term, with an extension to 3D oscillators. Harmonic oscillators are of
great importance in Classical Mechanics, not least because the motion of systems
of particles in the vicinity of a configuration of stable equilibrium can often be described
in terms of coupled harmonic oscillators. This is also the case in Quantum Mechanics.

8.1 The Hamiltonian and the energy levels of a linear har-


monic oscillator
In Classical Mechanics, a linear harmonic oscillator is a mass point subject to a force
F proportional to its displacement from a fixed point and constrained to move along a
straight line (here taken to be the x-axis): F = −k(x − x0 ) x̂, where x is the position of
the mass, x0 is its equilibrium position, k is the “spring constant", and x̂ is a unit vector
in the positive x-direction. In Hamiltonian Mechanics, this system is described by a
Hamiltonian function H(p, x), where p (the canonical momentum) is the generalized
momentum conjugate to the co-ordinate x. Here p = mdx/dt, with m the mass of the
oscillator, and
H(p, x) = p²/(2m) + (1/2) k x².    (8.1)
The corresponding equations of motion are solved readily: The mass point describes
a harmonic oscillation at an angular frequency ω related to the spring constant k and
the mass m by the equation ω = (k/m)1/2 . (A “harmonic oscillation" is an oscillation
described by a sine or cosine function.) Making use of this relation, Eq. (8.1) can be
recast as
H(p, x) = p²/(2m) + (1/2) mω²x².    (8.2)
The corresponding quantum mechanical Hamiltonian is obtained from the classical
Hamiltonian by replacing p by the momentum operator p̂x and x by the position oper-
ator x̂:
H(p, x) → Ĥ = p̂x²/(2m) + (1/2) mω²x̂².    (8.3)
In the position representation, the operators x̂ and p̂x are taken to be the operators
which transform a wave function Ψ(x, t) into, respectively, xΨ(x, t) and −i~dΨ/dx,
with the result that Ĥ is then represented by the following operator, which we have

already seen several times:

H = −(~²/2m) d²/dx² + (1/2) mω²x².    (8.4)
It turns out that the time-independent Schrödinger equation can be solved exactly for
this Hamiltonian, either as a differential equation or by using algebraic methods based
on the ladder operator mentioned in the next section. The following is found:

1. The eigenenergies of the linear Harmonic oscillator and the corresponding en-
ergy eigenstates can be labelled by an integer n which can take any non-negative
value:
Ĥ |ψn i = En |ψn i, n = 0, 1, 2, . . . , (8.5)
or in the position representation,

Hψn (x) = En ψn (x), n = 0, 1, 2, . . . (8.6)

2. Each of the eigenenergies En is non degenerate, and

En = ~ω(n + 1/2). (8.7)

These eigenenergies thus form a ladder of equally spaced energy levels. The
bottom “rung" — i.e., the ground state energy — is E0 = ~ω/2, and each level is
separated from the adjacent levels by an energy ~ω. (Note that the ground state
energy is non-zero.)
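+ These results are easy to check numerically. The sketch below (assuming Python with numpy, and setting ~ = m = ω = 1) discretizes the Hamiltonian of Eq. (8.4) on a grid and diagonalizes the resulting matrix; the lowest eigenvalues come out close to n + 1/2, as Eq. (8.7) requires.

```python
import numpy as np

N, L = 1500, 20.0                       # number of grid points and size of the spatial box
x = np.linspace(-L/2, L/2, N)
dx = x[1] - x[0]

# -(1/2) d^2/dx^2 by the standard three-point finite-difference formula
T = (np.diag(np.full(N, 1.0/dx**2))
     + np.diag(np.full(N - 1, -0.5/dx**2), 1)
     + np.diag(np.full(N - 1, -0.5/dx**2), -1))
V = np.diag(0.5*x**2)                   # (1/2) m omega^2 x^2 with m = omega = 1

E = np.linalg.eigvalsh(T + V)
print(E[:5])                            # ~ [0.5, 1.5, 2.5, 3.5, 4.5]
```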

8.2 The ladder operators


The ladder operators for a linear harmonic oscillator were introduced in the Term 1
QM course. These operators were denoted by a− and a+ in that course and were de-
fined in the position representation, in terms of the operators x and px ≡ −i~d/dx. It
is convenient to denote the corresponding operators acting on ket vectors by, respec-
tively, â (corresponding to a− ) and ↠(the adjoint of â, corresponding to a+ ). These
two operators have the important property of satisfying the following commutation
relation:
[â, ↠] = 1. (8.8)
Moreover, the Hamiltonian of a linear harmonic oscillator of angular frequency ω can
be written in terms of the operators â and ↠as

Ĥ = ~ω (↠â + 1/2).    (8.9)




Note that this Hamiltonian is exactly the same as the Hamiltonian Ĥ of Eq. (8.3); the
only difference is that Ĥ is now written in terms of the operators â and ↠instead of
the operators x̂ and p̂x .

+ How these two operators came about is easily understood by noting


that Eq. (8.2) can also be written as
H(p, x) = ( √(mω²/2) x + i √(1/(2m)) p ) ( √(mω²/2) x − i √(1/(2m)) p ),    (8.10)

or as H(p, x) = A A∗ with

A = √(mω²/2) x + i √(1/(2m)) p.    (8.11)
Passing to the quantum mechanical Hamiltonian is then done by setting
a = A/√(~ω), writing H(p, x) as ~ω(aa∗ + a∗ a)/2, and replacing a by
the operator â and a∗ by the adjoint of â, with

â = √(mω/(2~)) x̂ + i √(1/(2m~ω)) p̂,    ↠= √(mω/(2~)) x̂ − i √(1/(2m~ω)) p̂.    (8.12)

A and A∗ were divided by √(~ω) in the above so as to make the cor-
responding operators â and ↠dimensionless. Since a∗ a ≡ aa∗ ,
~ω(aa∗ + a∗ a)/2 ≡ ~ωa∗ a. However, ~ω(↠â + â↠)/2 6= ~ω↠â since
↠â 6= â↠. Replacing a and a∗ by â and ↠in ~ω(aa∗ + a∗ a)/2 rather
than in ~ωa∗ a ensures that the correct quantum mechanical Hamilto-
nian is obtained. (Note that ~ω(↠â + â↠)/2 = ~ω(↠â + 1/2) since
â↠= ↠â + 1.)

As was shown in the term 1 QM course, the energy levels En can be deduced from
Eq. (8.9) and from the commutation relation of â and ↠by a purely algebraic method
(i.e., without solving the Schrödinger equation as a differential equation). A key result
from this approach is that it is possible to find a set of normalized energy eigenstates
{|ψn i, n = 0, 1, . . .} such that

â|ψn⟩ = √n |ψn−1⟩   with   â|ψ0⟩ = 0,    (8.13)

â†|ψn⟩ = √(n + 1) |ψn+1⟩.    (8.14)

A number of other results can be derived from this, e.g., that n̂ = ↠â is a number
operator:
n̂|ψn i = n|ψn i, (8.15)
and also that Ĥ |ψn i = ~ω(n + 1/2)|ψn i. Going up from the energy level En to the
one immediately above amounts to adding a “quantum of energy" ~ω to En . Compared
to the ground state |ψ0 i, the energy eigenstate |ψn i can be understood as containing n
quanta of energy. The operator n̂ thus “counts" the number of energy quanta contained
in the states it acts on.

+ Iterating Eq. (8.14) gives all the normalized energy eigenstates |ψn i in
terms of the ground state, |ψ0 i:
|ψn⟩ = (1/√(n!)) (â†)ⁿ |ψ0⟩.    (8.16)
n!

Because these various properties hold irrespective of whether the Schrödinger equation
can or cannot be written as a differential equation, they carry over to systems which
cannot be formulated in the position representation — e.g., to photon fields in quantum
electrodynamics. Similar ladder operators are also widely used in quantum field theory.
You may also remember that the eigenvalues of angular momentum operators can be
derived algebraically using ladder operators.
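+ The algebra of â and ↠can also be explored numerically by truncating the basis {|ψ0⟩, |ψ1⟩, . . .} to its first N vectors. The sketch below (assuming Python with numpy) builds the corresponding matrices from Eqs. (8.13) and (8.14); note that the truncation spoils [â, â†] = 1 in the last basis state only.

```python
import numpy as np

N = 12
a = np.diag(np.sqrt(np.arange(1, N)), k=1)    # matrix of a:  a|psi_n> = sqrt(n)|psi_{n-1}>
adag = a.conj().T                             # matrix of a+ (the adjoint of a)

comm = a @ adag - adag @ a
print(np.allclose(comm[:N-1, :N-1], np.eye(N-1)))    # True: [a, a+] = 1 away from the cut-off

H = adag @ a + 0.5*np.eye(N)                  # H/(hbar omega), Eq. (8.9): diagonal, entries n + 1/2
print(np.diag(H)[:4])                         # [0.5, 1.5, 2.5, 3.5]
print(np.allclose(np.diag(adag @ a), np.arange(N)))  # True: n = a+ a is the number operator
```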
Extension to 3D
Similar ladder operators can be introduced for 3D harmonic oscillators, namely âx and
â†x (the same as â and ↠) and also ây , â†y , âz and â†z . The operators ây and âz and their
adjoints are related to the position operators ŷ and ẑ and to the momentum operators
p̂y and p̂z in the same way as âx and â†x are related to x̂ and p̂x . These operators are
such that
[âx , â†x ] = [ây , â†y ] = [âz , â†z ] = 1. (8.17)
Recall that position and momentum operators pertaining to orthogonal directions al-
ways commute with each other, and also that position operators always commute with
other position operators and momentum operators always commute with other mo-
mentum operators. Therefore ladder operators pertaining to orthogonal directions also
commute with each other. In particular,

[âx , â†y ] = [âx , â†z ] = 0, (8.18)


[ây , â†x ] = [ây , â†z ] = 0, (8.19)
[âz , â†x ] = [âz , â†y ] = 0. (8.20)

8.3 The coherent states of a simple harmonic oscillator

As you may remember from a previous Quantum Mechanics course, the wave func-
tion Ψ(r, t) describing the quantum state of a free particle can always be written as
a superposition of plane waves — i.e., as an integral of the form
Ψ(r, t) = (2π)^{−3/2} ∫ φ(k) exp(ik · r) exp(−iEk t/~) d³k,    (8.21)

where Ek = ~2 k 2 /2m and φ(k) is a certain function of the wave vector k. (We stress
that this result applies to the case of a free particle, namely a particle not interact-
ing with anything. The potential energy of a free particle is the same everywhere
in space and at all times, and can be taken to be identically zero by choice of the
origin of the energy scale.) How the probability density |Ψ(r, t)|2 varies with r and
t depends on the function φ(k), and so does the uncertainty ∆x on the position of
this particle. However, whatever φ(k) is, it is always the case that ∆x will increase
without limit as t → ∞: a free particle always become more and more delocalized
at large times. This delocalization is often referred to as the spreading of the wave
packet.
Remarkably, non-spreading wave functions are possible for the case of a particle
trapped in a quadratic potential well (i.e., if the particle is a simple harmonic oscil-
lator). I.e., when the Hamiltonian is given by Eq. (8.4), the Schrödinger equation

∂Ψ
i~ = HΨ(x, t) (8.22)
∂t
has solutions for which ∆x remains constant in time. These wave functions describe
a particular class of states called coherent states (the word “coherent" meaning that
the wave packet “coheres", i.e., remains together rather than spreads out). Here is
a list of interesting facts about coherent states (see the homework and workshop
problems associated with the course for proofs of many of these results):

1. The coherent states are the eigenstates of the ladder operator â. Any real or
complex number α is an eigenvalue of â and the coherent states are described
by the corresponding eigenvectors: If

â|αi = α|αi, (8.23)

then |αi describes a coherent state. (In contrast, the operator ↠has no eigen-
vector.)

2. The symbol |αi is usually reserved for the normalized eigenvectors of â. The
coherent state |0i corresponding to α = 0 is the ground state of the oscillator.
(I.e., |0i = |ψ0 i in the notation used above.) Coherent states other than |0i are
not eigenstates of the Hamiltonian. Hence they depend on time (when one
works in the so-called Schrödinger representation, see Chapter 11 of these
notes). Within an overall constant factor,

|α⟩(t) = exp(−|α|²/2) Σ_{n=0}^{∞} [αⁿ/√(n!)] exp(−iEn t/~) |ψn⟩,    (8.24)

where the energies En and the energy eigenstates |ψn i are defined as in
Eqs. (8.7) and (8.16).

3. The expectation values of the position and of the momentum vary in time like
the position and the momentum of a simple harmonic oscillator in classical
mechanics. Specifically, in a state |αi,
⟨x⟩(t) = 2|α| √(~/(2mω)) cos(ωt − arg α),    (8.25)

⟨p⟩(t) = −2|α| √(m~ω/2) sin(ωt − arg α).    (8.26)
The modulus and the argument of the complex number α thus define the
amplitude and the phase of the oscillation.

4. Defining uncertainties as in Section 6.4, the uncertainties on the position and


on the momentum of a particle in a coherent state |αi are given by the fol-
lowing equations:
∆x = √(~/(2mω)),    ∆p = √(m~ω/2).    (8.27)
These uncertainties are constant in time (the wave packet does not spread)
and are the same for any coherent state (they do not depend on α). Moreover,
their product takes on the lowest value allowed by the Heisenberg uncertainty
principle:
∆x∆p = ~/2. (8.28)

5. In position representation, |αi(t = 0) can be represented by the wave func-


tion

φα (x) = C exp[ −( √(mω/(2~)) x − α )² ],    (8.29)

where C is a normalization constant. Since φα (x) is the ground state wave
function when α = 0, a coherent state can be described as a “displaced ground
state".
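+ Several of these facts can be verified with a truncated number-basis calculation (a sketch, assuming Python with numpy and setting ~ = m = ω = 1, in which case x̂ = (â + â†)/√2 and p̂ = (â − â†)/(i√2); the value of α is an arbitrary choice):

```python
import numpy as np
from math import factorial

N, alpha = 40, 1.5 + 0.5j
a = np.diag(np.sqrt(np.arange(1, N)), k=1)
adag = a.conj().T
xop = (a + adag)/np.sqrt(2.0)
pop = (a - adag)/(1j*np.sqrt(2.0))

# components of |alpha> at t = 0 in the basis {|psi_n>}, from Eq. (8.24)
ket = np.array([np.exp(-abs(alpha)**2/2)*alpha**n/np.sqrt(factorial(n)) for n in range(N)])

print(np.allclose(a @ ket, alpha*ket))            # True: |alpha> is an eigenvector of a, Eq. (8.23)
mean = lambda op: np.real(ket.conj() @ op @ ket)
dx = np.sqrt(mean(xop @ xop) - mean(xop)**2)
dp = np.sqrt(mean(pop @ pop) - mean(pop)**2)
print(mean(xop), np.sqrt(2)*abs(alpha)*np.cos(np.angle(alpha)))   # equal, cf. Eq. (8.25) at t = 0
print(dx*dp)                                      # 0.5, i.e. hbar/2, cf. Eq. (8.28)
```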

9 Tensor products of Hilbert spaces

9.1 Bipartite and multipartite quantum systems


Example 1
Let us start with a brief study of the simple harmonic oscillator in Quantum Mechanics
(a single particle in a quadratic potential well). You are familiar with the Hamiltonian
operator for a linear harmonic oscillator of mass m and angular frequency ω,

H1D = −(~²/2m) d²/dx² + (1/2) mω²x².    (9.1)
Similarly, for a 3D isotropic oscillator,

H3D = −(~²/2m) (∂²/∂x² + ∂²/∂y² + ∂²/∂z²) + (1/2) mω²(x² + y² + z²).    (9.2)

Alternatively, in spherical polar coordinates,

H3D = −(~²/2m) ∇² + (1/2) mω²r².    (9.3)
(Isotropic means “the same in every direction": the potential energy depends only on
r, the distance of the particle to the point of equilibrium, not on the polar angles θ and
φ describing its angular position.) For a 2D isotropic oscillator,

H2D = −(~²/2m) (∂²/∂x² + ∂²/∂y²) + (1/2) mω²(x² + y²).    (9.4)

We observe that H2D can also be written as the sum of two 1D Hamiltonians, one in x
and one in y:
H2D = H1Dx + H1Dy , (9.5)
where H1Dx and H1Dy are given by the following equations:

H1Dx = −(~²/2m) ∂²/∂x² + (1/2) mω²x²,    H1Dy = −(~²/2m) ∂²/∂y² + (1/2) mω²y².    (9.6)
(We write H1Dx and H1Dy in terms of partial derivatives, contrary to what we did in
Eq. (9.1), because we are now dealing with several independent variables. It is custom-
ary to write derivatives as total derivatives rather than partial derivatives when there
is only one independent variable.)

Since the eigenenergies of a 1D harmonic oscillator are always of the form ~ω(n +
1/2), where n is any non-negative integer, the eigenenergies of H1Dx and of H1Dy can
be written as ~ω(n + 1/2) and ~ω(n0 + 1/2), respectively, with n, n0 = 0, 1, 2, . . .
Let us denote by ψn (x) a normalized eigenfunction of H1Dx with eigenenergy En =
~ω(n + 1/2) and by ψn0 (y) a normalized eigenfunction of H1Dy with eigenenergy
En0 = ~ω(n0 + 1/2) (n, n0 = 0, 1, 2, . . .):

H1Dx ψn (x) = En ψn (x), H1Dy ψn0 (y) = En0 ψn0 (y). (9.7)

The sets {ψn (x)} and {ψn0 (y)} are both orthonormal since we assume that the ψn (x)’s
and ψn0 (y)’s are normalized and different values of n or n0 correspond to different
eigenenergies:
∫_{−∞}^{∞} ψn* (x) ψm (x) dx = δnm ,    ∫_{−∞}^{∞} ψn′* (y) ψm′ (y) dy = δn′m′ .    (9.8)

Since ψn (x) is an eigenfunction of H1Dx with eigenenergy En and ψn0 (y) is an eigen-
function of H1Dy with eigenenergy En0 , the product ψn (x)ψn0 (y) is an eigenfunction
of H2D with eigenenergy En + En0 :

H2D ψn (x)ψn0 (y) = [H1Dx ψn (x)] ψn0 (y) + ψn (x) [H1Dy ψn0 (y)]
= En ψn (x)ψn0 (y) + En0 ψn (x)ψn0 (y)
= (En + En0 )ψn (x)ψn0 (y). (9.9)

In fact, one can show that any eigenfunction of the Hamiltonian H2D is either a product
of the form ψn (x)ψn0 (y) or a linear combination of such products.
Given Eq. (9.8), it is easy to see that the products ψn (x)ψn0 (y) form an orthonormal
set:
∫∫ [ψn (x)ψn′ (y)]* [ψm (x)ψm′ (y)] dx dy
     = [ ∫ ψn* (x) ψm (x) dx ] [ ∫ ψn′* (y) ψm′ (y) dy ] = δnm δn′m′ .    (9.10)

It is also possible to show that given any square-integrable function f (x, y), there al-
ways exists a set of constants cnn0 such that
f (x, y) = Σ_{n=0}^{∞} Σ_{n′=0}^{∞} cnn′ ψn (x) ψn′ (y).    (9.11)

The products ψn (x)ψn0 (y) thus form an orthonormal basis spanning the space of the
functions square-integrable on the xy-plane. The coefficients cnn0 do not depend on x

or y but may depend on other variables, e.g., time. In particular, any time-dependent
wave function Ψ(x, y, t) can be written as an expansion of form
Ψ(x, y, t) = Σ_{n=0}^{∞} Σ_{n′=0}^{∞} cnn′ (t) ψn (x) ψn′ (y).    (9.12)

Note what we are doing here: we combine two 1D systems into a single 2D system, and
write the wave functions of the latter in terms of wave functions of the former.
Now, rather than a single particle confined to the xy-plane, consider two particles con-
fined to the x-axis — i.e., a particle of mass mA and coordinate xA and a particle of
mass mB and coordinate xB . We denote the Hamiltonian of the first particle by HA
and the Hamiltonian of the second particle by HB , and take

HA = −(~²/2mA) ∂²/∂xA² + (1/2) mA ωA² xA²,    (9.13)

HB = −(~²/2mB) ∂²/∂xB² + (1/2) mB ωB² xB².    (9.14)

It might well be possible to treat these two harmonic oscillators as if they were com-
pletely on their own, and, doing this, describe the quantum state of the first one by a
certain wave function ΨA (xA , t) and the quantum state of the other by a certain wave
function ΨB (xB , t). However, it would be necessary to treat them as forming a sin-
gle 2-particle system, rather than two 1-particle systems, if they were interacting with
each other. Typically, an interaction between the two oscillators would depend on the
position of particle B relative to particle A and would be represented by a potential
energy term V (xA − xB ) in the Hamiltonian of the joint system, HAB :

HAB = HA + HB + V (xA − xB ). (9.15)

Treating the two oscillators as a single system implies that the quantum state of
this system is described by a wave function Ψ(xA , xB , t) rather than by separate wave
functions ΨA (xA , t) and ΨB (xB , t).
At this point, we note that HAB reduces to the sum HA + HB in the absence of this in-
teraction. HAB is then mathematically equivalent to the Hamiltonian H2D of Eq. (9.5),
apart from a trivial change of notation and the unimportant difference that the mass and
angular frequency of oscillators A and B may not be the same. Proceeding as above,
we can introduce a complete set of normalized eigenfunctions of HA and a complete
set of normalized eigenfunctions of HB , respectively {ψAn (xA ), n = 0, 1, 2, . . .} and

{ψBn0 (xB ), n0 = 0, 1, 2, . . .}, such that

HA ψAn (xA ) = ~ωA (n + 1/2) ψAn (xA ),    (9.16)

HB ψBn′ (xB ) = ~ωB (n′ + 1/2) ψBn′ (xB ).    (9.17)

We then write the 2-particle wave function Ψ(xA , xB , t) as an expansion in products


of these 1-particle eigenfunctions:
Ψ(xA , xB , t) = Σ_{n=0}^{∞} Σ_{n′=0}^{∞} cnn′ (t) ψAn (xA ) ψBn′ (xB ).    (9.18)

Considering the two harmonic oscillators as a single system may be necessary even if
they are not interacting. For example, take the wave function ψA0 (xA )ψB1 (xB ), which
describes a state where the first oscillator is in its ground state and the second in its
lowest excited state, and the wave function ψA1 (xA )ψB0 (xB ), which describes a state
where the first oscillator is in its lowest excited state and the second in its ground state.
(We do not indicate a dependence on time, here, to keep the notation as simple as pos-
sible. This dependence is not important for our discussion. Assume, e.g., that we are
considering these wave functions at the instant t = 0.) These two wave functions de-
scribe possible states of the system formed two non-interacting oscillators. Hence, by
the Principle of Superposition, any linear combination of these wave functions [(e.g.,
ψA0 (xA )ψB1 (xB ) + ψA1 (xA )ψB0 (xB )] also describes a possible state of this system.
Such linear combinations link the state of the first oscillator to the state of the sec-
ond oscillator; therefore they do not describe quantum states in which the state of one
oscillator can be treated independently from the state of the other.
Terminology
In relation to this first example:

• These two oscillators, when considered as a single system, are said to form a bi-
partite quantum system (i.e., a quantum system composed of two distinct parts
which can be considered jointly or in isolation, depending on the circumstances).
Systems composed of more than two distinct parts are called multipartite sys-
tems.

• A quantum state of the joint system described by a wave function which can be
factorized in a product of the form ΨA (xA , t)ΨB (xB , t) is called a separable (or
product) state (e.g., the state described by the wave function ψA0 (xA )ψB1 (xB )
is separable).

• Quantum states that are not separable are called entangled states (e.g., the state
described by the wave function ψA0 (xA )ψB1 (xB ) + ψA1 (xA )ψB0 (xB ) is entan-
gled).

Entangled states vs. separable states


Suppose that the two oscillators of our example do not interact with each other, and that
measurements are made on them. E.g., suppose that Alice checks whether oscillator A
is in the ground state or in the lowest excited state and that Bob does the same on
oscillator B. Clearly, the result will depend on which states A and B are in at the time
of these measurements.
First, let us assume that their joint system is in a separable state of wave function
φAB (xA , xB ), with

φAB (xA , xB ) = [ √(1/3) ψA0 (xA ) + √(2/3) ψA1 (xA ) ] [ √(2/5) ψB0 (xB ) + √(3/5) ψB1 (xB ) ]    (9.19)

              = √(2/15) ψA0 (xA )ψB0 (xB ) + √(3/15) ψA0 (xA )ψB1 (xB )
                + √(4/15) ψA1 (xA )ψB0 (xB ) + √(6/15) ψA1 (xA )ψB1 (xB ).    (9.20)
(As above, we do not indicate a dependence on time as it is not essential.) We have
already seen that the products ψAi (xA )ψBj (xB ) are normalized (in fact, they are or-
thonormal), and it is easy to verify that φAB (xA , xB ) is also normalized. The probabil-
ity P r(0, 0; φAB ) that both Alice and Bob find their oscillator to be in the ground state
is given by the Born rule as
Pr(0, 0; φAB ) = | ∫∫ [ψA0 (xA )ψB0 (xB )]∗ φAB (xA , xB ) dxA dxB |².    (9.21)

The integral is readily calculated using the orthonormality of the products ψAi (xA )ψBj (xB ),
with the result that Pr(0, 0; φAB ) = 2/15. Similarly, there is a probability of 4/15 that
Alice finds her oscillator to be in the excited state and Bob finds his to be in the ground
state, of 3/15 that Alice finds hers to be in the ground state and Bob finds his to be in
the excited state, and of 6/15 that both Alice and Bob find their oscillator to be in the
excited state:
Pr(0, 0; φAB ) = 2/15, Pr(1, 0; φAB ) = 4/15,
Pr(0, 1; φAB ) = 3/15, Pr(1, 1; φAB ) = 6/15. (9.22)

It is worth noting that there is no correlation between the results found by Alice and
those found by Bob: whether Bob finds his oscillator to be in the ground state or in
the excited state, the probability that Alice finds hers to be in the ground state is half
the probability that she finds it to be in the excited state, and similarly, whatever Alice
finds for her oscillator, the probability that Bob finds his to be in the ground state is 2/3
the probability that he finds it to be in the excited state.
We would arrive at the same conclusion for any separable state: in such states, the
results of any measurement on A are completely independent from the results of any
measurement on B.
Instead, let us now assume that these two oscillators are in an entangled state of nor-
malized wave function ψAB (xA , xB ), with
ψAB (xA , xB ) = (1/√2) [ ψA0 (xA )ψB1 (xB ) + ψA1 (xA )ψB0 (xB ) ].    (9.23)
2
The probability that both Alice and Bob find their oscillator to be in the ground state is
now zero, since ψA0 (xA )ψB0 (xB ) is orthogonal to both ψA0 (xA )ψB1 (xB ) and ψA1 (xA )ψB0 (xB ).
Repeating the calculation for the other product functions gives

Pr(0, 0; ψAB ) = 0, Pr(1, 0; ψAB ) = 1/2,


Pr(0, 1; ψAB ) = 1/2, Pr(1, 1; ψAB ) = 0. (9.24)

Instead of an absence of correlation, we now observe a perfect correlation between the


results found by Alice and those found by Bob (or rather, a perfect anticorrelation):
either Alice finds “ground" and Bob “excited" or Alice finds “excited" and Bob “ground".
There is a zero probability that they both find “ground" or both find “excited", hence
these two outcomes are impossible and will not occur (assuming, of course, that the
measurements are not erroneous).
Imagine that Alice and Bob make their measurement on a large number of pairs of
oscillators, all prepared in the state ψAB (xA , xB ). They write down the result they
obtain for each of the oscillators they measure. They will each find random results, on
average half of them “ground" and half of them “excited" (in the same way that you
obtain a random distribution of “heads" and “tails" when you toss a coin repeatedly).
Should they compare their lists, however, they would not find a single instance where
the two oscillators of the same pair were both in the ground state or both in the excited
state.
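+ These joint probabilities are easy to reproduce numerically if each oscillator is restricted to its two lowest states (a sketch assuming Python with numpy; the Kronecker product np.kron plays the role of the tensor product discussed in Section 9.2):

```python
import numpy as np

# Separable state of Eq. (9.19): amplitudes on (ground, excited) for A and for B
stateA = np.array([np.sqrt(1/3), np.sqrt(2/3)])
stateB = np.array([np.sqrt(2/5), np.sqrt(3/5)])
phi_AB = np.kron(stateA, stateB)             # joint amplitudes, ordered (0,0), (0,1), (1,0), (1,1)
print(np.abs(phi_AB)**2)                     # [2/15, 3/15, 4/15, 6/15], cf. Eq. (9.22)

# Entangled state of Eq. (9.23)
psi_AB = (np.kron([1, 0], [0, 1]) + np.kron([0, 1], [1, 0]))/np.sqrt(2)
print(np.abs(psi_AB)**2)                     # [0, 1/2, 1/2, 0], cf. Eq. (9.24)
```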

+ At first sight, the perfect anticorrelation between the results found by Al-
ice and by Bob in the measurements mentioned above is no different
from what you observe in everyday life. E.g., if Alice and Bob were toss-
ing a coin between them, there would be a perfect anticorrelation between
Alice’s wins and Bob’s wins: Bob would lose each time Alice would win
and Bob would win each time Alice would lose. This anticorrelation is
the same as that found in the measurements on the harmonic oscillators.
However, there is a profound difference between the quantum correlations
observed in measurements on entangled states and the classical correla-
tions observed in everyday life. This difference does not manifest in the
measurements discussed above but may be (and has been) revealed by well
chosen experiments.

Saying more about this difference would take us far beyond the scope of
the course. However, its conceptual importance can be appreciated from
the following. When you are tossing a coin, you take it for granted that
one of the two sides faces upwards before you check which one does, and
if you find this side to be head then you take it for granted that it was
head before you checked. By analogy, you may think that if Alice finds
her oscillator to be in the ground state rather than in the excited state,
and Bob finds the opposite for his oscillator, this must have been because
these two oscillators were in these states before they checked them. The
alternatives seem bizarre: if they were not in these states, then perhaps the
measurement itself would force them to be in these states through some in-
teraction between the two oscillators; however, such an interaction would
need to propagate faster than the speed of light since Alice and Bob may be
millions of kilometers away. Or perhaps it is not possible to say that the
oscillator A is truly separated from oscillator B, however far apart they
might be? Everyday intuition suggests that, instead, if A is found to be
in the ground state and B in the excited state, it is because A was in the
ground state and B in the excited state immediately before the measure-
ment. However, an in depth discussion of correlations between results of
measurements on entangled states shows that this view is untenable: when
in the state ψAB (xA , xB ), neither of the two oscillators can be assigned a
definite state (ground or excited) prior to the measurement. But if this is
the case for a quantum object, why wouldn’t it also be the case for a coin?
Could it be that which of the two sides faces up becomes determined only
when someone looks at it? Or instead, could it be that all what the wave
functions ψAB (xA , xB ), ψA0 (xA ), ψB1 (xB ), etc., refer to is what we can

say about the system, irrespective of what may actually be the case?
These issues of interpretation of Quantum Mechanics are deep, difficult
and unsettled.

Example 2
The two oscillators of Example 1 are distinct particles and are spatially separated.
However, what we have seen above in regards to the entanglement of the states of
separated particles also applies to the entanglement of different degrees of freedom
of a same particle — e.g., entanglement of its position and its spin, as we will now
see.
Recall the Stern-Gerlach experiment: a beam of silver atoms divides into two
branches when passing through an inhomogeneous magnetic field B. The beam
splits into two because each atom has a magnetic dipole moment µ and experiences
a force ∇(µ · B) when passing through the field, and because the component of µ in the
direction of B has only two possible values. As these two values are spin-dependent,
the interaction with the magnetic field couples the spin state of each atom to its spa-
tial wave function. This coupling makes it necessary to consider spin and position
together rather than separately (in the same way as an interaction between two os-
cillators makes it necessary to consider them together rather than separately, i.e., as
a single system rather than as two separate systems).
For simplicity, we represent each atom by a mass point of coordinate R, ignoring
its internal structure. We describe its position by a time-dependent wave function
Ψ(R, t) and its spin state by a column vector χ. Let us assume that

χ = a (1, 0)ᵀ + b (0, 1)ᵀ,    (9.25)

writing the column vectors on one line, with ᵀ denoting transposition.
(We have seen previously that the two column vectors appearing in the right-hand
side describe the states of spin up and of spin down.) Before the atom has entered
the magnetic field, Ψ(R, t) and χ are uncoupled. For simplicity, we take the joint
spatial and spin state of the atom to be described by the product Ψ(R, t)χ, thus by

a Ψ(R, t) (1, 0)ᵀ + b Ψ(R, t) (0, 1)ᵀ.
The interaction with the magnetic field transforms this state into one of the form

a Ψα (R, t) (1, 0)ᵀ + b Ψβ (R, t) (0, 1)ᵀ,

where the wave functions multiplying the two spin states now describe distinct dis-
tributions of position. Clearly, this transformed space + spin wave function does
not describe a separable state since it cannot be written as the product of a wave
function depending on R with a spin state: the interaction with the magnetic field
entangles the atom’s spatial and spin degrees of freedom.
Suppose that the probability density |Ψα (R, t)|2 is practically zero everywhere ex-
cept in a certain region A, and that |Ψβ (R, t)|2 is practically zero everywhere except
in a certain region B. If these two regions do not overlap, atoms found in the region
A are necessarily in a state of spin up and those found in the region B are necessarily
in a state of spin down. There would be no correlation between spin and position if
the joint spatial and spin state of the atom was separable: instead, finding the posi-
tion of the atom would reveal nothing about its spin. For the above entangled state,
measuring the position of an atom is in effect measuring whether the atom is in a
state of spin up or a state of spin down.
Example 3
Our last example is the system formed by the two electrons of a helium atom. You
will see in the level 3 Quantum Mechanics course that in the ground state of helium,
the two electrons are in a joint spin state described by the following combination of
column vectors:

ψ⁻12 = (1/√2) [ (1, 0)ᵀ₁ (0, 1)ᵀ₂ − (0, 1)ᵀ₁ (1, 0)ᵀ₂ ].    (9.26)

Here the subscript attached to each column vector indicates whether this column
vector represents a spin state of the first electron or one of the second electron.
(The superscript − is traditional for that state and is a reminder of the minus sign

in the right-hand side. We neglect spin-orbit coupling here.) Thus ψ12 is a linear
combination of a state in which electron 1 is spin up and electron 2 is spin down
with a state in which electron 1 is spin down and electron 2 is spin up. It is not

possible to write ψ12 as a single product of a spin state of electron 1 with a spin state

of electron 2; therefore ψ12 describes an entangled state.
It is important to realize that the products of column vectors appearing in Eq. (9.26)
are not dot products or inner products of some kind. They represent pairs of column
vectors, in which one of these vectors pertain to one part of the system (electron 1)
and the other to another part (electron 2). Note the analogy with Eq. (9.23), in which
the two products ψA0 (xA )ψB1 (xB ) and ψA1 (xA )ψB0 (xB ) also represent states of
individual parts of the joint system.

To illustrate this formalism, let us imagine a thought experiment in which you would

prepare a pair of electrons in the state ψ12 and measure whether electron 1 is or is
not in a state of spin up and electron 2 is or is not in the spin state represented by
the normalized column vector (1/√2) (1, i)ᵀ₂.
From the Born rule, the probability of finding electron 1 in the state of spin up and
electron 2 in that particular spin state is |(φ12 , ψ⁻12 )|², the square of the modulus of
the inner product of ψ⁻12 with the vector φ12 , with

φ12 = (1/√2) (1, 0)ᵀ₁ (1, i)ᵀ₂.    (9.27)

We note, at this stage, that it would not make sense to take the inner product of a
column vector describing a state of electron 1 with one describing a state of electron
2, any more than it would have made sense, in the first example, to calculate the inner
product of a function of xA with a function of xB . Calculating |(φ12 , ψ⁻12 )|² is done
by taking the inner products of column vectors pertaining to the same electron and
combining the results:

(φ12 , ψ⁻12 ) = (1/2) { [ (1, 0)₁ (1, 0)ᵀ₁ ] [ (1, −i)₂ (0, 1)ᵀ₂ ] − [ (1, 0)₁ (0, 1)ᵀ₁ ] [ (1, −i)₂ (1, 0)ᵀ₂ ] }
             = (1 × (−i) − 0 × 1)/2 = −i/2.    (9.28)
Hence |(φ12 , ψ⁻12 )|² = 1/4.
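+ This calculation can be reproduced with numpy's Kronecker product (a sketch; np.vdot conjugates its first argument, which is exactly the inner product used here):

```python
import numpy as np

up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])
psi_minus = (np.kron(up, down) - np.kron(down, up))/np.sqrt(2)       # Eq. (9.26)
phi_12 = np.kron(up, np.array([1.0, 1.0j])/np.sqrt(2))               # Eq. (9.27)

inner = np.vdot(phi_12, psi_minus)
print(inner)            # -0.5j, i.e. -i/2, as in Eq. (9.28)
print(abs(inner)**2)    # 0.25
```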

+ You have come across still another example of bi-partite system in the Term
1 course, although it was not presented as such: the hydrogen atom. Ig-
noring relativistic effects, the quantum state of an atom of hydrogen-1 can
be described by a wave function Ψ(rpr , rel , t), where rpr and rel are, re-
spectively, the position vector of the proton and the position vector of the
electron. The Hamiltonian of this 2-particle system is

Hat = −(~²/2mpr) ∇²pr − (~²/2mel) ∇²el − e²/(4πε₀ |rpr − rel |),    (9.29)

with mpr the mass of the proton, mel the mass of the electron, and ∇2pr and
∇2el the Laplace operators with respect to the coordinates of the proton and
those of the electron:
∇²pr = ∂²/∂xpr² + ∂²/∂ypr² + ∂²/∂zpr² ,    ∇²el = ∂²/∂xel² + ∂²/∂yel² + ∂²/∂zel² .    (9.30)
However, instead of writing the wave functions and the Hamiltonian in
terms of the coordinates of these two particles, we can also write them
in terms of the coordinates of the centre of mass of the atom and of the
coordinates of the electron with respect to the proton. Let us denote by
rCM the position vector of the centre of mass and by r the position of the
electron respective to the proton:
rCM = (mpr rpr + mel rel)/(mpr + mel),    r = rel − rpr .    (9.31)
Using rCM and r instead of rpr and rel is a transformation of the coordi-
nates. It can be shown that this transformation separates Hat into the sum
of a Hamiltonian HCM depending only on the coordinates of the centre of
mass and a Hamiltonian H depending only on the relative coordinates:
Hat = HCM + H (9.32)
with
HCM = −(~²/2M) ∇²CM ,    H = −(~²/2µ) ∇² − e²/(4πε₀ r).    (9.33)
In these equations, ∇2CM and ∇2 are the Laplace operators with respect to
rCM and to r, M is the mass of the atom (M = mpr + mel ) and µ is the
reduced mass of the electron - proton system,
µ = mpr mel /(mpr + mel).    (9.34)
The operator H is the Hamiltonian you have studied in Term 1 when you
obtained the energy levels of hydrogen. As written in Eq. (9.32), the Hamil-
tonian of the atom does not contain a term coupling the motion of the
electron relative to the nucleus to the centre of mass motion. The atom
can thus be in a separable state whose wave function is the product of a
function of rCM and a function of r. For such states, it makes sense to talk
about the eigenenergies and eigenfunctions of the Hamiltonian H without
reference to the motion of the atom as a whole. However, this is not the
case when the atom is in an entangled state in which its internal state is
not independent from its state of motion — e.g., in a state described by a
wave function of the form ΨCM (rCM , t)Ψ(r, t) + ΦCM (rCM , t)Φ(r, t).

9.2 Tensor products
As shown by the examples discussed in the previous section, it is often the case that
the system of interest is formed of distinct parts which need to be considered jointly.
Consider a bipartite system consisting of two individual quantum systems, namely sys-
tem A and system B. Suppose that the quantum states of system A are described by
vectors belonging to a certain Hilbert space, HA say, and those of system B by vectors
belonging to another Hilbert space, HB , say. Consider these two systems jointly, as a
single quantum system. The quantum states of the joint system are then described by
vectors belonging to still another Hilbert space, HAB , called the tensor product of HA
and HB .
How HAB is related to HA and HB is not difficult to understand from the examples
of the previous section. We will work with ket vectors here. Consider, for example, a
ket vector |ψiA representing a state of system A and a ket vector |φiB representing a
state of system B. To these two ket vectors we can associate a vector |ψiA |φiB which
represents a state of the joint system (a product state in which system A is in the state
|ψiA and system B is in the state |φiB ).
It should be noted that the symbol |ψiA |φiB does not represent a product in the usual
sense of the word, even though what it represents is commonly referred to as the prod-
uct of |ψiA and |φiB . Properly speaking, it denotes what is called the tensor product
of |ψiA and |φiB . An alternative notation is |ψiA ⊗ |φiB , which makes it clear that we
are not talking about an usual product (the symbol ⊗ stands for the tensor product).
We will use this notation throughout the rest of this section, for clarity, but not later in
the course. Both |ψiA |φiB and |ψiA ⊗ |φiB represent the same thing, which is
the pair {|ψiA , |φiB }.

+ As long as it is clear that |ψiA and |φiB refer, respectively, to a state of system
A and a state of system B, the order in which these two vectors appear in the
product does not matter: |ψn iA ⊗ |φm iB ≡ |φm iB ⊗ |ψn iA .

In particular, consider an orthonormal set of vectors {|ψn iA , n = 1, . . . , N } forming


a basis for HA and an orthonormal set of vectors {|φm iB , m = 1, . . . , M } forming a
basis for HB . (We assume that HA and HB are finite-dimensional here, for simplic-
ity. The theory for infinite-dimensional spaces runs similarly.) Then the set of all the
product states {|ψn iA ⊗ |φm iB , n = 1, . . . , N, m = 1, . . . , M } forms a basis for the
Hilbert space HAB . In fact, HAB can be defined as being the vector space spanned by
these product states, equipped with the inner product defined below. Since there are
N × M different products |ψn iA ⊗ |φm iB if there are N different vectors |ψn iA and M

different vectors |φm iB , HAB can be of a much larger dimension than HA and HB : if
HA is N -dimensional and HB is M -dimensional, then HAB is (N × M )-dimensional.
Note that the vectors belonging to HAB include not only these product basis vectors
but also all the linear combinations that can be made of them. The terminology intro-
duced in the previous section applies generally: a state of the whole system is called a
separable state (or a product state) if it can be represented by the tensor product of a
vector of HA and a vector of HB , and the others are called entangled states.
Inner products and operators
The inner product of two vectors of HAB is defined in terms of the inner products for
HA and HB : If |ηiAB = |ψiA ⊗ |φiB and |η 0 iAB = |ψ 0 iA ⊗ |φ0 iB , then the inner
product of |ηiAB and |η 0 iAB is obtained by multiplying the inner product of |ψiA and
|ψ 0 iA by the inner product of |φiB and |φ0 iB :
0
AB hη|η iAB = A hψ|ψ 0 iA × B hφ|φ0 iB . (9.35)
(Both A hψ|ψ 0 iA and B hφ|φ0 iB are complex numbers, hence the right-hand side of this
equation is the product of two complex numbers.)
For example, take |ψn iA ⊗|φm iB and |ψn0 iA ⊗|φm0 iB , two of the basis vectors formed
by the tensor products of the orthonormal vectors |ψn iA with the orthonormal vectors
|φm iB . The inner product of |ψn iA ⊗ |φm iB and |ψn0 iA ⊗ |φm0 iB is A hψn |ψn0 iA × B hφm |φm0 iB ,
which is δnn0 δmm0 since both the |ψn iA ’s and the |φm iB ’s are orthonormal. Hence the
vectors |ψn iA ⊗ |φm iB are also orthonormal.

+ Since |ψiA and |φiB belong to different Hilbert spaces, their inner product is
not defined: the symbol A hψ|φiB has no mathematical meaning.

Operators acting on vectors belonging to HA or HB can also be made to act on the


joint states of A and B. More specifically, take the case of an operator ÂA acting on
the ket vector |ψiA and an operator B̂B acting on the ket vector |φiB . Then
ÂA |ψiA ⊗ |φiB = (ÂA |ψiA ) ⊗ |φiB , (9.36)
B̂B |ψiA ⊗ |φiB = |ψiA ⊗ (B̂B |φiB ), (9.37)
and also
(ÂA + B̂B )|ψiA ⊗ |φiB = (ÂA |ψiA ) ⊗ |φiB + |ψiA ⊗ (B̂B |φiB ). (9.38)
The important point is that when acting on a vector of HAB , the operator ÂA acts only
on the vectors belonging to HA and the operator B̂B acts only on the vectors belonging to
HB .

+ Tensor products of operators are also used in applications. By definition,
the tensor product of ÂA and B̂B is the operator ÂA ⊗ B̂B such that

ÂA ⊗ B̂B |ψiA ⊗ |φiB = (ÂA |ψiA ) ⊗ (B̂B |φiB ) (9.39)

for any |ψiA the operator ÂA may act on and any |φiB the operator B̂B
may act on. Since ÂA and B̂B always act on different vectors, it is clear
that ÂA ⊗ B̂B ≡ B̂B ⊗ ÂA .
For simplicity, the symbol ⊗ is usually not specified: in the same way that
|ψiA ⊗ |φiB is often written |ψiA |φiB , the operator ÂA ⊗ B̂B is often
written ÂA B̂B .
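+ In a matrix representation, the tensor product is realized by the Kronecker product, and the rules above are easy to check numerically (a sketch assuming Python with numpy; the dimensions 2 and 3 and the random matrices are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2)) + 1j*rng.standard_normal((2, 2))   # an operator acting in H_A
B = rng.standard_normal((3, 3)) + 1j*rng.standard_normal((3, 3))   # an operator acting in H_B
psi = rng.standard_normal(2) + 1j*rng.standard_normal(2)           # a vector of H_A
phi = rng.standard_normal(3) + 1j*rng.standard_normal(3)           # a vector of H_B

# (A x B)(|psi>_A x |phi>_B) = (A|psi>_A) x (B|phi>_B), Eq. (9.39)
print(np.allclose(np.kron(A, B) @ np.kron(psi, phi), np.kron(A @ psi, B @ phi)))   # True

# A acting "only on the H_A part" is A x I; it commutes with I x B, as Eqs. (9.36)-(9.38) imply
A_joint = np.kron(A, np.eye(3))
B_joint = np.kron(np.eye(2), B)
print(np.allclose(A_joint @ B_joint, B_joint @ A_joint))                           # True
```

Note that the joint vectors here have 2 × 3 = 6 components, in line with the dimension N × M of HAB mentioned above.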

10 Unitary transformations

10.1 Unitary operators


An operator Û is said to be unitary if it is invertible and
Û † = Û −1 . (10.40)
By definition of the inverse of an operator, Û −1 Û = Iˆ = Û Û −1 with Iˆ the identity
operator. Therefore, if Û is unitary,
Û † Û = Iˆ = Û Û † . (10.41)

• The eigenvalues of a unitary operator are real or complex numbers of modulus


1. (I.e., if Û |ψi = λ|ψi, it is always the case that |λ| = 1 when Û is a unitary
operator.)

+ Proof: Suppose Û |ψi = λ|ψi with hψ|ψi = 1 (it is not restrictive to


assume that |ψi is normalized). As is explained in Section 4.3, it follows
from this equation that hψ| Û † = λ∗ hψ|, where Û † is understood to act
“on the left" on the bra vector hψ|. Thus hψ| Û † Û |ψi = λ∗ λhψ|ψi =
|λ|2 . However, we also have that hψ| Û † Û |ψi = hψ| Iˆ|ψi = hψ|ψi = 1.
Thus |λ| = 1. 

• Eigenvectors of a unitary operator corresponding to different eigenvalues are


always orthogonal. (I.e., if Û is a unitary operator, Û |ψi = λ|ψi and Û |φi =
µ|φi, it is always the case that hφ|ψi = 0 if λ 6= µ.)

+ Proof: hφ| Û † = µ∗ hφ| since Û |φi = µ|φi, and therefore


hφ| Û † Û |ψi = µ∗ λhφ|ψi. (10.42)
However, we also have that
hφ| Û † Û |ψi = hφ| Iˆ|ψi = hφ|ψi. (10.43)
Subtracting Eq. (10.42) from Eq. (10.43) gives
0 = (1 − µ∗ λ)hφ|ψi. (10.44)
However µ 6= 0 and µ∗ = 1/µ since |µ| = 1, and thus
0 = (1 − λ/µ)hφ|ψi = (µ − λ)hφ|ψi/µ (10.45)
Hence hφ|ψi = 0 if λ 6= µ. 

• The most important property of unitary transformations is that they conserve
the inner product: If |ψ 0 i = Û |ψi, |φ0 i = Û |φi and Û is unitary,

hφ0 |ψ 0 i = hφ|ψi. (10.46)

As a particular case of this relationship, hψ 0 |ψ 0 i = hψ|ψi for any |ψi: unitary


transformations conserve the norm.
+ Proof: We use the relation, seen previously, that the bra vector conju-
gate to the ket vector Û |φi can be written as hφ| Û † . Since |ψ 0 i = Û |ψi
and |φ0 i = Û |φi, we see that hφ0 |ψ 0 i = hφ| Û † Û |ψi = hφ| Iˆ|ψi =
hφ|ψi. 
+ More generally, a unitary transformation is an inner product preserving
one-to-one mapping from one Hilbert space to either the same Hilbert
space or to another one. The word operator refers to mappings from
one Hilbert space to the same one.
Representing ket vectors by column vectors is an example of a more
general unitary transformation: Ket vectors and column vectors be-
long to different Hilbert spaces. However, given an orthonormal basis
of ket vectors, there is a one-to-one correspondence between each ket
vector and the column vector representing that ket vector in that basis.
Furthermore, the inner product of any two ket vectors and the inner
product of the two column vectors representing these ket vectors are
always equal.
The Fourier transform is another example of a more general unitary
transformation. Indeed, it can be shown that if φa (k) is the Fourier
transform of ψa (x) and φb (k) the Fourier transform of ψb (x), then
Z ∞ Z ∞
ψa∗ (x)ψb (x) dx = φ∗a (k)φb (k) dk. (10.47)
−∞ −∞

I.e., the inner product of ψa (x) and ψb (x) is equal to the inner product
of their Fourier transforms.
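+ A discrete analogue of Eq. (10.47) is easy to verify numerically (a Python/NumPy
  sketch; with the "ortho" normalization the discrete Fourier transform is a
  unitary map and therefore preserves inner products of the sampled functions):

    import numpy as np

    rng = np.random.default_rng(1)
    psi_a = rng.normal(size=256) + 1j * rng.normal(size=256)
    psi_b = rng.normal(size=256) + 1j * rng.normal(size=256)

    phi_a = np.fft.fft(psi_a, norm="ortho")       # unitary DFT of psi_a
    phi_b = np.fft.fft(psi_b, norm="ortho")       # unitary DFT of psi_b

    # np.vdot conjugates its first argument, i.e., it computes <a|b>
    print(np.allclose(np.vdot(psi_a, psi_b), np.vdot(phi_a, phi_b)))    # True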

10.2 Transformed operators


Suppose that the vector |ψi is transformed into the vector |ηi by a certain operator, Â:

|ηi = Â|ψi. (10.48)

Suppose, further, that the vectors |ψi and |ηi are transformed into the vector |ψ 0 i and
|η 0 i by a certain unitary operator, Û :

|ψ 0 i = Û |ψi, |η 0 i = Û |ηi. (10.49)

Then we see that |η 0 i = Û Â|ψi = Û ÂÛ † |ψ 0 i, where in the last step we have used the
facts that |ψi = Û −1 |ψ 0 i and that Û −1 = Û † . I.e.,

|η 0 i = Â0 |ψ 0 i (10.50)

with Â0 = Û ÂÛ † . Since this is the case for any |ψi on which  acts, we can say that
a unitary transformation which transforms ket vectors according to the equation

|ψ 0 i = Û |ψi (10.51)

transforms operators according to the equation

Â0 = Û ÂÛ † . (10.52)

The transformed operator Â0 has the same properties as  in the following sense:

1. If  is Hermitian, Â0 is also Hermitian. (The proof is left as an exercise.)

2. Sums and products of operators are transformed into sums and products of the
transformed operators: e.g., if  = αB̂ + β Ĉ D̂ where α and β are two complex
numbers, then Â0 = αB̂ 0 + β Ĉ 0 D̂0 .
+ Proof: We use the general relation between operators and their trans-
formed counterparts, and also Û † Û = Û Û † = Iˆ.

Â0 = Û ÂÛ †
= Û (αB̂ + β Ĉ D̂)Û †
= αÛ B̂ Û † + β Û Ĉ(Û † Û )D̂Û †
= αÛ B̂ Û † + β(Û Ĉ Û † )(Û D̂Û † ),

from which it follows that Â0 = αB̂ 0 + β Ĉ 0 D̂0 . 

3. As a particular case of this relationship, if [Â, B̂] = Ĉ, then

[Â0 , B̂ 0 ] = Ĉ 0 = Û [Â, B̂]Û † . (10.53)

4. Â and Â0 = Û ÂÛ † have the same eigenvalues. (The proof is left as an exercise.)
5. hφ0 | Â0 |ψ 0 i = hφ| Â|ψi for any |φi, |ψi. (The proof is also left as an exercise.)

Unitary transformations and matrix representation


We only consider representations of operators by finite matrices here, i.e., represen-
tations of operators acting in a finite-dimensional Hilbert space. As noted previously,
the matrices and column vectors representing operators and vectors in a given basis
depend on the basis used in the representation. Different choices of basis result in
different matrices and column vectors. As we will now discuss, matrices and col-
umn vectors obtained in one basis are related to those obtained in another basis by
a unitary transformation.
Consider an operator  acting in a Hilbert space of dimension N . Consider,
also, two different orthonormal bases of this Hilbert space, namely a basis
{|φ1 i, |φ2 i, . . . , |φN i} and a basis {|ψ1 i, |ψ2 i, . . . , |ψN i}. That these two bases are
orthonormal means that
hφi |φj i = hψi |ψj i = δij , (10.54)
and also that
Σ_{k=1}^{N} |φk ihφk | = Σ_{k=1}^{N} |ψk ihψk | = Iˆ,                    (10.55)
where Iˆ is the identity operator. (This last equation is the completeness relation for
these two bases, see Section 5.2.)
The operator  is represented by the matrix A of elements Aij = hφi | Â|φj i when
working in the {|φn i} basis and by the matrix A0 of elements A0ij = hψi | Â|ψj i when
working in the {|ψn i} basis. These two matrices are usually different, although
they represent the same operator. Likewise, a state vector |Ψi is represented by the
column vector c of elements ci = hφi |Ψi when working in the {|φn i} basis and by
the column vector c0 of elements c0i = hψi |Ψi when working in the {|ψn i} basis.
These two column vectors are usually different, although they represent the same
state vector.
Now, given that the {|φn i} basis is complete, it is always possible to write each of
the |ψj i vectors as a linear combination of the |φi i vectors. The coefficients of these
linear combinations are complex numbers, which we will denote by Mij . Specifi-
cally,
|ψj i = Σ_{i=1}^{N} Mij |φi i.                                           (10.56)

(Note the order of the indexes: the coefficient of |φi i is denoted Mij , not Mji .)
Since the |φi i are orthonormal, Mij = hφi |ψj i (see Section 2.10). Moreover Mij∗ =
hψj |φi i since hφi |ψj i∗ = hψj |φi i (see Section 2.8).
These complex coefficients can be arranged in an N × N matrix M in the usual way:

      [ M11   M12   · · ·   M1N ]
      [ M21   M22   · · ·   M2N ]
M =   [  ..    ..     ..     ..  ] .                                     (10.57)
      [ MN 1  MN 2   · · ·  MN N ]

This matrix is unitary — i.e., M† M = I, where I is the unit matrix.

+ Proof: Let us work out the ij-element of the matrix M† M. Using Eq. (10.55),
(M† M)ij = Σ_{k=1}^{N} (M† )ik Mkj
         = Σ_{k=1}^{N} Mki∗ Mkj
         = Σ_{k=1}^{N} hψi |φk ihφk |ψj i
         = hψi | Iˆ|ψj i = hψi |ψj i = δij .

Since the elements of the unit matrix I are equal to δij (the diagonal elements
of I are all equal to 1 and all the other elements are equal to 0), we see that
M† M = I. Proving that MM† = I can be done similarly. 

We can thus pass from one basis to another by a unitary transformation. Transform-
ing the basis from {|φn i} to {|ψn i} as per the matrix M transforms both the column
vectors representing quantum states and the matrices representing operators. This
transformation is also unitary: If in the {|φn i} basis the ket vector |ψi is represented
by the column vector c and the operator Â is represented by the matrix A, and if the
same ket vector and the same operator are represented by the column vector c0 and
the matrix A0 in the {|ψn i} basis, then

c0 = M† c and A0 = M† AM. (10.58)

(The proof of these equations is left as an exercise. Note that the elements of the
column vectors c transform according to M† whereas the basis vectors transform

according to M.) These two equations can be brought to the same form as Eqs. (10.51)
and (10.52) by rewriting them in terms of the adjoint of the basis change matrix:
Setting U = M† ,
c0 = Uc and A0 = UAU† . (10.59)
However, note that these equations arise from a mere change of basis which has
no impact on the ket vectors representing quantum states, whereas Eqs. (10.51) and
(10.52) arise from a transformation of the ket vectors.
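+ The following Python/NumPy sketch illustrates Eqs. (10.56)–(10.58) in a
  three-dimensional space (the two orthonormal bases chosen are arbitrary
  examples):

    import numpy as np

    rng = np.random.default_rng(2)
    N = 3
    phi = np.eye(N, dtype=complex)        # first basis: the canonical basis of C^3
    psi, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
                                          # second basis: columns of a random unitary

    M = np.array([[np.vdot(phi[:, i], psi[:, j]) for j in range(N)]
                  for i in range(N)])     # M_ij = <phi_i|psi_j>
    print(np.allclose(M.conj().T @ M, np.eye(N)))         # M is unitary

    A_op = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))    # some operator
    Psi = rng.normal(size=N) + 1j * rng.normal(size=N)               # some state vector

    c  = np.array([np.vdot(phi[:, i], Psi) for i in range(N)])       # c_i  = <phi_i|Psi>
    cp = np.array([np.vdot(psi[:, i], Psi) for i in range(N)])       # c'_i = <psi_i|Psi>
    A  = np.array([[np.vdot(phi[:, i], A_op @ phi[:, j]) for j in range(N)] for i in range(N)])
    Ap = np.array([[np.vdot(psi[:, i], A_op @ psi[:, j]) for j in range(N)] for i in range(N)])

    print(np.allclose(cp, M.conj().T @ c))                # c' = M^dagger c
    print(np.allclose(Ap, M.conj().T @ A @ M))            # A' = M^dagger A M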

11 Time evolution

11.1 The Schrödinger equation


The time evolution of quantum states is often described by way of time-dependent wave
functions, column vectors or ket vectors satisfying the Schrödinger equation. E.g.,

i~ ∂Ψ/∂t = HΨ(x, y, z, t)                                                (11.1)
if we describe the quantum states of the system by time-dependent wave functions
Ψ(x, y, z, t), or
i~ d|Ψ(t)i/dt = Ĥ |Ψ(t)i                                                 (11.2)
if we describe them by time-dependent state vectors |Ψ(t)i.
You have probably seen most of the following results and definitions before, if
not all of them:

1. The operator Ĥ appearing in Eq. (11.2) is a Hermitian operator called the Hamil-
tonian of the system.

2. The inner product hΦ(t)|Ψ(t)i is constant if |Φ(t)i and |Ψ(t)i evolve in time
according to the Schödinger equation: hΦ(t)|Ψ(t)i = hΦ(t0 )|Ψ(t0 )i for any t, t0 .
In particular, the norm of a state vector does not change under time evolution.
+ Proof: We see from Eq. (11.2) that to first order in δt,
|Ψ(t + δt)i = |Ψ(t)i + (1/i~) Ĥ |Ψ(t)i δt.                               (11.3)

Similarly,

hΦ(t + δt)| = hΦ(t)| − (1/i~) hΦ(t)| Ĥ † δt.                             (11.4)

Therefore, to first order in δt,

hΦ(t + δt)|Ψ(t + δt)i = hΦ(t)|Ψ(t)i + (1/i~) hΦ(t)| Ĥ |Ψ(t)i δt
                        − (1/i~) hΦ(t)| Ĥ † |Ψ(t)i δt.                   (11.5)

The second and third terms in the right-hand side cancel since Ĥ is
Hermitian. Hence, to first order in δt,

hΦ(t + δt)|Ψ(t + δt)i = hΦ(t)|Ψ(t)i. (11.6)

Therefore
lim_{δt→0} [hΦ(t + δt)|Ψ(t + δt)i − hΦ(t)|Ψ(t)i] / δt = 0,               (11.7)
which means that dhΦ(t)|Ψ(t)i/dt = 0. 

3. As the Schrödinger equation is linear and of first order in time, giving Ĥ and
specifying |Ψ(t)i at a time t0 determines |Ψ(t)i at all times (at least in principle,
in practice the Schrödinger equation may be impossible to solve to sufficient
accuracy).

4. The eigenvalues and eigenvectors of Ĥ are defined by the equation

Ĥ|ψn i = En |ψn i, (11.8)

and similarly for the continuum eigenstates of Ĥ if there are any (but then in
terms of generalized eigenvalues and generalized eigenvectors, see Section 6.9).
The eigenvalues En are called the eigenenergies of the system. Since Ĥ is Her-
mitian, the eigenenergies are real (not complex).

5. Ĥ may or may not depend on time. For example, the Hamiltonian used in the
Term 1 course to describe an unperturbed hydrogen atom,

−(~2 /2µ) ∇2 − e2 /(4πε0 r),
does not depend on time. By contrast, the Hamiltonian

−(~2 /2µ) ∇2 − e2 /(4πε0 r) + e F(t) · r,
which describes an atom of hydrogen perturbed by a time-dependent electric
field F(t), does depend on time. The Hamiltonian is normally time-independent,
unless the system it represents is subject to a time-dependent interaction with
the rest of the world.

6. If the Hamiltonian is time-independent, then

(a) The eigenenergies En and the energy eigenstates |ψn i are also time-independent,
and so is Eq. (11.8). This equation is often referred to as the time-independent
Schrödinger equation. Eq. (11.2) is the time-dependent Schrödinger equa-
tion.
(b) Given an energy eigenstate |ψn i and the corresponding eigenenergy En ,
the ket |ψn i exp(−iEn t/~) is a solution of Eq. (11.2):

i~ (d/dt)[|ψn i exp(−iEn t/~)] = i~ |ψn i (d/dt) exp(−iEn t/~)
                               = i~ |ψn i (−iEn /~) exp(−iEn t/~)
                               = En |ψn i exp(−iEn t/~)
                               = Ĥ |ψn i exp(−iEn t/~).                  (11.9)

(c) Imagine an experiment in which the system of interest is prepared in the


state |ψn i exp(−iEn t/~) and a certain observable A is measured on that
system at a time t. We assume, for simplicity, that |ψn i is an eigenvector
of Ĥ in the usual meaning of the term, not a generalized eigenvector de-
scribing a continuum state. Therefore |ψn i can be normalized in the usual
way (hψn |ψn i = 1), and so can |ψn i exp(−iEn t/~). As seen in Section
4.2, the probability of each possible outcome of this measurement is given
by the square of the modulus of the inner product of |ψn i exp(−iEn t/~)
with a normalized eigenvector |φi of the Hermitian operator representing
A,
|hφ|ψn i exp(−iEn t/~)|2 ,
or by a sum of such square moduli in case of degenerate eigenvalues.
(Again, for simplicity we assume that the eigenvalues of interest are dis-
crete, so that the corresponding eigenvectors are normalizable in the usual
way.) Since En is real, | exp(−iEn t/~)| = 1 and therefore

|hφ|ψn i exp(−iEn t/~)|2 = |hφ|ψn i|2 | exp(−iEn t/~)|2


= |hφ|ψn i|2 . (11.10)

This result shows that the probability of each possible outcome of a mea-
surement does not depend on the instant at which the measurement is
made. It is easy to see that we would arrive at the same conclusion for
the probability densities we would need to consider if the eigenvalues of
interest belonged to a continuum. As we have not assumed anything spe-
cific about the measured observable, the conclusion is that the probability

distribution of the results of any measurement which can be made on the
system is constant in time. In other words, the eigenvectors of the Hamil-
tonian describe stationary states, i.e., states whose physical properties are
the same at all times.
Stationary states are also states of well defined energy: Since the vector
|ψn i exp(−iEn t/~) is an eigenvector of Ĥ and eigenvectors correspond-
ing to different eigenenergies are orthogonal, a measurement of the energy
in the state |ψn i exp(−iEn t/~) would give En with probability 1. Corre-
spondingly, the uncertainty ∆E in the value of the energy is zero in that
state.
(d) Linear combinations of eigenvectors belonging to different eigenenergies
do not describe stationary states. In fact, any solution of Eq. (11.2) can be
written as an expansion on the eigenvectors and generalized eigenvectors
of Ĥ. Namely, if |Ψ(t)i is a time-dependent state vector and the Hamilto-
nian is time-independent, there exists a set of constant coefficients cn and
ck such that
|Ψ(t)i = Σn cn exp(−iEn t/~)|ψn i + ∫ ck exp(−iEk t/~)|ψk i dk,          (11.11)
where the |ψn i’s and |ψk i’s are, respectively, eigenvectors and generalized
eigenvectors of Ĥ corresponding to the energies En and Ek . (As usual, n
and k represent the sets of quantum numbers which must be specified to
identify each of these eigenstates unambiguously.)
(e) Suppose that we know |Ψ(t)i at a particular time, t = 0 say (for simplicity).
Eq. (11.11) tells us that
|Ψ(t = 0)i = Σn cn |ψn i + ∫ ck |ψk i dk.                                (11.12)

Assuming that the eigenvectors |ψn i and generalized eigenvectors |ψk i are
orthonormal, the coefficients cn and ck can then be calculated by projec-
tion: cn = hψn |Ψ(t = 0)i and ck = hψk |Ψ(t = 0)i. Plugging the results
into Eq. (11.11) then gives |Ψ(t)i at all times.
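+ A minimal numerical sketch of points (d) and (e) for a finite-dimensional
  system with no continuum part (a Python/NumPy example; the Hamiltonian below
  is an arbitrary Hermitian matrix and ~ is set to 1):

    import numpy as np

    hbar = 1.0
    rng = np.random.default_rng(3)
    X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    H = (X + X.conj().T) / 2                  # a time-independent Hermitian "Hamiltonian"

    E, V = np.linalg.eigh(H)                  # eigenenergies E_n and eigenvectors |psi_n>
    Psi0 = rng.normal(size=4) + 1j * rng.normal(size=4)
    Psi0 = Psi0 / np.linalg.norm(Psi0)        # |Psi(t = 0)>, normalized

    c = V.conj().T @ Psi0                     # projection: c_n = <psi_n|Psi(t = 0)>

    def Psi(t):
        # |Psi(t)> = sum_n c_n exp(-i E_n t / hbar) |psi_n>, cf. Eq. (11.11)
        return V @ (np.exp(-1j * E * t / hbar) * c)

    print(np.allclose(np.linalg.norm(Psi(2.7)), 1.0))     # the norm is conserved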

11.2 The evolution operator


Under time evolution, the state of a quantum system changes from one represented at
time t0 by a vector |Ψ(t0 )i to one represented at time t by a vector |Ψ(t)i. This time

evolution can be described as a transformation of |Ψ(t0 )i into |Ψ(t)i, this transforma-
tion being effected by an operator Û (t, t0 ) depending on t0 and t:

|Ψ(t)i = Û (t, t0 )|Ψ(t0 )i. (11.13)

More precisely, we define Û (t, t0 ) as being the operator which maps any vector |Ψ(t0 )i
to the vector |Ψ(t)i that |Ψ(t0 )i changes into under the time evolution governed by
the Schrödinger equation, for any t0 and any t. This operator is called the evolution
operator (or time evolution operator).
The requirement that |Ψ(t)i obeys the Schrödinger equation implies that

i~ (d/dt) Û (t, t0 ) = Ĥ Û (t, t0 ).                                     (11.14)

Together with the initial condition that Û (t = t0 , t0 ) = Iˆ (the identity operator),


Eq. (11.14) determines the evolution operator at all times once Ĥ is given.

+ Proof: Differentiating Eq. (11.13) with respect to time and multiplying each
side by i~ yields

i~ (d/dt)|Ψ(t)i = i~ [(d/dt)Û (t, t0 )] |Ψ(t0 )i.                        (11.15)
We also have, from Eq. (11.2),

i~ (d/dt)|Ψ(t)i = Ĥ |Ψ(t)i = Ĥ Û (t, t0 )|Ψ(t0 )i.                       (11.16)
Hence, for any |Ψ(t0 )i,

i~ [(d/dt)Û (t, t0 )] |Ψ(t0 )i = Ĥ Û (t, t0 )|Ψ(t0 )i.                   (11.17)
Eq. (11.14) follows. 

Note that solving Eq. (11.14) is, in general, as difficult as solving Eq. (11.2). The useful-
ness of introducing the time evolution operator lies primarily in the interesting theo-
retical developments possible in this approach. However, in the frequent case where
the Hamiltonian Ĥ is time-independent, a formal solution of Eq. (11.14) can be written
as
Û (t, t0 ) = exp[−iĤ(t − t0 )/~]. (11.18)

(See Section 3.3 for the definition of the exponential of an operator.) We stress that
Eq. (11.18) only applies to the case where Ĥ is time-independent. This equation is
generally not correct for time-dependent Hamiltonians.
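+ A numerical sketch of Eq. (11.18) and of the properties listed below (a Python
  example using scipy.linalg.expm for the exponential of a matrix; the Hamiltonian
  is an arbitrary Hermitian matrix and ~ is set to 1):

    import numpy as np
    from scipy.linalg import expm

    hbar = 1.0
    rng = np.random.default_rng(4)
    X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    H = (X + X.conj().T) / 2                       # a time-independent Hamiltonian

    def U(t, t0):
        # Eq. (11.18): U(t, t0) = exp[-i H (t - t0) / hbar]
        return expm(-1j * H * (t - t0) / hbar)

    t0, t1, t = 0.0, 0.8, 2.5
    print(np.allclose(U(t, t0), U(t, t1) @ U(t1, t0)))             # Eq. (11.19)
    print(np.allclose(U(t, t0).conj().T @ U(t, t0), np.eye(3)))    # U is unitary
    print(np.allclose(U(t0, t), np.linalg.inv(U(t, t0))))          # Eq. (11.21)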
Whether Eq. (11.18) applies or not, the evolution operator has the following properties:

1. Û (t0 , t0 ) = Iˆ since we must have that |Ψ(t0 )i = Û (t0 , t0 )|Ψ(t0 )i for any
|Ψ(t0 )i.

2. For any t0 , t1 and t,


Û (t, t0 ) = Û (t, t1 )Û (t1 , t0 ), (11.19)
since going from time t0 to time t1 and then from t1 to t is equivalent to going
from t0 to t.

3. In particular, Û (t0 , t0 ) = Û (t0 , t1 )Û (t1 , t0 ). Hence

Û (t0 , t)Û (t, t0 ) = Iˆ = Û (t, t0 )Û (t0 , t) (11.20)

for any t0 and t. Therefore the evolution operator is invertible and

Û (t0 , t) = Û −1 (t, t0 ). (11.21)

4. Û (t, t0 ) is a unitary operator. Hence

Û † (t, t0 ) = Û −1 (t, t0 ). (11.22)

+ Proof: Let us assume, to start, that the domain of Ĥ is the whole


of the relevant Hilbert space (we’ll revisit this assumption at the
end of this proof). Consider the inner product of two ket vectors
taken at a time t0 , hΦ(t0 )|Ψ(t0 )i. We saw, in Section 11.1, that this
quantity remains invariant when these two ket vectors evolve ac-
cording to the Schrödinger equation: hΦ(t)|Ψ(t)i = hΦ(t0 )|Ψ(t0 )i
if |Ψ(t)i = Û (t, t0 )|Ψ(t0 )i and |Φ(t)i = Û (t, t0 )|Φ(t0 )i. In terms
of the bra vectors conjugate to the ket vectors |Φ(t)i and |Φ(t0 )i,
hΦ(t)| = hΦ(t0 )| Û † (t, t0 ). Hence

hΦ(t0 )| Û † (t, t0 )Û (t, t0 )|Ψ(t0 )i = hΦ(t0 )|Ψ(t0 )i. (11.23)

This equation can also be written as

hΦ(t0 )| [Û † (t, t0 )Û (t, t0 ) − Iˆ] |Ψ(t0 )i = 0                       (11.24)

and must be true for any |Φ(t0 )i and any |Ψ(t0 )i. In particular, it
must be true for any |Ψ(t0 )i and for

|Φ(t0 )i = [Û † (t, t0 )Û (t, t0 ) − Iˆ] |Ψ(t0 )i.                        (11.25)

However, with this choice of ket |Φ(t0 )i, Eq. (11.24) reduces to the
equation hΦ(t0 )|Φ(t0 )i = 0, which implies that |Φ(t0 )i is the zero
vector. The operator Û † (t, t0 )Û (t, t0 ) − Iˆ thus maps every vector to
the zero vector, which is possible only if Û † (t, t0 )Û (t, t0 ) = Iˆ. Mul-
tiplying this equation on the right by the inverse of Û (t, t0 ) gives
Û † (t, t0 ) = Û −1 (t, t0 ). Therefore Û (t, t0 ) is a unitary operator.
Let us come back to the assumption made at the start that the do-
main of Ĥ is the whole Hilbert space (this assumption was made
necessary by our use of the Schrödinger equation, which makes
sense only for vectors in the domain of the Hamiltonian). This as-
sumption is not innocuous in infinite-dimensional Hilbert spaces, but
removing it would require adding to the proof a detailed discussion
of what the domains of Ĥ, of Ĥ † , of Û (t, t0 ) and of Û † (t, t0 ) actu-
ally are. However, this complication is unnecessary: time evolution
can be defined from the onset as being a unitary transformation ef-
fected by a unitary operator Û (t, t0 ) obeying Eq. (11.19), and the
Hamiltonian can then be introduced as the self-adjoint operator Ĥ
such that Û (t, t0 ) satisfies Eq. (11.14). (The Hamiltonian is indeed a
self-adjoint operator, i.e., Ĥ † = Ĥ, not merely a Hermitian opera-
tor.) The mathematical basis of this latter approach is an important
theorem of functional analysis called Stone’s theorem.

11.3 The Schrödinger picture and the Heisenberg picture


Recall that the expectation value hAi(t) of a dynamical variable A represented by an
operator  is hΨ(t)| Â|Ψ(t)i if the state of the system is described by the normalized
time-dependent vector |Ψ(t)i. It is often the case that operators describing dynamical
variables do not depend on time (think, e.g., of the position operator, the momentum
operator, the orbital angular momentum operator, the spin operator — none of them
depends on t). Hence, to keep the discussion as simple as possible, we will assume
that  does not vary in time. Now, since

hΨ(t)| Â|Ψ(t)i = hΨ(t0 )| Û † (t, t0 )ÂÛ (t, t0 )|Ψ(t0 )i, (11.26)

hAi(t) can also be seen as the expectation value of the time-dependent operator

ÂH (t) = Û † (t, t0 )ÂÛ (t, t0 ) (11.27)

in the state represented by the time-independent vector |Ψ(t0 )i:

hAi(t) = hΨ(t0 )| ÂH (t)|Ψ(t0 )i. (11.28)

Note that Eq. (11.27) can also be written as

ÂH (t) = Û (t0 , t)ÂÛ † (t0 , t), (11.29)

since Û † (t0 , t) = Û −1 (t0 , t) = Û (t, t0 ). This alternative form is completely consistent


with what we have seen in Section 3.13 about how operators transform under a uni-
tary transformation. Here the transformation is taken to be a time evolution from t to
t0 ; under this transformation, the vector |Ψ(t)i transforms to Û (t0 , t)|Ψ(t)i and the
operator  to Û (t0 , t)ÂÛ † (t0 , t). (The operator ÂH (t) is usually time-dependent even
when  isn’t, but we will see in Section 11.4 that there is an important exception to
this general rule.)

Remember what we have seen about calculating the probability of finding a par-
ticular value of a dynamical variable in a measurement — e.g, that the probabil-
ity of finding the eigenvalue λn of  is |hψn |Ψ(t)i|2 if λn is non-degenerate, if
Â|ψn i = λn |ψn i and if the ket vectors |ψn i and |Ψ(t)i are normalized. Since
ÂH (t) = Û (t0 , t)ÂÛ † (t0 , t) and Û (t0 , t) is a unitary operator, the operator ÂH (t)
has the same eigenvalues as the operator Â (see Section 10.2). Moreover, if Â|ψn i =
λn |ψn i, then (1) ÂH (t)|ψnH (t)i = λn |ψnH (t)i, where |ψnH (t)i = Û † (t, t0 )|ψn i (i.e.,
|ψnH (t)i is an eigenvector of ÂH (t) corresponding to the eigenvalue λn ), and also
(2) |hψn |Ψ(t)i|2 = |hψnH (t)|Ψ(t0 )i|2 . (The proof of these two assertions is left as an
exercise for the reader.) Thus

Pr(λn ; |Ψ(t)i) = |hψn |Ψ(t)i|2 = |hψnH (t)|Ψ(t0 )i|2 . (11.30)

We see that the probability of finding λn can be calculated either in terms of the ket
vector |Ψ(t)i and an eigenvector of  or in terms of the ket vector |Ψ(t0 )i and an
eigenvector of ÂH (t), and these two approaches are completely equivalent. They cor-
respond to two alternative descriptions of quantum systems. The first one is probably
the most familiar of the two: quantum states are described by time-dependent vectors
and observables by (usually) time-independent operators. This description is referred to
as the Schrödinger picture of Quantum Mechanics (or the Schrödinger representation).

The alternative description, in which the states are represented by time-independent
vectors and the observables by (usually) time-dependent operators, is referred to as the
Heisenberg picture of Quantum Mechanics (or Heisenberg representation).
In the Schrödinger picture, time evolution is governed by the time-dependent Schrödinger
equation, Eq. (11.2). Its counterpart in the Heisenberg picture is the Heisenberg equa-
tion of motion. If  does not depend on time, this equation reads
i~ (d/dt) ÂH (t) = [ÂH (t), ĤH (t)],                                     (11.31)
where
ĤH (t) = Û (t0 , t)Ĥ Û † (t0 , t). (11.32)

+ These considerations generalize to the case of observables represented by a


time-dependent operator even in the Schrödinger picture. E.g., for a time-
dependent Â(t), Eq. (11.29) simply reads ÂH (t) = Û (t0 , t)Â(t)Û † (t0 , t).
However, Eq. (11.31) only applies to the case where Â(t) is constant in
t. Instead, the Heisenberg equation of motion for the “Heisenberg op-
erator" ÂH (t) corresponding to a time-dependent “Schrödinger operator"
Â(t) reads

d d †
i~ ÂH (t) = [ÂH (t), ĤH (t)] + i~ Û (t0 , t) Û (t0 , t). (11.33)
dt dt

Proof: Let ÂH (t) = Û † (t, t0 )Â(t)Û (t, t0 ). Differentiating this product of
operators is done by differentiating one operator at a time, as for ordinary
functions:

i~ (d/dt) ÂH (t) = [i~ (d/dt) Û † (t, t0 )] Â(t) Û (t, t0 ) + Û † (t, t0 ) [i~ dÂ/dt] Û (t, t0 )
                   + Û † (t, t0 ) Â(t) [i~ (d/dt) Û (t, t0 )].            (11.34)

The derivative of Û (t, t0 ) is given by Eq. (11.14). Transforming that equa-


tion into an equation for the adjoint operators gives

−i~ (d/dt) Û † (t, t0 ) = Û † (t, t0 )Ĥ † = Û † (t, t0 )Ĥ,                (11.35)
where in the last step we have used the fact, mentioned in the note at
the end of Section 11.2, that Ĥ is self-adjoint. Making use of these two
relations in Eq. (11.34) yields Eq. (11.33). 

+ It follows from Eqs. (11.28) and (11.33) and from Eq. (11.37) of Section 11.4
that

(d/dt) hAi(t) = (1/i~) hΨ(t)|[Â, Ĥ]|Ψ(t)i + hΨ(t)| (dÂ/dt) |Ψ(t)i,       (11.36)
which is a general form of the Ehrenfest theorem (you have encountered
this theorem in the Term 1 QM course).

11.4 Constants of motion


We now focus on the important case of an observable A represented by an operator Â
which, in the Schrödinger picture, (1) does not depend on time, and (2) commutes with
the Hamiltonian.
We have seen, in Section 10.2, how commutators transform under a unitary transfor-
mation. Here, Eq. (10.53) says that

[ÂH (t), ĤH (t)] = Û (t0 , t)[Â, Ĥ]Û † (t0 , t). (11.37)

Thus [ÂH (t), ĤH (t)] ≡ 0 if [Â, Ĥ] ≡ 0. In that case, Eq. (11.31) says that
i~ (d/dt) ÂH (t) ≡ 0.                                                    (11.38)
Therefore ÂH (t) is constant in time and ÂH (t) ≡ Â if Â commutes with Ĥ. In turn,
this implies that the probability of finding any given value of the observable A re-
mains the same as t varies. On account of these facts, and by analogy with Classical
Mechanics, a dynamical variable represented by an operator  commuting with the
Hamiltonian is said to be a constant of motion.

+ For example, take the case of an atom of hydrogen exposed to an external


electric field Fext (t) ẑ, which varies in time but is spatially uniform (Fext (t)
depends on t but not the position variables x, y and z). The potential
energy of the electron changes by eFext (t)z due to this electric field, and
the Hamiltonian takes on the following form:

~2 2 e2 1
H=− ∇ − + eFext (t)z. (11.39)
2µ 4π0 r

Because of the interaction term eFext (t)z, the solutions of the time-
dependent Schrödinger equation may be extremely complicated. However,

this Hamiltonian commutes with Lz , the z-component of the orbital an-
gular momentum operator. Hence, the z-component of the orbital angular
momentum of the electron is a constant of motion. For instance, if at some
time the atom is in an eigenstate of Lz with eigenvalue m~, so that the
z-component of the electron’s orbital angular momentum is well defined
and equal to m~, then the atom will remain in an eigenstate of Lz corre-
sponding to that eigenvalue at all times, however complicated the wave
function might become as time increases.
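+ A finite-dimensional numerical sketch of this result (a Python example; the
  matrices below are arbitrary ones constructed so that the operator A commutes
  with the Hamiltonian H, and ~ is set to 1):

    import numpy as np
    from scipy.linalg import expm

    hbar = 1.0
    rng = np.random.default_rng(5)
    W, _ = np.linalg.qr(rng.normal(size=(3, 3)))    # a common eigenbasis for H and A
    H = W @ np.diag([1.0, 1.0, 2.0]) @ W.T
    A = W @ np.diag([0.5, -0.5, 3.0]) @ W.T

    print(np.allclose(H @ A - A @ H, 0))            # [A, H] = 0

    t = 1.7
    U = expm(-1j * H * t / hbar)                    # evolution operator for time t
    A_H = U.conj().T @ A @ U                        # Heisenberg operator A_H(t)
    print(np.allclose(A_H, A))                      # A_H(t) = A: a constant of motion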

12 Rotations and angular momentum

12.1 Orbital angular momentum


Later in these notes, we will see that the orbital angular momentum operator arises
naturally from the transformation properties of wave functions under a rotation. At a
more elementary level, however, this operator also crops up as the quantum mechanical
counterpart of the classical angular momentum of a mass point (specifically, the angular
momentum with respect to the origin of the system of coordinates). For memory, the
latter is defined by the equation

Lcl = r × pcl , (12.1)

where r is the position vector of the mass point and pcl is its momentum.
As discussed previously, we can define a position operator x̂ and a momentum operator
p̂x for the x-direction, a position operator ŷ and a momentum operator p̂y for the y-
direction, and a position operator ẑ and a momentum operator p̂z for the z-direction.
These operators can be taken to be the x-, y- and z-components of a vector position
operator r̂ and a vector momentum operator p̂:

r̂ = x̂ x̂ + ŷ ŷ + ẑ ẑ, (12.2)
p̂ = p̂x x̂ + p̂y ŷ + p̂z ẑ. (12.3)

(As in the rest of these notes, x̂, ŷ and ẑ are unit vectors in the x-, y- and z-directions.
The hat sign in x̂, ŷ and ẑ indicates that these objects are unit vectors, not that they are
operators, whereas the hat sign in x̂, ŷ and ẑ indicates that these objects are operators,
not that they are unit vectors.)
Likewise, the orbital angular momentum operator, L̂, is a geometric vector whose x-,
y- and z-components, denoted L̂x , L̂y and L̂z , are themselves operators:

L̂ = L̂x x̂ + L̂y ŷ + L̂z ẑ. (12.4)

This operator is related to the position operator r̂ and to the momentum operator p̂
in the same way as, in Classical Mechanics, the angular momentum of a mass point is
related to its position and its momentum:

L̂ = r̂ × p̂. (12.5)

The usual rules of the vector product apply, although the vectors are operators here. L̂
can thus be calculated as a determinant:

      | x̂    ŷ    ẑ  |
L̂ =   | x̂    ŷ    ẑ  | .                                                 (12.6)
      | p̂x   p̂y   p̂z |

This gives
L̂z = x̂pˆy − ŷ pˆx , (12.7)
and similar expressions for L̂x and L̂y which can be obtained from Eq. (12.7) by circular
permutation of the indices (x → y, y → z, z → x):

L̂x = ŷ pˆz − ẑ pˆy , L̂y = ẑ pˆx − x̂pˆz . (12.8)

The operator L̂2 also plays an important role in the theory. It is defined as the dot
product of L̂ with itself, i.e.,

L̂2 = L̂ · L̂ = L̂2x + L̂2y + L̂2z . (12.9)

It is not particularly difficult to deduce from these definitions and from the properties
of the position and momentum operators that L̂x , L̂y , L̂z and L̂2 are Hermitian and
satisfy the following commutation relations:

[L̂x , L̂y ] = i~L̂z , [L̂y , L̂z ] = i~L̂x , [L̂z , L̂x ] = i~L̂y , (12.10)

and furthermore
[L̂x , L̂2 ] = [L̂y , L̂2 ] = [L̂z , L̂2 ] = 0. (12.11)
Given Eqs. (12.7) and (12.8) and how the momentum operators p̂x , p̂y and p̂z are rep-
resented, the operators L̂z , L̂x and L̂y take on the following forms in the position
representation:

Lz = −i~ (x ∂/∂y − y ∂/∂x),                                              (12.12)
Lx = −i~ (y ∂/∂z − z ∂/∂y),                                              (12.13)
Ly = −i~ (z ∂/∂x − x ∂/∂z).                                              (12.14)

+ Passing to spherical polar coordinates (r, θ, φ) brings Lz to a particu-
larly simple form:

Lz = −i~ ∂/∂φ.                                                           (12.15)

However, somewhat more complicated expressions are found for Lx
and Ly :

Lx = −i~ (− sin φ ∂/∂θ − cot θ cos φ ∂/∂φ),                              (12.16)
Ly = −i~ (cos φ ∂/∂θ − cot θ sin φ ∂/∂φ).                                (12.17)
(We define the angles θ and φ in the usual way in Physics: θ is measured
from the positive z-axis and φ is measured in the xy-plane from the
positive x-axis. Lx , Ly and Lz do not depend on r or on derivatives
with respect to r.)

+ In the position representation, the Hamiltonian of a particle of mass m


and potential energy V (r) can be written as −(~2 /2m) ∇2 + V (r). In
Cartesian coordinates,

∇2 = ∂ 2 /∂x2 + ∂ 2 /∂y 2 + ∂ 2 /∂z 2 ,                                   (12.18)

whereas in spherical polar coordinates,

∇2 = ∂ 2 /∂r2 + (2/r) ∂/∂r + (1/r2 ) ∂ 2 /∂θ2 + (cot θ/r2 ) ∂/∂θ
     + [1/(r2 sin2 θ)] ∂ 2 /∂φ2 .                                         (12.19)

This last equation can be somewhat simplified, and made more trans-
parent, by using the fact that

L2 = L2x + L2y + L2z = −~2 [∂ 2 /∂θ2 + cot θ ∂/∂θ + (1/ sin2 θ) ∂ 2 /∂φ2 ].  (12.20)

Namely,

∇2 = ∂ 2 /∂r2 + (2/r) ∂/∂r − L2 /(~2 r2 ).                                (12.21)

Therefore

H = −(~2 /2m) ∇2 + V (r)
  = −(~2 /2m) [∂ 2 /∂r2 + (2/r) ∂/∂r] + L2 /(2mr2 ) + V (r).              (12.22)

Note that we take the potential to be central. I.e., we assume that V (r)
depends only on the distance of the particle to the origin, not on its
angular position. Such potentials are said to be central because the cor-
responding classical force, −∇V , is a vector directed towards or away
from a fixed point (the origin), the “centre of force". Remember that in
Classical Mechanics the angular momentum vector Lcl is a constant of
motion if the potential is central. Since H depends on θ and φ only
through L̂2 , that Lˆz commutes with L̂2 and that L̂2 and L̂z do not
depend on r, L̂2 and L̂z commute H. These two operators thus cor-
respond to quantum mechanical constants of motion if the potential is
central (see Section 11.4).
It is interesting to compare the Hamiltonian given by Eq. (12.22) with
the classical Hamiltonian for the same system,

Hcl = p2r /(2m) + |Lcl |2 /(2mr2 ) + V (r),                               (12.23)
where pr is the generalized momentum conjugate to the radial variable
r. In Quantum Mechanics as in Classical Mechanics, the radial motion
of the particle is affected both by the potential energy V (r) and by
an “angular momentum barrier", L2 /(2mr2 ) in the quantum case or
|L|2 /(2mr2 ) in the classical case, which plays the role of an additional
potential energy.

12.2 Spin
Physicists became aware in the 1920s that many quantum systems have an angular-
momentum like property, distinct from the orbital angular momentum. At first, it was
theorized that this property could be related to some kind of self-rotation of the parti-
cles forming these systems, a bit as if electrons, protons, etc, were spinning tops. This
property became referred to as “spin" for that reason. It was soon realized that associ-
ating spin to an actual rotation is completely incorrect — an electron is not a spinning
top — but the word “spin" kept being used. The modern understanding of spin is that
this property has nothing to do with an actual motion and has no analogue in Classical
Mechanics.
There is a relation with the orbital angular momentum, though, in that spin is a dy-
namical variable described by a Hermitian vector operator whose components obey

the same commutation relations as those of the orbital angular momentum operator L̂.
We denote this vector operator by Ŝ and its x-, y- and z-components by Ŝx , Ŝy and Ŝz :

Ŝ = Ŝx x̂ + Ŝy ŷ + Ŝz ẑ. (12.24)

As for the orbital angular momentum,

[Ŝx , Ŝy ] = i~Ŝz , [Ŝy , Ŝz ] = i~Ŝx , [Ŝz , Ŝx ] = i~Ŝy , (12.25)

and also
[Ŝx , Ŝ2 ] = [Ŝy , Ŝ2 ] = [Ŝz , Ŝ2 ] = 0, (12.26)
with the operator Ŝ2 being the dot product of Ŝ with itself:

Ŝ2 = Ŝ · Ŝ = Ŝx2 + Ŝy2 + Ŝz2 . (12.27)

However, the commutation relations for Ŝx , Ŝy and Ŝz cannot be derived from those of
the position and momentum operators, in contrast to those of L̂x , L̂y and L̂z . We will
see that they can be obtained from the rotational properties of quantum states.
It is important to understand that spin is unrelated to position, momentum or orbital
angular momentum. Thus

[r̂, Ŝ] = [p̂, Ŝ] = [L̂, Ŝ] = 0. (12.28)

Spin is sometimes described as an “intrinsic angular momentum", i.e., an angular mo-


mentum which is not defined with respect to a particular point. (Note the contrast with
the orbital angular momentum: since r is the position vector with respect to the origin,
r × p depends on the choice of the origin.)

12.3 Rotations and rotation operators


First, a fact of fundamental importance:

Rotations about different axes do not commute.

I.e., the order in which rotations are made matters when they are made about different
axes.

+ Should you have any doubt about the above statement, try this experiment:
Define two orthogonal axes, fixed with respect to the room, e.g., a vertical
axis and a horizontal axis. Take a book, and rotate it first by 90 deg about
the vertical axis and then by 90 deg about the horizontal axis. Note its
new position. Then start again with the book in the same initial position
as before, but now rotate it first by 90 deg about the horizontal axis and
then by 90 deg about the vertical axis. Its new position won’t be the same
as what you found in the first sequence of rotations...
Clearly, rotations about the same axis do commute. For example, rotating a
book by 20 deg about the vertical axis and then by 30 deg about the same
axis is the same as rotating it first by 30 deg and then by 20 deg.

+ As an exception to the general rule that rotations about different axes


do not commute, a rotation by 180 deg commutes with a rotation by
180 deg about a perpendicular axis.
+ Any rotation in 3D space can be described by a 3 × 3 matrix. For ex-
ample, rotating a point by an angle α about the x-axis changes its co-
ordinates from (x, y, z) to (x0 , y 0 , z 0 ), with

[ x0 ]   [ 1     0         0     ] [ x ]
[ y 0 ] = [ 0   cos α    − sin α ] [ y ] .                                (12.29)
[ z 0 ]   [ 0   sin α      cos α ] [ z ]

Rotations about the y- or z-axes can be represented similarly. Let us de-
note the corresponding rotation matrices by Rx (α), Ry (α) and Rz (α):

          [ 1     0         0     ]
Rx (α) =  [ 0   cos α    − sin α  ] ,                                     (12.30)
          [ 0   sin α      cos α  ]

          [  cos α   0    sin α ]
Ry (α) =  [   0      1     0    ] ,                                       (12.31)
          [ − sin α  0    cos α ]

          [ cos α   − sin α   0 ]
Rz (α) =  [ sin α     cos α   0 ] .                                       (12.32)
          [   0        0      1 ]
Remember that the positive sense of rotation is given by the right-hand
rule: imagine that you grip the rotation axis with your right hand, your
thumb pointing in the positive direction of that axis. The direction in
which your fingers then curl is the positive direction of rotation.

+ The following, rather obvious facts are also worth noting in view of
their importance in the mathematical theory of these transformations:

• Rotating an object several times, about the same axis or about


different axes, always amounts to rotating it once: the product of
two rotations is also a rotation.
• Composing rotations is associative: Doing a rotation A, then a
rotation B and then a third rotation C can be described as first
doing the resultant of A and B and then C, or first doing A and
then the resultant of B and C.
• There is a “neutral rotation" (a rotation by a zero angle about any
axis). Composing any rotation A with that identity rotation gives
A.
• Any rotation has an inverse, such that doing a rotation and then
its inverse rotation amounts to no change at all (the zero rotation).

These four facts mean that these transformations form a group with
respect to the composition of rotations (in the mathematical meaning
of the word group).
As noted above, rotating a point transforms its coordinates, and this
transformation can be represented by a certain 3 × 3 matrix. Matrices
representing these rotations have three special features: they are real
(not complex), their transpose equal their inverse, and their determi-
nant is 1. A real invertible matrix whose transpose is its inverse is said
to be orthogonal (or unitary since for a real matrix the transpose is the
same as the conjugate transpose). Orthogonal matrices of unit deter-
minant form a group under matrix multiplication: the product of two
such matrices is also an orthogonal matrix of unit determinant, matrix
multiplication is associative, the identity matrix is an orthogonal matrix
of unit determinant, and any such matrix has an inverse, which is also
an orthogonal matrix of unit determinant. This group is called SO(3)
(SO stands for “special orthogonal", the word special referring to the
condition that the determinant is 1). Transformations of coordinates
amounting to a rotation are in one-to-one correspondence with elements of
SO(3).
SO(3) is related to the group SU(2), the group of the unitary 2 × 2 com-
plex matrices of unit determinant. In fact general rotations in 3D space
are best described by SU(2) matrices rather than by SO(3) matrices. An

explanation of why this is the case would require a lengthy mathemat-
ical analysis of rotations, as would a detailed account of the mathemat-
ics of these 2 × 2 matrices and their relationship with SO(3) matrices.
We just note that these two descriptions may differ significantly for fi-
nite rotations, but they don’t for infinitesimal rotations (rotations by an
infinitesimal angle). For example, it is possible to show that Eqs. (S.12–
S.14) imply that

Ry (−)Rx (−)Ry ()Rx () = Rz (−2 ) + O(3 ), (12.33)

where the symbol O(3 ) means that terms cubic and of higher order in
 have been neglected. (See the homework problem QT2.6 for details.)
Eq. (S.30) is a relation between the SO(3) matrices of Eqs. (S.12–S.14).
However, exactly the same relation would be obtained for the SU(2)
matrices describing the same rotations.

+ An active rotation of an object is a rotation within a fixed system of


coordinates (fixed, e.g., with respect to the room): the rotation changes
the actual position of the object with respect to both the room and the
axes of the coordinate system. By contrast, a passive rotation would be
a rotation of the system of coordinates with respect to the room with
no actual change in the position the object. The coordinates of points
of this object change under a passive rotation as well as under an active
rotation, although usually not in the same way. We only consider active
rotations in these notes.

We start by looking at how wave functions transform when we map each point of
space to its image by a rotation about the x-axis. Consider, e.g., an atom of hydro-
gen in a state described by a wave function ψ(x, y, z), ignoring spin. At a point P of
coordinates (xP , yP , zP ), the value of this wave function is a certain complex num-
ber ψ(xP , yP , zP ). The rotation maps P to a point P 0 of coordinates (x0P , yP0 , zP0 ). In
general, the value of the wave function ψ(x, y, z) at P 0 differs from its value at P . How-
ever, we can define a function ψ 0 (x, y, z) whose value at P 0 is the same as the value of
ψ(x, y, z) at P . How to do this is simple: we take ψ 0 (x, y, z) to be the function whose
value at the point of coordinate (x, y, z) is ψ(x00 , y 00 , z 00 ), where (x00 , y 00 , z 00 ) are the co-
ordinates of the point sent to the point (x, y, z) by the rotation [e.g., (x00 , y 00 , z 00 ) would
be (xP , yP , zP ) if (x, y, z) was (x0P , yP0 , zP0 )]. Clearly, if ψ(x, y, z) describes an atomic
state oriented in the z-direction (e.g., the 2pm=0 state), ψ 0 (x, y, z) describes a state ori-
ented in a different direction but otherwise identical to that described by ψ(x, y, z).

Passing from ψ(x, y, z) to ψ 0 (x, y, z) is a transformation. Formally,

ψ 0 (x, y, z) = Rx (α)ψ(x, y, z), (12.34)

where Rx (α) is a certain operator corresponding to a rotation by an angle α about the


x-axis. This operator is potentially quite complicated, as we want Eq. (12.34) to apply
to any wave function ψ(x, y, z), not just one in particular. We will see how to construct
this operator shortly. However, at this stage we simply note that Rx (α = 0) must be
the identity operator since a rotation by a zero angle amounts to no rotation at all.
Moreover, the rotation operator for a rotation by an infinitesimal angle dα must differ
from the identity operator only infinitesimally. I.e., there must be an operator Lx such
that
Rx (dα) = 1 − (i/~)Lx dα. (12.35)
(The factor of −i/~ multiplying Lx has been introduced for later convenience.) Hence,
for an infinitesimal rotation,

ψ 0 (x, y, z) = [1 − (i/~)Lx dα] ψ(x, y, z). (12.36)

Remarkably, it follows from this that

Lx = −i~ (y ∂/∂z − z ∂/∂y).                                              (12.37)
Thus Lx is nothing else than the x-component of the orbital angular momentum op-
erator in the position representation. Rotations about the y-axis or the z-axis can be
treated in the same way: operators Ly and Lz can be introduced in terms of which
the corresponding rotation operators Ry (α) and Rz (α) reduce to 1 − (i/~)Ly dα and
1 − (i/~)Lz dα for infinitesimal rotations, and the same reasoning identifies Ly and
Lz with the respective components of the orbital angular momentum operator in the
position representation. Note that here we did not assume from the start that Lx , Ly
and Lz are the three components of the cross product of the position operator and the
momentum operator, as we did in Section 12.1. Instead, we have obtained these three
operators entirely from an analysis of how wave functions transform under a rotation.

+ The reader is referred to the model solution of Problem QT2.6 for the
principle of the calculation leading to Eq. (12.37).

The wave function ψ 0 (x, y, z) thus describes a state which is rotated about the x-axis
compared to the state described by the wave function ψ(x, y, z). In a sense, going from
ψ(x, y, z) to ψ 0 (x, y, z) is “rotating the atom" from one orientation to another. Imagine

an experiment in which the atom is excited from the ground state to the 2pm=0 state
by a laser beam whose electric field component is oriented in the z-direction (you will
study this process in the level 3 QM course). Let ψ(x, y, z) be the corresponding wave
function. Arranging for the electric field component of the laser to be oriented in a
direction rotated by an angle α about the x-axis will instead lead to an excited state
oriented in that different direction. This state can be described by the wave function
ψ 0 (x, y, z). The two states ψ(x, y, z) and ψ 0 (x, y, z) thus differ by a rotation R of the
apparatus used to prepare them. Moreover, any prediction one can make about the
results of measurements on the atom in the state ψ(x, y, z) are exactly the same as
those for measurements on the atom in the state ψ 0 (x, y, z), provided the measuring
apparatus is also rotated by R.
This carries over to more general systems, even to systems which are not amenable to
a description in terms of functions of x, y, and z. Imagine an experiment in which a
certain quantum system can be prepared and measured with the apparatus either in
an orientation A or in an orientation B differing from A by a rotation. Any quantum
state |ψi relevant for measurements made with the apparatus in the orientation A has
a counterpart |ψ 0 i for measurements made with the apparatus in the orientation B.
How each of the states |ψi is related to the “rotated state" |ψ 0 i depends on the axis and
the angle of the rotation which brings the apparatus from one orientation to the other.
This transformation can be expressed by the equation

|ψ 0 i = R̂n̂ (α)|ψi, (12.38)

where R̂n̂ (α) is a certain operator corresponding to a rotation by an angle α about an


axis pointing in the direction of the unit vector n̂. In particular, we can introduce the
operators R̂x (α), R̂y (α) and R̂z (α) corresponding to a rotation by an angle α about
the x-, y- or z-axis.
Since |ψi ≡ |ψ 0 i for a rotation by a zero angle, the operator R̂n̂ (α) must differ infinites-
imally from the identity operator if α is infinitesimal. The usual way of stating this
property is to introduce an α-independent Hermitian operator Jˆn̂ and write R̂n̂ (dα) =
Iˆ − (i/~)Jˆn̂ dα. In particular, R̂x (dα) = Iˆ − (i/~)Jˆx dα, R̂y (dα) = Iˆ − (i/~)Jˆy dα
and R̂z (dα) = Iˆ− (i/~)Jˆz dα. Since the identity operator has no physical dimensions,
the operators Jˆn̂ , Jˆx , Jˆy and Jˆz must have the same physical dimensions as ~, i.e., the
physical dimensions of an angular momentum. (These operators are, of course, angular
momentum operators. The letters J and j are commonly used to refer to general an-
gular momentum operators. In Atomic Physics they are also used to refer specifically
to the sum of the orbital angular momentum operator and the spin operator, but here
they simply refer to a general, unspecified, angular momentum.)

As seen in the homework problem QT2.6, it follows from the commutation relation
between 3D rotations that the three operators Jˆx , Jˆy and Jˆz do not commute with each
other, and instead that
[Jˆx , Jˆy ] = i~Jˆz , [Jˆy , Jˆz ] = i~Jˆx , [Jˆz , Jˆx ] = i~Jˆy . (12.39)

+ The detail of the calculations leading to this key result can be found in
the model solution of this problem.
The derivation may seem to be based on a (a priori reasonable) assump-
tion that if R1 (α1 ) and R2 (α2 ) are two rotation matrices describing ge-
ometric rotations of points in 3D space, and R1 (α1 ) and R2 (α2 ) are
the corresponding rotation operators, then R2 (α2 )R1 (α1 ) is the rota-
tion operator corresponding to the geometric rotation described by the
matrix R2 (α2 )R1 (α1 ). In fact, this assumption would not be correct
in general. It is correct for the transformations of wave functions dis-
cussed earlier, though, and it is always correct if the angles α1 and α2
are infinitesimal (which is the case considered in the derivation). The
issue is related to an important mathematical detail alluded to above,
which is that the geometric rotations of points in 3D space are described
by elements of the group SO(3) (the group of the orthogonal 3 × 3 real
matrices with determinant equal to 1) whereas the rotation operators
transforming quantum states are represented by elements of the related
group SU(2) (the group of the unitary 2 × 2 complex matrices with de-
terminant equal to 1).
+ A position vector r is transformed into the vector r+dr = r+ n̂×r dα
by an infinitesimal rotation of angle dα about an axis in the direction of
the unit vector n̂. If the x-, y- and z-components of n̂ are, respectively,
sin Θ cos Φ, sin Θ sin Φ and cos Θ, then
r+dr = r+sin Θ cos Φ (x̂×r) dα+sin Θ sin Φ (ŷ×r) dα+cos Θ (ẑ×r) dα.
(12.40)
Correspondingly,
 
R̂n̂ (dα) = Iˆ − (i/~) sin Θ cos Φ Jˆx + sin Θ sin Φ Jˆy + cos Θ Jˆz dα.
(12.41)
Thus Jˆn̂ = n̂ · Ĵ with Ĵ = Jˆx x̂ + Jˆy ŷ + Jˆz ẑ.
+ Because the norm of the ket |ψ 0 i = R̂n̂ (α)|ψi ought to be the same as
the norm of the ket |ψi, the rotation operator R̂n̂ (α) must be unitary:
R̂n̂† (α)R̂n̂ (α) = R̂n̂ (α)R̂n̂† (α) = Iˆ.                                     (12.42)

For infinitesimal rotations,

R̂n̂ (dα) = Iˆ − (i/~)Jˆn̂ dα, R̂n̂† (dα) = Iˆ + (i/~)Jˆn̂† dα, (12.43)

and therefore, ignoring terms quadratic in dα,

R̂n̂† (dα)R̂n̂ (dα) = R̂n̂ (dα)R̂n̂† (dα) = Iˆ − (i/~)(Jˆn̂ − Jˆn̂† ) dα.         (12.44)
In view of Eq. (12.42), it must be the case that Jˆn̂ = Jˆn̂† : the angular
momentum operators are self-adjoint.

Spin vs. orbital angular momentum


Depending on the system, the operators Jˆx , Jˆy and Jˆz , and more generally Jˆn̂ could
be spin operators, orbital angular momentum operators, or sums of spin operators and
orbital angular momentum operators.
The operator Jˆx x̂ + Jˆy ŷ + Jˆz ẑ can be identified with the spin operator Ŝ if the sys-
tem has no spatial extension (i.e., it is a point particle such as an electron). As seen
above, it can be identified with the orbital angular momentum operator L̂ if the system
has a spatial extension but no spin. The relevant angular momentum operators for ex-
tended systems of non-zero spin are the sum of a spin operator and an orbital angular
momentum operator.
Infinitesimal generators and finite rotations
We have seen that for rotation by an infinitesimal angle α,

R̂n̂ (α) = Iˆ − (i/~)Jˆn̂ α. (12.45)

One says that the angular momentum operator Jˆn̂ is the infinitesimal generator of the
rotations about the axis n̂ in the Hilbert space of the state vectors of the system.
Eq. (12.45) does not apply to the case of a finite rotation angle. Instead, for any finite or
infinitesimal value of α,
R̂n̂ (α) = exp(−iαJˆn̂ /~).                                                 (12.46)


Remember that the exponential of an operator is defined by its Taylor series (see Section
3.3). Here,

exp(−iαJˆn̂ /~) = Iˆ + (−iα/~)Jˆn̂ + (1/2!)(−iα/~)2 Jˆn̂2 + (1/3!)(−iα/~)3 Jˆn̂3 + · · ·   (12.47)
Thus Eqs. (12.45) and (12.46) are consistent up to first order in α.

+ The momentum operator is the infinitesimal generator of translations in
space: as seen in a workshop problem,
exp(−ix0 P/~)ψ(x) = ψ(x − x0 ), (12.48)
where x0 is a length and P = −i~ d/dx.
Likewise, the Hamiltonian is the infinitesimal generator of translations in
time (see Section 11.2 of these notes).

12.4 Symmetries and conservation laws


Compare the ket |ψi representing the state of a molecule oriented in one direction to
the ket |ψ 0 i representing exactly the same state but rotated by an angle α about an axis
n̂ (e.g., about the x-axis). These two kets are related to each other by the equation
|ψ 0 i = R̂n̂ (α)|ψi, (12.49)
where R̂n̂ (α) is a rotation operator. This operator transforms ket vectors of the unro-
tated system into ket vectors of the rotated system, and is the same for any ket vector
(i.e., R̂n̂ (α) does not depend on |ψi).
Suppose that a measurement of the energy of this molecule would be made, e.g., by
recording the energy of the photon(s) it emits when de-exciting to the ground state.
The energy is a physical quantity which does not have a direction (it is not vectorial).
The values that could be obtained in this measurement and the probability of obtaining
each of them should therefore be exactly the same in the state |ψ 0 i as in the state |ψi,
since these two states only differ in their orientation. (This might not be true if the
molecule interacted, e.g., with an external electric or magnetic field, which would break
the rotational symmetry of space and make |ψi and |ψ 0 i inequivalent; we assume that
this is not the case here.) In particular, the expectation value of the Hamiltonian must
be the same in the rotated system as in the unrotated system. In other words,
hψ 0 | Ĥ |ψ 0 i = hψ| Ĥ |ψi (12.50)
if the system is invariant under rotation. It is possible to deduce from this that Jˆn̂
commutes with Ĥ (and similarly for any other rotation axis).
+ Proof: As shown earlier, rotation operators are unitary operators.
Therefore R̂n̂† (α) = R̂n̂−1 (α), and since a rotation by −α undoes a ro-
tation by α, R̂n̂−1 (α) = R̂n̂ (−α). Thus

hψ 0 | = hψ| R̂n̂† (α) = hψ| R̂n̂−1 (α) = hψ| R̂n̂ (−α). (12.51)

As seen above, R̂n̂ (±α) can be taken to be equal to Iˆ ∓ iαJˆn̂ /~ if the
angle α is infinitesimally small, with Jˆn̂ the component of the angu-
lar momentum operator in the n̂ direction and Iˆ the identity operator.
Denoting this infinitesimal angle by ε, we can therefore write

hψ 0 | Ĥ |ψ 0 i = hψ| [Iˆ + (iε/~)Jˆn̂ ] Ĥ [Iˆ − (iε/~)Jˆn̂ ] |ψi
             = hψ| Ĥ + (iε/~)Jˆn̂ Ĥ − (iε/~)Ĥ Jˆn̂ + (ε2 /~2 )Jˆn̂ Ĥ Jˆn̂ |ψi.   (12.52)

Neglecting the term of order ε2 compared to the terms of order ε on
account that ε is infinitesimally small, we obtain

hψ 0 | Ĥ |ψ 0 i = hψ| Ĥ + (iε/~)Jˆn̂ Ĥ − (iε/~)Ĥ Jˆn̂ |ψi
             = hψ| Ĥ + (iε/~)[Jˆn̂ Ĥ − Ĥ Jˆn̂ ]|ψi
             = hψ| Ĥ |ψi + (iε/~) hψ|[Jˆn̂ , Ĥ]|ψi.                         (12.53)

In view of Eq. (12.50), hψ|[Jˆn̂ , Ĥ]|ψi must be zero. Since this must be
the case for any state vector |ψi, we can conclude that [Jˆn̂ , Ĥ] = 0. 

We see that the requirement that an isolated system is invariant under rotation, which
stems from the isotropy of space (space is identical in any direction), implies that the
angular momentum operator commutes with the Hamiltonian, hence that the angular
momentum is a constant of motion (see Section 11.4).
This relationship between the symmetry of the system (here, invariance under rotation)
and the existence of a conserved quantity (here, the angular momentum vector) is in
fact very general. For example, momentum is conserved if the system is invariant under
a spatial translation, and energy is conserved if the system is invariant under a “time
translation", t → t + τ with τ constant (there is invariance under a “time translation"
in the absence of any time-dependent interaction).

12.5 Angular momentum operators
A vector operator
Ĵ = Jˆx x̂ + Jˆy ŷ + Jˆz ẑ (12.54)
is said to be an angular momentum operator if its three components Jˆx , Jˆy and Jˆz are
Hermitian and satisfy the commutation relations

[Jˆx , Jˆy ] = i~Jˆz , [Jˆy , Jˆz ] = i~Jˆx , [Jˆz , Jˆx ] = i~Jˆy . (12.55)

The orbital angular momentum operator L̂ and the spin operator Ŝ are particular in-
stances of angular momentum operators.

• Angular momenta are often denoted by the letter J. Accordingly, in this section
we use J to denote general angular momentum operators. This letter is also often
used to represent, specifically, the “total angular momentum operator" L̂ + Ŝ;
what we cover here applies to L̂ + Ŝ as well as to any other angular momentum
operator.

• Knowing that [Jˆx , Jˆy ] = i~Jˆz , the other commutation relations can be obtained
by circular permutation of the indices (x → y, y → z, z → x).

We denote by Ĵ2 the dot product of the operator Ĵ with itself:

Ĵ2 = Ĵ · Ĵ = Jˆx2 + Jˆy2 + Jˆz2 . (12.56)

It is not difficult to show that the commutation relations satisfied by Jˆx , Jˆy and Jˆz
imply that these three operators commute with Ĵ2 :

[Jˆx , Ĵ2 ] = [Jˆy , Ĵ2 ] = [Jˆz , Ĵ2 ] = 0. (12.57)

Like Jˆx , Jˆy and Jˆz , Ĵ2 is a Hermitian operator.


Since Ĵ2 and Jˆz commute, any eigenvector of Ĵ2 can be written as a linear combination
of normalized ket vectors |j, mi that are eigenvectors of both Ĵ2 and Jˆz (see Section
5.3). The following is found, after rather long calculations:

1. The eigenvalues of Ĵ2 are j(j + 1)~2 , where j is a non-negative integer (0, 1,
2,. . . ) or half-integer (1/2, 3/2, 5/2,. . . ). In particular,

Ĵ2 |j, mi = j(j + 1)~2 |j, mi. (12.58)

2. The eigenvalues of Ĵz are m~, where m is an integer (0, ±1, ±2,. . . ) or half-
integer (±1/2, ±3/2, ±5/2,. . . ). In the case of simultaneous eigenvectors of Ĵ2
and Jˆz , the possible values of j and m are restricted to the range −j ≤ m ≤ j,
with m running from −j to j by integer steps. Therefore, for any of the |j, mi’s,

Jˆz |j, mi = m~|j, mi (12.59)

with m equal to −j, −j + 1,. . . , j − 1 or j.

3. hj, m|j 0 , m0 i = 0 if j 6= j 0 or m 6= m0 since eigenvectors belonging to different


eigenvalues of a Hermitian operator are always orthogonal. Thus

hj, m|j 0 , m0 i = δjj 0 δmm0 . (12.60)

For example, in the case of an orbital angular momentum operator, the operators Jˆz
and Ĵ2 correspond, in the position representation, to the operators Lz and L2 = L2x +
L2y + L2z , and the eigenvectors |j, mi to the spherical harmonics Ylm (θ, φ):

L2 Ylm (θ, φ) = l(l + 1)~2 Ylm (θ, φ), (12.61)


Lz Ylm (θ, φ) = m~Ylm (θ, φ), (12.62)

with l = 0, 1, 2, . . . (half-integer values of l are not allowed) and −l ≤ m ≤ l. The


spherical harmonics are thus simultaneous eigenfunctions of L2 and Lz .

+ Whereas j can in general be a half-integer as well as an integer, in the spe-


cific case of the orbital angular momentum this quantum number (usually
denoted by l for an orbital angular momentum) can only be an integer. An
in-depth analysis of why this must be so is beyond the scope of the course.

+ The choice of Jˆz , rather than another component of Ĵ, for defining the ba-
sis vectors |j, mi is purely conventional (however, it is a time-honoured
convention and everybody abides by it). There is nothing special about the
z-direction. Instead of Jˆz , one could use, for example, the component Jˆn̂
of Ĵ in an arbitrary direction defined by the unit vector n̂ (Jˆn̂ = n̂ · Ĵ). All
what we said above about Jˆz also applies to Jˆn̂ : irrespective of the direc-
tion n̂, Jˆn̂ commutes with Ĵ2 , the eigenvalues of Jˆn̂ are m~ with m = 0,
±1/2, ±1, etc., one can construct a basis of the Hilbert space with simulta-
neous eigenvectors of Ĵ2 and Jˆn̂ , and for these eigenvectors −j ≤ m ≤ j.
However, if n̂ is not in the z-direction, these simultaneous eigenvectors of
Ĵ2 and Jˆn̂ in general will not be the same as the eigenvectors of Ĵ2 and Jˆz
defined above.

+ Rotating an eigenvector of Jˆn̂ by an angle α about the axis n̂ simply mul-
tiplies this eigenvector by a phase factor exp(−imα). For example, rotate
the state |j, mi about the z-axis: Since Jˆz |j, mi = ~m|j, mi, R̂z (α)|j, mi =
exp(−iαJˆz /~)|j, mi = exp(−iαm)|j, mi. Intriguingly, this means that
if j (and thus m) is a half integer, then a rotation by 2π transforms |j, mi
into −|j, mi: only a 4π rotation brings |j, mi back to itself...
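+ This sign change can be checked directly (a Python sketch; in the basis
  {|1/2, 1/2i, |1/2, −1/2i}, Eq. (12.59) gives the matrix Jz = (~/2) diag(1, −1),
  and ~ is set to 1 in the code):

    import numpy as np
    from scipy.linalg import expm

    hbar = 1.0
    Jz = (hbar / 2) * np.diag([1.0, -1.0])            # Jz for j = 1/2

    R_2pi = expm(-1j * 2 * np.pi * Jz / hbar)         # rotation by 2*pi about z
    R_4pi = expm(-1j * 4 * np.pi * Jz / hbar)         # rotation by 4*pi about z
    print(np.allclose(R_2pi, -np.eye(2)))             # a 2*pi rotation gives a factor -1
    print(np.allclose(R_4pi,  np.eye(2)))             # only a 4*pi rotation gives +1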

+ How do the eigenvectors of Ĵ2 transform under a rotation? As


stated above, Ĵ2 commutes with the projection of Ĵ on any direction:
[Ĵ2 , Jˆn̂ ] = 0 for any direction n̂. Since these two operators commute,
Jˆn̂ transforms any eigenvector of Ĵ2 into an eigenvector of Ĵ2 belong-
ing to the same eigenvalue (see Section 3.11 of these notes). In other
words, if Ĵ2 |ψi = j(j + 1)~2 |ψi, then the vector Jˆn̂ |ψi is also an
eigenvector of Ĵ2 and

Ĵ2 Jˆn̂ |ψi = j(j + 1)~2 Jˆn̂ |ψi (12.63)


 

for the same value of j.


A consequence of this fact is that rotations transform any eigenvector
of Ĵ2 into an eigenvector of Ĵ2 belonging to the same eigenvalue. For
example, a p-state of an atom of hydrogen remains a p-state under a
rotation.
Proof: A rotation by an angle α about a direction n̂ transforms a ket
vector |ψ⟩ into a ket vector |ψ′⟩ = R̂n̂(α)|ψ⟩. In view of Eq. (12.46),

|ψ′⟩ = exp(−iαĴn̂/ℏ) |ψ⟩.          (12.64)




Since Jˆn̂ commutes with Ĵ2 , any power of Jˆn̂ also commutes with Ĵ2 .
Therefore exp(−iαJˆn̂ /~) also commutes with Ĵ2 . Hence, if Ĵ2 |ψi =
j(j + 1)~2 |ψi, then, for any rotation angle α and any rotation axis n̂,

Ĵ2 R̂n̂ (α)|ψi = j(j + 1)~2 R̂n̂ (α)|ψi , (12.65)


 

which means that the rotated eigenvector |ψ 0 i = R̂n̂ (α)|ψi is also an


eigenvector of Ĵ2 corresponding to the same quantum number j. 

+ The operators Ĵ+ and Ĵ−, defined as Ĵ± = Ĵx ± iĴy, both commute
with Ĵ². However, they do not commute with each other and they are
not Hermitian; in fact, Ĵ+† = Ĵ−. These two operators play the role

of ladder operators for angular momentum. In particular, one finds,
through algebraic methods, that

Jˆ+ |j, mi = [j(j + 1) − m(m + 1)]1/2 ~|j, m + 1i, (12.66)


Jˆ− |j, mi = [j(j + 1) − m(m − 1)]1/2 ~|j, m − 1i, (12.67)

with Jˆ+ |j, ji = 0 and Jˆ− |j, −ji = 0.
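
The following short numerical sketch (Python with NumPy; it is not part of the notes proper, ℏ is set to 1 and the value j = 1 is chosen purely for illustration) builds the matrices of Ĵ+, Ĵ−, Ĵx, Ĵy and Ĵz from Eqs. (12.66) and (12.67) and checks that [Ĵx, Ĵy] = iℏĴz and Ĵ² = j(j + 1)ℏ² Î, as must be the case.

import numpy as np

# Sketch (hbar = 1, j = 1 for illustration): matrices in the basis
# |j,j>, |j,j-1>, ..., |j,-j>, built from Eqs. (12.66)-(12.67).
jq = 1
ms = np.arange(jq, -jq - 1, -1)            # m = j, j-1, ..., -j
dim = ms.size

Jp = np.zeros((dim, dim))                  # <j,m+1| J+ |j,m> = sqrt(j(j+1) - m(m+1))
for col, m in enumerate(ms):
    if m < jq:
        Jp[col - 1, col] = np.sqrt(jq*(jq + 1) - m*(m + 1))
Jm = Jp.T                                  # J- is the adjoint of J+ (real matrix here)

Jx = (Jp + Jm) / 2
Jy = (Jp - Jm) / 2j
Jz = np.diag(ms).astype(float)

print(np.allclose(Jx @ Jy - Jy @ Jx, 1j * Jz))          # True: [Jx, Jy] = i Jz
J2 = Jx @ Jx + Jy @ Jy + Jz @ Jz
print(np.allclose(J2, jq*(jq + 1) * np.eye(dim)))       # True: J^2 = j(j+1) I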

12.6 Matrix representation of angular momentum opera-


tors
Simultaneous eigenvectors of both Ĵ2 and Jˆz were introduced in the previous sec-
tion (the normalized kets |j, mi). Recall that Ĵ2 and Jˆz are Hermitian operators, that
Ĵ2 |j, mi = j(j + 1)~2 |j, mi and Ĵz |j, mi = m~|j, mi, that such simultaneous eigen-
vectors exist for m ranging from −j to j by integer steps (hence, for a given value of
j, m can take 2j + 1 different values) and that hj, m|j 0 , m0 i = δjj 0 δmm0 . The set

{|0, 0i, |1/2, −1/2i, |1/2, 1/2i, |1, −1i, |1, 0i, |1, 1i, |3/2, −3/2i, . . .}

thus forms an orthonormal basis of the relevant Hilbert space.

+ Writing this, we implicitly assumed that the quantum numbers j and m


suffice to identify each of the simultaneous eigenvectors |j, mi uniquely.
This is not always the case. For example, the 2p (m = 0) and 3p (m = 0) states of
atomic hydrogen are linearly independent although they both have l = 1
and m = 0 (remember that for the orbital angular momentum operator L̂,
the quantum number j is usually denoted l). Instead of |j, mi, one could
write |j, m, τ i, where τ is a quantum number or a set of quantum num-
bers such that two kets |j, m, τ i and |j, m, τ 0 i differing by these additional
quantum numbers are always orthogonal. Doing so would complicate the
notation further, and therefore we do not do it here.

Since ⟨j′, m′|j, m⟩ = 0 when j′ ≠ j and since an eigenvector of Ĵ² is transformed by Ĵz
into an eigenvector of Ĵ² corresponding to the same value of j, ⟨j′, m′| Ĵz |j, m⟩ = 0
if j ≠ j′. In fact, this is the case for any component of Ĵ: e.g., ⟨j′, m′| Ĵx |j, m⟩ =
⟨j′, m′| Ĵy |j, m⟩ = 0 if j ≠ j′. It thus makes sense to consider each value of j sepa-
rately, and for a given j to represent Ĵx, Ĵy and Ĵz by (2j + 1) × (2j + 1) matrices in a

basis formed by the 2j +1 eigenvectors |j, mi with m = −j, . . . , j. Therefore these op-
erators have 1-dimensional representations (for j = 0), 2-dimensional representations
(for j = 1/2), 3-dimensional representations (for j = 1), etc.
The Pauli matrices
The j = 1/2 case is particularly important because electrons, quarks, protons and neu-
trons are spin-1/2 particles. As mentioned previously, spin corresponds to an angular
momentum operator Ŝ. The eigenvalues of Ŝ2 are s(s + 1)~2 with s = 0, 1/2, 1,. . .
The electron, the proton and the neutron are said to be spin-1/2 particles because the
kets representing their quantum state are always eigenvectors of Ŝ2 with s = 1/2
(otherwise these kets would be physically incorrect).
Let us consider the more general case of an angular momentum operator Ĵ and of a
system in an eigenstate of Ĵ2 with eigenvalue j = 1/2. When j = 1/2, the quantum
number m has only two possible values in simultaneous eigenstates of Ĵ2 and Jˆz , i.e.,
−1/2 and 1/2. The Jx , Jy and Jz operators are thus represented by 2 × 2 matrices in
that case.
It is customary to work in the basis {|1/2, 1/2i, |1/2, −1/2i}. (Note the order of the
basis vectors in that set; the matrices representing the relevant operators would be
different if we worked in the {|1/2, −1/2⟩, |1/2, 1/2⟩} basis.) As a reminder, the
two basis vectors |1/2, 1/2i and |1/2, −1/2i are such that
Ĵz |1/2, ±1/2⟩ = ±(ℏ/2) |1/2, ±1/2⟩.          (12.68)
Alternative notations for |1/2, 1/2i (the “state of spin up") and |1/2, −1/2i (the “state
of spin down") are |+i and |−i, |χ+ i and |χ− i, and | ↑ i and | ↓ i:

|1/2, 1/2i = |+i = |χ+ i = | ↑ i, |1/2, −1/2i = |−i = |χ− i = | ↓ i. (12.69)

(The terms “spin up" and “spin down" are conventional and do not reflect a particular
orientation with respect to the vertical direction.)
The matrix Jz representing Ĵz in the {|+⟩, |−⟩} basis is therefore diagonal. Specifically,

Jz = (ℏ/2) [ 1   0 ]
           [ 0  −1 ] .          (12.70)

It can be shown that Ĵx and Ĵy are represented by the following matrices in this basis
(see Worksheet 9 for a proof):

Jx = (ℏ/2) [ 0  1 ]          Jy = (ℏ/2) [ 0  −i ]
           [ 1  0 ]                     [ i   0 ] .          (12.71)

Jx , Jy and Jz are often written in terms of the Pauli matrices σx , σy and σz :

Jx = (~/2)σx , Jy = (~/2)σy , and Jz = (~/2)σz , (12.72)

where

σx = [ 0  1 ]       σy = [ 0  −i ]       σz = [ 1   0 ]
     [ 1  0 ]            [ i   0 ]            [ 0  −1 ] .          (12.73)
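
As a quick numerical illustration (a sketch only, with ℏ set to 1; it is not needed for what follows), one can verify some standard properties of the Pauli matrices, and hence the commutation relation [Ĵx, Ĵy] = iℏĴz for the spin-1/2 matrices:

import numpy as np

# Standard Pauli-matrix identities, checked numerically (not proved):
# sigma_i^2 = I, sigma_x sigma_y = i sigma_z, {sigma_x, sigma_y} = 0.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

print(np.allclose(sx @ sx, I2), np.allclose(sy @ sy, I2), np.allclose(sz @ sz, I2))
print(np.allclose(sx @ sy, 1j * sz))            # sigma_x sigma_y = i sigma_z
print(np.allclose(sx @ sy + sy @ sx, 0 * I2))   # sigma_x and sigma_y anticommute
# Commutation relation for Jx = sx/2, Jy = sy/2, Jz = sz/2 (hbar = 1):
print(np.allclose((sx/2) @ (sy/2) - (sy/2) @ (sx/2), 1j * sz/2))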

Any j = 1/2 eigenvector of Ĵ2 can be written as a linear combination of the vectors
|+i and |−i. Namely, if |ψi is such that Ĵ2 |ψi = j(j + 1)~2 |ψi with j = 1/2, then
there always exist two complex numbers α and β such that

|ψi = α|+i + β |−i. (12.74)

This ket vector is represented by the column vector

[ ⟨+|ψ⟩ ]
[ ⟨−|ψ⟩ ]

in the {|+⟩, |−⟩} basis. Since ⟨+|+⟩ = ⟨−|−⟩ = 1 and ⟨+|−⟩ = ⟨−|+⟩ = 0,

[ ⟨+|ψ⟩ ]   [ α ]
[ ⟨−|ψ⟩ ] = [ β ] .          (12.75)

In particular, the states of spin up and spin down, |+i and |−i, are represented, respec-
tively, by the column vectors

[ 1 ]         [ 0 ]
[ 0 ]   and   [ 1 ] .
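
As a small worked example (not taken from the notes), consider the normalized superposition (|+⟩ + |−⟩)/√2, represented by the column vector with components 1/√2 and 1/√2. Acting on it with the matrix Jx of Eq. (12.71) gives

(ℏ/2) [ 0  1 ] (1/√2) [ 1 ]  =  (ℏ/2) (1/√2) [ 1 ]
      [ 1  0 ]        [ 1 ]                  [ 1 ] ,

so that this superposition is an eigenvector of Ĵx with eigenvalue +ℏ/2. Likewise, (|+⟩ − |−⟩)/√2 is an eigenvector of Ĵx with eigenvalue −ℏ/2: the eigenvectors of Ĵx differ from those of Ĵz, in line with the fact that these two operators do not commute.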

12.7 The Clebsch-Gordan coefficients


Consider a system formed by two electrons, electron 1 and electron 2. Electron 1 can be,
e.g., in the state of spin up, which we will now write |+i1 with the subscript 1 indicating
that this ket vector pertains to the state of that electron. Likewise, electron 2 can be, e.g.,
in the state of spin down, |−i2 . We can represent the joint state of these two electrons
by the symbol |+i1 |−i2 . (Recall that this symbol does not represent a product like a
product of numbers or a dot product of geometrical vectors; as is explained in Chapter 7,
it represents a quantum state in which electron 1 is in the state |+i1 and electron 2 in the

state |−i2 .) Other possibilities for this bipartite system include |+i1 |+i2 , representing
a state in which both electrons are in a state of spin up, and also |−i1 |+i2 and |−i1 |−i2 .
However, by virtue of the principle of superposition, the state of this system can also
be a linear combination of |+i1 |+i2 , |+i1 |−i2 , |−i1 |+i2 and |−i1 |−i2 . In general,
the two-electron system can be in a joint spin state represented by the ket vector

|ψi12 = α|+i1 |+i2 + β|+i1 |−i2 + γ|−i1 |+i2 + δ|−i1 |−i2 , (12.76)

where α, β, γ and δ are four complex numbers.


More general joint angular momentum states of bipartite systems are often encountered
in applications. In general, such states can be written in the following way:
|ψ⟩₁₂ = Σ_{j1 m1 j2 m2} c_{j1 m1 j2 m2} |j1, m1⟩₁ |j2, m2⟩₂ ,          (12.77)

where the kets |j1 , m1 i1 pertain to one part of the whole system and the kets |j2 , m2 i2
to the other part. Note that the angular momentum operator Ĵ1 acts only on the state
vectors pertaining to part 1 of the whole system while Ĵ2 acts only on the state vectors
pertaining to part 2. For example,
Ĵ2z |ψ⟩₁₂ = Σ_{j1 m1 j2 m2} c_{j1 m1 j2 m2} |j1, m1⟩₁ Ĵ2z |j2, m2⟩₂ .          (12.78)

One can show that under a rotation by an angle α about an axis n̂, |ψ⟩₁₂ → |ψ′⟩₁₂ =
R̂n̂(α)|ψ⟩₁₂ with

R̂n̂(α) = exp[ −iα ( n̂ · Ĵ1 + n̂ · Ĵ2 ) / ℏ ].          (12.79)

Let Ĵ = Ĵ1 + Ĵ2 . Each component of Ĵ is the sum of the corresponding components of
Ĵ1 and Ĵ2 :

Ĵ = Jˆx x̂ + Jˆy ŷ + Jˆz ẑ (12.80)


Ĵ1 = Jˆ1x x̂ + Jˆ1y ŷ + Jˆ1z ẑ (12.81)
Ĵ2 = Jˆ2x x̂ + Jˆ2y ŷ + Jˆ2z ẑ, (12.82)

with Ĵx = Ĵ1x + Ĵ2x, Ĵy = Ĵ1y + Ĵ2y and Ĵz = Ĵ1z + Ĵ2z. It is not difficult to show that
Ĵ is an angular momentum operator, i.e., that Jˆx , Jˆy and Jˆz are Hermitian and satisfy
the commutation relations

[Jˆx , Jˆy ] = i~Jˆz , [Jˆy , Jˆz ] = i~Jˆx , [Jˆz , Jˆx ] = i~Jˆy . (12.83)

Since Ĵ is an angular momentum operator, each of its components commute with Ĵ2 .
In particular, [Jˆz , Ĵ2 ] = 0. It is not difficult to show that the operators Ĵ21 and Ĵ22 also
commute with both Ĵ2 and Jˆz , besides commuting with each other:

[Ĵ1², Ĵ2²] = [Ĵ1², Ĵ²] = [Ĵ2², Ĵ²] = [Ĵz, Ĵ1²] = [Ĵz, Ĵ2²] = [Ĵz, Ĵ²] = 0.          (12.84)

(However, Ĵ² does not commute with Ĵ1z or Ĵ2z.) Hence, it is possible to construct a
basis of simultaneous eigenvectors of Ĵ1², Ĵ2², Ĵ² and Ĵz. We will denote such simulta-
neous eigenvectors by |j1, j2, J, M⟩₁₂. The quantum numbers j1, j2, J and M identify
the corresponding eigenvalues:

Ĵ1² |j1, j2, J, M⟩₁₂ = j1(j1 + 1) ℏ² |j1, j2, J, M⟩₁₂ ,          (12.85)

Ĵ2² |j1, j2, J, M⟩₁₂ = j2(j2 + 1) ℏ² |j1, j2, J, M⟩₁₂ ,          (12.86)

Ĵ²  |j1, j2, J, M⟩₁₂ = J(J + 1) ℏ² |j1, j2, J, M⟩₁₂ ,          (12.87)

Ĵz  |j1, j2, J, M⟩₁₂ = M ℏ |j1, j2, J, M⟩₁₂ .          (12.88)

For given values of j1 and j2, the possible values of J and M in simultaneous eigen-
vectors of Ĵ1², Ĵ2², Ĵ² and Ĵz are restricted by the “triangular inequality",

|j1 − j2| ≤ J ≤ j1 + j2,

and by the usual condition that

−J ≤ M ≤ J. (12.89)

(Thus J can have any of the following values: |j1 − j2 |, |j1 − j2 | + 1, |j1 − j2 | + 2,. . . ,
j1 + j2 − 1, j1 + j2 . For a given J, M can have any of the following values: −J,
−J + 1,. . . , J − 1, J.)
Each of the eigenvectors |j1 , j2 , J, M i12 can be written as a linear combination of
the vectors |j1 , m1 i1 |j2 , m2 i2 . The coefficients of this superposition are real numbers
called Clebsch-Gordan coefficients (note the spelling: Gordan, not Gordon). Following
well established traditions, we will write them hj1 , j2 , m1 , m2 |J, M i:
|j1, j2, J, M⟩₁₂ = Σ_{m1, m2} ⟨j1, j2, m1, m2|J, M⟩ |j1, m1⟩₁ |j2, m2⟩₂ .          (12.90)

Reciprocally,
|j1, m1⟩₁ |j2, m2⟩₂ = Σ_J ⟨j1, j2, m1, m2|J, M⟩ |j1, j2, J, M⟩₁₂ .          (12.91)

It can be shown that the Clebsch-Gordan coefficient ⟨j1, j2, m1, m2|J, M⟩ is zero when
M ≠ m1 + m2 (see Worksheet 9). Therefore, in Eq. (12.90), the double sum runs
only over the values of m1 and m2 such that M = m1 + m2, and in Eq. (12.91) M is
necessarily equal to m1 + m2 . In both of these equations, the possible values of J are
restricted by the triangular inequality mentioned above.
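
As an illustration (a standard example, with the usual tabulated values of the Clebsch-Gordan coefficients quoted without proof), take two spin-1/2 particles, so that j1 = j2 = 1/2. The triangular inequality then allows J = 1 or J = 0, and Eq. (12.90) reads

|1/2, 1/2, 1, 1⟩₁₂ = |+⟩₁ |+⟩₂ ,
|1/2, 1/2, 1, 0⟩₁₂ = (1/√2) ( |+⟩₁ |−⟩₂ + |−⟩₁ |+⟩₂ ),
|1/2, 1/2, 1, −1⟩₁₂ = |−⟩₁ |−⟩₂ ,
|1/2, 1/2, 0, 0⟩₁₂ = (1/√2) ( |+⟩₁ |−⟩₂ − |−⟩₁ |+⟩₂ ),

i.e., the familiar triplet (J = 1) and singlet (J = 0) states. Note that M = m1 + m2 in every term, as required, and that each coefficient is real.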

Supplement to Chapter 12
These notes complement Chapter 12. Section S.1 covers the same material as Part 1 of
Lecture 17, and offers a simpler and easier introduction to the maths of space symme-
tries than Section 12.3 of the notes. Sections S.2 and S.3 explain various mathematical
facts referred to, in Section 12.3, as having been covered in “Problem QT2.6”.

S.1 Translations and momentum


To keep it simple, we shall only consider translations in 1D for a system confined to the
x-axis. We take the case of a particle whose quantum state is represented by a certain
wave function ψ(x), assuming that ψ(x) is square-integrable on the infinite interval
(−∞, ∞). We also consider another wave function, ψ T (x), whose value at a point of
co-ordinate x is the same as the value of ψ(x) at the point of co-ordinate x − x0 , for
any x:
ψ T (x) ≡ ψ(x − x0 ). (S.1)
In other words, if the point P is at xP and the point P 0 at xP 0 = xP +x0 , then the value
of ψ(x) at x = xP is the same as the value of ψ T (x) at x = xP 0 . It is easy to see that
the graph of ψ T (x) is identical to the graph of ψ(x) translated by x0 . Apart from this
translation, ψ T (x) represents exactly the same quantum state as ψ(x). We can imagine
being in that situation if we had a device preparing the particle in the state ψ(x) and
had translated this device by x0 so as to prepare the particle in the state ψ T (x).
We can introduce a translation operator T (x0 ) transforming any ψ(x) into the corre-
sponding “translated wave function” ψ T (x). I.e., for any x, any ψ(x) and any x0 ,

T (x0 )ψ(x) = ψ(x − x0 ). (S.2)

One may recognize that this translation operator is the displacement operator intro-
duced earlier in the course (Workshop 2 and Lecture 14), but let us pretend we did
not know this. We note that for x0 = 0, T (x0 ) is necessarily the identity operator, I,
since T (0)ψ(x) = ψ(x − 0) = ψ(x) for any x and any ψ(x). We also note that for
an infinitesimal non-zero displacement dx0 , the translation operator T (dx0 ) cannot
be the identity operator, and must differ from it only by a term of first order in the
displacement. Thus
T (dx0 ) = I − (i/~) O dx0 , (S.3)
where O is a certain operator which this equation defines (we could have included the
factor of −i/~ in the operator O, but haven’t done this for conformity with standard

practice). As it turns out, the operator O, so defined, is nothing else than the momentum
operator P .

+ Proof: It is easy to see that



ψ(x − δx) ≈ ψ(x) − (dψ/dx) δx ,          (S.4)
to first order in δx. Indeed, by definition of the first order derivative,

dψ/dx = lim_{δx→0} [ψ(x + δx) − ψ(x)]/δx = lim_{δx→0} [ψ(x − δx) − ψ(x)]/(−δx).          (S.5)
Hence, for an infinitesimal translation,

T(dx0) ψ(x) = ψ(x) − (dψ/dx) dx0 .          (S.6)
But, from Eq. (S.3), we also have

T (dx0 )ψ(x) = ψ(x) − (i/~) Oψ(x) dx0 . (S.7)

Comparing these two equations, we see that


O = −iℏ d/dx .          (S.8)
I.e., O is the momentum operator in the position representation. 

We therefore have that


T (dx0 ) = I − (i/~) P dx0 . (S.9)
One says that the momentum operator is the infinitesimal generator of translations.
Eq. (S.9) applies only to the case of an infinitesimal translation. For a finite translation,
one can show that
T (x0 ) = exp[−(i/~)P x0 ], (S.10)
as seen previously. (This operator was called the displacement operator in Worksheet 2
and Lecture 14.)
We stress that Eqs. (S.9) and (S.10) can be taken as defining the momentum operator:
one can say that P is the operator such that translating a wave function by a distance
x0 is effected by the operator exp[−(i/~)P x0 ].
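
The following small numerical sketch (Python/NumPy, with ℏ = 1; the grid, the Gaussian wave function and the displacement x0 = 3 are arbitrary choices made purely for illustration) checks Eq. (S.10): applying exp[−(i/ℏ)P x0], i.e. multiplying the Fourier transform of ψ(x) by exp(−ik x0), reproduces ψ(x − x0).

import numpy as np

# Sketch (hbar = 1): translating a wave function with the momentum operator.
N, L = 1024, 40.0
x = np.linspace(-L/2, L/2, N, endpoint=False)
dx = x[1] - x[0]
k = 2*np.pi*np.fft.fftfreq(N, d=dx)          # wave numbers of the grid

psi = np.exp(-(x + 5.0)**2)                  # a Gaussian centred at x = -5
x0 = 3.0                                     # translation distance

# T(x0) acts as multiplication by exp(-i k x0) in momentum space:
psi_T = np.fft.ifft(np.exp(-1j*k*x0) * np.fft.fft(psi))
psi_direct = np.exp(-((x - x0) + 5.0)**2)    # psi(x - x0), computed directly

print(np.max(np.abs(psi_T - psi_direct)))    # essentially zero (rounding errors only)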

S.2 Rotations and orbital angular momentum
First, recall that any rotation in 3D space can be described by a 3 × 3 matrix. For
example, rotating a point by an angle θ about the z-axis changes its co-ordinates
from (x, y, z) to (x′, y′, z′), with

[ x′ ]   [ cos θ   −sin θ   0 ] [ x ]
[ y′ ] = [ sin θ    cos θ   0 ] [ y ] .          (S.11)
[ z′ ]   [   0        0     1 ] [ z ]
Rotations about the x- or y-axes can be represented similarly. Let us denote the
corresponding rotation matrices by Rx(θ), Ry(θ) and Rz(θ):

Rx(θ) = [ 1      0         0     ]
        [ 0    cos θ    −sin θ   ]          (S.12)
        [ 0    sin θ     cos θ   ]

Ry(θ) = [  cos θ    0    sin θ ]
        [    0      1      0   ]          (S.13)
        [ −sin θ    0    cos θ ]

Rz(θ) = [ cos θ   −sin θ    0 ]
        [ sin θ    cos θ    0 ]          (S.14)
        [   0        0      1 ]

Now, consider a hydrogen atom in a state described by a wave function ψ(x, y, z),
ignoring spin and other relativistic effects. Imagine that you rotate this atom by an
angle θ about the z-axis of the system of coordinates, without otherwise disturbing
it in any way. For simplicity, assume that the origin of the system of coordinates
is at the centre of mass, so that rotations about the x-, y- or z-axis may change the
orientation of the atom but not its location in 3D space. The rotation transforms
the wave function ψ(x, y, z) into a new wave function ψ′(x, y, z), and generally
ψ′(x, y, z) will not be the same function as ψ(x, y, z). However, ψ′(x, y, z) will not
differ much from ψ(x, y, z) if the rotation is by a very small angle. More precisely,
let us consider a rotation about the z-axis by an angle ε. Then,

ψ′(x, y, z) = ψ(x, y, z) + ε ( y ∂/∂x − x ∂/∂y ) ψ(x, y, z) + · · · ,          (S.15)

where the terms not written in the right-hand side are quadratic or of higher order
in ε.

+ Proof: First, we note that rotating both the state and the co-ordinates
amounts to no change. Hence ψ′(x, y, z) = ψ(x′′, y′′, z′′) if (x′′, y′′, z′′)
are the initial co-ordinates of the point brought to the point of co-
ordinates (x, y, z) by the rotation. For a rotation by an angle ε these
two sets of co-ordinates are related to each other by Eq. (S.11) with
θ = ε:

[ x ]   [ cos ε   −sin ε   0 ] [ x′′ ]
[ y ] = [ sin ε    cos ε   0 ] [ y′′ ] .          (S.16)
[ z ]   [   0        0     1 ] [ z′′ ]

We can invert this equation to express x′′, y′′ and z′′ in terms of x, y
and z. We can do this simply by changing ε into −ε (a rotation by θ
followed by a rotation by −θ is the same as no rotation at all, hence the
product of Rz(θ) by Rz(−θ) has got to be the unit matrix). Therefore

[ x′′ ]   [  cos ε   sin ε   0 ] [ x ]
[ y′′ ] = [ −sin ε   cos ε   0 ] [ y ] .          (S.17)
[ z′′ ]   [    0       0     1 ] [ z ]
Since we work to first order in ε, we can replace cos ε by 1 (recall that
cos ε = 1 − ε²/2 + · · · ) and sin ε by ε. Thus, to first order in ε, x′′ =
x + εy, y′′ = y − εx and z′′ = z. Moreover, also to first order in ε,
ψ(x′′, y′′, z′′) = ψ(x, y, z) + ε [ dψ(x′′, y′′, z′′)/dε ]_{ε=0}          (S.18)

               = ψ(x, y, z) + ε [ (∂ψ/∂x′′)(dx′′/dε) + (∂ψ/∂y′′)(dy′′/dε) ]_{ε=0} .          (S.19)

Since dx′′/dε = y and dy′′/dε = −x,

ψ(x′′, y′′, z′′) = ψ(x, y, z) + ε [ y ∂ψ/∂x′′ − x ∂ψ/∂y′′ ]_{ε=0} .          (S.20)

We also note that

[ ∂ψ(x′′, y′′, z′′)/∂x′′ ]_{ε=0} = ∂ψ(x, y, z)/∂x ,          (S.21)

[ ∂ψ(x′′, y′′, z′′)/∂y′′ ]_{ε=0} = ∂ψ(x, y, z)/∂y .          (S.22)

Therefore, to first order in ε,

ψ(x′′, y′′, z′′) = ψ(x, y, z) + ε ( y ∂/∂x − x ∂/∂y ) ψ(x, y, z),          (S.23)
which is Eq. (S.15). 

The quadratic and higher order terms can be ignored in the right-hand side of
Eq. (S.15) if ε is infinitesimal. Namely, for a rotation by an infinitesimal angle dα,

ψ′(x, y, z) = [ 1 + dα ( y ∂/∂x − x ∂/∂y ) ] ψ(x, y, z).          (S.24)

Let us write this equation as

ψ 0 (x, y, z) = [1 − (i/~)Jz dα]ψ(x, y, z). (S.25)

Clearly,

Jz = −iℏ ( x ∂/∂y − y ∂/∂x ),          (S.26)
and we recognize that Jz is nothing else than Lz , the z-component of the orbital
angular momentum operator.
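
A short symbolic check of Eq. (S.26) (an illustrative sketch using SymPy, with ℏ set to 1; the functions chosen are simply convenient examples): x + iy is an eigenfunction of this operator with eigenvalue +1 (an m = 1 function), x − iy has eigenvalue −1, and x² + y² has eigenvalue 0.

import sympy as sp

# Lz = -i (x d/dy - y d/dx), with hbar = 1.
x, y = sp.symbols('x y', real=True)

def Lz(f):
    return -sp.I*(x*sp.diff(f, y) - y*sp.diff(f, x))

print(sp.simplify(Lz(x + sp.I*y) - (x + sp.I*y)))        # 0: eigenvalue +1
print(sp.simplify(Lz(x - sp.I*y) + (x - sp.I*y)))        # 0: eigenvalue -1
print(sp.simplify(Lz(x**2 + y**2)))                      # 0: an m = 0 function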

S.3 Commutation relations and angular momentum


Let us come back to Eqs. (S.12)–(S.14). We are particularly interested in rotations
by an infinitesimally small angle ε. To second order in ε, cos(ε) = 1 − ε²/2 and
sin(ε) = ε, and therefore

Rx(ε) ≈ [ 1       0           0      ]
        [ 0    1 − ε²/2      −ε      ]          (S.27)
        [ 0       ε        1 − ε²/2  ]

Ry(ε) ≈ [ 1 − ε²/2    0       ε      ]
        [    0        1       0      ]          (S.28)
        [   −ε        0    1 − ε²/2  ]

Rz(ε) ≈ [ 1 − ε²/2    −ε        0 ]
        [    ε      1 − ε²/2    0 ]          (S.29)
        [    0         0        1 ]

A little calculation shows that

Ry(−ε) Rx(−ε) Ry(ε) Rx(ε) ≈ [  1    ε²   0 ]
                            [ −ε²   1    0 ] ≈ Rz(−ε²),          (S.30)
                            [  0    0    1 ]

neglecting terms in ε³ or of higher order in ε.
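
Before working through the algebraic proof given below, here is a quick numerical check of Eq. (S.30) (a sketch in Python/NumPy; the value ε = 10⁻³ is an arbitrary choice): the product of the four exact rotation matrices differs from Rz(−ε²) only by terms of order ε³.

import numpy as np

def Rx(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def Rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

eps = 1e-3
lhs = Ry(-eps) @ Rx(-eps) @ Ry(eps) @ Rx(eps)
print(np.max(np.abs(lhs - Rz(-eps**2))))     # of order eps**3, i.e. about 1e-9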

+ Proof: We start by calculating the matrix product Ry(ε)Rx(ε):

Ry(ε)Rx(ε) ≈ [ 1 − ε²/2    0       ε     ] [ 1      0          0      ]
             [    0        1       0     ] [ 0   1 − ε²/2     −ε      ]          (S.31)
             [   −ε        0    1 − ε²/2 ] [ 0      ε       1 − ε²/2  ]

           ≈ [ 1 − ε²/2      ε²           ε − ε³/2          ]
             [    0        1 − ε²/2         −ε              ]          (S.32)
             [   −ε        ε − ε³/2    1 − 2(ε²/2) + ε⁴/4   ]

Keeping only the terms of order 0, 1 or 2 in ε results in

Ry(ε)Rx(ε) ≈ [ 1 − ε²/2     ε²        ε    ]
             [    0       1 − ε²/2   −ε    ]          (S.33)
             [   −ε          ε      1 − ε² ]

We now calculate the product of the four matrices, using the fact
that the matrix Ry(−ε)Rx(−ε) is simply the matrix Ry(ε)Rx(ε) with
ε changed into −ε. For clarity, we first do the whole calculation and
only then drop any term that is not of order 0, 1 or 2 in ε. It would have
been as good (and more expedient) to drop these higher order terms as
soon as they arise, without writing them at all.

Ry(−ε)Rx(−ε)Ry(ε)Rx(ε)

  ≈ [ 1 − ε²/2     ε²       −ε    ] [ 1 − ε²/2     ε²        ε    ]
    [    0       1 − ε²/2    ε    ] [    0       1 − ε²/2   −ε    ]          (S.34)
    [    ε         −ε      1 − ε² ] [   −ε          ε      1 − ε² ]

  ≈ [ 1 − 2(ε²/2) + ε⁴/4 + ε²    2ε² − 2(ε⁴/2) − ε²          ε − ε³/2 − ε³ − ε + ε³ ]
    [ −ε²                        1 − 2(ε²/2) + ε⁴/4 + ε²     −ε + ε³/2 + ε − ε³     ]          (S.35)
    [ ε − ε³/2 − ε + ε³          ε³ − ε + ε³/2 + ε − ε³      2ε² + 1 − 2ε² + ε⁴     ]

  ≈ [  1     ε²    0 ]
    [ −ε²    1     0 ]          (S.36)
    [  0     0     1 ]

We should compare this to Rz(−ε²). Replacing ε by −ε² in Eq. (S.29)
and dropping the terms of order ε⁴ yields

Rz(−ε²) ≈ [  1     ε²    0 ]
          [ −ε²    1     0 ]          (S.37)
          [  0     0     1 ]

Hence, to order ε², Ry(−ε)Rx(−ε)Ry(ε)Rx(ε) = Rz(−ε²). We can
be sure that all the terms of order 0, 1 or 2 are taken into account in the
above calculations because cos θ differs from 1 − θ²/2 only by terms in
θ⁴ and of higher order and sin θ differs from θ only by terms in θ³ and
of higher order. 

Formally, the relation between the state of a rotated system and the correspond-
ing state of the unrotated system can be expressed by an equation such as |ψ 0 i =
R̂z (θ)|ψi, where R̂z (θ) is an operator corresponding to a rotation by an angle θ
about the z-axis. Suppose that we would rotate the system by an angle θ1 about the
x-axis and then by an angle θ2 about the y-axis. Correspondingly, the initial state
vector |ψi would be transformed into the state vector |ψ 00 i = R̂y (θ2 )R̂x (θ1 )|ψi. It
can be shown that these rotation operators can be written in the following forms:

R̂x (θ) = exp(−iθJˆx /~), (S.38)


R̂y (θ) = exp(−iθJˆy /~), (S.39)
R̂z (θ) = exp(−iθJˆz /~), (S.40)

or more generally, for a rotation about an axis n̂,

R̂n (θ) = exp(−iθJˆn /~). (S.41)

The operators Jˆx , Jˆy , Jˆz and Jˆn are defined by these equations. They have the
same physical dimensions as ~, which are the physical dimensions of an angular
momentum.
Eq. (S.30) says that if ε is sufficiently small, then a rotation by ε about the x-axis
followed by a rotation by ε about the y-axis followed by a rotation by −ε about
the x-axis followed by a rotation by −ε about the y-axis is effectively the same as
a rotation by −ε² about the z-axis. In terms of rotation operators, we must have,
correspondingly,

R̂y(−ε) R̂x(−ε) R̂y(ε) R̂x(ε) = R̂z(−ε²).          (S.42)

This equation implies that [Jˆx , Jˆy ] = i~Jˆz . I.e., the operators Jˆx and Jˆy satisfy the
commutation relation of angular momentum operators.

+ Proof: We note that up to order ε²,

R̂x(ε) ≈ Î − (i/ℏ) ε Ĵx − (1/(2ℏ²)) ε² Ĵx² ,          (S.43)

and similarly for R̂y(ε). We rewrite Eq. (S.42), replacing the rotation
operators by these approximate expressions, and simplify the result.
Let us start with R̂y(ε)R̂x(ε):

R̂y(ε)R̂x(ε) ≈ [ Î − (i/ℏ) ε Ĵy − (1/(2ℏ²)) ε² Ĵy² ] [ Î − (i/ℏ) ε Ĵx − (1/(2ℏ²)) ε² Ĵx² ]          (S.44)

            ≈ Î − (i/ℏ) ε ( Ĵx + Ĵy ) − (ε²/ℏ²) ( ĴyĴx + Ĵx²/2 + Ĵy²/2 ).          (S.45)

Then

R̂y(−ε)R̂x(−ε)R̂y(ε)R̂x(ε)

  ≈ [ Î + (i/ℏ) ε ( Ĵx + Ĵy ) − (ε²/ℏ²) ( ĴyĴx + Ĵx²/2 + Ĵy²/2 ) ] ×
    [ Î − (i/ℏ) ε ( Ĵx + Ĵy ) − (ε²/ℏ²) ( ĴyĴx + Ĵx²/2 + Ĵy²/2 ) ]          (S.46)

  ≈ Î + (ε²/ℏ²) [ ( Ĵx + Ĵy )² − 2 ( ĴyĴx + Ĵx²/2 + Ĵy²/2 ) ]          (S.47)

  ≈ Î + (ε²/ℏ²) [ ĴxĴy − ĴyĴx ].          (S.48)

Now, up to order ε², R̂z(−ε²) ≈ Î + (i/ℏ) ε² Ĵz. Hence
(ε²/ℏ²)[ ĴxĴy − ĴyĴx ] = (i/ℏ) ε² Ĵz, from which we see that, indeed,
the operators Ĵx, Ĵy and Ĵz must be such that ĴxĴy − ĴyĴx = iℏĴz.
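
A numerical illustration of Eq. (S.42) and of the resulting commutation relation (a sketch only, with ℏ = 1, using the j = 1/2 matrices of Eqs. (12.70) and (12.71) and SciPy's matrix exponential; ε = 10⁻³ is an arbitrary choice):

import numpy as np
from scipy.linalg import expm

# For j = 1/2, R_n(theta) = expm(-i theta J_n) with hbar = 1.
Jx = 0.5*np.array([[0, 1], [1, 0]], dtype=complex)
Jy = 0.5*np.array([[0, -1j], [1j, 0]])
Jz = 0.5*np.array([[1, 0], [0, -1]], dtype=complex)

def R(J, theta):
    return expm(-1j*theta*J)

eps = 1e-3
lhs = R(Jy, -eps) @ R(Jx, -eps) @ R(Jy, eps) @ R(Jx, eps)
print(np.max(np.abs(lhs - R(Jz, -eps**2))))        # of order eps**3 (~1e-9)
print(np.allclose(Jx @ Jy - Jy @ Jx, 1j*Jz))       # True: [Jx, Jy] = i Jz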

Appendix A: Multiplying matrices, column vec-
tors and row vectors
This appendix is just a brief reminder of how to multiply two matrices, a column vector
by a matrix, a column vector by a row vector, or a matrix by a row vector. The rule is
always the same: one multiplies each element of a row by the corresponding element
of a column and sums up the products.

1. Multiplying two square matrices. The result is a square matrix:

[ α  β ] [ α′  β′ ]   [ αα′ + βγ′    αβ′ + βδ′ ]
[ γ  δ ] [ γ′  δ′ ] = [ γα′ + δγ′    γβ′ + δδ′ ] .          (A.1)

2. Multiplying a column vector by a matrix. The result is a column vector:

[ α  β ] [ a ]   [ αa + βb ]
[ γ  δ ] [ b ] = [ γa + δb ] .          (A.2)

3. Multiplying a column vector by a row vector (the row vector on the left of the
column vector). The result is a number:

[ a  b ] [ a′ ]  =  aa′ + bb′ .          (A.3)
         [ b′ ]

4. Multiplying a matrix by a row vector (the row vector on the left of the matrix).
The result is a row vector:

[ a  b ] [ α  β ]  =  [ aα + bγ    aβ + bδ ] .          (A.4)
         [ γ  δ ]
γ δ

5. Multiplying a row vector by a column vector (the row vector on the right of the
column vector). The result is a matrix:

[ a ] [ a′  b′ ]  =  [ aa′   ab′ ]
[ b ]                [ ba′   bb′ ] .          (A.5)

Recall, also, that multiplying a matrix by a scalar (a number) amounts to multiplying


each element of the matrix by that scalar. E.g.,

c [ α  β ]   [ cα  cβ ]
  [ γ  δ ] = [ cγ  cδ ] .          (A.6)
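
For readers who want to experiment, the same products can be formed numerically (a brief sketch in Python/NumPy; the numbers are arbitrary):

import numpy as np

M = np.array([[1, 2],
              [3, 4]])
N = np.array([[5, 6],
              [7, 8]])
col = np.array([[1],
                [2]])          # a column vector (2 x 1)
row = np.array([[1, 2]])       # a row vector (1 x 2)

print(M @ N)        # matrix times matrix: a 2 x 2 matrix, cf. (A.1)
print(M @ col)      # matrix times column vector: a column vector, cf. (A.2)
print(row @ col)    # row vector times column vector: a number, cf. (A.3)
print(row @ M)      # row vector times matrix: a row vector, cf. (A.4)
print(col @ row)    # column vector times row vector: a 2 x 2 matrix, cf. (A.5)
print(3 * M)        # scalar times matrix, cf. (A.6)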

Appendix B: Complex numbers
Familiarity with complex numbers is essential in Quantum Mechanics, even though not
quite all quantum mechanical calculations involve complex numbers. All the essentials
are summarised here. In principle, this appendix contains nothing that you have not
already seen previously. Please refer to your maths courses, to a maths textbook and/or
to reliable online material if you are unfamiliar with any of the results stated below.
Common errors

1. Referring to any complex numbers as imaginary numbers. (Not a bad error, but
the practice suggests a lack of familiarity with complex numbers. See below.)

2. Confusing the squared modulus of a complex number with the square of this
number. (A bad error, see below.)

An ABC of complex numbers


A complex number is basically a set of two real numbers. The standard form of such a
number is a + ib, with a and b being real numbers. The symbol i represents the imagi-
nary unit: i is such that i2 = −1. (The imaginary unit is almost always represented by
the letter i in Quantum Mechanics, but the letter j is also used in other areas of Physics,
e.g., in Electronics.)
Suppose that z = a + ib with a and b real. Then:

• a is called the real part of z and b is called the imaginary part of z. The
corresponding symbols are Re z and Im z, or alternatively ℜ(z) and ℑ(z):

a = Re z, b = Im z. (B.1)

• z is said to be real if b = 0 and to be imaginary if a = 0 and b 6= 0. Thus an


imaginary number is a complex number with a zero real part and a non-zero
imaginary part. For example, 3i is an imaginary number (3i ≡ 0 + i × 3). In this
context, a real number is a complex number with a zero imaginary part.

• The complex conjugate of z, denoted z ∗ , is a − ib. Thus

z + z∗ z − z∗
Re z = , Im z = . (B.2)
2 2i


• The modulus of z, denoted |z|, is the real number √(a² + b²). Note that z and z∗
have the same modulus. Moreover, zz∗ = z∗z = |z|²:

z ∗ z = (a + ib)(a − ib) = a2 − (ib)2 = a2 − i2 b2 = a2 + b2 = |z|2 . (B.3)

Unless z is real, the modulus squared of z, |z|2 , is not the same number as the
square of z, z 2 . E.g.,
|2 + 3i|2 = 22 + 32 = 13, (B.4)
whereas
(2 + 3i)2 = 22 + 2 × 2 × 3i + (3i)2 = −5 + 12i. (B.5)

• |z1 z2| = |z1| |z2|, but in general |z1 + z2| ≠ |z1| + |z2|. (A short numerical check
of some of the statements above is given below.)
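
The following short Python check (illustrative only; Python writes the imaginary unit as j) confirms the distinction between |z|² and z², as well as the product rule for moduli:

z = 2 + 3j
print(abs(z)**2)          # 13 (up to rounding): the squared modulus |z|^2
print(z**2)               # (-5+12j): the square z^2, a different number
print(z * z.conjugate())  # (13+0j): z z* = |z|^2

z1, z2 = 1 + 1j, 2 - 1j
print(abs(z1*z2), abs(z1)*abs(z2))        # equal
print(abs(z1 + z2), abs(z1) + abs(z2))    # not equal in general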

Complex exponentials
The familiar exponential function can be generalized to a function of a complex vari-
able, exp(z). This function crops up very often in applications.

• If x is a real number, then

exp(±ix) = cos x ± i sin x, (B.6)

cos x = [exp(ix) + exp(−ix)]/2 ,        sin x = [exp(ix) − exp(−ix)]/(2i) ,          (B.7)
and
| exp(±ix)| = 1. (B.8)
Eq. (B.6) is Euler’s formula.

• exp(a + ib) = exp(a) exp(ib), hence if a and b are real, | exp(a + ib)| = exp(a).

• z can always be written as |z| exp(iα), where α is a real number called the ar-
gument of z. One can write
α = arg z. (B.9)
(The argument of a complex number is not unique: if α is such that z = |z| exp(iα),
then z = |z| exp[i(α + 2nπ)], n = 0, ±1, ±2, . . .) Note that if z = |z| exp(iα),
then z ∗ = |z| exp(−iα).

