Kinetic Theory of Self-Gravitating Systems: James Binney
James Binney
1 Introduction
1.1 What differentiates stellar and electrostatic plasmas?
1.2 Virial theorem
1.3 Thermal equilibrium?
1.4 Escape
1.5 Fluctuations
2 Mean-field model
2.1 Angle-action variables
• Adiabatic invariance • Hamilton-Jacobi equation • Choice of actions
2.2 Self-consistent, mean-field model
2.3 Biorthogonal potential-density pairs
3 Perturbing the DF
Appendices
A Rewriting D1
1
Introduction
The aim of these 8 lectures is to show how the ideas introduced earlier in the second section
of the course in connection with electrostatic plasmas can be extended to stellar systems, with
sometimes surprising results. This is rather a small, even niche, corner of stellar dynamics, but
an intriguing one that fits neatly with the remainder of this course. A general introduction to
stellar dynamics can be found in Galactic Dynamics, Binney & Tremaine, PUP (2008) – hereafter
BT08 – while rather a different perspective is given in a review arXiv:1309.2794 (NewAR, 57, 29).
Any fan of recorded lectures can try the lectures at http://iactalks.iac.es/talks/view/329.
K is the cluster’s kinetic energy while W is its potential energy. So we have an N-particle version
of the 1-particle virial theorem, which should be familiar from quantum mechanics: that if the
P.E. $V(\mathbf{x})$ scales as $V(s\mathbf{x}) = s^\alpha V(\mathbf{x})$, then $2\langle K\rangle = \alpha\langle V\rangle$, where $K = p^2/2m$ is the
kinetic-energy operator.
Our Galaxy is thought to have a (largely dark) mass $M \sim 10^{12}\,M_\odot$ distributed through a
volume of characteristic radius $R \sim 100$ kpc. Taking $|\mathbf{x}_\alpha - \mathbf{x}_\beta| \sim R$ and summing $N^2$ terms $1/R$
we estimate
\[ W \sim -\frac{GM^2}{2R}. \tag{1.6} \]
If the typical speed of a dark particle is σ, $2K \sim M\sigma^2$, so the virial theorem yields
\[ \sigma^2 \sim \frac{GM}{2R}. \tag{1.7} \]
Putting in numbers,
\[ \sigma = \sqrt{\frac{6.7\times10^{-11}\times10^{12}\times1.6\times10^{30}}{2\times100\times3\times10^{19}}} \simeq 2.0\times10^{5}\ \mathrm{m\,s^{-1}} \sim 200\ \mathrm{km\,s^{-1}} \]
is a typical random velocity of a dark particle.
1.4 Escape
There’s another conceptual problem associated with the system attaining thermal equilibrium:
gravity confines particles only up to a finite escape speed $v_{\rm esc}(\mathbf{x}) = \sqrt{-2\Phi(\mathbf{x})}$, where Φ is the
gravitational potential. Hence in thermal equilibrium the df $f(\mathbf{x},\mathbf{v})$ would have to vanish for
$v > v_{\rm esc}$, so could not be a Maxwellian since the latter is non-zero all the way to ∞. Yet surely
the processes that maintain thermal equilibrium will scatter stars to $v > v_{\rm esc}$ and such stars
will then escape (‘evaporate’) from the system. We can assess the scale of this issue by computing
the mean-square value of $v_{\rm esc}$:
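\[ V_{\rm esc}^2 = \frac{1}{M}\int d^3\mathbf{x}\,\rho\,v_{\rm esc}^2 = -\frac{2}{M}\int d^3\mathbf{x}\,\rho\,\Phi = -4\frac{W}{M} = 8\frac{K}{M} = 4\sigma^2. \tag{1.9} \]
So $V_{\rm esc} = 2\sigma$, i.e. twice the rms speed, which isn’t far into the high-velocity tail. The fraction of
a Maxwellian distribution with one-dimensional dispersion $\sigma/\sqrt{3}$ that lies above 2σ is
\[ f_{\rm esc} = \frac{\int_{2\sigma}^{\infty} dv\, v^2 \exp(-3v^2/2\sigma^2)}{\int_{0}^{\infty} dv\, v^2 \exp(-3v^2/2\sigma^2)} = \frac{\int_{\sqrt{6}}^{\infty} dx\, x^2 e^{-x^2}}{\int_{0}^{\infty} dx\, x^2 e^{-x^2}} \sim \frac{1}{140}. \tag{1.10} \]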
Since the velocities of stars will be reshuffled into a Maxwellian once every relaxation time, each
such time we expect ∼ M/140 of the mass to evaporate.
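The ratio of integrals in (1.10) has a closed form in terms of the complementary error function, so the quoted value is easy to check. The sketch below is only a numerical illustration; the function name is ours, not part of the notes.

```python
import math

def maxwellian_fraction_above(x_min: float) -> float:
    """Fraction of the integral of x^2 exp(-x^2) over [0, inf) that lies above x_min.

    Integrating by parts gives the closed form
    erfc(x_min) + (2/sqrt(pi)) * x_min * exp(-x_min^2).
    """
    return math.erfc(x_min) + 2.0 / math.sqrt(math.pi) * x_min * math.exp(-x_min**2)

# v = 2*sigma corresponds to x = sqrt(6) after the substitution x^2 = 3 v^2 / (2 sigma^2)
f_esc = maxwellian_fraction_above(math.sqrt(6.0))
print(f"f_esc = {f_esc:.5f}  (about 1/{1.0 / f_esc:.0f})")
```

The result should land close to the rounded figure 1/140 used in the text.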
1.5 Fluctuations
So how long is the relaxation time? Consider a system of mass M and characteristic scale R,
in which the characteristic internal speed is $\sigma = \sqrt{GM/R}$. Consider now a subregion of size
$r = xR$, which contains mass $M_r \simeq x^3 M$. If there are N stars in the entire system, then $n \simeq x^3 N$
is the typical number of stars in the subregion, and on account of Poisson noise $M_r$ fluctuates
by $\delta M_r = M_r/\sqrt{n} = x^3 M/\sqrt{x^3 N}$ during times $\delta t = r/\sigma$. Consider a point that is distance $yR$
from our subregion. At this point a single fluctuation in the subregion’s gravitational attraction
will change the velocity of a test star by
\[ \delta v \simeq \frac{G\,\delta M_r}{(yR)^2}\,\delta t = \frac{x^{5/2}}{y^2}\,\frac{\sigma}{\sqrt{N}}. \tag{1.11} \]
This formula states that for given y, large volumes $x \simeq 1$ perturb v very much more strongly
than small volumes $x \ll 1$. Against this trend we must bear in mind that (a) $y \ge x$, (b) the
number of subregions perturbing increases as $x^{-3}$ as x decreases, and (c) the time within which
the contribution (1.11) comes about decreases with x, so in a given time each small subregion
makes many more contributions to v than does a large subregion.
We assume that the contributions to v from different subregions are statistically independent,
so it’s appropriate to add the δv in quadrature. There are $\sim 4\pi(y/x)^2$ subregions of scale x
that are distance yR from our point, and in a crossing time $t_{\rm cross} = R/\sigma$ each such subregion
contributes $x^{-1}$ times. So in a crossing time all these subregions change $v^2$ by
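\[ (\Delta v)^2 = 4\pi\frac{y^2}{x^3}(\delta v)^2 = 4\pi\frac{\sigma^2 x^2}{y^2 N}. \tag{1.12} \]
Now we have to sum over y = x, 2x, 3x, . . . , 1. We convert the sum to an integral using dy = x
and have
\[ \sum \frac{1}{y^2} \simeq \frac{1}{x}\int_x^1 \frac{dy}{y^2} = \frac{1}{x}\left(\frac{1}{x} - 1\right) \simeq \frac{1}{x^2}. \tag{1.13} \]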
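Hence in a crossing time the subregions of scale x change $v^2$ by
\[ (\Delta v)^2 \simeq 4\pi\frac{\sigma^2}{N}, \]
independently of x. Summing over scales leads to the relaxation time
\[ t_{\rm relax} \simeq \frac{N}{4\ln N}\,t_{\rm cross}. \tag{1.16} \]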
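To get a feel for (1.16), the following sketch evaluates the ratio of the relaxation time to the crossing time for a few particle numbers. The particular values of N and their labels are illustrative, order-of-magnitude assumptions, not figures taken from the notes.

```python
import math

def relaxation_time_in_crossing_times(n_stars: float) -> float:
    """t_relax / t_cross from equation (1.16): N / (4 ln N)."""
    return n_stars / (4.0 * math.log(n_stars))

# Illustrative particle numbers (assumed, order of magnitude only)
examples = [("open cluster", 1e3), ("globular cluster", 1e6), ("galaxy", 1e11)]
for label, n_stars in examples:
    ratio = relaxation_time_in_crossing_times(n_stars)
    print(f"{label:17s} N = {n_stars:.0e}   t_relax/t_cross ~ {ratio:.3g}")
```

The ratio grows almost linearly with N, which is why two-body relaxation matters for clusters but is negligible for a galaxy of ~10^11 stars.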
In an ideal gas the number of molecules in a given volume experiences Poisson fluctuations
as was assumed above, and these fluctuations can be considered to arise from thermally excited
sound waves. The self-gravity of a stellar system makes the system more compressible on large
scales than on small scales, where self-gravity is unimportant and an ideal gas provides a valid
model. Hence, large-scale fluctuations have a larger amplitude than simple Poisson fluctuations,
with the consequence that contrary to our finding above of equal contributions from all scales,
fluctuations on the size of the system are dominant. In Chapter 4 we will develop the apparatus
required to include the amplifying effect of self-gravity, and in Chapter 5 we will see that
self-gravity accelerates the relaxation of stellar discs by orders of magnitude. Its effect is much smaller
in star clusters.
The conclusions we’ve reached in §§1.3 and 1.4 make it clear that the statistical mechanics
of self-gravitating systems must be very different from anything we have previously encountered.
2
Mean-field model
In this section we assemble the tools needed to figure out the long-term evolution of
self-gravitating systems. The key step is to recognise that the evolution can be described as a
sequence of steady states of a ‘mean-field’ model. Since the dominant forces come from remote
particles (Figure 1.1), an excellent approximation to F can be obtained by smearing the mass of
each particle over distances somewhat larger than the inter-particle distance. The gravitational
potential $\Phi_0$ of this mean-field model is the time-average of the system’s real fluctuating Φ.
The mean-field potential may be computed by smearing the masses of particles through volumes that
extend just a bit further than the local inter-particle distance. The smeared system has a pretty
smooth density distribution ρ(x) and consequently a very smooth gravitational potential
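\[ \Phi(\mathbf{x}) = -G\int d^3\mathbf{x}'\,\frac{\rho(\mathbf{x}')}{|\mathbf{x}-\mathbf{x}'|}. \tag{2.1} \]
Conservation of particles as they flow through phase space requires that the one-particle df
$f(\mathbf{x},\mathbf{v})$ of the mean-field model satisfies
\[ \frac{\partial f}{\partial t} + \frac{\partial}{\partial\mathbf{x}}\cdot(f\dot{\mathbf{x}}) + \frac{\partial}{\partial\mathbf{v}}\cdot(f\dot{\mathbf{v}}) = 0, \tag{2.2} \]
and since by Hamilton’s equations
\[ \frac{\partial}{\partial\mathbf{x}}\cdot\dot{\mathbf{x}} = \frac{\partial^2 H}{\partial\mathbf{x}\cdot\partial\mathbf{v}} = -\frac{\partial}{\partial\mathbf{v}}\cdot\dot{\mathbf{v}}, \tag{2.3} \]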
2.1 Action-angle coordinates
where the 2d or 3d vectors n have integer components and the vector Ω is made up of 2 or 3
frequencies that are characteristic of the orbit.2 From the quasiperiodic nature of x(t) it can be
shown (see V.I. Arnold Mathematical Methods of Classical Mechanics Springer) that the orbit
admits at least as many independent integrals of motion I(x, v) as it has degrees of freedom.
That is, there are at least 2 or 3 (depending on whether or not the orbit is confined to a plane)
independent functions on phase space such that
\[ \frac{d}{dt} I[\mathbf{x}(t),\mathbf{v}(t)] = 0. \tag{2.6} \]
In a time-independent potential, H is always an integral of motion, and in an axisymmetric
potential the appropriate component of angular momentum is always another integral. The
non-trivial numerical result is that there is almost always a third integral of motion of unknown
functional form.
Given a set of integrals of motion $I_i$, any function $J_i(I_1, I_2, I_3)$ of three variables provides
another integral. Given this choice, it’s natural to ask whether a set of integrals can be found
that can be complemented by canonically-conjugate variables, $\theta_i$. For if we had a system of
canonical coordinates (θ, J) such that the momenta were constant, half of Hamilton’s equations
would read
\[ 0 = \dot J_i = -\frac{\partial H}{\partial\theta_i}. \tag{2.7} \]
That is, these equations of motion establish that the Hamiltonian, and its derivatives, are
functions of the $J_i$ only and are therefore constant on each orbit. The other equations of motion are
now trivially solved:
\[ \dot\theta_i = \frac{\partial H}{\partial J_i} = \Omega_i(\mathbf{J})\ \text{a constant} \quad\Rightarrow\quad \theta_i(t) = \theta_i(0) + \Omega_i t. \tag{2.8} \]
So in the (θ, J) coordinate system dynamics becomes trivial. The magic integrals $J_i$ are called
actions and their conjugate variables $\theta_i$ are called angles because one usually scales the actions
so ordinary phase-space coordinates such as x are 2π-periodic in the angles. That is, the function
on phase space x can be expanded as
\[ \mathbf{x}(\boldsymbol{\theta},\mathbf{J}) = \sum_{\mathbf{n}} \mathbf{X}_{\mathbf{n}}(\mathbf{J})\,e^{i\mathbf{n}\cdot\boldsymbol{\theta}}. \tag{2.9} \]
The Fourier expansion (2.5) from which we started arises by eliminating θ between equations
(2.8) and (2.9).
1 Binney & Spergel, ApJ, 252, 308 (1982)
2 Whereas the Fourier decomposition of a periodic function contains only integer multiples of a single
fundamental frequency, a quasiperiodic function contains only integer linear combinations of 2 or more
fundamental frequencies.
Whenever the frequencies Ωi are incommensurable (that is, no relation of the form n · Ω = 0
exists) the actions constitute a complete set of integrals of motion in the sense that any integral
of motion can be obtained as a function of them. Since almost all real numbers are irrational,
the frequencies of most orbits are incommensurable and the actions are generically a complete
set of integrals.
We have seen that by Jeans’ theorem the df of an equilibrium mean-field model is an
integral of motion, so it is a function f (J) of the actions. In a plasma we assume f (v) because
in a homogeneous system, v = constant. Many formulae derived for a plasma will go over to a
stellar system with the substitutions x → θ, v → J.
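The statement that dynamics becomes trivial in angle-action variables can be made concrete with the simplest possible case. The sketch below uses a one-dimensional harmonic oscillator (a toy potential chosen here for illustration, not a model from these notes), for which J = H/ω and θ is the oscillation phase; it advances θ linearly in time as in (2.8) and compares the result with the directly integrated orbit.

```python
import math

def to_angle_action(x, p, omega):
    """Map (x, p) to (theta, J) for H = p^2/2 + omega^2 x^2/2 (unit mass)."""
    energy = 0.5 * p * p + 0.5 * omega**2 * x * x
    action = energy / omega               # J = H / omega, so dH/dJ = omega
    angle = math.atan2(omega * x, p)      # theta = 0 when x = 0 and p > 0
    return angle, action

def from_angle_action(theta, action, omega):
    """Inverse map: x = sqrt(2J/omega) sin(theta), p = sqrt(2 J omega) cos(theta)."""
    x = math.sqrt(2.0 * action / omega) * math.sin(theta)
    p = math.sqrt(2.0 * action * omega) * math.cos(theta)
    return x, p

omega = 1.7                 # arbitrary test frequency
x0, p0 = 0.4, -0.9          # arbitrary initial conditions
theta0, J = to_angle_action(x0, p0, omega)

t = 3.2
# Trivial dynamics, equation (2.8): J stays fixed and theta advances linearly.
x_aa, p_aa = from_angle_action(theta0 + omega * t, J, omega)

# Direct solution of the equations of motion, for comparison.
x_direct = x0 * math.cos(omega * t) + (p0 / omega) * math.sin(omega * t)
p_direct = p0 * math.cos(omega * t) - omega * x0 * math.sin(omega * t)
print(x_aa - x_direct, p_aa - p_direct)
```

In this toy case the expansion (2.9) contains only the n = ±1 terms; in a realistic galactic potential many harmonics contribute and the map to (θ, J) has to be constructed numerically.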
where $-L^2$ is a constant of separation. $S_\phi = L\phi$ follows trivially, and almost as easily we get
\[ S_r = \int^r dr\,\sqrt{2(E-\Phi) - \frac{L^2}{r^2}}. \tag{2.13} \]
These operations yield a function S(x, E, L), which is not of the required form: the integrals of
motion E, L are (unknown) functions of the required action integrals; the pair (E, L) cannot be
complemented by variables to form a set of canonical coordinates. The actions $J_i$ are defined by
\[ J_i = \frac{1}{2\pi}\oint_{\gamma_i} d\mathbf{x}\cdot\mathbf{p}, \tag{2.14} \]
where each path $\gamma_i$ around the part of phase space accessible to the orbit cannot be deformed
into another of the $\gamma_i$ without leaving the accessible region.3
3 The integrals (2.14) are unchanged by sliding $\gamma_i$ over the accessible region because the latter has vanishing
Poincaré invariant $\sum_i dx_i\,dp_i$.
Once we’ve separated the H-J eqn, we can evaluate the integral (2.14) that defines an action
associated with each spatial coordinate because a separated equation such as (2.12) makes $p_i$ a
function of only its coordinate $x_i$. In the case of 2d motion, we hold r constant along one path,
and φ constant along the other path. Then the first path trivially yields $J_\phi = L$ and the second
path yields
\[ J_r(E, L) = \frac{1}{2\pi}\oint dr\,p_r = \frac{1}{\pi}\int_{r_{\min}}^{r_{\max}} dr\,\sqrt{2(E-\Phi) - \frac{L^2}{r^2}}. \tag{2.15} \]
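Equation (2.15) is straightforward to evaluate numerically once the turning points are known. The sketch below does so for the Kepler potential with GM = 1, a test case chosen here because the radial action is known in closed form, $J_r = GM/\sqrt{-2E} - L$; the function name and the quadrature scheme are illustrative choices rather than anything prescribed by the notes.

```python
import math

def radial_action(potential, energy, ang_mom, r_min, r_max, n_steps=2000):
    """Evaluate J_r = (1/pi) * integral of sqrt(2(E - Phi) - L^2/r^2) dr, eq. (2.15).

    The substitution r = c - d*cos(psi) removes the square-root behaviour at
    the turning points, after which a midpoint rule in psi converges quickly.
    """
    c = 0.5 * (r_max + r_min)
    d = 0.5 * (r_max - r_min)
    h = math.pi / n_steps
    total = 0.0
    for k in range(n_steps):
        psi = (k + 0.5) * h
        r = c - d * math.cos(psi)
        radicand = 2.0 * (energy - potential(r)) - (ang_mom / r) ** 2
        # max() guards against tiny negative values from rounding near turning points
        total += math.sqrt(max(radicand, 0.0)) * d * math.sin(psi) * h
    return total / math.pi

def kepler(r):
    """Kepler potential with GM = 1 (test case only)."""
    return -1.0 / r

E, L = -0.5, 0.6
ecc = math.sqrt(1.0 + 2.0 * E * L * L)            # orbital eccentricity
r_min = (1.0 - ecc) / (-2.0 * E)                  # pericentre
r_max = (1.0 + ecc) / (-2.0 * E)                  # apocentre

J_r_numeric = radial_action(kepler, E, L, r_min, r_max)
J_r_exact = 1.0 / math.sqrt(-2.0 * E) - L         # closed-form Kepler result
print(J_r_numeric, J_r_exact)
```

For a general potential the turning points would have to be found by root-finding on 2(E − Φ) − L²/r² = 0 before calling the quadrature.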
To find the angle variables we have to use the chain rule
In the spherical limit Jz = L − |Lz | is the angular momentum in the (x, y) plane.
4) Return to step (2) with $\rho_0$ replaced by $\rho_1$ and iterate until $\rho_n(\mathbf{x})$ differs negligibly from
$\rho_{n-1}$. This typically requires ∼ 5 iterations.5
The only tricky part of this procedure is obtaining the angle-action coordinates of $\Phi_n$. In practice
approximations to the true (θ, J) coordinates are used.6
4 In the 20th c. composers appeared who argued that any series of notes constitutes music. We disagree:
writing music involves observing rules regarding scales, chords, etc. Similarly, creating plausible stellar systems
requires adherence to rules regarding how f (J) behaves in certain parts of action space. But these rules are a
matter of good taste.
5 Binney, MNRAS, 440, 787 (2014)
6 Sanders & Binney, MNRAS, 457, 2107 (2016)
2.3 Biorthogonal potential-density pairs
Unfortunately, while Φ is a function of only x, it becomes a function of both θ and J. So while
angle-action variables make dynamics trivial (advance θ linearly in t), they seriously complicate
the solution of Poisson’s eqn.
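We finesse this difficulty by introducing a basis of biorthogonal potential-density pairs.
That is, a set of pairs $(\rho^{(\alpha)}, \Phi^{(\alpha)})$ such that
\[ 4\pi G\rho^{(\alpha)} = \nabla^2\Phi^{(\alpha)} \quad\text{and}\quad \int d^3\mathbf{x}\,\Phi^{(\alpha)*}\rho^{(\alpha')} = -\mathcal{E}\delta_{\alpha\alpha'}, \tag{2.20} \]
where $\mathcal{E}$ is an arbitrary constant with the dimensions of energy. Given a density distribution
ρ(x), we expand it in the basis
\[ \rho(\mathbf{x}) = \sum_\alpha A_\alpha\rho^{(\alpha)}(\mathbf{x}) \quad\Rightarrow\quad \begin{cases} \Phi(\mathbf{x}) = \displaystyle\sum_\alpha A_\alpha\Phi^{(\alpha)}(\mathbf{x}), \\[6pt] A_\alpha = -\dfrac{1}{\mathcal{E}}\displaystyle\int d^3\mathbf{x}\,\Phi^{(\alpha)*}(\mathbf{x})\rho(\mathbf{x}). \end{cases} \tag{2.21} \]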
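How a biorthogonal basis turns Poisson’s equation into a projection can be seen in a deliberately simple setting. The sketch below is a toy, not the basis used for real stellar systems: it assumes a periodic one-dimensional grid, takes the basis potentials to be the plane-wave eigenvectors of the discrete Laplacian, normalises them as in (2.20), and then checks that the potential assembled from the coefficients $A_\alpha$ of (2.21) satisfies the discrete Poisson equation. All names and parameter values are illustrative.

```python
import cmath
import math

G, E_CONST = 1.0, 1.0     # toy units; E_CONST plays the role of the constant in (2.20)
n, h = 32, 0.1            # periodic 1d grid: n points with spacing h

def laplacian_eigenvalue(m):
    """Eigenvalue of the discrete periodic Laplacian for wavenumber index m."""
    return -(2.0 - 2.0 * math.cos(2.0 * math.pi * m / n)) / h**2

def basis_potential(m):
    """Plane wave normalised so that sum_j h * conj(Phi) * rho = -E_CONST."""
    norm = math.sqrt(4.0 * math.pi * G * E_CONST / (-laplacian_eigenvalue(m) * n * h))
    return [norm * cmath.exp(2j * math.pi * m * j / n) for j in range(n)]

modes = range(1, n)       # m = 0 (uniform density) is excluded: it exerts no force
potentials = {m: basis_potential(m) for m in modes}

# A zero-mean test density on the grid
rho = [math.sin(2 * math.pi * j / n) + 0.3 * math.cos(6 * math.pi * j / n) for j in range(n)]

# Coefficients A_alpha from the projection in (2.21)
coeff = {m: -h * sum(potentials[m][j].conjugate() * rho[j] for j in range(n)) / E_CONST
         for m in modes}

# Potential assembled from the same coefficients
phi = [sum(coeff[m] * potentials[m][j] for m in modes) for j in range(n)]

# Residual of the discrete Poisson equation, Laplacian(phi) = 4 pi G rho
residual = max(abs((phi[(j + 1) % n] - 2 * phi[j] + phi[j - 1]) / h**2
                   - 4 * math.pi * G * rho[j]) for j in range(n))
print(residual)
```

The point of the construction is that no differential equation is solved at any stage: once the basis is tabulated, the potential follows from inner products alone.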
3
Perturbing the DF
\[ 0 = \frac{\partial f_0}{\partial t} + \frac{\partial f_1}{\partial t} + [f_0,H_0] + [f_0,H_1] + [f_1,H_0] + [f_1,H_1]. \tag{3.2} \]
By Jeans’ theorem, $[f_0,H_0] = 0$. When we ensemble-average the equation, the parts linear in $f_1$
or $H_1$ vanish, so we are left with
\[ 0 = \frac{\partial f_0}{\partial t} + \langle[f_1,H_1]\rangle. \tag{3.3a} \]
The second term in this equation is clearly $O(f_1^2)$ or smaller, so the time derivative of $f_0$ is small,
as expected.
Since we are not formally expanding in some small parameter (e.g., 1/N) we haven’t yet
defined $f_1$ exactly: our only requirement is that its ensemble average vanishes, so $\langle f\rangle = f_0$.
Hence we are free to define $f_1$ such that the part of eqn (3.2) that is $O(f_1)$ is identically zero,
which is a stronger statement than that $f_1$ has vanishing ensemble average. That is, we now require
\[ 0 = \frac{\partial f_1}{\partial t} + [f_0,H_1] + [f_1,H_0], \tag{3.3b} \]
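where
\[ H_1 = \Phi_1(\mathbf{x}) = -G\int d^3\mathbf{x}'\,d^3\mathbf{v}\,\frac{f_1(\mathbf{x}',\mathbf{v})}{|\mathbf{x}-\mathbf{x}'|} \]
because only the potential term fluctuates and it is related to the perturbation to the density,
$\rho_1(\mathbf{x}) = \int d^3\mathbf{v}\,f_1(\mathbf{x},\mathbf{v})$, by the Poisson integral.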
Since the (θ, J) system, like the (x, v) one, is canonical and Poisson brackets are invariant
under changes of canonical coordinates, we can substitute x → θ, v → J in all these formulae if
we wish. Then we have
f (θ, J, t) = f0 (J) + f1 (θ, J, t) (3.4)
and
H = H0 (J) + Φ1 (θ, J, t) (3.5)
and eqn (3.4) becomes
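Using these results, we can rewrite the linearised Vlasov equation (3.6) as
\[ 0 = \sum_{\mathbf{n}} e^{i\mathbf{n}\cdot\boldsymbol{\theta}}\left(\frac{\partial\hat f_1}{\partial t} + i\mathbf{n}\cdot\boldsymbol{\Omega}\,\hat f_1 - i\mathbf{n}\cdot\frac{\partial f_0}{\partial\mathbf{J}}\,\hat\Phi_1\right). \tag{3.8} \]
Since θ is arbitrary, for this equation to hold, every coefficient of $e^{i\mathbf{n}\cdot\boldsymbol{\theta}}$ must separately vanish,
so we obtain an infinite set of equations
\[ \frac{\partial\hat f_1}{\partial t} = i\mathbf{n}\cdot\frac{\partial f_0}{\partial\mathbf{J}}\,\hat\Phi_1 - i\mathbf{n}\cdot\boldsymbol{\Omega}\,\hat f_1 \quad\text{for } \mathbf{n} \text{ with integer components.} \tag{3.9} \]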
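We use Laplace transforms to solve (3.9): multiplying by $e^{-pt}$ (with ℜ(p) > 0) and integrating
over t, we get1
\[ p\tilde f_1(\mathbf{n},\mathbf{J},p) - \hat f_1(\mathbf{n},\mathbf{J},0) + i\mathbf{n}\cdot\boldsymbol{\Omega}\,\tilde f_1(\mathbf{n},\mathbf{J},p) - i\mathbf{n}\cdot\frac{\partial f_0}{\partial\mathbf{J}}\,\tilde\Phi_1(\mathbf{n},\mathbf{J},p) = 0, \tag{3.10} \]
where the tildes denote Laplace transforms:
\[ \tilde f_1(\mathbf{n},\mathbf{J},p) \equiv \int_0^\infty dt\,e^{-pt}\hat f_1(\mathbf{n},\mathbf{J},t). \tag{3.11} \]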
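\[ \tilde f_1(\mathbf{n},\mathbf{J},p) = \frac{i\mathbf{n}\cdot\dfrac{\partial f_0}{\partial\mathbf{J}}\,\tilde\Phi_1(\mathbf{n},\mathbf{J},p) + \hat f_1(\mathbf{n},\mathbf{J},0)}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}}. \tag{3.12} \]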
This equation provides one connection between a perturbation to the potential Φ1 and the re-
sponse f1 it induces dynamically.
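As a quick sanity check on (3.12), consider the limit in which self-gravity is switched off, so that $\tilde\Phi_1 = 0$ (a simplification adopted here purely for illustration). The response then reduces to the transform of a pure phase rotation:

```latex
\tilde f_1(\mathbf{n},\mathbf{J},p) = \frac{\hat f_1(\mathbf{n},\mathbf{J},0)}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}}
\quad\Longrightarrow\quad
\hat f_1(\mathbf{n},\mathbf{J},t) = \hat f_1(\mathbf{n},\mathbf{J},0)\,e^{-i\mathbf{n}\cdot\boldsymbol{\Omega}t},
```

which is what one obtains by solving (3.9) directly with $\hat\Phi_1 = 0$: each Fourier component simply rotates in phase at its own frequency $\mathbf{n}\cdot\boldsymbol{\Omega}(\mathbf{J})$.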
1 While the dimensions of a quantity are unchanged by a hat, a tilde raises the dimensions by a factor T.
We now need to put into maths the principle that $\Phi_1$ is the potential generated by the
perturbation to the density that’s associated with $f_1$. To obtain the coefficients $A_\alpha$ of the
potential-density expansion (2.21) of the perturbed density (or at any rate their temporal Laplace
transforms), we multiply the left side of (3.12) by $\Phi^{(\alpha)*}\sum_{\mathbf{n}} e^{i\mathbf{n}\cdot\boldsymbol{\theta}}$ and integrate over phase space:
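\[ \begin{aligned} \int d^3\boldsymbol{\theta}\,d^3\mathbf{J}\,\Phi^{(\alpha)*}(\mathbf{x})\sum_{\mathbf{n}} e^{i\mathbf{n}\cdot\boldsymbol{\theta}}\tilde f_1(\mathbf{n},\mathbf{J},p) &= \int d^3\mathbf{x}\,d^3\mathbf{v}\,\Phi^{(\alpha)*}(\mathbf{x})\tilde f_1(\mathbf{x},\mathbf{v},p) \\ &= \int d^3\mathbf{x}\,\Phi^{(\alpha)*}(\mathbf{x})\tilde\rho_1(\mathbf{x},p) = -\mathcal{E}\tilde A_\alpha(p). \end{aligned} \tag{3.13} \]
Here we have exploited the fact that the Jacobian between any two sets of canonical coordinates is
unity, so $d^3\boldsymbol{\theta}\,d^3\mathbf{J} = d^3\mathbf{x}\,d^3\mathbf{v}$. Now operating in the same way on the rhs of eqn (3.12) we have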
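\[ \begin{aligned} &\int d^3\boldsymbol{\theta}\,d^3\mathbf{J}\,\Phi^{(\alpha)*}(\mathbf{x})\sum_{\mathbf{n}} e^{i\mathbf{n}\cdot\boldsymbol{\theta}}\,\frac{i\mathbf{n}\cdot\dfrac{\partial f_0}{\partial\mathbf{J}}\tilde\Phi_1(\mathbf{n},\mathbf{J},p) + \hat f_1(\mathbf{n},\mathbf{J},0)}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}} \\ &\quad= (2\pi)^3\int d^3\mathbf{J}\sum_{\mathbf{n}}[\hat\Phi^{(\alpha)}(\mathbf{n},\mathbf{J})]^*\,\frac{i\mathbf{n}\cdot\dfrac{\partial f_0}{\partial\mathbf{J}}\sum_{\alpha'}\tilde A_{\alpha'}(p)\hat\Phi^{(\alpha')}(\mathbf{n},\mathbf{J}) + \hat f_1(\mathbf{n},\mathbf{J},0)}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}}. \end{aligned} \tag{3.14} \]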
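Uniting the two sides (3.13) and (3.14) of equation (3.12) we obtain an equation for $\tilde A_\alpha$:
\[ \tilde A_\alpha(p) = -\frac{(2\pi)^3}{\mathcal{E}}\int d^3\mathbf{J}\sum_{\mathbf{n}}\frac{i\mathbf{n}\cdot\dfrac{\partial f_0}{\partial\mathbf{J}}\sum_{\alpha'}\tilde A_{\alpha'}(p)[\hat\Phi^{(\alpha)}(\mathbf{n},\mathbf{J})]^*\hat\Phi^{(\alpha')}(\mathbf{n},\mathbf{J}) + \hat f_1(\mathbf{n},\mathbf{J},0)[\hat\Phi^{(\alpha)}(\mathbf{n},\mathbf{J})]^*}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}}. \tag{3.15} \]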
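We move the term on the right containing $\tilde A_{\alpha'}$ to the left side so we can write
\[ \sum_{\alpha'}\epsilon_{\alpha\alpha'}(p)\tilde A_{\alpha'}(p) = -\frac{(2\pi)^3}{\mathcal{E}}\int d^3\mathbf{J}\sum_{\mathbf{n}}\frac{\hat f_1(\mathbf{n},\mathbf{J},0)[\hat\Phi^{(\alpha)}(\mathbf{n},\mathbf{J})]^*}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}}, \tag{3.16a} \]
where
\[ \epsilon_{\alpha\alpha'}(p) \equiv \delta_{\alpha\alpha'} + i\frac{(2\pi)^3}{\mathcal{E}}\int d^3\mathbf{J}\sum_{\mathbf{n}}\frac{\mathbf{n}\cdot\dfrac{\partial f_0}{\partial\mathbf{J}}}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}}[\hat\Phi^{(\alpha)}(\mathbf{n},\mathbf{J})]^*\hat\Phi^{(\alpha')}(\mathbf{n},\mathbf{J}) \tag{3.16b} \]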
is the analogue of the dielectric function (cf. Schekochihin eqn. 3.11). In both integrals over J
in equations (3.16) we must use the Landau prescription. That is, we must ensure that in · Ω
passes to the left of p in the complex plane (Box 3.2).
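After computing the inverse of the dimensionless matrix ε, we have an explicit expression
for $\tilde A_\alpha(p)$. Multiplying this by $\hat\Phi^{(\alpha)}(\mathbf{n},\mathbf{J})$ and summing over α we obtain the Laplace transform
of the potential perturbation arising from the initial condition $f_1(\mathbf{n},\mathbf{J},0)$:
\[ \begin{aligned} \tilde\Phi_1(\mathbf{n}',\mathbf{J}',p) &= \sum_{\alpha'}\tilde A_{\alpha'}(p)\hat\Phi^{(\alpha')}(\mathbf{n}',\mathbf{J}') \\ &= -\frac{(2\pi)^3}{\mathcal{E}}\int d^3\mathbf{J}\sum_{\mathbf{n}}\frac{\hat f_1(\mathbf{n},\mathbf{J},0)}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}}\sum_{\alpha\alpha'}\hat\Phi^{(\alpha')}(\mathbf{n}',\mathbf{J}')\,\epsilon^{-1}_{\alpha'\alpha}(p)\,[\hat\Phi^{(\alpha)}(\mathbf{n},\mathbf{J})]^* \\ &= -(2\pi)^3\int d^3\mathbf{J}\sum_{\mathbf{n}} E_{\mathbf{n}'\mathbf{n}}(\mathbf{J}',\mathbf{J},p)\,\frac{\hat f_1(\mathbf{n},\mathbf{J},0)}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}}, \end{aligned} \tag{3.17a} \]
where
\[ E_{\mathbf{n}'\mathbf{n}}(\mathbf{J}',\mathbf{J},p) \equiv \frac{1}{\mathcal{E}}\sum_{\alpha\alpha'}\hat\Phi^{(\alpha')}(\mathbf{n}',\mathbf{J}')\,\epsilon^{-1}_{\alpha'\alpha}(p)\,[\hat\Phi^{(\alpha)}(\mathbf{n},\mathbf{J})]^*, \tag{3.17b} \]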
has dimensions $M^{-1}L^2T^{-2}$ and is (to within a factor $\mathcal{E}$) $\epsilon^{-1}$ written in the (n, J) basis rather
than the (α, J) basis. Equation (3.17a) is analogous to Schekochihin eqn. (3.13) in giving the
Laplace transform of the response potential set up by a specified initial condition. It’s more
complicated than Schekochihin eqn. (3.13) because: (a) in the latter Poisson’s equation is solved
by simply dividing by $k^2$ while here we do acrobatics with the potential basis functions; (b) we
have E where Schekochihin eqn. (3.13) has 1/ε and the case ε = 0 becomes the case in which
our matrix ε has no inverse, so E, which is basically this inverse, diverges; (c) Schekochihin
eqn. (3.13) involves an integral over v with the denominator of the integrand linear in v, while
here we integrate over J and the denominator involves the non-linear function n · Ω(J). The
generalisation of the Landau prescription to this more complex context is given in Box 3.2.
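When we use this equation in (2), we obtain the needed analogue of the Plemelj formula.
\[ \int d^3\mathbf{J}\,\frac{k(\mathbf{J})}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}} = -iP\!\int dz\,\frac{K(z)}{z - \omega} + \pi\int d^3\mathbf{J}\,k(\mathbf{J})\,\delta(\mathbf{n}\cdot\boldsymbol{\Omega} - \omega) \qquad (p = -i\omega + 0). \tag{3} \]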
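The δ-function term of this formula can be checked numerically in one dimension. The sketch below is an illustration under stated assumptions: it replaces the action-space integral by an integral over a single variable z standing in for n·Ω, uses a Gaussian weight k(z) chosen for convenience, and evaluates the real part of the integral with p = −iω + ε for a small positive ε. That real part should approach π k(ω).

```python
import math

def real_part_near_axis(k, omega, eps=1e-3, half_width=10.0, step=1e-4):
    """Real part of the integral of k(z) / (p + i z) dz for p = -i*omega + eps.

    Since 1/(eps + i(z - omega)) has real part eps / (eps^2 + (z - omega)^2),
    the integrand is k(z) times a narrow Lorentzian centred on z = omega.
    """
    n_steps = int(round(2.0 * half_width / step))
    total = 0.0
    for j in range(n_steps):
        z = -half_width + (j + 0.5) * step      # midpoint rule
        total += k(z) * eps / (eps**2 + (z - omega) ** 2) * step
    return total

def gaussian(z):
    """Smooth test weight (an arbitrary choice for this check)."""
    return math.exp(-0.5 * z * z)

omega = 0.5
numeric = real_part_near_axis(gaussian, omega)
expected = math.pi * gaussian(omega)            # pi * k(omega) from the delta function
print(numeric, expected)
```

The agreement improves as ε is reduced, provided the integration step stays well below ε so that the Lorentzian is resolved.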
4
Evolution of the mean-field model
We have been studying the properties of mean-field equilibrium systems. Such systems are fully
characterised by a non-negative df of the form f(J). We have shown how to compute the
evolution of the df when at t = 0 it differs very slightly from f(J). In all the above we have
been imagining that the system comprises an extremely large number of particles with extremely
low masses, so statistical fluctuations of the density around its mean value, $\rho(\mathbf{x}) = \int d^3\mathbf{v}\,f(\mathbf{x},\mathbf{v})$,
vanish. In this section we explore how to compute the evolution of f that occurs because its
constituent particles have non-zero masses, so ρ and Φ fluctuate around their mean values.
Recall from Paul Dellar’s discussion of the BBGKY hierarchy that the 1-particle df f(x, v)
satisfies a Boltzmann equation in which the 2-particle correlation function $g^{(2)}(\mathbf{x},\mathbf{v},\mathbf{x}',\mathbf{v}')$
appears (Problem 7):
\[ \left.\frac{df}{dt}\right|_{\mathbf{w}} = (N-1)\int d^3\mathbf{x}'\,d^3\mathbf{v}'\,\frac{\partial u(\mathbf{x}-\mathbf{x}')}{\partial\mathbf{x}'}\cdot\frac{\partial g^{(2)}(\mathbf{w},\mathbf{w}')}{\partial\mathbf{v}}, \tag{4.1} \]
where w ≡ (x, v) denotes position in phase space and u(x − x′) is the interaction potential
between two particles. The physical content of this equation is that evolution of the mean-field
model, f(x, v), is driven by the tendency, encoded in $g^{(2)}$, for particles to cluster together, so you
are more likely to find a second particle near you if you stand on a particle than if you stand in
a random location. Heyvaerts1 obtains from equation (4.1) the equation for the evolution of f,
which is what we seek in this section, but we’ll proceed along a different path, similar to that
laid out by Chavanis.2
\[ \frac{\partial f_0}{\partial t} = -\frac{\partial}{\partial\mathbf{J}}\cdot\mathbf{F}, \tag{4.3a} \]
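This expression for the diffusive flux is made up of a part that’s proportional to $\langle\tilde\Phi_1(\mathbf{n})\tilde\Phi_1(-\mathbf{n})\rangle$
that will be non-vanishing regardless of the physical cause of fluctuations in the potential, and
a part $\langle\hat f_1(\mathbf{n})\tilde\Phi_1(-\mathbf{n})\rangle$ that will be non-vanishing only to the extent that the fluctuations in Φ
are generated by the fluctuations in f. Moreover, the first term is proportional to the gradient
of $f_0(\mathbf{J})$ while the second is not. These distinctions will prove important (e.g., Problem 10), so
we explicitly break F = F1 + F2 into two parts,
\[ \begin{aligned} \mathbf{F}_1 &= i\sum_{\mathbf{n}}\mathbf{n}\int\frac{dp}{2\pi i}\,e^{pt}\int\frac{dp'}{2\pi i}\,e^{p't}\,\frac{\big\langle\hat f_1(\mathbf{n},\mathbf{J},0)\tilde\Phi_1(-\mathbf{n},\mathbf{J},p')\big\rangle}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}} \\ \mathbf{F}_2 &= -\sum_{\mathbf{n}}\mathbf{n}\,\mathbf{n}\cdot\frac{\partial f_0}{\partial\mathbf{J}}\int\frac{dp}{2\pi i}\,e^{pt}\int\frac{dp'}{2\pi i}\,e^{p't}\,\frac{\big\langle\tilde\Phi_1(\mathbf{n},\mathbf{J},p)\tilde\Phi_1(-\mathbf{n},\mathbf{J},p')\big\rangle}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}}. \end{aligned} \tag{4.6} \]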
3 Strictly, the density of stars in action space is $(2\pi)^3 f_0(\mathbf{J})$ and the action-space flux is $(2\pi)^3\mathbf{F}(\mathbf{J})$ rather than
F(J), but in heuristic discussions it’s convenient to ignore the factor $(2\pi)^3$.
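Now
\[ \begin{aligned} \Big\langle\sum_{ij}\delta(\boldsymbol{\theta}-\boldsymbol{\theta}_i)\delta(\mathbf{J}-\mathbf{J}_i)\delta(\boldsymbol{\theta}'-\boldsymbol{\theta}_j)\delta(\mathbf{J}'-\mathbf{J}_j)\Big\rangle &= \Big\langle\sum_{i\ne j}\delta(\boldsymbol{\theta}-\boldsymbol{\theta}_i)\delta(\mathbf{J}-\mathbf{J}_i)\delta(\boldsymbol{\theta}'-\boldsymbol{\theta}_j)\delta(\mathbf{J}'-\mathbf{J}_j)\Big\rangle \\ &\quad+ \Big\langle\sum_{i}\delta(\boldsymbol{\theta}-\boldsymbol{\theta}_i)\delta(\mathbf{J}-\mathbf{J}_i)\delta(\boldsymbol{\theta}'-\boldsymbol{\theta})\delta(\mathbf{J}'-\mathbf{J})\Big\rangle \\ &= m^{-2}f_0(\mathbf{J})f_0(\mathbf{J}') + m^{-1}f_0(\mathbf{J})\delta(\boldsymbol{\theta}-\boldsymbol{\theta}')\delta(\mathbf{J}-\mathbf{J}'), \end{aligned} \]
where we have assumed that the particles are uniformly distributed in θ and uncorrelated
(so the expectation value of products of delta-functions associated with different particles is
the product of the expectation values of the individual terms). When the last equation is
used in the previous equation, we obtain
\[ \langle f_1(\boldsymbol{\theta},\mathbf{J})f_1(\boldsymbol{\theta}',\mathbf{J}')\rangle = m f_0(\mathbf{J})\delta(\boldsymbol{\theta}-\boldsymbol{\theta}')\delta(\mathbf{J}-\mathbf{J}'), \]
which simply states that particles are only correlated with themselves. Finally Fourier
transforming
\[ \begin{aligned} \big\langle\hat f_1(\mathbf{n},\mathbf{J})\hat f_1(\mathbf{n}',\mathbf{J}')\big\rangle &= m f_0(\mathbf{J})\delta(\mathbf{J}-\mathbf{J}')\int\frac{d^3\boldsymbol{\theta}}{(2\pi)^3}\int\frac{d^3\boldsymbol{\theta}'}{(2\pi)^3}\,e^{-i(\mathbf{n}\cdot\boldsymbol{\theta}+\mathbf{n}'\cdot\boldsymbol{\theta}')}\delta(\boldsymbol{\theta}-\boldsymbol{\theta}') \\ &= (2\pi)^{-3}m f_0(\mathbf{J})\delta(\mathbf{J}-\mathbf{J}')\delta_{\mathbf{n},-\mathbf{n}'}. \end{aligned} \]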
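The Kronecker delta in this result, which says that Poisson noise only correlates the Fourier component n with the component −n, can be illustrated with a small Monte Carlo experiment. The sketch below is a one-dimensional toy with assumed sample sizes: it draws particles uniformly in a single angle, forms the unnormalised Fourier amplitudes $c_n = \sum_i e^{-in\theta_i}$, and averages the products $c_1 c_{-1}$ and $c_1 c_2$ over many realisations.

```python
import cmath
import math
import random

def fourier_amplitude(angles, n):
    """Sum of exp(-i n theta) over particles: n-th Fourier amplitude of the angle density."""
    return sum(cmath.exp(-1j * n * theta) for theta in angles)

random.seed(1)                       # fixed seed so the run is reproducible
n_particles, n_realisations = 100, 2000

same = 0.0 + 0.0j                    # accumulates c_1 * c_{-1}  (n' = -n)
cross = 0.0 + 0.0j                   # accumulates c_1 * c_2     (n' != -n)
for _realisation in range(n_realisations):
    angles = [random.uniform(0.0, 2.0 * math.pi) for _particle in range(n_particles)]
    c1 = fourier_amplitude(angles, 1)
    same += c1 * fourier_amplitude(angles, -1)
    cross += c1 * fourier_amplitude(angles, 2)
same /= n_realisations
cross /= n_realisations

# Expect same/N close to 1 (each particle correlates only with itself)
# and |cross|/N close to 0, up to sampling noise.
print(same.real / n_particles, abs(cross) / n_particles)
```

The n′ = −n average scales with the number of particles, which is the discrete counterpart of the factor $m f_0$ in the expression above.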
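Inserting this and using the δ-function to carry out the integral over J′ in the equation for F1
and over J′′ in the equation for F2, we get
\[ \begin{aligned} \mathbf{F}_1(\mathbf{J}) &= -im\sum_{\mathbf{n}}\mathbf{n}\int\frac{dp}{2\pi i}\,e^{pt}\,\frac{1}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}}\int\frac{dp'}{2\pi i}\,e^{p't}\,E_{-\mathbf{n}-\mathbf{n}}(\mathbf{J},\mathbf{J},p')\,\frac{f_0(\mathbf{J})}{p' - i\mathbf{n}\cdot\boldsymbol{\Omega}} \\ \mathbf{F}_2(\mathbf{J}) &= -(2\pi)^3 m\sum_{\mathbf{n}}\mathbf{n}\int\frac{dp}{2\pi i}\,e^{pt}\,\frac{\mathbf{n}\cdot\dfrac{\partial f_0}{\partial\mathbf{J}}}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}}\int d^3\mathbf{J}'\sum_{\mathbf{n}'}E_{\mathbf{n}\mathbf{n}'}(\mathbf{J},\mathbf{J}',p)\,\frac{1}{p + i\mathbf{n}'\cdot\boldsymbol{\Omega}'} \\ &\qquad\times\int\frac{dp'}{2\pi i}\,e^{p't}\,E_{-\mathbf{n}-\mathbf{n}'}(\mathbf{J},\mathbf{J}',p')\,\frac{f_0(\mathbf{J}')}{p' - i\mathbf{n}'\cdot\boldsymbol{\Omega}'}. \end{aligned} \tag{4.9} \]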
The expression for F1 is easy to simplify further because $E_{-\mathbf{n}-\mathbf{n}}$ won’t contribute a pole at
ℜ(p′) ≥ 0: if it had such a pole, the underlying model would be unstable (Box 3.1), and we are
interested in the case when it’s stable. So the only singularity we need consider is the obvious
one when p′ = in · Ω. Similarly, the integration over p follows immediately from the pole at
p = −in · Ω. So we have
\[ \mathbf{F}_1(\mathbf{J}) = -im\sum_{\mathbf{n}}\mathbf{n}\,E_{-\mathbf{n}-\mathbf{n}}(\mathbf{J},\mathbf{J},i\mathbf{n}\cdot\boldsymbol{\Omega})\,f_0(\mathbf{J}). \tag{4.10} \]
Notice that the time dependencies introduced by the two inverse Laplace transforms have
cancelled, so the flux F1 is constant.
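Now we turn to F2. The integral over p′ is straightforward because the integrand has only
the obvious pole at p′ = in′ · Ω′. After doing the p′ integral we have
\[ \begin{aligned} \mathbf{F}_2(\mathbf{J}) &= -(2\pi)^3 m\sum_{\mathbf{n}}\mathbf{n}\int\frac{dp}{2\pi i}\,e^{pt}\,\frac{\mathbf{n}\cdot\dfrac{\partial f_0}{\partial\mathbf{J}}}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}} \\ &\qquad\times\int d^3\mathbf{J}'\sum_{\mathbf{n}'}e^{i\mathbf{n}'\cdot\boldsymbol{\Omega}'t}\,E_{\mathbf{n}\mathbf{n}'}(\mathbf{J},\mathbf{J}',p)\,E_{-\mathbf{n}-\mathbf{n}'}(\mathbf{J},\mathbf{J}',i\mathbf{n}'\cdot\boldsymbol{\Omega}')\,\frac{f_0(\mathbf{J}')}{p + i\mathbf{n}'\cdot\boldsymbol{\Omega}'}. \end{aligned} \tag{4.11} \]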
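Now we perform the integral over J′ using the Landau prescription (Box 3.2) to handle the pole
at in′ · Ω′ = −p:
\[ \begin{aligned} \mathbf{F}_2(\mathbf{J}) = -(2\pi)^3 m\sum_{\mathbf{n}}\mathbf{n}\int\frac{dp}{2\pi i}\,e^{pt}\,\frac{\mathbf{n}\cdot\dfrac{\partial f_0}{\partial\mathbf{J}}}{p + i\mathbf{n}\cdot\boldsymbol{\Omega}}\bigg(&-iP + \pi\int d^3\mathbf{J}'\sum_{\mathbf{n}'}e^{-pt}\,\delta(\mathbf{n}'\cdot\boldsymbol{\Omega}' - ip) \\ &\times E_{\mathbf{n}\mathbf{n}'}(\mathbf{J},\mathbf{J}',p)\,E_{-\mathbf{n}-\mathbf{n}'}(\mathbf{J},\mathbf{J}',-p)\,f_0(\mathbf{J}')\bigg), \end{aligned} \tag{4.12} \]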
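where P is the (real) principal part of the integral. It is now straightforward to execute the
integral over p because the integrand has just the simple pole at p = −in · Ω. After integration
over p we have
\[ \begin{aligned} \mathbf{F}_2(\mathbf{J}) = -(2\pi)^3 m\sum_{\mathbf{n}}\mathbf{n}\,e^{-i\mathbf{n}\cdot\boldsymbol{\Omega}t}\,\mathbf{n}\cdot\frac{\partial f_0}{\partial\mathbf{J}}\bigg(&-iP + \pi\int d^3\mathbf{J}'\sum_{\mathbf{n}'}e^{i\mathbf{n}\cdot\boldsymbol{\Omega}t}\,\delta(\mathbf{n}'\cdot\boldsymbol{\Omega}' - \mathbf{n}\cdot\boldsymbol{\Omega}) \\ &\times E_{\mathbf{n}\mathbf{n}'}(\mathbf{J},\mathbf{J}',-i\mathbf{n}\cdot\boldsymbol{\Omega})\,E_{-\mathbf{n}-\mathbf{n}'}(\mathbf{J},\mathbf{J}',i\mathbf{n}\cdot\boldsymbol{\Omega})\,f_0(\mathbf{J}')\bigg). \end{aligned} \tag{4.13} \]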
′
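We now argue that since F2 is real, the contribution from the principal part, P, must vanish,
and we have finally
\[ \begin{aligned} \mathbf{F}_2(\mathbf{J}) = -\tfrac12(2\pi)^4 m\sum_{\mathbf{n}}\mathbf{n}\,\mathbf{n}\cdot\frac{\partial f_0}{\partial\mathbf{J}}\int d^3\mathbf{J}'\sum_{\mathbf{n}'}&\delta(\mathbf{n}'\cdot\boldsymbol{\Omega}' - \mathbf{n}\cdot\boldsymbol{\Omega}) \\ &\times E_{\mathbf{n}\mathbf{n}'}(\mathbf{J},\mathbf{J}',-i\mathbf{n}\cdot\boldsymbol{\Omega})\,E_{-\mathbf{n}-\mathbf{n}'}(\mathbf{J},\mathbf{J}',i\mathbf{n}\cdot\boldsymbol{\Omega})\,f_0(\mathbf{J}'). \end{aligned} \tag{4.14} \]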
Notice that the time dependence has disappeared from F2 as it did from F1 .
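At this point we assume that we are working with real basis functions $\Phi^{(\alpha)}$ for then by the
bottom-right equation of (3.7), $[\hat\Phi^{(\alpha)}(\mathbf{n},\mathbf{J})]^* = \hat\Phi^{(\alpha)}(-\mathbf{n},\mathbf{J})$. Also $[\epsilon(p)]^* = \epsilon(p^*)$ (Problem 8).
Consequently, from (3.17b)
\[ \begin{aligned} [E_{\mathbf{n}\mathbf{n}'}(\mathbf{J},\mathbf{J}',-i\mathbf{n}\cdot\boldsymbol{\Omega})]^* &= \frac{1}{\mathcal{E}}\sum_{\alpha\alpha'}[\hat\Phi^{(\alpha)}(\mathbf{n},\mathbf{J})]^*\,[\epsilon^{-1}_{\alpha\alpha'}(-i\mathbf{n}\cdot\boldsymbol{\Omega})]^*\,\hat\Phi^{(\alpha')}(\mathbf{n}',\mathbf{J}') \\ &= \frac{1}{\mathcal{E}}\sum_{\alpha\alpha'}\hat\Phi^{(\alpha)}(-\mathbf{n},\mathbf{J})\,\epsilon^{-1}_{\alpha\alpha'}(i\mathbf{n}\cdot\boldsymbol{\Omega})\,[\hat\Phi^{(\alpha')}(-\mathbf{n}',\mathbf{J}')]^* \\ &= E_{-\mathbf{n}-\mathbf{n}'}(\mathbf{J},\mathbf{J}',i\mathbf{n}\cdot\boldsymbol{\Omega}). \end{aligned} \tag{4.15} \]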
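where D1 is the (vector) drag coefficient and D2 is the (tensor) diffusion coefficient:
\[ \begin{aligned} \mathbf{D}_1(\mathbf{J}) &= im\sum_{\mathbf{n}}E_{-\mathbf{n}-\mathbf{n}}(\mathbf{J},\mathbf{J},i\mathbf{n}\cdot\boldsymbol{\Omega})\,\mathbf{n} \\ \mathbf{D}_2(\mathbf{J}) &= \tfrac12(2\pi)^4 m\sum_{\mathbf{n}\mathbf{n}'}\int d^3\mathbf{J}'\,\big|E_{\mathbf{n}\mathbf{n}'}(\mathbf{J},\mathbf{J}',-i\mathbf{n}\cdot\boldsymbol{\Omega})\big|^2 f_0(\mathbf{J}')\,\delta(\mathbf{n}'\cdot\boldsymbol{\Omega}' - \mathbf{n}\cdot\boldsymbol{\Omega})\,\mathbf{n}\otimes\mathbf{n}. \end{aligned} \tag{4.18} \]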
Notice that the sign of D2 is positive, so the flux that it generates is in the opposite direction
to the gradient of f0 : stars diffuse away from regions of high phase-space density. Whereas the
flux of heat in a metal bar, q = −κ∇T is simply proportional to the gradient of the heat-
density T , our diffusive flux has, in addition to a term that’s proportional to the gradient of
the star density, a term that’s proportional to the density itself. To understand the necessity
of this additional term, consider how the system would evolve if it were absent. Then stars
would diffuse from modest initial actions to ever higher actions, so eventually the density of
stars would become uniform throughout phase space, just as heat diffusion will eventually make
the temperature uniform throughout a bar. However, energy conservation, which is encoded in
the dynamics we have been using, excludes a uniform distribution of stars in action space, since
larger actions are associated with more energy. Consequently, the tendency of the term in F
proportional to ∂f0 /∂J to drive the system to uniformity in action space has to be counteracted
by the term proportional to f0 , which generates a net drift towards the origin of action space.
In thermal equilibrium, F must vanish by detailed balance. Then the df $f_0 = \exp(-\beta H)$,
where H is the Hamiltonian and $\beta = (k_B T)^{-1}$ is the inverse temperature. Since ∂H/∂J = Ω,
for F to vanish the diffusion coefficients (which depend on $f_0$) must satisfy
everywhere in action space. This relation provides a useful check on any formulae for the diffusion
coefficients (Problem 9). It also suggests that whatever the origin of the fluctuations that drive
diffusion (here Poisson fluctuations), D1 and D2 will be closely related to one another. In fact,
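from our expression for D1 one can derive (Appendix A)
\[ \mathbf{D}_1(\mathbf{J}) = -\tfrac12(2\pi)^4 m\sum_{\mathbf{n}\mathbf{n}'}\int d^3\mathbf{J}'\,\big|E_{\mathbf{n}\mathbf{n}'}(\mathbf{J},\mathbf{J}',-i\mathbf{n}\cdot\boldsymbol{\Omega})\big|^2\,\mathbf{n}'\cdot\frac{\partial f_0}{\partial\mathbf{J}'}\,\delta(\mathbf{n}'\cdot\boldsymbol{\Omega}' - \mathbf{n}\cdot\boldsymbol{\Omega})\,\mathbf{n}, \tag{4.20} \]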
The formalism developed in the last section gives fascinating insight into the dynamics of galactic
discs similar to that in which we reside. These systems were among the first to be studied by
N-body simulation when electronic computers became widely available, but it is only recently
that we have achieved a reasonable understanding of their dynamics.
Fouvry et al. (arXiv:1507.06887) have applied the formalism of Chapter 4 to razor-thin discs:
restricting motion to the xy plane significantly simplifies the computations. First, angle-action
coordinates are readily constructed for an axisymmetric disc (Problem 3). Second, Kalnajs (1976)
has defined a convenient set of orthonormal potential-density pairs
where α = (l, n). $\Phi_{ln}$ is a specified polynomial and $\rho_{ln}$ is a polynomial in r times a half power of
$1 - r^2/r_0^2$, where $r_0$ is the edge of the disc.
Next they compute the AA representation of their basis potentials:
\[ \hat\Phi^{(\alpha)}(\mathbf{n},\mathbf{J}) = \delta_{\alpha_2,n_2}\,\frac{1}{\pi}\int_{r_{\rm p}}^{r_{\rm a}} dr\,\Phi_{ln}(r)\cos[n_1\theta_1 + n_2(\theta_2 - \phi)]. \]
They considered a disc that is confined by a potential that generates a circular speed
$v_c = (R\,\partial\Phi/\partial R)^{1/2}$ that is everywhere constant. If the disc generated this potential on its own, its
surface density Σ(R) would be proportional to $R^{-1}$. It is more realistic (and numerically more
convenient) to assume that Φ is generated by three components: (i) a bulge that dominates the
mass density near the origin, (ii) a dark halo that dominates the mass density far from the centre,
and (iii) the disc, which contributes ∼ 0.5 of the radial force at intermediate radii. One says that
a “Mestel” disc with $\Sigma(R) \propto R^{-1}$ has been “tapered” at small and large radii to accommodate
the bulge and the dark halo. The unperturbed df is
\[ f_0(E, J_\phi) = \xi\,C\,J_\phi^{\,q}\,e^{-E/\sigma_r^2}\,T_{\rm in}(J_\phi)\,T_{\rm out}(J_\phi), \tag{5.2a} \]
where $E = \tfrac12(v_R^2 + v_\phi^2) + \Phi$, C normalises the df such that with $\xi = T_{\rm in} = T_{\rm out} = 1$ the disc
generates the entire potential, $\sigma_r$ is a parameter that controls the magnitude of stars’ random
motions, and
\[ q = (v_c/\sigma_r)^2 - 1 \tag{5.2b} \]
was taken to have the value 11.4. Finally the taper functions are
\[ T_{\rm in}(J_\phi) = \frac{J_\phi^4}{(R_{\rm in}v_c)^4 + J_\phi^4}, \qquad T_{\rm out}(J_\phi) = \frac{(R_{\rm out}v_c)^5}{(R_{\rm out}v_c)^5 + |J_\phi|^5}, \tag{5.2c} \]
where $R_{\rm out} = 11.5R_{\rm in}$. By increasing ξ between zero and unity, the dynamical importance of
the disc’s self-gravity can be increased from unimportant to dominant. For ξ ≃ 0.5 this disc is
known to be stable in the sense (Box 3.1) that all its normal modes are damped (Toomre 1981).
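The taper functions (5.2c) and the relation (5.2b) are simple enough to tabulate directly. The sketch below works in units where $v_c = R_{\rm in} = 1$ (a choice made here for convenience); the function names are ours.

```python
import math

def taper_inner(j_phi, r_in, v_c):
    """Inner taper of (5.2c): suppresses stars with |J_phi| well below R_in * v_c."""
    scale = (r_in * v_c) ** 4
    return j_phi**4 / (scale + j_phi**4)

def taper_outer(j_phi, r_out, v_c):
    """Outer taper of (5.2c): suppresses stars with |J_phi| well above R_out * v_c."""
    scale = (r_out * v_c) ** 5
    return scale / (scale + abs(j_phi) ** 5)

v_c, r_in = 1.0, 1.0               # units chosen for convenience (assumption)
r_out = 11.5 * r_in                # ratio quoted in the text
q = 11.4                           # value quoted in the text
sigma_r = v_c / math.sqrt(q + 1.0) # inverts q = (v_c / sigma_r)^2 - 1

print(f"sigma_r / v_c = {sigma_r / v_c:.3f}")
for j_phi in (0.3, 1.0, 3.0, 11.5, 30.0):
    combined = taper_inner(j_phi, r_in, v_c) * taper_outer(j_phi, r_out, v_c)
    print(f"J_phi = {j_phi:5.1f}   T_in * T_out = {combined:.4f}")
```

Each taper equals one half at its own scale, $J_\phi = R_{\rm in}v_c$ or $J_\phi = R_{\rm out}v_c$, and the product stays close to unity in between, which is the range over which the disc is dynamically active.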
In Figure 5.1 arrows show the diffusive flux F computed from equation (A.7). We see that
F is small except along a ridge that slopes leftwards up from the Jφ axis (which is where the
stars of a cool disc are strongly concentrated). The narrowness of this ridge is emphasised by
the left panel of Figure 5.2, which shows div F. We see that div F is negligible except along a
narrow ridge, where it is positive lower down and negative higher up, indicating that stars are
diffusing from near circular orbits to more eccentric orbits with slightly less angular momentum.
Figure 5.1 The diffusive flux of stars in action space for a tapered Mestel disc with active mass fraction ξ = 0.5
(From Fouvry et al 2015).
Figure 5.2 Left panel: a plot of div F, computed from equation (A.7), with blue indicating negative and red
positive values. Right panel: a corresponding plot of the increment (blue) or decrement (red) in the df during
an N-body simulation of the same disc by Sellwood (2012).
The ridge of non-negligible F nearly coincides with a line
2Ωφ − Ωr = constant ≡ 2ωp . (5.3)
Stars on this line are said to be at the inner Lindblad resonance of a perturbation. This
perturbation is a bi-symmetric structure that rotates in the same sense as the stars in the disc
with angular velocity ωp . So Figure 5.1 indicates that the dynamics of the disc are dominated
by a coherent structure. Given that this is a stable disc, how can this be?
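For orientation, the resonance condition (5.3) can be turned into radii for a disc with a flat rotation curve. The sketch below relies on the epicycle approximation for nearly circular orbits, which is an assumption introduced here and not part of the calculation described above: $\Omega_\phi = v_c/R$ and $\Omega_r \simeq \kappa = \sqrt{2}\,v_c/R$. The pattern speed used is an arbitrary illustrative value.

```python
import math

def resonance_radii(v_c, omega_p):
    """Corotation and inner-Lindblad radii of near-circular orbits for a flat rotation curve.

    Epicycle approximation (assumed): Omega_phi = v_c / R and
    Omega_r ~ kappa = sqrt(2) * v_c / R, so 2*Omega_phi - Omega_r = 2*omega_p
    gives R_ILR = (1 - 1/sqrt(2)) * v_c / omega_p.
    """
    r_corotation = v_c / omega_p
    r_ilr = (1.0 - 1.0 / math.sqrt(2.0)) * v_c / omega_p
    return r_corotation, r_ilr

v_c = 1.0          # circular speed in arbitrary units
omega_p = 0.25     # illustrative pattern speed (assumed)
r_cr, r_ilr = resonance_radii(v_c, omega_p)
print(f"corotation at R = {r_cr:.3f}, inner Lindblad resonance at R = {r_ilr:.3f}")
```

In this approximation the inner Lindblad resonance of a bi-symmetric pattern lies at about 0.29 of the corotation radius; orbits with significant radial action satisfy (5.3) at somewhat different angular momenta, which is why the resonance traces a sloping line in action space.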
The answer is that although all the disc’s normal modes are damped, some are only weakly
damped. Consequently, for some values of p the matrix E that occurs squared in equation (A.7)
becomes large. Consequently F becomes large at points in action space at which in · Ω coincides
(for some n) with such a value of p.
Actually, to get a large value of F it is not sufficient to have a large value of $|E|^2$; the product
$(\partial f_0/\partial\mathbf{J})f_0(\mathbf{J}')$ should also be large. In short, the requirements for obtaining a large flux are
quite specific and it is perhaps not surprising that they are satisfied only along a ridge in action
space.
Equation (4.3a) states that f0 will increase where div F < 0 and decrease where div F > 0.
The right panel of Figure 5.2 shows the change over some time interval in the df of an N-body
simulation of the disc. We see that the blue region of increase and the red region of decrease
broadly coincide with the regions of negative and positive div F.
In a series of carefully controlled N-body simulations, Fouvry et al (2015) checked the
predictions of Chapter 4 regarding how the magnitude of F scales with particle number N and
with active mass fraction ξ (which determines the magnitude of |E|2 and thus F). The flux is
predicted to be proportional to m, the mass of a single particle, so it should scale as 1/N . The
experiments indicate that the changes in f0 induced by the flux follow this scaling within the
errors.
From equation (A.7) one can deduce that increasing ξ from 0.5 to 0.6 magnifies F by a factor
42 while in the N-body experiments this change in ξ increases the change in f0 in a given period
by a factor 29. Given the significant uncertainties in the numerical work, this comparison again
amounts to agreement within the errors.
The work of Fouvry et al leaves no doubt that equation (A.7) correctly predicts the action-
space flux that Poisson noise drives in a stable but responsive disc, and that until this flux has
substantially modified f0 , Poisson noise is the only driver of evolution. The evolution occurs 1000
to 10 000 times faster than naive estimates of two-particle relaxation predict because the noise
excites collective motions, which are then amplified by a process called swing amplification.
Specifically, noise excites leading spiral waves of density that propagate towards the corotation
radius (where Ωφ = ωp ). These waves are gradually unwound by shear in the disc: the angular
velocity of stellar streaming increases inwards, so any structure in the disc is constantly being
sheared towards a tightly-wound trailing structure. As a leading-arm spiral is sheared into a
trailing-arm structure, self-gravity abruptly amplifies it. The resulting larger-amplitude trailing
wave then propagates back inwards. It is Landau damped when it reaches stars that resonate with
it. The strength of the swing amplifier determines the magnitude of E, and thus the magnitude
of the diffusive flux. Increasing the active mass fraction ξ strengthens the swing amplifier and
thus increases F.
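The kinematic part of this story, the shearing of a leading structure into a trailing one, can be sketched in a few lines of Python. This is not a model of swing amplification itself, since there is no self-gravity here; it only follows a material arm φ(R, t) = φ0(R) + Ω(R)t in an assumed flat-rotation-curve disc with Ω = v0/R. The initial shape φ0 = k0 ln R, the sign convention (rotation towards increasing φ, so that dφ/d ln R > 0 is leading) and all names are illustrative assumptions.

```python
def winding(R, t, k0, v0=1.0):
    """d(phi)/d(ln R) of a material arm at radius R and time t.

    Positive values mean a leading arm, negative values a trailing arm
    (ASSUMED convention: the disc rotates towards increasing phi).
    """
    # phi(R, t) = k0 * ln(R) + (v0 / R) * t, differentiated with respect to ln R
    return k0 - v0 * t / R

def swing_time(R, k0, v0=1.0):
    """Time at which the arm is momentarily radial at radius R."""
    return k0 * R / v0

# Illustrative values in arbitrary units.
R, k0 = 2.0, 3.0
t_swing = swing_time(R, k0)
for t in (0.0, 0.5 * t_swing, t_swing, 2.0 * t_swing, 10.0 * t_swing):
    print(t, winding(R, t, k0))
```

The arm starts leading, passes through the radial orientation at t = k0 R/v0 and thereafter becomes an ever more tightly wound trailing arm. In the picture described above, the amplification by self-gravity occurs around the moment of this swing.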
Fouvry et al do not compute the evolution of f0 by integrating equation (4.3a) because
computing F at a single time is already a major task. But the N-body models give insight into
what we would find if we did integrate (4.3a), and it’s extraordinary. The ridge of enhanced div F
in Figure 5.2 would create a ridge of enhanced f0 , and this ridge would make the disc unstable
at a collisionless level. That is, Poisson noise drives a disc that is stable towards one that is
unstable. Indeed all simulations of initially stable discs that have been integrated for sufficiently
long have developed O(1) non-axisymmetries and degenerated into a strong bar. The larger the
number N of particles in your disc, the longer you have to wait for the bar to form, but it always
does form. Moreover, the final stages in which O(1) spiral structure develops into a bar occur on
the same timescale regardless of the value of N – this fact implies that the final-stage dynamics
are collisionless. By increasing N you simply increase the delay before the final stage is reached
by decreasing the value of f1 (n, J, 0) in our equations and thus the initial rate at which f0 is
modified into an unstable df.
Appendix A: Rewriting D1
We can bring F1 to a form that closely parallels eqn (4.16) for F2 . We first note that
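$$\sum_{\mathbf n}\mathbf n\,E_{-\mathbf n-\mathbf n}(\mathbf J,\mathbf J,i\mathbf n\cdot\boldsymbol\Omega)=\sum_{\mathbf n}(-\mathbf n)\,E_{\mathbf n\mathbf n}(\mathbf J,\mathbf J,-i\mathbf n\cdot\boldsymbol\Omega),$$
so from (3.17b)
$$\begin{aligned}
\mathbf F_1(\mathbf J) &= -im\tfrac12 f_0(\mathbf J)\sum_{\mathbf n}\mathbf n\,\big[E_{-\mathbf n-\mathbf n}(\mathbf J,\mathbf J,i\mathbf n\cdot\boldsymbol\Omega)-E_{\mathbf n\mathbf n}(\mathbf J,\mathbf J,-i\mathbf n\cdot\boldsymbol\Omega)\big]\\
&= im\tfrac12 f_0(\mathbf J)\sum_{\mathbf n}\mathbf n\,\big\{E_{\mathbf n\mathbf n}(\mathbf J,\mathbf J,-i\mathbf n\cdot\boldsymbol\Omega)-[E_{\mathbf n\mathbf n}(\mathbf J,\mathbf J,-i\mathbf n\cdot\boldsymbol\Omega)]^*\big\}\\
&= i\frac{m}{2\mathcal E}f_0(\mathbf J)\sum_{\mathbf n}\mathbf n\sum_{\alpha\alpha'}\hat\Phi^{(\alpha')}(\mathbf n,\mathbf J)[\hat\Phi^{(\alpha)}(\mathbf n,\mathbf J)]^*\big\{\epsilon^{-1}_{\alpha'\alpha}(-i\mathbf n\cdot\boldsymbol\Omega)-[\epsilon^{-1}_{\alpha\alpha'}(-i\mathbf n\cdot\boldsymbol\Omega)]^*\big\},
\end{aligned}\tag{A.1}$$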
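where the second equality uses (4.15). In the curly bracket of the last line we have the difference between
$\epsilon^{-1}$ and $(\epsilon^{-1})^\dagger$. We use
$$\epsilon^{-1}-(\epsilon^{-1})^\dagger=\epsilon^{-1}(\epsilon^\dagger-\epsilon)(\epsilon^{-1})^\dagger. \tag{A.2}$$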
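Identity (A.2) holds for any invertible matrix, as one sees by multiplying out the right side and using the fact that the Hermitian adjoint of the inverse is the inverse of the adjoint. The short Python check below confirms it for one arbitrarily chosen 2 × 2 complex matrix; the matrix entries and helper names are illustrative only and are not connected to any physical ε.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Hermitian adjoint (conjugate transpose)."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def sub(a, b):
    """Element-wise difference a - b."""
    n = len(a)
    return [[a[i][j] - b[i][j] for j in range(n)] for i in range(n)]

def inv2(a):
    """Inverse of a 2 x 2 matrix from the explicit adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

# Arbitrary invertible complex matrix (NOT a physical dielectric matrix).
eps = [[1.0 + 0.3j, 0.2 - 0.1j],
       [0.0 - 0.4j, 0.8 + 0.5j]]

eps_inv = inv2(eps)
lhs = sub(eps_inv, dagger(eps_inv))                     # eps^-1 - (eps^-1)^dagger
rhs = matmul(matmul(eps_inv, sub(dagger(eps), eps)),    # eps^-1 (eps^dagger - eps)
             dagger(eps_inv))                           #   ... (eps^-1)^dagger

max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_err)
```

The printed discrepancy should be at the level of floating-point rounding.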
From equation (3.16b)
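$$\begin{aligned}
\big\{[\epsilon(p)]^\dagger-\epsilon(p)\big\}_{\beta\beta'} = {}& -i\frac{(2\pi)^3}{\mathcal E}\int d^3\mathbf J'\sum_{\mathbf n'}\mathbf n'\cdot\frac{\partial f_0}{\partial\mathbf J'}\,[\hat\Phi^{(\beta)}(\mathbf n',\mathbf J')]^*\,\hat\Phi^{(\beta')}(\mathbf n',\mathbf J')\\
&\times\left(\frac{1}{p^*-i\mathbf n'\cdot\boldsymbol\Omega'}+\frac{1}{p+i\mathbf n'\cdot\boldsymbol\Omega'}\right).
\end{aligned}\tag{A.3}$$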
We need to put p = γ − in · Ω with γ > 0 and extract the limit γ → 0 as per Box 3.2.
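$$\begin{aligned}
\big\{[\epsilon(p)]^\dagger-\epsilon(p)\big\}_{\beta\beta'} = {}& -i\frac{(2\pi)^3}{\mathcal E}\int d^3\mathbf J'\sum_{\mathbf n'}\mathbf n'\cdot\frac{\partial f_0}{\partial\mathbf J'}\,[\hat\Phi^{(\beta)}(\mathbf n',\mathbf J')]^*\,\hat\Phi^{(\beta')}(\mathbf n',\mathbf J')\\
&\times\left(\frac{1}{i(\mathbf n\cdot\boldsymbol\Omega-\mathbf n'\cdot\boldsymbol\Omega')+\gamma}-\frac{1}{i(\mathbf n\cdot\boldsymbol\Omega-\mathbf n'\cdot\boldsymbol\Omega')-\gamma}\right).
\end{aligned}\tag{A.4}$$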
The principal parts of the two integrals cancel but the contributions from skirting the pole add because
in the left integral the pole is at z = n · Ω − iγ and in the right integral it’s at z = n · Ω + iγ. Glancing
back at (A.3) we see that the right integral has exactly the form considered in Box 3.2, so it yields +πK.
The left integral will yield minus this, so
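$$\big\{[\epsilon(-i\mathbf n\cdot\boldsymbol\Omega)]^\dagger-\epsilon(-i\mathbf n\cdot\boldsymbol\Omega)\big\}_{\beta\beta'} = -i\frac{(2\pi)^4}{\mathcal E}\int d^3\mathbf J'\sum_{\mathbf n'}\mathbf n'\cdot\frac{\partial f_0}{\partial\mathbf J'}\,\delta(\mathbf n'\cdot\boldsymbol\Omega'-\mathbf n\cdot\boldsymbol\Omega)\,[\hat\Phi^{(\beta)}(\mathbf n',\mathbf J')]^*\,\hat\Phi^{(\beta')}(\mathbf n',\mathbf J'). \tag{A.5}$$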
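The step from (A.4) to (A.5) rests on the fact that the bracket in (A.4) is a Lorentzian of width γ that tends to 2π δ(x) as γ → 0, where x = n · Ω − n′ · Ω′. A minimal numerical check in Python follows; the Gaussian test function, the integration grid and the value of γ are arbitrary choices made for illustration and are not taken from the text.

```python
import math

def bracket(x, gamma):
    """Last factor of (A.4), with x standing for n.Omega - n'.Omega'."""
    return 1.0 / (1j * x + gamma) - 1.0 / (1j * x - gamma)

def smeared(gamma, half_width=10.0, step=1e-3):
    """Trapezoidal estimate of the integral of bracket(x) * exp(-x^2) over x."""
    n = int(round(2.0 * half_width / step))
    total = 0.0 + 0.0j
    for k in range(n + 1):
        x = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0  # end points count half
        total += weight * bracket(x, gamma) * math.exp(-x * x)
    return total * step

gamma = 0.01
value = smeared(gamma)

# If the bracket acts like 2*pi*delta(x), the integral is close to
# 2*pi times the test function at x = 0, i.e. close to 2*pi.
print(value.real, value.imag, 2.0 * math.pi)
```

The real part differs from 2π by a term of order γ, which disappears in the limit γ → 0 used to obtain (A.5), while the imaginary part vanishes because the principal parts cancel.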
Inserting eqn (A.5) in (A.2) and then in (A.1), we arrive at
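$$\begin{aligned}
\mathbf F_1(\mathbf J) &= m\frac{(2\pi)^4}{\mathcal E^2}\tfrac12 f_0(\mathbf J)\sum_{\mathbf n}\mathbf n\sum_{\alpha\alpha'}\sum_{\beta\beta'}\hat\Phi^{(\alpha')}(\mathbf n,\mathbf J)\,\epsilon^{-1}_{\alpha'\beta}(-i\mathbf n\cdot\boldsymbol\Omega)\int d^3\mathbf J'\sum_{\mathbf n'}\mathbf n'\cdot\frac{\partial f_0}{\partial\mathbf J'}\\
&\qquad\times\delta(\mathbf n'\cdot\boldsymbol\Omega'-\mathbf n\cdot\boldsymbol\Omega)\,[\hat\Phi^{(\beta)}(\mathbf n',\mathbf J')]^*\,\hat\Phi^{(\beta')}(\mathbf n',\mathbf J')\,[\epsilon^{-1}_{\alpha\beta'}(-i\mathbf n\cdot\boldsymbol\Omega)]^*\,[\hat\Phi^{(\alpha)}(\mathbf n,\mathbf J)]^*\\
&= (2\pi)^4 m\tfrac12 f_0(\mathbf J)\sum_{\mathbf n}\mathbf n\int d^3\mathbf J'\sum_{\mathbf n'}\mathbf n'\cdot\frac{\partial f_0}{\partial\mathbf J'}\,\delta(\mathbf n'\cdot\boldsymbol\Omega'-\mathbf n\cdot\boldsymbol\Omega)\,|E_{\mathbf n\mathbf n'}(\mathbf J,\mathbf J',-i\mathbf n\cdot\boldsymbol\Omega)|^2.
\end{aligned}\tag{A.6}$$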
Comparing equations (4.16) and (A.6) we see that they have extremely similar structures, so when
we combine them to form the total flux F = F2 + F1 we obtain quite a simple bottom line:
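$$\begin{aligned}
\mathbf F(\mathbf J) &= (2\pi)^4 m\tfrac12\sum_{\mathbf n\mathbf n'}\mathbf n\int d^3\mathbf J'\,|E_{\mathbf n\mathbf n'}(\mathbf J,\mathbf J',-i\mathbf n\cdot\boldsymbol\Omega)|^2\,\delta(\mathbf n'\cdot\boldsymbol\Omega'-\mathbf n\cdot\boldsymbol\Omega)\left(\mathbf n'\cdot\frac{\partial f_0}{\partial\mathbf J'}f_0(\mathbf J)-\mathbf n\cdot\frac{\partial f_0}{\partial\mathbf J}f_0(\mathbf J')\right)\\
&= (2\pi)^4 m\tfrac12\sum_{\mathbf n\mathbf n'}\mathbf n\int d^3\mathbf J'\,|E_{\mathbf n\mathbf n'}(\mathbf J,\mathbf J',-i\mathbf n\cdot\boldsymbol\Omega)|^2\,\delta(\mathbf n'\cdot\boldsymbol\Omega'-\mathbf n\cdot\boldsymbol\Omega)\left(\mathbf n'\cdot\frac{\partial}{\partial\mathbf J'}-\mathbf n\cdot\frac{\partial}{\partial\mathbf J}\right)f_0(\mathbf J)f_0(\mathbf J').
\end{aligned}\tag{A.7}$$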
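The structure of the last line of (A.7) can be explored numerically. The Python sketch below evaluates the operator n′ ∂/∂J′ − n ∂/∂J acting on f0(J)f0(J′) by central differences in a one-dimensional toy problem. Everything specific here is an assumption made for illustration: the toy Hamiltonian H0 = J + J²/2 (so Ω = 1 + J), the value of β, the integers n and n′, and the comparison df 1/(1 + J)².

```python
import math

BETA = 0.7  # assumed inverse temperature (arbitrary units)

def H0(J):
    """Toy 1D Hamiltonian (an assumption, not from the text)."""
    return J + 0.5 * J * J

def Omega(J):
    """Frequency dH0/dJ of the toy Hamiltonian."""
    return 1.0 + J

def f_thermal(J):
    """Unnormalised thermal df, exp(-beta * H0)."""
    return math.exp(-BETA * H0(J))

def f_other(J):
    """A non-thermal df for comparison."""
    return 1.0 / (1.0 + J) ** 2

def bracket(f, n, J, n_p, J_p, h=1e-5):
    """(n' d/dJ' - n d/dJ) [f(J) f(J')] by central differences."""
    dfdJ = (f(J + h) - f(J - h)) / (2.0 * h)
    dfdJ_p = (f(J_p + h) - f(J_p - h)) / (2.0 * h)
    return n_p * dfdJ_p * f(J) - n * dfdJ * f(J_p)

n, n_p = 1, 2
J_p = 0.5
J_res = 2.0  # chosen so that n' * Omega(J') = n * Omega(J): 2 * 1.5 = 1 * 3
J_off = 1.0  # a non-resonant action for comparison

on_res_thermal = bracket(f_thermal, n, J_res, n_p, J_p)
on_res_other = bracket(f_other, n, J_res, n_p, J_p)
off_res_thermal = bracket(f_thermal, n, J_off, n_p, J_p)
# Analytic value for the thermal df: -beta * (n' Omega' - n Omega) * f0(J) * f0(J')
expected_off = (-BETA * (n_p * Omega(J_p) - n * Omega(J_off))
                * f_thermal(J_off) * f_thermal(J_p))
print(on_res_thermal, on_res_other, off_res_thermal, expected_off)
```

For the thermal df the operator returns a multiple of n′Ω′ − nΩ, so it vanishes at exactly those pairs of actions that the delta function in (A.7) selects; for the comparison df it does not.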
Problems
1 Write down the generating function S(θ, J′ ) of the canonical transformation (θ, J) ↔ (θ′ , J′ ) that
makes ordinary phase-space coordinates periodic in the θ′i with period unity rather than 2π.
2 Let S(x, J) be the generating function of the canonical transformation between (x, p) and the angle-
action coordinates of the harmonic oscillator $H(x,p)=\tfrac12(p^2+\omega^2x^2)$. Explain what the Hamilton-Jacobi
equation is, and show that for this system it yields
$$S=\frac{E}{\omega}\left(\psi+\tfrac12\sin 2\psi\right), \tag{P.8}$$
where $\sin\psi\equiv\omega x/\sqrt{2E}$. Define the action and show that for this system it is J = E/ω. Hence show
that
$$S=J\left(\psi+\tfrac12\sin 2\psi\right). \tag{P.9}$$
Hence show that θ = ψ.
3 Particles move in the (r, φ) plane in the potential Φ(r). Write down the Hamilton-Jacobi equation
for the generating function S(r, φ, Jr , Jφ ). By writing S = Sr (r, Jr , Jφ ) + Sφ (φ, Jr , Jφ ) show that Jφ = pφ
and obtain an integral for Jr . Show that
$$\theta_r(r,\mathbf J)=\Omega_r\int\frac{dr}{p_r}, \tag{P.10}$$
where Ωr is the radial frequency. Give a physical interpretation of this result.
4 N particles form a system with Hamiltonian
$$H=\tfrac12\sum_i\Big(p_i^2+\sum_j u(\mathbf q_i,\mathbf q_j)\Big), \tag{P.11}$$
where u is a symmetric function of its arguments. Show from first principles that the N -particle df
satisfies
$$\frac{\partial f^{(N)}}{\partial t}+[f^{(N)},H]=0, \tag{P.12}$$
where the Poisson bracket [f, g] is defined by
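$$[f,g]=\sum_{i=1}^{N}\left(\frac{\partial f}{\partial\mathbf q_i}\cdot\frac{\partial g}{\partial\mathbf p_i}-\frac{\partial f}{\partial\mathbf p_i}\cdot\frac{\partial g}{\partial\mathbf q_i}\right). \tag{P.13}$$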
5 An interesting orthogonal potential-density basis $(\Phi^{(n)},\rho^{(n)})$ starts with the Hernquist sphere
$$\rho^{(0)}(r)=\frac{\rho_0}{(r/a)(1+r/a)^3}\quad\leftrightarrow\quad\Phi^{(0)}(r)=-\frac{2\pi G\rho_0a^2}{1+r/a}, \tag{P.14}$$
where ρ0 and a are constants. Show that in this case the constant $\mathcal E=GM^2/(3a)$, where $M=2\pi\rho_0a^3$ is
the system’s mass. Explain why it’s reasonable to adopt as other members of the family
$$\Phi^{(0,l,m)}=\frac{C\,(r/a)^l}{(1+r/a)^{2l+1}}\,Y_l^m(\theta,\phi), \tag{P.15}$$
where C is a constant to be determined. Show that the corresponding density distribution is
$$\rho^{(0,l,m)}=-\frac{(2l+1)(l+1)C}{2\pi Ga^2}\,\frac{(r/a)^{l-1}}{(1+r/a)^{2l+3}}\,Y_l^m(\theta,\phi). \tag{P.16}$$
Explain how the constant C is determined. (Much more detail in Hernquist & Ostriker, ApJ, 386, 375
(1992).)
6 Show that the action-space flux F defined by equation (4.4) is necessarily real.
7 Derive from Liouville’s equation for the full N -particle df f (N) (x1 , . . . , vN ) the Boltzmann eq that
connects the 1-particle df f (1) to the 2-particle correlation function
g(w, w′ ) ≡ f (2) (w, w′ ) − f (1) (w)f (1) (w′ ).
9 Using the result of Appendix A, show that the diffusive flux F vanishes when the df is
$f_0(\mathbf J)\propto e^{-\beta H_0(\mathbf J)}$, where β is a constant. What physical principle does this result vindicate/illustrate?
10 Let f0 (J) be the distribution function (df) of an equilibrium stellar system that has gravitational
potential Φ0 (x) and angle-action coordinates (θ, J). Show that if we write the df of the perturbed model
f (x, v, t) = f0 + f1 (x, v, t), then to first order in the perturbations f1 satisfies
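$$\frac{\partial f_1}{\partial t}+\boldsymbol\Omega_0\cdot\frac{\partial f_1}{\partial\boldsymbol\theta}-\frac{\partial f_0}{\partial\mathbf J}\cdot\frac{\partial\Phi_1}{\partial\boldsymbol\theta}=0, \tag{P.18}$$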
where Ω0 = ∂H0 /∂J and the perturbed potential is Φ(x, t) = Φ0 (x) + Φ1 (x, t). Hence or otherwise show
that
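$$\tilde f_1(\mathbf n,\mathbf J,p)=\frac{i\mathbf n\cdot\dfrac{\partial f_0}{\partial\mathbf J}\,\tilde\Phi_1(\mathbf n,\mathbf J,p)+\hat f_1(\mathbf n,\mathbf J,0)}{p+i\mathbf n\cdot\boldsymbol\Omega_0}, \tag{P.19}$$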
where the meanings of a tilde and a hat should be explained.
What physical principle is used to obtain from the last equation the expression
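$$\tilde\Phi_1(\mathbf n',\mathbf J',p)=-(2\pi)^3\int d^3\mathbf J\sum_{\mathbf n}E_{\mathbf n'\mathbf n}(\mathbf J',\mathbf J,p)\,\frac{\hat f_1(\mathbf n,\mathbf J,0)}{p+i\mathbf n\cdot\boldsymbol\Omega_0}, \tag{P.20}$$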
where E is the inverse of the “dielectric tensor”? Explain (without calculation) how from this equation
we can obtain
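$$\tilde f_1(\mathbf n,\mathbf J,p)=-(2\pi)^3 i\,\frac{\mathbf n\cdot\dfrac{\partial f_0}{\partial\mathbf J}}{p+i\mathbf n\cdot\boldsymbol\Omega_0}\int d^3\mathbf J'\sum_{\mathbf n'}E_{\mathbf n\mathbf n'}(\mathbf J,\mathbf J',p)\,\frac{\hat f_1(\mathbf n',\mathbf J',0)}{p+i\mathbf n'\cdot\boldsymbol\Omega_0'}+\frac{\hat f_1(\mathbf n,\mathbf J,0)}{p+i\mathbf n\cdot\boldsymbol\Omega_0}. \tag{P.21}$$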
Fluctuations in Φ drive a diffusive flux F of the mass-bearing stars through phase space. F is given
by
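$$\mathbf F(\mathbf J)=i\left\langle\sum_{\mathbf n}\mathbf n\int\frac{dp}{2\pi i}\,e^{pt}\,\tilde f_1(\mathbf n,\mathbf J,p)\int\frac{dp'}{2\pi i}\,e^{p't}\,\tilde\Phi_1(-\mathbf n,\mathbf J,p')\right\rangle, \tag{P.22}$$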
where h·i indicates an ensemble average. A population of massless tracer particles orbits within the
stellar system. Let g0 (J) and g1 (x, v, t) be the unperturbed and perturbed dfs of this population. Show
that the phase-space flux G of the tracer population is given by an expression of the form (an expression
for D2 is not required)
$$\mathbf G=-\mathsf D_2(\mathbf J)\cdot\frac{\partial g_0}{\partial\mathbf J}. \tag{P.23}$$
Explain the physical significance of the form taken by G.
11 An equilibrium stellar system is described by the distribution function (df) f0 (x, v) and the mean-
field potential Φ0 (x). Write down two equations that must be satisfied by f0 and Φ0 .
Let f1 (x, v, t) be the small change in the system’s DF when it is out of equilibrium. Obtain the
equation that governs the evolution of f1 to first order in small quantities.
State three properties of angle-action coordinates (θ, J). Use these coordinates to simplify your
equation for f1 .
Fluctuations in the DF cause the equilibrium state f0 to evolve slowly. Show that this evolution is
governed by the equation
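$$\frac{\partial f_0}{\partial t}=-i\frac{\partial}{\partial\mathbf J}\cdot\left\langle\sum_{\mathbf n}\mathbf n\,\hat f_1(\mathbf n,\mathbf J,t)\,\hat\Phi_1(-\mathbf n,\mathbf J,t)\right\rangle. \tag{P.24}$$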
Show that the right side of the equation for ∂f0 /∂t is real and explain the significance of its taking
the form of a divergence.
The evolution equation can be brought to the form
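$$\frac{\partial f_0}{\partial t}=-\frac{\partial}{\partial\mathbf J}\cdot\left(\mathbf D_1(\mathbf J)f_0+\mathsf D_2(\mathbf J)\cdot\frac{\partial f_0}{\partial\mathbf J}\right). \tag{P.26}$$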
Given that f0 describes particles in thermal equilibrium at inverse temperature β = (kB T )−1 , show that
D1 = D2 · K, (P.27)