Utah State University

DigitalCommons@USU

All Complete Monographs

6-6-2022

Introduction to Classical Field Theory

Charles G. Torre
Department of Physics, Utah State University, [email protected]

Follow this and additional works at: https://digitalcommons.usu.edu/lib_mono

Part of the Applied Mathematics Commons, Cosmology, Relativity, and Gravity Commons, Elementary
Particles and Fields and String Theory Commons, and the Geometry and Topology Commons

Recommended Citation
Torre, Charles G., "Introduction to Classical Field Theory" (2022). All Complete Monographs. 3.
https://digitalcommons.usu.edu/lib_mono/3

This Book is brought to you for free and open access by
DigitalCommons@USU. It has been accepted for
inclusion in All Complete Monographs by an authorized
administrator of DigitalCommons@USU. For more
information, please contact [email protected].

Introduction to
Classical Field Theory

C. G. Torre
Department of Physics
Utah State University

Version 1.4
June 2022
About this text

This is a quick and informal introduction to the basic ideas and mathematical
methods of classical relativistic field theory. Scalar fields, spinor fields, gauge
fields, and gravitational fields are treated. The material is based upon lecture
notes for a course I teach from time to time at Utah State University on
Classical Field Theory.
The following is version 1.4 of the text. It is roughly the same as version
1.3. The update to 1.4 includes:

• numerous small improvements in exposition;

• fixes for a number of typographical errors and various bugs;

• a few new homework problems.

I am grateful to the students who participated in the 2020 Pandemic Special
Edition of the course for helping me to improve the text. The students were:
Lyle Arnett, Eli Atkin, Alex Chanson, Guillermo Frausto, Tyler Hansen,
Kevin Rhine, and Ben Shaw.

© 2016, 2022 C. G. Torre

Contents

Preface

1 What is a classical field theory?
1.1 Example: waves in an elastic medium
1.2 Example: Newtonian gravitational field
1.3 Example: Maxwell’s equations

2 Klein-Gordon field
2.1 The Klein-Gordon equation
2.2 Solving the KG equation
2.3 A small digression: one particle wave functions
2.4 Variational principle
2.5 Getting used to δ. Functional derivatives.
2.6 The Lagrangian
2.7 The Euler-Lagrange equations
2.8 Jet Space
2.9 Miscellaneous generalizations
2.9.1 External “sources”
2.9.2 Self-interacting field
2.9.3 KG in arbitrary coordinates
2.9.4 KG on any spacetime
2.10 PROBLEMS

3 Symmetries and conservation laws
3.1 Conserved currents
3.2 Conservation of energy
3.3 Conservation of momentum
3.4 Energy-momentum tensor
3.5 Conservation of angular momentum
3.6 Variational symmetries
3.7 Infinitesimal symmetries
3.8 Divergence symmetries
3.9 A first look at Noether’s theorem
3.10 Time translation symmetry and conservation of energy
3.11 Space translation symmetry and conservation of momentum
3.12 Conservation of energy-momentum revisited
3.13 Angular momentum revisited
3.14 Spacetime symmetries in general
3.15 Internal symmetries
3.16 The charged KG field and its internal symmetry
3.17 More generally...
3.18 SU(2) symmetry
3.19 A general version of Noether’s theorem
3.20 “Trivial” conservation laws
3.21 Conservation laws in terms of differential forms
3.22 PROBLEMS

4 The Hamiltonian formulation
4.1 Review of the Hamiltonian formulation of mechanics
4.2 Hamiltonian formulation of the scalar field
4.3 Poisson brackets
4.4 Symmetries and conservation laws
4.5 PROBLEMS

5 Electromagnetic field theory
5.1 Review of Maxwell’s equations
5.2 Electromagnetic Lagrangian
5.3 Gauge symmetry
5.4 Noether’s second theorem in electromagnetic theory
5.5 Noether’s second theorem
5.6 The canonical energy-momentum tensor
5.7 Improved Maxwell energy-momentum tensor
5.8 The Hamiltonian formulation of Electromagnetism.
5.8.1 Phase space
5.8.2 Equations of motion
5.8.3 The electromagnetic Hamiltonian and gauge transformations
5.9 PROBLEMS

6 Scalar Electrodynamics
6.1 Electromagnetic field with scalar sources
6.2 Minimal coupling: the gauge covariant derivative
6.3 Global and Local Symmetries
6.4 A lower-degree conservation law
6.5 Scalar electrodynamics and fiber bundles
6.6 PROBLEMS

7 Spontaneous symmetry breaking
7.1 Symmetry of laws versus symmetry of states
7.2 The “Mexican hat” potential
7.3 Dynamics near equilibrium and Goldstone’s theorem
7.4 The Abelian Higgs model
7.5 PROBLEMS

8 The Dirac field
8.1 The Dirac equation
8.2 Representations of the Poincaré group
8.3 The spinor representation
8.4 Dirac Lagrangian
8.5 Poincaré symmetry
8.6 Energy
8.7 Anti-commuting fields
8.8 Coupling to the electromagnetic field
8.9 PROBLEMS

9 Non-Abelian gauge theory
9.1 SU(2) doublet of KG fields, revisited
9.2 Local SU(2) symmetry
9.3 Infinitesimal gauge transformations
9.4 Geometrical interpretation: parallel propagation
9.5 Geometrical interpretation: curvature
9.6 Lagrangian for the YM field
9.7 The source-free Yang-Mills equations
9.8 Yang-Mills with sources
9.9 Noether theorems
9.10 Non-Abelian gauge theory in general
9.11 Chern-Simons Theory
9.12 PROBLEMS

10 Gravitational field theory
10.1 Spacetime geometry
10.2 The Geodesic Hypothesis
10.3 The Principle of General Covariance
10.4 The Einstein-Hilbert Action
10.5 Vacuum spacetimes
10.6 Diffeomorphism symmetry and the contracted Bianchi identity
10.7 Coupling to matter - scalar fields
10.8 The contracted Bianchi identity revisited
10.9 PROBLEMS

11 Goodbye
11.1 Suggestions for Further Reading

Preface

This document was created to support a course in classical field theory which
gets taught from time to time here at Utah State University. In this course,
hopefully, you acquire information and skills that can be used in a variety
of places in theoretical physics, principally in quantum field theory, particle
physics, electromagnetic theory, fluid mechanics and general relativity. As
you may know, it is usually in courses on such subjects that the techniques,
tools, and results we shall develop here are introduced – if only in bits and
pieces as needed. As explained below, it seems better to give a unified,
systematic development of the tools of classical field theory in one place. If
you want to be a theoretical/mathematical physicist you must see this stuff
at least once. If you are just interested in getting a deeper look at some
fundamental/foundational physics and applied mathematics ideas, this is a
good place to do it.
The traditional physics curriculum supports a number of classical¹ field
theories. In particular, there is (i) the “Newtonian theory of gravity”, based
upon the Poisson equation for the gravitational potential and Newton’s laws,
and (ii) electromagnetic theory, based upon Maxwell’s equations and the
Lorentz force law. Both of these field theories appear in introductory physics
courses as well as in upper level courses. Einstein provided us with another
important classical field theory – a relativistic gravitational theory – via
his general theory of relativity. This subject takes some investment in
geometrical technology to adequately explain. It does, however, also typically
get a course of its own. These courses (Newtonian gravity, electrodynamics,
general relativity) are traditionally used to cover a lot of the concepts
and methodology of classical field theory. The other field theories that are
important (e.g., Dirac, Yang-Mills, Klein-Gordon) typically arise, physically
speaking, not as classical field theories but as quantum field theories, and it
is usually in a course in quantum field theory that these other field theories
are described. So, in a typical physics curriculum, it is through such courses
that a student normally gets exposed to the tools and results of classical field
theory. This book reflects an alternative approach to learning classical field
theory, which I will now try to justify.

¹ Here, and in all that follows, the term “classical” is to mean “not quantum”, e.g., as
in “the classical limit”. Sometimes people use “classical” to also mean non-relativistic; we
shall definitely not be doing that here. Indeed, every field theory we shall consider is
“relativistic”.

The traditional organization of material just described, while natural in
some ways, overlooks the fact that field theories have many fundamental fea-
tures in common – features which are most easily understood in the classical
limit – and one can get a really good feel for what is going on in “the big pic-
ture” by exploring these features in a general, systematic way. Indeed, many
of the basic structures appearing in classical field theory (Lagrangians, field
equations, symmetries, conservation laws, gauge transformations, etc.) are
of interest in their own right, and one should try to master them in general,
not just in the context of a particular example (e.g., electromagnetism), and
not just in passing as one studies the quantum field. Moreover, once one has
mastered a sufficiently large set of such field theoretic tools, one is in a good
position to discern how this or that field theory differs from its cousins in
some key structural way. This serves to highlight physical and mathematical
ingredients that make each theory special.
From a somewhat more pragmatic point of view, let me point out that
most quantum field theory texts rely quite heavily upon one’s facility with the
techniques of classical field theory. This is, in part, because many quantum
mechanical structures have analogs in a classical approximation to the theory.
By understanding the “lay of the land” in the classical theory through a
course such as this one, one gets a lot of insight into the associated quantum
field theories. It is hard enough to learn quantum field theory without having
to also assimilate at the same time concepts that are already present in the
much simpler setting of classical field theory. So, if you are hoping to learn
quantum field theory some day, this class should help out quite a bit.
A final motivation for the creation and teaching of this course is to support
the research activities of a number of faculty and students here at Utah State
University. The geometric underpinnings of classical field theory feature in a
wide variety of research projects here. If you want to find out what is going
on in this research – or even participate – you need to speak the language.
I have provided a number of problems you can use to facilitate your
learning the material. They are presented first within the text in order to
amplify the text and to give you a contextual hint about what is needed
to solve the problem. At the end of each chapter the problems which have
appeared are summarized for your convenience.
I would like to thank the numerous students who have endured the rough
set of notes from which this document originated and who contributed nu-
merous corrections. I also would like to thank Joseph Romano for his (unfor-
tunately rather lengthy) list of corrections. Finally, I would like to acknowl-
edge Ian Anderson for his influence on my geometric point of view regarding
Lagrangians and differential equations.
Chapter 1

What is a classical field theory?

In physical terms, a classical field is a dynamical system with an infinite
number of degrees of freedom labeled by spatial location. In mathematical
terms, a classical field is a section of some fiber bundle which obeys some
PDEs. Of course, there is much, much more to the story, but this is at
least a good first pass at a definition. By contrast, a mechanical system is a
dynamical system with a finite number of degrees of freedom that is described
by ODEs.
One place where classical fields naturally occur in physics is when non-
rigid extended bodies such as bodies of water, elastic solids, strings under
tension, portions of the atmosphere, and so forth, are described using a clas-
sical (as opposed to quantum) continuum approximation to their structure.
Another place where classical fields arise – and this place is where we will
focus most of our attention – is in a more or less fundamental description
of matter and its interactions. Here the ultimate description is via quan-
tum field theory, but the classical approximation sometimes has widespread
macroscopic validity (e.g., Maxwell theory) or the classical approximation
can be very useful for understanding the structure of the theory. Here are
some simple examples of classical fields in action, just to whet your appetite.
Some of these theories will be explored in detail later.

1.1 Example: waves in an elastic medium

Let us denote by u(r, t) the displacement of some observable characteristic
of an elastic medium at the position r at time t from its equilibrium value.

For example, if the medium is the air surrounding you, u could represent
its compression/rarefaction relative to some standard value. For small dis-
placements, the medium can often be well-modeled by supposing that the
displacement field u satisfies the wave equation

$$\frac{1}{c^2}\,u_{,tt} - \nabla^2 u = 0, \tag{1.1}$$

where the comma notation indicates partial derivatives, ∇² is the Laplacian,
and c is a parameter representing the speed of sound in the medium. We say
that u is the field variable and that the wave equation is the field equation.
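
As a quick numerical sanity check, here is a minimal sketch in one space dimension. It assumes a traveling wave u = sin(kx − ωt) and illustrative values of c and k that are not taken from the text; the helper names `plane_wave` and `wave_residual` are hypothetical. The left-hand side of (1.1), estimated by central finite differences, should be close to zero only when ω = ck.

```python
import math


def plane_wave(k, omega):
    """Return the travelling wave u(x, t) = sin(k x - omega t)."""
    return lambda x, t: math.sin(k * x - omega * t)


def wave_residual(u, c, x, t, h=1e-3):
    """Finite-difference estimate of u_tt / c^2 - u_xx, the left side of (1.1) in 1D."""
    u_tt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / h**2
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h**2
    return u_tt / c**2 - u_xx


if __name__ == "__main__":
    c, k = 2.0, 3.0  # assumed, illustrative wave speed and wave number
    good = plane_wave(k, c * k)        # omega = c k: solves the wave equation
    bad = plane_wave(k, 1.5 * c * k)   # wrong frequency: does not solve it
    print("residual with omega = c k:    ", wave_residual(good, c, 0.3, 0.2))
    print("residual with omega = 1.5 c k:", wave_residual(bad, c, 0.3, 0.2))
```

The first residual is small (limited only by the finite-difference step), while the second is of order one, which is the numerical face of the statement that the field equation selects the allowed field configurations.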

1.2 Example: Newtonian gravitational field

This is the original field theory. Here the field variable is a function φ(r, t),
the gravitational potential. The gravitational force F(r, t) exerted on a (test)
mass m at spacetime position (r, t) is given by
F(r, t) = −m∇φ(r, t). (1.2)
The gravitational potential is determined by the mass distribution that is
present via the field equation
∇²φ = 4πGρ, (1.3)
where ρ = ρ(r, t) is the mass density function and G is Newton’s constant.
Equation (1.3) is just Poisson’s equation, of course.
Equations (1.2) and (1.3) embody a pattern in nature which persists even
in the most sophisticated physical theories. The idea is as follows. Matter
interacts via fields. The fields are produced by the matter, and the matter
is acted upon by the fields. In the present case, the gravitational field,
represented by the “scalar field” φ, determines the motion of mass via (1.2)
and Newton’s second law. Conversely, mass determines the gravitational
field via (1.3). The next example also exhibits this fundamental pattern of
nature.
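
To make the pattern concrete, here is a small sketch for the simplest source, a point mass at the origin, whose potential is φ = −GM/r. The choice of units with GM = 1, the sample points, and the function names are assumptions made for illustration. Away from the mass ρ = 0, so (1.3) says the Laplacian of φ should vanish there, and (1.2) should give the inverse-square attraction.

```python
import math

GM = 1.0  # assumed units in which G times the source mass equals 1


def phi(x, y, z):
    """Potential of a point mass at the origin, phi = -GM / r."""
    return -GM / math.sqrt(x * x + y * y + z * z)


def gradient(f, p, h=1e-5):
    """Central-difference estimate of the gradient of f at the point p."""
    x, y, z = p
    return (
        (f(x + h, y, z) - f(x - h, y, z)) / (2.0 * h),
        (f(x, y + h, z) - f(x, y - h, z)) / (2.0 * h),
        (f(x, y, z + h) - f(x, y, z - h)) / (2.0 * h),
    )


def laplacian(f, p, h=1e-3):
    """Central-difference estimate of the Laplacian of f at the point p."""
    x, y, z = p
    centre = f(x, y, z)
    return (
        f(x + h, y, z) + f(x - h, y, z)
        + f(x, y + h, z) + f(x, y - h, z)
        + f(x, y, z + h) + f(x, y, z - h)
        - 6.0 * centre
    ) / h**2


def force(m, p):
    """Force on a test mass m at p, from equation (1.2): F = -m grad(phi)."""
    gx, gy, gz = gradient(phi, p)
    return (-m * gx, -m * gy, -m * gz)


if __name__ == "__main__":
    print("force on m = 3 at r = 2:", force(3.0, (2.0, 0.0, 0.0)))
    print("Laplacian of phi at (1, 2, 2):", laplacian(phi, (1.0, 2.0, 2.0)))
```

The force points back toward the origin with magnitude m GM / r², and the Laplacian vanishes to within finite-difference error wherever there is no mass.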

1.3 Example: Maxwell’s equations

Here the electromagnetic interaction of particles is mediated by a pair of
vector fields, E(r, t) and B(r, t), the electric and magnetic fields. The force
on a test particle with electric charge q at spacetime event (r, t) is given by

$$\mathbf{F}(\mathbf{r},t) = q\,\mathbf{E}(\mathbf{r},t) + \frac{q}{c}\left[\mathbf{v}(t)\times\mathbf{B}(\mathbf{r},t)\right], \tag{1.4}$$

where v(t) is the particle’s velocity at time t and c is the speed of light in
vacuum.¹ The electromagnetic field is determined from the charge distribution
that is present via the Maxwell equations:

$$\nabla\cdot\mathbf{E} = 4\pi\rho, \tag{1.5}$$

$$\nabla\cdot\mathbf{B} = 0, \tag{1.6}$$

$$\nabla\times\mathbf{B} - \frac{1}{c}\frac{\partial\mathbf{E}}{\partial t} = \frac{4\pi}{c}\,\mathbf{j}, \tag{1.7}$$

$$\nabla\times\mathbf{E} + \frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} = 0, \tag{1.8}$$

where ρ is the electric charge density and j is the electric current density.
As we shall see later, the electromagnetic field is also fruitfully described
using potentials. They are defined via

$$\mathbf{E} = -\nabla\phi - \frac{1}{c}\frac{\partial\mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla\times\mathbf{A}. \tag{1.9}$$

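
As a quick consistency check, substituting (1.9) into the left-hand sides of (1.6) and (1.8) shows that these two Maxwell equations hold automatically. This is only a sketch: it uses the standard identities ∇ · (∇ × A) = 0 and ∇ × (∇φ) = 0, and it assumes the potentials are smooth enough that partial derivatives commute.

```latex
\begin{align*}
\nabla\cdot\mathbf{B} &= \nabla\cdot(\nabla\times\mathbf{A}) = 0, \\
\nabla\times\mathbf{E} + \frac{1}{c}\frac{\partial\mathbf{B}}{\partial t}
  &= -\nabla\times(\nabla\phi)
     - \frac{1}{c}\frac{\partial}{\partial t}(\nabla\times\mathbf{A})
     + \frac{1}{c}\frac{\partial}{\partial t}(\nabla\times\mathbf{A}) = 0.
\end{align*}
```

Only (1.5) and (1.7) then remain as conditions on the pair (φ, A).
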
In each of these examples, the field variable(s) are u, φ, (E, B) – or
(φ, A), and they are determined by PDEs. The “infinite number of degrees
of freedom” idea is that, roughly speaking, the general solution to the field
equations (wave, Poisson, or Maxwell) involves arbitrary functions. Thus
the space of solutions to the PDEs – physically, the set of field configurations
permitted by the laws of physics – is infinite dimensional.

¹ We are using Gaussian units for the electromagnetic field.

Chapter 2

Klein-Gordon field

The simplest relativistic classical field is the Klein-Gordon field. It and its
various generalizations are used throughout theoretical physics.

2.1 The Klein-Gordon equation

To begin with we can think of the Klein-Gordon field as simply a function on
spacetime, also known as a scalar field, ϕ : ℝ⁴ → ℝ. Introduce coordinates
x^α = (t, x, y, z) on ℝ⁴. The field equation – known as the Klein-Gordon
equation – is given by

$$\Box\varphi - m^2\varphi = 0, \tag{2.1}$$

where □ is the wave operator (or “d’Alembertian”),

$$\begin{aligned}
\Box\varphi &= -\frac{\partial^2\varphi}{\partial t^2} + \frac{\partial^2\varphi}{\partial x^2} + \frac{\partial^2\varphi}{\partial y^2} + \frac{\partial^2\varphi}{\partial z^2} \\
&= -\frac{\partial^2\varphi}{\partial t^2} + \nabla^2\varphi \\
&= -\partial_t^2\varphi + \partial_x^2\varphi + \partial_y^2\varphi + \partial_z^2\varphi \\
&= -\varphi_{,tt} + \varphi_{,xx} + \varphi_{,yy} + \varphi_{,zz},
\end{aligned} \tag{2.2}$$

and m is a parameter known as the mass of the Klein-Gordon field.¹ (Here I
have taken the liberty to show you most of the notations for derivatives that
you will see in this text.) You can see that the Klein-Gordon (KG) equation
is just a simple generalization of the wave equation, which it reduces to when
m = 0. In quantum field theory, quantum states of the Klein-Gordon field
can be characterized in terms of “particles” with rest mass m and no other
structure (e.g., no spin, no electric charge, etc.). So the Klein-Gordon field
is physically (and mathematically, too) the simplest of the relativistic fields
that one can study.

¹ Here we are using units in which ħ = c = 1. It is a nice exercise to put things in terms
of SI units.

If you like, you can view the Klein-Gordon equation as a “toy model” for
the Maxwell equations which describe the electromagnetic field. The quan-
tum electromagnetic field is characterized by “photons” which have vanishing
rest mass, no electric charge, but they do carry intrinsic “spin”. Coherent
states of the quantum electromagnetic field contain many, many photons and
are well approximated using “classical” electromagnetic fields satisfying the
Maxwell equations. Likewise, you can imagine that coherent states involv-
ing many “Klein-Gordon particles” (sometimes called scalar mesons) are well
described by a classical scalar field satisfying the Klein-Gordon equation.
The KG equation originally arose in an attempt to give a relativistic
generalization of the Schrödinger equation. The idea was to let ϕ be the
complex-valued wave function describing a spinless particle of mass m. But
this idea didn’t quite work out as expected (see below for a hint as to what
goes wrong). Later, when it was realized that a more viable way to do
quantum theory in a relativistic setting was via quantum field theory, the KG
equation came back as a field equation for a quantum field whose classical
limit is the KG equation above. The role of the KG equation as a sort of
relativistic Schrödinger equation does survive the quantum field theoretic
picture, however. The story is too long to go into in this course, but we will
give a hint as to what this means in a moment.

2.2 Solving the KG equation

Let us have a closer look at the KG equation and its solutions. The KG
equation is a linear PDE with constant coefficients. One standard strategy
for solving such equations is via Fourier analysis. To this end, let us suppose
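that ϕ is sufficiently well behaved² so that we have the following Fourier
expansion of ϕ at any given value of t:

$$\varphi(t,\mathbf{r}) = \left(\frac{1}{2\pi}\right)^{3/2}\int_{\mathbb{R}^3} d^3k\;\hat\varphi_{\mathbf{k}}(t)\,e^{i\mathbf{k}\cdot\mathbf{r}}, \tag{2.3}$$

where k = (k_x, k_y, k_z) ∈ ℝ³, and the complex-valued Fourier transform
satisfies

$$\hat\varphi_{-\mathbf{k}} = \hat\varphi_{\mathbf{k}}^{*}, \tag{2.4}$$

since ϕ is a real-valued function. The KG equation implies the following
equation for the Fourier transform ϕ̂_k(t) of ϕ:

$$\ddot{\hat\varphi}_{\mathbf{k}} + (k^2 + m^2)\,\hat\varphi_{\mathbf{k}} = 0. \tag{2.5}$$

This equation is, of course, quite easy to solve. We have

$$\hat\varphi_{\mathbf{k}}(t) = a_{\mathbf{k}}\,e^{-i\omega_k t} + b_{\mathbf{k}}\,e^{i\omega_k t}, \tag{2.6}$$

where

$$\omega_k = \sqrt{k^2 + m^2}, \tag{2.7}$$

and a_k, b_k are complex constants for each k. The reality condition (2.4) on
ϕ̂_k implies

$$b_{-\mathbf{k}} = a_{\mathbf{k}}^{*}, \quad \forall\,\mathbf{k}, \tag{2.8}$$

so that ϕ solves the KG equation if and only if it takes the form

$$\varphi(x) = \left(\frac{1}{2\pi}\right)^{3/2}\int_{\mathbb{R}^3} d^3k\,\left(a_{\mathbf{k}}\,e^{i\mathbf{k}\cdot\mathbf{r} - i\omega_k t} + a_{\mathbf{k}}^{*}\,e^{-i\mathbf{k}\cdot\mathbf{r} + i\omega_k t}\right). \tag{2.9}$$

² “Sufficiently well behaved” could mean, for example, that, for each t, ϕ ∈ L¹(ℝ³) or
ϕ ∈ L²(ℝ³).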

Problem:
1. Verify (2.4)–(2.9).

Let us pause and notice something familiar here. Granted a little Fourier
analysis, the KG equation is, via (2.5), really just an infinite collection of
uncoupled “harmonic oscillator equations” for the real and imaginary parts
of ϕ̂_k(t) with “natural frequency” ω_k. Thus we can see quite explicitly how
the KG field is akin to a dynamical system with an infinite number of degrees
of freedom. Indeed, if we label the degrees of freedom with the Fourier wave
vector k then each degree of freedom is a harmonic oscillator. It is this
interpretation which is used to make the (non-interacting) quantum Klein-
Gordon field: each harmonic oscillator is given the usual quantum mechanical
treatment.
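
The oscillator picture can be checked numerically. The following is a minimal sketch, not part of the original notes: it integrates the mode equation (2.5) for one real degree of freedom q(t) with a velocity-Verlet (leapfrog) stepper and compares against the closed form cos(ω_k t), which is (2.6) with a_k = b_k = 1/2. The values of k and m, the step count, and the function names are illustrative assumptions.

```python
import math


def omega(k, m):
    """Natural frequency (2.7) for a wave vector of magnitude k and mass m."""
    return math.sqrt(k * k + m * m)


def evolve_mode(q0, p0, k, m, t_final, steps=20000):
    """Integrate q'' = -(k^2 + m^2) q from t = 0 to t_final (velocity Verlet).

    q0 and p0 are the initial value and initial time derivative of the mode.
    Returns the pair (q, dq/dt) at t_final.
    """
    w2 = k * k + m * m
    dt = t_final / steps
    q, p = q0, p0
    for _ in range(steps):
        p -= 0.5 * dt * w2 * q  # half step for the velocity
        q += dt * p             # full step for the amplitude
        p -= 0.5 * dt * w2 * q  # second half step for the velocity
    return q, p


if __name__ == "__main__":
    k, m, t = 3.0, 4.0, 2.0  # assumed values, chosen so that omega_k = 5
    q, p = evolve_mode(1.0, 0.0, k, m, t)
    print("numerical q(t):", q)
    print("closed form   :", math.cos(omega(k, m) * t))
```

Each wave vector k gets its own copy of this one-dimensional problem, with no coupling between different k; that is the precise sense in which the free KG field is an infinite collection of independent oscillators.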

2.3 A small digression: one particle wave functions

From our form (2.9) for the general solution of the KG equation we can take
a superficial glimpse at how this equation is used as a sort of Schrödinger
equation in the quantum field theory description. To begin, we note that
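ϕ = ϕ⁺ + ϕ⁻, where

$$\varphi^{+} := \left(\frac{1}{2\pi}\right)^{3/2}\int_{\mathbb{R}^3} d^3k\;a_{\mathbf{k}}\,e^{i\mathbf{k}\cdot\mathbf{r} - i\omega_k t}, \tag{2.10}$$

and

$$\varphi^{-} := \left(\frac{1}{2\pi}\right)^{3/2}\int_{\mathbb{R}^3} d^3k\;a_{\mathbf{k}}^{*}\,e^{-i\mathbf{k}\cdot\mathbf{r} + i\omega_k t} \tag{2.11}$$

are each complex-valued solutions to the KG equation. These are called
the positive frequency and negative frequency solutions of the KG equation,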
respectively. Let us focus on the positive frequency solutions. They satisfy
the KG equation, but they also satisfy the stronger equation


$$i\,\frac{\partial\varphi^{+}}{\partial t} = \sqrt{-\nabla^2 + m^2}\;\varphi^{+}, \tag{2.12}$$

where the square root operator is defined via Fourier analysis to be³

$$\sqrt{-\nabla^2 + m^2}\;\varphi^{+} := \left(\frac{1}{2\pi}\right)^{3/2}\int_{\mathbb{R}^3} d^3k\;\omega_k\,a_{\mathbf{k}}\,e^{i\mathbf{k}\cdot\mathbf{r} - i\omega_k t}. \tag{2.13}$$

Thus the positive frequency solutions of the KG equation satisfy a Schrödinger
equation with Hamiltonian

$$H = \sqrt{-\nabla^2 + m^2}. \tag{2.14}$$

This Hamiltonian can be interpreted as the kinetic energy of a relativistic
particle (in the reference frame labeled by spacetime coordinates (t, x, y, z)).
It is possible to give a relativistically invariant normalization condition on
the positive frequency wave functions so that one can use them to compute
probabilities for the outcome of measurements made on the particle. Thus
the positive frequency solutions are sometimes called the “one-particle wave
functions”.

³ The domain of this operator will necessarily be limited to a subspace of functions.

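
To see in what sense (2.14) is the energy of a free relativistic particle, here is a short expansion. It is a sketch that assumes |k| ≪ m and uses the units ħ = c = 1, in which the wave vector k plays the role of the particle's momentum. On a single Fourier mode the operator in (2.13) simply multiplies by ω_k, and

```latex
\omega_k = \sqrt{k^2 + m^2}
         = m\,\sqrt{1 + \frac{k^2}{m^2}}
         = m + \frac{k^2}{2m} - \frac{k^4}{8m^3} + \cdots
```

The first term is the rest energy, the second is the familiar Newtonian kinetic energy, and the remaining terms are relativistic corrections.
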
In a quantum field theoretic treatment, the (normalizable) positive fre-
quency solutions represent the wave functions of KG particles. What about
the negative frequency solutions? There are difficulties in using the negative
frequency solutions to describe KG particles. For example, you can easily
check that they satisfy a Schrödinger equation with a negative kinetic energy,
which is unphysical. Moreover, the relativistic inner product with respect to
which one can normalize the positive frequency solutions leads to a negative
norm for the negative frequency solutions. This means that the negative fre-
quency part of the solution to the KG equation cannot be used to describe the
quantum mechanics of a single particle. It turns out that when one couples
a single quantum mechanical KG particle to its environment one invariably
brings the negative frequency solutions into play, thus destroying the single
particle quantum description (see, for example, the “Klein paradox”). In
quantum field theory (as opposed to quantum mechanics), the negative fre-
quency solutions are interpreted in terms of the possibility for destruction
of particles or the creation of anti-particles. Quantum field theory, you see,
allows for creation and destruction of particles. But now we are going too
far afield...

2.4 Variational principle

In physics, fundamental theories always arise from a variational principle.
Ultimately this stems from their roots as quantum mechanical systems, as
can be seen from Feynman’s path integral formalism. For us, the presence of
a variational principle is a very powerful tool for organizing the information
in a field theory. Presumably you have seen a variational principle or two in
your life. I will not assume you are particularly proficient with the calculus
of variations – one of the goals of this course is to make you better at it – but
I will assume that you are familiar with the basic strategy of a variational
principle.
So, consider any map ϕ : ℝ⁴ → ℝ (not necessarily satisfying any field
equations) and consider the following integral

$$S[\varphi] = \int_{R\,\subset\,\mathbb{R}^4} d^4x\;\frac{1}{2}\left(\varphi_{,t}^2 - (\nabla\varphi)^2 - m^2\varphi^2\right). \tag{2.15}$$

The region of integration R can be anything you like at this point, but
typically we assume

$$R = \{(t,x,y,z)\;|\;t_1 \le t \le t_2,\ -\infty < x,y,z < \infty\}. \tag{2.16}$$

We restrict attention to fields such that S[ϕ] exists. For example, we can
assume that ϕ is always a smooth function of compact spatial support. The
value of the integral, of course, depends upon which function ϕ we choose,
so the formula (2.15) assigns a real number to each function ϕ. We say that
S = S[ϕ] is a functional of ϕ. We will use this functional to obtain the KG
equation by a variational principle. When a functional can be used in this
manner it is called an action functional for the field theory.
The variational principle goes as follows. Consider any family of functions,
labeled by a parameter λ, which includes some given function ϕ_0 at
λ = 0. We say we have a one-parameter family of fields, ϕ_λ. As a random
example, we might have

$$\varphi_\lambda = \cos(\lambda(t+x))\,e^{-(x^2+y^2+z^2)}. \tag{2.17}$$

You can think of ϕ_λ as defining a curve in the “space of fields” which passes
through the point ϕ_0.⁴ We can evaluate the functional S along this curve;
the value of the functional defines an ordinary function of λ, again denoted
S:

$$S(\lambda) := S[\varphi_\lambda]. \tag{2.18}$$

Note that different choices of curve ϕ_λ will determine different functions S(λ).
We now define a critical point ϕ_0 of the action functional S[ϕ] to be a “point
in the space of fields”, that is, a field ϕ = ϕ_0(x), which defines a critical
point of the function S(λ) for any curve passing through ϕ_0.
This way of defining a critical point is a natural generalization to the space
of fields of the usual notion of critical point from multi-variable calculus.
Recall that in ordinary calculus a critical point (x_0, y_0, z_0) of a function f is
a point where all the first derivatives of f vanish,

$$\partial_x f(x_0,y_0,z_0) = \partial_y f(x_0,y_0,z_0) = \partial_z f(x_0,y_0,z_0) = 0. \tag{2.19}$$

⁴ The space of fields is the set of all allowed functions on spacetime. It can be endowed
with enough structure to view it as a smooth manifold whose points are the allowed
functions. The curve in equation (2.17) passes through the point ϕ = e^{−(x²+y²+z²)} at
λ = 0.

This is equivalent to the vanishing of the rate of change of f along any curve
through $(x_0, y_0, z_0)$. To see this, define the parametric form of a curve $\vec{x}(\lambda)$
passing through $\vec{x}(0) = (x_0, y_0, z_0)$ via

\[ \vec{x}(\lambda) = (x(\lambda), y(\lambda), z(\lambda)) \tag{2.20} \]

with λ = 0 corresponding to the point $\vec{x}_0$ on the curve,

\[ (x(0), y(0), z(0)) = (x_0, y_0, z_0). \tag{2.21} \]

The tangent vector $\vec{T}(\lambda)$ to the curve at the point $\vec{x}(\lambda)$ has Cartesian
components

\[ \vec{T}(\lambda) = (x'(\lambda), y'(\lambda), z'(\lambda)), \tag{2.22} \]

and the rate of change of a function f = f (x, y, z) along the curve at the
point $\vec{x}(\lambda)$ is given by the directional derivative along $\vec{T}$:

\[ \vec{T}(\lambda)\cdot\nabla f\,\Big|_{\vec{x}=\vec{x}(\lambda)} = x'(\lambda)\,\partial_x f(\vec{x}(\lambda)) + y'(\lambda)\,\partial_y f(\vec{x}(\lambda)) + z'(\lambda)\,\partial_z f(\vec{x}(\lambda)). \tag{2.23} \]

It should be apparent from this equation that the vanishing of the rate of
change of f along any curve passing through the point $\vec{x}_0$ is equivalent to the
vanishing of the gradient of f at $\vec{x}_0$, which is the same as the vanishing of all
the first derivatives of f at $\vec{x}_0$. It is this interpretation of “critical point” in
terms of vanishing rate of change along any curve that we generalize to the
infinite-dimensional space of fields.
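
The finite-dimensional statement can be checked symbolically. The following is a minimal sketch using SymPy; the test function f and the polynomial family of curves are hypothetical choices made only for illustration. It shows that the rate of change along every curve in the family vanishes at a critical point, while at a non-critical point it equals the tangent vector dotted into the gradient.

```python
import sympy as sp

x, y, z, lam = sp.symbols("x y z lambda", real=True)
a, b, c, p, q, s = sp.symbols("a b c p q s", real=True)  # arbitrary curve data

# Hypothetical test function; its gradient vanishes at (1, -2, 0).
f = (x - 1)**2 + (y + 2)**2 + z**2 + (x - 1)*z
gradient_at_point = [sp.diff(f, v).subs({x: 1, y: -2, z: 0}) for v in (x, y, z)]

# A family of curves through (1, -2, 0) at lambda = 0 with arbitrary tangent (a, b, c).
curve = {x: 1 + a*lam + p*lam**2, y: -2 + b*lam + q*lam**2, z: c*lam + s*lam**2}
rate_at_critical_point = sp.expand(sp.diff(f.subs(curve), lam)).subs(lam, 0)

# For contrast: straight lines through the origin, which is not a critical point.
line = {x: a*lam, y: b*lam, z: c*lam}
rate_at_origin = sp.expand(sp.diff(f.subs(line), lam)).subs(lam, 0)

print(gradient_at_point, rate_at_critical_point, rate_at_origin)
```

At the origin the gradient of this f is (−2, 4, −1), so the last quantity depends on the tangent (a, b, c), as (2.23) says it should.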
We shall show that the critical points of the functional S[ϕ] correspond
to functions on spacetime which solve the KG equation. To this end, let
us consider a curve that passes through a putative critical point ϕ at, say,
λ = 0.5 This is easy to arrange. For example, let ϕ̂λ be any curve in the
space of fields. Define ϕλ via

ϕλ = ϕ̂λ − ϕ̂0 + ϕ. (2.24)

If ϕλ=0 = ϕ is a critical point, then

\[ \delta S := \frac{dS(\lambda)}{d\lambda}\bigg|_{\lambda=0} = 0. \tag{2.25} \]

5 We now drop the distracting subscript 0.

We call δS the first variation of the action; its vanishing for all curves through
ϕ is the condition for a critical point. We can compute δS explicitly by
applying (2.25) to S[ϕλ ]; we find
\[ \delta S = \int_R d^4x\,\bigl(\phi_{,t}\,\delta\phi_{,t} - \nabla\phi\cdot\nabla\delta\phi - m^2\phi\,\delta\phi\bigr), \tag{2.26} \]

where the function δϕ – the variation of ϕ – is defined by

\[ \delta\phi := \frac{d\phi(\lambda)}{d\lambda}\bigg|_{\lambda=0}. \tag{2.27} \]

The critical point condition means that δS = 0 for all variations of ϕ, and
we want to see what that implies about the critical point ϕ.
To this end we observe that δϕ is a completely arbitrary function (aside
from regularity and boundary conditions to be discussed below). To see this,
let ψ be any function you like and consider the curve

ϕλ = ϕ + λψ, (2.28)

so that
δϕ = ψ. (2.29)
To make use of the requirement that δS = 0 must hold for arbitrary δϕ, we
“integrate by parts” in δS via the divergence theorem:
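\[
\begin{aligned}
\delta S ={}& \int_R d^4x\,\bigl(-\phi_{,tt} + \nabla^2\phi - m^2\phi\bigr)\,\delta\phi \\
&+ \Bigl[\int_{\mathbb{R}^3} d^3x\;\phi_{,t}\,\delta\phi\Bigr]_{t_1}^{t_2}
 - \int_{t_1}^{t_2} dt \int_{r\to\infty} d^2A\;\mathbf{n}\cdot\nabla\phi\,\delta\phi.
\end{aligned}
\tag{2.30}
\]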

Here the last two terms are the boundary contributions from ∂R. For con-
creteness, I have assumed that

R = [t1 , t2 ] × R3 . (2.31)

The last integral is over the “sphere at infinity” in R3 , with n being the
outward unit normal to that sphere.
If you need a little help seeing where (2.30) came from, the key is to write

ϕ,t δϕ,t − ∇ϕ · ∇δϕ = ∂t (ϕ,t δϕ) − ∇ · (∇ϕδϕ) − ϕ,tt δϕ + (∇2 ϕ)δϕ. (2.32)

The first term’s time integral is easy to perform, and the second term’s spatial
integral can be evaluated using the divergence theorem.
To continue with our analysis of (2.30), we make two assumptions re-
garding the boundary conditions to be placed on our various fields. First, we
note that ϕ, and ϕλ , and hence δϕ, must vanish at spatial infinity (r → ∞
at fixed t) in order for the action integral to converge. Further, we assume
that ϕ and δϕ vanish as r → ∞ fast enough so that in δS the boundary
integral over the sphere at infinity vanishes. One way to do this systemat-
ically is to assume that all fields have “compact support” in space, that is,
at each time t they all vanish outside of some bounded region in R3 . Other
asymptotic conditions are possible, but since the area element (dA) in the
integral over the sphere grows like r2 the integrand should fall off faster than
1/r2 as r → ∞ for this boundary term to vanish.
Secondly, we hold fixed the initial and final values of the fields – a step
which should be familiar to you from the variational formulation of classical
mechanics. To this end we fix two functions

φ1 , φ2 : R3 → R (2.33)

and we assume that at t1 and t2 , for any allowed ϕ (not just the critical
point),
ϕ|t1 = φ1 , ϕ|t2 = φ2 . (2.34)

The functions φ1 and φ2 are fixed but arbitrary, subject to the asymptotic
conditions as r → ∞. Now, for our one parameter family of fields we also
demand
ϕλ |t1 = φ1 , ϕλ |t2 = φ2 , (2.35)

which forces
δϕ|t1 = 0 = δϕ|t2 . (2.36)

This forces the vanishing of the first term in the boundary contribution to
δS in (2.30).
With these boundary conditions, we see that the assumption that ϕ is a
(smooth) critical point implies that
\[ 0 = \int_R d^4x\,\bigl(-\phi_{,tt} + \nabla^2\phi - m^2\phi\bigr)\,\delta\phi \tag{2.37} \]

for any function δϕ subject to (2.36) and the asymptotic conditions just
described. Now, it is a standard theorem in calculus6 that this implies
− ϕ,tt + ∇2 ϕ − m2 ϕ = 0 (2.38)
everywhere in the region R.
This, then, is the variational principle for the KG equation. The critical
points of the KG action, subject to the two types of boundary conditions we
described (asymptotic conditions at spatial infinity and initial/final boundary
conditions), are the solutions of the KG equation.
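
As a worked illustration, consider the straight-line curve (2.28), ϕλ = ϕ + λψ, with ψ vanishing at t1, t2 and at spatial infinity. Assuming the KG action has the quadratic form $S[\phi] = \int_R d^4x\,\tfrac12(\phi_{,t}^2 - (\nabla\phi)^2 - m^2\phi^2)$, the function S(λ) can be expanded exactly; the sketch below just collects powers of λ and then integrates the linear term by parts.

```latex
\begin{aligned}
S(\lambda) = S[\phi + \lambda\psi]
  &= S[\phi]
   + \lambda \int_R d^4x\,\bigl(\phi_{,t}\psi_{,t} - \nabla\phi\cdot\nabla\psi - m^2\phi\,\psi\bigr)
   + \lambda^2\, S[\psi], \\
\frac{dS}{d\lambda}\bigg|_{\lambda=0}
  &= \int_R d^4x\,\bigl(-\phi_{,tt} + \nabla^2\phi - m^2\phi\bigr)\,\psi
  \qquad \text{(boundary terms vanish for such } \psi\text{)}.
\end{aligned}
```

So S(λ) is a quadratic polynomial in λ, and its linear coefficient vanishes for every admissible ψ precisely when ϕ solves the KG equation.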
I would understand if, at this point, you are thinking: “Why would I
want to replace a relatively simple PDE with all this complicated variational
stuff?” The payoff for this investment in variational technology turns out to
be quite large, as I hope you will see by the end of this course.

2.5 Getting used to δ. Functional derivatives.

I have already introduced in passing the notation δ, known colloquially as a
“variation”. For any quantity W [ϕ] built from the field ϕ, and for any one
parameter family of fields ϕλ , we define

\[ \delta W := \frac{d}{d\lambda} W[\phi_\lambda]\bigg|_{\lambda=0}. \tag{2.39} \]

Evidently, δW is the change in W given by letting λ be displaced an infinitesimal
amount dλ from λ = 0. A couple of important properties of δ are: (i)
it is a linear operation obeying the Leibniz product rule (it is a derivation);
(ii) variations commute with differentiation, e.g.,
δ(ϕ,α ) = ∂α (δϕ) ≡ δϕ,α . (2.40)
In the last section we computed the first variation of the KG action:
\[ \delta S[\phi] = \int_R d^4x\,\bigl(\phi_{,t}\,\delta\phi_{,t} - \nabla\phi\cdot\nabla\delta\phi - m^2\phi\,\delta\phi\bigr). \tag{2.41} \]

6 The proof goes by choosing δϕ to be a localized “bump function” at any chosen point
in R.

By definition, if the first variation of a functional S[ϕ] can be expressed as

\[ \delta S[\phi] = \int_R d^4x\; F(x)\,\delta\phi(x) \tag{2.42} \]

then we say that S is differentiable and that

\[ F(x) \equiv \frac{\delta S}{\delta\phi(x)} \tag{2.43} \]

is the functional derivative of the action with respect to ϕ. For a differentiable
functional S[ϕ], then, we write

\[ \delta S[\phi] = \int_R d^4x\; \frac{\delta S}{\delta\phi}\,\delta\phi. \tag{2.44} \]

We have seen that, with our choice of boundary conditions, the KG action
is differentiable and that

\[ \frac{\delta S}{\delta\phi} = -\phi_{,tt} + \nabla^2\phi - m^2\phi. \tag{2.45} \]

In general, the idea of variational principles is to encode the field equations
into an action S[ϕ] so that they arise as the equations

\[ \frac{\delta S}{\delta\phi} = 0. \tag{2.46} \]

2.6 The Lagrangian

Following the pattern of classical mechanics, I will now introduce the notion
of a Lagrangian L for the KG equation, which is defined as an integral at
fixed time t:

\[ L = \int_{\mathbb{R}^3} d^3x\; \frac{1}{2}\bigl(\phi_{,t}^2 - (\nabla\phi)^2 - m^2\phi^2\bigr), \tag{2.47} \]

so that

\[ S[\phi] = \int_{t_1}^{t_2} dt\; L. \tag{2.48} \]
In classical mechanics, the Lagrangian is a function of the independent vari-
ables, the dependent variables (the “degrees of freedom”), and the derivatives
of the dependent variables. In field theory we have, in effect, degrees of free-
dom labeled by spatial points. We then have the possibility of expressing the
Lagrangian as an integral over space (a sum over the degrees of freedom). In
the KG theory we have that
\[ L = \int_{\mathbb{R}^3} d^3x\; \mathcal{L}, \tag{2.49} \]

where

\[ \mathcal{L} = \frac{1}{2}\bigl((\phi_{,t})^2 - (\nabla\phi)^2 - m^2\phi^2\bigr) \tag{2.50} \]
is called the Lagrangian density. At a point (t, x, y, z), the Lagrangian density
for the KG field depends on the values of the field ϕ and its first derivatives
at (t, x, y, z). We say that L is a local function of the field.7 Theories like
the KG theory which admit an action which is a spacetime integral of a local
Lagrangian density are called local field theories.
Finally, notice that the Lagrangian for the KG theory can be viewed as
having the same structure as that for a finite dimensional dynamical system
in non-relativistic Newtonian mechanics, namely, L = T − U , where T can
be viewed as a kinetic energy for the field,
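\[ T = \int_{\mathbb{R}^3} d^3x\; \frac{1}{2}\phi_{,t}^2 \tag{2.51} \]

and U plays the role of potential energy:

\[ U := \int_{\mathbb{R}^3} d^3x\; \frac{1}{2}\bigl((\nabla\phi)^2 + m^2\phi^2\bigr). \tag{2.52} \]

Evidently, we can view $\frac{1}{2}\phi_{,t}^2$ as the kinetic energy density, and view
$\frac{1}{2}\bigl((\nabla\phi)^2 + m^2\phi^2\bigr)$ as the potential energy density.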

2.7 The Euler-Lagrange equations

We have seen that the Lagrangian density L of the KG theory is a local
function of the KG field. We can express the functional derivative of the KG
action purely in terms of the Lagrangian density. To see how this is done,
we note that
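\[
\begin{aligned}
\delta\mathcal{L} &= \phi_{,t}\,\delta\phi_{,t} - \nabla\phi\cdot\nabla\delta\phi - m^2\phi\,\delta\phi \\
&= \bigl(-\phi_{,tt} + \nabla^2\phi - m^2\phi\bigr)\,\delta\phi + \frac{\partial}{\partial x^\alpha}V^\alpha,
\end{aligned}
\tag{2.53}
\]

where $x^\alpha = (t, x, y, z)$, α = 0, 1, 2, 3, and we are using the Einstein summation
convention,

\[ \frac{\partial}{\partial x^\alpha}V^\alpha \equiv \sum_\alpha \frac{\partial}{\partial x^\alpha}V^\alpha. \tag{2.54} \]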
7 A non-local function of the field would depend upon the field and/or its derivatives
at other points as well.

Here

\[ V^0 = \phi_{,t}\,\delta\phi, \qquad V^i = -(\nabla\phi)^i\,\delta\phi. \tag{2.55} \]
The term involving V α is a four-dimensional divergence and leads to the
boundary contributions to the variation of the action via the divergence the-
orem. Assuming the boundary conditions are such that these terms vanish,
we see that the functional derivative of the action is computed by (1) varying
the Lagrangian density, and (2) rearranging terms to move all derivatives of
the field variations into divergences and (3) throwing away the divergences.
We now give a slightly more general way to think about this last com-
putation, which is very handy for certain purposes. This point of view is
developed more formally in the next section.
First, we view the formula giving the definition of the Lagrangian as a
function of 9 variables
L = L(x, ϕ, ϕα ), (2.56)
where now, formally, ϕ and ϕα are just a set of 5 variables upon which the
Lagrangian density depends.8 (The KG Lagrangian density does not actually
depend upon xα except through the field, so in this example L = L(ϕ, ϕα ),
but it is useful to allow for this possibility in the future.) This 9 dimensional
space is called the first jet space for the scalar field. From this point of view,
the field ϕ does not depend upon xα and neither does ϕα . The fields are
recovered as follows. For each function f (x) there is a field obtained as a
graph in the 5-dimensional space of (xα , ϕ), specified by ϕ = f (x). Similarly,
in this setting we do not view ϕα as the derivatives of ϕ; given a function
f (x) we can extend the graph into the 9-dimensional space (xα , ϕ, ϕα ) via
(ϕ = f (x), ϕα = ∂α f (x)). We can keep going like this. For example, we
could work on a space parametrized by (xα , ϕ, ϕα , ϕαβ ), where ϕαβ = ϕβα
parametrizes the values of the second derivatives. This space is the second
jet space; it has dimension 19 (exercise)! Given a field ϕ = f (x) we have a
graph in this 19 dimensional space given by (xα , f (x), ∂α f (x), ∂α ∂β f (x)).
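
The dimension counts quoted above (5, 9, 19) follow from counting independent components of symmetric derivative arrays. Here is a small sketch; the helper name `jet_space_dimension` is hypothetical, and it assumes a single scalar field on an n-dimensional spacetime.

```python
from math import comb

def jet_space_dimension(n: int, k: int) -> int:
    """Dimension of the k-th jet space of one scalar field on an n-dimensional spacetime."""
    # A totally symmetric object with j indices, each running over n values,
    # has comb(n + j - 1, j) independent components (j = 0 is the field itself).
    field_and_derivative_coords = sum(comb(n + j - 1, j) for j in range(k + 1))
    # Add the n spacetime coordinates x^alpha.
    return n + field_and_derivative_coords

# Graph space, first jet space, second jet space for n = 4.
dimensions = [jet_space_dimension(4, k) for k in (0, 1, 2)]
print(dimensions)
```

For n = 4 the second-derivative coordinates contribute 10 independent components, which is what takes the count from 9 to 19.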
Next, for any formula F (x, ϕ, ϕα ) built from the coordinates, the fields,
and the first derivatives of the fields, introduce the total derivative
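\[ D_\alpha F(x, \phi, \phi_\alpha) = \frac{\partial F}{\partial x^\alpha} + \frac{\partial F}{\partial\phi}\,\phi_\alpha + \frac{\partial F}{\partial\phi_\beta}\,\phi_{\alpha\beta}. \tag{2.57} \]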
8 Notice that we temporarily drop the comma in the notation for the derivative of ϕ.
This is just to visually enforce our new point of view. You can mentally replace the comma
if you like. We shall eventually put it back to conform to standard physics notation.

The total derivative just implements in this new setting the calculation of
spacetime derivatives of F via the chain rule. In particular, if we imagine
substituting a specific field, ϕ = f (x) into the formula F , then F becomes a
function $\mathcal{F}$ of x only:

\[ \mathcal{F}(x) = F(x, f(x), \partial_\alpha f(x)). \tag{2.58} \]

The total derivative of F , when restricted to ϕ = f (x), is the same as the
derivative of $\mathcal{F}$:

\[ D_\alpha F(x, \phi, \phi_\alpha)\Big|_{\phi=f(x)} = \partial_\alpha\mathcal{F}(x). \tag{2.59} \]
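
To see the total derivative at work, here is a minimal SymPy sketch with a single independent variable x, so that the jet coordinates are just (x, ϕ, ϕ_x, ϕ_xx). The formula F and the field f are hypothetical choices; the point is only to check the one-variable analogue of (2.59).

```python
import sympy as sp

# Jet coordinates, treated as independent symbols (not as functions of x).
x, u, u1, u2 = sp.symbols("x phi phi_x phi_xx")

def total_derivative(F):
    """D_x F for F = F(x, phi, phi_x): the one-variable analogue of (2.57)."""
    return sp.diff(F, x) + sp.diff(F, u) * u1 + sp.diff(F, u1) * u2

F = x * u**2 + sp.sin(u) * u1**2       # hypothetical formula on the first jet space
DF = total_derivative(F)               # a function on the second jet space

f = sp.exp(-x**2)                      # hypothetical field, phi = f(x)
on_field = {u: f, u1: sp.diff(f, x), u2: sp.diff(f, x, 2)}

restricted_total_derivative = DF.subs(on_field)       # D_x F evaluated on the field
ordinary_derivative = sp.diff(F.subs(on_field), x)    # d/dx of F(x, f(x), f'(x))
difference = sp.simplify(restricted_total_derivative - ordinary_derivative)
print(difference)
```

Note that DF contains ϕ_xx even though F does not: the total derivative of a function on the first jet space lives on the second jet space.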

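We can extend this apparatus to include the jets of field variations δϕ
and $\delta\phi_\alpha$. The variation of the Lagrangian density is then defined as

\[ \delta\mathcal{L} := \frac{\partial\mathcal{L}}{\partial\phi}\,\delta\phi + \frac{\partial\mathcal{L}}{\partial\phi_\alpha}\,\delta\phi_\alpha. \tag{2.60} \]

As a nice exercise you should verify that $\delta\mathcal{L}$ can be written as

\[ \delta\mathcal{L} = \Bigl(\frac{\partial\mathcal{L}}{\partial\phi} - D_\alpha\frac{\partial\mathcal{L}}{\partial\phi_\alpha}\Bigr)\delta\phi + D_\alpha V^\alpha, \tag{2.61} \]

where

\[ V^\alpha = \frac{\partial\mathcal{L}}{\partial\phi_\alpha}\,\delta\phi. \tag{2.62} \]

We define the Euler-Lagrange derivative of (or Euler-Lagrange expression for)
the Lagrangian density via

\[ E(\mathcal{L}) := \frac{\partial\mathcal{L}}{\partial\phi} - D_\alpha\frac{\partial\mathcal{L}}{\partial\phi_\alpha}. \tag{2.63} \]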
Evidently, with our boundary conditions the functional derivative of the KG
action is the same as the EL derivative of the Lagrangian density (evaluated
on a field (ϕ = ϕ(x), ϕα = ∂α ϕ(x), ϕαβ = ∂α ∂β ϕ(x))). We have

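\[ \frac{\delta S}{\delta\phi} = E(\mathcal{L})\Big|_{\phi=\phi(x)} = -\phi_{,tt} + \nabla^2\phi - m^2\phi, \tag{2.64} \]

and the KG field equation is the Euler-Lagrange equation of the Lagrangian
density $\mathcal{L}$:

\[ E(\mathcal{L}) = 0. \tag{2.65} \]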
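
The Euler-Lagrange derivative of the KG Lagrangian density can also be computed by machine. The sketch below uses SymPy's `euler_equations` helper and, to keep the output short, restricts to one space dimension (an assumption made here, not in the text). It also checks that a plane wave with ω² = k² + m² solves the resulting equation.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t, x = sp.symbols("t x", real=True)
m, k = sp.symbols("m k", positive=True)
phi = sp.Function("phi")(t, x)

# KG Lagrangian density (2.50), restricted to one space dimension.
L = sp.Rational(1, 2) * (phi.diff(t)**2 - phi.diff(x)**2 - m**2 * phi**2)

# euler_equations returns a list of equations E(L) = 0, one per field.
(el_equation,) = euler_equations(L, [phi], [t, x])
kg_operator = -phi.diff(t, 2) + phi.diff(x, 2) - m**2 * phi
residual = sp.simplify(el_equation.lhs - kg_operator)

# A plane wave obeying the dispersion relation omega^2 = k^2 + m^2.
omega = sp.sqrt(k**2 + m**2)
plane_wave = sp.cos(k * x - omega * t)
plane_wave_check = sp.simplify(kg_operator.subs(phi, plane_wave).doit())
print(el_equation, residual, plane_wave_check)
```

Here the field is substituted from the start (ϕ = ϕ(t, x)), which is the "usual physicist way" of working rather than the jet space point of view.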

The reason I introduce you to all this jet space formalism is that often
in field theory we want to manipulate a formula such as $\mathcal{L}(x, \phi, \partial\phi)$ using
the ordinary rules of multivariable calculus, e.g., calculate $\partial\mathcal{L}/\partial x^\alpha$, so that we
are viewing the Lagrangian as a function of 9 variables. For example, it will
be useful to know when a Lagrangian density does not depend explicitly on
the spacetime coordinates. In the jet space formalism this is easy: $\partial\mathcal{L}/\partial x^\alpha = 0$.
Indeed, the KG Lagrangian density has this property. If we view the
Lagrangian density as a function of ϕ(x), it is hard to usefully interpret
such a condition – except that the Lagrangian is somehow a trivial constant.
A more detailed discussion of jet spaces occurs in the next section. For
the most part in this book, I will not use the fancy jet space formalism
unless I really need it. Instead I will follow the usual physicist way of doing
things where one always imagines that objects like Lagrangian densities are
evaluated on fields ϕ = ϕ(x) and the partial derivative is really the total
derivative as in (2.59).

Problems:

2. Compute the Euler-Lagrange derivative of the KG Lagrangian density
(2.50) and explicitly verify that the Euler-Lagrange equation is indeed
the KG equation.

3. Consider a Lagrangian density that is a divergence:

\[ \mathcal{L} = D_\alpha W^\alpha, \tag{2.66} \]

where

\[ W^\alpha = W^\alpha(\phi). \tag{2.67} \]

Show that

\[ E(\mathcal{L}) \equiv 0, \tag{2.68} \]

so the EL equations are trivial (0 = 0).

2.8 Jet Space

Here I will sketch some of the elements of a general jet space description of
a field theory. To begin, we generalize to the case where the Lagrangian is
a (local) function of the spacetime location, the fields, and the derivatives of
the fields to any finite order. We write
\[ \mathcal{L} = \mathcal{L}(x, \phi, \partial\phi, \partial^2\phi, \ldots, \partial^k\phi). \tag{2.69} \]

Viewed this way, the Lagrangian is a function on a large but finite-dimensional
space called the k-th jet space for the field theory. We denote this space by $J^k$.
Remarkably, if we vary L we can always rearrange things so that all deriva-
tives of δϕ appear inside a total divergence. We have the Euler-Lagrange
identity:
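\[ \delta\mathcal{L} = E(\mathcal{L})\,\delta\phi + D_\alpha V^\alpha, \tag{2.70} \]

where the general form for the Euler-Lagrange derivative is given by

\[ E(\mathcal{L}) := \frac{\partial\mathcal{L}}{\partial\phi} - D_\alpha\frac{\partial\mathcal{L}}{\partial\phi_{,\alpha}} + D_\alpha D_\beta\frac{\partial\mathcal{L}}{\partial\phi_{,\alpha\beta}} - \cdots + (-1)^k D_{\alpha_1}\cdots D_{\alpha_k}\frac{\partial\mathcal{L}}{\partial\phi_{,\alpha_1\cdots\alpha_k}}, \tag{2.71} \]

and where the general form of the total derivative operator on a function

\[ F = F(x, \phi, \partial\phi, \partial^2\phi, \ldots, \partial^k\phi), \tag{2.72} \]

is given by

\[ D_\alpha F = \frac{\partial F}{\partial x^\alpha} + \frac{\partial F}{\partial\phi}\,\phi_{,\alpha} + \frac{\partial F}{\partial\phi_{,\beta}}\,\phi_{,\alpha\beta} + \cdots + \frac{\partial F}{\partial\phi_{,\alpha_1\cdots\alpha_k}}\,\phi_{,\alpha\alpha_1\cdots\alpha_k}. \tag{2.73} \]

Here we use the comma notation for (would-be) derivatives in conformity
with standard notation in physics. Notice that the total derivative of a
function on $J^k$ is a function on $J^{k+1}$.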
From the total derivative formula, it follows that divergences have trivial
Euler-Lagrange derivatives
E(Dα V α ) = 0. (2.74)
This reflects the fact that the Euler-Lagrange derivative corresponds to the
functional derivative of the action integral in the case that the action func-
tional is differentiable. In particular, the Euler-Lagrange derivative ignores
all terms on the boundary of the domain of integration of the action integral.
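
The vanishing of the Euler-Lagrange derivative on divergences can be checked on an example. The sketch below works with one independent variable x (a simplifying assumption), implements the total derivative and the alternating sum (2.71) on jet coordinates by hand, and uses a hypothetical function V on J¹. The function names `D` and `euler_lagrange` are illustrative, not from any library.

```python
import sympy as sp

x, m = sp.symbols("x m")
# Jet coordinates phi, phi_x, phi_xx, ... treated as independent variables.
jet = sp.symbols("phi phi_x phi_xx phi_xxx phi_xxxx")

def D(F):
    """Total derivative D_x of F(x, phi, phi_x, ...): one-variable version of (2.73).

    Valid for F depending on jet coordinates up to third order.
    """
    return sp.diff(F, x) + sum(sp.diff(F, jet[i]) * jet[i + 1] for i in range(len(jet) - 1))

def euler_lagrange(L, order):
    """E(L) = sum_k (-1)^k D^k (dL/d phi_(k)) for L on J^order, cf. (2.71)."""
    result = 0
    for k in range(order + 1):
        term = sp.diff(L, jet[k])
        for _ in range(k):
            term = D(term)
        result += (-1)**k * term
    return sp.expand(result)

phi, phi_x = jet[0], jet[1]
V = x * phi**2 * phi_x + sp.sin(phi) * phi_x**2          # hypothetical function on J^1
L0 = -sp.Rational(1, 2) * (phi_x**2 + m**2 * phi**2)     # one-variable analogue of the KG density

E_divergence = euler_lagrange(D(V), order=2)             # E(D_x V)
E_original = euler_lagrange(L0, order=2)                 # phi_xx - m^2 phi
E_shifted = euler_lagrange(L0 + D(V), order=2)           # same equation as for L0
print(E_divergence, E_original, sp.expand(E_shifted - E_original))
```

The last line illustrates the practical consequence: two Lagrangians that differ by a divergence have the same Euler-Lagrange equations.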
In order to make contact between jet space and the usual calculus of
variations, one evaluates jet space formulas on a specific function ϕ = ϕ(x)
via
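\[ \phi = \phi(x), \qquad \phi_{,\alpha} = \frac{\partial\phi(x)}{\partial x^\alpha}, \qquad \phi_{,\alpha\beta} = \frac{\partial^2\phi(x)}{\partial x^\alpha\,\partial x^\beta}, \quad \ldots. \tag{2.75} \]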

In this way formulas defined as functions on jet space become formulas in-
volving only the spacetime. A good framework for doing all this is to view jet
space as a fiber bundle over spacetime. A particular KG field defines a cross
section of that fiber bundle which can be used to pull back various structures
to the base space.

Problems:

4. Consider the following Lagrangian density, viewed as a function on $J^2$:

\[ \mathcal{L} = \frac{1}{2}\phi\,(\Box - m^2)\,\phi. \tag{2.76} \]

Compute the Euler-Lagrange equation of this Lagrangian and show
that it yields the KG equation. Show that this Lagrangian density
differs from our original Lagrangian density for the KG equation by a
divergence.

5. Obtain a formula for the vector field $V^\alpha$ appearing in the boundary
term in the Euler-Lagrange identity (2.70).

2.9 Miscellaneous generalizations

There are a number of possible ways that one can generalize the KG field the-
ory. Here I briefly mention a few generalizations that often arise in physical
applications. The easiest way to describe them is in terms of modifications
of the Lagrangian density.

2.9.1 External “sources”

It is useful for some purposes to consider an inhomogeneous version of the KG
equation. This is done by adding a term to the Lagrangian representing the
interaction of the KG field with a prescribed “source”, which mathematically
is a given function σ(t, x, y, z). We have

\[ \mathcal{L} = \frac{1}{2}\bigl(\phi_{,t}^2 - (\nabla\phi)^2 - m^2\phi^2\bigr) - \sigma\phi. \tag{2.77} \]

The Euler-Lagrange (EL) equations are then

\[ 0 = E(\mathcal{L}) = (\Box - m^2)\phi - \sigma. \tag{2.78} \]
The slickest way to solve this KG equation with sources is via Green’s func-
tions.
This is a “toy model” of the Maxwell equations for the electromagnetic
field in the presence of sources (electric charges and currents). Note that we
have here an instance of a Lagrangian which, viewed as a function on jet
space, depends upon the spacetime point (t, x, y, z) via σ = σ(t, x, y, z). In
quantum field theory, the presence of a source will lead to particle produc-
tion/annihilation via a transfer of energy-momentum (to be defined soon)
between the field and the source.

2.9.2 Self-interacting field

The KG Lagrangian is quadratic in the fields and the corresponding EL
equations are linear. Physically this corresponds to a “non-interacting” field.
One way to see this is to recall the interpretation of the KG field as a
collection of (many!) oscillators. In quantum field theory, the oscillator energy
eigenstates are identified with states containing particles. In such an
interpretation the particles propagate freely. This is why the KG field we have
been studying is often called the “free” or “non-interacting” scalar field. Later
we will show how to introduce various kinds of interactions with other fields.
The corresponding quantum field theory involves interacting particles of
different types. Here we show how to introduce interactions among the KG
particles themselves. In the classical field theory this is described by a
“self-interacting” KG field, which obeys a non-linear generalization of the
KG equation.
One can modify the classical KG theory in a variety of ways to make it
non-linear, that is, to introduce “self-interactions”. The simplest way to do
this is to add a “potential term” to the Lagrangian so that we have
\[ \mathcal{L} = \frac{1}{2}\bigl(\phi_{,t}^2 - (\nabla\phi)^2 - m^2\phi^2\bigr) - V(\phi), \tag{2.79} \]
where V is a differentiable function of one variable. From the oscillator point
of view, such a term represents anharmonic contributions to the potential
energy. The EL field equations are
\[ 0 = E(\mathcal{L}) = (\Box - m^2)\phi - V'(\phi). \tag{2.80} \]

Provided V is not just a quadratic function, this field equation is non-linear.9
Physically, this corresponds to a “self-interacting” field. Of course, one can
also add a source to the self-interacting theory.
An interesting example of a scalar field potential is the “double well po-
tential”, which is given by

\[ V(\phi) = -\frac{1}{2}a^2\phi^2 + \frac{1}{4}b^2\phi^4. \tag{2.81} \]
We shall explore the physical utility of this potential a bit later.
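
To make the non-linearity concrete, the following sketch computes the EL equation for the double well potential (2.81) with SymPy's `euler_equations`, again restricting to one space dimension as a simplifying assumption. The quartic term in V shows up as a cubic term in the field equation.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t, x = sp.symbols("t x", real=True)
m, a, b = sp.symbols("m a b", positive=True)
phi = sp.Function("phi")(t, x)

# Double well potential (2.81) and the self-interacting Lagrangian density (2.79).
V = -sp.Rational(1, 2) * a**2 * phi**2 + sp.Rational(1, 4) * b**2 * phi**4
L = sp.Rational(1, 2) * (phi.diff(t)**2 - phi.diff(x)**2 - m**2 * phi**2) - V

(el_equation,) = euler_equations(L, [phi], [t, x])

# (Box - m^2) phi - V'(phi), with Box = -d^2/dt^2 + d^2/dx^2 in one space dimension.
expected = -phi.diff(t, 2) + phi.diff(x, 2) - m**2 * phi + a**2 * phi - b**2 * phi**3
residual = sp.simplify(el_equation.lhs - expected)
print(el_equation, residual)
```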

Problem:

6. Consider a self-interacting scalar field with the potential (2.81).
Characterize the set of solutions of the form ϕ = constant in terms of the
values of the parameters m, a and b.

2.9.3 KG in arbitrary coordinates

We have presented the KG equation, action, Lagrangian, etc. using what
we will call inertial Cartesian coordinates xα = (t, x, y, z) on spacetime. Of
course, we may use any coordinates we like to describe the theory. It is a
straightforward – if perhaps painful – exercise to transform the KG equa-
tion into any desired coordinate system. As a physicist you would view the
result as a new – but equivalent – representation of the KG field equation.
Note however that, mathematically speaking, when you change coordinates
you generally do get a different differential equation as the “new” KG equa-
tion. For example, in inertial Cartesian coordinates the KG equation is linear
with constant coefficients. If you adopt spherical polar coordinates for space,
thereby defining inertial spherical coordinates, then the new version of the
KG equation will still be linear but it will have variable coefficients. While
this distinction may appear to be largely a pedantic one, there are real con-
sequences to the fact that the form of the equation changes when you change
coordinate systems. We shall discuss this more later. For now, let us just
note that (like most PDEs in physics) one needs to know what coordinate
system one is going to use in order to come up with the correct version of
the KG equation.
9 Can you guess the interpretation of a quadratic potential?

It is possible to give an elegant geometric prescription for computing the
Lagrangian and field equations in any coordinate system. To do this will
require a little familiarity with the basics of tensor analysis.
Introduce the spacetime metric g, which is a symmetric tensor field of type
$\binom{0}{2}$ on Minkowski spacetime. Alternatively, you can view the metric as an
assignment of an inner product on vectors at each point of spacetime. Using this
latter point of view, if $V^\alpha = (V^t, V^x, V^y, V^z)$ and $W^\alpha = (W^t, W^x, W^y, W^z)$
are components of two vector fields $\vec{V}$ and $\vec{W}$ at a given point, their inner
product at that point is given by

\[ g(\vec{V}, \vec{W}) = g_{\alpha\beta}V^\alpha W^\beta = -V^t W^t + V^x W^x + V^y W^y + V^z W^z. \tag{2.82} \]

Here we have defined the components of the metric in the $x^\alpha = (t, x, y, z)$
coordinates:

\[ g_{\alpha\beta} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{2.83} \]
The metric has an inverse $g^{-1}$ which is a symmetric tensor of type $\binom{2}{0}$ and
which defines an inner product at each point on 1-forms (dual vector fields) at
that point. If $A_\alpha = (A_t, A_x, A_y, A_z)$ and $B_\alpha = (B_t, B_x, B_y, B_z)$ are components
of 1-forms $\tilde{A}$ and $\tilde{B}$ at a point in spacetime, their inner product is

\[ g^{-1}(\tilde{A}, \tilde{B}) = g^{\alpha\beta}A_\alpha B_\beta = -A_t B_t + A_x B_x + A_y B_y + A_z B_z, \tag{2.84} \]

where the components of the inverse metric in the inertial Cartesian coordinates
happen to be the same as the components of the metric:

\[ g^{\alpha\beta} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{2.85} \]

The matrices $g_{\alpha\beta}$ and $g^{\alpha\beta}$ are symmetric and they are each other’s inverse:

\[ g_{\alpha\beta}\,g^{\beta\gamma} = \delta_\alpha^\gamma. \tag{2.86} \]

Let me take a moment to spell out how the metric behaves under a
change of coordinates. Call the old coordinates $x^\alpha = (t, x, y, z)$. Call the
new coordinates $\hat{x}^\alpha$. Of course we will have an invertible transformation
between the two coordinate systems. With the usual abuse of notation we
will write

\[ x^\alpha = x^\alpha(\hat{x}), \qquad \hat{x}^\alpha = \hat{x}^\alpha(x). \tag{2.87} \]
The metric in the new coordinates has components
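\[ \hat{g}_{\alpha\beta}(\hat{x}) = \frac{\partial x^\gamma}{\partial\hat{x}^\alpha}\frac{\partial x^\delta}{\partial\hat{x}^\beta}\,g_{\gamma\delta}(x(\hat{x})). \tag{2.88} \]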
Note that, while the original metric components formed a diagonal array
of constants, the new metric components will, in general, form some 4 × 4
symmetric array of functions. One should now compute the inverse metric
components,
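\[ \hat{g}^{\alpha\beta}(\hat{x}) = \frac{\partial\hat{x}^\alpha}{\partial x^\gamma}\frac{\partial\hat{x}^\beta}{\partial x^\delta}\,g^{\gamma\delta}(x(\hat{x})). \tag{2.89} \]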
Equivalently, one can compute ĝ αβ (x̂) by finding the matrix inverse to ĝαβ (x̂).
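
As a concrete instance of (2.88), the sketch below computes the metric components in inertial spherical coordinates with SymPy. The coordinate names (t, r, theta, varphi) and the restriction to r > 0, 0 < θ < π are assumptions of this example.

```python
import sympy as sp

t, r, theta, varphi = sp.symbols("t r theta varphi", positive=True)
new_coords = sp.Matrix([t, r, theta, varphi])

# Old inertial Cartesian coordinates as functions of the new ones, x^alpha = x^alpha(x_hat).
old_coords = sp.Matrix([
    t,
    r * sp.sin(theta) * sp.cos(varphi),
    r * sp.sin(theta) * sp.sin(varphi),
    r * sp.cos(theta),
])

g = sp.diag(-1, 1, 1, 1)                        # metric components (2.83)
J = old_coords.jacobian(new_coords)             # J[gamma, alpha] = d x^gamma / d x_hat^alpha
g_hat = (J.T * g * J).applyfunc(sp.simplify)    # transformation law (2.88)
g_hat_inv = g_hat.inv().applyfunc(sp.simplify)  # inverse metric via matrix inversion

print(g_hat)
print(g_hat_inv)
```

The result is the familiar diagonal array with entries −1, 1, r², r² sin²θ, which are no longer constants.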
Although I won’t prove it here, it is a fundamental result from the calculus
of variations that the formula (2.63) for computing the EL equations does not
depend upon the choice of coordinates. In particular, it is possible to show
that, after a coordinate transformation, the EL equations of the transformed
Lagrangian density are the transformed EL equations. (This is not obvious,
but must be proved!) Consequently, the easiest way to find the transformed
field equation is to transform the Lagrangian and then compute the field
equations.
The KG Lagrangian density can be expressed in terms of the metric by
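\[
\begin{aligned}
\mathcal{L} &= -\frac{1}{2}\sqrt{-\det(g)}\,\bigl(g^{-1}(d\phi, d\phi) + m^2\phi^2\bigr) \\
&= -\frac{1}{2}\sqrt{-\det(g)}\,\bigl(g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi + m^2\phi^2\bigr).
\end{aligned}
\tag{2.90}
\]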
There are two ingredients to this formula which should be explained. First,
recall that if ϕ is a function on spacetime then dϕ – the differential of ϕ –
is a 1-form with components $\phi_{,\alpha}$. We have made a scalar from dϕ using the
inner product defined by the inverse metric. Second, we have introduced an
overall factor of $\sqrt{-\det(g)}$. You can easily check that this factor is unity
when inertial Cartesian coordinates are used. Under a change of coordinates
the determinant changes as

\[ x^\alpha \to \hat{x}^\alpha, \qquad \det(\hat{g}) = \det\Bigl(\frac{\partial x}{\partial\hat{x}}\Bigr)^{2}\det(g). \tag{2.91} \]

This compensates the change in the coordinate volume element in the action
integral:

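\[ d^4\hat{x}\,\sqrt{-\det(\hat{g})} = \Bigl|\det\Bigl(\frac{\partial\hat{x}}{\partial x}\Bigr)\Bigr|\,d^4x\,\Bigl|\det\Bigl(\frac{\partial x}{\partial\hat{x}}\Bigr)\Bigr|\,\sqrt{-\det(g)} = d^4x\,\sqrt{-\det(g)}, \tag{2.92} \]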
so that the same formula (2.90) can be used in any coordinates provided you
use the metric appropriate to that coordinate system. Notice that while the
Lagrangian does not explicitly depend upon the coordinates when using an
inertial Cartesian coordinate system it may depend upon the coordinates
in general. Indeed, just switching (x, y, z) to spherical polar coordinates will
introduce explicit coordinate dependence in the Lagrangian, as you can easily
verify.
It is now a straightforward exercise to show that the EL equations of the
KG Lagrangian (2.90) take the form:
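\[ \partial_\alpha\Bigl(\sqrt{-\det(g)}\,g^{\alpha\beta}\,\partial_\beta\phi(x)\Bigr) - m^2\sqrt{-\det(g)}\,\phi(x) = 0. \tag{2.93} \]

Under a change of coordinates $x^\alpha \to \hat{x}^\alpha$ the transformed Euler-Lagrange
equations can be computed using (2.93) provided the metric $\hat{g}_{\alpha\beta}$ appropriate
to the $\hat{x}^\alpha$ coordinate system is used.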
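
Here is a sketch of (2.93) in action, using inertial spherical coordinates and the metric components diag(−1, 1, r², r² sin²θ). The coordinate names and the range 0 < θ < π (so that sin θ > 0) are assumptions of this example. The result is compared with the standard spherical form of the wave operator.

```python
import sympy as sp

t, r, theta, varphi = sp.symbols("t r theta varphi", positive=True)
m = sp.symbols("m", positive=True)
coords = (t, r, theta, varphi)
phi = sp.Function("phi")(*coords)

g = sp.diag(-1, 1, r**2, r**2 * sp.sin(theta)**2)   # metric in spherical coordinates
g_inv = g.inv()
sqrt_g = r**2 * sp.sin(theta)                        # sqrt(-det g), assuming 0 < theta < pi
det_check = sp.simplify(sqrt_g**2 + g.det())

# Left-hand side of (2.93): d_alpha( sqrt(-det g) g^{alpha beta} d_beta phi ) - m^2 sqrt(-det g) phi.
lhs = sum(
    sp.diff(sqrt_g * g_inv[al, be] * sp.diff(phi, coords[be]), coords[al])
    for al in range(4)
    for be in range(4)
) - m**2 * sqrt_g * phi

# Standard spherical-coordinate form of (Box - m^2) phi.
expected = (
    -phi.diff(t, 2)
    + sp.diff(r**2 * phi.diff(r), r) / r**2
    + sp.diff(sp.sin(theta) * phi.diff(theta), theta) / (r**2 * sp.sin(theta))
    + phi.diff(varphi, 2) / (r**2 * sp.sin(theta)**2)
    - m**2 * phi
)
residual = sp.simplify(lhs / sqrt_g - expected)
print(residual)
```

The equation is still linear, but its coefficients now depend on r and θ.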

Problem:

7. Using (2.90), calculate the KG Lagrangian density in inertial cylindrical
coordinates. Compute the Euler-Lagrange equations for this Lagrangian
and verify they are equivalent to the Euler-Lagrange equations
in inertial Cartesian coordinates.

At this point I want to try to nip some possible confusion in the bud.
While we have a geometric prescription for computing the KG Lagrangian in
any coordinates using the metric and so forth, it is not a good idea to think
that there is but one KG Lagrangian for all coordinate systems. Strictly
speaking, different coordinate systems will, in general, lead to different La-
grangians. This comment is supposed to be completely analogous to the
previously mentioned fact that, while we can compute “the KG equation”
in any coordinate system, each coordinate system leads, in general, to a
different PDE. Likewise, we have different functions $\mathcal{L}(x, \phi, \partial\phi)$ in different
coordinates. For example, in Cartesian coordinates $\mathcal{L}$ is in fact independent
of $x^\alpha$, which need not be true in other coordinates, e.g., spherical polar
coordinates.
Coordinates are convenient for computations, but they are more or less
arbitrary, so it should be possible – and is usually advantageous – to have
a formulation of the KG theory which is manifestly coordinate-free. Let me
just sketch this so you can get a flavor of how it goes.
We consider $\mathbb{R}^4$ equipped with a flat metric g of Lorentz signature.10 We
introduce a scalar field $\phi : \mathbb{R}^4 \to \mathbb{R}$ and a parameter m. Let $\epsilon(g)$ be the
volume 4-form defined by the metric. We define the KG Lagrangian as a
4-form via

\[ L = -\frac{1}{2}\bigl(g^{-1}(d\phi, d\phi) + m^2\phi^2\bigr)\,\epsilon(g). \tag{2.94} \]
The Lagrangian is to be viewed as a function on the jet space J 1 of the KG
field. The metric must be specified to construct this function. Our discussion
concerning the fact that different coordinates imply different Lagrangians
(and EL equations) can be stated in coordinate free language as follows.
Consider a diffeomorphism
f : R4 → R4 . (2.95)
The diffeomorphism defines a new metric ĝ by pull-back:
ĝ = f ∗ g. (2.96)
We can define a new Lagrangian using this new metric.
\[ \hat{L} = -\frac{1}{2}\bigl(\hat{g}^{-1}(d\phi, d\phi) + m^2\phi^2\bigr)\,\epsilon(\hat{g}). \tag{2.97} \]
This metric is flat and can equally well be used to build the KG theory. The
EL equations arising from L̂ or L are the KG equations defined using ĝ or g,
respectively. The relation between the solution spaces of these two equations
is that there is a bijection between the spaces of smooth solutions to each
equation. The bijection between solutions ϕ to the EL equations of L and
the solutions ϕ̂ to the EL equations of L̂ is simply
ϕ̂ = f ∗ ϕ. (2.98)
On the other hand, inasmuch as $\hat{g} \neq g$, the two Lagrangians and corresponding
field equations are different as functions on jet space, strictly speaking.
10 This means that the eigenvalues of the metric component matrix have the signs
(−+++). If this is true in one coordinate system it will be true in any coordinate system.

2.9.4 KG on any spacetime

A spacetime is defined as a pair (M, g) where M is a manifold and g is
a Lorentzian metric.11 We will always assume that everything in sight is
smooth, unless otherwise indicated. Physically, we should take M to be
four-dimensional, but this is not required mathematically and we shall not
do it here. It is possible to generalize the KG field theory to any spacetime,
but the generalization is not unique. Here I just mention two of the most
popular possibilities.
First, we have the minimally coupled KG theory, defined by the La-
grangian
\[ L_1 = -\frac{1}{2}\bigl(g^{-1}(d\phi, d\phi) + m^2\phi^2\bigr)\,\epsilon(g). \tag{2.99} \]
Of course, this is formally the same as our flat spacetime Lagrangian. The
term “minimal coupling” has a precise technical definition, but we will not
bother to discuss it. It amounts to making the most straightforward gener-
alization from flat spacetime to curved spacetime as you can see here.
A second possibility is the curvature-coupled KG theory, defined by the
Lagrangian
\[ L_2 = -\frac{1}{2}\bigl(g^{-1}(d\phi, d\phi) + (m^2 + \xi R(g))\,\phi^2\bigr)\,\epsilon(g), \tag{2.100} \]
where R(g) is the scalar curvature of the metric and ξ is a parameter. The
resulting theory is usually described with the terminology “non-minimally
coupled”.
If the spacetime is (R4 , g), with g a flat metric, then both of these La-
grangians reduce to one of the possible Lagrangians in flat spacetime that
we discussed in the previous subsection. So, both (2.99) and (2.100) can be
considered possible generalizations to curved spacetimes.
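
For orientation, here is a sketch of the field equations these two Lagrangians lead to, written in a local coordinate chart. It assumes the same steps that produce (2.93) in flat spacetime, with the metric held fixed (not varied), so that R(g) simply acts as a prescribed function multiplying ϕ².

```latex
\begin{aligned}
E(L_1) = 0 &: \quad
  \partial_\alpha\!\Bigl(\sqrt{-\det(g)}\,g^{\alpha\beta}\,\partial_\beta\phi\Bigr)
  - m^2\sqrt{-\det(g)}\,\phi = 0, \\
E(L_2) = 0 &: \quad
  \partial_\alpha\!\Bigl(\sqrt{-\det(g)}\,g^{\alpha\beta}\,\partial_\beta\phi\Bigr)
  - \bigl(m^2 + \xi R(g)\bigr)\sqrt{-\det(g)}\,\phi = 0.
\end{aligned}
```

When R(g) = 0 the second equation reduces to the first, which is consistent with both Lagrangians agreeing in flat spacetime.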
11 Also called a “Lorentzian manifold”, this means that at each point x ∈ M , there
exists a basis $e_\alpha$ for $T_xM$ such that $g(e_\alpha, e_\beta) = \mathrm{diag}(-1, 1, 1, 1, \cdots, 1)$. By contrast, a
Riemannian manifold uses a positive-definite metric so that the components of the metric
at a point can be rendered as diag(1, 1, 1, 1, · · · , 1).

Finally, I emphasize that all of these Lagrangians require the specification
of a metric for their definition. If you change the metric then, strictly
speaking, you are using a different Lagrangian (viewed, say, as a function on
jet space). This is why, in a precise technical sense, one does not use the
adjectives “generally covariant” or “diffeomorphism invariant” to describe the
KG field theories introduced above. If these Lagrangians had this property,
then under a redefinition of the KG field via a diffeomorphism f : M → M ,
ϕ → f ∗ ϕ, (2.101)
the Lagrangian should not change. Of course, in order for the Lagrangian to
stay unchanged (say, as a function on jet space) one must also redefine the
metric by the diffeomorphism,
g → f ∗ g. (2.102)
But, as we already agreed, the Lagrangian changes when you use a different
metric. The point is that the metric is not one of the fields in the KG field
theory and you have no business treating it as such. Now, if the metric itself
is treated as a field (not some background structure), subject to variation,
EL equations, etc., then the Lagrangians we have written are generally co-
variant. Of course, we no longer are studying the KG theory, but something
much more complex, e.g., there are now 11 coupled non-linear field equations
instead of 1 linear field equation. We will return to this issue again when we
discuss what is meant by a symmetry.

2.10 PROBLEMS

1. Verify (2.4)–(2.9).

2. Compute the Euler-Lagrange derivative of the KG Lagrangian density

(2.50) and explicitly verify that the Euler-Lagrange equation is indeed
the KG equation.

3. Consider a Lagrangian density that is a divergence:

L = Dα W α , (2.103)
where
W α = W α (ϕ). (2.104)
Show that
E(L) ≡ 0. (2.105)

4. Consider the Lagrangian density, viewed as a function on $J^2$:

$$ L = \frac{1}{2}\varphi(\Box - m^2)\varphi. \tag{2.106} $$
Compute the Euler-Lagrange equation of this Lagrangian density and
show that it yields the KG equation. Show that this Lagrangian density
differs from our original Lagrangian density for the KG equation by a
divergence.

5. Obtain a formula for the vector field V α appearing in the boundary

term in the Euler-Lagrange identity (2.70).

6. Consider a self-interacting scalar field with the potential (2.81). Char-

acterize the set of solutions of the form ϕ = constant in terms of the
values of the parameters m, a and b.

7. Using (2.90), calculate the KG Lagrangian density in inertial cylin-

drical coordinates. Compute the Euler-Lagrange equations for this La-
grangian and verify they are equivalent to the Euler-Lagrange equations
in inertial Cartesian coordinates.
Chapter 3

Symmetries and conservation laws

In physics, conservation laws are of undisputed importance. They are the
foundation for every fundamental theory of nature. They also provide valuable
physical information about the complicated behavior of non-linear dynamical
systems. From the mathematical point of view, when analyzing the
mathematical structure of differential equations and their solutions, the
existence of conservation laws (and their attendant symmetries via Noether’s
theorem) is also very important. We will now spend some time studying
conservation laws for the KG equation. Later we will introduce the notion
of symmetry and then describe a version of the famous Noether theorem
relating symmetries and conservation laws. As usual, we begin by defining
everything in terms of the example at hand: the KG field theory. It will then
not be hard to see how the idea of conservation laws works in general.

3.1 Conserved currents

We say that a vector field on spacetime, constructed as a local function,

$$ j^\alpha = j^\alpha(x, \varphi, \partial\varphi, \dots, \partial^k\varphi), \tag{3.1} $$

is a conserved current or defines a conservation law if the divergence of
$j^\alpha$ vanishes whenever ϕ satisfies its field equations (the KG equation).
We write

$$ D_\alpha j^\alpha = 0, \quad \text{when} \quad (\Box - m^2)\varphi = 0. \tag{3.2} $$


It is understood that the current is divergence-free provided all relations
between derivatives defined by the field equations, and all the subsequent
relations which can be obtained by differentiating the field equations to any
order, are imposed. Note that we are using the total derivative notation, which
is handy when viewing $j^\alpha$ as a function on jet space, that is, as being
built via some function of finitely many variables. The idea of the
conservation law is that it provides a formula for taking any given solution
of the field equations

$$ \varphi = \varphi(x), \quad (\Box - m^2)\varphi(x) = 0, \tag{3.3} $$

and then building a vector field on spacetime (also called $j^\alpha$ by a
standard abuse of notation)

$$ j^\alpha(x) := j^\alpha\left(x, \varphi(x), \frac{\partial\varphi(x)}{\partial x}, \dots, \frac{\partial^k\varphi(x)}{\partial x^k}\right) \tag{3.4} $$

such that

$$ \frac{\partial}{\partial x^\alpha} j^\alpha(x) = 0. \tag{3.5} $$
You can easily see in inertial Cartesian coordinates that our definition of
a conserved current $j^\alpha = (j^0, j^1, j^2, j^3)$ simply says that the field
equations imply a continuity equation for the density $\rho \equiv j^0$ (a
function on spacetime) and the current density $\vec j = (j^1, j^2, j^3)$ (a
time dependent vector field on space) associated with any solution ϕ(x) of the
field equations:

$$ \frac{\partial\rho}{\partial t} + \nabla\cdot\vec j = 0, \tag{3.6} $$

where

$$ \rho(x) = j^0\left(x, \varphi(x), \frac{\partial\varphi(x)}{\partial x}, \dots, \frac{\partial^k\varphi(x)}{\partial x^k}\right), \tag{3.7} $$

and

$$ (\vec j(x))^i = j^i\left(x, \varphi(x), \frac{\partial\varphi(x)}{\partial x}, \dots, \frac{\partial^k\varphi(x)}{\partial x^k}\right), \quad i = 1, 2, 3. \tag{3.8} $$
The utility of the continuity equation is as follows. Define the total charge
contained in the region V of space at a given time t to be

$$ Q_V(t) := \int_V d^3x\, \rho(t, \vec x). \tag{3.9} $$

Note that the total charge is a functional of the field, that is, its value
depends upon which field you choose. The integral over V of the continuity
equation (3.6) implies that

$$ \frac{d}{dt} Q_V(t) = -\int_{\partial V} \vec j\cdot d\vec S. \tag{3.10} $$

Here $\partial V$ is the boundary of V and $d\vec S$ is its oriented surface
element according to the divergence theorem. Keep in mind that this relation
is meant to be valid when the field is a solution to the field equation.

Problem:

1. Derive (3.10) from the continuity equation.

We call the right hand side of (3.10) the net flux into V . We say the charge QV
is conserved since we can account for its time rate of change purely in terms
of the flux into or out of the region V . In this sense there is no “creation”
or “destruction” of the charge, although the charge can move from place to
place.
With suitable boundary conditions, one can choose V such that charge
cannot enter or leave the region and so the total charge is constant in time.
In this case we speak of a constant of the motion. For example, we have
seen that a reasonable set of boundary conditions to put on the KG field
(motivated, say, by the variational principle) is to assume that the KG field
vanishes at spatial infinity. Let us then consider the region V to be all of
space, that is, $V = \mathbb{R}^3$. If the fields vanish at spatial infinity
fast enough,[1] the flux will vanish asymptotically and we will have

$$ \frac{dQ_V}{dt} = 0. \tag{3.11} $$

[1] Since the area element of the sphere of radius r grows like $r^2$, the
fields should fall off at infinity such that $\hat n\cdot\vec j$ falls off
faster than $1/r^2$ as $r \to \infty$.

3.2 Conservation of energy

Let us look at an example of a conservation law for the KG equation. Consider
the following spacetime vector field locally built from the KG field and its
first derivatives (we give its components in an inertial Cartesian reference
frame):

$$ j^0 = \frac{1}{2}\left( \varphi_{,t}^2 + (\nabla\varphi)^2 + m^2\varphi^2 \right), \tag{3.12} $$

$$ j^i = -\varphi_{,t}\,(\nabla\varphi)^i. \tag{3.13} $$
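Let us see how this defines a conserved current. We compute[2]

$$ \partial_0 j^0 = \varphi_{,t}\varphi_{,tt} + \nabla\varphi\cdot\nabla\varphi_{,t} + m^2\varphi\varphi_{,t}, \tag{3.14} $$

and

$$ \partial_i j^i = -(\nabla\varphi_{,t})\cdot(\nabla\varphi) - \varphi_{,t}\nabla^2\varphi. \tag{3.15} $$

All together, we get

$$ \partial_\alpha j^\alpha = \varphi_{,t}\left( \varphi_{,tt} - \nabla^2\varphi + m^2\varphi \right) = -\varphi_{,t}\left( \Box\varphi - m^2\varphi \right). \tag{3.16} $$

It follows from the identity (3.16) that if ϕ(x) is a solution to the
Klein-Gordon equation then the resulting vector field $j^\alpha(x)$ defined in
(3.12), (3.13) will satisfy $\partial_\alpha j^\alpha = 0$.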
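The identity (3.16) can be spot-checked numerically. The following is a minimal sketch, not part of the original notes: it works in one space dimension, takes a plane wave $\varphi = \cos(kx - \omega t)$ as the test field, and approximates all derivatives by central finite differences. The names `make_divergence`, `on_shell`, `off_shell` and the parameter values are illustrative assumptions.

```python
import math

M = 1.0   # mass parameter m (illustrative value)
K = 0.7   # wave number k (illustrative value)

def make_divergence(omega, h=1e-4):
    """Return a function (t, x) -> d_t j^0 + d_x j^1 for phi = cos(K x - omega t)."""

    def phi(t, x):
        return math.cos(K * x - omega * t)

    def d(f, t, x, axis):
        # Central finite difference along t (axis 0) or x (axis 1).
        if axis == 0:
            return (f(t + h, x) - f(t - h, x)) / (2 * h)
        return (f(t, x + h) - f(t, x - h)) / (2 * h)

    def j0(t, x):
        # Energy density (3.12), restricted to one space dimension.
        phi_t, phi_x = d(phi, t, x, 0), d(phi, t, x, 1)
        return 0.5 * (phi_t**2 + phi_x**2 + M**2 * phi(t, x)**2)

    def j1(t, x):
        # Energy current density (3.13).
        return -d(phi, t, x, 0) * d(phi, t, x, 1)

    return lambda t, x: d(j0, t, x, 0) + d(j1, t, x, 1)

# omega^2 = k^2 + m^2 makes the plane wave a solution of the KG equation.
on_shell = make_divergence(math.sqrt(K**2 + M**2))
# A frequency violating the dispersion relation gives a non-solution.
off_shell = make_divergence(2.0)

point = (0.0, math.pi / (4 * K))
print(on_shell(*point), off_shell(*point))
```

For the solution the divergence vanishes up to discretization error, while for the non-solution it equals $\varphi_{,t}(\varphi_{,tt} - \varphi_{,xx} + m^2\varphi)$, which is not zero at the chosen point.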
The conserved charge $Q_V$ associated with this conservation law is called
the energy of the KG field in the region V and is denoted by $E_V$:

$$ E_V = \int_V d^3x\, \frac{1}{2}\left( \varphi_{,t}^2 + (\nabla\varphi)^2 + m^2\varphi^2 \right). \tag{3.17} $$
There are various reasons why we view this as an energy. First of all, if you
put in physically appropriate units, you will find that EV has the dimensions
of energy. The best reason comes from Noether’s theorem, which we shall
discuss later. For now, let us recall that the Lagrangian has the form

L = T − U, (3.18)

where the “kinetic energy” is given by

$$ T = \int_V d^3x\, \frac{1}{2}\varphi_{,t}^2, \tag{3.19} $$

and the “potential energy” is given by

$$ U = \int_V d^3x\, \frac{1}{2}\left( (\nabla\varphi)^2 + m^2\varphi^2 \right). \tag{3.20} $$

Naturally, then, the conserved charge that arises as

$$ E_V = T + U \tag{3.21} $$

is called the total energy (in the region V).

[2] Here I have moved away from the total derivative notation back to the
physicist’s notation in which one imagines evaluating all the relevant
formulas on a given field ϕ = ϕ(x).

The net flux of energy into V is given by

$$ -\int_{\partial V} \vec j\cdot d\vec S = \int_{\partial V} \varphi_{,t}\,\nabla\varphi\cdot d\vec S. \tag{3.22} $$

Evidently, if the solution to the KG equation is chosen to be static,
$\varphi_{,t} = 0$, or such that the component of its spatial gradient along the
normal to the boundary vanishes, then the flux into the volume vanishes and
the energy is a constant of the motion. If we choose $V = \mathbb{R}^3$, then
the total energy of the KG field (in the whole universe) is independent of
time if the product of the time rate of change of ϕ and the radial derivative
of ϕ vanishes as $r \to \infty$ faster than $1/r^2$. Of course, for the total
energy to be defined in this case the integral of the energy density $j^0(x)$
must exist and this also imposes asymptotic decay conditions on the solutions
ϕ(x) to the KG equation. Indeed, we must have that ϕ(x), its time derivative,
and the magnitude of its spatial gradient should decay “at infinity” faster
than $1/r^{3/2}$. This will guarantee that the net flux into $\mathbb{R}^3$
vanishes.
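To see a constant of the motion concretely, here is a small numerical sketch. It is an illustration under assumptions that differ from the text: instead of fields decaying at infinity it uses one space dimension with a periodic box, so that the flux through the two ends cancels. The field is a superposition of plane-wave solutions; the names `MODES`, `field_data`, `total_energy` and all parameter values are hypothetical.

```python
import math

M = 1.0                                   # mass parameter m (illustrative value)
BOX = 2 * math.pi                         # length of the periodic box (an assumption)
MODES = [(1, 0.8), (-2, 0.5), (3, 0.3)]   # (integer mode number n, amplitude)

def field_data(t, x):
    """Return (phi, phi_t, phi_x) for a sum of plane-wave KG solutions."""
    phi = phi_t = phi_x = 0.0
    for n, amp in MODES:
        k = 2 * math.pi * n / BOX
        omega = math.sqrt(k * k + M * M)  # dispersion relation of the KG equation
        theta = k * x - omega * t
        phi += amp * math.cos(theta)
        phi_t += amp * omega * math.sin(theta)
        phi_x += -amp * k * math.sin(theta)
    return phi, phi_t, phi_x

def total_energy(t, samples=400):
    """Midpoint-rule approximation of (3.17) over one period of the box."""
    dx = BOX / samples
    total = 0.0
    for i in range(samples):
        phi, phi_t, phi_x = field_data(t, (i + 0.5) * dx)
        total += 0.5 * (phi_t**2 + phi_x**2 + M**2 * phi**2) * dx
    return total

for t in (0.0, 1.0, 5.3):
    print(t, total_energy(t))
```

The printed energies agree to rounding error even though the field itself changes with time. For this particular superposition each mode contributes $\frac{1}{2}\,(\text{box length})\,A^2\omega^2$ to the total.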

Problem:
2. What becomes of conservation of energy for the KG field when an
external source is present, as in (2.77)? How do you physically interpret
this state of affairs?

3.3 Conservation of momentum

Let us look at another conservation law for the KG equation known as the
conservation of momentum. This arises as a triplet of conservation laws in
which

$$ \rho_{(i)} = \varphi_{,t}\varphi_{,i}, \quad i = 1, 2, 3, \tag{3.23} $$

$$ (\vec j_{(i)})^l = -(\nabla\varphi)^l \varphi_{,i} + \frac{1}{2}\delta_i^l \left( (\nabla\varphi)^2 - (\varphi_{,t})^2 + m^2\varphi^2 \right). \tag{3.24} $$

Problem:
3. Verify that the currents (3.23), (3.24) are conserved. (If you like, you
can just fix a value for i, say, i = 1 and check that $j^\alpha_{(1)}$ is conserved.)

The origin of the name “momentum” of these conservation laws can be
understood on the basis of units: the conserved charges

$$ P_{(i)} = \int_V d^3x\, \varphi_{,t}\varphi_{,i}, \tag{3.25} $$

have the dimensions of momentum (if one takes account of the various
dimensionful constants that we have set to unity). The name can also be
understood from the fact that each of the three charge densities $\rho_{(i)}$
corresponds (up to a sign) to a component of the current density for the
energy conservation law. Roughly speaking, you can think of this quantity as
getting the name “momentum” since it defines the “flow of energy”. In a little
while we will get an even better explanation from Noether’s theorem. Finally,
recall that the total momentum of a system is represented as a vector in
$\mathbb{R}^3$. The Cartesian components of this vector in the case of a KG
field are the $P_{(i)}$.

3.4 Energy-momentum tensor

The conservation of energy and conservation of momentum can be given a
unified treatment by introducing a $\binom{0}{2}$ tensor field on spacetime
known as the energy-momentum tensor (also known as the
“stress-energy-momentum tensor”, the “stress-energy tensor”, and the “stress
tensor”). Given a function on spacetime $\varphi : \mathbb{R}^4 \to \mathbb{R}$
(not necessarily satisfying any field equations), the energy-momentum tensor
is defined as

$$ T = d\varphi\otimes d\varphi - \frac{1}{2}\left( g^{-1}(d\varphi, d\varphi) + m^2\varphi^2 \right) g, \tag{3.26} $$

where g is the metric tensor of spacetime. The components of the
energy-momentum tensor take the form

$$ T_{\alpha\beta} = \varphi_{,\alpha}\varphi_{,\beta} - \frac{1}{2} g_{\alpha\beta}\, g^{\gamma\delta}\varphi_{,\gamma}\varphi_{,\delta} - \frac{1}{2} m^2\varphi^2 g_{\alpha\beta}. \tag{3.27} $$

Our conservation laws were defined for the KG field on flat spacetime and
the formulas were given in inertial Cartesian coordinates xα = (t, xi ) such
that the metric takes the form

g = gαβ dxα ⊗ dxβ , (3.28)

with
gαβ = diag(−1, 1, 1, 1). (3.29)
The formulas (3.26) or (3.27) are in fact correct on any spacetime provided
the metric in the chosen coordinates is specified. Note that the energy-
momentum tensor is symmetric:

Tαβ = Tβα . (3.30)

If desired, one can view the formula for T as defining a collection of functions
on jet space representing a formula for a tensor field on spacetime. More
precisely, we can view T as a mapping from $J^1$ into the $\binom{0}{2}$ tensor
fields on spacetime.
Using inertial coordinates $x^\alpha = (t, \vec x)$ on flat spacetime you can
check that the conserved energy current has components given by

$$ j^\alpha_{\rm energy} = -T_t^\alpha \equiv -g^{\alpha\beta}T_{t\beta}. \tag{3.31} $$

In particular the energy density is $T^{tt}$. Likewise, the components of the
conserved momentum currents are given by

$$ j^\alpha_{\rm momentum} = -T_i^\alpha \equiv -g^{\alpha\beta}T_{i\beta}, \quad i = 1, 2, 3, \tag{3.32} $$

so that, in particular, the momentum density in the direction labeled i is
given by $-T^{ti}$. The conservation of energy and momentum are encoded in
the important identity:

$$ g^{\beta\gamma}D_\gamma T_{\alpha\beta} = \varphi_{,\alpha}(\Box - m^2)\varphi, \tag{3.33} $$

where I remind you that we have defined

$$ \Box\varphi = g^{\alpha\beta}\varphi_{,\alpha\beta}. \tag{3.34} $$

This relation shows that when evaluated on a function satisfying the KG
equation the resulting energy-momentum tensor field on spacetime has
vanishing divergence.
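The identity (3.33) holds for any function ϕ, not just for solutions, and this can be checked numerically. The sketch below is an illustration only: it works in 1+1 dimensions with metric diag(−1, 1), uses the hypothetical test field $\varphi = \sin(At)\cos(Bx)$ (which is not a KG solution for the values chosen), takes first and second derivatives of ϕ analytically, and differentiates $T_{\alpha\beta}$ by central finite differences. All function names and parameter values are assumptions.

```python
import math

A, B, M = 1.3, 0.4, 1.0   # test-field parameters and mass (illustrative values)
G = (-1.0, 1.0)           # diagonal of g_{alpha beta} = g^{alpha beta} in 1+1 dimensions

def phi(t, x):
    return math.sin(A * t) * math.cos(B * x)

def dphi(t, x):
    """Analytic first derivatives (phi_t, phi_x)."""
    return (A * math.cos(A * t) * math.cos(B * x),
            -B * math.sin(A * t) * math.sin(B * x))

def box_phi(t, x):
    """Box phi = -phi_tt + phi_xx, computed analytically for this test field."""
    return (A * A - B * B) * phi(t, x)

def T(t, x, a, b):
    """Component T_{ab} of the energy-momentum tensor, as in (3.27)."""
    p = dphi(t, x)
    scalar = sum(G[c] * p[c] * p[c] for c in range(2)) + M * M * phi(t, x) ** 2
    metric = G[a] if a == b else 0.0
    return p[a] * p[b] - 0.5 * metric * scalar

def div_T(t, x, a, h=1e-5):
    """Left side of (3.33): g^{bc} d_c T_{ab}, by central differences."""
    total = 0.0
    for c in range(2):
        dt, dx = (h, 0.0) if c == 0 else (0.0, h)
        dT = (T(t + dt, x + dx, a, c) - T(t - dt, x - dx, a, c)) / (2 * h)
        total += G[c] * dT
    return total

def rhs(t, x, a):
    """Right side of (3.33): phi_{,a} (Box - m^2) phi."""
    return dphi(t, x)[a] * (box_phi(t, x) - M * M * phi(t, x))

for a in range(2):
    print(a, div_T(0.4, -1.1, a), rhs(0.4, -1.1, a))
```

The two columns agree up to discretization error for both values of the free index, even though the right-hand side is not zero for this field.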
Although we are not emphasizing relativistic considerations in our
discussions, it is perhaps worth mentioning that there is no absolute
distinction between energy and momentum. A change of reference frame will mix
up these quantities. One therefore usually speaks of the “conservation of
energy-momentum”, or the “conservation of four-momentum”, represented by the
currents $j_{(\alpha)}$, α = 0, 1, 2, 3, with components given by

$$ j^\beta_{(\alpha)} = -T_\alpha^\beta = -g^{\beta\gamma}T_{\alpha\gamma}. \tag{3.35} $$

3.5 Conservation of angular momentum

Finally, let me introduce 6 more conservation laws, known as the conservation
laws of relativistic angular momentum. They are given by 6 currents
$M^{(\mu)(\nu)}$ with components:

$$ M^{\alpha(\mu)(\nu)} = T^{\alpha\mu}x^\nu - T^{\alpha\nu}x^\mu. \tag{3.36} $$

Note that

$$ M^{\alpha(\mu)(\nu)} = -M^{\alpha(\nu)(\mu)}, \tag{3.37} $$

which is why there are only 6 independent currents.

Problem:
4. Show that the six currents (3.36) are conserved. (Hint: Don’t panic!
This is actually the easiest one to prove so far, since you can use

$$ g^{\beta\gamma}D_\gamma T_{\alpha\beta} = \varphi_{,\alpha}(\Box - m^2)\varphi, \tag{3.38} $$

which we have already established.)

In a given inertial reference frame labeled by coordinates
$x^\alpha = (t, x^i) = (t, x, y, z)$ the relativistic angular momentum naturally
decomposes into two pieces in which (µ, ν) take the values (i, j) and (0, i).
Let us look at the charge densities; we have

$$ \rho^{(i)(j)} := M^{0(i)(j)} = T^{0i}x^j - T^{0j}x^i, \tag{3.39} $$

$$ \rho^{(0)(i)} := M^{0(0)(i)} = T^{00}x^i - T^{0i}t. \tag{3.40} $$

The first charge density represents the usual notion of (density of) angular
momentum. Indeed, you can see that it has the familiar position × momentum
form. The second charge density, $\rho^{(0)(i)}$, when integrated over
a region V yields a conserved charge which can be interpreted, roughly, as
the “center of mass-energy at t = 0” in that region. Just as energy and
momentum are two facets of a single, relativistic energy-momentum, you can
think of these two conserved quantities as forming a single relativistic form
of angular momentum.
Let us note that while the energy-momentum conserved currents are (in
Cartesian coordinates) local functions of the fields and their first derivatives,
the angular momentum conserved currents are also explicit functions of the
spacetime coordinates. Thus we see that conservation laws may be, in gen-
eral, functions on the full jet space (x, ϕ, ∂ϕ, ∂ 2 ϕ, . . .).

3.6 Variational symmetries

Let us now (apparently) change the subject to consider the notion of sym-
metry in the context of the KG theory. We shall see that this is not really a
change in subject when we come to Noether’s theorem relating symmetries
and conservation laws.
A slogan for the definition of symmetry in the style of the late John
Wheeler might be something like: “change without change”. When we speak
of an object admitting a “symmetry”, we usually have in mind some kind of
transformation of that object that leaves some specified aspects of that object
unchanged. We can partition transformations into two types: discrete and
continuous. The continuous transformations depend continuously on one or
more parameters. For example, the group of rotations of R3 about the z-axis
defines a continuous transformation parametrized by the angle of rotation.
The “inversion” transformation
(x, y, z) → (−x, −y, −z) (3.41)
is an example of a discrete transformation. We will be focusing primarily on
continuous transformations in what follows.
For a field theory such as the KG theory, let us define a one parameter
family of transformations – also called a continuous transformation – to be a
rule for building from any given field ϕ a family of KG fields (not necessarily
satisfying any field equations), denoted by ϕλ . We always assume that the
transformation starts at λ = 0 in the sense that
ϕλ=0 = ϕ. (3.42)

You are familiar with such “curves in field space” from our discussion of
the variational calculus. There we considered a single (though arbitrary)
curve through a critical point. Here we make a stronger assumption and
assume that the transformation defines a curve through each point in the
space of fields via a formula which is a local function of the field and its
derivatives.[3] We view this family of curves as defining a transformation of
any field ϕ, where the transformation varies continuously with the parameter
λ, and such that λ = 0 is the identity transformation.
A simple example of a one parameter family of transformations is the
scaling transformation:

$$ \varphi(t, x, y, z) \longrightarrow \varphi_\lambda(t, x, y, z) := e^\lambda\varphi(t, x, y, z). \tag{3.43} $$

As another example, the following is the transformation induced on ϕ due to
a time translation:

$$ \varphi(t, x, y, z) \longrightarrow \varphi_\lambda(t, x, y, z) := \varphi(t + \lambda, x, y, z). \tag{3.44} $$

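We say that a continuous transformation is a continuous variational
symmetry for the KG theory if it leaves the Lagrangian invariant in the sense
that, for any KG field ϕ = ϕ(x),

$$ L\left(x, \varphi_\lambda(x), \frac{\partial\varphi_\lambda(x)}{\partial x}\right) = L\left(x, \varphi(x), \frac{\partial\varphi(x)}{\partial x}\right). \tag{3.45} $$

Explicitly, we want

$$ -\frac{1}{2}\left( g^{\alpha\beta}\partial_\alpha\varphi_\lambda(x)\partial_\beta\varphi_\lambda(x) + m^2\varphi_\lambda^2 \right) = -\frac{1}{2}\left( g^{\alpha\beta}\partial_\alpha\varphi(x)\partial_\beta\varphi(x) + m^2\varphi^2 \right). \tag{3.46} $$

An equivalent way to express this is that

$$ \frac{\partial}{\partial\lambda} L\left(x, \varphi_\lambda(x), \frac{\partial\varphi_\lambda(x)}{\partial x}\right) = 0. \tag{3.47} $$

I think you can see why this is called a “symmetry”. While the KG field is
certainly changed by the symmetry transformation, from the point of view
of the Lagrangian nothing is changed by the transformation.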
[3] The tangents to all these curves define a vector field in jet space.

Our definition of variational symmetries did not rely in any essential way
upon the continuous nature of the transformation. For example, you can
easily see that the discrete transformation

ϕ → −ϕ (3.48)

leaves the KG Lagrangian unchanged and so would be called a discrete vari-

ational symmetry. Any transformation of the KG field that leaves the La-
grangian unchanged will be called simply a variational symmetry. Noether’s
theorem, which is our goal, involves continuous variational symmetries.
There is a notion of “symmetry” for field equations, too. Briefly, a trans-
formation is a symmetry of the field equations if it maps solutions of the field
equations to other solutions of the (same) field equations. A fundamental
result of the variational calculus is that variational symmetries are always
symmetries of the EL equations. (The converse is not true!)
The following problem provides a simple illustration.

Problems:

5. Consider the real scalar field with the double-well self-interaction po-
tential (2.81). Show that ϕ → ϕ̂ = −ϕ is a variational symmetry. Con-
sider the 3 constant solutions to the field equation (which you found
in the problem just after (2.81)) and check that this symmetry maps
these solutions to solutions.

This problem asks you to prove the general result.

6. Let ϕ → ϕ̂ = F (ϕ) be a variational symmetry of a Lagrangian. Show

that it maps solutions of the Euler-Lagrange equations to new solutions,
that is, if ϕ is a solution, so is ϕ̂. If you like, you can restrict your
attention to Lagrangians L(x, ϕ, ∂ϕ).

3.7 Infinitesimal symmetries

Let us now restrict our attention to continuous symmetries. A fundamental
observation going back to Lie is that, when considering aspects of problems
involving continuous transformations, it is always best to formulate the
problems in terms of infinitesimal transformations. Roughly speaking, the
technical advantage provided by an infinitesimal description is that many
non-linear equations that arise become linear. The idea of an infinitesimal
transformation is that we consider continuous transformations
$\varphi \to \varphi_\lambda$ for “very small” values of the parameter λ. More
precisely, we define the infinitesimal change of the field in much the same
way as we do a field variation,

$$ \delta\varphi = \left. \frac{\partial\varphi_\lambda}{\partial\lambda} \right|_{\lambda=0}, \tag{3.49} $$
which justifies the use of the same notation, I think. Still it is important
to see how these two notions of δϕ agree and how they differ. A field
variation in a variational principle involves studying curves in field space
passing through a specific point (a critical point) so that, for each curve, δϕ
is a single function on spacetime. An infinitesimal transformation δϕ will be
a spacetime function which will depend upon the field ϕ being transformed,
and it is this dependence which is the principal object of study. From a more
geometric point of view, field variations in the calculus of variations repre-
sent tangent vectors at a single point in the space of fields. An infinitesimal
transformation is a vector field on the space of fields – a continuous assign-
ment of a vector to each point in field space. Just as one can restrict a vector
field to a given point and get a vector there, one can restrict an infinitesimal
transformation to a particular field and get a particular field variation there.
An example is in order. For the scaling transformation

$$ \varphi_\lambda = e^\lambda\varphi, \tag{3.50} $$

we get

$$ \delta\varphi = \varphi, \tag{3.51} $$

which shows quite clearly that δϕ is built from ϕ so that, while it is a function
on spacetime for a given field ϕ = ϕ(x), this spacetime function varies from
point to point in the space of fields. Likewise for time translations:

$$ \varphi_\lambda(t, x, y, z) = \varphi(t + \lambda, x, y, z), \tag{3.52} $$

$$ \delta\varphi = \varphi_{,t}. \tag{3.53} $$
Of course, just as it is possible to have a constant vector field, it is
possible to have a continuous transformation whose infinitesimal form happens
to be independent of ϕ. For example, given some function f = f(x) the
transformation

$$ \varphi_\lambda = \varphi + \lambda f \tag{3.54} $$

has the infinitesimal form

$$ \delta\varphi = f. \tag{3.55} $$

This transformation is sometimes called a field translation.
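The three examples above can be reproduced numerically by differentiating $\varphi_\lambda$ with respect to λ at λ = 0, which is all that the definition (3.49) asks for. The following is a minimal sketch; the test field `phi`, the function `f`, and the helper names are hypothetical choices made for illustration, and the derivative is approximated by a central difference.

```python
import math

def phi(t, x, y, z):
    """An arbitrary smooth test field (a hypothetical choice, not a KG solution)."""
    return math.exp(-(x * x + y * y + z * z)) * math.sin(2.0 * t)

def phi_t(t, x, y, z):
    """Analytic time derivative of the test field, for comparison."""
    return 2.0 * math.exp(-(x * x + y * y + z * z)) * math.cos(2.0 * t)

def f(t, x, y, z):
    """A fixed spacetime function used for the field translation (hypothetical)."""
    return x * t

def scaling(lam):
    return lambda t, x, y, z: math.exp(lam) * phi(t, x, y, z)

def time_translation(lam):
    return lambda t, x, y, z: phi(t + lam, x, y, z)

def field_translation(lam):
    return lambda t, x, y, z: phi(t, x, y, z) + lam * f(t, x, y, z)

def infinitesimal(family, point, eps=1e-6):
    """Approximate d(phi_lambda)/d(lambda) at lambda = 0 at one spacetime point."""
    return (family(eps)(*point) - family(-eps)(*point)) / (2 * eps)

point = (0.3, 0.1, -0.2, 0.5)
print(infinitesimal(scaling, point), phi(*point))             # delta phi = phi
print(infinitesimal(time_translation, point), phi_t(*point))  # delta phi = phi_,t
print(infinitesimal(field_translation, point), f(*point))     # delta phi = f
```

Each printed pair agrees up to discretization error. Only the first two results depend on the field being transformed, which is the distinction the text draws between a general infinitesimal transformation and a field translation.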
The infinitesimal transformation gives a formula for the “first-order”
change of the field under the indicated continuous transformation. This
first order information is enough to completely characterize the transforma-
tion. The idea is that a finite continuous transformation can be viewed as
being built by composition of “many” infinitesimal transformations. Indeed,
if you think of a continuous transformation as a family of curves foliating
field space, then an infinitesimal transformation is the vector field defined by
the tangents to those curves at each point. As you may know, it is enough
to specify the vector field to determine the foliation of curves (via the “flow
of the vector field”). If this bit of mathematics is obscure to you, then you
may be happier by recalling that, say, the “electric field lines” are completely
determined by specifying the electric vector field, and conversely.
For a continuous transformation to be a variational symmetry it is necessary
and sufficient that its infinitesimal form defines an (infinitesimal)
variational symmetry. By this I mean that the variation induced in L by the
infinitesimal transformation vanishes for all fields.[4] That this condition is
necessary is clear from our earlier observation that a continuous symmetry
satisfies:

$$ \frac{\partial}{\partial\lambda} L\left(x, \varphi_\lambda(x), \frac{\partial\varphi_\lambda(x)}{\partial x}\right) = 0. \tag{3.56} $$

Clearly, this implies that

$$ 0 = \left. \frac{\partial}{\partial\lambda} L\left(x, \varphi_\lambda(x), \frac{\partial\varphi_\lambda(x)}{\partial x}\right) \right|_{\lambda=0} = \frac{\partial L}{\partial\varphi}\delta\varphi + \frac{\partial L}{\partial\varphi_{,\alpha}}\delta\varphi_{,\alpha} = \delta L. \tag{3.57} $$

That this condition is sufficient follows from the fact that it must hold at
all points in the space of fields, so that the derivative with respect to λ
vanishes everywhere on the space of fields. Thus one often checks whether
a continuous transformation is a variational symmetry by just checking its
infinitesimal condition (3.57).

[4] Contrast this with the idea of a critical point. An infinitesimal
variational symmetry is a particular family of field variations that does not
change the Lagrangian for any choice of the field. The critical point is a
particular field such that the action does not change for any field variation.

3.8 Divergence symmetries

We have defined a variational symmetry as a transformation that leaves
the Lagrangian unchanged. This is a reasonable state of affairs since the
Lagrangian determines the field equations. However, we have also seen that
any two Lagrangians $L$ and $L'$ differing by a divergence

$$ L' = L + D_\alpha V^\alpha, \tag{3.58} $$

where $V^\alpha = V^\alpha(x, \varphi, \partial\varphi, \dots)$, will define the
same EL equations since

$$ E(D_\alpha V^\alpha) = 0 \tag{3.59} $$

so that

$$ E(L') = E(L) + E(D_\alpha V^\alpha) = E(L). \tag{3.60} $$

Therefore, it is reasonable (and, we shall see, quite useful) to consider
generalizing our notion of symmetry. We say that a transformation is a
divergence symmetry if the Lagrangian only changes by the addition of a
divergence. In infinitesimal form, a divergence symmetry satisfies

$$ \delta L = D_\alpha W^\alpha, \tag{3.61} $$

for some spacetime vector field $W^\alpha$, built locally from the scalar field
and its derivatives, $W^\alpha = W^\alpha(x, \varphi, \partial\varphi, \dots)$.
Of course, a variational symmetry is just a special case of a divergence
symmetry arising when $W^\alpha = 0$.

Problem:

7. Show that the scaling transformation (3.50) is neither a variational

symmetry nor a divergence symmetry for the KG Lagrangian.

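Time translation,

$$ \delta\varphi = \varphi_{,t}, \tag{3.62} $$

defines a divergence symmetry of the KG Lagrangian. Let’s see how this
works. We begin by writing the Lagrangian as

$$ L = -\frac{1}{2}\left( g^{\alpha\beta}\varphi_{,\alpha}\varphi_{,\beta} + m^2\varphi^2 \right), \tag{3.63} $$

where $g^{\alpha\beta} = \mathrm{diag}(-1, 1, 1, 1)$. We then have

$$ \delta L = -\left( g^{\alpha\beta}\varphi_{,\alpha}\varphi_{,\beta t} + m^2\varphi\varphi_{,t} \right) = D_t\left[ -\frac{1}{2}\left( g^{\alpha\beta}\varphi_{,\alpha}\varphi_{,\beta} + m^2\varphi^2 \right) \right] = D_t L = D_\alpha\left( \delta_t^\alpha L \right), \tag{3.64} $$

so that we can choose

$$ W^\alpha = \delta_t^\alpha L. \tag{3.65} $$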
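For readers who want the intermediate steps behind (3.64), here is one way to fill them in. The sketch uses the general formula for the variation of a first-order Lagrangian, the derivatives of the KG Lagrangian (3.63), and the fact that $g^{\alpha\beta}$ is constant and symmetric in inertial Cartesian coordinates:

```latex
\delta L
  = \frac{\partial L}{\partial\varphi}\,\delta\varphi
    + \frac{\partial L}{\partial\varphi_{,\alpha}}\,D_\alpha\delta\varphi
  = -m^2\varphi\,\varphi_{,t} - g^{\alpha\beta}\varphi_{,\beta}\,\varphi_{,\alpha t},
\qquad
D_t\!\left( g^{\alpha\beta}\varphi_{,\alpha}\varphi_{,\beta} \right)
  = 2\, g^{\alpha\beta}\varphi_{,\alpha}\varphi_{,\beta t},
\qquad
D_t\!\left( m^2\varphi^2 \right) = 2\, m^2\varphi\,\varphi_{,t}.
```

Multiplying the last two relations by $-\frac{1}{2}$ and adding them reproduces the first expression, which is the statement $\delta L = D_t L$.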

Physically, the presence of this symmetry reflects the fact that there is
no preferred instant of time in the KG theory. A shift in the origin of time
t → t + constant does not change the field equations.

3.9 A first look at Noether’s theorem

We now have enough technology to have a first, somewhat informal look at
Noether’s theorem relating symmetries and conservation laws. I will use the
jet space formalism to ensure maximum precision. The idea is as follows.
Consider a Lagrangian of the form

L = L(x, ϕ, ∂ϕ). (3.66)

Of course, the KG Lagrangian is of this form. Suppose that δϕ is an
infinitesimal variational symmetry. Then the induced change in the Lagrangian
vanishes,

$$ \delta L = 0, \tag{3.67} $$

everywhere in the space of fields. But, at any given point in field space, we
always have the identity, valid for any kind of variation, which defines the
Euler-Lagrange expression:

$$ \delta L = E(L)\delta\varphi + D_\alpha V^\alpha, \tag{3.68} $$

where

$$ E(L) = \frac{\partial L}{\partial\varphi} - D_\alpha\frac{\partial L}{\partial\varphi_{,\alpha}}, \tag{3.69} $$

and[5]

$$ V^\alpha = \frac{\partial L}{\partial\varphi_{,\alpha}}\delta\varphi. \tag{3.70} $$
This identity holds for any field variation. By hypothesis, our field
variation δϕ is some expression built from x, ϕ, and its derivatives that has
the property that δL = 0. Consequently, for the infinitesimal symmetry
transformation δϕ we have the relation

$$ 0 = E(L)\delta\varphi + D_\alpha V^\alpha \iff D_\alpha V^\alpha = -E(L)\delta\varphi. \tag{3.71} $$

This is exactly the type of relationship that defines a conserved current
$V^\alpha$, since it says that the divergence of $V^\alpha$ will vanish if
$V^\alpha$ is built from a KG field ϕ that satisfies the EL equation (the KG
equation). Note that the specific form of $V^\alpha$ as a function of ϕ (and
its derivatives) depends upon the specific form of the Lagrangian via
$\partial L/\partial\varphi_{,\alpha}$ and on the specific form of the
transformation via δϕ.
More generally, suppose that the infinitesimal transformation δϕ defines
a divergence symmetry, that is, there exists a vector field $W^\alpha$ built
from ϕ such that

$$ \delta L = D_\alpha W^\alpha. \tag{3.72} $$

We still get a conservation law since the variational identity applied to a
divergence symmetry now yields

$$ D_\alpha W^\alpha = E(L)\delta\varphi + D_\alpha V^\alpha, \tag{3.73} $$

which implies

$$ D_\alpha(V^\alpha - W^\alpha) = -E(L)\delta\varphi, \tag{3.74} $$

so that the conserved current is now $V^\alpha - W^\alpha$.

[5] There is an ambiguity in the definition of $V^\alpha$ here which we shall
ignore for now to keep things simple. We will confront it when we study
conservation laws in electromagnetism.

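To summarize, if $\delta\varphi(x, \varphi, \partial\varphi, \dots)$ is a
divergence symmetry of $L(x, \varphi, \partial\varphi)$,

$$ \delta L = D_\alpha W^\alpha, \tag{3.75} $$

then there is a conserved current given by

$$ j^\alpha = \frac{\partial L}{\partial\varphi_{,\alpha}}\delta\varphi - W^\alpha. \tag{3.76} $$

This is a version of “Noether’s first theorem”.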

3.10 Time translation symmetry and conservation of energy

Using Noether’s first theorem we can see how the conserved current defining
conservation of energy arises via the time translation symmetry. (I will
continue to use the jet space formalism, the total derivative D in particular.)
Recall that time translation symmetry is a divergence symmetry:

$$ \delta\varphi = D_t\varphi = \varphi_{,t} \implies \delta L = D_\alpha(\delta_t^\alpha L). \tag{3.77} $$

With

$$ \delta\varphi = D_t\varphi = \varphi_{,t}, \quad W^\alpha = \delta_t^\alpha L, \quad \frac{\partial L}{\partial\varphi_{,\alpha}} = -g^{\alpha\beta}\varphi_{,\beta}, \tag{3.78} $$

we can apply the results of the previous section to obtain a conserved current:

$$ j^\alpha = -g^{\alpha\beta}\varphi_{,\beta}\varphi_{,t} - \delta_t^\alpha L = -T_t^\alpha, \tag{3.79} $$

which is our expression of the conserved energy current in terms of the
energy-momentum tensor.
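Because (3.79) is an algebraic identity in the values of ϕ and its first derivatives at a point, it can be spot-checked by plugging in numbers. The sketch below does this in inertial Cartesian coordinates with g = diag(−1, 1, 1, 1). The function names and the sample values of ϕ and its derivatives are hypothetical; the formulas are (3.63), (3.76) and (3.27).

```python
G = (-1.0, 1.0, 1.0, 1.0)   # diagonal of g_{alpha beta} = g^{alpha beta}
M = 1.0                     # mass parameter m (illustrative value)

def lagrangian(phi, p):
    """KG Lagrangian (3.63); p = (phi_t, phi_x, phi_y, phi_z)."""
    return -0.5 * (sum(G[a] * p[a] * p[a] for a in range(4)) + M * M * phi * phi)

def noether_current(p, delta_phi, W):
    """j^a = (dL/dphi_{,a}) delta_phi - W^a, with dL/dphi_{,a} = -g^{ab} phi_{,b}."""
    return [-G[a] * p[a] * delta_phi - W[a] for a in range(4)]

def energy_momentum(phi, p):
    """Lower-index components T_{ab} of (3.27)."""
    s = sum(G[c] * p[c] * p[c] for c in range(4)) + M * M * phi * phi
    return [[p[a] * p[b] - 0.5 * (G[a] if a == b else 0.0) * s
             for b in range(4)] for a in range(4)]

# Sample values of the field and its first derivatives at one point (arbitrary).
phi_value = 0.7
p = (0.3, -1.1, 0.4, 2.0)

L = lagrangian(phi_value, p)
# Time translation: delta phi = phi_t and W^a = delta^a_t L.
j = noether_current(p, delta_phi=p[0], W=[L, 0.0, 0.0, 0.0])

T = energy_momentum(phi_value, p)
minus_T_t = [-G[a] * T[a][0] for a in range(4)]   # -T_t^a = -g^{ab} T_{tb}

print(j)
print(minus_T_t)
```

The two printed lists coincide, and the time component equals the energy density (3.12) evaluated on the same sample values.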
It is worth pointing out that the existence of the time translation
symmetry, and hence conservation of energy, is solely due to the fact that the
KG Lagrangian has no explicit t dependence; no other structural features of
the Lagrangian play a role. To see this, consider any Lagrangian whatsoever

$$ L = L(x, \varphi, \partial\varphi, \dots) \tag{3.80} $$

satisfying

$$ \frac{\partial}{\partial t} L(x, \varphi, \partial\varphi, \dots) = 0. \tag{3.81} $$

Here I should emphasize that this partial derivative is really a partial
derivative: it only applies to the explicit coordinate dependence of the
Lagrangian. So, for example, in the jet space context of the present
discussion we have

$$ \partial_t(t\varphi) = \varphi, \quad D_t(t\varphi) = \varphi + t\varphi_{,t}. \tag{3.82} $$

From the identity

$$ D_t L = \frac{\partial L}{\partial t} + \frac{\partial L}{\partial\varphi}\varphi_{,t} + \frac{\partial L}{\partial\varphi_{,\alpha}}\varphi_{,t\alpha} \tag{3.83} $$

we have, provided $\partial_t L = 0$:

$$ \frac{\partial L}{\partial\varphi}\varphi_{,t} + \frac{\partial L}{\partial\varphi_{,\alpha}}\varphi_{,t\alpha} = D_t L, \tag{3.84} $$

which can be interpreted as saying the time translation $\delta\varphi = \varphi_{,t}$
yields a divergence symmetry (3.77), leading to conservation of energy. One
says that the conserved current for energy is the Noether current associated
to time translational symmetry.

3.11 Space translation symmetry and conservation of momentum

We can use spatial translation symmetry to obtain conservation of
momentum as follows. Let $\hat n$ be a constant unit vector field in space. With
$(t, x, y, z) \equiv (t, \vec x)$, the continuous transformation corresponding
to a spatial translation along the direction specified by $\hat n$ is given by

$$ \varphi(t, \vec x) \to \varphi_\lambda(t, \vec x) = \varphi(t, \vec x + \lambda\hat n). \tag{3.85} $$

Infinitesimally, we have

$$ \delta\varphi = \hat n\cdot\nabla\varphi = n^i\varphi_{,i}. \tag{3.86} $$

To check that this is a symmetry of the KG Lagrangian we compute

$$ \delta L = \varphi_{,t}(\hat n\cdot\nabla\varphi)_{,t} - \nabla\varphi\cdot\nabla(\hat n\cdot\nabla\varphi) - m^2\varphi(\hat n\cdot\nabla\varphi) = \hat n\cdot\nabla L = \nabla\cdot(\hat n L) = D_\alpha W^\alpha, \tag{3.87} $$
where

$$ W^\alpha = (0, n^i L), \tag{3.88} $$

and D is the total derivative.

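As before, it is not hard to see that this result is solely a consequence of
the fact that the Lagrangian has no dependence on the spatial (Cartesian)
coordinates. In particular,

$$ n^i\frac{\partial}{\partial x^i} L(x, \varphi, \partial\varphi, \dots) = 0, \tag{3.89} $$

so that

$$ n^i\frac{\partial L}{\partial\varphi}\varphi_{,i} + n^i\frac{\partial L}{\partial\varphi_{,j}}\varphi_{,ij} = n^i D_i L. \tag{3.90} $$

Thus we have the conservation law

$$ j^\alpha = (\rho, j^i), \tag{3.91} $$

with

$$ \rho = \varphi_{,t}\,\hat n\cdot\nabla\varphi, \tag{3.92} $$

and

$$ j^i = -\varphi_{,i}\,\hat n\cdot\nabla\varphi + \frac{1}{2} n^i\left( (\nabla\varphi)^2 - \varphi_{,t}^2 + m^2\varphi^2 \right). \tag{3.93} $$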
Since the direction n̂ is arbitrary, it is easy to see that we really have three in-
dependent conservation laws corresponding to 3 linearly independent choices
for n̂. These three conservation laws correspond to the conservation laws for
momentum that we had before. The relation between ρ and $j^i$ here and
$\rho_{(i)}$ and $\vec j_{(i)}$ there is given by

$$ \rho = n^k\rho_{(k)}, \quad j^i = n^k(\vec j_{(k)})^i. \tag{3.94} $$

You can see that the translational symmetry in the spatial direction defined
by n̂ leads to a conservation law for the component of momentum along
n̂. Thus the three conserved momentum currents are the Noether currents
associated with spatial translation symmetry.
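Here is a short SymPy sketch confirming that the current (3.91)–(3.93) is conserved when the KG equation holds. It computes the divergence for an arbitrary function ϕ and shows that it is proportional to (□ − m²)ϕ, with □ = −∂t² + ∇² for the metric diag(−1, 1, 1, 1). The variable names are illustrative only.

```python
import sympy as sp

t, x, y, z, m = sp.symbols("t x y z m", real=True)
n = sp.symbols("n1 n2 n3", real=True)      # constant unit-vector components
coords = (x, y, z)
phi = sp.Function("phi")(t, x, y, z)

grad = [sp.diff(phi, c) for c in coords]
phi_t = sp.diff(phi, t)
n_grad_phi = sum(ni * gi for ni, gi in zip(n, grad))

rho = phi_t * n_grad_phi                                           # (3.92)
bracket = sum(gi ** 2 for gi in grad) - phi_t ** 2 + m ** 2 * phi ** 2
j = [-gi * n_grad_phi + ni * bracket / 2 for gi, ni in zip(grad, n)]  # (3.93)

# D_alpha j^alpha = rho_{,t} + j^i_{,i}
divergence = sp.diff(rho, t) + sum(sp.diff(ji, c) for ji, c in zip(j, coords))

# Left-hand side of the KG equation, (box - m^2) phi.
kg = -sp.diff(phi, t, 2) + sum(sp.diff(phi, c, 2) for c in coords) - m ** 2 * phi

# The divergence equals -(n . grad phi) * (box - m^2) phi identically.
residual = sp.simplify(sp.expand(divergence + n_grad_phi * kg))
print(residual)
```

Since the divergence is a multiple of the field equation, it vanishes on solutions, for every choice of the constants n1, n2, n3.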

3.12 Conservation of energy-momentum revisited
Here we revisit the conservation of energy-momentum from a more relativistic
point of view, bringing into play the energy-momentum tensor. Let aα be
any constant vector field on spacetime. Consider the following continuous
transformation, which is a spacetime translation:
ϕλ (xα ) = ϕ(xα + λaα ), δϕ = aα ϕ,α . (3.95)
As a nice exercise you should check that we then have
δL = Dα (aα L) (3.96)
so that from Noether’s theorem we have the following conserved current:
j α = −g αβ ϕ,β (aγ ϕ,γ ) − aα L. (3.97)
Here we are using gαβ = g αβ = diag(−1, 1, 1, 1). By choosing aα to define a
time or space translation we get the corresponding conservation of energy or
momentum, as explored in the previous sections.
Since the components aα are arbitrary constants, it is easy to see that for
each value of γ, the current
$j^\alpha_{(\gamma)} = -g^{\alpha\beta}\varphi_{,\beta}\varphi_{,\gamma} - \delta^\alpha_\gamma L$ (3.98)
is conserved, corresponding to the four independent conservation laws of
energy and momentum. Substituting for the KG Lagrangian:
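$L = -\frac{1}{2}\left(g^{\alpha\beta}\varphi_{,\alpha}\varphi_{,\beta} + m^2\varphi^2\right),$ (3.99)
we get that
$j^\alpha_{(\gamma)} = -T^\alpha_\gamma \equiv -g^{\alpha\beta}T_{\beta\gamma}.$ (3.100)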
Thus the energy-momentum tensor can be viewed as the set of Noether cur-
rents associated with spacetime translational symmetry.

Problem:
8. Verify that the Noether currents associated with a spacetime transla-
tion do yield the energy-momentum tensor.

3.13 Angular momentum revisited

We have seen the correspondence between spacetime translation symmetry
and conservation of energy-momentum. What symmetry is responsible for
conservation of angular momentum? It is Lorentz symmetry. Recall that the
Lorentz group consists of “boosts” and spatial rotations. Mathematically, a
Lorentz transformation is a linear transformation on the spacetime R4 ,

xα −→ Sβα xβ , (3.101)

that leaves invariant the quadratic form

gαβ xα xβ = −t2 + x2 + y 2 + z 2 . (3.102)

We have then
Sγα Sδβ gαβ = gγδ . (3.103)
Consider a 1-parameter family of such transformations, S(λ), so that
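$S^\alpha_\beta(0) = \delta^\alpha_\beta, \qquad \left.\frac{\partial S^\alpha_\beta}{\partial\lambda}\right|_{\lambda=0} =: \omega^\alpha_\beta.$ (3.104)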

Using these transformations in (3.103) and differentiating with respect to λ
we obtain
$\omega^\alpha_\gamma g_{\alpha\delta} + \omega^\beta_\delta g_{\gamma\beta} = 0.$ (3.105)
This is the infinitesimal version of (3.103). Defining

$\omega_{\alpha\beta} = g_{\beta\gamma}\omega^\gamma_\alpha$ (3.106)

we see that a Lorentz transformation is “generated” by ω if and only if the
array $\omega_{\alpha\beta}$ is antisymmetric:

$\omega_{\alpha\beta} = -\omega_{\beta\alpha}.$ (3.107)

Infinitesimal Lorentz transformations are thus in one-to-one correspondence
with antisymmetric arrays $\omega_{\alpha\beta}$. In a given inertial reference frame with
inertial Cartesian coordinates xα = (t, xi ), the spatial rotations are generated
by the infinitesimal transformations provided by ωij , while the boosts are
generated by the infinitesimal transformations provided by ω0i .
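To make this concrete, here is a small SymPy sketch that takes one boost and one rotation, checks the invariance condition (3.103), and extracts the generator via (3.104) and (3.106). The particular choices (a boost along x and a rotation about z) and the helper names are my own assumptions for illustration. Matrices are stored with the upper index labeling rows and the lower index labeling columns.

```python
import sympy as sp

lam = sp.symbols("lambda", real=True)
g = sp.diag(-1, 1, 1, 1)

# S^alpha_beta for a boost along x.
boost = sp.Matrix([
    [sp.cosh(lam), sp.sinh(lam), 0, 0],
    [sp.sinh(lam), sp.cosh(lam), 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
])

# S^alpha_beta for a rotation about z.
rotation = sp.Matrix([
    [1, 0, 0, 0],
    [0, sp.cos(lam), -sp.sin(lam), 0],
    [0, sp.sin(lam), sp.cos(lam), 0],
    [0, 0, 0, 1],
])


def preserves_metric(S):
    """Check (3.103), which in matrix form reads S^T g S = g."""
    return all(sp.simplify(entry) == 0 for entry in (S.T * g * S - g))


def lowered_generator(S):
    """omega_{alpha beta} = g_{beta gamma} omega^gamma_alpha, eqs. (3.104), (3.106)."""
    omega = S.diff(lam).subs(lam, 0)   # omega^alpha_beta
    return (g * omega).T               # entry [alpha, beta] is omega_{alpha beta}


w_boost = lowered_generator(boost)
w_rot = lowered_generator(rotation)
print(w_boost)
print(w_rot)
```

The boost produces a nonzero ω01 entry and the rotation a nonzero ω12 entry, and both arrays come out antisymmetric, as (3.107) requires.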

Let us now determine the infinitesimal transformation induced on the

scalar field by the infinitesimal Lorentz transformation generated by ωαβ .
Consider the following transformation

ϕλ (xα ) = ϕ(Sβα (λ)xβ ), (3.108)

Differentiate both sides with respect to λ and set λ = 0 to find the infinites-
imal transformation:
δϕ = (ωβα xβ )ϕ,α , (3.109)
with an antisymmetric ωαβ as above. It is now a short computation to check
that, for the KG Lagrangian,

$\delta L = D_\alpha\left(\omega^\alpha_\beta x^\beta L\right).$ (3.110)

(Here I am still using the jet space notation with the total derivative D.)
Thus the Lorentz transformations define a divergence symmetry for the KG
Lagrangian. The resulting Noether current is given by

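$$\begin{aligned}
j^\alpha &= -g^{\alpha\beta}\varphi_{,\beta}\,(\omega^\gamma_\delta x^\delta)\varphi_{,\gamma} - \omega^\alpha_\beta x^\beta L \\
&= \omega_{\gamma\delta} M^{\alpha(\gamma)(\delta)},
\end{aligned}$$ (3.111)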

where M α(γ)(δ) are the conserved currents associated with relativistic angular
momentum.

Problem:
9. Derive (3.110).

3.14 Spacetime symmetries in general

The symmetry transformations we have been studying involve spacetime
translations:
xα −→ xα + λaα , (3.112)
where aα = const. and Lorentz transformations,

xα −→ Sβα (λ)xβ , (3.113)


where
Sβα Sδγ ηαγ = ηβδ . (3.114)
These symmetries are, naturally enough, called spacetime symmetries since
they involve transformations in spacetime. These symmetry transformations
have a nice geometric interpretation which goes as follows.
Given a spacetime (M, g) we can consider the group of diffeomorphisms,
which are smooth mappings of M to itself with smooth inverses. Given a
diffeomorphism
f : M → M, (3.115)
there is associated to the metric g a new metric f ∗ g via the pull-back. In
coordinates xα on M the diffeomorphism f is given as

xα → f α (x), (3.116)

and the pullback metric has components related to the components of g via

$(f^*g)_{\alpha\beta}(x) = \frac{\partial f^\gamma}{\partial x^\alpha}\frac{\partial f^\delta}{\partial x^\beta}\, g_{\gamma\delta}(f(x)).$ (3.117)
We say that f is an isometry if

f ∗ g = g. (3.118)

The idea of an isometry is that it is a symmetry of the metric – the spacetime

points have been moved around, but the metric can’t tell it happened.
Consider a 1-parameter family of diffeomorphisms fλ such that f0 =
identity. As the parameter varies, each point in M traces out a curve. The
tangent vectors to all these curves constitute a vector field, which I will
denote by X. The Lie derivative of the metric along X is defined as

$L_X g := \left.\left(\frac{d}{d\lambda} f_\lambda^* g\right)\right|_{\lambda=0}.$ (3.119)
If fλ is a 1-parameter family of isometries then we have that

LX g = 0. (3.120)

This is the infinitesimal version of (3.118).

It is not too hard to verify that the spacetime translations and the Lorentz
transformations define isometries of the Minkowski metric

g = ηαβ dxα ⊗ dxβ . (3.121)

In fact, it can be shown that all continuous isometries of flat spacetime are
contained in the Poincaré group, which is the group of diffeomorphisms built
from spacetime translations and Lorentz transformations. As an exercise you
can convince yourself that this group is labeled by 10 parameters.
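The infinitesimal statement (3.120) can be checked for the Poincaré group with a few lines of SymPy. The sketch below builds the vector field X^α = a^α + ω^α_β x^β from four translation parameters and six independent entries of an antisymmetric array, then evaluates the Lie derivative of the flat metric using the standard coordinate formula for a constant metric, (L_X η)_{αβ} = η_{γβ} ∂_α X^γ + η_{αγ} ∂_β X^γ. The symbol names a0..a3 and w01..w23 are hypothetical labels of my own.

```python
import sympy as sp

coords = sp.Matrix(sp.symbols("t x y z", real=True))
eta = sp.diag(-1, 1, 1, 1)

a = sp.Matrix(sp.symbols("a0:4", real=True))            # 4 translation parameters
w = sp.symbols("w01 w02 w03 w12 w13 w23", real=True)    # 6 Lorentz parameters

# General antisymmetric array omega_{alpha beta}.
omega_low = sp.Matrix([
    [0,     w[0],  w[1],  w[2]],
    [-w[0], 0,     w[3],  w[4]],
    [-w[1], -w[3], 0,     w[5]],
    [-w[2], -w[4], -w[5], 0],
])

# Invert (3.106): omega_low = (eta * omega_up)^T, so omega_up = eta^{-1} * omega_low^T.
omega_up = eta.inv() * omega_low.T

# Generator of the 1-parameter family: X^alpha = a^alpha + omega^alpha_beta x^beta.
X = a + omega_up * coords


def lie_derivative_flat_metric(vec):
    """Lie derivative of the constant metric eta along vec, as a 4x4 matrix."""
    J = vec.jacobian(coords)          # J[c, a] = d vec^c / d x^a
    return (eta * J).T + eta * J


lie = lie_derivative_flat_metric(X)
is_killing = all(sp.simplify(entry) == 0 for entry in lie)
n_parameters = len(a) + len(w)
print(is_killing, n_parameters)
```

The Lie derivative vanishes for all values of the parameters, and counting the free symbols reproduces the 4 + 6 = 10 parameters of the group.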
The KG Lagrangian depends upon a choice of spacetime for its definition.
Recall that a spacetime involves specifying two structures: a manifold M
and a metric g on M . Isometries are symmetries of that structure: they
are diffeomorphisms – symmetries of M – that also preserve the metric. It
is not too surprising, then, that the Lagrangian symmetries we have been
studying are symmetries of the spacetime, since that is the only structure
used to construct the Lagrangian. The existence of conservation laws of
energy, momentum and angular momentum is contingent upon the existence
of suitable spacetime symmetries.

3.15 Internal symmetries

Besides the spacetime symmetries, there is another class of symmetries in
field theory that is very important since, for example, it is the source of
myriad other conservation laws besides energy, momentum and angular mo-
mentum. This class of symmetries is known as internal symmetries since
they do not involve transformations of spacetime, but only transformations
on the space of fields.
A simple example of an internal Lagrangian symmetry (albeit a discrete
symmetry) for the KG theory is given by ϕ → −ϕ, as you can easily verify
by inspection. This symmetry extends to self-interacting scalar theories with
potentials which are an even function of ϕ, e.g., the double-well potential.
For our purposes, there are no particularly interesting continuous internal
symmetries of the KG theory unless one sets the rest mass to zero. Then we
have the following situation.

Problem:

10. Consider the KG theory with m = 0. Show that the transformation

ϕλ = ϕ + λ (3.122)

is a variational symmetry. Use Noether’s theorem to find the conserved

current and conserved charge.

There is an important generalization of the KG theory which admits a fun-

damental internal symmetry, and this is our next topic.

3.16 The charged KG field and its internal symmetry
An important generalization of the Klein-Gordon field is the charged Klein-
Gordon field. The charged KG field can be viewed as a mapping
ϕ : M → C, (3.123)
so that there are actually two real-valued functions in this theory. The La-
grangian for the charged KG field is
L = −(g αβ ϕ,α ϕ∗,β + m2 |ϕ|2 ), gαβ = g αβ = diag(−1, 1, 1, 1). (3.124)

Problem:
11. Show that this Lagrangian is the sum of the Lagrangians for two (real-
valued) KG fields ϕ1 and ϕ2 with m1 = m2 and with the identification
$\varphi = \frac{1}{\sqrt{2}}(\varphi_1 + i\varphi_2).$ (3.125)

Recall from Lagrangian mechanics that two dynamical systems, described

respectively by Lagrangians L1 and L2 , can be combined into a total system
(with no interactions) by using the Lagrangian L1 +L2 . Thus the Lagrangian
for the charged KG field represents that of two non-interacting KG fields.
From this problem you can also surmise that the field equations for the
charged KG field consist of two identical KG equations for the real and
imaginary parts of ϕ. In terms of the complex-valued function ϕ you can
check that the field equations – computed as Euler-Lagrange equations or
via the critical points of the action – are simply
$E_\varphi(L) = (\Box - m^2)\varphi^* = 0,$ (3.126)

$E_{\varphi^*}(L) = (\Box - m^2)\varphi = 0.$ (3.127)
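Here is a minimal SymPy sketch of this computation, treating ϕ and ϕ∗ as independent variables. To keep it short I restrict (3.124) to one space dimension, which is an assumption made only for brevity; the helper names euler_lagrange and box_minus_m2 and the function name phi_star are mine.

```python
import sympy as sp

t, x, m = sp.symbols("t x m", real=True)
phi = sp.Function("phi")(t, x)
phic = sp.Function("phi_star")(t, x)   # stands for phi^*, treated as independent

# Lagrangian (3.124) in 1+1 dimensions with g = diag(-1, 1).
L = -(-sp.diff(phi, t) * sp.diff(phic, t)
      + sp.diff(phi, x) * sp.diff(phic, x)
      + m ** 2 * phi * phic)


def euler_lagrange(lagrangian, field):
    """dL/dfield - D_mu (dL/dfield_{,mu}) for a first-order Lagrangian."""
    expr = sp.diff(lagrangian, field)
    for c in (t, x):
        expr -= sp.diff(sp.diff(lagrangian, sp.diff(field, c)), c)
    return sp.expand(expr)


def box_minus_m2(f):
    """(box - m^2) f with box = -d_t^2 + d_x^2."""
    return -sp.diff(f, t, 2) + sp.diff(f, x, 2) - m ** 2 * f


E_phi = euler_lagrange(L, phi)         # should reproduce (3.126)
E_phic = euler_lagrange(L, phic)       # should reproduce (3.127)
print(E_phi)
print(E_phic)
```

Varying with respect to ϕ produces the equation for ϕ∗ and vice versa, exactly as in (3.126) and (3.127).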

Note that one can do field-theoretic computations such as deriving these
Euler-Lagrange equations either using the real functions ϕ1 and ϕ2 or using
the familiar trick of using “complex coordinates” on the space of fields, that
is, treating ϕ and ϕ∗ as independent variables. In the former case there are
2 EL equations:

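$\frac{\partial L}{\partial\varphi_1} - D_\mu\frac{\partial L}{\partial\varphi_{1,\mu}} = 0, \qquad \frac{\partial L}{\partial\varphi_2} - D_\mu\frac{\partial L}{\partial\varphi_{2,\mu}} = 0.$ (3.128)

In the latter case, the field equations, equivalent to those above, can be
computed via

$\frac{\partial L}{\partial\varphi} - D_\mu\frac{\partial L}{\partial\varphi_{,\mu}} = 0, \qquad \frac{\partial L}{\partial\varphi^*} - D_\mu\frac{\partial L}{\partial\varphi^*_{,\mu}} = 0.$ (3.129)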

Problem:

12. Using ϕ = ϕ1 +iϕ2 , show that the field equations (3.128) and (3.129) are
equivalent. You should be able to do this for an arbitrary Lagrangian
density, but you should at least do it for the one given in (3.124).

In any case, one has doubled the size of the field space. As we shall see,
the new “degrees of freedom” that have been introduced allow for a notion
of conserved electric charge.6
It is easy to see that the Lagrangian (3.124) for the complex KG field
admits the continuous symmetry

ϕλ = eiλ ϕ, ϕ∗λ = e−iλ ϕ∗ . (3.130)

6 In addition, in the corresponding quantum field theory, the doubling of the degrees
of freedom leads to anti-particles which are distinct from the particles.

This continuous variational symmetry is given various names. Sometimes it
is called a “phase transformation” for obvious reasons. Because the set of
unitary linear transformations of the vector space of complex numbers, denoted
U(1), is precisely the multiplicative group of phases $e^{i\lambda}$, sometimes the
symmetry transformation (3.130) is called a “rigid U(1) transformation”, or
a “global U(1) transformation”, or just a “U(1) transformation”. For various
reasons related to Noether’s second theorem (as we shall see), sometimes
this transformation is called a “gauge transformation of the first kind”. You
will also find various mixtures of these terms in the literature. Whatever the
name, you can see that the transformation is simply a rotation in the vector
space of values of the fields ϕ1 and ϕ2 which were defined in the last problem.
The Lagrangian is rotationally invariant in field space, hence the symmetry.
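The rotation can be exhibited explicitly. The following SymPy sketch uses the identification (3.125), writes the phase as cos λ + i sin λ, and reads off the transformed real fields; the names new_phi1, new_phi2 and expected are illustrative.

```python
import sympy as sp

lam, phi1, phi2 = sp.symbols("lambda phi1 phi2", real=True)
phi = (phi1 + sp.I * phi2) / sp.sqrt(2)            # identification (3.125)

phase = sp.cos(lam) + sp.I * sp.sin(lam)           # e^{i lambda} via Euler's formula
phi_lam = sp.expand(phase * phi)                   # transformed field, (3.130)

# Real fields after the transformation, using phi = (phi1 + i phi2)/sqrt(2) again.
new_phi1 = sp.simplify(sp.sqrt(2) * sp.re(phi_lam))
new_phi2 = sp.simplify(sp.sqrt(2) * sp.im(phi_lam))

# Rotation by angle lambda in the (phi1, phi2) plane.
rotation = sp.Matrix([[sp.cos(lam), -sp.sin(lam)],
                      [sp.sin(lam),  sp.cos(lam)]])
expected = rotation * sp.Matrix([phi1, phi2])

norm_change = sp.simplify(new_phi1 ** 2 + new_phi2 ** 2 - phi1 ** 2 - phi2 ** 2)
print(new_phi1, new_phi2, norm_change)
```

The pair (ϕ1, ϕ2) is rotated by the angle λ and the combination ϕ1² + ϕ2² = 2|ϕ|² is unchanged, which is why the mass term is invariant; the derivative term works the same way because λ is constant.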
It is straightforward to compute the conserved current associated with
the U (1) symmetry using Noether’s (first) theorem. The only novel feature
here is that we have more than one field. I will therefore give the gory details.
The infinitesimal transformation is given by

δϕ = iϕ, δϕ∗ = −iϕ∗ . (3.131)

The variation of the Lagrangian is, in general, given by

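$$\begin{aligned}
\delta L &= \frac{\partial L}{\partial\varphi}\delta\varphi + \frac{\partial L}{\partial\varphi_{,\alpha}}\delta\varphi_{,\alpha} + \frac{\partial L}{\partial\varphi^*}\delta\varphi^* + \frac{\partial L}{\partial\varphi^*_{,\alpha}}\delta\varphi^*_{,\alpha} \\
&= E_\varphi(L)\,\delta\varphi + E_{\varphi^*}(L)\,\delta\varphi^* + D_\alpha\left(\frac{\partial L}{\partial\varphi_{,\alpha}}\delta\varphi + \frac{\partial L}{\partial\varphi^*_{,\alpha}}\delta\varphi^*\right).
\end{aligned}$$ (3.132)

From the phase symmetry we know that when we set δϕ = iϕ it follows that
δL = 0, so we have

$0 = E_\varphi(L)\,i\varphi - E_{\varphi^*}(L)\,i\varphi^* + D_\alpha\left(\frac{\partial L}{\partial\varphi_{,\alpha}}\,i\varphi - \frac{\partial L}{\partial\varphi^*_{,\alpha}}\,i\varphi^*\right).$ (3.133)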

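Using
$\frac{\partial L}{\partial\varphi_{,\alpha}} = -g^{\alpha\beta}\varphi^*_{,\beta},$ (3.134)
$\frac{\partial L}{\partial\varphi^*_{,\alpha}} = -g^{\alpha\beta}\varphi_{,\beta},$ (3.135)
and multiplying by an overall factor of −1 (which does not affect conservation),
we get a conserved current

$j^\alpha = -i g^{\alpha\beta}\left(\varphi^*\varphi_{,\beta} - \varphi\varphi^*_{,\beta}\right).$ (3.136)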

Problem:

13. Verify directly from the above formula for j α that

Dα j α = 0, (3.137)

when the field equations for ϕ and ϕ∗ are satisfied.

The total “U(1) charge” contained in a spatial volume V at t = const. is
given by
$Q = i\int_V d^3x \left(\varphi^*\varphi_{,t} - \varphi\varphi^*_{,t}\right).$ (3.138)

Note that the sign of this charge is indefinite: the charged KG field contains
both positive and negative charges. This charge can be used to model electric
charge in electrodynamics. It can also be used to model the charge which
interacts via neutral currents in electroweak theory.
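To see the indefinite sign concretely, here is a small SymPy sketch that evaluates the integrand of (3.138) on two plane waves in one space dimension. Plane waves of this form solve the charged KG equation when ω² = k² + m²; the amplitude A, the wave number k, and the helper name charge_density are my own illustrative choices.

```python
import sympy as sp

t, x = sp.symbols("t x", real=True)
k, w, A = sp.symbols("k omega A", positive=True)


def charge_density(phi):
    """Integrand of (3.138): i (phi^* phi_{,t} - phi phi^*_{,t})."""
    phic = sp.conjugate(phi)
    return sp.simplify(sp.I * (phic * sp.diff(phi, t) - phi * sp.diff(phic, t)))


phi_plus = A * sp.exp(sp.I * (k * x - w * t))    # time dependence e^{-i omega t}
phi_minus = A * sp.exp(sp.I * (k * x + w * t))   # time dependence e^{+i omega t}

rho_plus = charge_density(phi_plus)
rho_minus = charge_density(phi_minus)
print(rho_plus, rho_minus)
```

The two waves carry charge densities of equal magnitude and opposite sign, so the sign of Q depends on the field configuration.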

Problem:

14. The double well potential can be generalized to the U (1)-symmetric

charged scalar field by choosing V (ϕ) = −a2 |ϕ|2 + b2 |ϕ|4 . Check that
the Lagrangian with this self-interaction potential still admits the U (1)
symmetry. (Hint: this is really easy.) Plot the graph of V as a function
of the real and imaginary parts of ϕ. (Hint: you should see why this is
often called the “Mexican hat potential” for an appropriate choice for
m and a. ) Find all solutions of the field equations of the form ϕ =
constant. How do these solutions transform under the U (1) symmetry?

3.17 More generally. . .

We can generalize our previous discussion as follows. Recall that, given a
group G, a (linear) representation of G is a pair (r, V ) where V is a vec-
tor space and r : G → GL(V ) is a group homomorphism, that is, r is an
identification of linear transformations r(g) on V with elements g ∈ G such
that
r(g1 g2 ) = r(g1 )r(g2 ). (3.139)
This way of viewing things applies to the U (1)-symmetric charged Klein-
Gordon theory as follows. For the charged scalar field the group G = U (1)

is the set of phases eiλ , labeled by λ, and with group multiplication being
ordinary multiplication of complex numbers:

eiλ1 eiλ2 = ei(λ1 +λ2 ) . (3.140)

The vector space is V = C, and the representation r is via ordinary multi-

plication of elements z ∈ C by the phase z → r(λ)z = eiλ z.
In this U (1) case, the internal symmetry transformation is multiplication
of the complex-valued field by a phase. So we can view this situation as
coming from the facts that (1) we can view ϕ as a map from spacetime into
the representation vector space C, and (2) the group U (1) acts on C, so that
the composition of the maps, r(λ) ◦ ϕ defines the symmetry transformation.
There is a pretty straightforward generalization of this to a general group.
Given a group G one picks a representation (r, V ). One introduces fields that
are maps from spacetime into V ; we write

ϕ : M → V. (3.141)

Each element g ∈ G defines a field transformation via

ϕ −→ r(g)ϕ. (3.142)

While it is possible to generalize still further, this construction captures al-

most all instances of (finite-dimensional) internal symmetries in field theory.
Of course, for the transformation just described to be a (divergence) sym-
metry, it is necessary that the Lagrangian be suitably invariant under the
action of r(g). One can examine this issue quite generally, but we will be
content with exhibiting another important example.

3.18 SU(2) symmetry

The group SU(2) can be defined as the group of unitary, unimodular transfor-
mations of the vector space C2 , equipped with its standard inner-product and
volume element.7 In terms of the Hermitian conjugate (complex-conjugate-
transpose) †, the unitarity condition on a linear transformation U is

$U^\dagger = U^{-1},$ (3.143)
which is equivalent to saying that the linear transformation preserves the
standard Hermitian inner product. The unimodularity condition is

$\det U = 1,$ (3.144)

which is equivalent to saying that the linear transformation preserves the
standard volume form on C2 .

7 This way of defining SU(2) in terms of a representation provides the “defining
representation”.

Let us focus on the “defining representation” of SU(2) as just stated.
Then the representation vector space is C2 and each element of SU(2) can
be represented by a matrix of the form
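$r(g) = U(\theta, n) = \cos\left(\frac{\theta}{2}\right) I + i\sin\left(\frac{\theta}{2}\right) n^j\sigma_j,$ (3.145)
where
$n = (n^1, n^2, n^3), \qquad (n^1)^2 + (n^2)^2 + (n^3)^2 = 1,$ (3.146)
and
$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ (3.147)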
are the Pauli matrices. Note that group elements are uniquely labeled by the
values of three parameters; one corresponding to θ and two corresponding to
the free parameters defining the unit vector n.8
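As a quick numerical sanity check of (3.145), the NumPy sketch below builds U(θ, n) for a randomly chosen angle and direction and tests the unitarity and unimodularity conditions (3.143) and (3.144). The function name su2_element and the random seed are arbitrary choices of mine.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sigma = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)                      # Pauli matrices (3.147)


def su2_element(theta, n):
    """U(theta, n) = cos(theta/2) I + i sin(theta/2) n^j sigma_j, eq. (3.145)."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)          # enforce the unit-vector condition (3.146)
    n_sigma = np.einsum("j,jab->ab", n, sigma)
    return np.cos(theta / 2) * I2 + 1j * np.sin(theta / 2) * n_sigma


rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 4.0 * np.pi)
n = rng.normal(size=3)

U = su2_element(theta, n)
is_unitary = np.allclose(U.conj().T @ U, I2)            # (3.143)
has_unit_det = np.isclose(np.linalg.det(U), 1.0)        # (3.144)
print(is_unitary, has_unit_det)
```

Both conditions hold for any θ and any unit vector n because (n^j σ_j)² = I.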
We can use this group representation to define a continuous transforma-
tion group of a field theory using the general strategy we outlined earlier.
The fields are defined to be mappings

ϕ : M → C2 , (3.148)

so we now have two charged KG fields or, equivalently, four real KG fields.
You can think of ϕ as a 2-component column vector whose entries are complex
functions on spacetime. Let U (λ) be any one parameter family of SU(2)
transformations, as described above. We assume that

U (0) = I. (3.149)

We define
ϕλ = U (λ)ϕ. (3.150)
8 The elements of SU(2) can be parametrized by a unit vector and an angle, just as are
elements of the rotation group SO(3). This is related to the fact that SU(2) provides a
spinor representation of the group of rotations.

The infinitesimal form of this transformation is

δϕ = iτ ϕ, (3.151)

where τ is a Hermitian, traceless 2 × 2 matrix defined by

$\tau = \frac{1}{i}\left.\frac{dU}{d\lambda}\right|_{\lambda=0}.$ (3.152)

Note that
δϕ† = −iϕ† τ † = −iϕ† τ. (3.153)
By the way, you can see that τ is traceless and Hermitian by considering
our formula for U (θ, n) above, or by simply noting that U (λ) satisfies

U † (λ)U (λ) = I, det(U (λ)) = 1 (3.154)

for all values of λ. Differentiation of each of these relations and evaluation

at λ = 0 yields the Hermitian (τ † = τ ) and trace-free conditions, respec-
tively. It is not hard to see that every Hermitian trace-free matrix is a linear
combination of the Pauli matrices:

τ = ai σ i , (3.155)

where ai∗ = ai . Thus the SU(2) transformations can also be parametrized

by the three numbers ai .
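One way to extract the three numbers in (3.155) from a given τ is to use the trace identity tr(σ_i σ_j) = 2δ_ij, which gives a^i = tr(τ σ_i)/2. The NumPy sketch below illustrates this on a sample τ; the sample coefficients and the function name pauli_coefficients are made up for the example.

```python
import numpy as np

sigma = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)                      # Pauli matrices (3.147)


def pauli_coefficients(tau):
    """Return a^i = tr(tau sigma_i) / 2 for a 2x2 matrix tau."""
    return np.array([np.trace(tau @ s) / 2 for s in sigma])


a_true = np.array([0.3, -1.2, 2.5])                 # arbitrary real sample values
tau = np.einsum("i,iab->ab", a_true, sigma)         # tau = a^i sigma_i, eq. (3.155)

a = pauli_coefficients(tau)
is_hermitian = np.allclose(tau, tau.conj().T)
is_traceless = np.isclose(np.trace(tau), 0.0)
print(a.real, is_hermitian, is_traceless)
```

The recovered coefficients are real and agree with the ones used to build τ, consistent with τ being Hermitian and trace-free.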
On C2 we have the usual Hermitian inner product ⟨·, ·⟩ that is invariant
with respect to the SU(2) transformation. With

$u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \qquad v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix},$ (3.156)

the inner product is given by

$\langle u, v\rangle = u^\dagger v = u_1^* v_1 + u_2^* v_2.$ (3.157)

We can use it to define an inner product on the values of the fields ϕ:

$\langle\varphi_1, \varphi_2\rangle = \varphi_1^\dagger\varphi_2.$ (3.158)


Here is how to see that this inner product is SU(2)-invariant:

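$$\begin{aligned}
\langle U\varphi_1, U\varphi_2\rangle &= (U\varphi_1)^\dagger(U\varphi_2) \\
&= \varphi_1^\dagger U^\dagger U\varphi_2 \\
&= \varphi_1^\dagger\varphi_2 \\
&= \langle\varphi_1, \varphi_2\rangle.
\end{aligned}$$ (3.159)
This allows us to build a Lagrangian that has the SU(2) transformation as
an internal variational symmetry:
$L = -\left[g^{\alpha\beta}\langle\varphi_{,\alpha}, \varphi_{,\beta}\rangle + m^2\langle\varphi, \varphi\rangle\right], \qquad g_{\alpha\beta} = g^{\alpha\beta} = \mathrm{diag}(-1, 1, 1, 1).$ (3.160)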

This Lagrangian describes a pair of charged KG fields (or a quartet of real

KG fields) with mass m. To see this, we write
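$\varphi = \begin{pmatrix} \varphi^1 \\ \varphi^2 \end{pmatrix},$ (3.161)
and then
$L = -\left[g^{\alpha\beta}\left(\varphi^{1*}_{,\alpha}\varphi^1_{,\beta} + \varphi^{2*}_{,\alpha}\varphi^2_{,\beta}\right) + m^2\left(\varphi^{1*}\varphi^1 + \varphi^{2*}\varphi^2\right)\right].$ (3.162)
Representing the components of ϕ as $\varphi^a$, a = 1, 2, we have the Euler-
Lagrange equations
$E_a = (\Box - m^2)\varphi^a = 0,$ (3.163)
which are equivalent to
$E = (\Box - m^2)\varphi = 0,$ (3.164)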
in our matrix notation. Of course, the complex (or Hermitian) conjugates of
these equations are also field equations.
Just as before, we can use Noether’s theorem to find the current that is
conserved by virtue of the SU(2) symmetry.
Problem:
15. Show that for the symmetry δϕ = iτ ϕ the associated conserved current
is given by
j α = ig αβ (ϕ†,β τ ϕ − ϕ† τ ϕ,β ). (3.165)

Using (3.165) we can build three independent conserved currents correspond-

ing to the three independent symmetry transformations. To see them explic-
itly, one can pick τ to be each of the Pauli matrices. The 3 conserved charges
associated with the SU(2) symmetry are often called “isospin”, stemming
from their use in nuclear and particle physics.

3.19 A general version of Noether’s theorem

Let me briefly indicate, without proof, a more general version of Noether’s
theorem. (This is sometimes called “Noether’s first theorem”.) Given all of
our examples, this theorem should not be very hard to understand.
Consider a field theory described by a set of functions ϕa , a = 1, 2, . . . , m,
and a Lagrangian, viewed as a function on the jet space J k :

L = L(x, ϕa , ∂ϕa , . . . , ∂ k ϕa ). (3.166)

The Euler-Lagrange equations arise via the identity

δL = Ea δϕa + Dα η α (δϕ), (3.167)

where η α (δϕ) is a linear differential operator on δϕa constructed from the

fields ϕa and their derivatives via the usual integration by parts procedure.9
Suppose that there is an infinitesimal transformation,

δϕb = F b (x, ϕa , ∂ϕa , . . . , ∂ l ϕa ) (3.168)

that is a divergence symmetry:

δL = Dα W α , (3.169)

for some W α locally constructed from x, ϕa , ϕa,α , etc. Then the following is
a conserved current:
j α = η α (F ) − W α . (3.170)
Noether’s theorem, as it is conventionally stated (more or less as above),
shows that symmetries of the Lagrangian beget conservation laws. But the
scope of this theorem is actually significantly larger. It is possible to prove
a converse to the result shown above, to the effect that to each conservation
law for a system of Euler-Lagrange equations there is a corresponding sym-
metry of the Lagrangian. Indeed, Noether’s theorem establishes a one-to-one
correspondence between conservation laws and symmetries of the Lagrangian
for a wide class of field theories (including the KG field and its variants that
have been discussed up until now). There is even more than this! But it is
time to move on. . . .
9
There is an ambiguity in the definition of η α here which we shall ignore for now to keep
things simple. We will confront it when we study conservation laws in electromagnetism.

3.20 “Trivial” conservation laws

For any field theory there are two ways to construct conserved currents that
are in some sense “trivial”. The first is to suppose that we have a conserved
current that happens to vanish when the field equations hold. For example,
in the KG theory we could use

$j^\alpha = (\Box - m^2)\partial^\alpha\varphi.$ (3.171)

It is easy to check that this current is conserved. It is even easier to check

that this current is completely uninteresting since it vanishes for any solution
of the field equations. The triviality of such conservation laws also can be
seen by constructing the conserved charge in a region by integrating j 0 over
a volume. Of course, when you try to substitute a solution of the equations
of motion into j 0 so as to perform the integral you get zero. Thus you end
up with the trivial statement that zero is conserved.
The second kind of “trivial” conservation law arises as follows. Suppose
we create an antisymmetric tensor field of type $\binom{2}{0}$ locally from the fields and their
derivatives:
S αβ = −S βα . (3.172)
For example, in KG theory we could use

$S^{\alpha\beta} = k^\alpha\varphi^{,\beta} - k^\beta\varphi^{,\alpha},$ (3.173)

where k α = k α (x) is any given vector field on spacetime. Now make a current
via
j α = Dβ S αβ . (3.174)
It is easy to check that such currents are always conserved, irrespective of
field equations, because the order of differentiation is immaterial:

Dα j α = Dα Dβ S αβ = Dβ Dα S αβ = −Dβ Dα S βα = −Dα j α =⇒ Dα j α = 0.
(3.175)
These sorts of conservation laws are “trivial” because they do not really
reflect properties of the field equations but rather simple derivative identities
analogous to the fact that the divergence of the curl is zero, or that the curl
of the gradient is zero. Indeed, the current above is divergence free for any
function ϕ(x), whether or not it satisfies any field equations.
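You can watch this identity at work with SymPy. The sketch below takes the example (3.173) with a completely arbitrary function ϕ and an arbitrary vector field k^α, builds the current (3.174), and computes its divergence without using any field equations. The function names k0..k3 are placeholders of my own.

```python
import sympy as sp

coords = sp.symbols("t x y z", real=True)
g_inv = sp.diag(-1, 1, 1, 1)

phi = sp.Function("phi")(*coords)                        # arbitrary, not a solution
k = [sp.Function(f"k{a}")(*coords) for a in range(4)]    # arbitrary vector field

# phi^{,alpha} = g^{alpha beta} phi_{,beta}
dphi_up = [sum(g_inv[a, b] * sp.diff(phi, coords[b]) for b in range(4))
           for a in range(4)]

# S^{alpha beta} = k^alpha phi^{,beta} - k^beta phi^{,alpha}, eq. (3.173)
S = sp.Matrix(4, 4, lambda a, b: k[a] * dphi_up[b] - k[b] * dphi_up[a])

# j^alpha = D_beta S^{alpha beta}, eq. (3.174)
j = [sum(sp.diff(S[a, b], coords[b]) for b in range(4)) for a in range(4)]

# D_alpha j^alpha, eq. (3.175)
divergence = sp.expand(sum(sp.diff(j[a], coords[a]) for a in range(4)))
print(divergence)
```

Every term cancels in pairs purely because mixed partial derivatives commute and S is antisymmetric.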

It is also possible to understand this second kind of triviality from the
point of view of the conserved charge
$Q_V = \int_V d^nx\; j^0.$ (3.176)

I am setting the dimension of space to be n. Here, in order to perform

the integral, it is understood that the current has been evaluated on some
field configuration, i.e., ϕ = ϕ(x). Keep this in mind as we proceed. For a
trivial conservation law arising as the divergence of an antisymmetric tensor
we can integrate by parts, i.e., use the divergence theorem, to express QV as
an area integral over the boundary B of V :
$Q_V = \int_V d^nx\; \partial_\beta S^{0\beta} = \int_B d^{n-1}A\; n_i S^{0i}.$ (3.177)

Here dn−1 A is the area element of the boundary, n is the covariant unit
normal to the boundary, and i = 1, 2, . . . , n. From the continuity equation,
the time rate of change of QV arises from the flux through B:
$\frac{d}{dt}Q_V = -\int_B d^{n-1}A\; n_i j^i.$ (3.178)

But because this continuity equation is an identity (rather than holding by
virtue of field equations) this relationship is tautological. To see this, we
write:
$-\int_B d^{n-1}A\; n_i j^i = -\int_B d^{n-1}A\; n_i\left(S^{i0}{}_{,0} + S^{ij}{}_{,j}\right) = \frac{d}{dt}\int_B d^{n-1}A\; n_i S^{0i} = \frac{d}{dt}Q_V,$ (3.179)
where I used (1) the divergence theorem, and (2) a straightforward application
of Stokes’ theorem in conjunction with the fact that ∂B = ∂∂V = ∅ to
get
$\int_B d^{n-1}A\; n_i S^{ij}{}_{,j} = 0.$ (3.180)

Thus the conservation law is really just saying that $\frac{dQ_V}{dt} = \frac{dQ_V}{dt}$.
Another way to view this kind of trivial conservation law is to note that,
from (3.177), the conserved charge is really just a function of the values of
the field on the boundary of the region V and has nothing to do with the
state of the field in the interior of V . Again, the charge is conserved whether
or not any field equations are satisfied.

Problem:

16. Let S be a two dimensional surface in Euclidean space with unit normal
~n and boundary curve C with tangent d~l. Show that
$\int_S d^2S\; n_i S^{ij}{}_{,j} = \frac{1}{2}\int_C \vec V\cdot d\vec l,$ (3.181)

where
$V^i = \frac{1}{2}\epsilon^{ijk}S_{jk}.$ (3.182)

We have seen there are two kinds of conservation laws that are in some
sense trivial. We can combine these two kinds of triviality. So, for example,
the current
$j^\alpha = D_\beta\left(k^{[\beta}(x) D^{\alpha]}\varphi\right) + (\Box\varphi - m^2\varphi)D^\alpha\varphi$ (3.183)
is trivial.
We can summarize our discussion with a formal definition. We say that
a conservation law j α is trivial if there exists a skew-symmetric tensor field
S αβ – locally constructed from the fields and their derivatives – such that

j α = Dβ S αβ , modulo the field equations. (3.184)

Given a conservation law j α (trivial or non-trivial) we see that we have

the possibility to redefine it by adding a trivial conservation law. Thus given
one conservation law there are infinitely many others “trivially” related to it.
In particular, this means that, without some other criteria to choose among
these conservation laws, there is no unique notion of “charge density” ρ = j 0
since one can change the form of this quantity quite a bit by adding in a trivial
conservation law. Furthermore, without some specific boundary conditions,
there is no unique choice of the total charge contained in a region. Usually,
in a physical application of these ideas, there are additional criteria and
specific boundary conditions that largely – if not completely – determine the
choice of charge density and charge in a region. (If not, then the physicist
should find these criteria!) Mathematically speaking, the optimal way to
view conservation laws is really in terms of equivalence classes, with two
conservation laws being equivalent if they differ by a trivial conservation
law.

3.21 Conservation laws in terms of differential forms
Let me briefly describe a particularly nice way to think about conservation
laws, which is in terms of differential forms. If you don’t know about dif-
ferential forms, you could give this section a miss. But consider spending a
little time learning about differential forms when you have a chance.
For simplicity, I will stick to four spacetime dimensions. On our four
dimensional spacetime, the vector field j α can be converted to a 1-form ω =
ωα dxα using the metric:
ωα = gαβ j β . (3.185)
This 1-form can be converted to a 3-form,
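$*\omega = \frac{1}{3!}(*\omega)_{\alpha\beta\gamma}\, dx^\alpha\wedge dx^\beta\wedge dx^\gamma,$ (3.186)
using the Hodge dual
$(*\omega)_{\alpha\beta\gamma} = g^{\mu\delta}\epsilon_{\alpha\beta\gamma\delta}\,\omega_\mu.$ (3.187)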
If the vector field on spacetime j is divergence free modulo the field equations,
this is equivalent to ∗ω being closed modulo the field equations:

d(∗ω) = 0, modulo the field equations. (3.188)

Keep in mind that ∗ω is really a 3-form locally constructed from the field
and its derivatives, that is, it is a 3-form-valued function on the jet space
for the theory. The exterior derivative in (3.188) is a total derivative. As
you know, an exact 3-form is of the form dβ for some 2-form β. If there is a
2-form β locally constructed from the fields such that

∗ ω = dβ modulo the field equations, (3.189)

then clearly ∗ω is closed modulo the field equations. This is just the differ-
ential form version of a trivial conservation law. Indeed, the anti-symmetric
tensor field that is the “potential” for the conserved current is given by
$S^{\alpha\beta} = \frac{1}{2}\epsilon^{\alpha\beta\gamma\delta}\beta_{\gamma\delta}.$ (3.190)
Let me mention and dispose of a common point of confusion concerning
trivial conservation laws. This point of confusion is why I felt compelled to

occasionally stick in the phrase “locally constructed from the field” in the
discussion above. For simplicity, I will use the flat metric and Cartesian
coordinates on the spacetime manifold M = R4 in what follows. To expose
the potential point of confusion, let me remind you of the following standard
result from multivariable calculus. Let V α be a vector field on Minkowski
space, expressed in the usual inertial Cartesian coordinates. V α is not to be
viewed as locally constructed from the field, except in the trivial sense that it
does not depend upon the fields at all, only the spacetime point, V α = V α (x).
If V α is divergence free,
∂α V α = 0, (3.191)
then there exists an antisymmetric tensor field S αβ such that

V α = ∂β S αβ . (3.192)

This is just the dual statement to the well-known fact that all closed 3-forms
(indeed, all closed forms of degree higher than 0) on R4 are exact i.e., the De
Rham cohomology of R4 is trivial. This result might tempt you to conclude
that all conservation laws are trivial! Unlike the case in real life, you should
not give in to temptation here. There are two reasons. First, a conservation
law should not be viewed as just a single divergence-free vector field on the
spacetime manifold M . A conservation law is a formula which assigns a
divergence-free vector field to each solution of the field equations. Each field
configuration will, in principle, define a different conserved current. Second,
as we have been saying, each of the (infinite number of) divergence-free vector
fields is to be locally constructed from the fields, i.e., is a function on jet space
(rather than just x space). Put differently, the conserved current at a point
x is required to depend upon the values of the fields and their derivatives at
the point x. The correct notion of triviality is that a conserved current j α is
trivial if for each field configuration it is (modulo the field equations) always a
divergence of a skew tensor field S αβ that is itself locally constructed from the
fields. If we take a conservation law and evaluate it on a particular solution
to the field equations, then we end up with a divergence-free vector field on
M (or a closed 3-form on M , if you prefer). If M = R4 we can certainly write
this vector field as the divergence of an antisymmetric tensor on M (or as the
exterior derivative of a 2-form on M ). But the point is that for non-trivial
conservation laws there is no way to construct all the antisymmetric tensors
(2-forms) for all possible field configurations using a local formula in terms of
the fields and their derivatives. So, while conservation laws are in many ways

like de Rham cohomology (closed modulo exact forms on M ), they actually
represent a rather different kind of cohomology.10 Sometimes this kind of
cohomology is called “local cohomology” or “Euler-Lagrange cohomology”.
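As a quick check of why any current of the form (3.192) is automatically divergence-free, the following short computation may help. It assumes only that $S^{\alpha\beta}$ is antisymmetric and that total derivatives commute; the field equations are not used.

```latex
D_\alpha V^\alpha
  = D_\alpha D_\beta S^{\alpha\beta}
  = \tfrac{1}{2}\left(D_\alpha D_\beta + D_\beta D_\alpha\right) S^{\alpha\beta}
  = \tfrac{1}{2}\, D_\alpha D_\beta \left(S^{\alpha\beta} + S^{\beta\alpha}\right)
  = 0 .
```

The third equality is just a relabeling of the summed indices in the second term.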
Finally, as I have mentioned without explanation here and there via foot-
notes, there is some ambiguity in the definition of the vector field which
appears in the divergence term in the variation of the Lagrangian density
(see e.g., V in (3.68)). In light of our definition of trivial conservation laws,
I think you can easily see what that ambiguity is. Namely, the variational
identity only determines this vector field up to addition of the divergence of
a skew tensor (locally constructed from the fields and field variations). This
ambiguity affects the Noether theorem formula (3.170) for the conserved
current as follows. Using two different choices for the vector field in the di-
vergence term in (3.167) one finds that the two resulting conservation laws,
each constructed via (3.170), differ by a trivial conservation law.

Problem:
17. Consider two possible choices, η1 and η2 , for η defined in (3.167). They
are related by
$$ \eta_2^\alpha = \eta_1^\alpha + D_\beta S^{\alpha\beta}, \tag{3.193} $$
for some skew tensor S locally constructed from the fields and field
variations. Suppose there is a divergence symmetry of the Lagrangian.
Show that the conserved currents j1 and j2 , constructed via (3.170)
using η1 and η2 , respectively, differ by a trivial conservation law.

3.22 PROBLEMS

1. Derive (3.10) from the continuity equation.

2. What becomes of conservation of energy for the KG field when an
external source is present, as in (2.77)? How do you physically interpret
this state of affairs?

3. Verify that the currents (3.23), (3.24) are conserved. (If you like, you
can just fix a value for i, say, i = 1 and check that $j_1^\alpha$ is conserved.)

¹⁰ One says the conservation laws are “horizontally” closed forms on the jet space.

4. Show that the six currents (3.36) are conserved. (Hint: Don’t panic!
This is actually the easiest one to prove so far, since you can use

$$ g^{\beta\gamma} D_\gamma T_{\alpha\beta} = \phi_{,\alpha}\,(\Box - m^2)\phi, $$

which we have already established.)

5. Consider the real scalar field with the double well self-interaction poten-
tial (2.81). Show that ϕ̂ = −ϕ is a symmetry. Consider the 3 constant
solutions to the field equations (which you found in the problem just
after (2.81)) and check that this symmetry maps these solutions to
solutions.

6. Let $\hat\phi = F(\phi)$ be a variational symmetry of a Lagrangian. Show that it
maps solutions of the Euler-Lagrange equations to new solutions, that
is, if $\phi$ is a solution, so is $\hat\phi$. If you like, you can restrict your attention
to Lagrangians $L(x, \phi, \partial\phi)$.

7. Show that the scaling transformation (3.50) is neither a variational
symmetry nor a divergence symmetry for the KG Lagrangian.

8. Verify that the Noether currents associated with a spacetime translation
do yield the energy-momentum tensor.

9. Derive (3.110).

10. Consider the KG theory with m = 0. Show that the transformation

$$ \phi_\lambda = \phi + \lambda $$

is a variational symmetry. Use Noether’s theorem to find the conserved
current and conserved charge.

11. Show that the Lagrangian (3.124) is the sum of the Lagrangians for
two (real-valued) KG fields $\phi_1$ and $\phi_2$ with $m_1 = m_2$ and with the
identification
$$ \phi = \frac{1}{\sqrt{2}}\,(\phi_1 + i\phi_2). $$

13. Verify directly from (3.136) that

$$ D_\alpha j^\alpha = 0, $$
when the field equations for $\phi$ and $\phi^*$ are satisfied.

14. The double well potential can be generalized to the U (1)-symmetric

15. Show that the SU(2) symmetry $\delta\phi = i\tau\phi$ of (3.160) has an associated
conserved current given by
$$ j^\alpha = i\, g^{\alpha\beta}\left(\phi^\dagger_{,\beta}\,\tau\,\phi - \phi^\dagger\,\tau\,\phi_{,\beta}\right). $$

16. Let S be a two dimensional surface in Euclidean space with unit normal
$\vec n$ and boundary curve C with tangent $d\vec l$. Show that
$$ \int_S d^2S\; n_i\, S^{ij}{}_{,j} = \oint_C \vec V \cdot d\vec l, \tag{3.194} $$
where
$$ V^i = \frac{1}{2}\,\epsilon^{ijk} S_{jk}. \tag{3.195} $$

17. Consider two possible choices, η1 and η2 , for η defined in (3.167). They
are related by
$$ \eta_2^\alpha = \eta_1^\alpha + D_\beta S^{\alpha\beta}, \tag{3.196} $$
for some skew tensor S locally constructed from the fields and field
variations. Suppose there is a divergence symmetry of the Lagrangian.
Show that the conserved currents j1 and j2 , constructed via (3.170)
using η1 and η2 , respectively, differ by a trivial conservation law.
Chapter 4

The Hamiltonian formulation

The Hamiltonian formulation of dynamics is in many ways the most elegant,
powerful, and geometric approach. For example, the relation between
symmetries and conservation laws becomes an identity in the Hamiltonian
formalism. Another important motivation comes from quantum theory: one
uses elements of the Hamiltonian formalism to construct quantum theories
that have the original dynamical system as a classical limit. Such features
provide ample motivation for a brief foray into the application of Hamilto-
nian techniques in field theory. To begin, it is worth reminding you how we
do classical mechanics from the Hamiltonian point of view.

4.1 Review of the Hamiltonian formulation of mechanics

Probably you are familiar with the Hamiltonian formulation of mechanics in
the following form. There is a set of “canonical coordinates and momenta”,
$(q^i, p_i)$, $i = 1, 2, \ldots, n$, where n is the number of “degrees of freedom” of the
mechanical system. The space parametrized by (q, p) is the “phase space”
of the system. There is a function on phase space (and maybe time),
H = H(q, p, t), known as the “Hamiltonian”, which defines the equations of motion
via curves in the phase space according to a set of 2n first-order ODEs:

$$ \dot q^i(t) = \frac{\partial H}{\partial p_i}\bigg|_{\substack{q=q(t)\\ p=p(t)}}, \qquad \dot p_i(t) = -\frac{\partial H}{\partial q^i}\bigg|_{\substack{q=q(t)\\ p=p(t)}}. \tag{4.1} $$

These are “Hamilton’s equations of motion”.
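
To see Hamilton's equations (4.1) in action, here is a minimal numerical sketch. It assumes a one-dimensional harmonic oscillator with illustrative values m = 1 and ω = 2, and integrates the equations with a standard fourth-order Runge-Kutta step; the function names are hypothetical and not part of the text.

```python
import math

def hamiltonian(q, p, m=1.0, omega=2.0):
    """Illustrative oscillator Hamiltonian H = p^2/(2m) + m omega^2 q^2/2."""
    return p * p / (2.0 * m) + 0.5 * m * omega**2 * q * q

def dH_dq(q, p, m=1.0, omega=2.0):
    return m * omega**2 * q

def dH_dp(q, p, m=1.0, omega=2.0):
    return p / m

def rhs(q, p):
    """Right-hand side of (4.1): (qdot, pdot) = (dH/dp, -dH/dq)."""
    return dH_dp(q, p), -dH_dq(q, p)

def rk4_step(q, p, dt):
    """One classical Runge-Kutta step for Hamilton's equations."""
    k1q, k1p = rhs(q, p)
    k2q, k2p = rhs(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = rhs(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = rhs(q + dt * k3q, p + dt * k3p)
    q_new = q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6.0
    p_new = p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
    return q_new, p_new

def evolve(q0, p0, t_final, n_steps):
    """Follow the phase-space curve starting at (q0, p0) up to t_final."""
    dt = t_final / n_steps
    q, p = q0, p0
    for _ in range(n_steps):
        q, p = rk4_step(q, p, dt)
    return q, p

# Start at rest with unit displacement and evolve to t = 1.
q_T, p_T = evolve(1.0, 0.0, t_final=1.0, n_steps=1000)
print(q_T, math.cos(2.0))          # numerical vs. exact q(1) = cos(omega t)
print(hamiltonian(q_T, p_T))       # energy along the curve
```

The printed energy should stay very close to its initial value, since H has no explicit time dependence in this example.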


The Hamiltonian form of the equations of motion is related to the Lagrangian
version as follows. Given a Lagrangian $L = L(q, \dot q, t)$, the canonical
momenta are defined by
$$ p_i = \frac{\partial L}{\partial \dot q^i}. \tag{4.2} $$
One solves this equation for the velocities,

$$ \dot q^i = \dot q^i(q, p, t), \tag{4.3} $$

and defines the Hamiltonian via

$$ H(q, p, t) = p_i\, \dot q^i(q, p, t) - L(q, \dot q(q, p, t), t). \tag{4.4} $$

The Hamilton equations can then be shown to be equivalent to the EL equations
when the relation (4.2) between the momenta and velocities is taken
into account. Notice that the EL equations will be a system of n second-order
ODEs, while the Hamilton equations will be a system of 2n first-order
ODEs.
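To make the steps (4.2)-(4.4) concrete, here is a short worked example. The Lagrangian below, a particle of mass $m$ in a potential $U(q)$, is an illustrative choice and is not taken from the text above.

```latex
\begin{aligned}
L &= \tfrac{1}{2} m \dot q^2 - U(q)
  \quad\Longrightarrow\quad
  p = \frac{\partial L}{\partial \dot q} = m \dot q,
  \qquad \dot q(q,p) = \frac{p}{m}, \\
H(q,p) &= p\,\dot q(q,p) - L\big(q, \dot q(q,p)\big)
        = \frac{p^2}{m} - \frac{p^2}{2m} + U(q)
        = \frac{p^2}{2m} + U(q), \\
\dot q &= \frac{\partial H}{\partial p} = \frac{p}{m},
  \qquad
  \dot p = -\frac{\partial H}{\partial q} = -U'(q)
  \quad\Longrightarrow\quad
  m \ddot q = -U'(q).
\end{aligned}
```

The last line is the EL equation of the same Lagrangian, so the two first-order Hamilton equations reproduce the single second-order EL equation.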
I would now like to introduce you to an elegant, more geometric version
of all this. This fancier presentation has a number of advantages when it
is applied to relativistic field theory, e.g. it allows for “manifest spacetime
covariance”.
In classical mechanics a Hamiltonian system has 2 ingredients: (1) a phase
space, and (2) a function on phase space – the Hamiltonian – defining time
evolution. The phase space is an even-dimensional manifold Γ equipped with
a non-degenerate, closed 2-form Ω. In coordinates $z^i$ on Γ, the 2-form can be
viewed as a tensor field defined by components $\Omega_{ij} = \Omega_{ij}(q, p) = -\Omega_{ji}$ which
form an antisymmetric invertible matrix:

$$ \Omega = 2\,\Omega_{ij}\, dz^i \otimes dz^j = \Omega_{ij}\,(dz^i \otimes dz^j - dz^j \otimes dz^i) \equiv \Omega_{ij}\, dz^i \wedge dz^j. \tag{4.5} $$

The invertibility of the array $\Omega_{ij}$ is equivalent to the non-degeneracy requirement.
To say that Ω is “closed” means that its exterior derivative vanishes,
dΩ = 0. In terms of its components this means

$$ \partial_i \Omega_{jk} + \partial_j \Omega_{ki} + \partial_k \Omega_{ij} = 0. \tag{4.6} $$

The 2-form is called the symplectic form, and the pair (Γ, Ω) is known mathematically
as a symplectic manifold. Physicists call it the phase space or state
space for the mechanical system. The dimensionality of Γ is 2n, where the
integer n is the number of degrees of freedom of the system.
The second ingredient in a Hamiltonian system is a function, H : Γ → R,
called the Hamiltonian. The Hamiltonian defines how states of the system
evolve in time. Given coordinates $z^i$ for Γ, the dynamical evolution of the
mechanical system is a curve $z^i = z^i(t)$ in the phase space defined by the
ordinary differential equations:

$$ \dot z^i(t) = \Omega^{ij}\, \frac{\partial H}{\partial z^j}\bigg|_{z=z(t)}, \tag{4.7} $$

where $\Omega^{ij}$ are the components of the inverse symplectic form. The ODEs
(4.7) are known as Hamilton’s equations of motion. More generally, given
any function G : Γ → R, there is an associated foliation of Γ by curves
$z^i = z^i(s)$ defined by
$$ \dot z^i(s) = \Omega^{ij}\, \frac{\partial G}{\partial z^j}\bigg|_{z=z(s)}. \tag{4.8} $$

The probably more familiar textbook presentation of the Hamiltonian
formalism emerges via the Darboux theorem. This theorem asserts that,
given a closed, non-degenerate 2-form such as Ω, there exist coordinates,
$z^i = (q^a, p_b)$, $a, b = 1, 2, \ldots, n$, such that Ω can be put into a standard form:

$$ \Omega = dp_a \otimes dq^a - dq^a \otimes dp_a \equiv dp_a \wedge dq^a. \tag{4.9} $$

You can check that this form of Ω is closed and non-degenerate. The coordinates
(q, p) are called canonical coordinates (or “canonical coordinates
and momenta”). The existence of such coordinates is a direct consequence of
the non-degeneracy of Ω and the condition (4.6). There are infinitely many
canonical coordinate systems.¹ As a nice exercise, you should determine the
components $\Omega_{ij}$ of the symplectic form in canonical coordinates. In canonical
coordinates the Hamilton equations take the familiar textbook form
$$ \dot q^a = \frac{\partial H}{\partial p_a}, \qquad \dot p_a = -\frac{\partial H}{\partial q^a}. \tag{4.10} $$
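
As a minimal numerical sketch of how (4.7) reduces to (4.10), the snippet below takes one degree of freedom, orders the coordinates as z = (q, p), and assumes that the inverse symplectic form has the constant components [[0, 1], [-1, 0]] in these coordinates (the choice consistent with (4.10)). The anharmonic Hamiltonian and the helper names are hypothetical, chosen only for illustration.

```python
# Assumed components Omega^{ij} of the inverse symplectic form in the
# canonical ordering z = (q, p); chosen to agree with (4.10).
OMEGA_INV = [[0.0, 1.0], [-1.0, 0.0]]

def gradient(func, z, h=1e-6):
    """Central-difference approximation to the partial derivatives of func at z."""
    grad = []
    for i in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[i] += h
        z_minus[i] -= h
        grad.append((func(z_plus) - func(z_minus)) / (2.0 * h))
    return grad

def hamiltonian_vector_field(func, z):
    """zdot^i = Omega^{ij} d(func)/dz^j, the right-hand side of (4.7)."""
    grad = gradient(func, z)
    n = len(z)
    return [sum(OMEGA_INV[i][j] * grad[j] for j in range(n)) for i in range(n)]

def H(z):
    """Hypothetical anharmonic oscillator, H = p^2/2 + q^4/4."""
    q, p = z
    return 0.5 * p * p + 0.25 * q**4

# At (q, p) = (0.5, -1.5) one expects qdot = p and pdot = -q^3.
zdot = hamiltonian_vector_field(H, [0.5, -1.5])
print(zdot)
```

Replacing H by any other function G on phase space gives the tangent to the curves in (4.8) in the same way.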

¹ They are all related by “canonical transformations”.

Problem:

1. Derive (4.10) from (4.7) and (4.9).

Given an action principle and/or a non-degenerate Lagrangian (see below
for the meaning of “non-degenerate”), there is a canonical construction of an
associated Hamiltonian system. It is worth seeing how this goes.
Let the configuration variables and velocities be denoted by $(q^a, \dot q^a)$. For
simplicity, we only exhibit this construction in the familiar situation where
$L = L(q, \dot q)$. Recall the variational identity:
$$ \delta L = E_a\, \delta q^a + \frac{d}{dt}\theta, \tag{4.11} $$
where $E_a$ is the Euler-Lagrange formula,
$$ E_a = \frac{\partial L}{\partial q^a} - \frac{d}{dt}\frac{\partial L}{\partial \dot q^a}, \tag{4.12} $$
and
$$ \theta = \frac{\partial L}{\partial \dot q^a}\, \delta q^a \tag{4.13} $$
represents the “boundary term” which arises via integration by parts in the
associated variational principle. The quantity θ is a function of $q^a$, $\dot q^a$ and
depends linearly upon $\delta q^a$.
We will construct the Hamiltonian formalism from these ingredients. This
means we will use these data to define the phase space (Γ, Ω) and a Hamil-
tonian H : Γ → R. We do this as follows.
Define Γ to be the set of solutions to the EL equations $E_a = 0$. In our
setting, assuming $L = L(q, \dot q)$ and making the non-degeneracy assumption
$$ \det\left(\frac{\partial^2 L}{\partial \dot q^a\, \partial \dot q^b}\right) \neq 0, \tag{4.14} $$
the EL equations are a quasi-linear system of second order ODEs and we can
expect that the solution space is a 2n-dimensional manifold parametrized by,
e.g., the initial conditions. The points of Γ are thus solutions $q^a(t)$. Tangent
vectors to Γ at a point defined by the solution $q^a(t)$ are then variations $\delta q^a(t)$,
which are solutions to the linear system of equations obtained by linearizing
the EL equations about the solution $q^a(t)$.

Problem:
2. Given $L = L(q, \dot q)$, give a formula for the EL equations linearized about
a solution $q^a = q^a(t)$.

The symplectic form on Γ is constructed from θ using the following geometric
point of view. Think of θ as the value of a 1-form Θ on a tangent
vector $\delta q^a(t)$ to Γ,
$$ \theta = \Theta(\delta q). \tag{4.15} $$
The symplectic 2-form is then constructed via

$$ \Omega = d\Theta. \tag{4.16} $$

In formulas, this prescription says that the value of the symplectic 2-form on
a pair of tangent vectors $\delta_1 q^a(t)$ and $\delta_2 q^a(t)$ at the point of Γ specified by
$q^a(t)$ is given by

$$ \Omega(\delta_1 q, \delta_2 q) = \frac{\partial^2 L}{\partial \dot q^a\, \partial q^b}\left(\delta_1 q^b\, \delta_2 q^a - \delta_1 q^a\, \delta_2 q^b\right) + \frac{\partial^2 L}{\partial \dot q^a\, \partial \dot q^b}\left(\delta_1 \dot q^b\, \delta_2 q^a - \delta_1 q^a\, \delta_2 \dot q^b\right), \tag{4.17} $$
where it is understood that all expressions involving $q(t)$ and $\dot q(t)$ are evaluated
at the given solution, and that all instances of $\delta q(t)$ are evaluated at
solutions of the equations of motion linearized about $q(t)$.
The formula (4.17) appears to depend upon the time at which the solutions
and the linearized solutions are evaluated. But it can be shown that Ω
does not depend upon t by virtue of the EL equations satisfied by $q^a(t)$ and
their linearization satisfied by $\delta q^a(t)$.
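
The time-independence of (4.17) can be spot-checked numerically in a simple case. The sketch below assumes a harmonic oscillator Lagrangian L = m q̇²/2 − m ω² q²/2, for which the mixed derivative in (4.17) vanishes and the second derivative with respect to the velocity is just m. Because the EL equation is linear, its linearization is the same equation, so each tangent vector is itself an oscillator solution. The parameter values and names are hypothetical.

```python
import math

M, OMEGA = 1.3, 0.7   # hypothetical mass and frequency

def solution(amplitudes, t):
    """Return (q, qdot) at time t for q(t) = A cos(wt) + B sin(wt)."""
    a, b = amplitudes
    c, s = math.cos(OMEGA * t), math.sin(OMEGA * t)
    return a * c + b * s, OMEGA * (-a * s + b * c)

def symplectic_form(var1, var2, t):
    """Evaluate (4.17) at time t; only the d2L/dqdot2 = m term survives here."""
    dq1, dqdot1 = solution(var1, t)
    dq2, dqdot2 = solution(var2, t)
    return M * (dqdot1 * dq2 - dq1 * dqdot2)

# Two linearized solutions, each labeled by its (A, B) amplitudes.
var1 = (1.0, 0.5)
var2 = (-0.3, 2.0)

# Evaluate the 2-form at several unrelated times.
values = [symplectic_form(var1, var2, t) for t in (0.0, 0.4, 3.0, 10.0)]
print(values)
```

All entries of the printed list should agree up to rounding error, which is the content of the claim above in this special case.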

Problem:

3. Show that the symplectic form given in (4.17) does not depend upon
time. (Hint: Differentiate (4.17) with respect to time and then use the
EL equations and their linearization.)

Finally, we need to define a Hamiltonian and recover Hamilton’s equations.
Given a function H on Γ, Hamilton’s equations specify a family of
curves on Γ by specifying their tangent vector field X using the relation
(4.7), which can be written as:

$$ X = \Omega^{-1}(\cdot, dH). \tag{4.18} $$

Since we are currently viewing the phase space as the set of solutions to
the EL equations, the Hamiltonian should define one parameter families of
solutions corresponding to translations in time of those solutions. For simplicity
in what follows, I will suppose we are working with a system whose
Lagrangian is not explicitly dependent upon time, $L = L(q, \dot q)$.² I will now
show how the canonical energy function,
$$ E = \frac{\partial L}{\partial \dot q^i}\, \dot q^i - L, \tag{4.19} $$
will implement (4.18) for time translations.
To begin, it is a little easier to rewrite (4.18) as

$$ \Omega(\delta q, X) = dE(\delta q), \tag{4.20} $$

where δq is any solution to the linearized equations and X is the linearized
solution defined by infinitesimal time translations:³

$$ X = \dot q^a. \tag{4.21} $$

Using (4.17), we compute:

$$ \Omega(\delta q, X) = \frac{\partial^2 L}{\partial \dot q^a\, \partial q^b}\left(\dot q^a\, \delta q^b - \dot q^b\, \delta q^a\right) + \frac{\partial^2 L}{\partial \dot q^a\, \partial \dot q^b}\left(\delta \dot q^b\, \dot q^a - \delta q^a\, \ddot q^b\right). \tag{4.22} $$
Let us compare this with (after some simplifications!)
$$ dE(\delta q) = \frac{\partial^2 L}{\partial \dot q^a\, \partial q^b}\, \dot q^a\, \delta q^b + \frac{\partial^2 L}{\partial \dot q^a\, \partial \dot q^b}\, \dot q^a\, \delta \dot q^b - \frac{\partial L}{\partial q^a}\, \delta q^a. \tag{4.23} $$
The first two terms appear as the first and third terms in (4.22). Since the
EL equations are satisfied by $q^a(t)$, the third term is
$$ -\frac{\partial L}{\partial q^a}\, \delta q^a = -\delta q^a\, \frac{d}{dt}\frac{\partial L}{\partial \dot q^a} = -\delta q^a \left(\frac{\partial^2 L}{\partial \dot q^a\, \partial q^b}\, \dot q^b + \frac{\partial^2 L}{\partial \dot q^a\, \partial \dot q^b}\, \ddot q^b\right). \tag{4.24} $$
This matches the second and fourth terms in (4.22), and (4.20) is verified.

² Otherwise, time translations will not map solutions to solutions, and a more sophisticated
approach must be used.

³ Notice that if the Lagrangian depends explicitly upon time, $L = L(q, \dot q, t)$, then $\dot q^a$
will not in general define a solution to the linearized equations and thus does not define a
tangent vector to phase space. Dealing with this case would take us too far afield.

An equivalent way to think about time evolution makes use of an identification
of the phase space with the space of initial data for the solutions.
This is the most common approach found in textbooks. Since we have chosen
$L = L(q, \dot q)$ the EL equations are second-order ODEs whose solution space is
parametrized by initial data at some time $t_0$ which we denote $(q^a(t_0), \dot q^a(t_0))$.
We assume that there is a bijective correspondence between solutions and
initial conditions; we can therefore identify Γ with the initial conditions. A
tangent vector at a point of Γ is then a pair $(\delta q(t_0), \delta \dot q(t_0))$. It is then easy
to restrict Θ and Ω to $t = t_0$ and interpret them as forms on the space of
initial data. Henceforth we drop the notation pertaining to $t_0$. If we define
$$ p_a = \frac{\partial L}{\partial \dot q^a}, \tag{4.25} $$
then
$$ \Theta = p_a\, dq^a, \qquad \Omega = dp_a \wedge dq^a. \tag{4.26} $$
So, if we make a change of variables $(q, \dot q) \to (q, p)$ on Γ via (4.25) then
$(q^a, p_a)$ are canonical coordinates and momenta for Γ. The Hamiltonian
which generates time evolution is then given by the canonical energy formula:⁴
$$ H = p_a\, \dot q^a - L, \tag{4.27} $$
where it is understood that H = H(q, p) – all velocities $\dot q^a$ being eliminated
in terms of momenta by the inverse formula $\dot q^a = \dot q^a(q, p)$ to (4.25).⁵ It is
easy to check that the relation (4.18) now yields (4.10), and with the choice
(4.27) the Hamilton equations are equivalent to the original EL equations.

Problem:
4. A harmonic oscillator is defined by the Lagrangian and EL equations
$$ L = \frac{1}{2} m \dot q^2 - \frac{1}{2} m \omega^2 q^2, \qquad \ddot q(t) = -\omega^2 q(t). \tag{4.28} $$
What does the symplectic structure look like when the space of solutions
to the EL equations is parametrized according to the following
formulas?

(a) $q(t) = A\cos(\omega t) + B\sin(\omega t)$

(b) $q(t) = a\, e^{-i\omega t} + a^*\, e^{i\omega t}$

⁴ We denote the Hamiltonian by H to distinguish it from E since the former is a
function of (q, p) while the latter is a different function of $(q, \dot q)$.

⁵ The non-degeneracy condition (4.14) ensures the local existence of an inverse.

4.2 Hamiltonian formulation of the scalar field

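As we saw earlier, the Lagrangian density for the scalar field with self-interaction
potential V is given by
$$ \mathcal{L} = \frac{1}{2}\left(\phi_{,t}^2 - (\nabla\phi)^2 - m^2\phi^2\right) - V(\phi), \tag{4.29} $$
and the Lagrangian is
$$ L = \int_{\mathbb{R}^3} d^3x\, \mathcal{L}. \tag{4.30} $$
As shown earlier, the variational identity is

$$ \delta\mathcal{L} = \left(-\phi_{,tt} + \nabla^2\phi - m^2\phi - V'(\phi)\right)\delta\phi + D_\alpha W^\alpha, \tag{4.31} $$

where we can choose

$$ W^0 = \phi_{,t}\,\delta\phi, \qquad W^i = -\phi_{,i}\,\delta\phi. \tag{4.32} $$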

We can now mimic the construction of the Hamiltonian formalism from the
previous section.
We let the phase space Γ consist of a suitable⁶ function space of solutions
ϕ to the EL equations,

$$ \Box\phi - m^2\phi - V'(\phi) = 0. \tag{4.33} $$

Assuming the solutions vanish at spatial infinity, the variational identity for
the Lagrangian evaluated on Γ reads:
$$ \delta L = \frac{d}{dt}\int_{\mathbb{R}^3} d^3x\, W^0. \tag{4.34} $$
We define a 1-form Θ by its linear action on δϕ, a tangent vector to Γ:
$$ \Theta(\delta\phi) = \int_{\mathbb{R}^3} d^3x\, W^0 = \int_{\mathbb{R}^3} d^3x\, \phi_{,t}\,\delta\phi. \tag{4.35} $$

⁶ “Suitable” could be, for example, smooth solutions to the field equations with compactly
supported initial data.

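From (4.16), the symplectic 2-form evaluated on a pair of tangent vectors
$\delta_1\phi$, $\delta_2\phi$ at a solution ϕ to (4.33) is given by
$$ \Omega(\delta_1\phi, \delta_2\phi) = \int_{\mathbb{R}^3} d^3x\, \left(\delta_2\phi_{,t}\, \delta_1\phi - \delta_1\phi_{,t}\, \delta_2\phi\right). \tag{4.36} $$

To calculate the symplectic structure using (4.36) we have to pick a time
t at which to perform the volume integral. Indeed, it would appear that we
actually have defined many 2-forms – one for each value of t. Let us show
that the symplectic form given in (4.36) does not in fact depend upon which
time one chooses to perform the volume integral. Take a time derivative to
get
$$ \begin{aligned} \frac{d}{dt}\Omega(\delta_1\phi, \delta_2\phi) &= \int_{\mathbb{R}^3} d^3x\, \left(\delta_2\phi_{,tt}\, \delta_1\phi - \delta_1\phi_{,tt}\, \delta_2\phi + \delta_2\phi_{,t}\, \delta_1\phi_{,t} - \delta_1\phi_{,t}\, \delta_2\phi_{,t}\right) \\ &= \int_{\mathbb{R}^3} d^3x\, \left(\delta_2\phi_{,tt}\, \delta_1\phi - \delta_1\phi_{,tt}\, \delta_2\phi\right). \end{aligned} \tag{4.37} $$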

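The tangent vectors to Γ each satisfy the linearization of equation (4.33)
about a solution ϕ:

$$ \Box\,\delta\phi - m^2\,\delta\phi - V''(\phi)\,\delta\phi = 0. \tag{4.38} $$

This means, in particular, that

$$ \delta\phi_{,tt} = \nabla^2\delta\phi - \left(m^2 + V''(\phi)\right)\delta\phi. \tag{4.39} $$

Using this relation for each linearized solution in (4.37) we get

$$ \begin{aligned} \frac{d}{dt}\Omega(\delta_1\phi, \delta_2\phi) &= \int_{\mathbb{R}^3} d^3x\, \left[(\nabla^2\delta_2\phi)\,\delta_1\phi - (\nabla^2\delta_1\phi)\,\delta_2\phi\right] \\ &= \int_{\mathbb{R}^3} d^3x\, \left(\nabla\delta_1\phi\cdot\nabla\delta_2\phi - \nabla\delta_2\phi\cdot\nabla\delta_1\phi\right) \\ &= 0. \end{aligned} \tag{4.40} $$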

To get this result I used the divergence theorem and the requirement that
the solutions and their linearization vanish asymptotically.
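
The conservation of the symplectic form can also be observed in a crude lattice model. The sketch below is only an illustration under several assumptions that are not in the text: one space dimension, a periodic lattice in place of decay at infinity, the free field (V = 0, so the linearized equation coincides with the field equation), and a leapfrog time step. Since each leapfrog sub-step is a linear shear built from a symmetric operator, the discrete analogue of (4.36) should be preserved up to rounding error.

```python
import math

N, DX, DT, MASS = 64, 0.1, 0.01, 1.0   # illustrative lattice parameters

def force(phi):
    """Lattice version of laplacian(phi) - m^2 phi with periodic boundaries."""
    return [(phi[(i + 1) % N] - 2.0 * phi[i] + phi[i - 1]) / DX**2
            - MASS**2 * phi[i] for i in range(N)]

def leapfrog(phi, pi, steps):
    """Evolve (phi, pi) by kick-drift-kick steps of size DT."""
    phi, pi = list(phi), list(pi)
    for _ in range(steps):
        f = force(phi)
        pi = [p + 0.5 * DT * fi for p, fi in zip(pi, f)]
        phi = [x + DT * p for x, p in zip(phi, pi)]
        f = force(phi)
        pi = [p + 0.5 * DT * fi for p, fi in zip(pi, f)]
    return phi, pi

def omega(var1, var2):
    """Discrete analogue of (4.36), with the time derivative of phi stored as pi."""
    (dphi1, dpi1), (dphi2, dpi2) = var1, var2
    return DX * sum(dpi2[i] * dphi1[i] - dpi1[i] * dphi2[i] for i in range(N))

# Two linearized solutions specified by their initial data.
var1 = ([math.exp(-((i - 20) * DX) ** 2) for i in range(N)], [0.0] * N)
var2 = ([0.0] * N, [math.exp(-((i - 24) * DX) ** 2) for i in range(N)])

omega_initial = omega(var1, var2)
omega_final = omega(leapfrog(*var1, 500), leapfrog(*var2, 500))
print(omega_initial, omega_final)
```

The two printed numbers should agree to many digits even though the individual field configurations have changed substantially over the 500 steps.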
Up to this point the phase space has been defined implicitly inasmuch as
we have not given an explicit parametrization of the set of solutions to the
field equations. There are various ways to parametrize the space of solutions
depending upon how much analytic control you have over this space. Let
us use the most traditional parametrization, which relies upon the existence
of a well-posed initial value problem: for every pair of functions on $\mathbb{R}^3$,
$(\varphi(\vec x), \pi(\vec x))$, there is uniquely determined a solution ϕ(x) of the field equations
such that at a given initial time $t = t_0$:
$$ \phi(t_0, \vec x) = \varphi(\vec x), \qquad \partial_t \phi(t_0, \vec x) = \pi(\vec x). \tag{4.41} $$
We can thus view a point in Γ as just a pair of functions (φ, π) on $\mathbb{R}^3$. Using
this parametrization of solutions, from (4.35) and (4.36):
$$ \Theta(\delta\varphi, \delta\pi) = \int_{\mathbb{R}^3} d^3x\, \pi\, \delta\varphi, \tag{4.42} $$
$$ \Omega(\{\delta_1\varphi, \delta_1\pi\}, \{\delta_2\varphi, \delta_2\pi\}) = \int_{\mathbb{R}^3} d^3x\, \left(\delta_2\pi\, \delta_1\varphi - \delta_1\pi\, \delta_2\varphi\right). \tag{4.43} $$
Hopefully you recognize the pattern familiar from particle mechanics where
$\Theta = p_i\, dq^i$ and $\Omega = dp_i \wedge dq^i$. Indeed, if one views the field as just a mechanical
system with an infinite number of degrees of freedom labeled by the spatial
point $\vec x$, then one can view the integrations over $\vec x$ as the generalizations of the
various sums over degrees of freedom which occur in particle mechanics. For
this reason people often call $\varphi(\vec x)$ the “coordinate” and $\pi(\vec x)$ the “momentum”
for the scalar field. Although we shall not worry too much about precisely
what function spaces φ and π live in, it is useful to note that both of these
functions must vanish at spatial infinity if the symplectic structure is to be
defined.
The formula (4.43) makes it easy to check that the symplectic form is
non-degenerate as it should be. Can you construct a proof?
Problems:
5. Show that the symplectic structure (4.43) is non-degenerate. (Hint:
A 2-form Ω is non-degenerate if and only if Ω(u, v) = 0, ∀v implies
u = 0.)

6. Express the symplectic structure on the space of solutions of the KG
equation, $(\Box - m^2)\phi = 0$, in terms of the Fourier parametrization of
solutions which we found earlier:
$$ \phi(x) = \left(\frac{1}{2\pi}\right)^{3/2} \int_{\mathbb{R}^3} d^3k\, \left(a_k\, e^{i k\cdot r - i\omega_k t} + a_k^*\, e^{-i k\cdot r + i\omega_k t}\right). $$

Hamilton’s equations should define curves in Γ, that is, one parameter
families $(\varphi(\vec x, t), \pi(\vec x, t))$, corresponding to the time evolution dictated by the
field equation (4.33). You can easily verify as an exercise that the following
equations are equivalent to the original field equation (4.33) once we make
the correspondence $\phi(x) = \varphi(\vec x, t)$.
$$ \frac{\partial\varphi(\vec x, t)}{\partial t} = \pi(\vec x, t), \tag{4.44} $$
$$ \frac{\partial\pi(\vec x, t)}{\partial t} = \nabla^2\varphi(\vec x, t) - m^2\varphi(\vec x, t) - V'(\varphi(\vec x, t)). \tag{4.45} $$

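I will now show that these equations can be put into Hamiltonian form,
$$ \frac{\partial\varphi}{\partial t} = \frac{\delta H}{\delta\pi}, \qquad \frac{\partial\pi}{\partial t} = -\frac{\delta H}{\delta\varphi}, \tag{4.46} $$
where the Hamiltonian H generating time evolution along the foliation of
Minkowski spacetime by hypersurfaces t = const. is given by the energy as
defined in this inertial reference frame. In terms of our parametrization of Γ
using initial data:
$$ H[\varphi, \pi] = \int_{\mathbb{R}^3} d^3x\, \left(\frac{1}{2}\pi^2 + \frac{1}{2}\nabla\varphi\cdot\nabla\varphi + \frac{1}{2}m^2\varphi^2 + V(\varphi)\right), \tag{4.47} $$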
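where ∇ denotes the usual spatial gradient. In preparation for calculating
the Hamilton equations (4.46), let us consider the functional derivatives of
H. To do this we vary φ and π:
$$ \delta H = \int_{\mathbb{R}^3} d^3x\, \left(\pi\, \delta\pi + \nabla\varphi\cdot\nabla\delta\varphi + m^2\varphi\, \delta\varphi + V'(\varphi)\, \delta\varphi\right). \tag{4.48} $$

Integrating by parts in the term with the gradients, and using the fact that
φ vanishes at infinity to eliminate the boundary terms, we get
$$ \delta H = \int_{\mathbb{R}^3} d^3x\, \left[\pi\, \delta\pi + \left(-\nabla^2\varphi + m^2\varphi + V'(\varphi)\right)\delta\varphi\right]. \tag{4.49} $$

This means
$$ \frac{\delta H}{\delta\pi(\vec x)} = \pi(\vec x), \tag{4.50} $$
and
$$ \frac{\delta H}{\delta\varphi(\vec x)} = -\nabla^2\varphi(\vec x) + m^2\varphi(\vec x) + V'(\varphi(\vec x)). \tag{4.51} $$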
You can now see that the Hamilton equations (4.46) are equivalent to the
field equations (4.33).
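
The functional derivative (4.51) can be checked against a brute-force finite difference on a lattice. The sketch below is a hedged illustration: it assumes one space dimension, periodic boundary conditions, a forward-difference gradient, and a hypothetical quartic potential V(φ) = φ⁴/4. On a lattice the functional derivative at site i corresponds to the ordinary partial derivative of H with respect to the field value at that site, divided by the lattice spacing.

```python
import math

N, DX, MASS = 32, 0.2, 1.5   # illustrative lattice parameters

def potential(x):
    return 0.25 * x**4        # hypothetical choice of V

def potential_prime(x):
    return x**3

def hamiltonian(phi, pi):
    """Lattice version of (4.47) with a forward-difference gradient."""
    total = 0.0
    for i in range(N):
        grad = (phi[(i + 1) % N] - phi[i]) / DX
        total += DX * (0.5 * pi[i]**2 + 0.5 * grad**2
                       + 0.5 * MASS**2 * phi[i]**2 + potential(phi[i]))
    return total

def dH_dphi_formula(phi, i):
    """Right-hand side of (4.51) with the standard lattice Laplacian."""
    lap = (phi[(i + 1) % N] - 2.0 * phi[i] + phi[i - 1]) / DX**2
    return -lap + MASS**2 * phi[i] + potential_prime(phi[i])

def dH_dphi_numeric(phi, pi, i, h=1e-5):
    """Central difference in the single field value phi[i], divided by DX."""
    up, down = list(phi), list(phi)
    up[i] += h
    down[i] -= h
    return (hamiltonian(up, pi) - hamiltonian(down, pi)) / (2.0 * h * DX)

phi = [math.sin(2.0 * math.pi * i / N) for i in range(N)]
pi = [0.3] * N
for site in (0, 5, N - 1):
    print(site, dH_dphi_formula(phi, site), dH_dphi_numeric(phi, pi, site))
```

The agreement relies on the discrete integration by parts that turns the squared forward difference into the lattice Laplacian, mirroring the step from (4.48) to (4.49).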
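Finally, using standard techniques of classical mechanics it is possible to
deduce the Hamiltonian from the Lagrangian. Recall that the KG Lagrangian
is (in a particular inertial reference frame $(t, \vec x)$)
$$ L = \int_{\mathbb{R}^3} d^3x\, \mathcal{L} = \int_{\mathbb{R}^3} d^3x\, \left[\frac{1}{2}\left(\phi_{,t}^2 - (\nabla\phi)^2 - m^2\phi^2\right) - V(\phi)\right]. \tag{4.52} $$

At any fixed time t, view $\varphi \equiv \phi(t, \vec x)$ and $\dot\varphi \equiv \phi_{,t}(t, \vec x)$ as independent fields
on $\mathbb{R}^3$. Define the canonical momentum as the functional derivative of the
Lagrangian with respect to the velocity:
$$ \pi = \frac{\delta L}{\delta\dot\varphi} = \dot\varphi. \tag{4.53} $$
The Lagrangian now assumes the canonical form:
$$ L = \int_{\mathbb{R}^3} d^3x\, \pi\dot\varphi - \int_{\mathbb{R}^3} d^3x\, \left(\frac{1}{2}\pi^2 + \frac{1}{2}\nabla\varphi\cdot\nabla\varphi + \frac{1}{2}m^2\varphi^2 + V(\varphi)\right), \tag{4.54} $$
so that the Hamiltonian is given by the familiar Legendre transformation:
$$ H = \int_{\mathbb{R}^3} d^3x\, \pi\dot\varphi - L = \int_{\mathbb{R}^3} d^3x\, \left(\frac{1}{2}\pi^2 + \frac{1}{2}\nabla\varphi\cdot\nabla\varphi + \frac{1}{2}m^2\varphi^2 + V(\varphi)\right). \tag{4.55} $$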

4.3 Poisson brackets

Associated to any Hamiltonian system there is a fundamental algebraic struc-
ture known as the Poisson bracket. This bracket is a useful way to store
information about the Hamiltonian system, and its algebraic properties cor-
respond to those of the corresponding operator algebras in quantum theory.
Here I will introduce the Poisson bracket for the scalar field theory primarily
so we can use it in the next section to give an elegant formulation of the
connection between symmetries and conservation laws. Let me begin with a
brief summary of the Poisson bracket in mechanics.
Recall that in classical mechanics the Poisson bracket associates to any 2
functions on phase space, say A and B, a third function, denoted [A, B], via:
$$ [A, B] = \Omega^{ij}\, \partial_i A\, \partial_j B. \tag{4.56} $$
In terms of canonical coordinates and momenta $(q^a, p_a)$, in which
$$ \Omega = dp_a \wedge dq^a, \qquad \Omega^{-1} = \frac{\partial}{\partial q^a} \wedge \frac{\partial}{\partial p_a}, \tag{4.57} $$
we have
$$ [A, B] = \frac{\partial A}{\partial q^a}\frac{\partial B}{\partial p_a} - \frac{\partial A}{\partial p_a}\frac{\partial B}{\partial q^a}. \tag{4.58} $$

In particular, we have the canonical Poisson bracket relations,

$$ [q^a, p_b] = \delta^a_b. \tag{4.59} $$
Notice that the Poisson bracket is antisymmetric,
$$ [A, B] = -[B, A]; \tag{4.60} $$
it is bilinear,
$$ [A, cB] = c[A, B] = [cA, B], \qquad c = \text{const.}; \tag{4.61} $$
$$ [A, B + C] = [A, B] + [A, C], \qquad [A + B, C] = [A, C] + [B, C]; \tag{4.62} $$
and satisfies the Jacobi identity,
$$ [[A, B], C] + [[B, C], A] + [[C, A], B] = 0. \tag{4.63} $$
This endows the vector space of functions on phase space with the structure
of a Lie algebra called the Poisson algebra of functions. The Poisson bracket
also defines a derivation on the algebra of functions:
$$ [A, BC] = [A, B]C + [A, C]B, \qquad [AB, C] = [A, C]B + [B, C]A. \tag{4.64} $$
This property, and the others listed above, are useful to keep in mind when
performing various calculations of Poisson brackets. Finally, in terms of
Poisson brackets, the Hamilton equations can be written⁷
$$ \dot q^a = [q^a, H], \qquad \dot p_a = [p_a, H]. \tag{4.65} $$
⁷ In quantum theory, the canonical Poisson bracket relations correspond to the canonical
commutation relations, and the Hamilton equations correspond to the Heisenberg
equations.
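
The canonical bracket (4.58) and the bracket form of Hamilton's equations (4.65) are easy to spot-check numerically. The sketch below assumes a single degree of freedom, approximates partial derivatives by central differences, and uses a hypothetical oscillator Hamiltonian with m = 1 and ω = 2; none of the names come from the text.

```python
def poisson_bracket(A, B, q, p, h=1e-4):
    """[A, B] = dA/dq dB/dp - dA/dp dB/dq at the point (q, p), by central differences."""
    dA_dq = (A(q + h, p) - A(q - h, p)) / (2.0 * h)
    dA_dp = (A(q, p + h) - A(q, p - h)) / (2.0 * h)
    dB_dq = (B(q + h, p) - B(q - h, p)) / (2.0 * h)
    dB_dp = (B(q, p + h) - B(q, p - h)) / (2.0 * h)
    return dA_dq * dB_dp - dA_dp * dB_dq

def coordinate(q, p):
    return q

def momentum(q, p):
    return p

def H(q, p):
    """Hypothetical oscillator with m = 1, omega = 2: H = p^2/2 + 2 q^2."""
    return 0.5 * p * p + 2.0 * q * q

q0, p0 = 0.3, -1.1
canonical = poisson_bracket(coordinate, momentum, q0, p0)   # compare with (4.59)
q_dot = poisson_bracket(coordinate, H, q0, p0)              # expect dH/dp = p0
p_dot = poisson_bracket(momentum, H, q0, p0)                # expect -dH/dq = -4 q0
swapped = poisson_bracket(H, coordinate, q0, p0)            # antisymmetry, (4.60)
print(canonical, q_dot, p_dot, swapped)
```

Central differences are exact (up to rounding) for the quadratic functions used here, which is why such a coarse step size suffices.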

Returning to field theory, using the parametrization of the scalar field
phase space in terms of initial data (φ, π), it is not too hard to see how (4.58)
generalizes to functions on the scalar field phase space. Let A = A[φ, π] and
B = B[φ, π] be differentiable functionals on phase space. The Poisson bracket
is defined by
$$ [A, B] = \int_{\mathbb{R}^3} d^3x\, \left(\frac{\delta A}{\delta\varphi(\vec x)}\frac{\delta B}{\delta\pi(\vec x)} - \frac{\delta A}{\delta\pi(\vec x)}\frac{\delta B}{\delta\varphi(\vec x)}\right). \tag{4.66} $$

Two important applications of this formula are the canonical Poisson bracket
relations,
$$ [\varphi(\vec x), \pi(\vec y)] = \delta(\vec x - \vec y), \tag{4.67} $$
and the Hamilton equations:

$$ \frac{\partial\varphi(\vec x, t)}{\partial t} = [\varphi(\vec x), H], \qquad \frac{\partial\pi(\vec x, t)}{\partial t} = [\pi(\vec x), H]. \tag{4.68} $$
∂t ∂t

Problem:

7. Verify (4.67) follows from (4.66) and that (4.68) agrees with (4.44),
(4.45).

4.4 Symmetries and conservation laws

One of the most beautiful aspects of the Hamiltonian formalism is the way it
implements the relation between symmetries and conservation laws. To see
how this works, let us start by reviewing the relevant results from classical
mechanics. Recall that the time evolution of any function C on phase space
is given in terms of the Poisson bracket by:

$$ \dot C = [C, H]. \tag{4.69} $$

More generally, given any function G on phase space there is defined a 1-parameter
family of canonical transformations whose infinitesimal form is
given by
$$ \delta q^a = [q^a, G], \qquad \delta p_a = [p_a, G]. \tag{4.70} $$
G is called the infinitesimal generator of the transformation. Given the transformation
generated by G, the infinitesimal change in any function C on
phase space is given by
$$ \delta C = [C, G]. \tag{4.71} $$
Given a Hamiltonian system with Hamiltonian function H, we say that G
defines a symmetry of the Hamiltonian system if the canonical transformation
it generates preserves H, that is,

$$ \delta H = [H, G] = 0. \tag{4.72} $$

Since [H, G] = −[G, H], we see that if G defines a symmetry, then

$$ \dot G = [G, H] = -[H, G] = 0, \tag{4.73} $$

so that G is also a conserved quantity. Thus we have the ultimate statement
of the connection between symmetries and conservation laws: a function on
phase space generates a symmetry if and only if it is conserved.
Let us see how some of these results look in the case of a scalar field by way
of a couple of examples. The idea will be to show that symmetry generators
G[φ, π] are also constants of the motion. The key will be to verify that the
Poisson bracket vanishes, [G, H] = 0. Of course, if we pick G = H then this condition
is satisfied just by the anti-symmetry of the Poisson bracket. Thus the total
energy is conserved by virtue of time translation symmetry. A less trivial
example is provided by the total momentum of the field. Recall that the
component of the field momentum along a fixed direction $\vec v$ (with constant
Cartesian components) is given by
$$ P(\vec v) = \int_{\mathbb{R}^3} d^3x\, \phi_{,t}\, \phi_{,i}\, v^i. \tag{4.74} $$

The field momentum is a constant of the motion for solutions of the scalar
field equation (4.33). We showed this earlier in the special case of the Klein-
Gordon field (V (ϕ) = 0), but it is not hard to see that this result generalizes
to the case of equation (4.33), as it must because the Lagrangian density still
has the spatial translation symmetry.

Problem:
8. Show that $P(\vec v)$ is a constant of the motion for solutions of the scalar
field equation (4.33).

In terms of our parametrization of the phase space using initial data, the
field momentum along $\vec v$ is given by
$$ P(\vec v) = \int_{\mathbb{R}^3} d^3x\, \pi\, \varphi_{,i}\, v^i. \tag{4.75} $$

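Let us compute the Poisson bracket of the field momentum with the Hamiltonian.
There are a few ways to organize this computation, but let us emphasize
the role of P as an infinitesimal generator of translations. Our strategy is
to use (4.48) in conjunction with the infinitesimal change in (φ, π) under the
canonical transformation generated by $P(\vec v)$. We have (try it!)

$$ \delta\varphi(\vec x) \equiv [\varphi(\vec x), P(\vec v)] = v^i \varphi_{,i} \equiv \vec v\cdot\nabla\varphi, \tag{4.76} $$

and
$$ \delta\pi(\vec x) \equiv [\pi(\vec x), P(\vec v)] = v^i \pi_{,i} \equiv \vec v\cdot\nabla\pi. \tag{4.77} $$
This is indeed what one expects to be the change in a function under an
infinitesimal translation along $\vec v$. The change in the KG Hamiltonian (4.47)
for any changes in the fields is given by (4.48). Using (4.76) and (4.77) we
get
$$ \begin{aligned} \delta H &= \int_{\mathbb{R}^3} d^3x\, \left(\pi\, \vec v\cdot\nabla\pi + \nabla\varphi\cdot\nabla(\vec v\cdot\nabla\varphi) + m^2\varphi\, \vec v\cdot\nabla\varphi + V'(\varphi)\, \vec v\cdot\nabla\varphi\right) \\ &= \int_{\mathbb{R}^3} d^3x\, \nabla\cdot\left[\vec v\left(\frac{1}{2}\pi^2 + \frac{1}{2}\nabla\varphi\cdot\nabla\varphi + \frac{1}{2}m^2\varphi^2 + V(\varphi)\right)\right]. \end{aligned} \tag{4.78} $$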
Next, we use the divergence theorem to convert this integral to a surface
integral “at infinity”. The asymptotic decay of the fields needed to make the
energy finite in the first place then implies that the fields decay fast enough
such that this integral vanishes. Thus the Hamiltonian is invariant under
spatial translations. As already noted, this is equivalent to the fact that the
momentum P (~v ) is conserved. Of course, we already knew this from our
discussion of Noether’s theorem.
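
The invariance of the Hamiltonian under the translation (4.76), (4.77) can be illustrated on a lattice. The sketch below is a rough check under assumptions not made in the text: one space dimension, periodic boundary conditions, the free field (V = 0), a forward-difference gradient in the energy, and a central difference for the generator of translations. With these choices the lattice analogue of (4.48) vanishes up to rounding error for the translation, while a generic variation (here a rescaling of the fields) changes H.

```python
import math

N, DX, MASS = 48, 0.25, 0.8   # illustrative lattice parameters

def central_diff(f):
    """Antisymmetric lattice derivative, used for the translation v . grad."""
    return [(f[(i + 1) % N] - f[i - 1]) / (2.0 * DX) for i in range(N)]

def forward_diff(f):
    """Forward-difference gradient used in the lattice energy."""
    return [(f[(i + 1) % N] - f[i]) / DX for i in range(N)]

def delta_H(phi, pi, dphi, dpi):
    """Lattice version of (4.48) with V = 0."""
    grad_phi, grad_dphi = forward_diff(phi), forward_diff(dphi)
    return DX * sum(pi[i] * dpi[i] + grad_phi[i] * grad_dphi[i]
                    + MASS**2 * phi[i] * dphi[i] for i in range(N))

phi = [math.sin(2.0 * math.pi * i / N) + 0.3 * math.cos(6.0 * math.pi * i / N)
       for i in range(N)]
pi_field = [math.cos(4.0 * math.pi * i / N) for i in range(N)]

v = 0.7   # translation parameter along the single lattice direction
dphi = [v * d for d in central_diff(phi)]
dpi = [v * d for d in central_diff(pi_field)]

change_translation = delta_H(phi, pi_field, dphi, dpi)
change_scaling = delta_H(phi, pi_field, phi, pi_field)   # generic variation
print(change_translation, change_scaling)
```

With a nonzero self-interaction the lattice cancellation is only approximate, which is why the sketch is restricted to the free field.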
Here is an example for you to try.

Problem:
9. Consider the massless, free scalar field. Show that
$$ Q = \int_{\mathbb{R}^3} d^3x\, \pi(\vec x) \tag{4.79} $$
has vanishing Poisson bracket with the Hamiltonian so that Q is conserved.
What is the symmetry generated by Q? (For simplicity, you
can assume that the fields (φ, π) vanish outside of a compact set.)

4.5 PROBLEMS

1. Derive (4.10) from (4.7) and (4.9).

2. Given L = L(q, q̇), give a formula for the EL equations linearized about
a solution q a = q a (t).

3. Show that the symplectic form given in (4.17) does not depend upon
time. (Hint: Differentiate (4.17) with respect to time and then use the
EL equations and their linearization.)

4. A harmonic oscillator is defined by the Lagrangian and EL equations

$$ L = \frac{1}{2} m \dot q^2 - \frac{1}{2} m \omega^2 q^2, \qquad \ddot q(t) = -\omega^2 q(t). \tag{4.80} $$
What does the symplectic structure look like when the space of solutions
to the EL equations is parametrized according to the following
formulas?

• $q(t) = A\cos(\omega t) + B\sin(\omega t)$

• $q(t) = a\, e^{-i\omega t} + a^*\, e^{i\omega t}$

5. Show that the symplectic structure (4.43) is non-degenerate. (Hint:
A 2-form Ω is non-degenerate if and only if Ω(u, v) = 0, ∀v implies
u = 0.)

6. Express the symplectic structure on the space of solutions of the KG
equation, $(\Box - m^2)\phi = 0$, in terms of the Fourier parametrization of
solutions which we found earlier:
$$ \phi(x) = \left(\frac{1}{2\pi}\right)^{3/2} \int_{\mathbb{R}^3} d^3k\, \left(a_k\, e^{i k\cdot r - i\omega_k t} + a_k^*\, e^{-i k\cdot r + i\omega_k t}\right). $$

7. Verify (4.67) follows from (4.66) and that (4.68) agrees with (4.44),
(4.45).

8. Show that $P(\vec v)$ in (4.74) is a constant of the motion for solutions of
the scalar field equation (4.33).

9. Consider the massless, free scalar field. Show that

$$ Q = \int_{\mathbb{R}^3} d^3x\, \pi(\vec x) \tag{4.81} $$

has vanishing Poisson bracket with the Hamiltonian so that Q is conserved.
What is the symmetry generated by Q? (For simplicity, you
can assume that the fields (φ, π) vanish outside of a compact set.)
Chapter 5

Electromagnetic field theory

Presumably you’ve had a course or two treating electrodynamics. Our focus
here will be on the salient features of electrodynamics from a field-theoretic
perspective. In particular, we will be exploring what it means for electrody-
namics to be a “gauge theory”. We shall see that certain structural features
familiar from KG theory appear also for electromagnetic theory and that new
structural features appear as well.

5.1 Review of Maxwell’s equations

We begin with a quick review of Maxwell’s equations. Hopefully you’ve seen
some of this before.

Problems:

1. In units where c = 1, Maxwell’s equations for the electric and magnetic
fields $(\vec E, \vec B)$ associated to charge density and current density
$(\rho, \vec j)$ are given by

$$\nabla\cdot\vec E = 4\pi\rho, \tag{5.1}$$
$$\nabla\cdot\vec B = 0, \tag{5.2}$$
$$\nabla\times\vec B - \frac{\partial\vec E}{\partial t} = 4\pi\vec j, \tag{5.3}$$
$$\nabla\times\vec E + \frac{\partial\vec B}{\partial t} = 0. \tag{5.4}$$

Show that for any function $\phi(\vec x, t)$ and vector field
$\vec A(\vec x, t)$ the electric and magnetic fields defined by

$$\vec E = -\nabla\phi - \frac{\partial\vec A}{\partial t}, \qquad \vec B = \nabla\times\vec A \tag{5.5}$$

satisfy (5.2) and (5.4).

2. Define the anti-symmetric array $F_{\mu\nu}$ in inertial Cartesian
coordinates $x^\alpha = (t, x, y, z)$, $\alpha = 0, 1, 2, 3$, via

$$F_{0i} = -E_i, \qquad F_{ij} = \epsilon_{ijk}B^k, \qquad i, j = 1, 2, 3. \tag{5.6}$$

Under a change of inertial reference frame corresponding to a boost along
the x axis with speed v,

$$t' = \gamma(t - vx), \quad x' = \gamma(x - vt), \quad y' = y, \quad z' = z, \quad \gamma = \frac{1}{\sqrt{1 - v^2}}, \tag{5.7}$$

the electric and magnetic fields change, $(\vec E, \vec B) \to (\vec E', \vec B')$, where

$$E^{x'} = E^x, \qquad E^{y'} = \gamma(E^y - vB^z), \qquad E^{z'} = \gamma(E^z + vB^y), \tag{5.8}$$
$$B^{x'} = B^x, \qquad B^{y'} = \gamma(B^y + vE^z), \qquad B^{z'} = \gamma(B^z - vE^y). \tag{5.9}$$

Show that this is equivalent to saying that $F_{\mu\nu}$ are the components
of a spacetime tensor of type $\binom{0}{2}$. Show that the two quantities
$\vec E\cdot\vec B$ and $E^2 - B^2$ do not change under the boost.

3. Define the 4-current

$$j^\alpha = (\rho, j^i), \qquad i = 1, 2, 3. \tag{5.10}$$

Show that the Maxwell equations take the form

$$F^{\alpha\beta}{}_{,\beta} = 4\pi j^\alpha, \qquad F_{\alpha\beta,\gamma} + F_{\beta\gamma,\alpha} + F_{\gamma\alpha,\beta} = 0, \tag{5.11}$$

where indices are raised and lowered with the usual Minkowski metric.

4. Show that the scalar and vector potentials, when assembled into the
4-potential

$$A_\mu = (-\phi, A_i), \qquad i = 1, 2, 3, \tag{5.12}$$

are related to the electromagnetic tensor $F_{\mu\nu}$ by

$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu. \tag{5.13}$$

Show that this formula for $F_{\mu\nu}$ solves the homogeneous Maxwell
equations $F_{\alpha\beta,\gamma} + F_{\beta\gamma,\alpha} + F_{\gamma\alpha,\beta} = 0$.

5.2 Electromagnetic Lagrangian

While the electromagnetic field can be described in terms of the field tensor
F in Maxwell’s equations, if we wish to use a variational principle to describe
this field theory we will have to use potentials.[1] So, we will describe
electromagnetic theory using the scalar and vector potentials, which can be
viewed as a spacetime 1-form

$$A = A_\alpha(x)\,dx^\alpha. \tag{5.14}$$

Depending upon your tastes, you can think of this 1-form as (1) a cross-section
of the cotangent bundle of the spacetime manifold M; (2) a tensor field
of type $\binom{0}{1}$; (3) a connection on a U(1) fiber bundle; (4) a
collection of 4 functions, $A_\alpha(x)$, defined in a given coordinate system
$x^\alpha$ and such that in any other coordinate system $x^{\alpha'}$

$$A'_\alpha(x') = \frac{\partial x^\beta}{\partial x^{\alpha'}}\,A_\beta(x(x')). \tag{5.15}$$

In any case, A is called the “Maxwell field”, the “electromagnetic field”,
the “electromagnetic potential”, the “gauge field”, the “4-vector potential”,
the “U(1) connection”, and some other names as well, along with various
mixtures of these.

[1] It can be shown using techniques from the inverse problem of the calculus
of variations that there is no variational principle for Maxwell’s equations
built solely from $(\vec E, \vec B)$ (equivalently from $F_{\alpha\beta}$)
and their derivatives.

As always, having specified the geometric nature of the field, the field
theory can be defined by choosing a Lagrangian. To define the Lagrangian
we introduce the field strength tensor F, also known as the “Faraday tensor”,
or as the “curvature” of the gauge field A. We write

$$F = F_{\alpha\beta}(x)\,dx^\alpha\otimes dx^\beta, \qquad F_{\alpha\beta} = \partial_\alpha A_\beta - \partial_\beta A_\alpha. \tag{5.16}$$
The field strength is in fact a two-form (an anti-symmetric $\binom{0}{2}$
tensor field):

$$F_{\alpha\beta} = -F_{\beta\alpha}, \tag{5.17}$$

and we can write

$$F = \frac12 F_{\alpha\beta}\,(dx^\alpha\otimes dx^\beta - dx^\beta\otimes dx^\alpha) = \frac12 F_{\alpha\beta}\,dx^\alpha\wedge dx^\beta, \tag{5.18}$$

where the anti-symmetric tensor product is known as the “wedge product”,
denoted with ∧. In terms of differential forms, the field strength is the
exterior derivative of the Maxwell field:

$$F = dA. \tag{5.19}$$

This guarantees that dF = 0, which is equivalent to the homogeneous
Maxwell equations $F_{\alpha\beta,\gamma} + F_{\beta\gamma,\alpha} + F_{\gamma\alpha,\beta} = 0$.
So, the use of potentials solves half the Maxwell equations; the only
remaining Maxwell equations to be considered are
$F^{\alpha\beta}{}_{,\beta} = 4\pi j^\alpha$.
The 6 independent components of F in an inertial Cartesian coordinate
chart (t, x, y, z) define the electric and magnetic fields as perceived in that
reference frame. Note, however, that all of the definitions given above are
in fact valid on an arbitrary spacetime manifold in an arbitrary system of
coordinates.
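To make the identification with the electric and magnetic fields concrete, here is a sketch of the component array in an inertial Cartesian chart. It assumes the conventions $F_{0i} = -E_i$, $F_{ij} = \epsilon_{ijk}B^k$ of (5.6), with rows and columns ordered (t, x, y, z); other texts use different sign conventions.

```latex
% Components F_{\mu\nu}; rows \mu and columns \nu are ordered (t, x, y, z).
% Assumes the conventions F_{0i} = -E_i, F_{ij} = \epsilon_{ijk} B^k of (5.6).
\[
F_{\mu\nu} =
\begin{pmatrix}
0   & -E_x & -E_y & -E_z \\
E_x & 0    & B_z  & -B_y \\
E_y & -B_z & 0    & B_x  \\
E_z & B_y  & -B_x & 0
\end{pmatrix}
\]
```

The three entries of the first row carry the electric field and the three independent spatial entries carry the magnetic field, six components in all.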
The Lagrangian for electromagnetic theory – on an arbitrary spacetime
(M, g) – can be defined by the n-form (where n = dim(M)),

$$\mathbf{L} = -\frac14 F\wedge *F = L\,dx^1\wedge dx^2\wedge\cdots\wedge dx^n, \tag{5.20}$$

where ∗F is the Hodge dual defined by the spacetime metric g. In terms of
components in a coordinate chart we have

$$(*F)_{\alpha\beta} = \frac12\,\epsilon_{\alpha\beta}{}^{\gamma\delta}F_{\gamma\delta}, \tag{5.21}$$

and the Lagrangian density is given by

$$L = -\frac14\sqrt{-g}\,F^{\alpha\beta}F_{\alpha\beta}, \tag{5.22}$$

where

$$F^{\alpha\beta} = g^{\alpha\gamma}g^{\beta\delta}F_{\gamma\delta}. \tag{5.23}$$

Of course, we can – and usually will – restrict attention to the flat spacetime
in the standard Cartesian coordinates for explicit computations. It is always
understood that F is built from A in what follows.
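Let us compute the Euler-Lagrange derivative of L. For simplicity we
will work on flat spacetime in inertial Cartesian coordinates so that

$$M = \mathbb{R}^4, \qquad g = g_{\alpha\beta}\,dx^\alpha\otimes dx^\beta = -dt\otimes dt + dx\otimes dx + dy\otimes dy + dz\otimes dz. \tag{5.24}$$

We have

$$\begin{aligned}
\delta L &= -\frac12 F^{\alpha\beta}\,\delta F_{\alpha\beta} \\
&= -\frac12 F^{\alpha\beta}\,(\delta A_{\beta,\alpha} - \delta A_{\alpha,\beta}) \\
&= -F^{\alpha\beta}\,\delta A_{\beta,\alpha} \\
&= F^{\alpha\beta}{}_{,\alpha}\,\delta A_\beta + D_\alpha\left(-F^{\alpha\beta}\delta A_\beta\right).
\end{aligned} \tag{5.25}$$

From this identity the Euler-Lagrange expression is given by

$$E^\beta(L) = F^{\alpha\beta}{}_{,\alpha}, \tag{5.26}$$

and the source-free Maxwell equations are

$$F^{\alpha\beta}{}_{,\alpha} = 0. \tag{5.27}$$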
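As a check on (5.27), the field equations can be unpacked into their time and space components. The sketch below assumes the conventions $F_{0i} = -E_i$, $F_{ij} = \epsilon_{ijk}B^k$ of (5.6) and the flat metric (5.24), so that $F^{0i} = E_i$ and $F^{ij} = \epsilon_{ijk}B^k$; with a different sign convention the intermediate signs change.

```latex
% Components of F^{\alpha\beta}{}_{,\alpha} = 0, assuming (5.6) and the metric (5.24):
% F^{0i} = E_i, F^{ij} = \epsilon_{ijk} B^k.
\begin{align*}
\beta = 0:\quad
  & F^{\alpha 0}{}_{,\alpha} = \partial_i F^{i0} = -\partial_i E_i = 0
  \quad\Longrightarrow\quad \nabla\cdot\vec E = 0, \\
\beta = j:\quad
  & F^{\alpha j}{}_{,\alpha} = \partial_t F^{0j} + \partial_i F^{ij}
    = \partial_t E_j + \epsilon_{ijk}\,\partial_i B_k
    = \partial_t E_j - (\nabla\times\vec B)_j = 0 \\
  & \quad\Longrightarrow\quad \nabla\times\vec B - \frac{\partial\vec E}{\partial t} = 0 .
\end{align*}
```

These are (5.1) and (5.3) with vanishing sources; the other two Maxwell equations hold identically because F = dA.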

There are some equivalent expressions of the field equations that are worth
knowing about. First of all, we have that

$$F^{\alpha\beta}{}_{,\alpha} = 0 \iff g^{\alpha\gamma}F_{\alpha\beta,\gamma} \equiv F_{\alpha\beta}{}^{,\alpha} = 0, \tag{5.28}$$

so that the field equations can be expressed as

$$g^{\alpha\gamma}\,(A_{\beta,\alpha\gamma} - A_{\alpha,\beta\gamma}) = 0. \tag{5.29}$$

We write this using the wave operator $\Box$ (which acts component-wise on the
1-form A) and the operator

$$\mathrm{div}\,A = g^{\alpha\beta}A_{\alpha,\beta} = A_{\alpha,}{}^{\alpha} = A^\alpha{}_{,\alpha} \tag{5.30}$$

via

$$\Box A_\beta - (\mathrm{div}\,A)_{,\beta} = 0. \tag{5.31}$$

You can see that this is a modified wave equation.
A more sophisticated expression of the field equations, which is manifestly
valid on any spacetime, uses the technology of differential forms. Recall that
on a spacetime one has the Hodge dual, which identifies the space of p-forms
with the space of (n − p)-forms. This mapping is denoted by

$$\alpha \to *\alpha. \tag{5.32}$$

If F is a 2-form, then ∗F is an (n − 2)-form. The components are related by
(5.21). The source-free field equations (5.27) are equivalent to the vanishing
of a 1-form:

$$*d{*}F = 0, \tag{5.33}$$

where d is the exterior derivative. This equation is valid on any spacetime
(M, g) and is equivalent to the EL equations for the Maxwell Lagrangian as
defined above on any spacetime.
Let me show you how this formula works in our flat 4-d spacetime. The
components of ∗F are given by

$$*F_{\alpha\beta} = \frac12\,\epsilon_{\alpha\beta\gamma\delta}F^{\gamma\delta}. \tag{5.34}$$

The exterior derivative maps the 2-form ∗F to a 3-form d∗F via

$$(d{*}F)_{\alpha\beta\gamma} = 3\,\partial_{[\alpha}{*F}_{\beta\gamma]} = \frac32\,\partial_{[\alpha}\epsilon_{\beta\gamma]\mu\nu}F^{\mu\nu}. \tag{5.35}$$

The Hodge dual maps the 3-form d∗F to a 1-form ∗d∗F via

$$\begin{aligned}
(*d{*}F)_\sigma &= \frac16\,\epsilon_\sigma{}^{\alpha\beta\gamma}\,(d{*}F)_{\alpha\beta\gamma} \\
&= \frac14\,\epsilon_{\sigma\alpha\beta\gamma}\,\epsilon^{\beta\gamma\mu\nu}\,\partial^\alpha F_{\mu\nu} \\
&= -\delta_\sigma^{[\mu}\delta_\alpha^{\nu]}\,\partial^\alpha F_{\mu\nu} \\
&= F_{\alpha\sigma}{}^{,\alpha}.
\end{aligned} \tag{5.36}$$

So that

$$*d{*}F = 0 \iff F_{\alpha\beta}{}^{,\alpha} = 0. \tag{5.37}$$

The following problems establish some key structural features of
electromagnetic theory.

Problems:

5. Show that the EL derivative of the Maxwell Lagrangian satisfies the
differential identity

$$D_\beta E^\beta(L) = 0. \tag{5.38}$$

6. Restrict attention to flat spacetime in Cartesian coordinates, as usual.
Fix a vector field on spacetime, $j^\alpha = j^\alpha(x)$. (1) Show that the
Lagrangian

$$L_j = -\frac14 F^{\alpha\beta}F_{\alpha\beta} + 4\pi j^\alpha A_\alpha \tag{5.39}$$

gives the field equations

$$F^{\alpha\beta}{}_{,\alpha} = -4\pi j^\beta. \tag{5.40}$$

These are the Maxwell equations with prescribed electric sources – a
charge density ρ and current density $\vec j$, where

$$j^\alpha = (\rho, \vec j). \tag{5.41}$$

(2) Use the results from the preceding problem to show that the Maxwell
equations with sources have no solution unless the vector field representing
the sources is divergence-free:

$$\partial_\alpha j^\alpha = 0. \tag{5.42}$$

(3) Show that this condition is in fact the usual continuity equation
representing conservation of electric charge.

7. Show that the Lagrangian density for source-free electromagnetism can
be written in terms of the electric and magnetic fields (in any given
inertial frame) by $L = \frac12(E^2 - B^2)$. This is one of the 2 relativistic
invariants that can be made algebraically from $\vec E$ and $\vec B$.

8. Show that $\vec E\cdot\vec B$ is relativistically invariant (unchanged by a
Lorentz transformation). Express it in terms of potentials and show that it is
just a divergence, with vanishing Euler-Lagrange expression.

5.3 Gauge symmetry

Probably the most significant new aspect of electromagnetic theory, field
theoretically speaking, is that it admits an infinite-dimensional group of
variational symmetries known as gauge symmetries. Their appearance stems from
the fact that the electromagnetic Lagrangian only depends upon the vector
potential through the field strength tensor via

$$F = dA, \tag{5.43}$$

so that if we redefine the 4-vector potential by a gauge transformation

$$A \to A' = A + d\Lambda, \tag{5.44}$$

where $\Lambda : M \to \mathbb{R}$, then

$$F' = dA' = d(A + d\Lambda) = dA = F, \tag{5.45}$$

where we used the fact that, on any differential form, $d^2 = 0$. It is easy to
check all this explicitly in terms of components.

Problem:
9. For any function Λ = Λ(x), define

$$A'_\alpha = A_\alpha + \partial_\alpha\Lambda. \tag{5.46}$$

Show that

$$A'_{\alpha,\beta} - A'_{\beta,\alpha} = A_{\alpha,\beta} - A_{\beta,\alpha}. \tag{5.47}$$

Show that, in terms of the scalar and vector potentials, this gauge
transformation is equivalent to

$$\phi \to \phi' = \phi - \partial_t\Lambda, \qquad \vec A \to \vec A' = \vec A + \nabla\Lambda. \tag{5.48}$$

This shows, then, that under the transformation

$$A \to A' \tag{5.49}$$

we have

$$F \to F. \tag{5.50}$$

Evidently, the Lagrangian – which contains A only through F – is invariant
under this transformation of A. We say that the Lagrangian is gauge
invariant.
The gauge transformations constitute a very large set of variational sym-
metries. Up to boundary conditions, one can use any function Λ to define
a new set of potentials. Mathematically, the gauge transformations form an
infinite-dimensional Abelian group.
Insofar as classical electrodynamics can be formulated in terms of the field
strength tensor, the gauge transformation symmetry has no physical content
in the sense that one always identifies physical situations described by gauge-
equivalent Maxwell fields. Thus the Maxwell fields A provide a redundant
description of the physics. On the other hand, while the potential does not
have direct classical physical significance, it does have a physical role to play:
the need to use the potential A can be understood from the desire to have a
local variational principle – which is crucial for quantum theory. Indeed, in
the quantum context the potential plays a more important role.
The gauge symmetry is responsible for the fact that the Maxwell equations
for the potential

$$\Box A - d(\mathrm{div}\,A) = -4\pi j \tag{5.51}$$

are not hyperbolic. (For comparison, the KG equation is hyperbolic.) Indeed,
hyperbolic equations will have a Cauchy problem with unique solutions for
given initial data. It is clear that, because the function Λ is arbitrary, one
can never have unique solutions to the field equations for A associated to
given Cauchy data. To see this, let A be any solution for prescribed Cauchy
data on a hypersurface t = const. Let A′ be any other solution obtained by
a gauge transformation:

$$A' = A + d\Lambda. \tag{5.52}$$

It is easy to see that A′ also solves the field equations. This follows from a
number of points of view. For example, the field equations are conditions
on the field strength F, which is invariant under the gauge transformation.
Alternatively, the field equations are invariant under the gauge
transformation because the Lagrangian is. Finally, you can check directly that
dΛ solves the source-free field equations:

$$[(\Box - d\,\mathrm{div})\,d\Lambda]_\alpha = \partial^\beta\partial_\beta(\partial_\alpha\Lambda) - \partial_\alpha(\partial^\beta\partial_\beta\Lambda) = 0. \tag{5.53}$$

Since Λ is an arbitrary smooth function, we can choose the first two
derivatives of Λ to vanish on the initial hypersurface so that A′ and A are
distinct solutions with the same initial data.
To uniquely determine the potential A from Cauchy data and the Maxwell
equations one has to add additional conditions on the potential beyond the
field equations. This is possible since one can adjust the form of A via gauge
transformations. It is not too hard to show that one can gauge transform
any given potential into one which satisfies the Lorenz gauge condition:

$$\partial^\alpha A_\alpha = 0. \tag{5.54}$$

To see this, take any potential, say, $\tilde A$, and gauge transform it to a
potential $A = \tilde A + d\Lambda$ such that the Lorenz gauge holds; this means

$$\partial^\alpha\partial_\alpha\Lambda = -\partial^\alpha\tilde A_\alpha. \tag{5.55}$$

Viewing the right-hand side of this equation as given, we see that to find
such a gauge transformation amounts to solving the wave equation with a
given source, which can always be done.
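For illustration, one explicit solution of (5.55) is the retarded one. This is a sketch under assumptions not stated in the text: the source $s = \partial^\alpha\tilde A_\alpha$ is taken to fall off fast enough for the integral to converge, and $\Box = -\partial_t^2 + \nabla^2$ as implied by the metric (5.24). Any solution of the homogeneous wave equation can be added to it.

```latex
% Retarded solution of \Box\Lambda = -s, with \Box = -\partial_t^2 + \nabla^2
% and s \equiv \partial^\alpha \tilde A_\alpha.
% Assumption: s decays fast enough at spatial infinity for the integral to exist.
\[
\Lambda(t, \vec x) = \frac{1}{4\pi}\int_{\mathbb{R}^3} d^3x'\,
\frac{s\bigl(t - |\vec x - \vec x'|,\ \vec x'\bigr)}{|\vec x - \vec x'|},
\qquad s \equiv \partial^\alpha \tilde A_\alpha .
\]
```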
In the Lorenz gauge the Maxwell equations are just the usual, hyper-
bolic wave equation for each inertial-Cartesian component of the 4-vector
potential,
$$\Box A_\alpha = -4\pi j_\alpha. \tag{5.56}$$

It is therefore tempting to suppose that one can thus identify the EM field
theory with 4 copies of a massless KG field theory. Things are a little more
interesting than that: the 4-potential still must satisfy (5.54), and even in the
Lorenz gauge the potentials are not uniquely determined! Pick any solution
Λ of the wave equation $\Box\Lambda = 0$ and use Λ to make a gauge transformation.
You can easily check that the transformed potentials still satisfy the condition
(5.54). It is possible to show that the Lorenz condition along with this
residual gauge freedom ultimately permits the elimination of two functions
worth of information from A.[2] One says that the electromagnetic field has
“two degrees of freedom per spatial point”; this is intimately related to the
two helicity states of photons in the associated quantum theory.

[2] In detail, one can choose Λ such that $A_0 = 0$ and, therefore, so that
$\nabla\cdot\vec A = 0$.
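The counting of degrees of freedom can be made concrete with a single plane wave. The following is an illustrative sketch for the source-free case, not taken from the text; the constants $\varepsilon_\alpha$ (polarization), $k_\alpha$ (wave covector) and $\lambda$ (gauge parameter) are names introduced here for the example.

```latex
% Plane-wave sketch, source-free case in the Lorenz gauge.
% \varepsilon_\alpha, k_\alpha, \lambda are constants introduced for this illustration.
\begin{align*}
A_\alpha &= \varepsilon_\alpha\, e^{i k\cdot x} + \text{c.c.},
  & \Box A_\alpha = 0 &\ \Longrightarrow\ k^\alpha k_\alpha = 0, \\
\partial^\alpha A_\alpha &= 0
  & &\ \Longrightarrow\ k^\alpha \varepsilon_\alpha = 0, \\
\Lambda &= -i\lambda\, e^{i k\cdot x} + \text{c.c.}
  & &\ \Longrightarrow\ \varepsilon_\alpha \to \varepsilon_\alpha + \lambda k_\alpha, \\
\lambda &= -\varepsilon_0 / k_0
  & &\ \Longrightarrow\ \varepsilon_0 = 0,\quad \vec k\cdot\vec\varepsilon = 0 .
\end{align*}
```

Since $k^\alpha k_\alpha = 0$, this Λ automatically solves the wave equation and the shifted polarization still satisfies $k^\alpha\varepsilon_\alpha = 0$. Of the four components of $\varepsilon_\alpha$, one is removed by the Lorenz condition and one by the residual gauge freedom, leaving the two transverse polarizations.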
5.4 Noether’s second theorem in electromagnetic theory

We have seen that a variational (or divergence) symmetry leads to a con-
served current. The gauge transformation defines a variational symmetry for
electromagnetic theory. Actually, there are many gauge symmetries: because
each function on spacetime (modulo an additive constant) defines a gauge
transformation the set of gauge transformations is infinite dimensional! Let
us consider our Noether type of analysis for these symmetries. We will see
that the analysis that led to Noether’s (first) theorem can be taken a little
further when the symmetry involves arbitrary functions.
Consider a 1-parameter family of gauge transformations:

$$A' = A + d\Lambda_s, \tag{5.57}$$

characterized by a 1-parameter family of functions $\Lambda_s$ where

$$\Lambda_0 = 0. \tag{5.58}$$

Infinitesimally, we have[3]

$$\delta A = d\sigma, \tag{5.59}$$

where

$$\sigma = \left.\frac{\partial\Lambda_s}{\partial s}\right|_{s=0}. \tag{5.60}$$

It is easy to see that the function σ can be chosen arbitrarily just as we had for
field variations in the usual calculus of variations analysis. The Lagrangian
is invariant under the gauge transformation; therefore it is invariant under
its infinitesimal version. Let us check this explicitly. For any variation we
have

$$\delta L = -\frac12 F^{\alpha\beta}\,\delta F_{\alpha\beta}, \tag{5.61}$$

and under a variation defined by an infinitesimal gauge transformation

$$\begin{aligned}
\delta F_{\alpha\beta} &= \partial_\alpha\delta A_\beta - \partial_\beta\delta A_\alpha \\
&= \partial_\alpha(\partial_\beta\sigma) - \partial_\beta(\partial_\alpha\sigma) \\
&= 0,
\end{aligned} \tag{5.62}$$

so that δL = 0.

[3] Notice that the infinitesimal gauge transformation has the same form as a
finite gauge transformation. This is due to the fact that the gauge
transformation is an affine (as opposed to non-linear) transformation.
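For any variation, the first variational identity is

$$\delta L = E^\beta\,\delta A_\beta + D_\alpha\left(-F^{\alpha\beta}\delta A_\beta\right), \tag{5.63}$$

where

$$E^\beta = F^{\alpha\beta}{}_{,\alpha}. \tag{5.64}$$

For a variation induced by an infinitesimal gauge transformation we know
that δL = 0, so the variational identity tells us that

$$0 = E^\beta\,\partial_\beta\sigma + D_\alpha\left(-F^{\alpha\beta}\partial_\beta\sigma\right), \tag{5.65}$$

which is valid for any function σ. Now we take account of the fact that the
function σ is arbitrary. We rearrange the derivatives of σ to get them inside
a divergence, using $E^\beta\partial_\beta\sigma = D_\alpha(E^\alpha\sigma) - (D_\beta E^\beta)\sigma$
with $E^\alpha = F^{\beta\alpha}{}_{,\beta}$:

$$0 = -(D_\beta E^\beta)\,\sigma + D_\alpha\left(-F^{\alpha\beta}\partial_\beta\sigma + F^{\beta\alpha}{}_{,\beta}\,\sigma\right). \tag{5.66}$$

Restrict this equation to a potential A = A(x), then integrate the result over
a spacetime region R:

$$0 = -\int_R F^{\alpha\beta}{}_{,\alpha\beta}\,\sigma + \int_{\partial R}\left(-F^{\alpha\beta}\partial_\beta\sigma + F^{\beta\alpha}{}_{,\beta}\,\sigma\right)d\Sigma_\alpha. \tag{5.67}$$

This must hold for any function σ; we can use the fundamental theorem
of variational calculus to conclude that the Euler-Lagrange equations must
satisfy the differential identity

$$D_\beta E^\beta = 0, \tag{5.68}$$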

which you proved directly in a previous homework problem. Note that this
says the Euler-Lagrange expression is divergence-free, and that this holds
whether or not the field equations are satisfied – it is an identity arising due
to the gauge symmetry of the Lagrangian.
Compare our results above to Noether’s first theorem. We have seen that
the gauge symmetry – being a continuous variational symmetry – leads to a
divergence-free vector field, as it must by Noether’s first theorem. But we
now have a new ingredient: the gauge symmetry is built from an arbitrary
function of all the independent variables xα so that the gauge transformation
can be localized to an arbitrary location in spacetime. This leads to the vector
field being divergence-free identically, independent of the field equations.

Indeed, the divergence relation is an identity satisfied by the field equations.
All this is an example of Noether’s second theorem, and the resulting identity
is sometimes called the “Noether identity” associated to the gauge symmetry.

Problem:
10. Consider the electromagnetic field coupled to sources with the
Lagrangian density

$$L_j = -\frac14 F^{\alpha\beta}F_{\alpha\beta} + 4\pi j^\alpha A_\alpha. \tag{5.69}$$

Show that this Lagrangian is gauge invariant (up to a divergence) if
and only if the spacetime vector field $j^\alpha$ is chosen to be divergence-free.
What is the Noether identity in this case?

5.5 Noether’s second theorem

Let us spend a moment having a look at Noether’s second theorem from a
more general point of view.
Consider a system of fields $\varphi^a$, a = 1, 2, . . . , on a manifold labeled by
coordinates $x^\alpha$, described by a Lagrangian density L and field equations
$E_a(L) = 0$ defined by the Euler-Lagrange identity

$$\delta L = E_a(L)\,\delta\varphi^a + D_\alpha\eta^\alpha, \tag{5.70}$$

where $\eta^\alpha$ is a linear differential operator acting on $\delta\varphi^a$.
As usual, all these quantities are functions on jet space.
Let us define an infinitesimal gauge transformation to be an infinitesimal
transformation

$$\delta\varphi^a = \delta\varphi^a(\Lambda), \tag{5.71}$$

that is constructed as a linear differential operator D (constructed locally
from the fields and their derivatives) on arbitrary functions
$\Lambda^A = \Lambda^A(x)$, A = 1, 2, . . . :

$$\delta\varphi^a(\Lambda) = [D(\Lambda)]^a. \tag{5.72}$$

The gauge transformation is an infinitesimal gauge symmetry if it leaves the
Lagrangian invariant up to a divergence of a vector field $W^\alpha$,

$$\delta L = D_\alpha W^\alpha(\Lambda), \tag{5.73}$$

where $W^\alpha(\Lambda)$ is a linear differential operator acting on the functions
$\Lambda^A$, with the linear operator being locally constructed from the fields
$\varphi^a$ and their derivatives.
Noether’s second theorem now asserts that the existence of a gauge
symmetry implies differential identities satisfied by the field equations. To see
this, we simply use the fact that, for any functions $\Lambda^A$,

$$0 = \delta L - D_\alpha W^\alpha = E_a(L)\,[D(\Lambda)]^a + D_\alpha(\eta^\alpha - W^\alpha), \tag{5.74}$$

where both η and W are built as linear differential operators acting on
$\Lambda^A$. As before, we integrate this identity over a region R and choose the
functions $\Lambda^A$ to vanish in a neighborhood of the boundary so that the
divergence terms can be neglected. We then have that, for all such functions
$\Lambda^A$,

$$\int_R E_a(L)\,[D(\Lambda)]^a = 0. \tag{5.75}$$

Now imagine integrating by parts each term in the linear operator $[D(\Lambda)]^a$ so
that all derivatives of Λ are removed. The boundary terms that arise vanish
with our boundary conditions on $\Lambda^A$. This process defines the formal adjoint
$D^*$ of the linear differential operator D:

$$\int_R E_a\,[D(\Lambda)]^a = \int_R \Lambda^A\,[D^*(E)]_A. \tag{5.76}$$

The gauge symmetry condition is now

$$\int_R \Lambda^A\,[D^*(E)]_A = 0, \tag{5.77}$$

for all functions $\Lambda^A$ (vanishing in the neighborhood of the boundary). The
fundamental theorem of variational calculus then tells us that the
Euler-Lagrange expressions must obey the differential identities:

$$[D^*(E)]_A = 0. \tag{5.78}$$

You can easily check out this argument via our Maxwell example. The
gauge transformation is defined by the exterior derivative on functions:

$$[D(\Lambda)]_\alpha = \partial_\alpha\Lambda. \tag{5.79}$$

The infinitesimal transformation

$$\delta A_\alpha = [D(\Lambda)]_\alpha \tag{5.80}$$

is a symmetry of the Lagrangian with

$$W^\alpha = 0. \tag{5.81}$$

The adjoint of the exterior derivative is given by (minus) the divergence
operation:

$$V^\alpha\,\partial_\alpha\Lambda = -\Lambda\,\partial_\alpha V^\alpha + \partial_\alpha(\Lambda V^\alpha) \tag{5.82}$$

so that

$$[D^*(E)] = -D_\alpha E^\alpha, \tag{5.83}$$

which leads to the Noether identity

$$D_\alpha E^\alpha = 0 \tag{5.84}$$

for any field equations coming from a gauge invariant Lagrangian.

Let me emphasize that the above considerations only work because Λ can
be chosen to be any function of all the independent variables xα . This is
needed for various integrations by parts, and it is needed to use the funda-
mental theorem of the variational calculus. The bottom line here is that you
have a gauge symmetry (as opposed to an ordinary symmetry) if and only if
the support in spacetime of the transformation can be freely specified.

5.6 The canonical energy-momentum tensor

In §3.12 we saw how the translational symmetries of Minkowski spacetime
led to the construction of the energy-momentum tensor for the KG field. We
now apply similar reasoning to the electromagnetic field.
Consider a 1-parameter family of translations, say,

$$x^\alpha \to x'^\alpha = x^\alpha + a^\alpha, \qquad\text{where}\quad a^\alpha = \lambda b^\alpha. \tag{5.85}$$

The corresponding (“pullback”) transformation of the vector potential is
given by

$$A_\alpha \to A'_\alpha = \frac{\partial x'^\beta}{\partial x^\alpha}\,A_\beta(x + \lambda b) = A_\alpha(x + \lambda b). \tag{5.86}$$

We have then[4]

$$\delta A_\alpha = b^\beta A_{\alpha,\beta}. \tag{5.87}$$

This implies that

$$\delta F_{\mu\nu} = b^\alpha F_{\mu\nu,\alpha} \tag{5.88}$$

and hence the translations define a divergence symmetry:

$$\delta L = -\frac12\,b^\gamma F^{\alpha\beta}F_{\alpha\beta,\gamma} = D_\gamma\left(-\frac14\,b^\gamma F^{\alpha\beta}F_{\alpha\beta}\right). \tag{5.89}$$

Recalling the variational identity:

$$\delta L = F^{\alpha\beta}{}_{,\alpha}\,\delta A_\beta + D_\alpha\left(-F^{\alpha\beta}\delta A_\beta\right), \tag{5.90}$$

this leads to the conserved current

$$j^\alpha = -b^\gamma\left(F^{\alpha\beta}A_{\beta,\gamma} - \frac14\,\delta^\alpha_\gamma F^{\mu\nu}F_{\mu\nu}\right). \tag{5.91}$$

You can easily check with a direct computation that $j^\alpha$ is conserved, that is,

$$D_\alpha j^\alpha = 0, \tag{5.92}$$

when the field equations hold.

Problem:
11. Verify equations (5.87)–(5.92).

Since this conservation law exists for each constant vector $b^\alpha$, we can
summarize these conservation laws using the canonical energy-momentum tensor

$$T^\alpha_\gamma = F^{\alpha\beta}A_{\beta,\gamma} - \frac14\,\delta^\alpha_\gamma F^{\mu\nu}F_{\mu\nu}, \tag{5.93}$$

which satisfies

$$D_\alpha T^\alpha_\beta = 0, \tag{5.94}$$

modulo the field equations. We view the energy-momentum tensor as a
collection of conserved currents labeled by the index γ.

[4] It is worth noting that this formula is not gauge-invariant; it really only
defines the change in the fields due to a translation modulo a gauge
transformation. We will address this issue soon.

5.7 Improved Maxwell energy-momentum tensor

There is one glaring defect in the structure of the canonical energy-momentum
tensor: it is not gauge invariant. Indeed, under a gauge transformation

$$A \longrightarrow A + d\Lambda \tag{5.95}$$

we have

$$T^\alpha_\beta \longrightarrow T^\alpha_\beta + F^{\alpha\mu}\,\partial_\mu\partial_\beta\Lambda. \tag{5.96}$$

In order to see what to do about this, we need to use some of the flexibility
we have in defining conserved currents. This is our next task.
It is possible to show that all local and gauge invariant expressions must
depend on the vector potential only through the field strength. Consequently,
the currents are not gauge invariant because of the explicit presence of the
potentials A. With that in mind, and using
$A_{\beta,\gamma} = F_{\gamma\beta} + A_{\gamma,\beta}$, we write

$$\begin{aligned}
T^\alpha_\gamma &= F^{\alpha\beta}F_{\gamma\beta} - \frac14\,\delta^\alpha_\gamma F^{\mu\nu}F_{\mu\nu} + F^{\alpha\beta}A_{\gamma,\beta} \\
&= F^{\alpha\beta}F_{\gamma\beta} - \frac14\,\delta^\alpha_\gamma F^{\mu\nu}F_{\mu\nu} + D_\beta(F^{\alpha\beta}A_\gamma) - A_\gamma D_\beta F^{\alpha\beta}.
\end{aligned} \tag{5.97}$$

According to §3.20, the last two terms are trivial conservation laws. So,
modulo a set of trivial conservation laws, the canonical energy-momentum
tensor takes the gauge-invariant form

$$T^\alpha_\gamma = F^{\alpha\beta}F_{\gamma\beta} - \frac14\,\delta^\alpha_\gamma F^{\mu\nu}F_{\mu\nu}. \tag{5.98}$$
This tensor is called the “gauge-invariant energy-momentum tensor” or the
“improved energy-momentum tensor” or the “general relativistic energy-
momentum tensor”. The latter term arises since this energy-momentum
tensor serves as the source of the gravitational field in general relativity and
can be derived using the variational principle of that theory.
The improved energy-momentum tensor has another valuable feature rel-
ative to the canonical energy-momentum tensor (besides gauge invariance).
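The canonical energy-momentum tensor,

$$T^\alpha_\gamma = F^{\alpha\beta}A_{\beta,\gamma} - \frac14\,\delta^\alpha_\gamma F^{\mu\nu}F_{\mu\nu}, \tag{5.99}$$

is not a symmetric tensor. If we define

$$T_{\alpha\beta} = g_{\alpha\gamma}T^\gamma_\beta \tag{5.100}$$

then you can see that

$$T_{[\alpha\beta]} = F_{[\alpha}{}^{\gamma}\,\partial_{\beta]}A_\gamma. \tag{5.101}$$

Here we used the notation

$$T_{[\alpha\beta]} \equiv \frac12\,(T_{\alpha\beta} - T_{\beta\alpha}). \tag{5.102}$$

On the other hand the improved energy-momentum tensor,

$$T^\alpha_\gamma = F^{\alpha\beta}F_{\gamma\beta} - \frac14\,\delta^\alpha_\gamma F^{\mu\nu}F_{\mu\nu}, \tag{5.103}$$

is symmetric:

$$T_{\alpha\beta} = T_{\beta\alpha}. \tag{5.104}$$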
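To see the symmetry and the physical content of the improved tensor explicitly, one can lower the index in (5.98) and evaluate a few components. The sketch below assumes the conventions $F_{0i} = -E_i$, $F_{ij} = \epsilon_{ijk}B^k$ of (5.6) and the flat metric (5.24), for which $F^{\mu\nu}F_{\mu\nu} = 2(B^2 - E^2)$.

```latex
% Improved tensor (5.98) with the index lowered; assumes (5.6) and the metric (5.24),
% so that F^{\mu\nu}F_{\mu\nu} = 2(B^2 - E^2).
\begin{align*}
T_{\alpha\beta} &= g_{\alpha\gamma}T^\gamma_\beta
  = g^{\mu\nu}F_{\alpha\mu}F_{\beta\nu} - \tfrac14\, g_{\alpha\beta}\,F^{\mu\nu}F_{\mu\nu}, \\
T_{00} &= F_{0i}F_{0i} + \tfrac14\cdot 2\,(B^2 - E^2) = \tfrac12\,(E^2 + B^2), \\
T_{0i} &= F_{0j}F_{ij} = -\epsilon_{ijk}E_j B_k = -(\vec E\times\vec B)_i .
\end{align*}
```

The first line is manifestly symmetric under the exchange of α and β. In the units of this chapter, $T_{00}$ is the familiar electromagnetic energy density and $T_{0i}$ is, up to sign, the Poynting vector.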
Why is all this important? Well, think back to the KG equation. There,
you will recall, the conservation of angular momentum, which stems from
the symmetry of the Lagrangian with respect to the Lorentz group, comes
from the currents

$$M^{\alpha(\mu)(\nu)} = x^\mu T^{\alpha\nu} - x^\nu T^{\alpha\mu} = -2\,T^{\alpha[\mu}x^{\nu]}. \tag{5.105}$$

These 6 currents were conserved since (1) $T^{\alpha\beta}$ is divergence free
(modulo the equations of motion) and (2) $T^{[\alpha\beta]} = 0$. This result will generalize to
give conservation of angular momentum in electromagnetic theory using the
improved energy-momentum tensor. So the improved tensor in electromag-
netic theory plays the same role relative to angular momentum as does the
energy-momentum tensor of KG theory.
Why did we have to “improve” the canonical energy-momentum ten-
sor? Indeed, we have a paradox: the Lagrangian is gauge invariant, so why
didn’t Noether’s theorem automatically give us the gauge invariant energy-
momentum tensor? As with most paradoxes, the devil is in the details.
Noether’s theorem involves using the variational identity in the form

$$\delta L = E^\beta(L)\,\delta A_\beta + D_\alpha\eta^\alpha \tag{5.106}$$

to construct the conserved current from $\eta^\alpha$ and any divergence
$D_\alpha W^\alpha$ which arises in the symmetry transformation of L. To construct
the canonical energy-momentum tensor from Noether’s first theorem we used

$$W^\alpha = -\frac14\,b^\alpha F_{\mu\nu}F^{\mu\nu}, \tag{5.107}$$

along with

$$\eta^\alpha = -F^{\alpha\beta}\delta A_\beta, \quad\text{and}\quad \delta A_\beta = b^\gamma A_{\beta,\gamma} \implies \eta^\alpha = -b^\gamma F^{\alpha\beta}A_{\beta,\gamma}. \tag{5.108}$$

The lack of gauge invariance snuck into the calculation via the formula for
the change in A under an infinitesimal translation,
$\delta A_\beta = b^\gamma A_{\beta,\gamma}$. We destroyed gauge invariance with this
formula since its right hand side is not gauge invariant. A gauge invariant
formula for the change of A under an infinitesimal translation can be gotten
by accompanying the translation with a gauge transformation:

$$\delta A_\beta = b^\gamma A_{\beta,\gamma} - D_\beta(b^\gamma A_\gamma) = b^\gamma F_{\gamma\beta}. \tag{5.109}$$

With this improved symmetry transformation we get the improved
energy-momentum tensor from Noether’s theorem. Notice also that this additional
gauge transformation is a symmetry and corresponds to the additional trivial
conservation law needed to improve the energy-momentum tensor in the first
place.
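The claim that (5.109) yields the improved tensor can be followed through in a few lines. This is a sketch of the computation, assuming the variational identity (5.90) and the identity $F_{\alpha\beta,\gamma} + F_{\beta\gamma,\alpha} + F_{\gamma\alpha,\beta} = 0$, with $b^\gamma$ constant.

```latex
% Noether current for the gauge-covariant translation (5.109), with b^\gamma constant.
% Uses F_{\alpha\beta,\gamma} + F_{\beta\gamma,\alpha} + F_{\gamma\alpha,\beta} = 0.
\begin{align*}
\delta A_\beta &= b^\gamma F_{\gamma\beta}
  \ \Longrightarrow\
  \delta F_{\alpha\beta} = b^\gamma\,(F_{\gamma\beta,\alpha} - F_{\gamma\alpha,\beta})
  = b^\gamma F_{\alpha\beta,\gamma}, \\
\delta L &= D_\alpha W^\alpha, \qquad
  W^\alpha = -\tfrac14\, b^\alpha F_{\mu\nu}F^{\mu\nu}, \\
\eta^\alpha &= -F^{\alpha\beta}\,\delta A_\beta = -b^\gamma F^{\alpha\beta}F_{\gamma\beta}, \\
j^\alpha &= \eta^\alpha - W^\alpha
  = -b^\gamma\left(F^{\alpha\beta}F_{\gamma\beta}
    - \tfrac14\,\delta^\alpha_\gamma F^{\mu\nu}F_{\mu\nu}\right).
\end{align*}
```

The variation of F, and hence W, is the same as for the naive translation, so only η changes; the resulting current is $-b^\gamma$ times the tensor (5.98).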

5.8 The Hamiltonian formulation of Electromagnetism.

The Hamiltonian formulation of the electromagnetic field offers some signif-
icant new features beyond what we found for the scalar field. These new
features are associated with gauge invariance. Mathematically speaking, the
new features stem from the failure of the non-degeneracy condition (4.14).
The novelties which appear here also appear (in a more elaborate form) in
Yang-Mills theories and in generally covariant theories, e.g., of gravitation.

5.8.1 Phase space

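We start with the source-free EM Lagrangian in Minkowski spacetime using
inertial-Cartesian coordinates:

$$L = -\frac14 F_{\alpha\beta}F^{\alpha\beta}, \qquad F_{\alpha\beta} = \partial_\alpha A_\beta - \partial_\beta A_\alpha. \tag{5.110}$$

The phase space is the vector space of solutions to the source-free Maxwell
equations,

$$\partial_\beta F^{\alpha\beta} = 0, \tag{5.111}$$

equipped with a symplectic structure which is constructed as follows.
The variation of the Lagrangian is given by:

$$\delta L = \int_{\mathbb{R}^3} d^3x\,\left[F^{\alpha\beta}{}_{,\alpha}\,\delta A_\beta + \partial_\alpha\left(-F^{\alpha\beta}\delta A_\beta\right)\right] \tag{5.112}$$

$$\phantom{\delta L} = \int_{\mathbb{R}^3} d^3x\,\left[F^{\alpha\beta}{}_{,\alpha}\,\delta A_\beta + \partial_0\left(-F^{0\beta}\delta A_\beta\right)\right], \tag{5.113}$$

where the integral takes place at some chosen value for $x^0 \equiv t$ and we used
the spatial divergence theorem along with boundary conditions at spatial
infinity to get the second equality. The 1-form Θ which is normally used to
construct the symplectic 2-form is then defined by

$$\Theta(\delta A) = \int_{\mathbb{R}^3} d^3x\, F^{\beta 0}\,\delta A_\beta = \int_{\mathbb{R}^3} d^3x\, F^{i0}\,\delta A_i, \tag{5.114}$$

where we shall use Latin letters to denote spatial components, i = 1, 2, 3.
Using this naive definition of the symplectic form Ω = dΘ, we get

$$\Omega(\delta_1 A, \delta_2 A) = \int_{\mathbb{R}^3} d^3x\,\left[(\partial_t\delta_1 A^i - \partial^i\delta_1 A_t)\,\delta_2 A_i - (\partial_t\delta_2 A^i - \partial^i\delta_2 A_t)\,\delta_1 A_i\right]. \tag{5.115}$$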

Problem:
12. Show that the 2-form (5.115) does not depend upon the time at which
it is evaluated.

The new feature that appears here is that the 2-form Ω is in fact degenerate.
This means that there exists a non-zero vector $\vec v$ such that
$\Omega(\vec u, \vec v) = 0$ for all $\vec u$. Let us see how this happens in detail.

Consider using in (5.115) a field variation $\delta_2 A$ which is an infinitesimal
gauge transformation:

$$\delta_2 A_t = \partial_t\Lambda, \qquad \delta_2 A_i = \partial_i\Lambda, \tag{5.116}$$

where $\Lambda : \mathbb{R}^3 \to \mathbb{R}$ is any function of compact support.
Substituting this into (5.115) it is easy to see that the second group of terms
vanishes by the commutativity of partial derivatives. Let us examine the first
group of terms. Upon substitution of (5.116) we get:

$$\begin{aligned}
\int_{\mathbb{R}^3} d^3x\,(\partial_t\delta_1 A^i - \partial^i\delta_1 A_t)\,\delta_2 A_i
&= \int_{\mathbb{R}^3} d^3x\,(\partial_t\delta_1 A^i - \partial^i\delta_1 A_t)\,\partial_i\Lambda \\
&= -\int_{\mathbb{R}^3} d^3x\,\Lambda\,\partial_i(\partial_t\delta_1 A^i - \partial^i\delta_1 A_t),
\end{aligned} \tag{5.117}$$

where integration by parts and the divergence theorem were used to get the
last equality. The boundary term “at infinity” vanishes since Λ has compact
support. Next, recall that the field variations δA represent tangent vectors to
the space of solutions of the Maxwell equations (5.111) and so are solutions
to the linearized equations. Because the Maxwell equations are linear, their
linearization is mathematically the same:

$$\partial^\beta(\partial_\alpha\delta A_\beta - \partial_\beta\delta A_\alpha) = 0. \tag{5.118}$$

Setting α = 0 we get

$$\partial^i(\partial_t\delta A_i - \partial_i\delta A_t) = 0. \tag{5.119}$$

This means that (5.117) vanishes and we conclude that the “pure gauge”
tangent vectors (5.116) are degeneracy directions for the 2-form Ω.
Degenerate 2-forms are often called “pre-symplectic” because there is
a canonical procedure for extracting a unique symplectic form on a smaller
space from a pre-symplectic form. We will not develop this elegant geometric
result here. Instead, we will proceed in a useful if more roundabout route
by examining the Hamiltonian formulation of the theory that arises when we
parametrize the space of solutions to the field equations (5.111) with initial
data.

5.8.2 Equations of motion

Returning to (5.114), we see from its “p dq” form that we can identify the
canonical coordinates with the initial data for the spatial part of the vector
potential and the conjugate momentum is then given by minus the electric
field. We define
$$Q_i(\vec x) = A_i(t=0,\vec x), \qquad P^i(\vec x) = F^{i0}(t=0,\vec x). \tag{5.120}$$
Because of the definition of the field strength tensor in terms of the vector
potential, we have
$$P_i = \partial_t A_i - \partial_i A_t, \tag{5.121}$$
so we can write
$$\dot Q_i = P_i + \partial_i A_t. \tag{5.122}$$
Next, consider the equations of motion with time and space (as given in an
inertial reference frame) explicitly separated:
$$\partial_t F^{\alpha t} + \partial_i F^{\alpha i} = 0, \tag{5.123}$$
which yields four equations:
$$\partial_t F^{jt} + \partial_i F^{ji} = 0, \tag{5.124}$$
$$\partial_i F^{ti} = 0. \tag{5.125}$$
These equations allow us to view the Maxwell equations as determining a
curve $(Q_i = Q_i(t,\vec x),\ P^i = P^i(t,\vec x))$ in the space of initial data along with
a constraint on the canonical variables. Equations (5.122) and (5.124) are
equivalent to the evolution equations:
$$\dot Q_i = \delta_{ij} P^j + \partial_i A_t, \tag{5.126}$$
$$\dot P^j = \partial_i F^{ij}, \tag{5.127}$$
the second of which corresponds to Ampère's law. Equation (5.125) corresponds
to the constraint
$$\partial_j P^j = 0. \tag{5.128}$$
The constraint equation does not specify time evolution but restricts the
canonical variables at any given instant of time. You can easily check that this
is the differential version of Gauss’ law, in the case of vanishing charge density.
One can also interpret the constraint as a restriction on admissible initial
conditions for the evolution equations (5.127), since we have the following
result.

Problem:
13. Show that if the constraint (5.128) holds at one time and the canonical
variables evolve in time according to (5.126), (5.127) then (5.128) will
hold at any other time. (Hint: Consider the time derivative of (5.128).)

Notice that the equations (5.126), (5.127) are evolution equations for
$(Q_i, P^i)$ only. The time component $A_t$ is not determined by these equations;
it simply defines a gauge transformation of the $Q_i$ as time evolves.
Equations (5.126), (5.127) and (5.128) determine solutions to the Maxwell
equations as follows. Specify 6 functions on $\mathbb{R}^3$, namely $(q_i(\vec x), p^i(\vec x))$, where
$p^i$ is divergence-free, $\partial_i p^i = 0$. Pick a function $A_t(t,\vec x)$ any way you like.
Solve the evolution equations (5.126), (5.127) subject to the initial conditions
$(Q_i(0) = q_i,\ P^i(0) = p^i)$ to get $(Q_i(t), P^i(t))$. In the given inertial reference
frame define
$$F_{ti} = P_i, \qquad F_{ij} = \partial_i Q_j - \partial_j Q_i. \tag{5.129}$$
The resulting field strength tensor Fαβ satisfies the (source-free) Maxwell
equations, as you can easily verify.

5.8.3 The electromagnetic Hamiltonian and gauge transformations

Following the same strategy I mentioned when studying the KG field, we
can compute the electromagnetic Hamiltonian from the Lagrangian using
Legendre transformation. An examination of this Hamiltonian will give us
useful new perspectives on the electromagnetic phase space, on the Hamilton
equations, and on the role of gauge transformations.
To this end we use an inertial reference frame
$$x^\alpha = (x^0, x^i) = (t, \vec x) \tag{5.130}$$
and decompose the electromagnetic Lagrangian density with respect to it:
$$\mathcal{L} = -\frac{1}{2}\left(-F_{0i}F_0{}^{i} + \frac{1}{2}F_{ij}F^{ij}\right)
= \frac{1}{2}(\partial_t A_i - \partial_i A_t)(\partial_t A^i - \partial^i A_t) - \frac{1}{4}F_{ij}F^{ij}. \tag{5.131}$$
I will choose $\phi \equiv -A_t(\vec x)$ and $Q_i \equiv A_i(\vec x)$ as the canonical coordinates. The
momentum conjugate to $Q_i$ is
$$P^i = \frac{\partial \mathcal{L}}{\partial(\partial_t A_i)} = \partial_t A^i - \partial^i A_t = \partial_t Q^i + \partial^i \phi, \tag{5.132}$$
as we found earlier. Notice that there are no time derivatives of $A_t$ in the
Lagrangian; its conjugate momentum vanishes:
$$\frac{\partial \mathcal{L}}{\partial(\partial_t A_t)} = 0. \tag{5.133}$$

Viewing the spacetime fields as $Q_i(t,\vec x)$, $P^i(t,\vec x)$, $\phi(t,\vec x)$, the Lagrangian can
be viewed as a functional of $(Q_i, P^j, \phi)$ and can be written in the Hamiltonian
“$p\dot q - H$” form (exercise):
$$\begin{aligned}
L[Q,P,\phi] &= \int_{\mathbb{R}^3} d^3x \left[ P^i \dot Q_i - \frac{1}{2}\left(P_i P^i + \frac{1}{2}F_{ij}F^{ij}\right) + P^i \partial_i \phi \right] \\
&= \int_{\mathbb{R}^3} d^3x \left[ P^i \dot Q_i - \frac{1}{2}\left(P_i P^i + \frac{1}{2}F_{ij}F^{ij}\right) - \phi\, \partial_i P^i \right].
\end{aligned} \tag{5.134}$$
To get the second equality I integrated by parts and used the divergence
theorem on the last term. At each time $t$, we will assume that $(Q_i, P^i)$
vanish sufficiently rapidly at infinity so that the boundary term vanishes. As
usual, the EL equations (using functional derivatives) for $P^i$ reproduce the
definition (5.132) so that the EL equations for $(A_t, Q_i)$ then yield the Maxwell
equations. Notice that $\phi$ enters as a Lagrange multiplier enforcing the (Gauss
law) constraint on the canonical variables,
$$\partial_i P^i = 0, \tag{5.135}$$
which is how we shall treat it in all that follows.
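As a consistency check, here is a minimal sketch of the “exercise” above: eliminating $P^i$ from the first line of (5.134) should return the Lagrangian density (5.131). It assumes only the definitions already given, $\phi = -A_t$ and $Q_i = A_i$.

```latex
% Vary P^i in the first line of (5.134); no time derivatives of P appear:
\frac{\delta L}{\delta P^i} = \dot Q_i - P_i + \partial_i \phi = 0
\quad\Longrightarrow\quad
P_i = \dot Q_i + \partial_i \phi = \partial_t A_i - \partial_i A_t .

% Substitute this back into the integrand:
P^i(\dot Q_i + \partial_i \phi) - \tfrac{1}{2} P_i P^i - \tfrac{1}{4} F_{ij}F^{ij}
  = \tfrac{1}{2} P_i P^i - \tfrac{1}{4} F_{ij}F^{ij},

% which is (5.131) once P_i is written in terms of the potentials.
```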

Problem:

14. Show that the EL equations defined by L = L[Q, P, φ] in (5.134),

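$$\frac{\delta L}{\delta P^i} - \frac{d}{dt}\frac{\delta L}{\delta \dot P^i} = 0, \qquad
\frac{\delta L}{\delta Q_i} - \frac{d}{dt}\frac{\delta L}{\delta \dot Q_i} = 0, \qquad
\frac{\delta L}{\delta \phi} = 0, \tag{5.136}$$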

are equivalent to the Maxwell equations.

From (5.134) the Hamiltonian is given by

$$H[Q,P,\phi] = \int_{\mathbb{R}^3} d^3x \left[ \frac{1}{2}\left(P_i P^i + \frac{1}{2}F_{ij}F^{ij}\right) + \phi\, \partial_i P^i \right]. \tag{5.137}$$

The first two terms are what you might expect: they define the energy of the
electromagnetic field (once you recognize that $\frac{1}{4}F_{ij}F^{ij} = \frac{1}{2}B^2$ is the magnetic
energy density). You can see that the first term (the electric energy) is akin to
the kinetic energy of a particle, while the second term (the magnetic energy)
is akin to the potential energy of a particle. The term we want to focus on
is the third term – what’s that doing there? Well, first of all note that this
term does not affect the value of the Hamiltonian provided the canonical
variables satisfy the constraint (5.135). Secondly, let us consider Hamilton’s
equations:
$$\dot Q_i = \frac{\delta H}{\delta P^i} = P_i - \partial_i \phi, \tag{5.138}$$
$$\dot P^i = -\frac{\delta H}{\delta Q_i} = \partial_j F^{ji}. \tag{5.139}$$

From (5.138), which is secretly the relation between the electric field and
the potentials, you can see that the last term in the Hamiltonian is precisely
what is needed to generate the gauge transformation term we already found
in (5.126).
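The functional derivatives behind Hamilton's equations can be sketched as follows. This is only an outline, assuming the variations $\delta P^i$, $\delta Q_i$ have compact support so that boundary terms can be dropped; dummy indices have been relabeled freely.

```latex
% Vary H in (5.137) with respect to P^i, holding Q_i and \phi fixed:
\delta H = \int_{\mathbb{R}^3} d^3x \left( P_i\, \delta P^i + \phi\, \partial_i \delta P^i \right)
         = \int_{\mathbb{R}^3} d^3x \left( P_i - \partial_i \phi \right) \delta P^i
\;\Longrightarrow\;
\dot Q_i = \frac{\delta H}{\delta P^i} = P_i - \partial_i \phi = P_i + \partial_i A_t .

% Vary with respect to Q_i; only the magnetic term contributes:
\delta H = \int_{\mathbb{R}^3} d^3x\, \tfrac{1}{2} F^{jk}\, \delta F_{jk}
         = \int_{\mathbb{R}^3} d^3x\, F^{jk}\, \partial_j \delta Q_k
         = -\int_{\mathbb{R}^3} d^3x\, (\partial_j F^{jk})\, \delta Q_k
\;\Longrightarrow\;
\dot P^k = -\frac{\delta H}{\delta Q_k} = \partial_j F^{jk} .
```

The $-\partial_i\phi$ contribution to $\dot Q_i$ comes entirely from the constraint term $\phi\,\partial_i P^i$, which is the sense in which that term generates a gauge transformation.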
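Problem:
15. Using the Poisson brackets
$$[M, N] = \int_{\mathbb{R}^3} d^3x \left( \frac{\delta M}{\delta Q_i(\vec x)}\frac{\delta N}{\delta P^i(\vec x)} - \frac{\delta M}{\delta P^i(\vec x)}\frac{\delta N}{\delta Q_i(\vec x)} \right), \tag{5.140}$$
show that
$$G = -\int_{\mathbb{R}^3} d^3x\, \Lambda(\vec x)\, \partial_i P^i \tag{5.141}$$
is the generating function for gauge transformations
$$\delta Q_i = [Q_i, G] = \partial_i \Lambda, \qquad \delta P^i = [P^i, G] = 0. \tag{5.142}$$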

Thus, the Maxwell equations can be viewed as a constrained Hamiltonian
system with canonical coordinates and momentum given by the vector
potential and electric field, with the Hamiltonian given by (5.137), and with the
constraint (5.135). The Hamiltonian generates the time evolution of $(Q_i, P^i)$
with the constraint term contributing a gauge transformation with gauge
function $\phi$. The Hamiltonian structures of non-Abelian gauge theory and of
generally covariant gravitation theories follow a similar pattern.

5.9 PROBLEMS
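1. Maxwell's equations for the electric and magnetic field $(\vec E, \vec B)$ associated
to charge density and current density $(\rho, \vec j)$ are given by
$$\nabla \cdot \vec E = 4\pi\rho,$$
$$\nabla \cdot \vec B = 0,$$
$$\nabla \times \vec B - \frac{1}{c}\frac{\partial \vec E}{\partial t} = 4\pi \vec j,$$
$$\nabla \times \vec E + \frac{1}{c}\frac{\partial \vec B}{\partial t} = 0.$$
Show that for any function $\phi(\vec x, t)$ and vector field $\vec A(\vec x, t)$ the electric
and magnetic fields defined by
$$\vec E = -\nabla\phi - \frac{1}{c}\frac{\partial \vec A}{\partial t}, \qquad \vec B = \nabla \times \vec A$$
satisfy (5.2) and (5.4).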

2. Define the anti-symmetric array $F_{\mu\nu}$ in inertial Cartesian coordinates
$x^\alpha = (t, x, y, z)$, $\alpha = 0, 1, 2, 3$ via
$$F_{ti} = -E_i, \qquad F_{ij} = \epsilon_{ijk}B^k, \qquad i, j = 1, 2, 3.$$
Under a change of inertial reference frame corresponding to a boost
along the $x$ axis with speed $v$ the electric and magnetic fields change
$(\vec E, \vec B) \to (\vec E', \vec B')$, where
$$E^{x\prime} = E^x, \qquad E^{y\prime} = \gamma(E^y - vB^z), \qquad E^{z\prime} = \gamma(E^z + vB^y),$$
$$B^{x\prime} = B^x, \qquad B^{y\prime} = \gamma(B^y + vE^z), \qquad B^{z\prime} = \gamma(B^z - vE^y).$$
(Here $\gamma = 1/\sqrt{1 - v^2}$.) Show that this is equivalent to saying that $F_{\mu\nu}$
are the components of a spacetime tensor of type $\binom{0}{2}$. Show that the
two quantities $\vec E \cdot \vec B$ and $E^2 - B^2$ do not change under the boost.

3. Define the 4-current
$$j^\alpha = (\rho, j^i), \qquad i = 1, 2, 3.$$
Show that the Maxwell equations take the form
$$F^{\alpha\beta}{}_{,\beta} = 4\pi j^\alpha, \qquad F_{\alpha\beta,\gamma} + F_{\beta\gamma,\alpha} + F_{\gamma\alpha,\beta} = 0,$$
where indices are raised and lowered with the usual Minkowski metric.

4. Show that the scalar and vector potentials, when assembled into the
4-potential
$$A_\mu = (-\phi, A_i), \qquad i = 1, 2, 3,$$
are related to the electromagnetic tensor $F_{\mu\nu}$ by
$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu.$$
Show that this formula for $F_{\mu\nu}$ solves the homogeneous Maxwell equations
$F_{\alpha\beta,\gamma} + F_{\beta\gamma,\alpha} + F_{\gamma\alpha,\beta} = 0$.

5. Show that the EL derivative of the Maxwell Lagrangian satisfies the
differential identity
$$D_\beta \mathcal{E}^\beta(\mathcal{L}) = 0.$$

6. Restrict attention to flat spacetime in Cartesian coordinates, as usual.
Fix a vector field on spacetime, $j^\alpha = j^\alpha(x)$. Show that the Lagrangian
$$\mathcal{L}_j = -\frac{1}{4}F^{\alpha\beta}F_{\alpha\beta} + 4\pi j^\alpha A_\alpha$$
gives the field equations
$$F^{\alpha\beta}{}_{,\alpha} = -4\pi j^\beta.$$
These are the Maxwell equations with prescribed electric sources having
a charge density $\rho$ and current density $\vec j$, where
$$j^\alpha = (\rho, \vec j).$$
Use the results from the preceding problem to show that the Maxwell
equations with sources have no solution unless the vector field representing
the sources is divergence-free:
$$\partial_\alpha j^\alpha = 0.$$
Show that this condition is in fact the usual continuity equation
representing conservation of electric charge.

7. Show that the Lagrangian density for source-free electromagnetism can
be written in terms of the electric and magnetic fields (in any given
inertial frame) by $\mathcal{L} = \frac{1}{2}(E^2 - B^2)$. This is one of the 2 relativistic
invariants that can be made algebraically from $\vec E$ and $\vec B$.

9. For any function $\Lambda = \Lambda(x)$, define
$$A'_\alpha = A_\alpha + \partial_\alpha \Lambda.$$
Show that
$$A'_{\alpha,\beta} - A'_{\beta,\alpha} = A_{\alpha,\beta} - A_{\beta,\alpha}.$$
Show that, in terms of the scalar and vector potentials, this gauge
transformation is equivalent to
$$\phi \to \phi' = \phi - \partial_t \Lambda, \qquad \vec A \to \vec A' = \vec A + \nabla\Lambda.$$

10. Consider the electromagnetic field coupled to sources with the
Lagrangian density
$$\mathcal{L}_j = -\frac{1}{4}F^{\alpha\beta}F_{\alpha\beta} + 4\pi j^\alpha A_\alpha.$$
Show that this Lagrangian is gauge invariant (up to a divergence) if
and only if the spacetime vector field $j^\alpha$ is chosen to be divergence-free.
What is the Noether identity in this case?

11. Verify equations (5.87)–(5.92).

12. Show that the 2-form (5.115) does not depend upon the time at which
it is evaluated.

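14. Show that the EL equations defined by $L = L[Q, P, \phi]$ in (5.134),
$$\frac{\delta L}{\delta P^i} - \frac{d}{dt}\frac{\delta L}{\delta \dot P^i} = 0, \qquad
\frac{\delta L}{\delta Q_i} - \frac{d}{dt}\frac{\delta L}{\delta \dot Q_i} = 0, \qquad
\frac{\delta L}{\delta \phi} = 0,$$
are equivalent to the Maxwell equations.

15. Using the Poisson brackets
$$[M, N] = \int_{\mathbb{R}^3} d^3x \left( \frac{\delta M}{\delta Q_i(\vec x)}\frac{\delta N}{\delta P^i(\vec x)} - \frac{\delta M}{\delta P^i(\vec x)}\frac{\delta N}{\delta Q_i(\vec x)} \right),$$
show that
$$G = -\int_{\mathbb{R}^3} d^3x\, \Lambda(\vec x)\, \partial_i P^i$$
is the generating function for gauge transformations
$$\delta Q_i = [Q_i, G] = \partial_i \Lambda, \qquad \delta P^i = [P^i, G] = 0.$$
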
Chapter 6

Scalar Electrodynamics

Let us have an introductory look at the field theory called scalar electrody-
namics, in which one considers a coupled system of Maxwell and charged KG
fields. There are an infinite number of ways one could try to couple these
fields. There is one particularly interesting way, physically speaking, and
this is the one we shall be exploring. Mathematically, too, this particular
coupling has many interesting features which we shall explore.
To understand the motivation for the postulated form of scalar electro-
dynamics, it is easiest to proceed via Lagrangians. For simplicity we will
restrict attention to flat spacetime in inertial Cartesian coordinates, but our
treatment is easily generalized to an arbitrary spacetime.

6.1 Electromagnetic field with scalar sources

Let us return to the electromagnetic theory, but now with electrically charged
sources. Recall that if j α (x) is some given divergence-free vector field on
spacetime, representing some externally specified charge-current distribution,
then the behavior of the electromagnetic field interacting with the given
source is dictated by the Lagrangian

$$\mathcal{L}_j = -\frac{1}{4}F^{\alpha\beta}F_{\alpha\beta} + 4\pi j^\alpha A_\alpha. \tag{6.1}$$
Incidentally, given the explicit appearance of $A_\alpha$, one might worry about
the gauge symmetry of this Lagrangian. But it is easily seen that the gauge
transformation is a divergence symmetry of this Lagrangian. Indeed, under
a gauge transformation of the offending term we have
$$\begin{aligned}
j^\alpha A_\alpha &\longrightarrow j^\alpha(A_\alpha + \partial_\alpha \Lambda) \\
&= j^\alpha A_\alpha + D_\alpha(\Lambda j^\alpha),
\end{aligned} \tag{6.2}$$
where we had to use the fact
$$\partial_\alpha j^\alpha(x) = 0. \tag{6.3}$$
The idea now is that we don’t want to specify the sources in advance,
we want the theory to tell us how they behave. In other words, we want to
include the sources as part of the dynamical variables of our theory. In all
known instances the correct way to do this always follows the same pattern:
the gauge fields affect the “motion” of the sources, and the sources affect
the form of the gauge field. Here we will use the electromagnetic field as the
gauge field and the charged (U (1) symmetric) KG field as the source. The
reasoning for this latter choice goes as follows.
We are going to use fields to model things like electrically charged
matter, so we insist upon a model for the charged sources built from a classical
field.[1] So, we need a classical field theory that admits a conserved current
that we can interpret as an electric 4-current. The KG field admitted 10
conserved currents corresponding to conserved energy, momentum and angular
momentum. But we know that the electromagnetic field is not driven by
such quantities, so we need another kind of current. To find such a current
we turn to the charged KG field. In the absence of any other interactions,
this field admits the conserved current
$$j^\alpha = -i g^{\alpha\beta}\left(\varphi^* \varphi_{,\beta} - \varphi\, \varphi^*_{,\beta}\right). \tag{6.4}$$
The simplest thing to try is to build a theory in which this is the current
that drives the electromagnetic field. This is the correct idea, but the most
naive attempt to implement this strategy falls short of perfection. To see
this, imagine a Lagrangian of the form
$$\mathcal{L}_{\rm wrong} = -\frac{1}{4}F^{\alpha\beta}F_{\alpha\beta} - \left(\partial^\alpha \varphi^* \partial_\alpha \varphi + m^2|\varphi|^2\right) - iA_\alpha g^{\alpha\beta}\left(\varphi^* \varphi_{,\beta} - \varphi\, \varphi^*_{,\beta}\right). \tag{6.5}$$
This Lagrangian was obtained by simply taking (6.1), substituting (6.4) for
the current, and then adding the KG Lagrangian for the scalar field.[2] The
idea is that the EL equations for $A$ will give the Maxwell equations with the
KG current as the source. The EL equations for the scalar field will now
involve $A$, but that is reasonable since we expect the presence of the
electromagnetic field to affect the sources. But there is one big problem with this
Lagrangian: it is no longer gauge invariant! Recall that the gauge invariance
of the Maxwell Lagrangian with prescribed sources made use of the fact that
the current was divergence-free. But now the current is only divergence-free
when the field equations hold. The key to escaping this difficulty is to let
the KG field participate in the gauge symmetry. This forces us to modify
the Lagrangian as we shall now discuss.

[1] After all, this is a course in classical field theory.
[2] Recall from Lagrangian mechanics that one can find a Lagrangian for the combination
of mechanical systems by adding their respective Lagrangians.
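The failure of gauge invariance of (6.5) can be made explicit with a short computation. This is a sketch under the assumption that only $A_\alpha \to A_\alpha + \partial_\alpha\Lambda$ is applied, with $\varphi$ held fixed; $\Box = \partial^\alpha\partial_\alpha$ is shorthand introduced here.

```latex
% Only the interaction term of (6.5), which equals A_\alpha j^\alpha with j^\alpha from (6.4), changes:
\delta \mathcal{L}_{\rm wrong} = j^\alpha\, \partial_\alpha \Lambda
  = D_\alpha(\Lambda j^\alpha) - \Lambda\, \partial_\alpha j^\alpha .

% Divergence of the current (6.4), computed without using any field equations:
\partial_\alpha j^\alpha = -i\left( \varphi^*\, \Box\varphi - \varphi\, \Box\varphi^* \right).

% This vanishes when \varphi solves the free KG equation (\Box - m^2)\varphi = 0,
% but not for arbitrary \varphi, so -\Lambda\,\partial_\alpha j^\alpha is not a divergence.
```

Since a symmetry of the Lagrangian must hold for all field configurations, not just solutions, the leftover term obstructs gauge invariance.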

6.2 Minimal coupling: the gauge covariant derivative

The physically correct way to get a gauge invariant Lagrangian for the cou-
pled Maxwell-KG theory, which still gives the j α Aα kind of coupling is rather
subtle and clever. Let me begin by just stating the answer. Then I will try
to show how it works and how one might even be able to derive it from some
new, profound ideas.
The answer is to modify the KG Lagrangian via “minimal coupling”, in
which one replaces
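$$\partial_\alpha \varphi \to D_\alpha \varphi := (\partial_\alpha + iqA_\alpha)\varphi, \tag{6.6}$$
and
$$\partial_\alpha \varphi^* \to D_\alpha \varphi^* := (\partial_\alpha - iqA_\alpha)\varphi^*. \tag{6.7}$$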
Here q is a parameter reflecting the coupling strength between the charged
field ϕ and the gauge field. It is a coupling constant. In the quantum field
theory description, q is the bare electric charge of a particle excitation of the
quantum field ϕ. The effect of the electromagnetic field, described by Aα ,
upon the KG field is then described by the Lagrangian
$$\mathcal{L}_{KG} = -D^\alpha \varphi^* D_\alpha \varphi - m^2|\varphi|^2. \tag{6.8}$$
This Lagrangian yields field equations which involve the wave operator mod-
ified by terms built from the electromagnetic potential. These additional
terms represent the effect of the electromagnetic field on the charged scalar
field.

Problem:

1. Compute the EL equations of LKG in (6.8).

The Lagrangian (6.8) still admits the U (1) phase symmetry of the charged
KG theory, but because this Lagrangian depends explicitly upon A it will
not be gauge invariant unless we include a corresponding transformation of
ϕ. We therefore extend the gauge transformation to be:

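$$A_\alpha \longrightarrow A_\alpha + \partial_\alpha \Lambda, \tag{6.9}$$
$$\varphi \longrightarrow e^{-iq\Lambda}\varphi, \qquad \varphi^* \longrightarrow e^{iq\Lambda}\varphi^*. \tag{6.10}$$
You should now verify that under a gauge transformation we have the
fundamental relation (which justifies the minimal coupling prescription)
$$D_\alpha \varphi \longrightarrow e^{-iq\Lambda} D_\alpha \varphi, \tag{6.11}$$
$$D_\alpha \varphi^* \longrightarrow e^{iq\Lambda} D_\alpha \varphi^*. \tag{6.12}$$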
For this reason Dα is sometimes called the gauge covariant derivative. There
is a nice geometric interpretation of this covariant derivative, which we shall
discuss later. For now, because of this “covariance” property of D, it follows
that the Lagrangian (6.8) is gauge invariant with respect to the transforma-
tions (6.9), (6.10).
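The covariance property can be checked in a few lines. The following sketch simply applies the definition (6.6) to the transformed fields (6.9), (6.10); nothing beyond the product rule is assumed.

```latex
% Transformed covariant derivative acting on the transformed field:
\left(\partial_\alpha + iq(A_\alpha + \partial_\alpha \Lambda)\right)\left(e^{-iq\Lambda}\varphi\right)
  = e^{-iq\Lambda}\left(\partial_\alpha \varphi - iq\,(\partial_\alpha \Lambda)\varphi\right)
    + iq\,(A_\alpha + \partial_\alpha \Lambda)\, e^{-iq\Lambda}\varphi
  = e^{-iq\Lambda}\,(\partial_\alpha + iqA_\alpha)\varphi .

% The two \partial_\alpha\Lambda terms cancel. Complex conjugation gives (6.12), and then
D^\alpha \varphi^* D_\alpha \varphi \longrightarrow
  e^{iq\Lambda} e^{-iq\Lambda}\, D^\alpha \varphi^* D_\alpha \varphi = D^\alpha \varphi^* D_\alpha \varphi .
```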
The Lagrangian for scalar electrodynamics is taken to be
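$$\mathcal{L}_{SED} = -\frac{1}{4}F^{\alpha\beta}F_{\alpha\beta} - D^\alpha \varphi^* D_\alpha \varphi - m^2|\varphi|^2. \tag{6.13}$$
We now discuss some important structural features of this Lagrangian.
If we expand the gauge covariant derivatives we see that
$$\mathcal{L}_{SED} = -\frac{1}{4}F^{\alpha\beta}F_{\alpha\beta} - \partial^\alpha \varphi^* \partial_\alpha \varphi - m^2|\varphi|^2 + iqA_\alpha\left(\varphi^* \partial^\alpha \varphi - \varphi\, \partial^\alpha \varphi^* + iqA^\alpha|\varphi|^2\right). \tag{6.14}$$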
This Lagrangian is the sum of the electromagnetic Lagrangian, the free
charged KG Lagrangian, and a $j \cdot A$ “interaction term”. The vector field
that is contracted with $A$ is almost the conserved current (6.4), but is
modified by the last term involving the square of the gauge field, which is needed
for invariance under the gauge transformation (6.9), (6.10) and for the
current to be conserved when the new form of the field equations is satisfied.
The EL equations for the Maxwell field are of the desired form:
$$\partial_\beta F^{\alpha\beta} = -4\pi J^\alpha, \tag{6.15}$$
where the current is defined using the covariant derivative instead of the
ordinary derivative:
$$J^\alpha = -\frac{iq}{4\pi}\left(\varphi^* D^\alpha \varphi - \varphi\, D^\alpha \varphi^*\right). \tag{6.16}$$
As you will verify in the problem below, this current can be derived from
Noether’s first theorem applied to the U (1) phase symmetry of the La-
grangian (6.13). Thus we have solved the gauge invariance problem and ob-
tained a consistent version of the Maxwell equations with conserved sources
using the minimal coupling prescription.
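For readers who want to see where the interaction terms come from, here is a sketch of the expansion of the covariant derivatives in (6.13), followed by the derivative with respect to $A_\alpha$ that feeds into the Maxwell equations. It assumes only the definitions (6.6), (6.7) and (6.16).

```latex
% Expand the kinetic term of (6.13):
-D^\alpha \varphi^* D_\alpha \varphi
  = -(\partial^\alpha \varphi^* - iqA^\alpha \varphi^*)(\partial_\alpha \varphi + iqA_\alpha \varphi)
  = -\partial^\alpha \varphi^* \partial_\alpha \varphi
    + iqA_\alpha\left(\varphi^* \partial^\alpha \varphi - \varphi\, \partial^\alpha \varphi^*\right)
    - q^2 A_\alpha A^\alpha |\varphi|^2 .

% Differentiate with respect to the gauge field:
\frac{\partial \mathcal{L}_{SED}}{\partial A_\alpha}
  = iq\left(\varphi^* \partial^\alpha \varphi - \varphi\, \partial^\alpha \varphi^*\right) - 2q^2 A^\alpha |\varphi|^2
  = iq\left(\varphi^* D^\alpha \varphi - \varphi\, D^\alpha \varphi^*\right)
  = -4\pi J^\alpha .
```

The term quadratic in $A$ is what converts the ordinary derivatives in the current into covariant ones.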
Problems:
2. Derive (6.15), (6.16) from (6.13).
3. Verify that (6.16) is the Noether current coming from the U (1) sym-
metry of the Lagrangian and that it is indeed conserved when the field
equations for ϕ hold.

One more interesting feature to ponder: the charged current (6.16) serving
as the source for the Maxwell equations is built from the KG field and the
Maxwell field. Physically this means that one cannot say the charge “exists”
only in the KG field. In an interacting system the division between source
fields and fields mediating interactions is somewhat artificial and arbitrary.
This is physically reasonable, if perhaps a little unsettling. Mathematically,
this feature stems from the demand of gauge invariance. Just like the vector
potential, the KG field is no longer uniquely defined - it is subject to a gauge
transformation as well! In the presence of interaction, the computation of
the electric charge involves a gauge invariant combination of the KG and
electromagnetic field. To compute, say, the electric charge contained in a
volume V one should take a solution (A, ϕ) of the coupled Maxwell-KG
equations and substitute it into
$$Q_V = \frac{1}{4\pi}\int_V d^3x\; iq\left(\varphi^* D_0 \varphi - \varphi\, D_0 \varphi^*\right). \tag{6.17}$$
This charge is conserved and gauge invariant.
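The gauge invariance of the charge density follows directly from the covariance property (6.11), (6.12). A one-line sketch:

```latex
% Under (6.9), (6.10) the phases cancel term by term:
\varphi^* D_0 \varphi - \varphi\, D_0 \varphi^*
  \longrightarrow
  \left(e^{iq\Lambda}\varphi^*\right)\left(e^{-iq\Lambda} D_0 \varphi\right)
  - \left(e^{-iq\Lambda}\varphi\right)\left(e^{iq\Lambda} D_0 \varphi^*\right)
  = \varphi^* D_0 \varphi - \varphi\, D_0 \varphi^* .
```

The same combination built with ordinary derivatives would pick up a term proportional to $\partial_0 \Lambda\, |\varphi|^2$, which is why the gauge field must appear in the charge.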

6.3 Global and Local Symmetries

The key step in constructing the Lagrangian for scalar electrodynamics was to
introduce the coupling between the Maxwell field A and the charged KG field
by replacing in the KG Lagrangian the ordinary derivative with the gauge
covariant derivative. With this replacement, the coupled KG-Maxwell theory
is defined by adding the modified KG Lagrangian to the electromagnetic
Lagrangian. There is a rather deep way of viewing this construction which
we shall now explore.
Let us return to the free, charged KG theory, described by the Lagrangian

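$$\mathcal{L}^0_{KG} = -\left(\varphi^*{}_{,\alpha}\, \varphi^{,\alpha} + m^2|\varphi|^2\right). \tag{6.18}$$
This field theory admits a conserved current
$$j^\alpha = -iq\left(\varphi^* \varphi^{,\alpha} - \varphi\, \varphi^{*,\alpha}\right), \tag{6.19}$$
which we want to interpret as corresponding to a conserved electric charge
“stored” in the field. Of course, the presence of electric charge in the universe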
only manifests itself by virtue of its electromagnetic interactions. How should
the conserved charge in the KG field be interacting? Well, we followed one
rather ad hoc path to introducing this interaction in the last lecture. Let us
revisit the construction with a focus upon symmetry considerations, which
will lead to a very profound way of interpreting and systematizing the con-
struction.
The current j α is conserved because of the global U (1) phase symmetry.
For any $\alpha \in \mathbb{R}$ this symmetry transformation is
$$\varphi \to e^{-iq\alpha}\varphi, \qquad \varphi^* \to e^{iq\alpha}\varphi^*, \tag{6.20}$$
where $q$ is a parameter which would ultimately be fixed by experimental
considerations. This transformation shifts the phase of the scalar field by
the same amount α everywhere in space and for all time. This is why the
transformation is called “global”. The phase of the scalar field is defined
by an angle – a point on a circle – and the global U (1) symmetry can be
interpreted as saying that the Lagrangian does not depend on the choice of
origin for that phase. This is analogous to the translational symmetry of
special relativistic theories; the Lagrangian does not depend upon the choice
of origin in a given inertial reference frame.
The presence of the electromagnetic interaction can be seen as a
“localizing” or “gauging” of this global symmetry so that one is free to redefine the
phase of the field independently at each spacetime event (albeit smoothly).
This “general relativity” of phase is accomplished by demanding that the
theory be modified so that one has the symmetry
$$\varphi \to e^{-iq\alpha(x)}\varphi, \qquad \varphi^* \to e^{iq\alpha(x)}\varphi^*, \tag{6.21}$$
where $\alpha(x)$ is any function on the spacetime manifold $M$. Of course, the
original Lagrangian $\mathcal{L}^0_{KG}$ fails to have this local $U(1)$ transformation as a
symmetry since
$$\partial_\mu\left(e^{-iq\alpha(x)}\varphi\right) = e^{-iq\alpha(x)}\varphi_{,\mu} - iq\, e^{-iq\alpha(x)}\varphi\, \alpha_{,\mu}. \tag{6.22}$$
However, we can introduce a gauge field $A_\mu$, which is affected by the local
$U(1)$ transformation via the gauge transformation:
$$A_\mu \longrightarrow A_\mu + \partial_\mu \alpha, \tag{6.23}$$
and then introduce the covariant derivative
$$D_\alpha \varphi := (\partial_\alpha + iqA_\alpha)\varphi, \tag{6.24}$$
and
$$D_\alpha \varphi^* := (\partial_\alpha - iqA_\alpha)\varphi^*, \tag{6.25}$$
which satisfies
$$D_\mu\left(e^{-iq\alpha(x)}\varphi\right) = e^{-iq\alpha(x)} D_\mu \varphi. \tag{6.26}$$
Then with the Lagrangian modified via
$$\partial_\mu \varphi \to D_\mu \varphi, \tag{6.27}$$
so that
$$\mathcal{L}_{KG} = -D^\alpha \varphi^* D_\alpha \varphi - m^2|\varphi|^2, \tag{6.28}$$
we get the local $U(1)$ symmetry, as shown previously. Thus the minimal
coupling rule that we invented earlier can be seen as a way of turning the
global $U(1)$ symmetry into a local $U(1)$ gauge symmetry. One also obtains
the satisfying mental picture that the electromagnetic interaction of charges
is the principal manifestation of this local phase symmetry in nature. Thus
the electromagnetic interaction is introduced via the principle of local gauge
invariance.
The interaction of the KG field with the electromagnetic field is described
mathematically by the ∂α → Dα prescription described above. But the story
is not complete since we have not given a complete description of the electro-
magnetic field itself. Indeed, even in the absence of the KG field the EM field
has a description in terms of the Lagrangian (5.22). We need to incorporate
the electromagnetic Lagrangian into the total Lagrangian for the system.
But first, how should we think about the electromagnetic Lagrangian from
the point of view of the principle of local gauge invariance? The electromagnetic
Lagrangian is the simplest non-trivial relativistic invariant that can be
made from the field strength tensor. The field strength tensor itself can be
viewed as the “curvature” of the gauge covariant derivative, computed via
the commutator:
$$(D_\mu D_\nu - D_\nu D_\mu)\varphi = iqF_{\mu\nu}\varphi. \tag{6.29}$$
From this relation it follows immediately that F is gauge invariant; of course
we already knew that F was gauge invariant.
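The step from the commutator formula to the gauge invariance of $F$ can be spelled out as follows. This is a sketch; primes denote gauge-transformed quantities, a notation introduced here for convenience.

```latex
% D_\nu\varphi transforms like \varphi itself, so (6.26) can be applied twice:
D'_\mu D'_\nu \varphi' = e^{-iq\alpha(x)}\, D_\mu D_\nu \varphi,
\qquad \varphi' = e^{-iq\alpha(x)}\varphi .

% Antisymmetrize and use (6.29) on both sides:
iq\, F'_{\mu\nu}\, \varphi' = e^{-iq\alpha(x)}\, iq\, F_{\mu\nu}\, \varphi = iq\, F_{\mu\nu}\, \varphi' .

% Since \varphi is arbitrary, F'_{\mu\nu} = F_{\mu\nu}.
```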

Problem:

4. Verify the result (6.29).

Thus the electromagnetic Lagrangian
$$\mathcal{L}_{EM} = -\frac{1}{4}F^{\mu\nu}F_{\mu\nu} \tag{6.30}$$
admits the local $U(1)$ symmetry and can be added to the locally invariant
KG Lagrangian to get the final Lagrangian for the theory
$$\mathcal{L}_{SED} = \mathcal{L}_{KG} + \mathcal{L}_{EM}. \tag{6.31}$$

In this way we have an interacting theory designed by local $U(1)$ gauge
symmetry. The parameter $q$, which appears via the gauge covariant derivative,
is a “coupling constant” and characterizes the strength with which the
electromagnetic field couples to the charged aspect of the KG field. In the limit
in which $q \to 0$ the theory becomes a decoupled juxtaposition of the
non-interacting (or “free”) charged KG field theory and the non-interacting (free)
Maxwell field theory. In principle, the parameter $q$ is determined by suitable
experiments.
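Scalar electrodynamics still admits the global $U(1)$ symmetry, with $\alpha = \text{const.}$,
$$\varphi \to e^{-iq\alpha}\varphi, \qquad \varphi^* \to e^{iq\alpha}\varphi^*, \tag{6.32}$$
$$A_\mu \to A_\mu, \tag{6.33}$$
as a variational symmetry. Infinitesimally, we can write this transformation
as
$$\delta\varphi = -iq\alpha\varphi, \qquad \delta\varphi^* = iq\alpha\varphi^*, \tag{6.34}$$
$$\delta A_\mu = 0. \tag{6.35}$$
From Noether's first theorem this leads to the identity
$$D_\mu J^\mu - iq\left(\varphi\, \mathcal{E}_\varphi - \varphi^* \mathcal{E}_{\varphi^*}\right) = 0, \tag{6.36}$$
where
$$J^\alpha = -\frac{iq}{4\pi}\left(\varphi^* D^\alpha \varphi - \varphi\, D^\alpha \varphi^*\right). \tag{6.37}$$
Evidently, $J$ is divergence-free when the scalar field equations of motion hold.
$J$ is the conserved Noether current corresponding to the electric charge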
carried by the scalar field. This is the current that serves as source for the
Maxwell field. The presence of the gauge field renders the Noether current
suitably “gauge invariant”, that is, insensitive to the local U (1) transforma-
tion. It also reflects the fact that the equations of motion for ϕ, which must
be satisfied in order for the current to be conserved, depend upon the Maxwell
field as is appropriate since the electromagnetic field affects the motion of its
charged sources.
By construction, the theory of scalar electrodynamics admits the local
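$U(1)$ gauge symmetry. With $\alpha(x)$ being any function, the symmetry is
$$\varphi \to e^{-iq\alpha(x)}\varphi, \qquad \varphi^* \to e^{iq\alpha(x)}\varphi^*, \tag{6.38}$$
$$A_\mu \to A_\mu + \partial_\mu \alpha(x). \tag{6.39}$$
There is a corresponding Noether identity (Noether's second theorem,
remember?). To compute it we consider an infinitesimal gauge transformation:
$$\delta\varphi = -iq\alpha(x)\varphi, \qquad \delta\varphi^* = iq\alpha(x)\varphi^*, \tag{6.40}$$
$$\delta A_\mu = \partial_\mu \alpha(x). \tag{6.41}$$
Since the Lagrangian is gauge invariant we have the identity
$$0 = \delta\mathcal{L} = \mathcal{E}_\varphi\left(-iq\alpha(x)\varphi\right) + \mathcal{E}_{\varphi^*}\left(iq\alpha(x)\varphi^*\right) + \mathcal{E}^\mu\left(\partial_\mu \alpha(x)\right) + \text{divergence}, \tag{6.42}$$
where $\mathcal{E}_\varphi$ and $\mathcal{E}_{\varphi^*}$ are the scalar field EL expressions and $\mathcal{E}^\mu$ is the gauge
field EL expression. If we integrate this relation over a compact region and
choose $\alpha(x)$ to vanish at the boundary of this region, then the divergence
term vanishes. We can integrate by parts in the third term to get the Noether
identity
$$D_\mu \mathcal{E}^\mu + iq\left(\varphi\, \mathcal{E}_\varphi - \varphi^* \mathcal{E}_{\varphi^*}\right) = 0. \tag{6.43}$$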
(Notice that this identity does not follow if the gauge transformation includes
a global part.) The terms involving the EL expressions for the KG field are
the same as arise in the identity (6.36). Thus the Noether identity (6.43) can
also be written as
$$D_\alpha D_\beta F^{\alpha\beta} = 0. \tag{6.44}$$
Thanks to this Noether identity we can obtain (again!) the conservation
law of electric charge. We have the electromagnetic field equation
$$\mathcal{E}^\beta \equiv D_\alpha F^{\alpha\beta} - 4\pi J^\beta = 0. \tag{6.45}$$
Taking the divergence of this equation and using (6.44) we get $D_\beta J^\beta = 0$
when $\mathcal{E}^\beta = 0$.
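The last step can be written out explicitly. The sketch below assumes only the definition (6.45) and the antisymmetry of $F^{\alpha\beta}$.

```latex
% Divergence of the gauge-field EL expression:
D_\beta \mathcal{E}^\beta = D_\beta D_\alpha F^{\alpha\beta} - 4\pi\, D_\beta J^\beta
                          = -4\pi\, D_\beta J^\beta ,

% because D_\beta D_\alpha is symmetric while F^{\alpha\beta} is antisymmetric. Hence
D_\beta J^\beta = -\frac{1}{4\pi}\, D_\beta \mathcal{E}^\beta ,

% which vanishes whenever the Maxwell equations \mathcal{E}^\beta = 0 hold.
```

Note that this derivation of charge conservation uses only the gauge-field equations, whereas the identity (6.36) ties it to the scalar field equations.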

6.4 A lower-degree conservation law

There is an interesting alternative point of view on the conservation of electric
charge which I would like to mention. Let me set the stage. The conserved
current for scalar electrodynamics,
$$J^\alpha = -\frac{iq}{4\pi}\left(\varphi^* D^\alpha \varphi - \varphi\, D^\alpha \varphi^*\right), \tag{6.46}$$
features in the Maxwell equations via the EL equation $\mathcal{E}^\alpha = 0$ for the gauge
field $A_\alpha$, where
$$\mathcal{E}^\beta = F^{\alpha\beta}{}_{,\alpha} - 4\pi J^\beta. \tag{6.47}$$
Rearranging this formula,
$$J^\beta = \frac{1}{4\pi}F^{\alpha\beta}{}_{,\alpha} - \frac{1}{4\pi}\mathcal{E}^\beta, \tag{6.48}$$
and recalling the discussion of §3.20, you will see that $J^\beta$ is a “trivial”
conservation law! Mathematically, this result is intimately related to the fact
that the conservation law arises from a symmetry, which contains both a
gauge part and a global part. Physically, it reflects the fact that the electric
charge contained in a given spatial region can be computed using just elec-
tromagnetic data on the surface bounding that region. Indeed, the conserved
electric charge in a 3-dimensional spacelike region $V$ at some time $x^0 = \text{const.}$
is given by
$$Q_V = \int_V dV\, J^0 = \frac{1}{4\pi}\int_V dV\, F^{i0}{}_{,i} = \frac{1}{4\pi}\int_S dS\, \hat n \cdot \vec E, \tag{6.49}$$
where $S = \partial V$ and $E^i = F^{0i}$ is the electric field in the inertial frame with
time $t = x^0$. This is of course the integral form of the Gauss law.
A more elegant and geometric way to characterize this relationship is via
differential forms and Stokes' theorem. Recall that if $\omega$ is a differential $p$-form
and $V$ is a $(p+1)$-dimensional region with $p$-dimensional boundary $S$ (e.g.,
$V$ is the interior of a 2-sphere and $S$ is the 2-sphere), then Stokes' theorem
says
$$\int_V d\omega = \int_S \omega. \tag{6.50}$$
This generalizes the purely vectorial version of the Stokes' and divergence
theorems you learned in multi-variable calculus in Euclidean space to manifolds
of any dimension. As I have mentioned before, we can view the
electromagnetic tensor as a 2-form $F$ via
$$F = F_{\alpha\beta}\, dx^\alpha \otimes dx^\beta = \frac{1}{2}F_{\alpha\beta}\, dx^\alpha \wedge dx^\beta. \tag{6.51}$$
Using the Levi-Civita tensor $\epsilon_{\alpha\beta\gamma\delta}$, we can construct the Hodge dual $\star F$,
defined by
$$\star F = \frac{1}{2}(\star F)_{\alpha\beta}\, dx^\alpha \wedge dx^\beta, \tag{6.52}$$
where
$$(\star F)_{\alpha\beta} = \frac{1}{2}\epsilon_{\alpha\beta}{}^{\gamma\delta}F_{\gamma\delta}. \tag{6.53}$$

In terms of $F$ and $\star F$ the Maxwell equations read
$$dF = 0, \qquad d{\star F} = 4\pi\mathcal{J}, \tag{6.54}$$
where $\mathcal{J}$ is the Hodge dual of the electric current:
$$\mathcal{J} = \frac{1}{3!}\epsilon_{\alpha\beta\gamma\delta}J^\delta\, dx^\alpha \wedge dx^\beta \wedge dx^\gamma. \tag{6.55}$$
The conservation of electric current is normally expressed in terms of a
divergence-free vector field $J$, but this is equivalent to having a closed
3-form:
$$d\mathcal{J} = 0, \quad \text{modulo the field equations.} \tag{6.56}$$
Indeed, applying the exterior derivative to both sides of the field equations
(6.54) and using the identity $d^2 = 0$ you see that (6.56) must hold. (This is
just an elegant repackaging of the result at the end of the last section.) In
terms of differential forms, the electric charge in a three-dimensional region
$V$ at a fixed time $t$ and with boundary $S$ is the integral of $\mathcal{J}$ over that region:
$$Q_V = \int_V \mathcal{J}. \tag{6.57}$$
When the field equations hold we have
$$Q_V = \int_V \mathcal{J} = \frac{1}{4\pi}\int_V d{\star F} = \frac{1}{4\pi}\int_S \star F, \tag{6.58}$$
recovering Gauss' law in the language of differential forms.

An important application of Stokes' theorem we will need goes as follows.
Let $\chi$ be the integral of the $p$-form $\omega$ over a closed[3] $p$-dimensional space $S$:
$$\chi = \int_S \omega. \tag{6.59}$$
Consider any continuous deformation of $S$ into a new (closed) surface $S'$ and
let $\chi'$ be the integral of $\omega$ over that space:
$$\chi' = \int_{S'} \omega. \tag{6.60}$$
[3] To say that $S$ is “closed” means that it has no boundary, $\partial S = \emptyset$, e.g., a 2-sphere.

The relation between these 2 quantities can be obtained using Stokes' theorem:
$$\chi' - \chi = \int_{S'} \omega - \int_S \omega = \int_{S' - S} \omega = \int_V d\omega, \tag{6.61}$$
where $\partial V = S' - S$. In particular, if $\omega$ is a closed $p$-form, that is, $d\omega = 0$,
then $\chi = \chi'$ and the integral $\chi$ is independent of the choice of the space $S$ in
the sense that $\chi$ is unchanged by any continuous deformation of $S$. A simple
illustration of this result is the following.

Problem:

5. Consider two concentric circles of radii $a$ and $b$ in the “x-y plane”. Let
$V$ be the area between the circles so that its boundary $S$ consists of the
concentric circles. In Cartesian coordinates with origin at the center of
the circles, let $\vec A$ be a vector field defined in $V$ by
$$\vec A = \frac{x\hat y - y\hat x}{x^2 + y^2}. \tag{6.62}$$
Show that the curl of $\vec A$ vanishes in $V$. Show that the line integral of
$\vec A$ around either of the circles gives the same result, independent of $a$
and $b$.

Consider a charge distribution which is localized in some compact region.
Because the 3-form $\mathcal{J}$ is closed when the field equations hold, we can
apply the preceding result to conclude that the integral of $\mathcal{J}$ over a spacelike
hypersurface (say, at a fixed time) will be independent of the choice of the
hypersurface. This is just a fancy way of saying the total charge is conserved.
But from (6.58) we have an alternative way to compute the conserved charge
in a compact volume: integrate $\star F$ over a surface enclosing that volume. As
long as the surface lies outside of the charge distribution the form
$\star F$ is closed there and the resulting integral is independent of a continuous
deformation of that surface. This deformation could just be redrawing the
surface at a fixed time, or it could involve evaluating the surface at a different
time. Thus the conservation of electric charge can be expressed in terms of a
closed 2-form.
The existence of conservation laws of the traditional sort – divergence-free
currents or closed 3-forms – is tied to the existence of symmetries of an
underlying Lagrangian via Noether’s theorem. It is natural to ask if there is
any symmetry-based origin to conserved 2-forms such as we have in source-free
electrodynamics with $\star F$. The answer is yes. Details would take us
too far afield, but let me just mention that closed 2-forms (in 4 dimensions)
arise in a field theory when (1) the theory admits a gauge symmetry, and (2)
every solution of the field equations admits a gauge transformation which
does not change that solution. Now consider pure electromagnetism in a
region of spacetime with no sources. Of course, criterion (1) is satisfied.
To see that criterion (2) is satisfied consider the gauge transformation by a
constant function.[4] It is this symmetry structure which corresponds to the
conservation law associated to $\star F$ in source-free regions.[5]

6.5 Scalar electrodynamics and fiber bundles

There is a beautiful geometric interpretation of SED in terms of a famous
mathematical structure called a fiber bundle. I debated with myself for a
long time whether or not to try and describe this to you. I decided that I
could not resist mentioning it, so that those of you who are so inclined can
get exposed to it. On the other hand, a complete presentation would take
us too far afield and not everybody who is studying this material is going to
be properly prepared (or interested enough!) for a full-blown treatment. So,
if you don’t mind, I will just give a quick and dirty summary of the salient
points. Those who are not ready for this material can just skip it. A more
complete – indeed, a more correct – treatment can be found in many advanced
texts. Of course, the problem is that to use these advanced texts takes
a considerable investment in acquiring prerequisites. The idea of our brief
discussion is to provide a first introductory step in that direction. A technical
point for those who have some background in this stuff: for simplicity in what
follows we shall not emphasize the role of the gauge field as a connection on
a principal bundle, but rather its role as defining a connection on associated
vector bundles.
Recall that our charged KG field can be viewed as a mapping

$$\varphi: M \to \mathbb{C}. \tag{6.63}$$
[4] Notice that this is the “global” version of the $U(1)$ symmetry transformation.
[5] To read more about such things, have a look at arxiv.org/abs/hep-th/9706092 .

We can view $\varphi$ as a cross section of a fiber bundle

$$\pi: E \to M \tag{6.64}$$

where

$$\pi^{-1}(x) = \mathbb{C}, \quad x \in M. \tag{6.65}$$

$M$ is called the base space. The space $\pi^{-1}(x) \approx \mathbb{C}$ is the fiber over $x$. Since
$\mathbb{C}$ is a vector space, this type of fiber bundle is called a vector bundle. For
us, $M = \mathbb{R}^4$ and it can be shown that for a contractible base space such as
$\mathbb{R}^n$ there is always a (non-unique) diffeomorphism that makes possible the
global identification:

$$E \approx M \times \mathbb{C}. \tag{6.66}$$

For a general base space $M$, such an identification will only be valid locally.
Next, recall that a cross section of $E$ (often just called a “section”) is a
map (or graph)

$$\sigma: M \to E \tag{6.67}$$

satisfying

$$\pi \circ \sigma = \mathrm{id}_M. \tag{6.68}$$

Using coordinates $(x, z)$ adapted to (6.66), we can identify a KG field $\varphi(x)$
with the cross section

$$\sigma(x) = (x, \varphi(x)). \tag{6.69}$$
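
For readers who think more easily in code, the definitions (6.67)–(6.69) can be mimicked in a few lines of Python. This is only a toy sketch of the trivialized bundle $E \approx M \times \mathbb{C}$: the field `phi` is an arbitrary made-up configuration, and the names `pi` and `sigma` are chosen here just to mirror the symbols in the text.

```python
import cmath

def phi(x):
    """A hypothetical complex field; x = (t, x1, x2, x3) is a point of M."""
    t, x1, x2, x3 = x
    return cmath.exp(1j * (x1 - t)) * (1.0 + x2 * x3)

def pi(e):
    """Bundle projection E -> M, using the identification E = M x C."""
    x, z = e
    return x

def sigma(x):
    """The cross section built from the field, as in (6.69)."""
    return (x, phi(x))

if __name__ == "__main__":
    point = (0.5, 1.0, -2.0, 3.0)
    print(sigma(point))
    print(pi(sigma(point)) == point)  # pi composed with sigma is the identity on M
```

The last line is just the condition (6.68) evaluated at one point.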
Thus, given the identification $E \approx M \times \mathbb{C}$, we see that the bundle point
of view just describes the geometric setting of our theory: complex valued
functions on $\mathbb{R}^4$. For the purposes of this discussion the most interesting issue
is that this identification is far from unique. Let us use coordinates $(x^\alpha, z)$
for $E$, where $x^\alpha \in \mathbb{R}^4$ and $z \in \mathbb{C}$. Each set of such coordinates provides an
identification of $E$ with $M \times \mathbb{C}$. Since we use a fixed (flat) metric on $M$,
one can restrict attention to inertial Cartesian coordinates on $M$, in which
case one can only redefine $x^\alpha$ by a Poincaré transformation. What is more
interesting for us in this discussion is the freedom to redefine the way that
the complex numbers are “glued” to each spacetime event.
Recall that to build the Lagrangian for the charged KG field we also had
to pick an inner product on the vector space $\mathbb{C}$; of course we just used the
standard one

$$\langle z, w\rangle = z^* w. \tag{6.70}$$
We can therefore restrict attention to linear changes of our coordinates on
$\mathbb{C}$ which preserve this inner product. Thus the allowed changes of fiber
coordinates are just the phase transformations

$$z \to e^{-i\alpha} z, \quad \alpha \in \mathbb{R}. \tag{6.71}$$

We can make this change of coordinates on $\mathbb{C}$ for each fiber so that on $\pi^{-1}(x)$
we make the transformation

$$z \to e^{-i\alpha(x)} z. \tag{6.72}$$

There is no intrinsic way to compare points on different fibers, and this fact
reflects itself in the freedom to redefine our labeling of those points in a way
that can vary from fiber to fiber. We have seen this already; the change of
fiber coordinates $z \to e^{-i\alpha(x)} z$ corresponds to the gauge transformation of
the charged KG field:

$$\varphi(x) \to e^{-i\alpha(x)} \varphi(x). \tag{6.73}$$
When building a field theory of the charged KG field we need to take
derivatives. Now, to take a derivative means to compare the value of ϕ at
two neighboring points on M . From our fiber bundle point of view, this
means comparing points on two different fibers. Because this comparison is
not defined a priori, there is no natural way to take derivatives of a section of
a fiber bundle. This is closely related to the fact that the ordinary derivative
of the KG field does not transform homogeneously under a gauge transformation.
Thus, for example, to say that a KG field is a constant, $\partial_\alpha\varphi = 0$,
is not an intrinsic statement since a change in the bundle coordinates will
negate it.
A definition of derivatives of sections of the fiber bundle requires the in-
troduction of additional structure beyond the bundle and the metric. (One
often introduces this structure implicitly!) This additional structure is called
a connection and the resulting notion of derivative is called the covariant
derivative defined by the connection. A connection can be viewed as a def-
inition of how to compare points on neighboring fibers. If you are differ-
entiating in a given direction, the derivative will need to associate to that
direction a linear transformation (actually, a phase transformation) which
“aligns” the vector spaces/fibers and allows us to compare them. Since the
derivative involves an infinitesimal motion in M , it turns out that this fiber
transformation is an infinitesimal phase transformation, which involves
multiplication by a pure imaginary number (think: $e^{i\alpha} = 1 + i\alpha + \dots$). So,
at each point $x \in M$, a connection assigns an imaginary number to every
direction in $M$. It can be specified by an imaginary-valued 1-form on $M$,
which we write as $A = iqA_\alpha(x)\,dx^\alpha$. The covariant derivative is then

$$D_\alpha\varphi = (\partial_\alpha + iqA_\alpha)\varphi. \tag{6.74}$$

The role of the connection $A$ is to define the rate of change of a section by
adjusting the correspondence between fibers relative to that provided by the
given choice of coordinates.
As we have seen, if we make a redefinition of the coordinates on each fiber
by a local gauge transformation, then we must correspondingly redefine the
1-form via

$$z \to e^{-i\alpha(x)} z, \qquad A_\mu \to A_\mu + \partial_\mu\alpha. \tag{6.75}$$

This guarantees that the covariant derivative transforms homogeneously under
a gauge transformation in the same way that the KG field itself does,
which means it represents an invariant tensorial quantity on $E$. In particular,
if a KG field is covariantly constant with the given choice of connection then
this remains true in any (fiber) coordinates.
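
Here is a small numerical sketch, in one dimension, of what “transforms homogeneously” means. Everything in it is an illustrative assumption: the sample functions `phi`, `A`, `alpha`, the value of `q`, and the finite-difference derivative `d`. I have also normalized the phase as $e^{-iq\alpha}$ so that it pairs with $A \to A + \partial\alpha$ for the covariant derivative $\partial + iqA$; if you write the phase as $e^{-i\alpha}$ instead, the shift in $A$ picks up a factor $1/q$.

```python
import cmath
import math

q = 0.7  # illustrative value of the charge

def phi(x):
    return (1.0 + 0.3 * x) * cmath.exp(0.5j * x)   # sample field

def A(x):
    return math.sin(x)                             # sample gauge potential

def alpha(x):
    return 0.4 * x * x                             # sample gauge function

def dalpha(x):
    return 0.8 * x                                 # its exact derivative

def d(f, x, h=1e-5):
    """Ordinary derivative by central differences."""
    return (f(x + h) - f(x - h)) / (2 * h)

def D(f, a, x):
    """Covariant derivative (d/dx + i q a) f, the 1D analogue of (6.74)."""
    return d(f, x) + 1j * q * a(x) * f(x)

def phi_g(x):
    return cmath.exp(-1j * q * alpha(x)) * phi(x)  # gauge-transformed field

def A_g(x):
    return A(x) + dalpha(x)                        # gauge-transformed potential

def covariance_defect(x):
    """|D'phi' - e^{-iq alpha} D phi|; should vanish up to numerical error."""
    return abs(D(phi_g, A_g, x) - cmath.exp(-1j * q * alpha(x)) * D(phi, A, x))

def ordinary_defect(x):
    """Same comparison for the ordinary derivative; generically non-zero."""
    return abs(d(phi_g, x) - cmath.exp(-1j * q * alpha(x)) * d(phi, x))

if __name__ == "__main__":
    print(covariance_defect(1.0), ordinary_defect(1.0))
```

The first number should be at the level of the finite-difference error, while the second is of order one: the ordinary derivative picks up the extra term proportional to $\partial\alpha$, which the shift in $A$ cancels.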
The connection A must be specified to define the charged KG Lagrangian
(6.8). A different choice of connection, even just in a different gauge, changes
the Lagrangian (6.8) and is not a symmetry of the KG field in a given electro-
magnetic field. By contrast, in scalar electrodynamics we view the connec-
tion as one of the dependent variables of the theory and we therefore have,
as we saw, the full gauge invariance since now we can let the connection be
transformed along with the KG field.
As you may know from differential geometry, when using a covariant
derivative one will, in general, lose the commutativity of the derivative op-
eration. The commutator of two covariant derivatives defines the curvature
of the connection. We have already seen that this curvature is precisely the
electromagnetic field strength:

$$(D_\mu D_\nu - D_\nu D_\mu)\,\varphi = iqF_{\mu\nu}\,\varphi. \tag{6.76}$$
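
If you would like to see (6.76) at work without pushing symbols, the following Python sketch checks it numerically in two dimensions. The field `phi`, the potentials `Ax`, `Ay`, the charge `q` and the step `h` are all made up for illustration; `F_xy` is the curl of this particular potential, computed by hand.

```python
import cmath
import math

q = 0.7    # illustrative charge
h = 1e-3   # finite-difference step

def phi(x, y):
    return cmath.exp(1j * (0.3 * x - 0.2 * y)) * (1.0 + 0.1 * x * y)

def Ax(x, y):
    return -0.5 * y * y

def Ay(x, y):
    return math.sin(x)

def F_xy(x, y):
    """F_xy = d_x A_y - d_y A_x for the sample potentials above."""
    return math.cos(x) + y

def Dx(f):
    """Return the function (d_x + i q A_x) f, using central differences."""
    return lambda x, y: (f(x + h, y) - f(x - h, y)) / (2 * h) + 1j * q * Ax(x, y) * f(x, y)

def Dy(f):
    """Return the function (d_y + i q A_y) f, using central differences."""
    return lambda x, y: (f(x, y + h) - f(x, y - h)) / (2 * h) + 1j * q * Ay(x, y) * f(x, y)

def commutator(x, y):
    """(D_x D_y - D_y D_x) phi at the point (x, y)."""
    return Dx(Dy(phi))(x, y) - Dy(Dx(phi))(x, y)

if __name__ == "__main__":
    x0, y0 = 0.4, 0.9
    print(commutator(x0, y0))
    print(1j * q * F_xy(x0, y0) * phi(x0, y0))
```

The two printed numbers should agree up to the discretization error, which is of order $h^2$.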

To continue the analogy with differential geometry a bit further, you see that
the field ϕ is playing the role of a vector, with its vector aspect being the
fact that it takes values in the vector space C and transforms homogeneously
under the change of fiber coordinates, that is, the gauge transformation. The
complex conjugate can be viewed as living in the dual space to C, so that it is
a “covector”. Quantities like the Lagrangian density, or the conserved electric
current are “scalars” from this point of view – they are gauge invariant. In
particular, the current

$$J^\mu = -\frac{iq}{4\pi}\left(\varphi^* D^\mu\varphi - \varphi D^\mu\varphi^*\right) \tag{6.77}$$

is divergence free with respect to the ordinary derivative, which is the correct
covariant derivative on “scalars”.
Do we really need all this fancy mathematics? Perhaps not. But, since all
the apparatus of gauge symmetry, covariant derivatives, etc., which show up
repeatedly in field theory, arises so naturally from this geometric structure,
it is clear that this is the right way to be thinking about gauge theories.
Moreover, there are certain results that would, I think, be very hard to come
by without using the fiber bundle point of view. I have in mind certain
important topological structures that can arise via global effects in classical
and quantum field theory. These topological structures appear in the physics
literature in the guise of “monopoles” and “instantons”. Such
structures would play a very nice role in a second semester for this course, if
there were one.

6.6 PROBLEMS

1. Compute the EL equations of $\mathcal{L}_{KG}$ in (6.8).

2. Derive (6.15), (6.16) from (6.13).

3. Verify that (6.16) is the Noether current coming from the U (1) sym-
metry of the Lagrangian and that it is indeed conserved when the field
equations for ϕ hold.

4. Verify the result (6.29).

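5. Consider two concentric circles of radii $a$ and $b$ in the “$x$-$y$ plane”. Let
$V$ be the area between the circles so that its boundary $S$ consists of the
concentric circles. In Cartesian coordinates with origin at the center of
the circles, let $\vec A$ be a vector field defined in $V$ by

$$\vec A = \frac{x\hat y - y\hat x}{x^2 + y^2}. \tag{6.78}$$

Show that the curl of $\vec A$ vanishes in $V$. Show that the line integral of
$\vec A$ around either of the circles gives the same result, independent of $a$
and $b$.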
Chapter 7

Spontaneous symmetry breaking
We now will take a quick look at some of the classical field theoretic underpin-
nings of “spontaneous symmetry breaking” (SSB) in quantum field theory.
Quite generally, SSB can be a very useful way of thinking about phase tran-
sitions in physics. In particle physics, SSB is used, in collaboration with the
“Higgs mechanism”, to give masses to gauge bosons (and other elementary
particles) without destroying gauge invariance.

7.1 Symmetry of laws versus symmetry of states
To begin to understand spontaneous symmetry breaking in field theory we
need to refine our understanding of “symmetry”, which is the goal of this
section. The idea will be that there are two related kinds of symmetry one
can consider: symmetry of the “laws” governing the field (i.e., the field
equations), and symmetries of the “states” of the field (i.e., the solutions to
the field equations).
So far we have been studying “symmetry” in terms of transformations
of a field which preserve the Lagrangian, possibly up to a divergence. For
our present aims, it is good to think of this as a “symmetry of the laws
of physics” in the following sense. The Lagrangian determines the “laws of
motion” of the field via the Euler-Lagrange equations. As was pointed out in
Chapter 3, symmetries of a Lagrangian are also symmetries of the equations
of motion. This means that if $\varphi$ is a solution to the equations of motion and
if $\tilde\varphi$ is obtained from $\varphi$ via a symmetry transformation, then $\tilde\varphi$ also satisfies
the same equations of motion. Just to make sure this is clear, let me exhibit
a very elementary example.
Consider the massless KG field described by the Lagrangian density:

$$\mathcal{L} = -\frac{1}{2}\partial_\alpha\varphi\,\partial^\alpha\varphi. \tag{7.1}$$

It is easy to see that this Lagrangian admits the symmetry

$$\tilde\varphi = \varphi + \text{const.} \tag{7.2}$$

You can also easily see that the field equations

$$\partial^\alpha\partial_\alpha\varphi = 0 \tag{7.3}$$

admit this symmetry in the sense that if $\varphi$ is a solution then so is $\varphi + \text{const.}$
Thus a symmetry of a Lagrangian is also a symmetry of the field equations
and we will sometimes call it a symmetry of the law governing the field.

Problem:
1. While every symmetry of a Lagrangian is a symmetry of its EL equa-
tions, it is not true that every symmetry of the field equations is a
symmetry of the Lagrangian. Consider the massless KG field. Show
that the scaling transformation ϕ̃ = (const.)ϕ is a symmetry of the
field equations but is not a symmetry of the Lagrangian.
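
To see the distinction between the two kinds of symmetry in action, here is a rough numerical sketch in 1+1 dimensions. The assumptions are mine: signature $(-,+)$, so that $\mathcal{L} = \frac12(\varphi_{,t}^2 - \varphi_{,x}^2)$; a made-up off-shell field `generic` for testing the Lagrangian; and a made-up solution `solution` for testing the field equation. It is a check on your reasoning, not a substitute for it.

```python
import math

def d(f, t, x, wrt, h=1e-4):
    """Central-difference partial derivative with respect to 't' or 'x'."""
    if wrt == "t":
        return (f(t + h, x) - f(t - h, x)) / (2 * h)
    return (f(t, x + h) - f(t, x - h)) / (2 * h)

def lagrangian(f, t, x):
    """L = -1/2 d_a f d^a f = 1/2 (f_t^2 - f_x^2), signature (-,+) assumed."""
    return 0.5 * (d(f, t, x, "t") ** 2 - d(f, t, x, "x") ** 2)

def box(f, t, x, h=1e-3):
    """Wave operator -f_tt + f_xx by second differences."""
    f_tt = (f(t + h, x) - 2 * f(t, x) + f(t - h, x)) / h ** 2
    f_xx = (f(t, x + h) - 2 * f(t, x) + f(t, x - h)) / h ** 2
    return -f_tt + f_xx

def generic(t, x):
    return math.sin(t) * math.cos(2 * x)                    # not a solution

def solution(t, x):
    return math.cos(x - t) + 0.5 * math.sin(2 * (x + t))   # solves box f = 0

def shifted(f, c):
    return lambda t, x: f(t, x) + c

def scaled(f, c):
    return lambda t, x: c * f(t, x)

if __name__ == "__main__":
    t0, x0 = 0.3, 0.7
    print(lagrangian(generic, t0, x0),
          lagrangian(shifted(generic, 5.0), t0, x0),
          lagrangian(scaled(generic, 3.0), t0, x0))
    print(box(solution, t0, x0),
          box(shifted(solution, 5.0), t0, x0),
          box(scaled(solution, 3.0), t0, x0))
```

The shift leaves the Lagrangian density alone, while the scaling multiplies it by the square of the constant; both transformations map the sample solution to another solution.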

If the Lagrangian and its field equations represent the “laws”, then the
solutions of the field equations are the “states” of the field that are allowed by
the laws. The function $\varphi(x)$ is an allowed state of the field when it solves the
field equations. A symmetry of a given “state”, $\varphi_0(x)$ say, is then defined
to be a transformation of the fields, $\varphi \to \tilde\varphi[\varphi]$, which preserves the given
solution

$$\tilde\varphi[\varphi_0(x)] = \varphi_0(x). \tag{7.4}$$

Since symmetry transformations form a group, such solutions to the field
equations are sometimes called “group-invariant solutions”.
Let us consider an elementary example of group-invariant solutions. Con-
sider the KG field with mass m. Use inertial Cartesian coordinates. We have
seen that the spatial translations, $x^i \to x^i + \text{const.}$, $i = 1, 2, 3$, form a group
of symmetries of the theory. Functions which are invariant under the group
of spatial translations will depend upon $t$ only: $\varphi = \varphi(t)$. It is easy to see
that the corresponding group invariant solutions to the field equations are of
the form:

$$\varphi = A\cos(mt) + B\sin(mt), \tag{7.5}$$

where $A$ and $B$ are constants. Another very familiar type of example of
group-invariant solutions you will have seen by now occurs whenever you are
finding rotationally invariant solutions of PDEs.

Problem:
2. Derive the result (7.5).
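
Once you have derived (7.5), a quick numerical check is easy to set up. The sketch below assumes that, with the conventions used for the KG equation earlier in the text, the equation for a field depending on $t$ only reduces to $\varphi_{,tt} + m^2\varphi = 0$; the function names and the sample values of $m$, $A$, $B$ are arbitrary.

```python
import math

def phi(t, m, A, B):
    """The translation-invariant candidate solution (7.5)."""
    return A * math.cos(m * t) + B * math.sin(m * t)

def kg_residual(t, m, A, B, h=1e-3):
    """phi_tt + m^2 phi, with phi_tt estimated by a second difference."""
    phi_tt = (phi(t + h, m, A, B) - 2 * phi(t, m, A, B) + phi(t - h, m, A, B)) / h ** 2
    return phi_tt + m * m * phi(t, m, A, B)

if __name__ == "__main__":
    for t in (0.0, 0.8, 2.5):
        print(t, kg_residual(t, m=1.5, A=2.0, B=-0.7))
```

The residuals should be zero up to the $h^2$ error of the second difference.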

An important result from the geometric theory of differential equations
which relates symmetries of laws to symmetries of states goes as follows.
Suppose $G$ is a group of symmetries of a system of differential equations
$\Delta = 0$ for fields $\varphi$ on a manifold $M$ (e.g., $G$ is the Poincaré group). Let
$K \subset G$ be a subgroup (e.g., spatial rotations). Suppose we are looking for
solutions to $\Delta = 0$ which are invariant under $K$. Then the field equations
$\Delta = 0$ reduce to a system of differential equations $\hat\Delta = 0$ for $K$-invariant
fields $\hat\varphi$ on the reduced space $M/K$.[1]
As a simple and familiar example, consider the Laplace equation for functions
on $\mathbb{R}^3$,

$$\partial_x^2\varphi + \partial_y^2\varphi + \partial_z^2\varphi = 0. \tag{7.6}$$

The Laplace equation is invariant under the whole Euclidean group $G$ consisting
of translations and rotations. Consider the subgroup $K = SO(3)$
consisting of rotations. The rotationally invariant functions are of the form[2]

$$\varphi(x, y, z) = f(r), \quad r = \sqrt{x^2 + y^2 + z^2}. \tag{7.7}$$

Rotationally invariant solutions to the Laplace equation are characterized by
a reduced field $f$ satisfying a reduced differential equation on the reduced
space $\mathbb{R}^+ = \mathbb{R}^3/SO(3)$ given by

$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{df}{dr}\right) = 0. \tag{7.8}$$

[1] The quotient space $M/K$ is the set of orbits of $K$ in $M$. Equivalently it is the set of
equivalence classes of points in $M$ where two points are equivalent if they can be related
by an element of the transformation group $K$.
[2] Some boundary conditions have to be imposed at $r = 0$, but we will not worry about
that here.
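
As a quick illustration of the reduction, the sketch below checks numerically that $f(r) = c_1 + c_2/r$ (the familiar solution of (7.8), used here as a sample) satisfies the reduced equation, and that the corresponding function on $\mathbb{R}^3$ satisfies the full Laplace equation (7.6) away from the origin. The constants, step sizes, and function names are illustrative choices of mine.

```python
import math

def f(r, c1=2.0, c2=-3.0):
    """Sample rotationally invariant profile f(r) = c1 + c2 / r."""
    return c1 + c2 / r

def reduced_operator(r, h=1e-4):
    """(1/r^2) d/dr (r^2 df/dr) by nested central differences."""
    def flux(s):
        return s * s * (f(s + h) - f(s - h)) / (2 * h)
    return (flux(r + h) - flux(r - h)) / (2 * h) / (r * r)

def laplacian_3d(x, y, z, h=1e-3):
    """Seven-point estimate of the Laplacian of phi(x, y, z) = f(r)."""
    def phi(x, y, z):
        return f(math.sqrt(x * x + y * y + z * z))
    total = (phi(x + h, y, z) + phi(x - h, y, z)
             + phi(x, y + h, z) + phi(x, y - h, z)
             + phi(x, y, z + h) + phi(x, y, z - h)
             - 6 * phi(x, y, z))
    return total / h ** 2

if __name__ == "__main__":
    print(reduced_operator(1.7))
    print(laplacian_3d(1.0, -0.8, 1.4))
```

Both numbers should vanish to within the discretization error: solving the one-dimensional reduced equation really does produce a solution of the original three-dimensional one.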
This is the principal reason one usually makes a “symmetry ansatz” for
solutions to field equations which involves fields invariant under a subgroup
K of the symmetry group G of the equations. It is not illegal to make other
kinds of ansatzes, of course, but most will lead to inconsistent equations or
equations with trivial solutions.
Having said all this, I should point out that just because you ask for
group invariant solutions according to the above scheme it doesn’t mean you
will find any! There are two reasons for this. First of all, it may be that
there are no (non-trivial) fields invariant with respect to the symmetry group
you are trying to impose on the state. For example, consider the symmetry
group ϕ → ϕ + const. we mentioned earlier for the massless KG equation.
You can easily see that there are no functions which are invariant under
that transformation group. Secondly, the reduced differential equation may
have no (or only trivial) solutions, indicating that no (interesting) solutions
exist with that symmetry. Finally, I should mention that not all states have
symmetry; indeed, the generic states are completely asymmetric. States
with symmetry are special: they are physically simpler than the states you
expect generically.
To summarize, field theories may have two types of symmetry. There
may be a group G of symmetries of its laws – the symmetry group of the
Lagrangian (and field equations). There can be symmetries of states, that
is, there may be a transformation group (usually a subgroup of G) which
preserves certain solutions to the field equations.

7.2 The “Mexican hat” potential

Let us now turn to a class of examples which serve to illustrate the preced-
ing remarks and which we shall use to understand spontaneous symmetry
breaking. We have actually seen these examples before.
We start by considering the real KG field with the double-well potential:

$$\mathcal{L}_0 = -\frac{1}{2}\partial_\alpha\varphi\,\partial^\alpha\varphi - \left(-\frac{1}{2}a^2\varphi^2 + \frac{1}{4}b^2\varphi^4\right). \tag{7.9}$$
As usual, we are working in Minkowski space with inertial Cartesian coordinates.
This Lagrangian admits the Poincaré group as a symmetry group.
It also admits the symmetry $\varphi \to -\varphi$, which generates a 2-element discrete
subgroup $\mathbb{Z}_2$ of the symmetry group of the Lagrangian. In the Problems
in Chapter 3 we identified 3 simple solutions to the field equations for this
Lagrangian:

$$\varphi = 0, \pm\frac{a}{b}, \tag{7.10}$$

where $a$ and $b$ are constants. These solutions are highly symmetric: they
admit the whole Poincaré group of symmetries, as you can easily verify.
Because $\mathbb{Z}_2$ is a symmetry of the Lagrangian it must be a symmetry of the
field equations, mapping solutions to solutions. You can verify that this is
the case for the solutions (7.10). Thus the group consisting of the Poincaré
group and $\mathbb{Z}_2$ forms a symmetry group of the law governing the field. The 3 solutions
in (7.10) represent 3 (of the infinite number of) possible solutions to the field
equations – they are possible states of the field. The states represented by
$\varphi = \pm\frac{a}{b}$ have Poincaré symmetry, but not $\mathbb{Z}_2$ symmetry. In fact the $\mathbb{Z}_2$
transformation maps between the solutions $\varphi = \pm\frac{a}{b}$. The state represented
by $\varphi = 0$ has both the Poincaré and the $\mathbb{Z}_2$ symmetry. The solution $\varphi = 0$
thus has more symmetry than the states $\varphi = \pm\frac{a}{b}$.
Let us consider the energetics of these highly symmetric solutions $\varphi =
\text{const.}$ In an inertial reference frame with coordinates $x^\alpha = (t, x^i)$, the
conserved energy in a spatial volume $V$ for this non-linear field is easily seen
(from Noether’s theorem) to be

$$E = \int_V dV \left(\frac{1}{2}\varphi_{,t}^2 + \frac{1}{2}\varphi_{,i}\,\varphi_{,}{}^{i} - \frac{1}{2}a^2\varphi^2 + \frac{1}{4}b^2\varphi^4\right). \tag{7.11}$$

You can easily check that the solutions given by $\varphi = 0, \pm\frac{a}{b}$ are critical points
of this energy functional.

Problem:
3. Compute the first variation $\delta E$ of the functional (7.11). Show that it
vanishes when evaluated on fields $\varphi = 0, \pm\frac{a}{b}$.
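
For constant fields the derivative terms in (7.11) drop out, so the first variation reduces to $V'(\varphi_0)\int\delta\varphi$ with $V(\varphi) = -\frac12 a^2\varphi^2 + \frac14 b^2\varphi^4$. The little sketch below just evaluates $V'$, $V''$ and $V$ at the three constant solutions; the numerical values of `a` and `b` are arbitrary illustrative choices.

```python
a, b = 1.3, 0.8  # illustrative values of the parameters in (7.9)

def V(phi):
    """Double-well potential read off from (7.9)."""
    return -0.5 * a ** 2 * phi ** 2 + 0.25 * b ** 2 * phi ** 4

def dV(phi):
    """First derivative V'(phi)."""
    return -a ** 2 * phi + b ** 2 * phi ** 3

def d2V(phi):
    """Second derivative V''(phi)."""
    return -a ** 2 + 3 * b ** 2 * phi ** 2

critical_points = [0.0, a / b, -a / b]

if __name__ == "__main__":
    for p in critical_points:
        print(p, dV(p), d2V(p), V(p))
```

You should find $V' = 0$ at all three points, $V'' = -a^2$ at $\varphi = 0$, and $V'' = 2a^2$, $V = -a^4/(4b^2)$ at $\varphi = \pm a/b$.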

Given the double-well shape of the potential, it might be intuitively clear
that these solutions ought to represent global minima at $\varphi = \pm\frac{a}{b}$ and a
local maximum at $\varphi = 0$. But let us see how one might try to prove it.
Consider the change in the energy to quadratic order in a displacement
$u = u(t, x, y, z)$ from equilibrium in each case. The idea is that a displacement
from a local minimum can only increase the energy and a displacement from
a local maximum can only decrease the energy. To look into this, we assume
that $u$ has compact support for simplicity. We write $\varphi = \varphi_0 + u$ where $\varphi_0$ is
a constant and expand $E$ to quadratic order in $u$. We get

$$E = -\frac{V a^4}{4b^2} + \int_V dV \left[\frac{1}{2}\left(u_{,t}^2 + u_{,i}\,u_{,}{}^{i}\right) + a^2u^2\right] + O(u^3), \quad \text{when } \varphi_0 = \pm\frac{a}{b}, \tag{7.12}$$

and

$$E = \int_V dV \left[\frac{1}{2}\left(u_{,t}^2 + u_{,i}\,u_{,}{}^{i}\right) - \frac{1}{2}a^2u^2\right] + O(u^3), \quad \text{when } \varphi_0 = 0. \tag{7.13}$$

Evidently, as we move away from $\varphi = \pm\frac{a}{b}$ the energy increases so that the
critical points $\varphi = \pm\frac{a}{b}$ represent local minima. The situation near $\varphi = 0$
is less obvious. One thing is for sure: by choosing functions $u$ which are
suitably “slowly varying”, one can ensure that the energy becomes negative
in the vicinity of the solution $\varphi = 0$ so that $\varphi = 0$ is a saddle point if not
a local maximum. We conclude that the state $\varphi = 0$ – the state of highest
symmetry – is unstable and will not be seen “in the real world”. On the
other hand we expect the critical points $\varphi = \pm\frac{a}{b}$ to be stable. They are in
fact the states of lowest energy and represent the possible ground states of
the classical field. Evidently, the lowest energy is doubly degenerate.
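
To make the “slowly varying” remark concrete, here is a rough sketch that evaluates the quadratic part of the energy for a static Gaussian bump. The simplifications are mine: one spatial dimension, $a = 1$, a Gaussian profile of adjustable width, and a midpoint-rule integral. The argument `mass_term` is the coefficient of $u^2$ in the integrand, namely $a^2$ in (7.12) and $-\frac12 a^2$ in (7.13).

```python
import math

a = 1.0  # illustrative value of the parameter a

def second_order_energy(width, mass_term, amplitude=1e-2, n=4000):
    """Integral of 1/2 u_x^2 + mass_term * u^2 for u = amplitude * exp(-x^2 / width^2)."""
    half_length = 10.0 * width          # the Gaussian is negligible beyond this
    dx = 2 * half_length / n
    total = 0.0
    for k in range(n):
        x = -half_length + (k + 0.5) * dx
        u = amplitude * math.exp(-(x / width) ** 2)
        u_x = -2.0 * x / width ** 2 * u
        total += (0.5 * u_x ** 2 + mass_term * u ** 2) * dx
    return total

if __name__ == "__main__":
    print("about phi = 0, narrow bump:", second_order_energy(0.5, -0.5 * a ** 2))
    print("about phi = 0, wide bump:  ", second_order_energy(3.0, -0.5 * a ** 2))
    print("about phi = a/b, wide bump:", second_order_energy(3.0, a ** 2))
```

About $\varphi = 0$ a narrow bump costs energy (the gradient term wins) while a wide, slowly varying bump lowers it, which is the saddle-point behavior described above. About $\varphi = \pm a/b$ every bump raises the energy.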
Because the stable ground states have less symmetry than possible, one
says that the ground state has “spontaneously broken” the (maximal) sym-
metry group Z2 × Poincaré to just the Poincaré group. This terminology is
useful, but can be misleading. The theory retains the full symmetry group
as a symmetry of its laws, of course. Compare this with, say, the ordinary
Klein-Gordon theory with the Lagrangian

$$\mathcal{L} = -\frac{1}{2}\partial_\alpha\varphi\,\partial^\alpha\varphi - m^2\varphi^2. \tag{7.14}$$
You can easily check that the solution ϕ = 0 is the global minimum energy
state of the theory and that it admits the full symmetry group Z2 × Poincaré.
There is evidently no spontaneous symmetry breaking here.
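Let us now generalize this example by allowing the scalar field to become
complex, $\varphi: M \to \mathbb{C}$, with Lagrangian

$$\mathcal{L} = -\frac{1}{2}\partial_\alpha\varphi\,\partial^\alpha\varphi^* - \left(-\frac{1}{2}a^2|\varphi|^2 + \frac{1}{4}b^2|\varphi|^4\right). \tag{7.15}$$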
We assume $a \geq 0$, $b \geq 0$. This Lagrangian still admits the Poincaré symmetry,
but the discrete $\mathbb{Z}_2$ symmetry has been enhanced into a continuous $U(1)$
symmetry. Indeed, it is pretty obvious that the transformation

$$\varphi \to e^{i\alpha}\varphi, \quad \alpha \in \mathbb{R} \tag{7.16}$$

is a symmetry of $\mathcal{L}$. If you graph this potential in Cartesian coordinates
$(x, y, z)$ with $x = \Re(\varphi)$, $y = \Im(\varphi)$ and $z = V$, then you will see that the
graph of the double well potential has been extended into a surface of revolution
about $z$ with the resulting shape being of the famous “Mexican hat”
form. From this graphical point of view, the $U(1)$ phase symmetry of the
Lagrangian corresponds to symmetry of the graph of the potential with respect
to rotations in the $x$-$y$ plane.
Let us again consider the simplest possible states of the field, namely, the
ones which admit the whole Poincaré group as a symmetry group. These
field configurations are necessarily constants, and you can easily check that
in order for a constant field to solve the EL equations the constant must be
a critical point of the potential viewed as a function in the complex plane.
So, $\varphi = \text{const.}$ is a solution to the field equations if and only if

$$\frac{b^2}{2}\varphi|\varphi|^2 - \frac{1}{2}a^2\varphi = 0. \tag{7.17}$$

There is an isolated solution $\varphi = 0$, and (now assuming $b > 0$) a continuous
family of solutions characterized by

$$|\varphi| = \frac{a}{b}. \tag{7.18}$$

The solution $\varphi = 0$ “sits” at the local maximum of the potential at the top
of the “hat”. The solutions (7.18) sit at the circular set of global minima
of the potential. As you might expect, the transformation (7.16) maps the
solutions (7.18) among themselves. To see this explicitly, write the general
form of $\varphi$ satisfying (7.18) as

$$\varphi = \frac{a}{b}e^{i\theta}, \quad \theta \in \mathbb{R}. \tag{7.19}$$

The $U(1)$ symmetry transformation (7.16) then corresponds to $\theta \to \theta + \alpha$.
The $U(1)$ transformation is a symmetry of the state $\varphi = 0$. Thus the solution
$\varphi = 0$ has more symmetry than the family of solutions characterized by
(7.18).
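
These statements are easy to check with complex arithmetic. The following sketch uses arbitrary illustrative values for `a` and `b`, and the function names `rotate` and `vacuum` are my own labels for (7.16) and (7.19).

```python
import cmath

a, b = 1.3, 0.8  # illustrative values of the parameters in (7.15)

def V(phi):
    """Mexican hat potential read off from (7.15)."""
    r2 = abs(phi) ** 2
    return -0.5 * a ** 2 * r2 + 0.25 * b ** 2 * r2 ** 2

def rotate(phi, alpha):
    """The U(1) transformation (7.16)."""
    return cmath.exp(1j * alpha) * phi

def vacuum(theta):
    """A point on the circle of minima, as in (7.19)."""
    return (a / b) * cmath.exp(1j * theta)

if __name__ == "__main__":
    z = 0.3 + 0.9j
    print(V(z), V(rotate(z, 1.1)))                  # the potential is U(1) invariant
    print(rotate(vacuum(0.4), 0.25), vacuum(0.65))  # theta -> theta + alpha
    print(rotate(0, 2.0))                           # phi = 0 is left fixed
```

The first pair of numbers agrees, the second pair agrees, and the last line shows that $\varphi = 0$ is the only constant state preserved by the whole group.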
The stability analysis of these highly symmetric states of the complex
scalar field generalizes from the double well example as follows. (I will
spare you most of the details of the computations, but you might try to
fill them in as a nice exercise.) In an inertial reference frame with coordinates
$x^\alpha = (t, x^i)$, the conserved energy for this non-linear field is easily seen
(from Noether’s theorem) to be

$$E = \int d^3x \left(\frac{1}{2}|\varphi_{,t}|^2 + \frac{1}{2}\varphi_{,i}\,\varphi^{*}{}_{,}{}^{i} - \frac{1}{2}a^2|\varphi|^2 + \frac{1}{4}b^2|\varphi|^4\right). \tag{7.20}$$

You can easily check that the solutions given by $|\varphi| = 0, a/b$ are critical
points of this energy functional. As before, the maximally symmetric state
$\varphi = 0$ is unstable. The circle’s worth of states (7.18) are quasi-stable in the
following sense. Any displacement in field space yields a non-negative change
in energy. To see this, write

$$\varphi = \rho e^{i\Theta}, \tag{7.21}$$

where $\rho$ and $\Theta$ are spacetime functions. The energy takes the form

$$E = \int d^3x \left(\frac{1}{2}\rho_{,t}^2 + \frac{1}{2}\rho_{,i}\,\rho_{,}{}^{i} + \frac{1}{2}\rho^2\left(\Theta_{,t}^2 + \Theta_{,i}\,\Theta_{,}{}^{i}\right) - \frac{1}{2}a^2\rho^2 + \frac{1}{4}b^2\rho^4\right). \tag{7.22}$$

The critical points of interest lie at $\rho = \frac{a}{b}$, $\Theta = \text{const.}$ Expanding the energy
to quadratic order in displacements $(\delta\rho, \delta\Theta)$ from equilibrium yields

$$E = \int d^3x \left[-\frac{1}{4}\frac{a^4}{b^2} + \frac{1}{2}\delta\rho_{,t}^2 + \frac{1}{2}\delta\rho_{,i}\,\delta\rho_{,}{}^{i} + \frac{1}{2}\left(\frac{a}{b}\right)^2\left(\delta\Theta_{,t}^2 + \delta\Theta_{,i}\,\delta\Theta_{,}{}^{i}\right) + a^2\delta\rho^2\right]. \tag{7.23}$$

Evidently, all displacements except $\delta\rho = 0$, $\delta\Theta = \text{const.}$ increase the energy.
The displacements $\delta\rho = 0$, $\delta\Theta = \text{const.}$ do not change the energy, as you
might have guessed, since they correspond to displacements along the circular
locus of minima of the potential energy function. The states (7.18) are the
lowest energy states – the ground states. Thus the lowest energy is infinitely
degenerate – the set of ground states (7.18) is topologically a circle. That
these stable states form a continuous family and have less symmetry than the
unstable state will have some physical ramifications which we will unravel
after we take a little detour.
7.3 Dynamics near equilibrium and Goldstone’s theorem
A significant victory for classical mechanics is the complete characterization
of motion near stable equilibrium in terms of normal modes and character-
istic frequencies of vibration. It is possible to establish analogous results in
classical field theory via the process of linearization. This is even useful when
one considers the associated quantum field theory: one can interpret the lin-
earization of the classical field equations as characterizing particle states in
the Fock space built on the vacuum state whose classical limit corresponds to
the ground state about which one linearizes. If this seems a little too much
to digest, that’s ok – the point of this section is to at least make it easier to
swallow.
Let us begin again with our simplest example: the real KG field with
the double-well potential. Suppose that $\varphi_0$ is a given solution to the field
equations. Any “nearby” solution we will denote by $\varphi$ and we define the
difference to be $\delta\varphi$:

$$\delta\varphi = \varphi - \varphi_0. \tag{7.24}$$

The field equation is the non-linear PDE (here $\Box = \partial^\alpha\partial_\alpha$):

$$\Box\varphi + a^2\varphi - b^2\varphi^3 = 0. \tag{7.25}$$

Using (7.24) we substitute $\varphi = \varphi_0 + \delta\varphi$. We then do 2 things: (1) we use the
fact that $\varphi_0$ is a solution to the field equations; (2) we assume that $\delta\varphi$ is in
some sense “small” so we can approximate the field equations in the vicinity
of the given solution $\varphi_0$ by ignoring quadratic and cubic terms in $\delta\varphi$. We
thus get the field equation linearized about the solution $\varphi_0$:

$$\Box\delta\varphi + (a^2 - 3b^2\varphi_0^2)\,\delta\varphi = 0. \tag{7.26}$$

This result can be obtained directly from the variational principle.

Problem:

4. Using (7.24) expand the action functional for (7.25) (see (7.9)) to
quadratic order in δϕ. Show that this approximate action, viewed
as an action functional for the displacement field δϕ, has (7.26) as its
Euler-Lagrange field equation.
Evidently, the linearized equation (7.26) is a linear PDE for the displacement
field $\delta\varphi$. In general, this linear PDE has variable coefficients due to the
presence of $\varphi_0$. But if the given solution $\varphi_0$ is a constant in spacetime, the
linearized PDE is mathematically identical to a Klein-Gordon equation for
$\delta\varphi$ with mass squared given by $(-a^2 + 3b^2\varphi_0^2)$. The mass squared at the minima,
$\varphi_0 = \pm a/b$, is $2a^2$. The mass squared at the maximum, $\varphi_0 = 0$, is $-a^2$. The
negative mass-squared is a symptom of the instability of this state of the field.
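
The coefficient in the linearized equation is nothing but the derivative of the non-derivative part of the field equation, evaluated at $\varphi_0$. Here is a small sketch that checks this by finite differences; the values of `a` and `b` and the helper names are illustrative choices of mine.

```python
a, b = 1.3, 0.8  # illustrative values of the parameters

def F(phi):
    """Non-derivative part of the field equation: a^2 phi - b^2 phi^3."""
    return a ** 2 * phi - b ** 2 * phi ** 3

def linear_coefficient(phi0, eps=1e-6):
    """Coefficient of delta phi in the linearized equation, by central difference."""
    return (F(phi0 + eps) - F(phi0 - eps)) / (2 * eps)

def mass_squared(phi0):
    """Klein-Gordon mass squared read off from the linearized equation."""
    return -(a ** 2 - 3 * b ** 2 * phi0 ** 2)

if __name__ == "__main__":
    for phi0 in (0.0, a / b, -a / b):
        print(phi0, linear_coefficient(phi0), mass_squared(phi0))
```

At the minima the coefficient is $-2a^2$ (mass squared $2a^2$); at the maximum it is $+a^2$ (mass squared $-a^2$).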
From the way the linearized equation is derived, you can easily see that
any displacement field δϕ constructed as an infinitesimal symmetry of the
field equations will automatically satisfy the linearized equations when the
fields being used to build δϕ satisfy the field equations. Indeed, this fact
is the defining property of an infinitesimal symmetry of the field equations.
Here is a simple example.

Problem:
5. Consider time translations ϕ(t, x, y, z) → ϕ̃ = ϕ(t+λ, x, y, z). Compute
the infinitesimal form δϕ of this transformation as a function of ϕ and
its derivatives. Show that if ϕ solves the field equation coming from
(7.9) then δϕ solves the linearized equation (7.26).

All the preceding results easily generalize to the $U(1)$-invariant complex
scalar field case, but a new and important feature emerges which leads to
an instance of (the classical limit of) a famous result known as “Goldstone’s
theorem”. Let us go through it carefully.
Things will be most transparent in the polar coordinates (7.21). The
Lagrangian density takes the form

$$\mathcal{L} = -\frac{1}{2}\partial_\alpha\rho\,\partial^\alpha\rho - \frac{1}{2}\rho^2\,\partial_\alpha\Theta\,\partial^\alpha\Theta + \frac{1}{2}a^2\rho^2 - \frac{1}{4}b^2\rho^4. \tag{7.27}$$

The EL equations take the form

$$\Box\rho - \rho\,\partial_\alpha\Theta\,\partial^\alpha\Theta + a^2\rho - b^2\rho^3 = 0, \tag{7.28}$$

$$\partial_\alpha(\rho^2\,\partial^\alpha\Theta) = 0. \tag{7.29}$$

There are two things to notice here. First, the symmetry under $\Theta \to \Theta +
\text{const.}$ is manifest – only derivatives of $\Theta$ appear. Second, the associated
conservation law is the content of (7.29).
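Let us consider the linearization of these field equations about the circle’s
worth of equilibria $\rho = a/b$, $\Theta = \text{const.}$ We could proceed precisely as
before, of course. But it will be instructive to perform the linearization in
the Lagrangian. To this end we write

$$\rho = \frac{a}{b} + \delta\rho, \quad \Theta = \theta + \delta\Theta, \tag{7.30}$$

where $\theta = \text{const.}$, and we expand the Lagrangian to quadratic order in $\delta\rho$,
$\delta\Theta$:

$$\mathcal{L} = -\frac{1}{2}\partial_\alpha\delta\rho\,\partial^\alpha\delta\rho - \frac{1}{2}\left(\frac{a}{b}\right)^2\partial_\alpha\delta\Theta\,\partial^\alpha\delta\Theta + \frac{1}{4}\frac{a^4}{b^2} - a^2\delta\rho^2 + O(\delta\varphi^3). \tag{7.31}$$

Evidently, in a suitably small neighborhood of equilibrium the complex scalar
field can be viewed as 2 real scalar fields $(\delta\rho, \delta\Theta)$; one of the fields ($\delta\rho$) has
a mass $m = \sqrt{2}\,a$ (mass squared $2a^2$, as in the real case) and the other field
($\delta\Theta$) is massless.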
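
One way to see the massive and the massless direction at a glance is to look at the matrix of second derivatives of the potential at a point on the circle of minima, using Cartesian field coordinates $\varphi = x + iy$. With the kinetic term normalized as $-\frac12\partial_\alpha\varphi\,\partial^\alpha\varphi^* = -\frac12(\partial x)^2 - \frac12(\partial y)^2$ (my reading of (7.15)), the eigenvalues of this matrix play the role of squared masses. The sketch below computes them by finite differences; `a`, `b`, the step `h` and the helper names are illustrative.

```python
import math

a, b = 1.3, 0.8  # illustrative values of the parameters

def V(x, y):
    """Mexican hat potential in the coordinates phi = x + i y."""
    r2 = x * x + y * y
    return -0.5 * a ** 2 * r2 + 0.25 * b ** 2 * r2 ** 2

def hessian(x, y, h=1e-4):
    """Second derivatives (V_xx, V_xy, V_yy) by central differences."""
    vxx = (V(x + h, y) - 2 * V(x, y) + V(x - h, y)) / h ** 2
    vyy = (V(x, y + h) - 2 * V(x, y) + V(x, y - h)) / h ** 2
    vxy = (V(x + h, y + h) - V(x + h, y - h)
           - V(x - h, y + h) + V(x - h, y - h)) / (4 * h ** 2)
    return vxx, vxy, vyy

def eigenvalues(vxx, vxy, vyy):
    """Eigenvalues of the symmetric 2x2 matrix [[vxx, vxy], [vxy, vyy]]."""
    mean = 0.5 * (vxx + vyy)
    radius = math.hypot(0.5 * (vxx - vyy), vxy)
    return mean - radius, mean + radius

def mass_squared_at_vacuum(theta):
    """Hessian eigenvalues at the ground state (a/b) e^{i theta}."""
    x0 = (a / b) * math.cos(theta)
    y0 = (a / b) * math.sin(theta)
    return eigenvalues(*hessian(x0, y0))

if __name__ == "__main__":
    print(mass_squared_at_vacuum(0.7))
    print(2 * a ** 2)
```

Whatever angle you pick, one eigenvalue vanishes (the direction along the circle of minima) and the other equals $2a^2$ (the radial direction).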
To get a feel for what just happened, let us consider a very similar U (1)
symmetric theory, just differing in the sign of the quadratic potential term
in the Lagrangian. The Lagrangian density is
1 1
L0 = − ∂α ϕ ∂ α ϕ∗ − µ2 |ϕ|2 − b2 |ϕ|4 . (7.32)
2 4
There is only a single Poincaré invariant critical point, ϕ = 0, which is a
global minimum of the energy and which is also U (1) invariant, so the U (1)
symmetry is not spontaneously broken in the ground state. In the vicinity
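of the ground state the linearized Lagrangian takes the simple form3
\[ \mathcal{L}_0 = -\partial_\alpha\delta\varphi\,\partial^\alpha\delta\varphi^* - \frac12\,\mu^2|\delta\varphi|^2 + \mathcal{O}(\delta\varphi^3). \tag{7.33} \]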
Here of course we have the Lagrangian of a complex-valued KG field δϕ with
mass µ; equivalently, we have two real scalar fields with mass µ.
To summarize thus far: With a complex KG field described by a potential
such that the ground state shares all the symmetries of the Lagrangian, the
physics of the theory near the ground state is that of a pair of real, massive
KG fields. Using instead the Mexican hat potential, the ground state of the
complex scalar field does not share all the symmetries of the Lagrangian –
there is spontaneous symmetry breaking – and the physics of the field theory
near equilibrium is that of a pair of scalar fields, one with mass and one
which is massless.
3 We do not use polar coordinates, which are ill-defined at the origin.
To some extent, it is not too hard to understand a priori how these re-
sults occur. In particular, we can see why a massless field emerged from the
spontaneous symmetry breaking. For Poincaré invariant solutions – which
are constant functions in spacetime – the linearization of the field equations
about a Poincaré invariant solution involves: (1) the derivative terms in the
Lagrangian, which are quadratic in the fields and, since the Poincaré in-
variant state is constant, are the same in the linearization as for the full
Lagrangian; (2) the Taylor expansion of the potential V (ϕ) to second order
about the constant equilibrium solution ϕ0 . Because of (1), the mass terms
come solely from the expansion of the potential in (2). Because of the U (1)
symmetry of the potential, through each point in the set of field values there
will be a curve (with tangent vector given by the infinitesimal symmetry)
along which the potential will not change. Because the symmetry is broken,
this curve connects all the ground states of the theory. Taylor expansion
about the ground state in this symmetry direction can yield only vanishing
contributions because the potential has vanishing derivatives in that direc-
tion. Thus the broken symmetry direction(s) defines direction(s) in field
space which correspond to massless fields in an expansion about equilibrium.
This is the essence of the (classical limit of the) Goldstone theorem: to each
broken continuous symmetry generator there is a massless field.
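The curvature argument above can be checked numerically. The following is a minimal sketch, assuming the potential of (7.27) written in Cartesian field components (x, y) = (ρ cos Θ, ρ sin Θ); the function names and the numerical values of a and b are illustrative assumptions, not part of the text. It estimates the second derivative of the potential at a ground state along the radial direction and along the direction of the infinitesimal U (1) rotation.

```python
import math

A_PARAM, B_PARAM = 1.5, 0.5   # illustrative values of the constants a and b


def potential(x, y, a=A_PARAM, b=B_PARAM):
    """V = -(1/2) a^2 rho^2 + (1/4) b^2 rho^4 with rho^2 = x^2 + y^2."""
    r2 = x * x + y * y
    return -0.5 * a * a * r2 + 0.25 * b * b * r2 * r2


def second_derivative(f, point, direction, h=1e-4):
    """Central finite-difference second derivative of f at `point` along a unit vector."""
    (x, y), (dx, dy) = point, direction
    return (f(x + h * dx, y + h * dy) - 2.0 * f(x, y) + f(x - h * dx, y - h * dy)) / (h * h)


theta = 0.7                                   # any point on the circle of minima
rho0 = A_PARAM / B_PARAM                      # equilibrium radius rho = a/b
ground = (rho0 * math.cos(theta), rho0 * math.sin(theta))
radial = (math.cos(theta), math.sin(theta))
tangent = (-math.sin(theta), math.cos(theta))  # direction generated by the U(1) symmetry

curv_radial = second_derivative(potential, ground, radial)
curv_tangent = second_derivative(potential, ground, tangent)

print("radial curvature :", curv_radial, "(compare 2 a^2 =", 2 * A_PARAM ** 2, ")")
print("tangent curvature:", curv_tangent, "(compare 0)")
```

The vanishing curvature along the symmetry direction is the numerical counterpart of the massless field; the nonzero radial curvature is what gives the other field its mass.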

7.4 The Abelian Higgs model

The Goldstone result in conjunction with minimal coupling to an electro-
magnetic field yields a very important new behavior known as the “Higgs
phenomenon”. This results from the interplay of the spontaneously broken U (1)
symmetry and the local gauge symmetry. We start with a charged
self-interacting scalar field coupled to the electromagnetic field; the Lagrangian
density is
\[ \mathcal{L} = -\frac14\,F^{\alpha\beta}F_{\alpha\beta} - D^\alpha\varphi^*\,D_\alpha\varphi - V(\varphi). \tag{7.34} \]
We will again choose the potential so that spontaneous symmetry breaking
occurs:
\[ V(\varphi) = -\frac12\,a^2|\varphi|^2 + \frac14\,b^2|\varphi|^4. \tag{7.35} \]
To see what happens in detail, we return to the polar coordinates (7.21).

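The Lagrangian takes the form:

\[ \mathcal{L} = -\frac14\,F^{\alpha\beta}F_{\alpha\beta} - \frac12\,\partial_\alpha\rho\,\partial^\alpha\rho - \frac12\,\rho^2(\partial_\alpha\Theta + qA_\alpha)(\partial^\alpha\Theta + qA^\alpha) + \frac12\,a^2\rho^2 - \frac14\,b^2\rho^4. \tag{7.36} \]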
The Poincaré invariant ground state(s) can be determined as follows. As
we have observed, a Poincaré invariant function ϕ is necessarily a constant.
Likewise, it is not too hard to see that the only Poincaré invariant (co)vector
is the zero (co)vector Aα = 0. Consequently, the Poincaré invariant
ground state, as before, is specified by
\[ \rho = \frac{a}{b}, \qquad \Theta = \text{const.}, \qquad A_\alpha = 0. \tag{7.37} \]

As before, the U (1) symmetry of the theory is not a symmetry of this state,
but instead maps these states among themselves. As before, we want to
expand to quadratic order about the ground state. To this end we write
\[ A_\alpha = 0 + \delta A_\alpha, \qquad \rho = \frac{a}{b} + \delta\rho, \qquad \Theta = \Theta_0 + \delta\Theta, \tag{7.38} \]
where Θ0 is a constant. We also define

\[ B_\alpha = \delta A_\alpha + \frac{1}{q}\,\partial_\alpha\delta\Theta. \tag{7.39} \]

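Ignoring terms of cubic and higher order in the displacements (δA, δρ, δΘ),
and dropping an additive constant, we then get

\[ \mathcal{L} \approx -\frac14(\partial_\alpha B_\beta - \partial_\beta B_\alpha)(\partial^\alpha B^\beta - \partial^\beta B^\alpha) - \frac12\left(\frac{aq}{b}\right)^2 B_\alpha B^\alpha - \frac12\,\partial_\alpha\delta\rho\,\partial^\alpha\delta\rho - a^2\,\delta\rho^2. \tag{7.40} \]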
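As a hedged sketch of where the vector mass term comes from (the full derivation is left to the problem that follows), the coupling term of (7.36) can be expanded about the ground state (7.37) using the definitions (7.38) and (7.39). No new symbols are introduced:

```latex
\partial_\alpha\Theta + qA_\alpha = \partial_\alpha\delta\Theta + q\,\delta A_\alpha = q\,B_\alpha ,
\\[4pt]
-\tfrac12\,\rho^2(\partial_\alpha\Theta + qA_\alpha)(\partial^\alpha\Theta + qA^\alpha)
  = -\tfrac12\left(\tfrac{a}{b} + \delta\rho\right)^2 q^2\, B_\alpha B^\alpha
  = -\tfrac12\left(\tfrac{aq}{b}\right)^2 B_\alpha B^\alpha + (\text{cubic and higher}) ,
\\[4pt]
\partial_\alpha B_\beta - \partial_\beta B_\alpha
  = \partial_\alpha\delta A_\beta - \partial_\beta\delta A_\alpha .
```

The last line holds because partial derivatives commute, so the field strength can be written in terms of B alone.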

Problem:

6. Starting from (7.34) derive the results (7.36) and (7.40).

As you can see, excitations of ρ around the ground state are those of a massive
scalar field with m² = 2a² (compare the δρ² term with the KG mass term −½m²ϕ²).
To understand the rest of the Lagrangian (7.40) we need to understand the
Proca Lagrangian:
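\[ \mathcal{L}_{Proca} = -\frac14(\partial_\alpha B_\beta - \partial_\beta B_\alpha)(\partial^\alpha B^\beta - \partial^\beta B^\alpha) - \frac12\,\kappa^2 B_\alpha B^\alpha. \tag{7.41} \]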
For κ = 0 this is just the usual electromagnetic Lagrangian. Otherwise...

Problem:

7. Assuming κ ≠ 0, show that the Euler-Lagrange equations for Bα defined
by (7.41) are equivalent to

\[ (\Box - \kappa^2)B_\alpha = 0, \qquad \partial^\alpha B_\alpha = 0. \tag{7.42} \]

Each component of Bα behaves as a Klein-Gordon field with mass κ. The

divergence condition means that there are only 3 independent, positive energy
scalar fields in play, and it can be shown that these field equations define an
irreducible representation of the Poincaré group corresponding to a massive
spin-1 field. Inasmuch as the Lagrangian reduces to the electromagnetic
Lagrangian in the limit κ → 0, one can interpret the Proca field theory as
the classical limit of a theory of massive photons.4 Notice that the Proca
theory does not admit the gauge symmetry of electromagnetism because of
the “mass term” in its Lagrangian. Gauge symmetry is a special feature of
field theories describing massless particles.
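One possible route to the divergence condition, sketched here under the assumption that the EL equations of (7.41) are written in the form shown on the first line, is to take a divergence and use the antisymmetry of the field strength:

```latex
% Euler-Lagrange equations of the Proca Lagrangian (7.41)
\partial^\alpha(\partial_\alpha B_\beta - \partial_\beta B_\alpha) - \kappa^2 B_\beta = 0 .
\\[4pt]
% Divergence: the first term vanishes identically by antisymmetry in (alpha, beta)
\partial^\beta\partial^\alpha(\partial_\alpha B_\beta - \partial_\beta B_\alpha) = 0
\;\Longrightarrow\;
\kappa^2\,\partial^\beta B_\beta = 0
\;\Longrightarrow\;
\partial^\beta B_\beta = 0 \quad (\kappa \neq 0) .
\\[4pt]
% Substituting back
\Box B_\beta - \partial_\beta(\partial^\alpha B_\alpha) - \kappa^2 B_\beta
  = (\Box - \kappa^2)\,B_\beta = 0 .
```

For κ = 0 the second step gives no information, which is the gauge freedom of electromagnetism reappearing.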
The punchline here is that spontaneous symmetry breaking coupled with
gauge symmetry leads to a field theory whose dynamics near the ground state
is that of a massive scalar and a massive vector field. This is the simplest
instance of the “Higgs phenomenon”.
It is interesting to ask what became of the gauge symmetry and what
became of the massless “Goldstone boson” which appeared when we studied
spontaneous symmetry breaking of the scalar field without the gauge field.
First of all, the U (1) gauge symmetry of the theory – the symmetry of the
law – is intact, but hidden in the linearization about a non-gauge invariant
ground state. The direction in field space (the Θ direction) which would
have yielded the massless Goldstone field is now a direction corresponding to
the U (1) gauge symmetry. The change of variables (7.39) which we used to
put the (linearized) Lagrangian into the Proca form amounts to modifying
A by a gauge transformation. The effect of this is that the field Θ provides
a “longitudinal” mode to the field B, corresponding to its acquisition of a
mass.
4 The experimental upper limit on κ is very, very small, but it isn’t zero!
Finally, we point out that the Higgs phenomenon can be generalized con-
siderably. We do not have time to go into it here, but the idea is as follows.
Consider a system of matter fields with symmetry group of its Lagrangian
being a continuous group G and with a ground state which breaks that sym-
metry to some subgroup of G. Couple these matter fields to gauge fields, the
latter with gauge group which includes G. Excitations of the theory near the
ground state will have the gauge fields corresponding to G acquiring a mass.
This is precisely how the W and Z bosons of the weak interaction acquire
their effective masses at (relatively) low energies.

7.5 PROBLEMS

1. While every symmetry of a Lagrangian is a symmetry of its EL equa-

tions, it is not true that every symmetry of the field equations is a
symmetry of the Lagrangian. Consider the massless KG field. Show
that the scaling transformation ϕ̃ = (const.)ϕ is a symmetry of the
field equations but is not a symmetry of the Lagrangian.

2. Derive the result (7.5).

3. Compute the first variation δE of the functional (7.11). Show that it
vanishes when evaluated on fields ϕ = 0, ±a/b.

4. Using (7.24) expand the action functional to quadratic order in δϕ.

Show that this approximate action, viewed as an action functional for
the displacement field δϕ, has (7.26) as its Euler-Lagrange field equa-
tion.
5. Consider time translations ϕ(t, x, y, z) → ϕ̃ = ϕ(t+λ, x, y, z). Compute

the infinitesimal form δϕ of this transformation as a function of ϕ and
its derivatives. Show that if ϕ solves the field equation coming from
(7.9) then δϕ solves the linearized equation (7.26).

6. Starting from (7.34) derive the results (7.36) and (7.40).

7. Assuming κ ≠ 0, show that the Euler-Lagrange equations for Bα defined
by (7.41) are equivalent to

\[ (\Box - \kappa^2)B_\alpha = 0, \qquad \partial^\alpha B_\alpha = 0. \tag{7.43} \]
Chapter 8

The Dirac field

Recall that spin is an “internal degree of freedom”, that is, an internal bit
of structure for the particle excitations of a quantum field whose classical
approximation we have been studying. The spin gets its name since it con-
tributes to the angular momentum conservation law. In quantum field the-
ory, the KG equation is an equation useful for describing relativistic particles
with spin 0; the Maxwell and Proca equations are equations used to describe
particles with spin 1. One of the many great achievements of Dirac was to
devise a system of differential equations that is perfectly suited for studying
the quantum theory of relativistic particles with “spin 1/2”. This is the Dirac
equation and it is used in quantum field theory to describe electrons,
neutrinos, quarks, etc. Originally, this equation was obtained from trying to find
a relativistic analog of the Schrödinger equation. The idea is that one wants
a PDE that is first order in time, unlike the KG or the 4-potential form of
Maxwell equations. Here we shall simply define the Dirac equation without
trying to give details regarding its historical derivation. Then we will explore
some of the simple field theoretic properties associated with the equation.

8.1 The Dirac equation

Given our flat Minkowski spacetime,
(M, g) = (R4 , η), (8.1)
the classical Dirac field can be viewed as a mapping
ψ : M → C4 . (8.2)

To define the field equations, we introduce four linear transformations on C4 ,

γµ : C4 → C4 , µ = 0, 1, 2, 3 (8.3)

satisfying the anti-commutation relations

γµ γν + γν γµ = 2ηµν I. (8.4)

Such linear transformations define a representation of what is known as a

Clifford algebra. We shall see why such transformations are relevant when we
discuss how the Dirac fields provide a representation of the Poincaré group.
The matrix forms of the γµ are intimately related to the Pauli matrices that
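we considered earlier. Recall that these matrices are given by

\[ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{8.5} \]

We can define
\[ \gamma_0 = \begin{pmatrix} iI & 0 \\ 0 & -iI \end{pmatrix}, \tag{8.6} \]
and
\[ \gamma_j = \begin{pmatrix} 0 & -i\sigma_j \\ i\sigma_j & 0 \end{pmatrix}, \qquad j = 1, 2, 3. \tag{8.7} \]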
Here we are using a 2 × 2 block matrix notation.
The γ-matrices are often called, well, just the “gamma matrices”. They
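are also called the “Dirac matrices”. The Dirac matrices satisfy

\[ \gamma_0^\dagger = -\gamma_0, \qquad \gamma_i^\dagger = \gamma_i, \tag{8.8} \]

and, since (8.4) gives γ0² = −I and γi² = I, they are unitary:
\[ \gamma_0^\dagger = \gamma_0^{-1}, \qquad \gamma_i^\dagger = \gamma_i^{-1}. \tag{8.9} \]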
The Dirac matrices are not uniquely determined by their anti-commutation
relations. It is possible to find other, equivalent representations. However,
Pauli showed that if two sets of matrices γµ and γ′µ satisfy the
anti-commutation relations and the Hermiticity relations of the gamma matrices
as shown above, then there is a unitary transformation U on C4 such that

\[ \gamma'_\mu = U^{-1}\gamma_\mu U. \tag{8.10} \]
So, given the freedom to perform unitary transformations, i.e., changes of

basis preserving the inner product on C4 , we have all the Dirac matrices.
With all this structure in place, we can now define the Dirac equation.
Using matrix multiplication, it is given by

γ µ ∂µ ψ + mψ = 0. (8.11)

Here we raise the “spacetime index” on γµ with the flat spacetime metric:

γ µ = η µν γν . (8.12)

Also, the derivative operation is understood to be applied to each component

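of the Dirac field. Thus, if
\[ \psi = \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}, \tag{8.13} \]
then
\[ \partial_\mu\psi = \begin{pmatrix} \partial_\mu\alpha \\ \partial_\mu\beta \\ \partial_\mu\gamma \\ \partial_\mu\delta \end{pmatrix}. \tag{8.14} \]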
The Dirac equation is a coupled system of 4 complex (or 8 real) first-order,
linear PDEs with constant coefficients (in Cartesian coordinates).
In the quantum description of the Dirac field the parameter m plays the
role of the mass of the particle excitations of the field. Indeed, it is easy to
see that if ψ is a solution of the Dirac equation, then each component of the
Dirac field satisfies a KG-type of equation with mass m. To see this simply
apply the Dirac operator
γ ν ∂ν + mI (8.15)
to the left hand side of the Dirac equation (here I is the identity
transformation on C4). You get

\[ (\gamma^\nu\partial_\nu + mI)(\gamma^\mu\partial_\mu\psi + m\psi) = \Box\psi + 2m\,\gamma^\mu\partial_\mu\psi + m^2\psi. \tag{8.16} \]

Now suppose that ψ is a solution of the Dirac equation so that

γ µ ∂µ ψ = −mψ, (8.17)
then we have that

\[ 0 = (\gamma^\nu\partial_\nu + mI)(\gamma^\mu\partial_\mu\psi + m\psi) = (\Box - m^2)\psi, \tag{8.18} \]

where □ and m² are operating on ψ component-wise, that is

\[ \Box \equiv \Box\, I, \qquad m^2 \equiv m^2 I. \tag{8.19} \]

Note: That each component of the Dirac field satisfies the KG equation is
necessary for ψ to satisfy the Dirac equation, but it is not sufficient.
The preceding computation allows us to view the Dirac operator as a
sort of “square root” of the KG operator. This is one way to understand the
utility of the Clifford algebra.
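To make the "square root" statement concrete, here is a short worked step. It uses only the anti-commutation relations (8.4) with indices raised by η, and the fact that partial derivatives commute; the second identity is an illustrative rearrangement rather than a formula quoted from the text:

```latex
\gamma^\nu\gamma^\mu\,\partial_\nu\partial_\mu
  = \tfrac12\left(\gamma^\nu\gamma^\mu + \gamma^\mu\gamma^\nu\right)\partial_\nu\partial_\mu
  = \eta^{\nu\mu} I\,\partial_\nu\partial_\mu
  = \Box\, I ,
\\[4pt]
(\gamma^\nu\partial_\nu + mI)(\gamma^\mu\partial_\mu - mI) = (\Box - m^2)\, I .
```

The cross terms cancel in the second line, so the KG operator factors into two first-order Dirac-type operators.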
I have no choice but to assign you the following.

Problem:

1. Verify that the Dirac γ matrices shown in (8.6), (8.7) satisfy

{γµ , γν } := γµ γν + γν γµ = 2ηµν I. (8.20)

(Although it is easy to check this by hand, I would recommend using a

computer for this since it is good to learn how to make the computer
do these boring tasks.)
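In the spirit of that recommendation, here is a minimal sketch of such a computer check, written in plain Python with matrices stored as nested lists. The helper names (matmul, add, scale, dagger, block) are illustrative assumptions; the matrices are those of (8.5), (8.6) and (8.7), and η = diag(−1, 1, 1, 1) is assumed as the metric signature, consistent with γ0² = −I.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c, A):
    return [[c * x for x in row] for row in A]


def dagger(A):
    """Conjugate transpose."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]


def block(tl, tr, bl, br):
    """Assemble a 4x4 matrix from four 2x2 blocks."""
    return [tl[i] + tr[i] for i in range(2)] + [bl[i] + br[i] for i in range(2)]


I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
sigma = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# gamma_0 from (8.6), then gamma_1, gamma_2, gamma_3 from (8.7)
gamma = [block(scale(1j, I2), Z2, Z2, scale(-1j, I2))]
gamma += [block(Z2, scale(-1j, s), scale(1j, s), Z2) for s in sigma]

# Anti-commutation relations (8.4): gamma_mu gamma_nu + gamma_nu gamma_mu = 2 eta_{mu nu} I
clifford_ok = all(
    add(matmul(gamma[m], gamma[n]), matmul(gamma[n], gamma[m])) == scale(2 * eta[m][n], I4)
    for m in range(4)
    for n in range(4)
)

# Hermiticity relations (8.8)
hermiticity_ok = dagger(gamma[0]) == scale(-1, gamma[0]) and all(
    dagger(gamma[j]) == gamma[j] for j in (1, 2, 3)
)

print("Clifford relations hold:", clifford_ok)
print("Hermiticity relations hold:", hermiticity_ok)
```

All entries are 0, ±1 or ±i, so the comparisons are exact and no numerical tolerance is needed.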

8.2 Representations of the Poincaré group

The Dirac equation was simply postulated in the previous section. It is worth
spending a little time trying to understand why it is the way it is and why
one might find it useful. These questions can be answered on a number of
levels, but probably the most outstanding aspect of the Dirac theory is that
it provides a new projective, unitary representation of the Poincaré group
called a spinor representation. To understand this statement, first recall the
discussion in §2.3 where I sketched how solutions of a field equation can be
used to build a Hilbert space of one particle states in quantum theory. Sym-
metry transformations of the theory, such as the Poincaré group, will map
solutions of the field equation to other solutions. Physically, this represents
the change in state which occurs when one views a physical system from a
different inertial reference frame. The Poincaré group is thus represented by

unitary transformations in the quantum theory. The reason why this is im-
portant is that irreducible projective unitary representations of the Poincaré
group are identified with elementary particles in quantum field theory. The
Dirac equation, in particular, is used to describe elementary spin 1/2 par-
ticles like the electron and quarks. These fermionic representations in the
quantum theory can be seen already at the level of the classical field theory,
and this is what we want to describe here. But first it will be instructive to
spend a little time systematically thinking about the Poincaré group, its Lie
algebra, and its representations.
To begin, we’ll review the definition of the Poincaré group. Recall that
the Poincaré group can be viewed as the transformation group x → x′ which
leaves the flat spacetime metric η invariant:

\[ \eta_{\alpha\beta}\,dx'^\alpha dx'^\beta = \eta_{\alpha\beta}\,dx^\alpha dx^\beta. \tag{8.21} \]

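This group of transformations takes the form

\[ x'^\alpha = L^\alpha{}_\beta\, x^\beta + a^\alpha, \tag{8.22} \]

where aα is any constant vector field (defining a spacetime translation) and
L is an element of the Lorentz group (also called O(3,1)), that is, L satisfies

\[ L^\alpha{}_\gamma\, L^\beta{}_\delta\, \eta_{\alpha\beta} = \eta_{\gamma\delta}. \tag{8.23} \]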

Elements of the Poincaré group are thus labeled by pairs (L, a).

Problem:

2. In terms of the characterization of group elements (L, a), where L sat-

isfies (8.23), what are the identity and inverse transformations?

The group composition law for two elements g1 = (L1 , a1 ), g2 = (L2 , a2 )

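is defined by

\[ (g_2 g_1\cdot x)^\alpha = L_2{}^\alpha{}_\beta\,(L_1{}^\beta{}_\gamma x^\gamma + a_1^\beta) + a_2^\alpha = (L_2{}^\alpha{}_\beta L_1{}^\beta{}_\gamma)\,x^\gamma + L_2{}^\alpha{}_\beta\, a_1^\beta + a_2^\alpha, \tag{8.24} \]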

so that
g2 g1 = (L2 L1 , L2 · a1 + a2 ). (8.25)
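The composition law can be illustrated with a small numerical sketch. The function names (act, compose, boost_x, rotation_z) and the sample parameter values are assumptions made for this example; coordinates are ordered (t, x, y, z).

```python
import math


def matvec(L, x):
    return [sum(L[i][j] * x[j] for j in range(4)) for i in range(4)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def act(g, x):
    """Apply g = (L, a) to a point x as in (8.22): x -> L x + a."""
    L, a = g
    return [u + v for u, v in zip(matvec(L, x), a)]


def compose(g2, g1):
    """Group product (8.25): g2 g1 = (L2 L1, L2 a1 + a2)."""
    (L2, a2), (L1, a1) = g2, g1
    return (matmul(L2, L1), [u + v for u, v in zip(matvec(L2, a1), a2)])


def boost_x(rapidity):
    c, s = math.cosh(rapidity), math.sinh(rapidity)
    return [[c, s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]


g1 = (boost_x(0.3), [1.0, 0.0, -2.0, 0.5])
g2 = (rotation_z(1.1), [0.0, 3.0, 1.0, -1.0])
x = [0.2, -0.7, 1.5, 4.0]

sequential = act(g2, act(g1, x))      # transform by g1, then by g2
combined = act(compose(g2, g1), x)    # transform once by the product g2 g1
max_difference = max(abs(u - v) for u, v in zip(sequential, combined))
print("largest component difference:", max_difference)
```

Note that the translation part of the product involves L2 acting on a1, so the group is not simply a direct product of Lorentz transformations and translations.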
As mentioned earlier, in quantum theory it is the “projective unitary

representations” of symmetry groups which are relevant. Recall from §3.17
that a representation of a group G is an assignment of a linear transformation
on a vector space V (a matrix, if you like) to each element of G such that the
linear transformations “behave” like the group. More precisely, we have a
mapping r : G → GL(V ) which satisfies the group homomorphism property:

r(g1 · g2 ) = r(g1 )r(g2 ). (8.26)

A projective representation relaxes this requirement to be

\[ r(g_1\cdot g_2) = c_{g_1,g_2}\, r(g_1)\, r(g_2), \tag{8.27} \]

where cg,h are constants which depend upon a pair of group elements g and
h. A representation is now the special case cg,h = 1, ∀ g, h. The projective
representations are of interest because of quantum theory. Recall that the set
of states of a physical system are identified with elements of a vector space
(in fact a Hilbert space) with the proviso that any two linearly dependent
vectors – two vectors which differ by a scalar multiple – define the same
state. When considering the transformation of states by symmetries, one
thus considers linear transformations, but the group homomorphism property
is only required up to a constant rescaling, as in (8.27).
It can be shown that all irreducible unitary, projective representations of
the Poincaré group are characterized by two parameters, physically identified
with the spin and the mass. Nature only seems to take advantage of the
representations in which the mass is real and positive and the spin is a non-
negative integer or half integer.1 In quantum field theory, the spin of the
representation is principally controlled by the geometrical type of the field.
The principal role of the field equations is then to pick out the mass of the
representation and to ensure its irreducibility. So far we have considered
a scalar field and a 1-form (or 4-vector field), corresponding to spin-0 and
spin-1. The Dirac field is a new type of geometric object: a spinor field,
corresponding to spin-1/2.
1 The other allowed representations have imaginary mass, or vanishing mass and
continuous spin. Both mathematical possibilities appear to have unphysical properties.
To better understand this last statement, we now want to see how the
Poincaré group is represented as a transformation group of the classical fields.
A field representation of the Poincaré group G of transformations assigns to
each group element g ∈ G a linear transformation Ψg of the fields,

\[ \varphi \to \varphi' = \Psi_g(\varphi), \tag{8.28} \]

satisfying the representation property:

\[ \Psi_{g_2}(\Psi_{g_1}(\varphi)) = \Psi_{g_2 g_1}(\varphi), \qquad g_1, g_2 \in G. \tag{8.29} \]

To understand this point of view, consider our previously studied fields. The
Klein-Gordon field was defined as a function or scalar field, ϕ : R4 → R, so
that we have the representation of the Poincaré group on the space of KG
fields given by:
\[ \varphi(x) \to \varphi'(x) = \varphi(L^{-1}\cdot(x - a)). \tag{8.30} \]
It is easy to check that the space of KG fields, equipped with this trans-
formation rule provides a field representation of the Poincaré group. This
representation is called the spin-0 representation. The electromagnetic field
was defined in terms of a 1-form A,

A : R4 → T ∗ M. (8.31)

The Poincaré group acts upon

A = Aα dxα (8.32)

via
\[ A_\alpha(x) \to A'_\alpha(x) = L_\alpha{}^\beta\, A_\beta(L^{-1}\cdot(x - a)). \tag{8.33} \]
This transformation law leads to the spin-1 representation of the Poincaré
group.

Problem:

3. Verify that the transformation laws (8.30) and (8.33) define represen-
tations of the Poincaré group.

We can build other integer spin representations of G via tensor products.

Our goal now is to introduce the spin 1/2 (spinor) representation.
8.3 The spinor representation

The Dirac field takes advantage of a “projective representation” of the Poincaré
group. To build this representation and to understand this “projective” busi-
ness, it is easiest to work at the infinitesimal, Lie algebraic level.
Consider infinitesimal Poincaré transformations. (See §3.13.) Denote
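a Poincaré transformation by the pair (L, a) and let (L(s), a(s)) be a
1-parameter subgroup such that s = 0 is the identity transformation:
\[ L^\alpha{}_\beta(0) = \delta^\alpha_\beta, \qquad a^\alpha(0) = 0. \tag{8.34} \]
For any fixed vector bα we can write
\[ a^\alpha(s) = s\, b^\alpha. \tag{8.35} \]
Infinitesimally, L(s) is characterized by a skew symmetric tensor,
\[ \omega_{\alpha\beta} \equiv \eta_{\alpha\gamma}\,\omega^\gamma{}_\beta = -\omega_{\beta\alpha}, \tag{8.36} \]
such that
\[ L^\alpha{}_\beta(s) = [\exp(s\,\omega)]^\alpha{}_\beta \approx \delta^\alpha_\beta + s\,\omega^\alpha{}_\beta + \cdots. \tag{8.37} \]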
Thus ω and b label the infinitesimal generators of the 1-parameter family of
Poincaré transformations.
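The claim that a skew tensor exponentiates to a Lorentz transformation can be checked numerically. The sketch below assumes η = diag(−1, 1, 1, 1), uses a truncated Taylor series for the matrix exponential (adequate only for small matrices of modest norm), and picks arbitrary illustrative values for the components of ω and for s.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def mat_exp(A, terms=40):
    """Matrix exponential by truncated Taylor series: sum over k of A^k / k!."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


eta = [[-1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]

# Skew tensor omega_{alpha beta} with lower indices (arbitrary illustrative values);
# the 0j entries generate boosts, the ij entries generate rotations.
omega_lower = [
    [0.0, 0.4, -0.2, 0.1],
    [-0.4, 0.0, 0.7, -0.3],
    [0.2, -0.7, 0.0, 0.5],
    [-0.1, 0.3, -0.5, 0.0],
]

# Raise the first index as in (8.36): omega^alpha_beta = eta^{alpha gamma} omega_{gamma beta}
omega_mixed = matmul(eta, omega_lower)

s = 0.8
L = mat_exp([[s * w for w in row] for row in omega_mixed])

# Lorentz condition (8.23) in matrix form: L^T eta L = eta
check = matmul(matmul(transpose(L), eta), L)
residual = max(abs(check[i][j] - eta[i][j]) for i in range(4) for j in range(4))
print("largest deviation from eta:", residual)
```

The check succeeds for any skew ω with lower indices, because skewness of η ω is exactly the infinitesimal form of (8.23).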
The formal definition of the Lie algebra goes as follows.2 The underlying
vector space is the set of pairs h = (ω, b) with addition given by
h1 + h2 = (ω1 + ω2 , b1 + b2 ). (8.38)
Scalar multiplication of (ω, b) by a real number λ is given by
λ · (ω, b) = (λω, λb). (8.39)
The Lie bracket is the commutator of infinitesimal transformations:
[h1 , h2 ] = h3 , (8.40)
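where
\[ (\omega_3)_\alpha{}^\beta = [\omega_1, \omega_2]_\alpha{}^\beta = \omega_{1\alpha}{}^\gamma\,\omega_{2\gamma}{}^\beta - \omega_{2\alpha}{}^\gamma\,\omega_{1\gamma}{}^\beta, \tag{8.41} \]
\[ b_3^\alpha = \omega_{1\beta}{}^\alpha\, b_2^\beta - \omega_{2\beta}{}^\alpha\, b_1^\beta + b_1^\alpha - b_2^\alpha. \tag{8.42} \]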

2
Recall that a Lie algebra is a vector space V equipped with a skew-symmetric bilinear
mapping [·, ·] : V × V → V – the Lie bracket – which satisfies the Jacobi identity.
Recall that a Lie group is (among other things) a differentiable mani-

fold. Its component containing the identity is completely characterized by
its infinitesimal generators via the commutation relations of the associated
Lie algebra. What this means in the present example is that, up to discrete
transformations like time reversal or spatial inversion, the group elements
can be defined by pairs (bα , ωαβ ) and the group multiplication law is char-
acterized by the commutators among the infinitesimal generators. We shall
therefore focus on the infinitesimal generators for most of our discussion.
We have seen that the action of the Poincaré group on a scalar field or
a 1-form involves a transformation of the argument of the field along with
a transformation of the scalar or vector value of the field by the Lorentz
transformation.3 Spinor fields also follow this pattern. So we begin just by
seeing how the value of a spinor field transforms under a Lorentz transfor-
mation. We are interested in representations of the group of transformations
on the spinor space C4 associated to the matrices Lαβ . Infinitesimally, we are
interested in representations on C4 of the Lie algebra of the anti-symmetric
matrices ω α β , that is, we need to assign a linear transformation on C4 to
each antisymmetric ω. We will insist that these linear transformations on C4
should obey the bracket relations (8.41) and (8.42) of the Lie algebra,4 and
they should also define symmetries of the Dirac equation. We will check the
latter requirement a little later. As for the former, we will do this using the
Dirac gamma matrices in what follows.
Define 6 independent linear transformations on C4 via
\[ S_{\mu\nu} = \frac14\,[\gamma_\mu, \gamma_\nu]. \tag{8.43} \]
Given an infinitesimal Lorentz transformation specified by ω α β , we define its
representative on C4 by
\[ S(\omega) = \frac12\,\omega^{\mu\nu} S_{\mu\nu}. \tag{8.44} \]
Let us compute the commutator of these infinitesimal generators on C4 . We
have
\[ [S(\omega), S(\chi)] = \frac14\,\omega^{\mu\nu}\chi^{\alpha\beta}\,[S_{\mu\nu}, S_{\alpha\beta}]. \tag{8.45} \]

3 The transformation of the value of a scalar field is just the identity transformation.
4 We shall see that this representation of the Lie algebra will lead to a projective
representation of the corresponding group.

Problem:

4. Using
γµ γν + γν γµ = 2ηµν I, (8.46)
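show that
\[ [S(\omega), S(\chi)] = \frac14\,\omega^{\mu\nu}\chi^{\alpha\beta}\,[S_{\mu\nu}, S_{\alpha\beta}] = \frac12\,[\omega, \chi]^{\alpha\beta} S_{\alpha\beta} = S([\omega, \chi]). \tag{8.47} \]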

This shows that the linear transformations S(ω) on the space C4 represent
the Lie algebra of the Lorentz group. This representation is called the (in-
finitesimal) spinor representation of the (infinitesimal) Lorentz group. The
Lorentz transformation L(ω) defined by the matrix ω α β is represented on
vectors in C4 – “spinors” – by the matrix exponential R(ω):

R(ω) = exp {S(ω)} . (8.48)

The fact that S(ω) represents the Lie algebra ensures that the exponential
will (projectively) represent the Lorentz group. As with scalar and vector
fields, the translational part of the Poincaré group is represented trivially on
the values of the spinor.
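The representation property (8.47) can be spot-checked numerically before attempting the algebra. This is a sketch under stated assumptions: η = diag(−1, 1, 1, 1), the gamma matrices of (8.6) and (8.7), and [ω, χ] taken to be the ordinary matrix commutator of the mixed-index matrices ω α β and χ α β. All function names and the sample components are illustrative.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c, A):
    return [[c * x for x in row] for row in A]


def commutator(A, B):
    return add(matmul(A, B), scale(-1, matmul(B, A)))


def block(tl, tr, bl, br):
    return [tl[i] + tr[i] for i in range(2)] + [bl[i] + br[i] for i in range(2)]


I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
sigma = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
gamma = [block(scale(1j, I2), Z2, Z2, scale(-1j, I2))]
gamma += [block(Z2, scale(-1j, s), scale(1j, s), Z2) for s in sigma]

# S_{mu nu} = (1/4)[gamma_mu, gamma_nu], equation (8.43)
S = [[scale(0.25, commutator(gamma[m], gamma[n])) for n in range(4)] for m in range(4)]


def spinor_generator(omega_lower):
    """S(omega) = (1/2) omega^{mu nu} S_{mu nu}, equation (8.44); the input has lower indices."""
    upper = matmul(matmul(eta, omega_lower), eta)   # raise both indices with eta
    total = [[0j] * 4 for _ in range(4)]
    for m in range(4):
        for n in range(4):
            total = add(total, scale(0.5 * upper[m][n], S[m][n]))
    return total


def lorentz_bracket(omega_lower, chi_lower):
    """Commutator of the mixed-index matrices, returned with both indices lowered."""
    return matmul(eta, commutator(matmul(eta, omega_lower), matmul(eta, chi_lower)))


def skew(w01, w02, w03, w12, w13, w23):
    """Skew 4x4 array omega_{alpha beta} built from its six independent components."""
    return [[0, w01, w02, w03], [-w01, 0, w12, w13], [-w02, -w12, 0, w23], [-w03, -w13, -w23, 0]]


omega = skew(0.3, -0.1, 0.4, 0.9, -0.6, 0.2)
chi = skew(-0.5, 0.7, 0.1, -0.2, 0.8, 0.35)

lhs = commutator(spinor_generator(omega), spinor_generator(chi))
rhs = spinor_generator(lorentz_bracket(omega, chi))
max_error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(4) for j in range(4))
print("largest entry of [S(omega), S(chi)] - S([omega, chi]):", max_error)
```

A numerical agreement for one pair of generators is of course not a proof, but it is a quick way to catch sign or index-placement mistakes in the hand calculation.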
We have considered how C4 – the space of values taken by a Dirac field – can
provide a representation of the Lie algebra of the Poincaré group. We now
extend this representation to a field representation (of the connected com-
ponent of the identity of the Poincaré group) by adding in the Lorentz and
translational transformations to the argument of the field and exponentiat-
ing. We specify a Poincaré transformation5 by using a skew tensor ω, and a
constant vector b. We define the transformation ψ → ψ′ by

\[ \psi'(x) = e^{S(\omega)}\,\psi(e^{-\omega}\cdot(x - b)). \tag{8.49} \]

Under an infinitesimal Poincaré transformation the Dirac field transforms

according to
δψ(x) = S(ω)ψ(x) − ω α β xβ ∂α ψ − bα ∂α ψ. (8.50)
To see that this provides a representation of the Lie algebra we consider the
commutator of two successive infinitesimal transformations to find:

\[ \delta_1\delta_2\psi - \delta_2\delta_1\psi = \delta_3\psi, \tag{8.51} \]
where
\[ \omega_{3\beta}{}^\alpha = [\omega_1, \omega_2]_\beta{}^\alpha, \qquad b_3^\alpha = \omega_{1\beta}{}^\alpha\, b_2^\beta - \omega_{2\beta}{}^\alpha\, b_1^\beta + b_1^\alpha - b_2^\alpha. \tag{8.52} \]
5 We are restricting to the component connected to the identity transformation this way.
For simplicity I will suppress the interesting discussion of how to represent the remaining
transformations (time reversal and spatial reflection).
Comparing with (8.41) and (8.42) we see that the infinitesimal transforma-
tions (8.50) do indeed represent the Lie algebra of the Poincaré group.
As I have mentioned in the preceding discussion, the “spinor represen-
tation” of the Poincaré group we have been describing is not quite a true
representation of the group, but rather a projective representation. (As we
just saw, we do have a true representation of the Lie algebra.) To see why
I say this, consider a Poincaré transformation consisting of a rotation by 2π
about, say, the z-axis. You can easily check that this is generated by an
infinitesimal transformation with bα = 0 and

ω01 = ω02 = ω03 = ω13 = ω23 = 0, ω12 = 2π, (8.53)

so that the matrix ω α β is given by

\[ \omega = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 2\pi & 0 \\ 0 & -2\pi & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \tag{8.54} \]

The corresponding Lorentz transformation matrix is given by

\[ L^\alpha{}_\beta = [e^{\omega}]^\alpha{}_\beta = \delta^\alpha_\beta, \tag{8.55} \]

which just shows that a rotation by 2π is the same as the identity
transformation. (Check this calculation as a nice exercise.) The corresponding
transformation ψ → ψ′ of the spinor field is determined by (8.43), (8.44) and
the matrices (8.7):
\[ S(\omega) = \frac18\,\omega^{\alpha\beta}[\gamma_\alpha, \gamma_\beta] = \frac{\pi}{2}\,[\gamma_1, \gamma_2] = i\pi\,\mathrm{diag}(1, -1, 1, -1). \tag{8.56} \]
So that
eS(ω) = −I, (8.57)
and
ψ 0 (x) = −ψ(x). (8.58)

Problem:
5. Verify (8.56) through (8.58).
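A computer check of the 2π rotation is a useful companion to the hand calculation. The sketch below assumes η = diag(−1, 1, 1, 1), the gamma matrices of (8.6) and (8.7), and the definitions (8.43) and (8.44); the matrix exponential is a truncated Taylor series, and all function and variable names are illustrative.

```python
import math


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c, A):
    return [[c * x for x in row] for row in A]


def commutator(A, B):
    return add(matmul(A, B), scale(-1, matmul(B, A)))


def block(tl, tr, bl, br):
    return [tl[i] + tr[i] for i in range(2)] + [bl[i] + br[i] for i in range(2)]


def mat_exp(A, terms=80):
    """Matrix exponential by truncated Taylor series: sum over k of A^k / k!."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = add(result, term)
    return result


I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
sigma = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
gamma = [block(scale(1j, I2), Z2, Z2, scale(-1j, I2))]
gamma += [block(Z2, scale(-1j, s), scale(1j, s), Z2) for s in sigma]
S = [[scale(0.25, commutator(gamma[m], gamma[n])) for n in range(4)] for m in range(4)]

# Rotation by 2 pi about the z-axis, equation (8.53): omega_12 = 2 pi, others zero
two_pi = 2 * math.pi
omega_lower = [[0, 0, 0, 0], [0, 0, two_pi, 0], [0, -two_pi, 0, 0], [0, 0, 0, 0]]
omega_mixed = matmul(eta, omega_lower)   # omega^alpha_beta, as in (8.54)
omega_upper = matmul(omega_mixed, eta)   # omega^{alpha beta}

S_omega = [[0j] * 4 for _ in range(4)]
for m in range(4):
    for n in range(4):
        S_omega = add(S_omega, scale(0.5 * omega_upper[m][n], S[m][n]))

L = mat_exp(omega_mixed)   # action on vectors
R = mat_exp(S_omega)       # action on spinors

vector_error = max(abs(L[i][j] - (1 if i == j else 0)) for i in range(4) for j in range(4))
spinor_error = max(abs(R[i][j] - (-1 if i == j else 0)) for i in range(4) for j in range(4))
print("deviation of exp(omega) from +I:", vector_error)
print("deviation of exp(S(omega)) from -I:", spinor_error)
```

Replacing two_pi by 4π in the same script brings the spinor matrix back to the identity, which is the 4π periodicity discussed in the text.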
The minus sign is not a mistake. The representative of a rotation by 2π

on spinors is minus the identity. This means that we cannot have a true rep-
resentation of the Poincaré group. The reason I say this is that the principal
requirement for a representation, given in equation (8.29), is not satisfied.
For example, consider two rotations about the z-axis by π. The composition
of these two rotations is, for a true representation, the representative of the
2π rotation which is the identity. For our spinor “representation” we get
minus the identity; the identity only appears after a rotation by 4π. It can
be shown that the spinor “representation” we have constructed only differs
by a sign from a true representation in the manner just illustrated. One thus
speaks of a “representation up to a sign” or a “projective representation”.
If the spinor field were intended as a directly measurable quantity, this pro-
jective representation would be a disaster since physical observables must be
unchanged after a spatial rotation by 2π. However, the way in which spinors
are used in field theory is such that the observable quantities of the theory
turn out to transform properly under 2π rotations. For example, the energy
density of the Dirac field (which will be exhibited below) is quadratic in the
Dirac field, so it behaves properly under a 2π rotation.
One might wonder if the spinor “representation” of the Poincaré group
constructed here is not the right thing to use. Maybe some other represen-
tation is needed. As we shall see a little later, the spinor representation is
precisely what is needed for Poincaré symmetry of the Dirac Lagrangian and
Dirac equation. Moreover, while the spinor field itself cannot be physically
observable directly, the fact that it changes sign under a 2π rotation does
lead to physical effects! For example, neutron interferometry experiments
have been performed in which a beam of neutrons is split with one half of
the beam passing through a magnetic field that rotates the spin state while
the other half propagates with no rotation. The neutron interference pattern
depends upon the magnetic field intensity in a fashion that only agrees with
the theory if the Dirac field changes sign under a 2π rotation.6
6 See, for example, arxiv.org/pdf/1601.07053.pdf and references therein.

8.4 Dirac Lagrangian

Let’s consider a Lagrangian for the Dirac equation. To begin, it is convenient
to define
\[ \bar\psi = i\,\psi^\dagger\gamma^0. \tag{8.59} \]
The Lagrangian density takes the form

\[ \mathcal{L} = -\frac12\left[\bar\psi\gamma^\mu\partial_\mu\psi - (\partial_\mu\bar\psi)\gamma^\mu\psi\right] - m\,\bar\psi\psi. \tag{8.60} \]
We view the Lagrangian as a function of ψ, ψ̄ and their derivatives.

Problem:
6. Verify that L in (8.60) is real. (Note that we have the identity (γ 0 γ µ )† =
γ 0 γ µ .)

To compute the field equations we vary the fields and put the result in
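the standard Euler-Lagrange form:
\[ \delta\mathcal{L} = -\delta\bar\psi\,(\gamma^\mu\partial_\mu\psi + m\psi) - (-\partial_\mu\bar\psi\,\gamma^\mu + m\bar\psi)\,\delta\psi - \frac12\,\partial_\mu(\bar\psi\gamma^\mu\delta\psi - \delta\bar\psi\gamma^\mu\psi). \tag{8.61} \]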
Evidently, the EL expression for ψ̄ is

Eψ̄ = −(γ µ ∂µ ψ + mψ), (8.62)

so the EL equations coming from varying ψ̄, namely Eψ̄ = 0, yield the Dirac
equation for ψ. The EL equations coming from varying ψ are determined by

Eψ = ∂µ ψ̄γ µ − mψ̄. (8.63)

Problem:
7. Show that the equations Eψ = 0 are equivalent to the Dirac equation
for ψ.

8.5 Poincaré symmetry

The Dirac Lagrangian is Poincaré invariant. Let us check this infinitesimally.
This will largely explain the specific form of the Dirac equation and it will
justify the use of the spinor representation. Since the Lagrangian has no
explicit dependence upon the coordinates xα it is easy to see that under an
infinitesimal spacetime translation,

x α → x α + aα , δψ = −aα ψ,α , (8.64)

the Lagrangian changes by a divergence (as usual)

δL = Dµ (−aµ L). (8.65)

More interesting perhaps is the way in which the spinor representation makes
the Lagrangian Lorentz-invariant. We need to consider how the Lagrangian
changes when we make a transformation

δψ = S(ω)ψ − ω α β xβ ψ,α . (8.66)

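We have

\[ \delta\mathcal{L} = -\frac12\left[\delta\bar\psi\,\gamma^\mu\psi_{,\mu} + \bar\psi\gamma^\mu(D_\mu\delta\psi) - (D_\mu\delta\bar\psi)\gamma^\mu\psi - \bar\psi_{,\mu}\gamma^\mu\delta\psi\right] - m(\delta\bar\psi\,\psi + \bar\psi\,\delta\psi). \tag{8.67} \]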
To see how to proceed, we need a few small results. First, we have that

γ µ S(ω) = S(ω)γ µ + ω µ α γ α . (8.68)

Next, if δψ is as given in (8.66), then using (8.4) and (8.8) it follows that

δ ψ̄ = −ψ̄S(ω) − ω α β xβ ψ̄,α (8.69)

Using these facts we can compute the change in the Lagrangian under an
infinitesimal Lorentz transformation to be

δL = Dµ (−ω µ ν xν L). (8.70)

Thus the Lorentz transformations – at least those in the component con-

nected to the identity in the Lorentz group – are a divergence symmetry of
the Dirac Lagrangian.

Problem:

8. Prove (8.70).

8.6 Energy
It is instructive to have a look at the conserved energy for the Dirac field,
both to illustrate previous technology and to motivate the use of Grassmann-
valued fields.
From (8.65) the time translation symmetry is a divergence symmetry with

$$\delta L = D_\mu\left(-\delta^\mu_0 L\right). \tag{8.71}$$

According to Noether’s theorem we have the conserved current

$$j^\mu = \frac{1}{2}\left[\bar\psi\gamma^\mu\psi_{,0} - \bar\psi_{,0}\gamma^\mu\psi\right] - \delta^\mu_0 L. \tag{8.72}$$
Next we take note of the following.

Problems:
9. Derive (8.72). Show that L = 0 when the field equations hold. There-
fore, modulo a trivial conservation law and a scaling by (−1), we can
define the conservation of energy via
$$j^\mu = -\frac{1}{2}\left[\bar\psi\gamma^\mu\psi_{,0} - \bar\psi_{,0}\gamma^\mu\psi\right]. \tag{8.73}$$

10. Check that the current (8.73) is conserved when the Dirac equation
holds.

The energy density is given by

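$$\rho \equiv j^0 = -\frac{1}{2}\left(\bar\psi\gamma^0\partial_0\psi - \partial_0\bar\psi\,\gamma^0\psi\right) = \frac{i}{2}\left(\psi^\dagger\partial_0\psi - \partial_0\psi^\dagger\,\psi\right). \tag{8.74}$$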
Let us now expose a basic difficulty with the classical Dirac field theory: the
energy density is not bounded from below (or above). Contrast this with the
energy density of the KG field, (3.12), or with the energy density of the EM
field. To reveal this problem for the classical Dirac field, let us write down
an elementary solution to the Dirac equation and evaluate its energy density.
Consider a Dirac field that only depends upon the time x0 = t, that is,
we consider a Dirac field which has spatial translation symmetry:

∂i ψ = 0. (8.75)

The Dirac equation for ψ = ψ(t) is now

γ 0 ∂t ψ + mψ = 0. (8.76)

Let us write
$$\psi = \begin{pmatrix} a(t) \\ b(t) \\ c(t) \\ d(t) \end{pmatrix}. \tag{8.77}$$
Since γ_0 = diag(i, i, −i, −i) we have that γ^0 = diag(−i, −i, i, i), so that the
Dirac equation reduces to the decoupled system

$$\begin{aligned}
-i\dot a + ma &= 0 \\
-i\dot b + mb &= 0 \\
i\dot c + mc &= 0 \\
i\dot d + md &= 0.
\end{aligned} \tag{8.78}$$

The solution is then

$$\psi = \begin{pmatrix} a_0 e^{-imt} \\ b_0 e^{-imt} \\ c_0 e^{imt} \\ d_0 e^{imt} \end{pmatrix}, \tag{8.79}$$
where (a0 , b0 , c0 , d0 ) are constants representing the value of ψ at t = 0.
Let us compute the energy density of this solution. We have
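$$\begin{aligned}
\rho &= \frac{i}{2}\left(\psi^\dagger\partial_0\psi - \partial_0\psi^\dagger\,\psi\right) \\
&= m\left(|a_0|^2 + |b_0|^2 - |c_0|^2 - |d_0|^2\right).
\end{aligned} \tag{8.80}$$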

This energy is clearly not bounded from below. By choosing initial conditions
with c0 and d0 sufficiently large in magnitude we can make the energy as
negative as we wish. Physically, this is a disaster since it means that one can
extract an infinite amount of energy from the Dirac field by coupling it to
other dynamical systems. Note that, while we are free to consider redefining
the energy density by a change of sign, this won’t help because ρ is not
bounded from above either. One could try to avoid this problem by simply
decreeing that one must only consider solutions in which c = d = 0. This
would work if the only thing in the universe were the Dirac field. But when
the field interacts with its environment (which it must do to be observable!)
it can be shown that as time evolves one eventually gets solutions with
c ≠ 0, d ≠ 0, so this escape route fails. This difficulty reflects the limited
physical domain of the purely classical field theory for fermions. (Contrast
this with electrodynamics, where there exist a multitude of quantum coherent
states which exhibit classical behavior and provide a wide physical domain
of validity of the classical field theory).
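
As a quick numerical check of (8.80), here is a minimal Python sketch that evaluates ρ = (i/2)(ψ†∂0ψ − ∂0ψ†ψ) on the homogeneous solution (8.79). The function names and the sample amplitudes are illustrative choices, not part of the text.

```python
import cmath

def psi(t, m, amps):
    """Components (a, b, c, d) of the homogeneous solution (8.79) at time t."""
    a0, b0, c0, d0 = amps
    neg, pos = cmath.exp(-1j * m * t), cmath.exp(1j * m * t)
    return [a0 * neg, b0 * neg, c0 * pos, d0 * pos]

def dpsi_dt(t, m, amps):
    """Exact time derivative: each component is multiplied by -im or +im."""
    factors = (-1j * m, -1j * m, 1j * m, 1j * m)
    return [f * p for f, p in zip(factors, psi(t, m, amps))]

def energy_density(t, m, amps):
    """rho = (i/2)(psi^dagger d_t psi - d_t psi^dagger psi), first line of (8.80)."""
    p, dp = psi(t, m, amps), dpsi_dt(t, m, amps)
    total = sum(pc.conjugate() * d - d.conjugate() * pc for pc, d in zip(p, dp))
    return (0.5j * total).real

m = 2.0
rho_upper = energy_density(0.3, m, (1.0, 0.0, 0.0, 0.0))  # only a0 excited
rho_lower = energy_density(0.3, m, (0.0, 0.0, 3.0, 0.0))  # only c0 excited
```

With these sample values the first density equals m|a0|² and the second equals −m|c0|², so scaling c0 makes the energy density as negative as one likes.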

8.7 Anti-commuting fields

There is a novel way to redefine the mathematical structure of the classical
Dirac field theory so that difficulties such as were highlighted above do not
arise. This is done by appealing to the underlying quantum field theory.
In quantum field theory, roughly speaking, one replaces the set-up where
fields take values in a commutative algebra (e.g., real numbers, complex
numbers, etc. ) with a set-up where fields take values in an operator algebra
on a Hilbert space.⁷ Because operator algebras need not be commutative,
this means that, in particular, one must keep track of the ordering of the
operators within products. For example, for a KG quantum field,

ϕ(x)ϕ(x′) ≠ ϕ(x′)ϕ(x). (8.81)

Up until now, we have always assumed that fields could be built out of
functions which take values in a commutative algebra (e.g., real numbers).
Insofar as the classical field theory is the “classical limit” of a quantum field
theory, it turns out that this commutative algebra assumption is reasonable
for bosonic fields (integer spin). But for fermionic fields, e.g., spin 1/2 fields
like the Dirac field, one can show that the “classical limit” of the operator
algebra leads to fields which anti-commute. This leads one to formulate a
classical Dirac field theory as a theory of Grassmann-valued spinors. In the
following I will briefly outline how this goes.
A finite-dimensional Grassmann algebra A is a (real or complex) vector
space V with a basis χα , α = 1, 2, . . . , n, and equipped with a product such
that
χα χβ = −χβ χα . (8.82)

⁷ More precisely: fields are viewed as “unbounded, self-adjoint operator-valued distributions on spacetime”.


Notice that this implies

(χα )2 = 0. (8.83)
The algebra is, as usual, built up by sums and products of the χα . Every
element Ω ∈ A is of the form
$$\Omega = \omega_0 + \omega^{\alpha}\chi_\alpha + \omega^{\alpha\beta}\chi_\alpha\chi_\beta + \cdots + \omega^{\alpha_1\cdots\alpha_n}\chi_{\alpha_1}\cdots\chi_{\alpha_n}, \tag{8.84}$$
where ω0 is an element of the field (real or complex) of the vector
space and the coefficients ω^{α1···} are totally antisymmetric. The terms of odd
degree in the χα are “anti-commuting” while the terms of even degree are
“commuting”. If the vector space is complex, then there is a notion of com-
plex conjugation in which
(αβ)∗ = β ∗ α∗ , α, β ∈ A. (8.85)
Note that if α and β are real and anti-commuting,
α∗ = α, β ∗ = β, (8.86)
then their product is pure imaginary in the sense that
(αβ)∗ = β ∗ α∗ = βα = −αβ. (8.87)
The usual exterior algebra of forms over a vector space is an example of a
Grassmann algebra.
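
The defining relations (8.82) and (8.83) can be made concrete with a small Python sketch. An element of the algebra is stored as a dictionary from sorted tuples of generator indices to coefficients; the helper names `mono_mul`, `gmul` and `gen` are illustrative assumptions, not notation from the text.

```python
def mono_mul(a, b):
    """Product of two basis monomials given as sorted index tuples.

    Returns (sign, monomial); the sign is 0 if a generator is repeated.
    """
    if set(a) & set(b):
        return 0, None
    seq = list(a + b)
    sign = 1
    # Bubble sort; every transposition of two generators flips the sign.
    for i in range(len(seq)):
        for j in range(len(seq) - 1 - i):
            if seq[j] > seq[j + 1]:
                seq[j], seq[j + 1] = seq[j + 1], seq[j]
                sign = -sign
    return sign, tuple(seq)

def gmul(x, y):
    """Product of two algebra elements (dicts: monomial -> coefficient)."""
    out = {}
    for ma, ca in x.items():
        for mb, cb in y.items():
            sign, mono = mono_mul(ma, mb)
            if sign:
                out[mono] = out.get(mono, 0) + sign * ca * cb
    return {mono: c for mono, c in out.items() if c != 0}

def gen(i):
    """The generator chi_i as an algebra element."""
    return {(i,): 1}

chi1, chi2, chi3 = gen(1), gen(2), gen(3)
even = gmul(chi1, chi2)   # a degree-two ("commuting") element
```

Here `gmul(chi1, chi1)` is the empty dictionary (the zero element), `gmul(chi2, chi1)` is minus `gmul(chi1, chi2)`, and the even element `even` commutes with `chi3`.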
In the Dirac theory with anti-commuting fields, one builds A from V = C⁴.
The Dirac fields are then considered as maps from spacetime into V.
Of course, one cannot directly measure a “Grassmann number”, but the
“commuting” part of the algebra can be interpreted in terms of real numbers,
which is how one interprets observable aspects of the classical field theory.
Let us return to our simple example involving the energy density asso-
ciated to spatially homogeneous solutions to the Dirac equation. Given the
anti-commuting nature of the values of the Dirac field, it follows that the
solution to the field equations we wrote down above has a, b, c, d now being
interpreted as anti-commuting Grassmann numbers. It then becomes an is-
sue as to what order to put the factors in the definition of various quantities,
say, the energy density. To answer this question it is best to appeal to the
underlying quantum field theory, from which it turns out that the correct
version of (8.80) is
$$\rho = m\left(a^*a + b^*b - cc^* - dd^*\right) = m\left(a^*a + b^*b + c^*c + d^*d\right). \tag{8.88}$$
In the classical limit of the quantum theory each of the quantities a∗a, b∗b,
c∗c, d∗d corresponds to a non-negative real number, whence the energy density of
our example is non-negative, i.e., bounded from below by zero.

8.8 Coupling to the electromagnetic field

One of the most successful theories in all of physics is the quantum field
theory describing the interactions of electrons, positrons, and photons. This
is quantum electrodynamics (QED). While QED is fundamentally a quantum
theory, and really should be studied as such, we have enough field-theoretic
tools to write down the classical Lagrangian⁸ using the principle of local
gauge invariance. Let us have a quick look. You may want to review Chapter
6 to remind yourself of the strategy.
The Dirac field admits the transformation group U (1) in much the same
way as the complex KG field. We have for any ψ ∈ C4

ψ → eiα ψ, α ∈ R, (8.89)

or, infinitesimally,
δψ = iψ. (8.90)
You can easily check that this transformation is a symmetry of the Dirac
Lagrangian density (8.60). Noether’s first theorem then provides a conserved
current.

Problem:
11. Show that the conserved current associated to the variational symmetry
(8.89) by Noether’s theorem is given by

$$j^\alpha = i\bar\psi\gamma^\alpha\psi. \tag{8.91}$$

Verify that this vector field is divergence-free when the Dirac equation
is satisfied.
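
A sketch of the divergence computation, assuming the current is j^α = iψ̄γ^αψ and treating the fields as ordinary (commuting) spinors. The second line uses the field equations Eψ̄ = 0 and Eψ = 0 from (8.62) and (8.63), that is, γ^µ∂_µψ = −mψ and ∂_µψ̄γ^µ = mψ̄:

```latex
\begin{aligned}
\partial_\alpha j^\alpha
  &= i\,(\partial_\alpha\bar\psi)\gamma^\alpha\psi + i\,\bar\psi\gamma^\alpha\partial_\alpha\psi \\
  &= i\,(m\bar\psi)\psi + i\,\bar\psi(-m\psi) \\
  &= 0 .
\end{aligned}
```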

⁸ The “classical Lagrangian” can be viewed as arising via the quantum effective action in the classical limit.

This current can be interpreted as the electric 4-current of the field in
the absence of interaction with an electromagnetic field. To introduce that
interaction we “gauge” the global U(1) symmetry to make it “local”. We do
this by introducing the electromagnetic potential A = A_α dx^α and defining a
“covariant derivative”,
Dµ ψ = ∂µ ψ − iqAµ ψ. (8.92)
Here q represents the electromagnetic “coupling constant”. The local U (1)
gauge transformation is defined as

ψ → ψ̃ = eiqα(x) ψ, Aµ → Ãµ = Aµ + ∂µ α(x), (8.93)

where α is any function on spacetime. Under a gauge transformation the

covariant derivative transforms homogeneously:

D̃µ ψ̃ = eiqα(x) Dµ ψ. (8.94)
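
For completeness, the homogeneous transformation law can be checked in a few lines, using only the definition (8.92) and the gauge transformation (8.93). The cross terms involving ∂_µα cancel, which is the reason for the inhomogeneous shift of A_µ:

```latex
\begin{aligned}
\tilde D_\mu\tilde\psi
  &= \partial_\mu\!\left(e^{iq\alpha}\psi\right) - iq\left(A_\mu + \partial_\mu\alpha\right)e^{iq\alpha}\psi \\
  &= e^{iq\alpha}\left(\partial_\mu\psi + iq(\partial_\mu\alpha)\psi - iqA_\mu\psi - iq(\partial_\mu\alpha)\psi\right) \\
  &= e^{iq\alpha}\left(\partial_\mu\psi - iqA_\mu\psi\right) = e^{iq\alpha}D_\mu\psi .
\end{aligned}
```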

Consequently, if we replace ordinary derivatives with covariant derivatives

in the Dirac Lagrangian density the resulting Lagrangian density will admit
the local U (1) gauge symmetry and we thereby introduce the coupling of the
Dirac field to the electromagnetic field. Adding this Lagrangian density to
the usual electromagnetic Lagrangian we get the Lagrangian density for the
classical limit of QED:

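$$L = -\frac{1}{2}\left[\bar\psi\gamma^\mu D_\mu\psi - (D_\mu\bar\psi)\gamma^\mu\psi\right] - m\bar\psi\psi - \frac{1}{4}F_{\alpha\beta}F^{\alpha\beta}. \tag{8.95}$$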
Here we have defined
Dµ ψ̄ = ∂µ ψ̄ + iqAµ ψ̄. (8.96)

Problem:

12. Calculate the field equations for ψ and A from this Lagrangian density.
Notice that, unlike the case with scalar electrodynamics, the electric
4-current that acts as a source for the electromagnetic field does not
depend upon the electromagnetic field.

8.9 PROBLEMS

1. Verify that the Dirac γ matrices shown in (8.6), (8.7) satisfy

{γµ , γν } := γµ γν + γν γµ = 2ηµν I.

(Although it is easy to check this by hand, I would recommend using a
computer for this since it is good to learn how to make the computer
do these boring tasks.)

2. In terms of the characterization of group elements (L, a), where L
satisfies (8.23), what are the identity and inverse transformations?

3. Verify that the transformation laws (8.30) and (8.33) define represen-
tations of the Poincaré group.

4. Using
γµ γν + γν γµ = 2ηµν I,
show that
$$\frac{1}{4}\omega^{\mu\nu}\chi^{\alpha\beta}[S_{\mu\nu}, S_{\alpha\beta}] = \frac{1}{2}[\omega, \chi]^{\alpha\beta}S_{\alpha\beta} = S([\omega, \chi]).$$

5. Verify (8.56) through (8.58).

6. Verify that L in (8.60) is real. (Note that we have the identity
(γ^0 γ^µ)† = γ^0 γ^µ.)

7. Show that the equations Eψ = 0 (see (8.63)) are equivalent to the Dirac
equation for ψ.

8. Prove (8.70).

9. Derive (8.72). Show that L = 0 when the field equations hold. There-
fore, modulo a trivial conservation law, we can define the conservation
of energy via
$$j^\mu = -\frac{1}{2}\left[\bar\psi\gamma^\mu\psi_{,0} - \bar\psi_{,0}\gamma^\mu\psi\right].$$

10. Check that the current (8.73) is conserved when the Dirac equation
holds.

11. Show that the conserved current associated to the variational symmetry
(8.89) by Noether’s theorem is given by

$$j^\alpha = i\bar\psi\gamma^\alpha\psi.$$

Verify that this vector field is divergence-free when the Dirac equation
is satisfied.

12. Calculate the field equations for ψ and A from the Lagrangian density
(8.95). Notice that, unlike the case with scalar electrodynamics, the
electric 4-current that acts as a source for the electromagnetic field does
not depend upon the electromagnetic field.
Chapter 9

Non-Abelian gauge theory

We were introduced to some of the underpinnings of what is now called

“gauge theory” when we studied the electromagnetic field. Let us now con-
sider another gauge theory, often called “Yang-Mills theory” after its inven-
tors. It is also sometimes called “non-Abelian gauge theory” since the gauge
transformations are coming as a non-Abelian generalization of the U (1) type
of gauge transformations of electromagnetic theory. The Yang-Mills (YM)
theory was originally conceived (in 1954) as a way of formulating interactions
among protons and neutrons. This approach turned out not to bear much
fruit (as far as I know), but the structure of the theory was studied for its
intrinsic field theoretic interest by a relatively small number of physicists
through the 1960’s. During this time, theories of the Yang-Mills type were
used to slowly devise a scheme for describing a theory of electromagnetic
and weak interactions. This work was performed by Glashow, then Wein-
berg and also Salam. All three eventually won the Nobel prize for this work.
These theories became truly viable when it was shown by ’t Hooft that the
non-Abelian gauge theories were well-behaved from the point of view of per-
turbative quantum field theory. It wasn’t long after that that people found
how to describe the strong interactions using a non-Abelian gauge theory.
It is even possible to think of the gravitational interactions as described by
Einstein’s general relativity as a sort of non-Abelian gauge theory – although
this requires some generalization of the term “gauge theory”. Certainly one
can think of Maxwell theory as a very special case of a non-Abelian gauge
theory. So, one can take the point of view that all the interactions that are
observed in nature can be viewed as an instance of a gauge theory. This is
certainly ample motivation for spending some additional time studying them.

The non-Abelian gauge theory substitutes a non-Abelian Lie group¹ for
the group U(1) that featured in electrodynamics. Although, when among
friends, one often uses the terms “non-Abelian gauge theory” or “Yang-Mills
theory” or just “gauge theory” interchangeably, one properly distinguishes
the Yang-Mills theory as the specialization of non-Abelian gauge theory to
the gauge group built from SU(2), as Yang and Mills originally did. We shall
try to make this distinction as well.
Non-Abelian gauge theory is a theory of interactions among matter just
as electrodynamics is. Following Weyl, we were able to view electromagnetic
interactions between electrically charged matter as a manifestation of a “lo-
cal U (1) symmetry” of matter. Recall that the charged KG field acquired
its conserved charge by virtue of the U (1) phase symmetry. This symme-
try acted “globally” in the sense that the symmetry transformation shifted
the phase of the complex KG field by the same amount throughout space-
time. The coupling of the charged KG field to the EM field can be viewed
as corresponding to the requirement that this phase change could be made
independently (albeit smoothly) at each spacetime event. Making the sim-
plest “minimal” generalization of the charged KG Lagrangian to incorporate
this “local” gauge symmetry involved introducing the Maxwell field to de-
fine a connection, i.e., a gauge covariant derivative. This gives the correct
Lagrangian for the charged KG field in the presence of an electromagnetic
field. The dynamics of the EM field itself could then be incorporated by
adjoining to this Lagrangian the simplest gauge invariant Lagrangian for the
Maxwell field. This leads to the theory of scalar electrodynamics. Similar
constructions in §8 led to a classical version of QED.
We have already discussed another kind of “charged KG” theory in which
one has not a single conserved current but three of them, corresponding to
a global SU(2) symmetry group. What happens if we try to make that
symmetry local? This is one way to “invent” the YM theory.

9.1 SU(2) doublet of KG fields, revisited

I will use the idea of “localizing” a symmetry to generate an interaction. So,
we begin by reviewing the field theory of two complex KG fields admitting
an SU(2) symmetry. See §3.18.

¹ For physical reasons pertaining to the quantum field theory, the gauge group is usually built on a compact, semi-simple Lie group.

The fields are

ϕ : M → C². (9.1)
You can think of ϕ as a column vector with two complex-valued functions for
its components. SU(2) is the group of linear transformations on C² preserving
the inner product:
⟨α, β⟩ = α†β, α, β ∈ C², (9.2)
⟨Uα, Uβ⟩ = ⟨α, β⟩ ⟺ U† = U⁻¹, (9.3)
and with unit determinant
det U = 1. (9.4)
We can express these linear transformations as matrices:

U = U (θ, n) = cos θ I + i sin θ ni σi , (9.5)

where
n = (n1 , n2 , n3 ), (n1 )2 + (n2 )2 + (n3 )2 = 1, (9.6)
and
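$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{9.7}$$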
are the Pauli matrices. Note that there are three free parameters in this
group, corresponding to θ and the two free parameters defining ni .
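
A minimal Python sketch, using only the standard library, that builds U(θ, n) from (9.5) and checks numerically that it is unitary with unit determinant. The helper names (`su2`, `matmul`, `dagger`, `det`) and the sample values of θ and n are illustrative assumptions.

```python
import math

I2 = [[1, 0], [0, 1]]
SIGMA = [
    [[0, 1], [1, 0]],        # sigma_1
    [[0, -1j], [1j, 0]],     # sigma_2
    [[1, 0], [0, -1]],       # sigma_3
]

def su2(theta, n):
    """U(theta, n) = cos(theta) I + i sin(theta) n^i sigma_i, as in (9.5)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * I2[r][k] + 1j * s * sum(n[i] * SIGMA[i][r][k] for i in range(3))
             for k in range(2)] for r in range(2)]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[r][j] * b[j][k] for j in range(2)) for k in range(2)]
            for r in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[k][r]).conjugate() for k in range(2)] for r in range(2)]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

U = su2(0.7, (0.6, 0.0, 0.8))        # (0.6, 0, 0.8) is a unit vector
UdU = matmul(dagger(U), U)
unitarity_error = max(abs(UdU[r][k] - I2[r][k])
                      for r in range(2) for k in range(2))
det_error = abs(det(U) - 1)
```

Both `unitarity_error` and `det_error` should vanish up to rounding, provided n has unit length as required by (9.6).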
The group SU(2), as represented on C2 , acts on the fields in the obvious
way:
ϕ(x) → U ϕ(x). (9.8)
One parameter subgroups U (λ) are defined by any curve in the 3-d parameter
space associated with θ and n. As an example, set n = (1, 0, 0) and θ = λ
to get
$$U(\lambda) = \begin{pmatrix} \cos\lambda & i\sin\lambda \\ i\sin\lambda & \cos\lambda \end{pmatrix}. \tag{9.9}$$
The infinitesimal form of the SU(2) transformation is

δϕ = τ ϕ, (9.10)

where τ is an anti-Hermitian, traceless matrix obtained from a 1-parameter
group U(λ) by
$$\tau = \left.\frac{dU}{d\lambda}\right|_{\lambda=0}. \tag{9.11}$$

Note that
δϕ† = −ϕ† τ. (9.12)
We can write
τ = −iaj σj , (9.13)
where (a¹, a², a³) ∈ R³. Thus infinitesimal transformations can be identified with a
three-dimensional vector space with a basis

δj ϕ = iσj ϕ. (9.14)

The infinitesimal generators of the transformation group SU(2) are given
by the anti-Hermitian matrices τ = −i a^i σ_i. The commutator algebra of these
matrices defines a representation of the Lie algebra of SU(2), which we shall
denote by su(2). A basis e_k for this matrix representation of su(2) is provided
by 1/(2i) times the Pauli matrices:
$$e_k = -\frac{i}{2}\sigma_k, \qquad \tau = a^k e_k, \qquad (a^1, a^2, a^3) \in \mathbb{R}^3. \tag{9.15}$$
It is easy to check that in this basis the structure constants of su(2) are given
by
$$[e_i, e_j] = \epsilon_{ij}{}^{k}\, e_k, \tag{9.16}$$
where ε_{ijk} is the three-dimensional Levi-Civita symbol and indices are raised
and lowered using the Kronecker delta.
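
The commutation relations (9.16) are easy to confirm by machine. The following Python sketch (standard library only; the helper names are illustrative) builds the basis e_k = −(i/2)σ_k and measures the largest deviation of [e_i, e_j] from ε_ijk e_k over all index pairs.

```python
SIGMA = [
    [[0, 1], [1, 0]],        # sigma_1
    [[0, -1j], [1j, 0]],     # sigma_2
    [[1, 0], [0, -1]],       # sigma_3
]
# Basis (9.15): e_k = -(i/2) sigma_k
E = [[[-0.5j * x for x in row] for row in s] for s in SIGMA]

def matmul(a, b):
    return [[sum(a[r][j] * b[j][k] for j in range(2)) for k in range(2)]
            for r in range(2)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[r][c] - ba[r][c] for c in range(2)] for r in range(2)]

def levi_civita(i, j, k):
    """Levi-Civita symbol for indices taken from {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def max_violation():
    """Largest entry of [e_i, e_j] - epsilon_ijk e_k over all i, j."""
    worst = 0.0
    for i in range(3):
        for j in range(3):
            lhs = commutator(E[i], E[j])
            for r in range(2):
                for c in range(2):
                    rhs = sum(levi_civita(i, j, k) * E[k][r][c] for k in range(3))
                    worst = max(worst, abs(lhs[r][c] - rhs))
    return worst
```

Note that the code labels the basis elements 0, 1, 2 rather than 1, 2, 3; `max_violation()` should return zero up to rounding.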

Problem:
1. Verify that the ek in (9.15) satisfy the su(2) Lie algebra (9.16) as ad-
vertised.

An SU(2)-invariant Lagrangian which generalizes that of the KG and

charged KG theories is given by (in flat spacetime, with metric η)

$$L = -\eta^{\alpha\beta}\langle\varphi_{,\alpha}, \varphi_{,\beta}\rangle - m^2\langle\varphi, \varphi\rangle. \tag{9.17}$$

You can see quite easily that the transformation ϕ → U ϕ yields L → L;

this is the global SU(2) symmetry. This symmetry is responsible for three
independent conservation laws, which arise via Noether’s theorem applied to
the variational symmetries
δj ϕ = iσj ϕ. (9.18)

The 3 conserved currents are

$$j_k^\beta = i\left(\varphi^{\dagger,\beta}\sigma_k\varphi - \varphi^\dagger\sigma_k\varphi^{,\beta}\right), \qquad k = 1, 2, 3. \tag{9.19}$$

These currents “carry” the SU(2) charge possessed by the scalar fields.

Problems:

2. (a) show that the Lagrangian (9.17) is invariant with respect to the
transformation (9.8); (b) compute its Euler-Lagrange equations; (c)
derive the conserved currents (9.19) from Noether’s theorem; (d) verify
that the currents (9.19) are divergence-free when the Euler-Lagrange
equations are satisfied.

3. Show that if any one of the currents (9.19) is conserved, then the other
2 are automatically conserved. (Hint: Consider the behavior of the
currents under SU(2) transformations of the fields.)

9.2 Local SU(2) symmetry

The SU(2) symmetry group studied in the previous section tacitly uses a
prescription for comparing the values of the fields at each point of spacetime.
To be sure, at each spacetime event the field takes its value in a copy of
C2 , but there are infinitely many ways to put all these different C2 spaces
in correspondence – as many ways as there are linear isomorphisms of this
vector space. You can think of a choice of correspondence between all these
complex vector spaces as a sort of preferred choice of internal reference frame
for the fields ϕ.
In analogy with the case of scalar electrodynamics, one can introduce an
interaction between conserved charges by insisting that there is no fixed, a
priori rule for comparing SU(2) phases from event to event in spacetime. To
do this we add the “rule” as a new variable in the theory, which gets inter-
preted as the gauge field mediating the interaction between charges. Thus we
are, in effect, following the path of Einstein in his general theory of relativity.
There, by insisting that there is no privileged spacetime reference frame at
each point he was able to correctly describe the gravitational interaction. In
YM theory we use a similar relativity principle, but now it is regarding the

“internal frame” on C2 . We do this in two steps: (1) introduce a fixed but

arbitrary connection A or covariant derivative D, which makes explicit how
we are comparing values of ϕ at infinitesimally separated points, (2) treat
the connection as a new field – a gauge field – and give it its own Lagrangian.
The resulting field equations determine the “matter fields” ϕ and the “gauge
fields” A. The result is an interacting theory of charges and gauge fields
dictated by a sort of “general relativity of SU(2) phases”.
So, the upshot of the preceding discussion is that we seek to build a theory
in which the “global” symmetry transformation

ϕ(x) → U ϕ(x), (9.20)

U : C2 → C2 , U † = U −1 , det U = 1, (9.21)
becomes a “local” symmetry:

ϕ(x) → U (x)ϕ(x), (9.22)

U (x) : M → SU (2), U † (x) = U −1 (x), det U (x) = 1. (9.23)

It is easy to see that the mass term in our Lagrangian (9.17) for ϕ allows
this transformation as a symmetry. As in the case of SED, the derivative
terms do not allow this transformation to be a symmetry. This reflects the
fact that the derivatives are defined using a fixed notion of how to compare
the values of ϕ at two neighboring points. The local SU(2) transformation
can be viewed as redefining the method of comparison by redefining the basis
of C2 differently at each point of M ; obviously the partial derivatives will
respond to this redefinition.
To generalize the derivative to allow for an arbitrary method of compar-
ison of values of ϕ from point to point of M we introduce the “gauge field”
or “connection”, or “Yang-Mills (YM) field” A, which is a Lie algebra-valued
1-form. What this means is that, at each point of the spacetime M , A is a
linear mapping from the tangent space at that point to the representation
of the Lie algebra su(2) discussed above. This way, one can use A to define
an infinitesimal SU(2) transformation at any point which tells how to com-
pare the values of the SU(2) phase of ϕ as one moves in any given direction.
Since A is defined as a linear mapping on vectors, we can write it in terms
of 1-forms,
A = A_µ dx^µ, (9.24)

where, for each value of µ, A_µ is a map from M to su(2). We write:

$$A_\mu = A^i_\mu e_i, \tag{9.25}$$

where e_i are the basis for su(2) defined in (9.15). Thus, if you like, you can
think of the gauge field as 4 × 3 = 12 real fields A^k_µ(x) labeled according to
their spacetime (index µ = 0, 1, 2, 3) and “internal” su(2) structure (index
k = 1, 2, 3).
The gauge covariant derivative of ϕ is now defined as²

Dµ ϕ = ∂µ ϕ + Aµ ϕ. (9.26)

A fixed, given gauge field Aµ just defines another way to compare phases of
fields at different points. Nothing is gained symmetry-wise by its introduc-
tion. Indeed, under a gauge transformation

ϕ(x) → U (x)ϕ(x) (9.27)

we have to redefine Aµ via

Aµ → U (x)Aµ U −1 (x) − (∂µ U (x))U −1 (x). (9.28)

This is designed to give

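$$\begin{aligned}
D_\mu\varphi &\to \left[\partial_\mu + U(x)A_\mu U^{-1}(x) - (\partial_\mu U(x))U^{-1}(x)\right](U(x)\varphi) \\
&= U(x)\left(\partial_\mu + A_\mu\right)\varphi \\
&= U(x)D_\mu\varphi.
\end{aligned} \tag{9.29}$$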

Granted this set-up, we can then define the Lagrangian for ϕ using the “minimal
coupling” prescription
∂µ → Dµ , (9.30)
so that we have
$$L = -\eta^{\alpha\beta}\langle D_\alpha\varphi, D_\beta\varphi\rangle - m^2\langle\varphi, \varphi\rangle. \tag{9.31}$$
Since, at the moment, we view the gauge field as fixed/prescribed, strictly
speaking, this Lagrangian is no more gauge invariant than our original
Lagrangian (9.17),³ which it includes as the special case A = 0, but it has
the virtue of allowing for any choice of connection between C² at neighboring
points. Physically, this Lagrangian defines the dynamics of the fields ϕ
under the influence of the “force” due to any given gauge field A.

² For simplicity we absorb a coupling constant into the definition of A.
³ This is completely analogous to our discussion of the behavior of the KG Lagrangian under a coordinate transformation, back in §2.9.3.

I think you can imagine that we could play the preceding game whenever
one is given (1) some fields ϕ : M → V , where V is some vector space,
(2) a representation of SU(2) on this vector space, and (3) a globally
SU(2)-invariant Lagrangian. This is all we really used here. We will see an
example of some of this in the following section. One can in fact generalize
this construction further to the case where we are given the representation
of any Lie group G on a vector space V and given a G-invariant Lagrangian
for mappings ϕ : M → V .

9.3 Infinitesimal gauge transformations

Let us consider the infinitesimal form of the gauge transformation (9.28) on
the YM field. We consider a 1-parameter family of gauge transformations
Uλ (x) with
U0 (x) = I, (9.32)
and
$$\delta U(x) := \left.\frac{dU_\lambda(x)}{d\lambda}\right|_{\lambda=0} \equiv \xi(x). \tag{9.33}$$
The infinitesimal generator, ξ(x), is a Lie algebra-valued function on M , that
is, we have
ξ = ξ i ei . (9.34)
Note that
δ(U −1 (x)) = −ξ(x). (9.35)
If we then consider
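$$A_\mu(\lambda) = U_\lambda(x)A_\mu U_\lambda^{-1}(x) - (\partial_\mu U_\lambda(x))U_\lambda^{-1}(x) \tag{9.36}$$
we can compute
$$\delta A_\mu = \left.\frac{dA_\mu(\lambda)}{d\lambda}\right|_{\lambda=0} = -\left(\partial_\mu\xi + [A_\mu, \xi]\right). \tag{9.37}$$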
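
The intermediate steps behind (9.37) can be sketched as follows. Differentiating (9.36) at λ = 0 with the product rule, and using U_0 = I (so ∂_µU_0 = 0) together with δU = ξ and δ(U⁻¹) = −ξ from (9.33) and (9.35), gives

```latex
\begin{aligned}
\delta A_\mu
  &= (\delta U)A_\mu U_0^{-1} + U_0 A_\mu\,\delta(U^{-1})
     - (\partial_\mu\delta U)\,U_0^{-1} - (\partial_\mu U_0)\,\delta(U^{-1}) \\
  &= \xi A_\mu - A_\mu\xi - \partial_\mu\xi \\
  &= -\left(\partial_\mu\xi + [A_\mu,\xi]\right).
\end{aligned}
```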

We can interpret this last result as follows. The Lie algebra su(2), like
any Lie algebra, is a vector space. You can easily check that the set of 2 × 2

anti-Hermitian trace-free matrices is a real three-dimensional vector space

using the usual notion of addition of matrices and multiplication of matrices
by scalars. This three dimensional vector space provides a representation
of the group SU(2) called the adjoint representation. This representation is
defined by
τ → U τ U −1 . (9.38)
You can check that this transformation provides an isomorphism of the al-
gebra onto itself:
[Uτ U⁻¹, Uτ′U⁻¹] = U[τ, τ′]U⁻¹. (9.39)
Anyway, at any given point, ξ ≡ δU is a field taking values in this vector
space and we have a representation of SU(2) acting on this vector space, so
we can let SU(2) act on ξ via

ξ → U ξU −1 . (9.40)

This linear representation can be made into an explicit matrix representation

on the matrix elements of ξ (viewed as entries in a column vector) but we
will not need to do this explicitly. The infinitesimal adjoint action of SU(2)
on the vector space su(2) is via the commutator. To see this, simply consider
a 1-parameter family of SU(2) transformations Uλ with

$$U_{\lambda=0} = I, \qquad \left.\frac{dU_\lambda}{d\lambda}\right|_{\lambda=0} = \tau, \tag{9.41}$$

where τ is trace-free and anti-Hermitian. We have the infinitesimal adjoint

action of SU(2) on su(2) given by

δξ = [τ, ξ] (9.42)
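
A numerical illustration of (9.42), offered as a sketch rather than a proof: take the one-parameter subgroup (9.9), for which τ = iσ_1, pick ξ = e_3, and compare a central finite difference of U_λ ξ U_λ⁻¹ at λ = 0 with the commutator [τ, ξ]. The step size h and all helper names are assumptions made for the example.

```python
import math

def matmul(a, b):
    return [[sum(a[r][j] * b[j][k] for j in range(2)) for k in range(2)]
            for r in range(2)]

def dagger(a):
    """Conjugate transpose; equals the inverse for a unitary matrix."""
    return [[complex(a[k][r]).conjugate() for k in range(2)] for r in range(2)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[r][c] - ba[r][c] for c in range(2)] for r in range(2)]

def U(lam):
    """One-parameter subgroup (9.9): n = (1, 0, 0), theta = lambda."""
    c, s = math.cos(lam), math.sin(lam)
    return [[c, 1j * s], [1j * s, c]]

TAU = [[0, 1j], [1j, 0]]        # dU/dlambda at lambda = 0, i.e. i sigma_1
XI = [[-0.5j, 0], [0, 0.5j]]    # xi = e_3 = -(i/2) sigma_3

def adjoint(lam, xi):
    """Adjoint action (9.40): xi -> U xi U^{-1}."""
    u = U(lam)
    return matmul(matmul(u, xi), dagger(u))

def finite_difference(xi, h=1e-5):
    """Central difference approximation to d/dlambda of the adjoint action at 0."""
    plus, minus = adjoint(h, xi), adjoint(-h, xi)
    return [[(plus[r][c] - minus[r][c]) / (2 * h) for c in range(2)]
            for r in range(2)]

fd, comm = finite_difference(XI), commutator(TAU, XI)
mismatch = max(abs(fd[r][c] - comm[r][c]) for r in range(2) for c in range(2))
```

The value of `mismatch` should be small, limited by the finite-difference error of order h².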

Problem:

4. Show that the adjoint representation (9.40) has the infinitesimal form
(9.42).

The point of these observations is to show that we can now extend our
definition of the covariant derivative to ξ via the adjoint representation. We
can define a covariant derivative of ξ as we did for ϕ by replacing the
representation of infinitesimal SU(2) on ϕ,

δϕ = τ ϕ, (9.43)

with its action on ξ:

δξ = [τ, ξ]. (9.44)
We define
Dµ ξ = ∂µ ξ + [Aµ , ξ]. (9.45)
It then follows that we can interpret the infinitesimal gauge transformation
of the gauge field in terms of the gauge covariant derivative of ξ:

δAµ = −Dµ ξ. (9.46)

It is instructive to note that the preceding set of results can be viewed as a

generalization of the construction of the U (1) gauge theory in §6. To see how
this works, take account of the fact that U (1) is Abelian, so commutators of
Lie-algebra valued elements vanish and the adjoint representation is trivial.
Thus, in particular, the infinitesimal transformation (9.46) becomes (6.41)
with ξ → −α.

9.4 Geometrical interpretation: parallel propagation

I have made various vague references to the fact that the gauge field provides
a rule for “comparing the values of ϕ at different spacetime points”. Let us
try, very briefly, to be a little more precise about this.
Consider an infinitesimal displacement from a point x along a vector v.
To each such displacement we can associate an element τ ∈ su(2) (in our
matrix representation) via
τ = Aµ v µ . (9.47)
The idea is that this infinitesimal transformation is used to define the linear
transformation that, by definition, “lines up” the bases for C2 at infinitesi-
mally neighboring points x and x + v. Let us explore this a little. Suppose
that we have a curve x = γ(s) : [0, 1] → M starting at x1 , and ending at x2

γ(0) = x1 , γ(1) = x2 . (9.48)


We can associate a group element g[γ] ∈ SU (2) to each point on this curve
by, in effect, exponentiating this infinitesimal transformation. More precisely,
we define a group transformation at each point along the curve by solving
the differential equation
$$\frac{d}{ds}g(s) = -\dot\gamma^\mu(s)A_\mu(\gamma(s))\,g(s), \tag{9.49}$$
subject to the initial condition

g(0) = I. (9.50)

We say that ϕ has been parallelly propagated from x1 to x2 along the curve
γ if⁴
ϕ(x2 ) = g(1)ϕ(x1 ). (9.51)
The idea is that, given the curve, the group transformation g(1) is used to
define the relationship between the spaces C2 at x1 and x2 . Equivalently, we
can define the parallel propagation of ϕ along the curve γ to be defined by
solving the equation
$$\frac{d}{ds}\varphi(\gamma(s)) = -\dot\gamma^\mu(s)A_\mu(\gamma(s))\,\varphi(\gamma(s)). \tag{9.52}$$
Thus the gauge field A defines what it means for elements of C² to be
“parallel”, that is, to stay “unchanged” as we move from point to point in
M . In particular, we say that the field ϕ is not changing (relative to A) or
is parallelly propagated along the curve if
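$$\begin{aligned}
0 &= \frac{d}{ds}\varphi(\gamma(s)) + \dot\gamma^\mu(s)A_\mu(\gamma(s))\,\varphi(\gamma(s)) \\
&= \dot\gamma^\mu\left(\varphi_{,\mu}(\gamma) + A_\mu(\gamma)\varphi(\gamma)\right) \\
&= \dot\gamma^\mu D_\mu\varphi(\gamma).
\end{aligned} \tag{9.53}$$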

You can see how the covariant derivative determines the rate of change of ϕ
along the curve. So how a field is changing from point to point is determined
by the choice of gauge field or “connection”.
⁴ “Parallel propagation” is also called “parallel transport”.

Let me emphasize that the parallel propagation of a field (relative to a
given connection) from one point to another is, in general, path-dependent,
i.e., depends upon the choice of curve connecting the points. This is because
the form of the ODE shown above will depend upon γ. We will say a little
more about this below.
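
Path dependence can be seen in a small numerical experiment. The sketch below integrates the parallel propagation equation dg/ds = −γ̇^µ A_µ g with a fourth-order Runge-Kutta scheme along two routes from (0, 0) to (1, 1) in a two-dimensional slice of spacetime. The connection used, A_x = e_1 and A_y = e_2 everywhere, is a hypothetical choice made purely for illustration, as are the function names and step count.

```python
import math

E1 = [[0, -0.5j], [-0.5j, 0]]   # e_1 = -(i/2) sigma_1
E2 = [[0, -0.5], [0.5, 0]]      # e_2 = -(i/2) sigma_2
IDENTITY = [[1, 0], [0, 1]]

def matmul(a, b):
    return [[sum(a[r][j] * b[j][k] for j in range(2)) for k in range(2)]
            for r in range(2)]

def lincomb(ca, a, cb, b):
    """The matrix ca*a + cb*b."""
    return [[ca * a[r][c] + cb * b[r][c] for c in range(2)] for r in range(2)]

def gauge_field(point):
    """Illustrative connection: A_x = e_1, A_y = e_2 at every point."""
    return E1, E2

def generator(point, velocity):
    """The matrix -gamma'^mu A_mu appearing in the propagation equation."""
    ax, ay = gauge_field(point)
    return lincomb(-velocity[0], ax, -velocity[1], ay)

def transport(g, start, end, steps=200):
    """RK4 integration of dg/ds = -gamma'^mu A_mu g along the line start -> end."""
    v = (end[0] - start[0], end[1] - start[1])
    h = 1.0 / steps

    def rhs(s, g_):
        p = (start[0] + s * v[0], start[1] + s * v[1])
        return matmul(generator(p, v), g_)

    for n in range(steps):
        s = n * h
        k1 = rhs(s, g)
        k2 = rhs(s + h / 2, lincomb(1, g, h / 2, k1))
        k3 = rhs(s + h / 2, lincomb(1, g, h / 2, k2))
        k4 = rhs(s + h, lincomb(1, g, h, k3))
        incr = lincomb(1, lincomb(1, k1, 2, k2), 1, lincomb(2, k3, 1, k4))
        g = lincomb(1, g, h / 6, incr)
    return g

# Route 1: along x, then along y. Route 2: along y, then along x.
g_xy = transport(transport(IDENTITY, (0, 0), (1, 0)), (1, 0), (1, 1))
g_yx = transport(transport(IDENTITY, (0, 0), (0, 1)), (0, 1), (1, 1))
path_gap = max(abs(g_xy[r][c] - g_yx[r][c]) for r in range(2) for c in range(2))
```

Because e_1 and e_2 do not commute, the two routes give different group elements and `path_gap` is clearly nonzero; replacing `E2` by a multiple of `E1` in `gauge_field` makes the gap disappear.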
It is possible to give a formal series solution to the differential equation
of parallel propagation. Indeed, if you have studied quantum mechanics, you
will perhaps note the similarity of the equation
$$\frac{d}{ds}g(s) = -\dot\gamma^\mu(s)A_\mu(\gamma(s))\,g(s) \tag{9.54}$$
to the Schrödinger equation for the time evolution operator U (t) of a time-
dependent Hamiltonian H(t):

$$i\hbar\frac{\partial U(t)}{\partial t} = H(t)U(t). \tag{9.55}$$
In quantum mechanics one solves the Schrödinger equation using the “time-
ordered exponential”; here we can use the analogous quantity, the “path-
ordered exponential”:
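$$\begin{aligned}
g(1) &= \sum_{n=0}^{\infty}(-1)^n\int_0^1 ds_n\int_0^{s_n} ds_{n-1}\cdots\int_0^{s_2} ds_1\;\dot\gamma^{\mu_n}(s_n)A_{\mu_n}(\gamma(s_n))\cdots\dot\gamma^{\mu_1}(s_1)A_{\mu_1}(\gamma(s_1)) \\
&\equiv P\exp\left(-\int_0^1 ds\,\dot\gamma^\mu A_\mu\right).
\end{aligned} \tag{9.56}$$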

The ordering of the factors of A is crucial since they do not commute.

It may happen that for a particular choice of A there is a field satisfying

Dµ ϕ = 0. (9.57)

Such fields are called covariantly constant, or just “constant”, or just
“parallel”. Covariantly constant fields have the property that their value at any
point is parallel with its value at any other point – relative to the definition of
“parallel” provided by the connection A and independently of any choice of
curve connecting the points. As we shall see below, the existence of parallel
fields requires a special choice of connection.
It is instructive to point out that everything we have done for SU(2) gauge
theory could be also done with the group U (1) in SED. Now the connections
can be viewed as i times ordinary, real-valued 1-forms ω_µ, which commute. In
this case the path-ordered exponential becomes the ordinary exponential:
$$P\exp\left(-\int_0^1 ds\,\dot\gamma^\mu A_\mu\right) = \exp\left(-i\int_0^1 ds\,\dot\gamma^\mu\omega_\mu\right). \tag{9.58}$$

9.5 Geometrical interpretation: curvature

Given a covariant derivative, that is, a notion of parallel transport, one ob-
tains also a notion of curvature. The simplest way to define curvature is as we
did for SED, namely, via the commutation relations for covariant derivatives.
A straightforward computation reveals

[Dµ , Dν ]ϕ = Fµν ϕ, (9.59)

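where Fµν defines a Lie algebra-valued 2-form, the curvature of the gauge
field,
$$ F_{\mu\nu} = F^i_{\mu\nu} e_i, \tag{9.60} $$
given by
$$ F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu + [A_\mu, A_\nu], \tag{9.61} $$
and
$$ F^i_{\mu\nu} = \partial_\mu A^i_\nu - \partial_\nu A^i_\mu + \epsilon^i{}_{jk} A^j_\mu A^k_\nu. \tag{9.62} $$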

Problem:
5. Verify the results (9.59) – (9.62) on the YM curvature.

We noted earlier that the parallel propagation of ϕ from one point to

another depends upon what curve is used to connect the points. Infinitesi-
mally, the parallel transport in the direction xµ is defined by the covariant
derivative Dµ . Since F arises as the commutator of covariant derivatives,
you can interpret F as an infinitesimal measure of the path dependence of
parallel transport. Indeed, the parallel transport is path independent (in a
region) if and only if the connection is flat (in that region), F = 0.
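The curvature can also be viewed as the obstruction to the existence of
parallel fields ϕ. Since these satisfy
$$ D_\mu \varphi = 0, \tag{9.63} $$
the integrability condition
$$ [D_\mu, D_\nu]\varphi = 0 \tag{9.64} $$
arises, yielding
$$ 0 = F_{\mu\nu}\varphi = F^i_{\mu\nu} e_i \varphi. \tag{9.65} $$
It is not hard to check that, for SU(2),
$$ \det(v^i e_i) = \tfrac{1}{4}\, \delta_{ij} v^i v^j. \tag{9.66} $$
(For a traceless 2 × 2 matrix M one has det M = −½ tr M², and tr(e_i e_j) = −½ δ_ij.)
The matrix v^i e_i is therefore invertible unless v^i = 0. Consequently, assuming ϕ ≠ 0,
$$ F^i_{\mu\nu} e_i \varphi = 0 \implies F^i_{\mu\nu} = 0. \tag{9.67} $$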
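The determinant formula can be spot-checked numerically. The following is a small sketch, assuming the basis e_k = σ_k/(2i) (the choice consistent with tr(e_i e_j) = −δ_ij/2); the helper names are illustrative. In this basis the determinant of v^i e_i comes out as δ_ij v^i v^j / 4, so it vanishes only when v = 0, which is all the argument above needs.

```python
import random

# Assumed basis of su(2): e_k = sigma_k / (2i), with sigma_k the Pauli matrices.
E = [
    [[0, -0.5j], [-0.5j, 0]],
    [[0, -0.5], [0.5, 0]],
    [[-0.5j, 0], [0, 0.5j]],
]

def combo(v):
    """Return the 2x2 matrix v^i e_i."""
    return [[sum(v[k] * E[k][i][j] for k in range(3)) for j in range(2)] for i in range(2)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

random.seed(0)
v = [random.uniform(-1, 1) for _ in range(3)]
d = det2(combo(v))
norm2 = sum(x * x for x in v)
print("det(v^i e_i) =", d)
print("|v|^2 / 4    =", norm2 / 4)
```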

So, if there exist non-vanishing parallel fields ϕ then

∂µ Aν − ∂ν Aµ + [Aµ , Aν ] = 0. (9.68)

This condition, which says that the connection is “flat”, is also the integra-
bility condition for the existence of SU(2)-valued functions U satisfying

∂µ U + Aµ U = 0. (9.69)

Thus, if the connection is flat then there exists U (x) : M → SU (2) such that

Aµ = −(∂µ U (x))U −1 (x). (9.70)

This means that the gauge field is just a gauge transformation of the trivial
connection A = 0. Thus we see that, at least locally,5 all flat connections are
gauge-equivalent to the zero connection.
Finally, let us consider the behavior of the curvature under a gauge trans-
formation.

Problem:
6. Let U (x) : M → SU (2) define a gauge transformation, for which

Aµ → U (x)Aµ U −1 (x) − (∂µ U (x))U −1 (x). (9.71)

Show that
Fµν → U (x)Fµν U −1 (x). (9.72)
5
I say “locally” because arguments based upon integrability conditions only guarantee
local solutions to the differential equations.

Thus while the gauge field transforms inhomogeneously under a gauge trans-
formation, the curvature transforms homogeneously. The curvature is trans-
forming according to the adjoint representation of SU(2) on su(2) in which,
with
τ ∈ su(2), U ∈ SU (2), (9.73)
we have
τ → U τ U −1 . (9.74)
Infinitesimally, if we have a gauge transformation defined by

δU (x) = ξ(x), (9.75)

then it is easy to see that

δFµν = [ξ(x), Fµν ]. (9.76)

Problem:
7. Verify (9.76).

The curvature of the YM field differs from the curvature of the Maxwell
field in a few key ways. First, the YM curvature is really a trio of 2-forms,
while the Maxwell curvature is a single 2-form. Second, the YM curvature is
a non-linear function of the gauge field in contrast to the linear relation
F = dA arising in Maxwell theory. Finally, the YM curvature transforms
homogeneously under a gauge transformation, while the Maxwell curvature
is gauge invariant. You can now interpret this last result as coming from the
fact that the adjoint representation of an Abelian group like U (1) is trivial
(exercise).

9.6 Lagrangian for the YM field

Let us recall the Lagrangian that represents the coupling of ϕ to A:

$$ L_\varphi = -\eta^{\mu\nu} \langle D_\mu\varphi, D_\nu\varphi \rangle - m^2 \langle \varphi, \varphi \rangle. \tag{9.77} $$

If we view A as fixed, i.e., given, then it is not a field variable but instead
provides explicit functions of xα which appear in Lϕ = Lϕ (x, ϕ, ∂ϕ). Gauge-
related connections A will give the same Lagrangian if we transform ϕ as

well, but this has no immediate field theoretic consequence, e.g., it doesn’t
provide a symmetry for Noether’s theorem. The Lagrangian in this case is
viewed as describing the dynamics of KG fields interacting with a prescribed
YM field. The full, gauge invariant theory demands that A also be one of
the dynamical fields. Following the examples of the electrically charged KG
field and Dirac field, we need to adjoin to this Lagrangian a term providing
dynamics for the YM field.6 This is our next consideration.
We can build a Lagrangian for the YM field A by a simple generaliza-
tion of the Maxwell Lagrangian. Our guiding principle is gauge invariance.
Now, unlike the Maxwell curvature, the YM curvature is not gauge invariant,
rather, it is gauge “covariant”, transforming homogeneously under a gauge
transformation via the adjoint representation of SU(2) on su(2). However, it
is easy to see that the trace of a product of su(2) elements is invariant under
this action of SU(2) on su(2):
$$ \mathrm{tr}\left\{ (U\tau_1 U^{-1})(U\tau_2 U^{-1}) \right\} = \mathrm{tr}\left\{ \tau_1 \tau_2 \right\}. \tag{9.78} $$
This implies that the following Lagrangian density is gauge invariant:
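$$ L_{YM} = \frac{1}{2}\, \mathrm{tr}\left( F_{\mu\nu} F^{\mu\nu} \right). \tag{9.79} $$
Writing
$$ F_{\mu\nu} = F^i_{\mu\nu} e_i, \tag{9.80} $$
and using
$$ \mathrm{tr}(e_i e_j) = -\frac{1}{2}\delta_{ij}, \tag{9.81} $$
we get
$$ L_{YM} = -\frac{1}{4}\, \delta_{ij} F^{\mu\nu i} F^j_{\mu\nu}. \tag{9.82} $$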
So, the YM Lagrangian is arising really as a sum of 3 Maxwell-type La-
grangians. However there is a very significant difference between the Maxwell
and YM Lagrangians: the Maxwell Lagrangian is quadratic in the gauge field
while the YM Lagrangian includes terms cubic and quartic in the gauge field.
This means that, unlike the source-free Maxwell equations, the source-free
YM equations will be non-linear. Physically this means that the YM fields
are “self-interacting”.
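The two trace facts used in this construction, (9.78) and (9.81), are easy to spot-check numerically. The sketch below again assumes the basis e_k = σ_k/(2i); the particular group element U and the Lie algebra elements τ1, τ2 are arbitrary illustrative choices, not taken from the text.

```python
import math

# Assumed basis of su(2): e_k = sigma_k / (2i), with sigma_k the Pauli matrices.
E = [
    [[0, -0.5j], [-0.5j, 0]],
    [[0, -0.5], [0.5, 0]],
    [[-0.5j, 0], [0, 0.5j]],
]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def trace(m):
    return m[0][0] + m[1][1]

def dagger(m):
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]

def combo(v):
    """Return the Lie algebra element v^i e_i."""
    return [[sum(v[k] * E[k][i][j] for k in range(3)) for j in range(2)] for i in range(2)]

def su2_element(t, n):
    """exp(t n^i e_i) for a unit vector n, using (n^i e_i)^2 = -I/4."""
    ne = combo(n)
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c * (i == j) + 2 * s * ne[i][j] for j in range(2)] for i in range(2)]

# (9.81): tr(e_i e_j) = -delta_ij / 2
gram = [[trace(matmul(E[i], E[j])) for j in range(3)] for i in range(3)]

# (9.78): the trace of a product is unchanged by the adjoint action of U
U = su2_element(0.7, [1 / 3, 2 / 3, 2 / 3])
Uinv = dagger(U)  # U is unitary, so its inverse is its Hermitian conjugate

def ad(tau):
    return matmul(matmul(U, tau), Uinv)

tau1, tau2 = combo([0.3, -1.2, 0.5]), combo([1.0, 0.4, -0.7])
lhs = trace(matmul(ad(tau1), ad(tau2)))
rhs = trace(matmul(tau1, tau2))
print(gram)
print(lhs, rhs)
```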
6
Simply using (9.77) to describe the dynamics of the scalars and the YM field is unsat-
isfactory since the Euler-Lagrange equations for the gauge field imply that ϕ is covariantly
constant and hence that A is flat.

9.7 The source-free Yang-Mills equations

We have built the Yang-Mills Lagrangian as a generalization of the Maxwell
Lagrangian. Let us for a moment forget about the scalar fields ϕ and derive
the source-free YM equations as the Euler-Lagrange equations of
$$ L_{YM} = \frac{1}{2}\, \mathrm{tr}\left( F_{\mu\nu} F^{\mu\nu} \right). \tag{9.83} $$
We proceed as usual: we vary the Lagrangian and assemble the derivatives
of the variations into a divergence term. This calculation will not involve
any explicit xα dependence; to avoid confusion with the gauge covariant
derivative, I will use the symbol ∂µ to denote the total derivative with respect
to xµ . This total derivative is used when building the covariant derivative.
The variation of the Lagrangian density is

δLY M = tr (F µν δFµν ) , (9.84)

where
δFµν = Dµ δAν − Dν δAµ , (9.85)
and
Dµ δAν = ∂µ δAν + [Aµ , δAν ]. (9.86)

Problem:

8. Verify equations (9.84) and (9.85).

Note that (9.86) is appropriate since δAµ can be viewed as a Lie-algebra

valued field transforming according to the adjoint representation of SU(2).
To see this, start from the gauge transformation rule

Aµ → U Aµ U −1 − (∂µ U )U −1 , (9.87)

and apply it to a 1-parameter family of gauge fields. From there it is easy

to compute the transformation of the variation of the gauge field to get
(exercise):
δAµ → U δAµ U −1 , (9.88)
giving us that adjoint transformation rule.

Returning now to our computation of the EL equations, we have

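$$ \begin{aligned}
\delta L_{YM} &= 2\,\mathrm{tr}\left( F^{\mu\nu} D_\mu \delta A_\nu \right) \\
&= 2\,\mathrm{tr}\left( F^{\mu\nu} \{ \partial_\mu \delta A_\nu + [A_\mu, \delta A_\nu] \} \right) \\
&= -2\,\mathrm{tr}\left\{ (\partial_\mu F^{\mu\nu})\delta A_\nu - F^{\mu\nu}[A_\mu, \delta A_\nu] \right\} + \partial_\mu\left( 2\,\mathrm{tr}\, F^{\mu\nu}\delta A_\nu \right) \\
&= -2\,\mathrm{tr}\left\{ (\partial_\mu F^{\mu\nu} + [A_\mu, F^{\mu\nu}])\delta A_\nu \right\} + \partial_\mu\left( 2\,\mathrm{tr}\, F^{\mu\nu}\delta A_\nu \right) \\
&= -2\,\mathrm{tr}\left\{ D_\mu F^{\mu\nu}\, \delta A_\nu \right\} + \partial_\mu\left( 2\,\mathrm{tr}\, F^{\mu\nu}\delta A_\nu \right) \\
&= D_\mu F^{\mu\nu i}\, \delta A^j_\nu\, \delta_{ij} + \partial_\mu\left( -F^{\mu\nu i}\, \delta A^j_\nu\, \delta_{ij} \right),
\end{aligned} \tag{9.89} $$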
where we have defined the covariant derivative of the curvature via
Dµ F αβ = ∂µ F αβ + [Aµ , F αβ ], (9.90)
which is appropriate given the fact that (like δA) F transforms according to
the adjoint representation of SU(2).
From this computation we see that the EL derivative of LY M is
Eiν (LY M ) = Dµ Fiµν , (9.91)
where the Latin (su(2) component) indices are raised and lowered with δij .
The source-free YM equations are then
Dµ F µν = 0. (9.92)
Of course, this compact geometric notation hides a lot of stuff. The PDEs
for the gauge field A are, more explicitly,
∂µ (∂ µ Aν − ∂ ν Aµ + [Aµ , Aν ]) + [Aµ , ∂ µ Aν − ∂ ν Aµ ] + [Aµ , [Aµ , Aν ]] = 0. (9.93)
Thus we get 3 × 4 = 12 equations providing a non-linear generalization of
the Maxwell equations.
Despite their complexity, a good number of solutions of the Yang-Mills
equations are known. It would take us too far afield to discuss them, but I
will show you a very famous solution, the “Wu-Yang monopole.”
Problem:
9. Consider the Yang-Mills field whose non-zero components, A^i_α, α =
(t, x, y, z), i = (1, 2, 3), relative to an inertial Cartesian coordinate system
and the basis e_i are given by
$$ A^2_x = -\frac{z}{r^2}, \quad A^3_x = \frac{y}{r^2}, \quad A^1_y = \frac{z}{r^2}, \quad A^3_y = -\frac{x}{r^2}, \quad A^1_z = -\frac{y}{r^2}, \quad A^2_z = \frac{x}{r^2}. \tag{9.94} $$
Show that this YM field solves the YM equations.

9.8 Yang-Mills with sources

Let us now consider the YM field coupled to its “source” ϕ.7 We will compute
the EL equations associated to

L = LY M + Lϕ , (9.95)

where
$$ L_\varphi = -\left( \langle D_\mu\varphi, D^\mu\varphi \rangle + m^2 \langle \varphi, \varphi \rangle \right). \tag{9.96} $$
It is not hard to check that the EL equations for ϕ are given by

Dµ Dµ ϕ − m2 ϕ = 0, (9.97)

while the EL equations for Aiν are given by

$$ D_\mu F_i^{\mu\nu} + \varphi^\dagger e_i D^\nu\varphi - (D^\nu\varphi^\dagger)\, e_i \varphi = 0. \tag{9.98} $$

We can write the latter as a Lie algebra-valued expression:

Dµ F µν − j ν = 0. (9.99)

You can see that the fields ϕ are coupled to the gauge field through a sort
of covariant version of the KG equation. The scalar field acts as a source of
the YM field via the current

$$ j_i^\mu = (D^\mu\varphi^\dagger)\, e_i \varphi - \varphi^\dagger e_i D^\mu\varphi. \tag{9.100} $$

Note that, just as in SED, the definition of the current involves the gauge
field itself.

Problem:

10. Derive equations (9.97), (9.98), (9.99).

7
While calling ϕ the “source” is quite all right, it should be kept in mind that, owing
to the non-linearity of the source-free YM field equations, one can consider the YM field
as its own source!

9.9 Noether theorems

Let us briefly mention some considerations arising from applying Noether’s
theorems to the SU(2) transformations arising in YM theory. There are some
significant differences compared to Maxwell theory. First of all, we can try
to consider the global SU(2) symmetry for which
ϕ → U ϕ, A → U AU −1 . (9.101)
But this kind of symmetry is not that meaningful (in general) since there is no
gauge invariant way of demanding that U be constant except via the covariant
derivative, and this would force the curvature to vanish. The availability
of “global” or “rigid” gauge symmetries is a special feature permitted by
Abelian gauge groups. It does not immediately generalize to YM theory.
However, we can still consider Noether’s second theorem as applied to
the gauge symmetry
ϕ → U (x)ϕ, Aµ → U (x)Aµ U −1 (x) − (∂µ U (x))U −1 (x). (9.102)
To begin with, the source-free theory described by LY M also has the gauge
symmetry, and this implies the identity (which you can easily check directly)
Dν Dµ F µν = 0. (9.103)

Problem:
11. Verify by direct computation
Dν Dµ F µν = 0. (9.104)
(Note that this is not quite as trivial as in the U (1) case since covariant
derivatives appear.)

For the YM field coupled to its source described by Lϕ , the differential iden-
tity is
{Dν [Dµ Fiµν − jiν ]} + 2Re [Eϕ ei ϕ] = 0. (9.105)
We conclude then that the scalar current (9.100) satisfies a covariant conser-
vation equation when the field equations hold:
Dµ j µ ≡ ∂µ j µ + [Aµ , j µ ] = 0, modulo field equations. (9.106)

Problem:

12. Derive (9.105) and (9.106).

This last result is significant: a covariant divergence appears in (9.106)

rather than an ordinary divergence. This means that one does not have a
continuity equation for j µ ! Physically, one interprets this by saying that the
“charge” described by j µ is not all the charge in the system. Indeed, one
says that the YM field itself carries some charge! As we have mentioned,
this is why the YM equations are non-linear: the YM field can serve as its
own source. As it turns out, without making some kind of special restrictions
(see below), there is no gauge-invariant way to localize the charge and current
densities, so there is no meaningful conserved current in general! With ap-
propriate asymptotic conditions it can be shown that there is a well-defined
notion of total charge in the system, but that is another story. All these
features have analogs in the gravitational field within the general theory of
relativity.

Problems:

13. Suppose the gauge field A takes the form

A = A1 e1 , A1 = αµ dxµ . (9.107)

Show that there exist fields φ = φi ei transforming according to the

adjoint representation of SU(2), which satisfy

Dµ φ = 0. (9.108)

14. Suppose that the gauge field A is such that there exists a covariantly
constant field φ = φi ei (e.g., as in the previous problem),

Dµ φ = 0. (9.109)

Show that J µ := φi jiµ (see (9.100)) defines a bona fide conserved cur-
rent.

9.10 Non-Abelian gauge theory in general

There are a number of generalizations of our results for YM theory. For
example, here I show how to generalize YM theory to any gauge group asso-
ciated to a matrix Lie group G. Let G be a Lie group represented as a set
of matrices acting on a real (for simplicity) vector space V . Let g be its Lie
algebra, also represented as a set of matrices acting on V . Let

ϕ : M → V, (9.110)

and let A be a g-valued 1-form on M . Define a covariant derivative

Dµ ϕ = ∂µ ϕ + Aµ ϕ. (9.111)

Define the curvature of A via

[Dµ , Dν ]ϕ = Fµν ϕ, (9.112)

so that
Fµν = ∂µ Aν − ∂ν Aµ + [Aµ , Aν ]. (9.113)
The curvature is a g-valued 2-form. The behavior of the gauge field and
curvature under a gauge transformation is the same as when the gauge
group was SU(2):

Aµ → U (x)Aµ U −1 (x) − (∂µ U (x))U −1 (x), (9.114)

Fµν → U (x)Fµν U −1 (x). (9.115)

To build a Lagrangian we need two bi-linear forms. Let

α: V × V → R (9.116)

be a G-invariant, non-degenerate bilinear form on V . (Recall that when we

chose V = C2 and G = SU (2) we used the standard U (2) invariant Hermitian
inner product for α.) “G-invariance” means that for any U ∈ G

α(U v, U w) = α(v, w), v, w ∈ V. (9.117)

“Non-degenerate” means that

α(v, w) = 0 ∀ w =⇒ v = 0. (9.118)

Next, let
β: g × g → R (9.119)
be a non-degenerate bi-linear form on the Lie algebra, invariant with respect
to the adjoint representation of G on g. If U ∈ G the adjoint representation,
AdU : g → g, is given by

AdU · τ = U τ U −1 , τ ∈ g. (9.120)

The invariance condition on β is

β(AdU τ1 , AdU τ2 ) ≡ β(U τ1 U −1 , U τ2 U −1 ) = β(τ1 , τ2 ). (9.121)

(When we used the 2 × 2 matrix representation of G = SU (2) we used

(minus) the trace of a product of matrices for β. This is equivalent to using
the Killing form for β.)
The Lagrangian for the theory can now be constructed via

L = LA + Lϕ , (9.122)

where
LA = β(Fµν , F µν ), (9.123)
and
Lϕ = −α(Dµ ϕ, Dµ ϕ) − m2 α(ϕ, ϕ). (9.124)
I think you can see that, given the transformation properties of the
covariant derivative and the curvature, and given the invariance properties
of the bi-linear forms α and β, the Lagrangian L is gauge invariant.

9.11 Chern-Simons Theory

There is a significant new option for gauge theory when the underlying man-
ifold of independent variables has three dimensions (as opposed to the four
we have been using for the most part so far). This gauge theory is known
as Chern-Simons theory. Here I will just give a brief introduction within
the general setting of the previous section. The goals will be to define the
Lagrangian, compute the field equations, and examine some symmetries. I
will begin by summarizing the technology we will need.
Fix a basis e_i, i = 1, 2, . . . , m, for a chosen m-dimensional Lie algebra, so
that
$$ A = A^i_\mu\, dx^\mu\, e_i, \qquad [e_i, e_j] = C^k_{ij}\, e_k. \tag{9.125} $$
(In this section Greek letters denote coordinates xµ , µ = 1, 2, 3.) The invari-
ant bilinear form on the Lie algebra is then

β = βij ω i ⊗ ω j , (9.126)

where ω i is the basis of 1-forms dual to the basis ei . This means that, with

v = v i ei , w = w i ei , (9.127)

we have
β(v, w) = βij v i wj . (9.128)
The adjoint-invariance of β implies that
$$ C^c_{ka}\beta_{cb} + C^c_{kb}\beta_{ac} = 0. \tag{9.129} $$
Along with the usual anti-symmetry of the structure constants, C^c_{ab} = −C^c_{ba},
this implies
$$ \beta_{ca} C^c_{bd} = \beta_{c[a} C^c_{bd]}. \tag{9.130} $$
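As a concrete illustration, these conditions can be checked for su(2). The sketch below assumes structure constants C^c_{ab} = ε_abc and the form β_ij = −2δ_ij, and verifies along the way that this β coincides with the Killing form C^l_{ik} C^k_{jl}. Variable names are illustrative.

```python
def eps(i, j, k):
    """Permutation symbol on the index values 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

R = range(3)
# C[a][b][c] stores C^c_{ab}; for su(2) we assume C^c_{ab} = eps_abc.
C = [[[eps(a, b, c) for c in R] for b in R] for a in R]
beta = [[-2 if i == j else 0 for j in R] for i in R]

# Killing form K_ij = C^l_{ik} C^k_{jl}
killing = [[sum(C[i][k][l] * C[j][l][k] for k in R for l in R) for j in R] for i in R]

# Adjoint invariance: C^c_{ka} beta_cb + C^c_{kb} beta_ac = 0 for all k, a, b
invariant = all(
    sum(C[k][a][c] * beta[c][b] + C[k][b][c] * beta[a][c] for c in R) == 0
    for k in R for a in R for b in R
)

# T_abd = beta_ca C^c_{bd} should be totally antisymmetric
T = [[[sum(beta[c][a] * C[b][d][c] for c in R) for d in R] for b in R] for a in R]
antisymmetric = all(
    T[a][b][d] == -T[b][a][d] and T[a][b][d] == -T[a][d][b]
    for a in R for b in R for d in R
)
print(killing, invariant, antisymmetric)
```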
The next ingredient we will need to build the Lagrangian is the 3-dimensional
permutation symbol.8 This object is denoted η αβγ , α, β, γ = 1, 2, 3, and is de-
fined to have the following properties: (i) it is totally anti-symmetric (changes
sign under any permutation of its components), and (ii) it has components
consisting of 0, ±1 where, in any coordinate system (with the same orienta-
tion),
η 123 = 1. (9.131)
The permutation symbol plays a fundamental role in defining the determinant
of a matrix. From the Leibniz formula for the determinant of a 3 × 3
matrix B^α_β we have
$$ (\det B)\, \eta^{\mu\nu\sigma} = \eta^{\alpha\beta\gamma} B^\mu_\alpha B^\nu_\beta B^\sigma_\gamma. \tag{9.132} $$
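The determinant identity can be verified directly for a sample matrix. This is a minimal sketch with an arbitrarily chosen integer matrix B (so the arithmetic is exact); indices run over 0, 1, 2 rather than 1, 2, 3, and the names are illustrative.

```python
from itertools import product

def eta(a, b, c):
    """Permutation symbol on the index values 0, 1, 2."""
    return (a - b) * (b - c) * (c - a) // 2

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# B[mu][alpha] plays the role of B^mu_alpha; the entries are arbitrary.
B = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]

identity_holds = all(
    sum(eta(a, b, c) * B[mu][a] * B[nu][b] * B[sg][c]
        for a, b, c in product(range(3), repeat=3))
    == det3(B) * eta(mu, nu, sg)
    for mu, nu, sg in product(range(3), repeat=3)
)
print(identity_holds)
```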

Using this identity, the permutation symbol can be interpreted as a rank-3,
contravariant tensor density of weight one. By definition, under a coordinate
transformation x^α → x'^α, the components W^{αβγ} of a rank-3, contravariant
tensor density of weight one transform as
$$ W'^{\alpha\beta\gamma} = \left[ \det\left( \frac{\partial x'}{\partial x} \right) \right]^{-1} \frac{\partial x'^\alpha}{\partial x^\mu} \frac{\partial x'^\beta}{\partial x^\nu} \frac{\partial x'^\gamma}{\partial x^\sigma}\, W^{\mu\nu\sigma}. \tag{9.133} $$
Applying this rule to η^{αβγ} we get
$$ \eta'^{\alpha\beta\gamma} = \left[ \det\left( \frac{\partial x'}{\partial x} \right) \right]^{-1} \frac{\partial x'^\alpha}{\partial x^\mu} \frac{\partial x'^\beta}{\partial x^\nu} \frac{\partial x'^\gamma}{\partial x^\sigma}\, \eta^{\mu\nu\sigma} = \eta^{\alpha\beta\gamma}. \tag{9.134} $$
Thus if (9.131) holds in any one coordinate system it will hold in all coordinate
systems, consistent with the definition of the permutation symbol. An
important point to make in this context is that nowhere did we need to use
a metric tensor to define the permutation symbol; such an object is always
available as a rank-n tensor density on any n-dimensional manifold.
8
Also known as the Levi-Civita tensor density.

Problem:

15. Show that one can likewise define a permutation symbol ηαβγ , which
is a covariant tensor density of weight minus one. (Do not introduce a
metric to lower the indices!)

We now have enough machinery to build the Lagrangian density for
Chern-Simons theory. I will simply postulate it; you can consult, e.g.,
the nLab page ncatlab.org/nlab/show/Chern-Simons+theory for background,
motivations, additional results, details, and references. The Chern-Simons
Lagrangian density is
$$ L = \eta^{\alpha\beta\gamma} \beta_{ij} \left( 2 A^i_\alpha \partial_\beta A^j_\gamma + \frac{2}{3}\, C^j_{kl} A^i_\alpha A^k_\beta A^l_\gamma \right). \tag{9.135} $$

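Let’s calculate the Chern-Simons field equations using the Euler-Lagrange
formula. We have
$$ \frac{\partial L}{\partial A^r_\mu} = \eta^{\mu\beta\gamma} \left( 2\beta_{jr} \partial_\beta A^j_\gamma + 2\beta_{j[r} C^j_{kl]} A^k_\beta A^l_\gamma \right), \qquad \frac{\partial L}{\partial A^r_{\mu,\nu}} = -2\eta^{\mu\nu\gamma} \beta_{jr} A^j_\gamma. \tag{9.136} $$
Consequently, the Euler-Lagrange expression E^µ_r(L) for the Chern-Simons
Lagrangian is
$$ \begin{aligned}
E^\mu_r(L) &= \eta^{\mu\beta\gamma} \left( 2\beta_{jr} \partial_\beta A^j_\gamma + 2\beta_{j[r} C^j_{kl]} A^k_\beta A^l_\gamma \right) + 2\eta^{\mu\nu\gamma} \beta_{jr} \partial_\nu A^j_\gamma \\
&= 2\eta^{\mu\beta\gamma} \left( 2\beta_{jr} \partial_\beta A^j_\gamma + \beta_{j[r} C^j_{kl]} A^k_\beta A^l_\gamma \right) \\
&= 2\eta^{\mu\beta\gamma} \left( 2\beta_{jr} \partial_\beta A^j_\gamma + \beta_{jr} C^j_{kl} A^k_\beta A^l_\gamma \right) \\
&= 2\eta^{\mu\beta\gamma} \beta_{jr} \left( 2\partial_{[\beta} A^j_{\gamma]} + C^j_{kl} A^k_\beta A^l_\gamma \right).
\end{aligned} \tag{9.137} $$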

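To get the last equality I used (9.130). The curvature formula (9.113) expressed
in the basis (9.125) is
$$ F^i_{\mu\nu} = \partial_\mu A^i_\nu - \partial_\nu A^i_\mu + C^i_{jk} A^j_\mu A^k_\nu. \tag{9.138} $$
We thus get
$$ E^\mu_r(L) = 2\eta^{\mu\beta\gamma} \beta_{jr} F^j_{\beta\gamma}. \tag{9.139} $$
Because of the identities
$$ \eta^{\mu\beta\gamma} \eta_{\mu\rho\sigma} = 2\delta^\beta_{[\rho} \delta^\gamma_{\sigma]}, \qquad \beta_{jr} \beta^{rs} = \delta^s_j, \tag{9.140} $$
we can conclude
$$ E^\mu_r(L) = 0 \iff F^i_{\mu\nu} = 0. \tag{9.141} $$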
The EL equations for the Chern-Simons Lagrangian say that the gauge field
(or connection) is flat!
I will conclude this brief introduction to Chern-Simons theory by exposing
some remarkable symmetries of the Lagrangian. To begin, it
is reasonable to inquire whether there is a gauge symmetry. At a first glance,
existence of such a symmetry would appear doubtful since the Lagrangian is
not built in an invariant way from the curvature. Still, it is instructive to
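look more closely. Use (9.138) to write the Lagrangian as
$$ \begin{aligned}
L &= \eta^{\alpha\beta\gamma} \beta_{ij} \left( A^i_\alpha F^j_{\beta\gamma} - \frac{1}{3}\, C^j_{kl} A^i_\alpha A^k_\beta A^l_\gamma \right) \\
&= \eta^{\alpha\beta\gamma} \left( \beta(A_\alpha, F_{\beta\gamma}) - \frac{1}{3}\, \beta(A_\alpha, [A_\beta, A_\gamma]) \right) \\
&= \eta^{\alpha\beta\gamma} \left( \beta(A_\alpha, F_{\beta\gamma}) - \frac{2}{3}\, \beta(A_\alpha, A_\beta A_\gamma) \right).
\end{aligned} \tag{9.142} $$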

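Define
$$ \tilde A_\alpha = U A_\alpha U^{-1} - \partial_\alpha U\, U^{-1}, \qquad \tilde F_{\beta\gamma} = U F_{\beta\gamma} U^{-1}. \tag{9.143} $$
Use the gauge transformation formulas (9.114) and (9.115) and the invariance
conditions (9.121) and (9.130) to obtain
$$ \begin{aligned}
\eta^{\alpha\beta\gamma} \beta(\tilde A_\alpha, \tilde F_{\beta\gamma}) &= \eta^{\alpha\beta\gamma} \beta(A_\alpha, F_{\beta\gamma}) - \eta^{\alpha\beta\gamma} \beta(U^{-1}\partial_\alpha U, F_{\beta\gamma}) \\
&= \eta^{\alpha\beta\gamma} \Big\{ \beta(A_\alpha, F_{\beta\gamma}) - 2\beta(U^{-1}\partial_\alpha U, \partial_\beta A_\gamma) - 2\beta(U^{-1}\partial_\alpha U, A_\beta A_\gamma) \Big\}
\end{aligned} \tag{9.144} $$
and
$$ \begin{aligned}
\eta^{\alpha\beta\gamma} \beta(\tilde A_\alpha, \tilde A_\beta \tilde A_\gamma) = \eta^{\alpha\beta\gamma} \Big\{ & \beta(A_\alpha, A_\beta A_\gamma) - \beta(\partial_\alpha U\, U^{-1}, \partial_\beta U\, U^{-1} \partial_\gamma U\, U^{-1}) \\
& + 3\beta(U^{-1}\partial_\alpha U\, U^{-1}\partial_\beta U, A_\gamma) - 3\beta(U^{-1}\partial_\alpha U, A_\beta A_\gamma) \Big\}.
\end{aligned} \tag{9.145} $$
We now have
$$ \begin{aligned}
\tilde L &\equiv \eta^{\alpha\beta\gamma} \Big\{ \beta(\tilde A_\alpha, \tilde F_{\beta\gamma}) - \frac{2}{3}\, \beta(\tilde A_\alpha, \tilde A_\beta \tilde A_\gamma) \Big\} \\
&= L + \frac{2}{3}\, \eta^{\alpha\beta\gamma} \beta(\partial_\alpha U\, U^{-1}, \partial_\beta U\, U^{-1} \partial_\gamma U\, U^{-1}) \\
&\quad - 2\eta^{\alpha\beta\gamma} \Big\{ \beta(U^{-1}\partial_\alpha U, \partial_\beta A_\gamma) + \beta(U^{-1}\partial_\alpha U\, U^{-1}\partial_\beta U, A_\gamma) \Big\}.
\end{aligned} \tag{9.146} $$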

From U −1 U = 1 we can deduce

U −1 ∂α U U −1 = −∂α U −1 , (9.147)
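For completeness, the short computation behind (9.147) can be written out as follows: differentiate the identity U⁻¹U = 1 with the product rule, then multiply on the right by U⁻¹.

```latex
\begin{align*}
0 &= \partial_\alpha \left( U^{-1} U \right)
   = \left( \partial_\alpha U^{-1} \right) U + U^{-1}\, \partial_\alpha U \\
\Longrightarrow \quad
U^{-1} \left( \partial_\alpha U \right) U^{-1}
  &= -\left( \partial_\alpha U^{-1} \right) U\, U^{-1}
   = -\partial_\alpha U^{-1}.
\end{align*}
```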

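so that
$$ \tilde L = L + \frac{2}{3}\, \eta^{\alpha\beta\gamma} \beta(\partial_\alpha U\, U^{-1}, \partial_\beta U\, U^{-1} \partial_\gamma U\, U^{-1}) + \partial_\alpha \left( \eta^{\alpha\beta\gamma} \beta(2U^{-1}\partial_\beta U, A_\gamma) \right). \tag{9.148} $$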
The last term is a divergence; what about the second term? This term is
interesting. If the gauge transformation can be obtained by iterating in-
finitesimal transformations, that is, if it can be written as

U = exp(λi (x)ei ), (9.149)

for some functions λ(x) then

∂α U U −1 = ∂α λi ei , =⇒ ∂[α (∂β] U U −1 ) = 0. (9.150)


You can check that in this case the term of interest in (9.148) can also be
written as a divergence. Thus infinitesimal gauge transformations will be
divergence symmetries of the Lagrangian. Noether’s second theorem then
applies implying the existence of differential identities satisfied by the Euler-
Lagrange equations. In this case these are equivalent to the “Bianchi identi-
ties”:
D[α Fβγ] = 0. (9.151)

Problem:
16. Show by direct computation just using the definition (9.113) and the
definition of covariant derivative that

D[α Fβγ] = 0. (9.152)

It can happen that there are gauge transformations not obtained by it-
erating infinitesimal transformations as in (9.149). To explain this in detail
will take us too far afield, so let me just provide you with an example.

Problem:
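17. Consider a Chern-Simons theory where M = S^3 and G = SU(2),
and β_ij = −2δ_ij. Using the Euler angle coordinates x^µ = (γ, β, α),
(0 < α < 2π, 0 < β < 2π, 0 < γ < π) for S^3, define U : M → G by
$$ U = \begin{pmatrix} e^{i(\alpha+\beta)} \cos(\gamma/2) & e^{-i(\alpha-\beta)} \sin(\gamma/2) \\ -e^{i(\alpha-\beta)} \sin(\gamma/2) & e^{-i(\alpha+\beta)} \cos(\gamma/2) \end{pmatrix} \tag{9.153} $$
Show that
$$ \int_M \eta^{\alpha\beta\gamma} \beta(\partial_\alpha U\, U^{-1}, \partial_\beta U\, U^{-1} \partial_\gamma U\, U^{-1}) = 48\pi^2. \tag{9.154} $$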

This means that the integrand can’t be written as a divergence since

in such a case the integral would have to vanish by the divergence
theorem. Thus for these “large” gauge transformations the Lagrangian
is not gauge invariant.9
9
Interestingly and importantly, with a proper normalization the action integral changes
by an additive integer under such gauge transformations. This allows gauge invariance in
the associated quantum field theory.

Finally, let me point out a remarkable symmetry enjoyed by the Chern-

Simons Lagrangian. Consider a diffeomorphism of M . This is a smooth 1-1
mapping of M onto itself whose inverse is also smooth. Diffeomorphisms
are the “symmetries” of manifolds in the sense that one considers two man-
ifolds related by a diffeomorphism to be the same abstract manifold. The
diffeomorphisms form a group, which is the symmetry group of “generally
covariant” theories like Einstein’s general theory of relativity, as we shall
discuss later. I want to show you that, unlike the usual Yang-Mills type of
Lagrangian, the Chern-Simons Lagrangian is generally covariant in the sense
that it admits the diffeomorphism symmetry. For brevity, I will just sketch
the proof of invariance with respect to infinitesimal transformations.
Given a coordinate system xµ on M , a diffeomorphism φ : M → M looks
like a change of coordinates
xµ → x̃µ = φµ (x), (9.155)
where φµ will be smooth with a smooth inverse in the domain of the coordi-
nates. An infinitesimal diffeomorphism comes from a 1-parameter family of
diffeomorphisms φs in the usual way:
φµs (x) = xµ + sV µ (x) + O(s2 ). (9.156)
Here the V µ define – and are defined by – the 1-parameter family of transfor-
mations. Geometrically, V α is a vector field on M . So, for any infinitesimal
diffeomorphism defined by a vector field V α on M , define the infinitesimal
transformation of the gauge field as10
δAα = V β Fβα . (9.157)

Problem:
18. A “covariant” tensor like Aα transforms under (9.155) by
$$ A_\mu(x) \to \frac{\partial \phi^\nu}{\partial x^\mu}\, A_\nu(\phi(x)). \tag{9.158} $$
Show that (9.157) corresponds to the infinitesimal form of this transfor-
mation law (using (9.156)) up to an infinitesimal gauge transformation
generated by V α Aα (see (9.46)).
10
For those who know about these things, this is the “gauge covariant Lie derivative”
of A.

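The transformation (9.157) induces the following change in the Lagrangian
density:
$$ \delta L = 2\eta^{\alpha\beta\gamma} \beta_{ij} V^\sigma F^i_{\sigma\alpha} F^j_{\beta\gamma} + \partial_\beta \left( 2\eta^{\alpha\beta\gamma} \beta_{ij} A^i_\alpha V^\sigma F^j_{\sigma\gamma} \right). \tag{9.159} $$
As a nice exercise you can check the identity
$$ \eta^{\alpha\beta\gamma} \beta_{ij} F^i_{\sigma\alpha} F^j_{\beta\gamma} = 0. \tag{9.160} $$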

Consequently, the infinitesimal transformation (9.157) defines a divergence

symmetry of the Lagrangian.

Problem:

19. Apply Noether’s second theorem to the infinitesimal gauge and dif-
feomorphism symmetries of the Chern-Simons Lagrangian. What are
the corresponding differential identities satisfied by the Euler-Lagrange
expression (9.139)?

9.12 PROBLEMS
1. Verify that the ek in (9.15) satisfy the su(2) Lie algebra (9.16) as ad-
vertised.

2. (a) Show that the Lagrangian (9.17) is invariant with respect to the
transformation (9.8); (b) compute its Euler-Lagrange equations; (c)
derive the conserved currents (9.19) from Noether’s theorem; (d) verify
that the currents (9.19) are divergence-free when the Euler-Lagrange
equations are satisfied.

3. Show that if any one of the currents (9.19) are conserved, then the
other 2 are automatically conserved. (Hint: Consider the behavior of
the currents under SU(2) transformations of the fields.)

4. Show that the adjoint representation (9.40) has the infinitesimal form
(9.42).

5. Verify the results (9.59) – (9.62) on the YM curvature.


6. Let U (x) : M → SU (2) define a gauge transformation, for which

Aµ → U (x)Aµ U −1 (x) − (∂µ U (x))U −1 (x). (9.161)
Show that
Fµν → U (x)Fµν U −1 (x). (9.162)
7. Verify equation (9.76).
8. Verify equations (9.84) and (9.85).
9. Consider the Yang-Mills field whose non-zero components, A^i_α, α =
(t, x, y, z), i = (1, 2, 3), relative to an inertial Cartesian coordinate system
and the basis e_i are given by
$$ A^2_x = -\frac{z}{r^2}, \quad A^3_x = \frac{y}{r^2}, \quad A^1_y = \frac{z}{r^2}, \quad A^3_y = -\frac{x}{r^2}, \quad A^1_z = -\frac{y}{r^2}, \quad A^2_z = \frac{x}{r^2}. \tag{9.163} $$
Show that this YM field solves the YM equations.
10. Derive equations (9.97), (9.98), (9.99).
11. Verify by direct computation
Dν Dµ F µν = 0. (9.164)
(Note that this is not quite as trivial as in the U (1) case since covariant
derivatives appear.)
12. Derive (9.105) and (9.106).
13. Suppose the gauge field takes the form
A = A1 e1 , A1 = αµ dxµ . (9.165)
Show that there exist fields φ = φi ei transforming according to the
adjoint representation of SU(2), which satisfy
Dµ φ = 0. (9.166)

14. Suppose that the gauge field A is such that there exists a covariantly
constant field φ = φi ei ,
Dµ φ = 0. (9.167)
Show that J µ := φi jiµ (see (9.100)) defines a bona fide conserved cur-
rent.

15. Show that one can define a permutation symbol ηαβγ , which is a co-
variant tensor density of weight minus one. (Do not introduce a metric
to lower the indices!)
16. Show by direct computation just using the definition (9.113) and the
definition of covariant derivative that
D[α Fβγ] = 0. (9.168)

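17. Consider a Chern-Simons theory where M = S^3 and G = SU(2),
and β_ij = −2δ_ij. Using the Euler angle coordinates x^µ = (γ, β, α),
(0 < α < 2π, 0 < β < 2π, 0 < γ < π) for S^3, define U : M → G by
$$ U = \begin{pmatrix} e^{i(\alpha+\beta)} \cos(\gamma/2) & e^{-i(\alpha-\beta)} \sin(\gamma/2) \\ -e^{i(\alpha-\beta)} \sin(\gamma/2) & e^{-i(\alpha+\beta)} \cos(\gamma/2) \end{pmatrix} \tag{9.169} $$
Show that
$$ \int_M \eta^{\alpha\beta\gamma} \beta(\partial_\alpha U\, U^{-1}, \partial_\beta U\, U^{-1} \partial_\gamma U\, U^{-1}) = 48\pi^2. \tag{9.170} $$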

This means that the integrand can’t be written as a divergence since

in such a case the integral would have to vanish by the divergence
theorem. Thus for these “large” gauge transformations the Lagrangian
is not gauge invariant.11
18. A “covariant” tensor like Aα transforms under (9.155) by
$$ A_\mu(x) \to \frac{\partial \phi^\nu}{\partial x^\mu}\, A_\nu(\phi(x)). \tag{9.171} $$
Show that (9.157) corresponds to the infinitesimal form of this transfor-
mation law (using (9.156)) up to an infinitesimal gauge transformation
generated by V α Aα (see (9.46)).
19. Apply Noether’s second theorem to the infinitesimal gauge and dif-
feomorphism symmetries of the Chern-Simons Lagrangian. What are
the corresponding differential identities satisfied by the Euler-Lagrange
expression (9.139)?

11
Interestingly and importantly, with a proper normalization the action integral changes
by an additive integer under such gauge transformations. This allows gauge invariance in
the associated quantum field theory.
Chapter 10

Gravitational field theory

Currently, our best theory of gravitation is the field-theoretic description

occurring within Einstein’s geometrical model of spacetime as a Lorentzian
manifold, which is the essence of his 1915 General Theory of Relativity.
There are 3 principal ingredients to this theory. (1) The geodesic hypothesis:
freely falling test particles move on geodesics of the spacetime geometry. (2)
The Einstein field equations: the curvature of the geometry is specified by
energy density, pressure, momentum and stress of matter. (3) The principle
of general covariance: the only fixed structure needed to implement (1) and
(2) is the manifold structure of spacetime. Everything else is determined
dynamically.
The following sections are not intended to be an introduction to relativ-
ity, but rather an exploration of some of its field theoretical aspects. I am
assuming that you have seen some relativistic physics in a previous class.

10.1 Spacetime geometry

We begin with a very brief review of spacetime geometry. The goal in this
section is simply to refresh your memory regarding some of the things we will
need and to establish notation. The set of all spacetime events is mathemat-
ically represented as a differentiable manifold M equipped with a spacetime
metric g. The metric determines the spacetime interval between events via
its line element. In coordinates xµ on M , the infinitesimal interval ds is
defined by the “line element”

ds2 = gµν (x)dxµ dxν . (10.1)


The corresponding metric tensor field takes the form

g := gµν (x)dxµ ⊗ dxν , gµν = gνµ , (10.2)

where dxµ are the coordinate basis dual vectors. We denote by g αβ = g βα

the symmetric array which is inverse to the array of components gαβ :

g αβ gβγ = δγα . (10.3)

It is a basic result from linear algebra that any quadratic form Q can be
put into canonical form by a change of basis (try googling “Sylvester’s law of
inertia”). What this means is that there always exists a basis for the vector
space upon which Q is defined such that the matrix of components of the
quadratic form takes the form

Q = diag(1 , 2 , . . . , n ), (10.4)

where each = ±1, 0 and n = dim(M ). The number of times +1 occurs,

−1 occurs, and 0 occurs cannot be modified by a change of basis and is an
intrinsic feature – the only intrinsic feature – of Q. These numbers determine
the signature of Q.
At each point of M the metric determines a quadratic form on the tangent
space to M at that point. So we can apply the above math facts to the metric
at any given point. First of all, a metric cannot have any $\epsilon_i = 0$ (since the
metric should be non-degenerate). Secondly, if each $\epsilon_i = 1$, then the metric is
called Riemannian. Otherwise, the metric is called pseudo-Riemannian. If a
single $\epsilon_i = -1$ and the rest have $\epsilon_i = 1$ the pseudo-Riemannian metric is called
Lorentzian.1 All this analysis took place at one point, but since we require
$\epsilon_i \neq 0$, the signature cannot change from point to point (assuming the metric
components and hence the signature vary continuously). Thus the signature
is actually a fixed feature of the whole metric tensor field.
Spacetime in general relativity is a Lorentzian manifold – a manifold
equipped with a metric of Lorentz signature. Most of what we do here will
not depend on the signature of the metric; only in a few places will we
assume that the metric is Lorentzian. It is perhaps worth pointing out that
not all manifolds admit a globally defined Lorentzian metric. For example,
the manifold $\mathbb{R}^4$ admits a Lorentzian metric, but the four-sphere ($S^4$) does
not.

1
There is a convention (which we shall not use) where one also calls “Lorentzian” the
case where a single $\epsilon_i = 1$ and the rest are minus one.

On a Riemannian manifold (a manifold equipped with a Riemannian met-
ric) the line element of the metric determines the infinitesimal distance ds
between neighboring points xµ and xµ + dxµ . On a Lorentzian manifold the
line element of the metric represents the invariant spacetime interval and
determines the infinitesimal proper time elapsed along nearby timelike sepa-
rated events and the infinitesimal proper distance between nearby spacelike
separated events.
The metric defines a notion of squared-“length” of vectors and a notion of
“angles” between vectors. Consequently, one can use the metric to define the
difference between two vectors (and other kinds of tensors) at neighboring
spacetime points. Using the metric to define parallelism on the spacetime,
one obtains a derivative operator (or “covariant derivative”) ∇ which is de-
fined as follows. The derivative maps functions to 1-forms using the exterior
derivative:
∇µ φ = ∂µ φ. (10.5)
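The derivative maps vector fields to tensor fields of type $\binom{1}{1}$:

$$ \nabla_\alpha v^\beta = \partial_\alpha v^\beta + \Gamma^\beta_{\gamma\alpha} v^\gamma, \tag{10.6} $$

where the Christoffel symbols are given by

$$ \Gamma^\beta_{\gamma\alpha} = \frac{1}{2} g^{\beta\sigma}\left(\partial_\gamma g_{\alpha\sigma} + \partial_\alpha g_{\gamma\sigma} - \partial_\sigma g_{\alpha\gamma}\right). \tag{10.7} $$

The derivative maps 1-forms to tensor fields of type $\binom{0}{2}$:

$$ \nabla_\alpha w_\beta = \partial_\alpha w_\beta - \Gamma^\gamma_{\beta\alpha} w_\gamma. \tag{10.8} $$

The derivative is extended to all other tensor fields as a derivation (linear,
Leibniz product rule) on the algebra of tensor fields. In particular, the
Christoffel symbols are defined so that the metric determines parallelism; we
have:

$$ \nabla_\alpha g_{\beta\gamma} \equiv \partial_\alpha g_{\beta\gamma} - \Gamma^\sigma_{\beta\alpha} g_{\sigma\gamma} - \Gamma^\sigma_{\gamma\alpha} g_{\beta\sigma} = 0. \tag{10.9} $$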

Problem:

1. Show that (10.7) enforces ∇α gβγ = 0.

Geodesics play an important role in Einstein’s theory of gravity. Mathematically,
geodesics are curves whose tangent vector $T$ is parallel transported
along the curve, $T^\alpha \nabla_\alpha T^\beta = 0$. If the curve is given parametrically by
functions $q^\mu$, i.e., $x^\mu = q^\mu(s)$, where $s$ is proportional to the arc length along the
curve2, then the geodesics are solutions to

$$ \frac{d^2 q^\mu}{ds^2} + \Gamma^\mu_{\alpha\beta}\,\frac{dq^\alpha}{ds}\frac{dq^\beta}{ds} = 0, \tag{10.10} $$

$$ g_{\mu\nu}\,\frac{dq^\mu}{ds}\frac{dq^\nu}{ds} = \kappa, \tag{10.11} $$

where $\kappa$ is a given constant; $\kappa > 0$ for spacelike curves, $\kappa < 0$ for timelike
curves and $\kappa = 0$ for lightlike curves.

Problem:

2. Show that $g_{\mu\nu}\,\frac{dq^\mu}{ds}\frac{dq^\nu}{ds}$ is a constant of motion for (10.10).
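
The conservation law in Problem 2 can be illustrated numerically (this is a check, not a proof). The minimal sketch below integrates the geodesic equation (10.10) on the unit 2-sphere with line element $ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2$. The sphere, the Runge-Kutta integrator, the initial data, and all function names are assumptions made for this example; none of them come from the text.

```python
import math

def geodesic_rhs(state):
    """Right-hand side of (10.10) on the unit 2-sphere.

    state = (theta, phi, dtheta/ds, dphi/ds). The only non-zero Christoffel
    symbols are Gamma^theta_{phi phi} = -sin(theta)cos(theta) and
    Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cot(theta).
    """
    th, _, vth, vph = state
    ath = math.sin(th) * math.cos(th) * vph * vph
    aph = -2.0 * math.cos(th) / math.sin(th) * vth * vph
    return (vth, vph, ath, aph)

def rk4_step(state, ds):
    """One classical fourth-order Runge-Kutta step of size ds."""
    def shifted(k, c):
        return tuple(x + c * dx for x, dx in zip(state, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(k1, 0.5 * ds))
    k3 = geodesic_rhs(shifted(k2, 0.5 * ds))
    k4 = geodesic_rhs(shifted(k3, ds))
    return tuple(x + ds / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))

def kappa(state):
    """The quantity g_{mu nu} (dq^mu/ds)(dq^nu/ds) of (10.11)."""
    th, _, vth, vph = state
    return vth * vth + math.sin(th) ** 2 * vph * vph

def kappa_drift(state, ds=0.005, steps=2000):
    """Largest deviation of kappa from its initial value along the curve."""
    k0 = kappa(state)
    worst = 0.0
    for _ in range(steps):
        state = rk4_step(state, ds)
        worst = max(worst, abs(kappa(state) - k0))
    return worst

if __name__ == "__main__":
    # Initial data chosen so the great circle stays away from the poles.
    print(kappa_drift((1.0, 0.0, 0.3, 0.7)))
```

The drift reported by `kappa_drift` is limited only by the integrator's truncation error, which is consistent with $\kappa$ being a constant of motion.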

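For essentially the same reasons as in Yang-Mills theory, the commutator
of two covariant derivatives defines a tensor field called the Riemann
curvature tensor:

$$ 2\nabla_{[\alpha}\nabla_{\beta]} w_\gamma = R_{\alpha\beta\gamma}{}^{\delta}\, w_\delta, \tag{10.12} $$

where

$$ R_{\alpha\beta\gamma}{}^{\delta} = -2\partial_{[\alpha}\Gamma^{\delta}_{\beta]\gamma} + 2\Gamma^{\epsilon}_{\gamma[\alpha}\Gamma^{\delta}_{\beta]\epsilon}. \tag{10.13} $$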
The curvature tensor satisfies the following identities:

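$$ R_{\alpha\beta\gamma\delta} = -R_{\beta\alpha\gamma\delta} = -R_{\alpha\beta\delta\gamma} = R_{\gamma\delta\alpha\beta}, \tag{10.14} $$

$$ \nabla_{[\sigma} R_{\alpha\beta]\gamma\delta} = 0. \tag{10.15} $$

The latter identity is known as the Bianchi identity.
The Ricci tensor is defined to be

$$ R_{\alpha\beta} = R_{\alpha\gamma\beta}{}^{\gamma}. \tag{10.16} $$

The scalar curvature, or Ricci scalar, is defined to be

$$ R = R_\alpha{}^\alpha = R_{\gamma\alpha}{}^{\gamma\alpha}. \tag{10.17} $$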

2
Or is an “affine parameter” if the curve is lightlike.
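The Weyl tensor is defined (in $n$ dimensions) by

$$ C_{\alpha\beta\gamma\delta} = R_{\alpha\beta\gamma\delta} - \frac{2}{n-2}\left(g_{\alpha[\gamma}R_{\delta]\beta} - g_{\beta[\gamma}R_{\delta]\alpha}\right) + \frac{2}{(n-1)(n-2)}\,R\,g_{\alpha[\gamma}g_{\delta]\beta}. \tag{10.18} $$

The curvature tensor is completely determined by the Weyl tensor, Ricci
tensor, and Ricci scalar through the following formula:

$$ R_{\alpha\beta\gamma\delta} = C_{\alpha\beta\gamma\delta} + \frac{2}{n-2}\left(g_{\alpha[\gamma}R_{\delta]\beta} - g_{\beta[\gamma}R_{\delta]\alpha}\right) - \frac{2}{(n-1)(n-2)}\,R\,g_{\alpha[\gamma}g_{\delta]\beta}. \tag{10.19} $$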

10.2 The Geodesic Hypothesis

Einstein’s Big Idea is that gravitation is a manifestation of the curvature of
spacetime. In particular, the motion of test particles in a gravitational field
is defined to be geodesic motion in the curved spacetime. From this point of
view, the principal manifestation of gravitation is a focusing/defocusing of
families of geodesics. Let us briefly explore this.
For the geodesic hypothesis to work, it must be that any possible initial
conditions for a particle can be evolved uniquely into one of the geodesic
curves. This works because geodesics are solutions xµ = q µ (s) to the geodesic
equation (10.10). Like any consistent system of ODEs, the geodesic equations
have a well-posed initial value problem. What this means is that at any given
point, say $x_0^\mu$, and for any tangent vector $v^\mu$ at this point satisfying (for
a given constant $\kappa$),

$$ g_{\mu\nu}(x_0)\,v^\mu v^\nu = \kappa, \tag{10.20} $$

there exists a unique solution to the geodesic equations (10.10) such that

$$ q^\mu(0) = x_0^\mu, \qquad \frac{dq^\mu(0)}{ds} = v^\mu. \tag{10.21} $$
Here is a useful mathematical result with an important physical interpre-
tation. Pick a spacetime event, i.e., a point p ∈ M . Events q sufficiently close
to p can be labeled using geodesics as follows. Find a geodesic starting at p
which passes through q. This geodesic will be unique if q is in a sufficiently
small neighborhood of p. Let $u^\mu$ be the components of the tangent vector at
p which, along with p, provides the initial data for the geodesic which passes
through q. If the geodesic in question is timelike, then normalize $u^\mu$ to have
length $g_{\alpha\beta}(p)u^\alpha u^\beta = -1$. If the geodesic is spacelike, normalize $u^\mu$ to have
length $+1$. If the geodesic is null, normalization is not an issue. The geodesic
in question will pass through q when the affine parameter $s$ takes some value,
say, $s_0$. Assign to q the coordinates $x^\alpha = s_0 u^\alpha$. It can be shown that this
construction defines a coordinate chart called geodesic normal coordinates.
This chart has the property that

$$ g_{\alpha\beta}(p) = \eta_{\alpha\beta}, \qquad \Gamma^\gamma_{\alpha\beta}(p) = 0. \tag{10.22} $$
These relations will not in general hold away from the origin p of the nor-
mal coordinates. Physically, the geodesics correspond to the local reference
frame of a freely falling observer at p. For a sufficiently small spacetime
region around p the observer sees spacetime geometry in accord with special
relativity insofar as (10.22) is a good approximation for any measurements
made. This is the mathematical manifestation of the equivalence principle
of Einstein. Sometimes (10.22) is characterized by saying that spacetime
is “locally flat”, but you should be warned that this slogan is misleading –
spacetime curvature can never be made to disappear in any reference frame.3
3
Roughly speaking, at a given point the zeroth and first derivative content of the metric
can be adjusted more or less arbitrarily by a choice of coordinates, but (some of the) second
derivative information is immutable.

Let us see what in fact is the physical meaning of spacetime curvature.
Nearby geodesics will converge/diverge from each other according to the
curvature there. This “tidal acceleration” is the principal effect of gravitation
and is controlled by the geodesic deviation equation, which is defined as
follows. Consider a 1-parameter ($\lambda$) family of geodesics, $x^\mu = q^\mu(s, \lambda)$. This
means that, for each $\lambda$ there is a geodesic obeying

$$ \frac{d^2 q^\mu(s,\lambda)}{ds^2} + \Gamma^\mu_{\alpha\beta}\,\frac{dq^\alpha(s,\lambda)}{ds}\frac{dq^\beta(s,\lambda)}{ds} = 0, \tag{10.23} $$

$$ g_{\mu\nu}\,\frac{dq^\mu(s,\lambda)}{ds}\frac{dq^\nu(s,\lambda)}{ds} = \kappa. \tag{10.24} $$

The deviation vector field

$$ r^\mu(s,\lambda) = \frac{\partial q^\mu(s,\lambda)}{\partial\lambda} \tag{10.25} $$

defines the displacement of the geodesic labeled by $\lambda + d\lambda$ relative to the
geodesic labeled by $\lambda$. The relative velocity $v^\mu(s, \lambda)$ of the geodesic labeled
by $\lambda + d\lambda$ relative to the geodesic labeled by $\lambda$ is the directional derivative
of the deviation vector in the direction of the geodesic labeled by $\lambda$:

$$ v^\mu(s,\lambda) = \frac{\partial r^\mu}{\partial s} + \Gamma^\mu_{\nu\sigma}\,r^\nu\,\frac{\partial q^\sigma}{\partial s}. \tag{10.26} $$

The relative acceleration $a^\mu(s, \lambda)$ of the neighboring geodesic labeled by $\lambda + d\lambda$
relative to the geodesic labeled by $\lambda$ is the directional derivative of the relative
velocity in the direction of the geodesic labeled by $\lambda$:

$$ a^\mu(s,\lambda) = \frac{\partial v^\mu}{\partial s} + \Gamma^\mu_{\nu\sigma}\,v^\nu\,\frac{\partial q^\sigma}{\partial s}. \tag{10.27} $$
A nice exercise is to show that:

aµ (s, λ) = Rµ αβγ v α v β rγ . (10.28)

This is the precise sense in which spacetime curvature controls the relative
acceleration of geodesics. Note that while one can erect a freely falling refer-
ence frame such that (10.22) holds at some event, it is not possible to make
the curvature vanish at a point by a choice of reference frame. Thus the
geodesic deviation (10.28) is an immutable feature of the gravitational field.
One can say that gravity is geodesic deviation.

10.3 The Principle of General Covariance

The principle of general covariance is usually stated in physical terms as “the
laws of physics take the same form in all reference frames”. Since physical
laws are typically defined using differential equations, and since reference
frames in physics can be mathematically represented in terms of coordinates
(possibly along with other structures) a pragmatic implementation of the
principle is to require that the differential equations characterizing physi-
cal laws take the same form in all coordinate systems. This means that,
while one must introduce coordinates to define the differential equations, the
equations which are obtained are in fact independent of the choice of the
coordinates. A more elegant global form of this statement is that the field
equations only depend upon the manifold structure of spacetime for their
construction and hence must be unchanged by any transformations which
fix that structure. The Einstein field equations (to be described soon) are,
of course, the paradigm for such “generally covariant field equations”. To
better understand the principle of general covariance it is instructive to look
at a simple and familiar example of a field equation which is not generally
covariant.
Consider the wave equation (massless KG equation) for a scalar field $\varphi$
on a given spacetime $(M, g)$. The wave equation takes the form

$$ \Box\varphi \equiv g^{\alpha\beta}\nabla_\alpha\nabla_\beta\varphi = 0. \tag{10.29} $$

This definition can be used to compute the wave equation in any coordinate
system, but the form of the equation will, in general, be different in
different coordinate systems because the metric and Christoffel symbols –
which must be specified in order to define the equation for $\varphi$ – are different
in different coordinate systems. For example, suppose that the spacetime is
Minkowski spacetime (so the curvature tensor of $g$ vanishes). Then one can
introduce (inertial) coordinates $x^\alpha = (t, x, y, z)$ such that the metric is the
usual Minkowski metric and the wave equation is

$$ \Box\varphi = \left(-\partial_t^2 + \partial_x^2 + \partial_y^2 + \partial_z^2\right)\varphi = 0. \tag{10.30} $$

Now suppose that we use spherical polar coordinates to label events at the
same $t$. Labeling these coordinates as $x^\alpha = (t, r, \theta, \phi)$ we have

$$ \Box\varphi = \left(-\partial_t^2 + \partial_r^2 + \frac{2}{r}\partial_r + \frac{1}{r^2}\partial_\theta^2 + \frac{\cot\theta}{r^2}\partial_\theta + \frac{1}{r^2\sin^2\theta}\partial_\phi^2\right)\varphi = 0. \tag{10.31} $$

Clearly these differential equations are not the same, e.g., one equation has
constant coefficients and the other doesn’t. The KG equation on a given
background spacetime is not “generally covariant”.

Problem:
3. Perhaps the simplest example of a “generally covariant” differential
equation is the general form of the geodesic equation in 2-d Euclidean
space. It arises as the Euler-Lagrange equations for a curve $(x(\lambda), y(\lambda))$
of the Lagrangian:

$$ L = \sqrt{\dot x^2 + \dot y^2}. \tag{10.32} $$

Show that the form of the EL equations does not depend upon the choice
of parameter $\lambda$. (Hint: consider a new parameter $\tau = f(\lambda)$; show that
the equations in terms of $\tau$ are of the same form as those in terms of
$\lambda$, irrespective of the choice of $f$.)
A mathematically precise way to implement the principle of general covariance
in a theory is to demand that all equations defining the theory are
constructed using only the underlying manifold structure. The KG equation
as described above fails to obey this principle since one needs to specify ad-
ditional structure – the metric – in order to define the equation for ϕ. Recall
that two manifolds are equivalent if and only if they are related by a dif-
feomorphism, which is a smooth mapping φ : M → M which has a smooth
inverse φ−1 . The principle of general covariance specifies that the differential
equations of interest will be insensitive to diffeomorphisms.
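In coordinates, a diffeomorphism is a transformation $x^\alpha \to y^\alpha = \phi^\alpha(x)$
with inverse $x^\alpha = \phi^{-1\alpha}(y)$. Let us consider how the metric behaves under
such a transformation. The metric components $\tilde g_{\alpha\beta}$, expressed in terms of $y^\alpha$,
are related to the components in terms of $x^\alpha$ by the “pullback” operation:4

$$ (\phi^* g)_{\alpha\beta}(x) = \frac{\partial\phi^\mu}{\partial x^\alpha}\frac{\partial\phi^\nu}{\partial x^\beta}\,\tilde g_{\mu\nu}(\phi(x)). \tag{10.33} $$

More generally, given any tensor field we can relate its components $(\phi^* T)^{\alpha\ldots}_{\beta\ldots}(x)$
in terms of $x^\mu$ and its components $\tilde T^{\alpha\ldots}_{\beta\ldots}(y)$ in terms of $y^\mu$:

$$ (\phi^* T)^{\alpha\ldots}_{\beta\ldots}(x) = \left[\frac{\partial\phi^\nu}{\partial x^\beta}\cdots\frac{\partial\phi^{-1\alpha}}{\partial y^\mu}\cdots\right]_{y=\phi(x)} \tilde T^{\mu\ldots}_{\nu\ldots}(\phi(x)). \tag{10.34} $$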
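
As a concrete instance of the pullback formula (10.33), here is a short worked example. It is an illustration chosen for this purpose, not taken from the text: the flat metric on the plane, with the map from polar labels $(r, \theta)$ to Cartesian labels $(y^1, y^2)$, restricted to a region with $r > 0$ where the map is smooth and invertible.

```latex
% Assumed example: \phi(r,\theta) = (r\cos\theta,\; r\sin\theta), \tilde g_{\mu\nu} = \delta_{\mu\nu}.
\begin{align*}
\frac{\partial\phi^1}{\partial r} &= \cos\theta, &
\frac{\partial\phi^2}{\partial r} &= \sin\theta, &
\frac{\partial\phi^1}{\partial\theta} &= -r\sin\theta, &
\frac{\partial\phi^2}{\partial\theta} &= r\cos\theta, \\[4pt]
(\phi^* g)_{rr} &= \cos^2\theta + \sin^2\theta = 1, \\
(\phi^* g)_{r\theta} &= \cos\theta\,(-r\sin\theta) + \sin\theta\,(r\cos\theta) = 0, \\
(\phi^* g)_{\theta\theta} &= r^2\sin^2\theta + r^2\cos^2\theta = r^2 .
\end{align*}
% Hence \phi^* g = dr \otimes dr + r^2\, d\theta \otimes d\theta, the familiar polar form.
```

The same geometry is presented by two different component arrays, which is all that (10.33) asserts.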

Now we exhibit (without proof) a fundamental identity involving the curvature.
The curvature formula (10.13) takes a metric5 $g$ in some coordinate
system and produces the curvature tensor $R(g)$ in that same coordinate system.
Given a diffeomorphism $\phi$ – an equivalent presentation of the manifold
– the metric changes to $\phi^* g$. Applying (10.13) we get the important result:

$$ R(\phi^* g) = \phi^* R(g). \tag{10.35} $$

What this formula says in English is: If you apply the curvature formula to
the transformed metric, the result is the same as applying the formula to
the untransformed metric and then transforming the result. In this sense the
curvature formula is “the same” in all coordinate systems or, more elegantly,
is defined from the metric only using the underlying manifold structure. We
say that the curvature tensor is “naturally” defined in terms of the metric.
More generally, any tensor field obtained from the curvature using the
metric and covariant derivatives with the metric-compatible connection will
be “naturally” constructed from the metric. We will call such tensor fields
natural.6

4
One way to see this is to take the line element defined with $y^\alpha$ and express it in terms
of $x^\alpha$ using the mapping $\phi$.
5
We could also phrase this in terms of a connection.

The principle of general covariance is the requirement that the field equa-
tions for spacetime are naturally constructed from the metric (and any other
matter fields which may be present). Just considering spacetime (with no
matter), let the field equations be of the form G = 0, where G = G(g) is
a tensor constructed naturally from the metric $g$. Suppose $g_0$ is a solution
to the field equations, $G(g_0) = 0$. Then, thanks to the property (10.35), we
have

$$ G(\phi^* g_0) = \phi^* G(g_0) = 0, \tag{10.36} $$

so that $\phi^* g_0$ is again a solution. This shows that generally covariant field
theories on a manifold $M$ have a very large group of symmetries – the
diffeomorphism group of $M$.

10.4 The Einstein-Hilbert Action

In this section we shall build the Einstein field equations for the gravitational
field “in vacuum”, that is, in regions of spacetime where there is no matter.
We will obtain the field equations as the Euler-Lagrange equations associ-
ated to a famous variational principle obtained by Hilbert and Einstein. To
implement the principle of general covariance, we need to use a Lagrangian
which is constructed naturally from the metric. Since we aim to vary the
metric to get the field equations, we must take care to identify all places
where the metric is used.
6
Some old-fashioned expositions of this subject simply call such an object a “tensor”.
This term is clearly ambiguous since any field of multi-linear maps on the tangent spaces,
cotangent spaces, and products thereof is, by definition, a tensor field, irrespective of its
naturality. We will use the modern terminology: “natural tensor” to denote things like
the curvature which obey (10.35).

A Lagrangian density is meant to be the integrand in an action integral.
For an integral over a manifold of dimension $n$, the integrand is most properly
viewed as a differential $n$-form – a completely antisymmetric tensor of type
$\binom{0}{n}$. Given a coordinate chart, $x^\alpha$, $\alpha = 0, 1, 2, 3$, the integral over some
region $B$ in that chart of a function $f(x)$ is

$$ \int_B f(x)\,dx^0\wedge dx^1\wedge dx^2\wedge dx^3 \equiv \int_B f(x)\,dx^0\,dx^1\,dx^2\,dx^3. \tag{10.37} $$

This, for example, explains the use of Jacobian determinants to transform
coordinates in an integral. Of course, the result of such an integral is as
arbitrary as are the coordinates used to make the 4-form $dx^0\wedge dx^1\wedge dx^2\wedge dx^3$.
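The metric provides an invariant notion of integration by defining a natural
4-form:

$$ \epsilon = \frac{1}{24}\,\epsilon_{\alpha\beta\gamma\delta}\,dx^\alpha\wedge dx^\beta\wedge dx^\gamma\wedge dx^\delta, \qquad \epsilon_{\alpha\beta\gamma\delta} = \sqrt{|g|}\,\eta_{\alpha\beta\gamma\delta}. \tag{10.38} $$

Here $\sqrt{|g|}$ is the square root of the absolute value of the determinant of the
metric components $g_{\alpha\beta}$ (in the current basis of dual vectors), and $\eta_{\alpha\beta\gamma\delta}$ is the
alternating symbol defined by the properties: (1) $\eta_{\alpha\beta\gamma\delta}$ is totally antisymmetric,
(2) $\eta_{0123} = 1$ (with the coordinate labels $0, 1, 2, 3$ used above). We call $\epsilon$ the
volume form defined by the metric. Notice that all 4-forms in four dimensions
are proportional to one another; here we have

$$ \epsilon = \sqrt{|g|}\,dx^0\wedge dx^1\wedge dx^2\wedge dx^3. \tag{10.39} $$

According to the rules for integrating forms, the integral of the volume form
over a region $B$ is the volume $V(B)$ of $B$ defined by the metric. In coordinates
$x^\alpha$

$$ \int_B \epsilon = \int_B \sqrt{|g|}\,dx^0\,dx^1\,dx^2\,dx^3 = V(B). \tag{10.40} $$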

The principal properties of the volume form (10.38) are contained in the
following problems:

Problems:

4. Show that the volume form (10.38) is a natural tensor.

5. Show that the integral in (10.40) does not depend upon the choice of
coordinates.
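
A low-dimensional numerical illustration of the coordinate independence asserted in Problem 5 may be helpful (it is a check, not a solution). The sketch below computes the area of the unit disk in the Euclidean plane from $\int \sqrt{|g|}$ in two charts: Cartesian coordinates, where $\sqrt{|g|} = 1$, and polar coordinates, where $\sqrt{|g|} = r$. The 2-d setting, the midpoint rule, and the function names are assumptions made for this example.

```python
import math

def integrate_2d(f, a0, a1, b0, b1, n=400):
    """Midpoint-rule estimate of the integral of f(u, v) over [a0,a1] x [b0,b1]."""
    h0 = (a1 - a0) / n
    h1 = (b1 - b0) / n
    total = 0.0
    for i in range(n):
        u = a0 + (i + 0.5) * h0
        for j in range(n):
            v = b0 + (j + 0.5) * h1
            total += f(u, v)
    return total * h0 * h1

def disk_area_cartesian(n=400):
    """sqrt|g| = 1; the disk is cut out of the square by an indicator function."""
    inside = lambda x, y: 1.0 if x * x + y * y <= 1.0 else 0.0
    return integrate_2d(inside, -1.0, 1.0, -1.0, 1.0, n)

def disk_area_polar(n=400):
    """sqrt|g| = r on the coordinate rectangle 0 < r < 1, 0 < theta < 2*pi."""
    return integrate_2d(lambda r, th: r, 0.0, 1.0, 0.0, 2.0 * math.pi, n)

if __name__ == "__main__":
    print(disk_area_cartesian(), disk_area_polar(), math.pi)
```

Both estimates approach $\pi$. The Cartesian one converges more slowly only because the indicator function is discontinuous at the edge of the disk.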

While the volume form by itself is not suitable as a gravitational Lagrangian,
it will be instructive and useful to see how it varies with respect
to variations of the metric. To this end, consider a 1-parameter family of
metrics $g(\lambda)$ passing through a given metric $g \equiv g(0)$. As usual we define

$$ \delta g_{\alpha\beta} = \left.\frac{\partial g_{\alpha\beta}(\lambda)}{\partial\lambda}\right|_{\lambda=0}. \tag{10.41} $$

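For each $\lambda$ there is an inverse metric, which satisfies

$$ g^{\alpha\beta}(\lambda)\,g_{\beta\gamma}(\lambda) = \delta^\alpha_\gamma. \tag{10.42} $$

Differentiating both sides with respect to $\lambda$ gives

$$ \delta g^{\alpha\beta}\,g_{\beta\gamma} + g^{\alpha\beta}\,\delta g_{\beta\gamma} = 0, \tag{10.43} $$

so that

$$ \delta g^{\alpha\beta} = -g^{\alpha\gamma} g^{\beta\delta}\,\delta g_{\gamma\delta}. \tag{10.44} $$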
Henceforth we must remember to make this exception in the usual tensor
notation for raising and lowering indices with the metric. Next, we recall a
couple of results from linear algebra. Let A be a non-singular square matrix.
We have the identity
ln(det(A)) = tr(ln(A)). (10.45)
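Now let $A(\lambda)$ be a non-singular square matrix depending upon a parameter
$\lambda$. It follows from (10.45) that

$$ \frac{d}{d\lambda}\det(A(\lambda)) = \det(A(\lambda))\,\mathrm{tr}\!\left(A^{-1}(\lambda)\,\frac{d}{d\lambda}A(\lambda)\right). \tag{10.46} $$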

Problem:
6. Prove (10.45) and (10.46).
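
Before proving (10.46) it can be reassuring to test it numerically. The sketch below compares a central-difference derivative of $\det A(\lambda)$ with the right-hand side of (10.46) for one $2\times 2$ family. The particular matrix `A(lam)`, the step size, and the helper names are arbitrary choices made for this example and carry no special meaning.

```python
import math

def A(lam):
    """A hypothetical smooth family of 2x2 matrices (non-singular near lam = 0.5)."""
    return [[2.0 + lam, lam * lam], [math.sin(lam), 3.0 - lam]]

def dA(lam):
    """Entry-by-entry derivative of A with respect to lam."""
    return [[1.0, 2.0 * lam], [math.cos(lam), -1.0]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def trace_of_product(p, q):
    """tr(p q) for 2x2 matrices."""
    return sum(p[i][k] * q[k][i] for i in range(2) for k in range(2))

def lhs(lam, h=1e-5):
    """d/d(lam) det A(lam) by central differences."""
    return (det2(A(lam + h)) - det2(A(lam - h))) / (2.0 * h)

def rhs(lam):
    """det(A) tr(A^{-1} dA/d(lam)), the right-hand side of (10.46)."""
    return det2(A(lam)) * trace_of_product(inv2(A(lam)), dA(lam))

if __name__ == "__main__":
    print(lhs(0.5), rhs(0.5))
```

The two numbers agree to within the finite-difference error, as (10.46) requires.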

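From these results it follows that the variation of $g \equiv \det(g_{\alpha\beta})$ is given by

$$ \delta g = g\,g^{\alpha\beta}\,\delta g_{\alpha\beta}. \tag{10.47} $$

Therefore, the variation of the volume form is given by

$$ \delta\epsilon = \frac{1}{2}\,g^{\mu\nu}\,\delta g_{\mu\nu}\,\epsilon. \tag{10.48} $$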
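
The intermediate steps leading from (10.46) to (10.48) can be spelled out as follows. This is a sketch that assumes, as in the text, a family of non-degenerate metrics, so that the sign of the determinant does not change along the family.

```latex
\begin{align*}
% (10.46) with A(\lambda) = (g_{\alpha\beta}(\lambda)) and A^{-1} = (g^{\alpha\beta}), evaluated at \lambda = 0:
\delta g &= g\, g^{\alpha\beta}\,\delta g_{\alpha\beta}
  \quad\Longrightarrow\quad
  \delta |g| = |g|\, g^{\alpha\beta}\,\delta g_{\alpha\beta}, \\
% chain rule for the square root:
\delta\sqrt{|g|} &= \frac{1}{2\sqrt{|g|}}\,\delta |g|
  = \frac{1}{2}\sqrt{|g|}\; g^{\alpha\beta}\,\delta g_{\alpha\beta}, \\
% the coordinate 4-form in (10.39) does not depend on the metric:
\delta\epsilon &= \delta\sqrt{|g|}\;\, dx^0\wedge dx^1\wedge dx^2\wedge dx^3
  = \frac{1}{2}\, g^{\mu\nu}\,\delta g_{\mu\nu}\;\epsilon .
\end{align*}
```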

After the volume form, the next simplest natural Lagrangian density is
the Einstein-Hilbert Lagrangian:

$$ L_{EH} = (\mathrm{const.})\,R\,\epsilon, \tag{10.49} $$

where $R$ is the scalar curvature. Einstein’s theory – including the cosmological
constant $\Lambda$ – arises from a Lagrangian density which is a combination of
the Einstein-Hilbert Lagrangian density and the volume form:

$$ L = \frac{1}{2\kappa}\,(R - 2\Lambda)\,\epsilon, \tag{10.50} $$

where $\kappa$ is given in terms of Newton’s constant $G$ and the speed of light $c$ as

$$ \kappa = \frac{8\pi G}{c^4}. $$

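To compute the Euler-Lagrange expression for $L$ we will need to know
how to compute the variation of the scalar curvature. Here are the results
we will need.7

$$ \delta\Gamma^\gamma_{\alpha\beta} = \frac{1}{2}\,g^{\gamma\delta}\left(\nabla_\alpha\delta g_{\beta\delta} + \nabla_\beta\delta g_{\alpha\delta} - \nabla_\delta\delta g_{\alpha\beta}\right), \tag{10.51} $$

$$ \delta R_{\alpha\beta\gamma}{}^{\delta} = \nabla_\beta\,\delta\Gamma^\delta_{\alpha\gamma} - \nabla_\alpha\,\delta\Gamma^\delta_{\beta\gamma}, \tag{10.52} $$

$$ \delta R_{\alpha\gamma} = \delta R_{\alpha\beta\gamma}{}^{\beta} = \nabla_\beta\,\delta\Gamma^\beta_{\alpha\gamma} - \nabla_\alpha\,\delta\Gamma^\beta_{\beta\gamma}. \tag{10.53} $$

Finally,

$$ \delta R = \delta g^{\alpha\gamma} R_{\alpha\gamma} + g^{\alpha\gamma}\,\delta R_{\alpha\gamma} = -R^{\alpha\gamma}\,\delta g_{\alpha\gamma} + \nabla_\sigma\left(g^{\alpha\gamma}\,\delta\Gamma^\sigma_{\alpha\gamma} - g^{\sigma\gamma}\,\delta\Gamma^\beta_{\beta\gamma}\right). \tag{10.54} $$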

Problem:

7. Derive equations (10.51)–(10.54).

7
All these formulas can be viewed as functions on the jet space of metrics. From this
point of view, the covariant derivative should be viewed as a total derivative plus the
Christoffel terms, e.g., $\nabla_\alpha\omega_\beta = D_\alpha\omega_\beta - \Gamma^\gamma_{\alpha\beta}\,\omega_\gamma$.

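The variation of the Lagrangian for Einstein’s theory is thus given by

$$ \delta L = \frac{1}{2\kappa}\left\{(\delta R)\,\epsilon + (R - 2\Lambda)\,\delta\epsilon\right\} $$

$$ \phantom{\delta L} = \frac{1}{2\kappa}\left\{-\left(R^{\alpha\gamma} - \frac{1}{2}\,g^{\alpha\gamma}R + \Lambda g^{\alpha\gamma}\right)\delta g_{\alpha\gamma} + \nabla_\sigma\left(g^{\alpha\gamma}\,\delta\Gamma^\sigma_{\alpha\gamma} - g^{\sigma\gamma}\,\delta\Gamma^\beta_{\beta\gamma}\right)\right\}\epsilon $$

$$ \phantom{\delta L} = \frac{1}{2\kappa}\left\{-\left(G^{\alpha\gamma} + \Lambda g^{\alpha\gamma}\right)\delta g_{\alpha\gamma} + \nabla_\alpha\Theta^\alpha\right\}\epsilon, \tag{10.55} $$

where

$$ G^{\alpha\beta} = R^{\alpha\beta} - \frac{1}{2}\,R\,g^{\alpha\beta} \tag{10.56} $$

is the celebrated Einstein tensor, and the vector field featuring in the divergence
term is

$$ \Theta^\sigma = g^{\alpha\gamma}\,\delta\Gamma^\sigma_{\alpha\gamma} - g^{\sigma\gamma}\,\delta\Gamma^\beta_{\beta\gamma}. \tag{10.57} $$

The Einstein action for the gravitational field is given by

$$ S_{grav}[g] = \int_M \frac{1}{2\kappa}\,(R - 2\Lambda)\,\epsilon. \tag{10.58} $$

If we consider metric variations with compact support on a manifold $M$,
the functional derivative of the action (also known as the Euler-Lagrange
expression $E$ for the Einstein Lagrangian) is given by

$$ E^{\alpha\beta} \equiv \frac{\delta S_{grav}}{\delta g_{\alpha\beta}} = -\frac{1}{2\kappa}\left(G^{\alpha\beta} + \Lambda g^{\alpha\beta}\right)\epsilon. \tag{10.59} $$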

10.5 Vacuum spacetimes

In the absence of matter, and for a given value of the cosmological constant,
the Einstein equations for the metric are

$$ G^{\alpha\beta} + \Lambda g^{\alpha\beta} = 0. \tag{10.60} $$

These are usually called the “vacuum Einstein equations” (with a cosmolog-
ical constant). Using the definition of the Einstein tensor, and contracting
this equation with gαβ (“taking the trace”), we get (in 4-dimensions)

R = 4Λ. (10.61)
Consequently, the vacuum Einstein equations with a cosmological constant
$\Lambda$ are equivalent to (exercise)

$$ R_{\alpha\beta} = \Lambda g_{\alpha\beta}. \tag{10.62} $$
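
For readers who want to check their work on this exercise, the trace argument can be sketched as follows, writing $n$ for the spacetime dimension and using only (10.56) and (10.60).

```latex
\begin{align*}
% trace of the Einstein tensor, using g_{\alpha\beta} g^{\alpha\beta} = n:
g_{\alpha\beta}\,G^{\alpha\beta} &= R - \tfrac{n}{2}\,R , \\
% trace of (10.60):
0 = g_{\alpha\beta}\left(G^{\alpha\beta} + \Lambda g^{\alpha\beta}\right)
  &= \left(1 - \tfrac{n}{2}\right) R + n\Lambda
  \quad\xrightarrow{\;n = 4\;}\quad R = 4\Lambda , \\
% substitute R = 4\Lambda back into (10.60):
0 = R^{\alpha\beta} - \tfrac{1}{2}(4\Lambda)\,g^{\alpha\beta} + \Lambda g^{\alpha\beta}
  &= R^{\alpha\beta} - \Lambda g^{\alpha\beta} .
\end{align*}
% Conversely, R_{\alpha\beta} = \Lambda g_{\alpha\beta} gives R = 4\Lambda and hence
% G^{\alpha\beta} + \Lambda g^{\alpha\beta} = (\Lambda - 2\Lambda + \Lambda)\, g^{\alpha\beta} = 0.
```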

The vacuum equations are 10 coupled non-linear PDEs for the 10 components
of the metric, in any given coordinate system. Despite their complexity,
many solutions are known. The majority of the known solutions have $\Lambda = 0$,
but here is a pretty famous solution which includes $\Lambda$. It is called the “Kottler
metric”. It is also called the “Schwarzschild-de Sitter metric”. In coordinates
$(t, r, \theta, \phi)$ it corresponds to the line element given by

$$ ds^2 = -\left(1 - \frac{2m}{r} - \frac{\Lambda}{3}\,r^2\right)dt^2 + \frac{1}{1 - \frac{2m}{r} - \frac{\Lambda}{3}\,r^2}\,dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right). \tag{10.63} $$

When m = 0 = Λ this is the metric (in spherical coordinates) of the flat

spacetime used in special relativity (and throughout this text!). When Λ = 0
this is the famous Schwarzschild solution. When m = 0 and Λ > 0 this is
the de Sitter metric. When m = 0 and Λ < 0 this is the anti-de Sitter
metric. The metric (10.63) can be physically interpreted as defining the
exterior gravitational field of a spherical body embedded in a space which is
(anti) de Sitter at large distances from the body.

Problem:

8. Use your favorite tensor analysis software (e.g., DifferentialGeometry
in Maple) to verify that (10.63) is, for any $m$, a solution to the vacuum
Einstein equations (10.60) with cosmological constant $\Lambda$.
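
If no tensor package is at hand, a rough numerical spot check is possible with nothing but the Python standard library. The sketch below is not a substitute for the symbolic verification the problem asks for: it evaluates the Christoffel symbols (10.7) and the Ricci tensor by nested central differences at a single sample event and compares with the equivalent form (10.62), $R_{\alpha\beta} = \Lambda g_{\alpha\beta}$. All function names, the step size `H`, and the sample values of $m$, $\Lambda$ and the event are assumptions made for this example, and the Christoffel routine is written for diagonal metrics only.

```python
import math

H = 3e-4  # finite-difference step (assumed; balances truncation and round-off)

def kottler_metric(x, m, lam):
    """Covariant components g_ab of (10.63) at x = (t, r, theta, phi)."""
    _, r, th, _ = x
    f = 1.0 - 2.0 * m / r - lam * r * r / 3.0
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0], g[1][1] = -f, 1.0 / f
    g[2][2], g[3][3] = r * r, (r * math.sin(th)) ** 2
    return g

def shifted(x, c, step):
    """Copy of the point x with coordinate c displaced by step."""
    y = list(x)
    y[c] += step
    return y

def christoffel(metric, x):
    """gam[a][b][c] = Gamma^a_{bc} as in (10.7); diagonal metrics only."""
    g = metric(x)
    dg = []  # dg[c][a][b] = d g_ab / d x^c
    for c in range(4):
        gp, gm = metric(shifted(x, c, H)), metric(shifted(x, c, -H))
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * H) for b in range(4)]
                   for a in range(4)])
    return [[[0.5 / g[a][a] * (dg[b][a][c] + dg[c][a][b] - dg[a][b][c])
              for c in range(4)] for b in range(4)] for a in range(4)]

def ricci(metric, x):
    """R_bd = d_a Gam^a_bd - d_d Gam^a_ab + Gam^a_ae Gam^e_bd - Gam^a_de Gam^e_ab."""
    gam = christoffel(metric, x)
    dgam = []  # dgam[e][a][b][c] = d Gamma^a_{bc} / d x^e
    for e in range(4):
        gp = christoffel(metric, shifted(x, e, H))
        gm = christoffel(metric, shifted(x, e, -H))
        dgam.append([[[(gp[a][b][c] - gm[a][b][c]) / (2 * H) for c in range(4)]
                      for b in range(4)] for a in range(4)])
    ric = [[0.0] * 4 for _ in range(4)]
    for b in range(4):
        for d in range(4):
            total = 0.0
            for a in range(4):
                total += dgam[a][a][b][d] - dgam[d][a][a][b]
                for e in range(4):
                    total += gam[a][a][e] * gam[e][b][d] - gam[a][d][e] * gam[e][a][b]
            ric[b][d] = total
    return ric

def vacuum_residual(m, lam, x):
    """Largest component of |R_ab - lam * g_ab| at the event x."""
    metric = lambda y: kottler_metric(y, m, lam)
    g, ric = metric(x), ricci(metric, x)
    return max(abs(ric[a][b] - lam * g[a][b]) for a in range(4) for b in range(4))

if __name__ == "__main__":
    print(vacuum_residual(1.0, 0.01, (0.0, 5.0, 1.0, 0.0)))
```

A residual at the level of the finite-difference error is consistent with (10.63) solving the vacuum equations; a symbolic computation is still needed to establish it for all $m$, $\Lambda$ and all events.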

10.6 Diffeomorphism symmetry and the contracted Bianchi identity
The Euler-Lagrange expression computed from the gravitational Lagrangian
satisfies an important identity by virtue of its “naturality” (also known as
“general covariance” or “diffeomorphism invariance”). One way to see this
is as follows.
To begin with, the Einstein-Hilbert action is invariant under diffeomorphisms
of $M$ in the following sense. Let $\phi : M \to M$ be a diffeomorphism.
Compute the action

$$ S[g] = \int_M L(g) \tag{10.64} $$

for two different metrics, $g$ and $\phi^* g$. Because of naturality of the Lagrangian
and the fact that manifolds are invariant under diffeomorphisms8, we have

$$ S[\phi^* g] = \int_M L(\phi^* g) = \int_M \phi^*(L(g)) = \int_{\phi(M)} L(g) = \int_M L(g) = S[g]. \tag{10.65} $$
Thus the Einstein action has a diffeomorphism symmetry.
Next we need an important fact about the transformation of a metric
by infinitesimal diffeomorphisms.9 Consider a 1-parameter family of diffeomorphisms,
$\phi_\lambda$, with $\phi_0$ being the identity. We specialize to infinitesimal
diffeomorphisms, characterized by $\lambda \ll 1$. In coordinates $x^\alpha$ we have

$$ \phi^\alpha_\lambda(x) = x^\alpha + \lambda v^\alpha(x) + O(\lambda^2). \tag{10.66} $$

It can be shown that $\phi_\lambda$ is completely determined by the vector field $v^\alpha$,
which is called the infinitesimal generator of the 1-parameter family of diffeomorphisms.
A fundamental result of tensor analysis is that

$$ \phi^*_\lambda g_{\alpha\beta} = g_{\alpha\beta} + \lambda\left(\nabla_\alpha v_\beta + \nabla_\beta v_\alpha\right) + O(\lambda^2), \tag{10.67} $$

where the infinitesimal variation in the metric induced by the family of diffeomorphisms
is given by the Lie derivative $L_v$:

$$ \delta g_{\alpha\beta} = L_v g_{\alpha\beta} = \nabla_\alpha v_\beta + \nabla_\beta v_\alpha. \tag{10.68} $$
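
The step from the pullback formula (10.33) to the Lie derivative can be made explicit with a first-order expansion. The following is a sketch, with $\phi_\lambda$ regarded as a map of $M$ to itself so that the same component functions $g_{\mu\nu}$ appear on both sides.

```latex
\begin{align*}
% insert \phi^\mu_\lambda(x) = x^\mu + \lambda v^\mu(x) + O(\lambda^2) into (10.33):
(\phi_\lambda^* g)_{\alpha\beta}(x)
 &= \left(\delta^\mu_\alpha + \lambda\,\partial_\alpha v^\mu\right)
    \left(\delta^\nu_\beta + \lambda\,\partial_\beta v^\nu\right)
    g_{\mu\nu}(x + \lambda v) + O(\lambda^2) \\
 &= g_{\alpha\beta}
    + \lambda\left(v^\gamma\partial_\gamma g_{\alpha\beta}
    + g_{\gamma\beta}\,\partial_\alpha v^\gamma
    + g_{\alpha\gamma}\,\partial_\beta v^\gamma\right) + O(\lambda^2), \\[4pt]
% compare with the covariant expression, using v_\beta = g_{\beta\gamma} v^\gamma and (10.7):
\nabla_\alpha v_\beta + \nabla_\beta v_\alpha
 &= \partial_\alpha v_\beta + \partial_\beta v_\alpha - 2\,\Gamma^\gamma_{\alpha\beta}\, v_\gamma \\
 &= g_{\beta\gamma}\,\partial_\alpha v^\gamma + g_{\alpha\gamma}\,\partial_\beta v^\gamma
    + v^\gamma\partial_\gamma g_{\alpha\beta} .
\end{align*}
% The O(\lambda) terms agree, which is the content of (10.67).
```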
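With that little mathematical result in hand, consider the variation of
the Einstein action,

$$ S_{grav} = \int_M \frac{1}{2\kappa}\,(R - 2\Lambda)\,\epsilon \tag{10.69} $$

induced by an infinitesimal diffeomorphism. We already know that, for any
variation $\delta g_{\alpha\beta}$ (with compact support away from the boundary of $M$),

$$ \delta S_{grav} = -\frac{1}{2\kappa}\int_M \left(G^{\alpha\beta} + \Lambda g^{\alpha\beta}\right)\delta g_{\alpha\beta}\,\epsilon. \tag{10.70} $$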
8
Our discussion will also apply to the case where we have a manifold with boundary
provided we restrict attention to diffeomorphisms which preserve the boundary.
9
This result holds for any tensor of type $\binom{0}{2}$, and it generalizes to tensor fields of any
type via the Lie derivative.

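For a variation (10.68) due to an infinitesimal diffeomorphism we have

$$ \delta S_{grav} = -\frac{1}{2\kappa}\int_M \left(G^{\alpha\beta} + \Lambda g^{\alpha\beta}\right)\left(\nabla_\alpha v_\beta + \nabla_\beta v_\alpha\right)\epsilon $$

$$ \phantom{\delta S_{grav}} = -\frac{1}{\kappa}\int_M \left(G^{\alpha\beta} + \Lambda g^{\alpha\beta}\right)\nabla_\alpha v_\beta\,\epsilon $$

$$ \phantom{\delta S_{grav}} = \frac{1}{\kappa}\int_M \left[\nabla_\alpha\left(G^{\alpha\beta} + \Lambda g^{\alpha\beta}\right)v_\beta - \nabla_\alpha k^\alpha\right]\epsilon $$

$$ \phantom{\delta S_{grav}} = \frac{1}{\kappa}\int_M \nabla_\alpha\left(G^{\alpha\beta} + \Lambda g^{\alpha\beta}\right)v_\beta\,\epsilon - \frac{1}{\kappa}\int_{\partial M} k\cdot\epsilon, \tag{10.71} $$

where

$$ (k\cdot\epsilon)_{\beta\gamma\delta} = k^\alpha\,\epsilon_{\alpha\beta\gamma\delta}, \tag{10.72} $$

and

$$ k^\alpha = \left(G^{\alpha\beta} + \Lambda g^{\alpha\beta}\right)v_\beta. \tag{10.73} $$

The last equality in (10.71) follows from the divergence theorem, which can
be expressed as:

$$ \int_M \left(\nabla_\alpha k^\alpha\right)\epsilon = \int_{\partial M} k\cdot\epsilon. \tag{10.74} $$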
If the manifold M has no boundary, then the boundary term is absent. The
boundary could be “at infinity”, which can be the limit as r → ∞ of a sphere
of radius r. We assume the diffeomorphism is the identity transformation on
the boundary, which means the vector field v α vanishes there (at a fast enough
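rate, in the sphere at infinity case). Either way, the boundary term is absent
and we have the important identity:

$$ \delta S_{grav} = \frac{1}{\kappa}\int_M \nabla_\alpha\left(G^{\alpha\beta} + \Lambda g^{\alpha\beta}\right)v_\beta\,\epsilon = \frac{1}{\kappa}\int_M \nabla_\alpha G^{\alpha\beta}\,v_\beta\,\epsilon, \tag{10.75} $$

where we used $\nabla_\alpha g_{\beta\gamma} = 0$.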
Now we finish the argument. We know a priori the action is invariant
under diffeomorphisms. This means that for infinitesimal diffeomorphisms
(which are trivial on the boundary of M )
δSgrav = 0. (10.76)
On the other hand, we have the identity (10.75), which must vanish for any
choice of the vector field v α thanks to (10.76). Evidently, using the usual
calculus of variations reasoning we must have the identity
∇α Gαβ = 0. (10.77)
This is the contracted Bianchi identity; it can be derived from the Bianchi
identity (10.15).

Problem:

9. Derive (10.77) from (10.15).

While (10.77) can be understood as a consequence of (10.15), we now

know its origins can also be traced to the diffeomorphism invariance of the
Einstein action. Indeed, from our preceding discussion it is easy to see that
any natural Lagrangian will give rise to an Euler-Lagrange expression E αβ =
E βα which satisfies the identity

∇α E αβ = 0. (10.78)

Finally, you might want to re-read §5.5. Then you will see that the discussion
above is just an instance of Noether’s second theorem.

10.7 Coupling to matter - scalar fields

In Einstein’s theory of gravity “matter” is any material phenomenon which
admits an energy-momentum tensor. In the limit where matter’s influence
on gravitation can be ignored, and when modeling matter as made of “test
particles”, the geodesic hypothesis describes the motion of matter in a given
gravitational field. But to understand where the gravitational field comes
from in the first place one needs to go beyond the test particle approxima-
tion and consider the interacting system of gravity coupled to matter. In
this setting, matter curves spacetime and spacetime affects the motion of the
matter.10 Here we will briefly outline how to use classical field theory tech-
niques to obtain equations which model the curving of spacetime by matter
and the propagation of matter in the curved spacetime.
The two principal phenomenological models of classical (non-quantum)
matter which are used in gravitational physics to model macroscopic phenomena
in the “real world” are fluids and electromagnetic fields. A scalar
field also provides a sort of fluid source for gravitation – one with a “stiff
equation of state” – and also provides a nice simple warm-up for incorporation
of electromagnetic sources. Of course, we are very familiar with scalar
fields by now and, since this is a course in field theory, let us begin with this
matter field.11

10
It is worth mentioning that a parallel set of comments can be made about electric
charges and currents in electromagnetic theory. Think about it!

We will use a Klein-Gordon field $\varphi$. We begin by recalling the definition
of its Lagrangian as a top-degree differential form on any curved spacetime
$(M, g)$:

$$ L_{KG} = -\frac{1}{2}\left(g^{\alpha\beta}\nabla_\alpha\varphi\nabla_\beta\varphi + m^2\varphi^2\right)\epsilon. \tag{10.79} $$

The Euler-Lagrange equation for $\varphi$ is a curved spacetime generalization of
the original Klein-Gordon equation:

$$ \nabla^\alpha\nabla_\alpha\varphi - m^2\varphi = 0. \tag{10.80} $$

You can think of the appearance of the metric in this equation (in the co-
variant derivative term) as bringing into play the effect of the gravitational
field on the “motion” of the scalar field.

Problem:
10. The following line element represents a class of “big bang” cosmological
models, characterized by a scale factor a(t).

ds2 = −dt2 + a(t)2 (dx2 + dy 2 + dz 2 ). (10.81)

Compute the Klein-Gordon equation (10.80). Can you find a solution?

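The Lagrangian (10.79) describes the effect of gravity on the field $\varphi$.
To allow the field to serve as a source of gravity we add the gravitational
Lagrangian (10.50) to get

$$ L = \frac{1}{2\kappa}\,(R - 2\Lambda)\,\epsilon - \frac{1}{2}\left(g^{\alpha\beta}\nabla_\alpha\varphi\nabla_\beta\varphi + m^2\varphi^2\right)\epsilon. \tag{10.82} $$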
11
The only fundamental scalar field known to actually exist is the Higgs field, and it
should be treated quantum mechanically. Bound states of quarks called “mesons” exist
which have spin zero and are represented via scalar fields, but this is principally in the
quantum domain. So you should think of the following discussion in the context of a
classical limit of a quantum theory or as a phenomenological study of matter with a
simple relativistic fluid model built from a scalar field, or as a humble version of the
electromagnetic field.

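There are now EL equations for the metric and for the scalar field. The EL
equation for the scalar field is (10.80), given above. The EL equations for
the metric take the form

$$ -\frac{1}{2\kappa}\left(G^{\alpha\gamma} + \Lambda g^{\alpha\gamma}\right) + \frac{1}{2}\,T^{\alpha\gamma} = 0, \tag{10.83} $$

or

$$ G^{\alpha\beta} + \Lambda g^{\alpha\beta} = \kappa T^{\alpha\beta}, \tag{10.84} $$

where

$$ T^{\alpha\gamma} = \nabla^\alpha\varphi\nabla^\gamma\varphi - \frac{1}{2}\,g^{\alpha\gamma}\nabla_\beta\varphi\nabla^\beta\varphi - \frac{1}{2}\,g^{\alpha\gamma}m^2\varphi^2 \tag{10.85} $$

is the energy-momentum tensor of the scalar field. The equations (10.84) are
the celebrated Einstein field equations.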

Problem:

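11. With

$$ S_{KG}[g, \varphi] = -\int_M \frac{1}{2}\left(g^{\alpha\beta}\nabla_\alpha\varphi\nabla_\beta\varphi + m^2\varphi^2\right)\epsilon \tag{10.86} $$

compute the energy-momentum tensor of the scalar field via

$$ T^{\alpha\beta} = \frac{2}{\sqrt{|g|}}\,\frac{\delta S_{KG}}{\delta g_{\alpha\beta}} \tag{10.87} $$

and verify (10.85).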

Note that here the energy-momentum tensor is not defined via Noether’s
theorem but instead via the coupling of the scalar field to gravity. States of
the combined system of matter (modeled as a scalar field) interacting with
gravity satisfy the coupled system of 11 non-linear PDEs given by (10.80)
and (10.84), known as the Einstein-Klein-Gordon (or Einstein-scalar) system
of equations.

Problem:
10.8. THE CONTRACTED BIANCHI IDENTITY REVISITED 243

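12. Show that the following metric and scalar field, given in the coordinate
chart (t, r, θ, φ), define a solution to the Einstein-scalar field equations
for m = 0.
$$
g = -r^2\,dt\otimes dt + \frac{2}{1 - \frac{2}{3}\Lambda r^2}\,dr\otimes dr + r^2\left(d\theta\otimes d\theta + \sin^2\theta\,d\varphi\otimes d\varphi\right), \tag{10.88}
$$
$$
\phi = \pm\frac{1}{\sqrt{\kappa}}\,t + \text{const.} \tag{10.89}
$$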
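
A claim like this one can also be checked by machine. The following is a minimal sketch in Python with SymPy (my choice of tool; any tensor analysis software would do). It takes the upper sign in (10.89) with the constant set to zero, uses the field equations in the form (10.84) with indices lowered, and assumes the curvature convention $R_{\beta\gamma} = \partial_\alpha\Gamma^\alpha_{\beta\gamma} - \partial_\gamma\Gamma^\alpha_{\alpha\beta} + \Gamma^\alpha_{\alpha\delta}\Gamma^\delta_{\beta\gamma} - \Gamma^\alpha_{\gamma\delta}\Gamma^\delta_{\alpha\beta}$. All variable and function names are illustrative.

```python
import sympy as sp

t, r, th, ph = sp.symbols('t r theta phi', positive=True)
Lam, kappa = sp.symbols('Lambda kappa', positive=True)
x = [t, r, th, ph]
n = 4

# Metric (10.88) and scalar field (10.89), upper sign, constant set to zero
g = sp.diag(-r**2, 2 / (1 - sp.Rational(2, 3) * Lam * r**2),
            r**2, r**2 * sp.sin(th)**2)
ginv = g.inv()
field = t / sp.sqrt(kappa)

# Christoffel symbols: Gamma[a][b][c] = Gamma^a_{bc}
Gamma = [[[sp.simplify(sum(ginv[a, d] * (sp.diff(g[d, b], x[c])
                                         + sp.diff(g[d, c], x[b])
                                         - sp.diff(g[b, c], x[d]))
                           for d in range(n)) / 2)
           for c in range(n)] for b in range(n)] for a in range(n)]

def ricci(b, c):
    """Ricci tensor component R_{bc} in the convention stated above."""
    expr = 0
    for a in range(n):
        expr += sp.diff(Gamma[a][b][c], x[a]) - sp.diff(Gamma[a][a][b], x[c])
        for d in range(n):
            expr += Gamma[a][a][d] * Gamma[d][b][c] - Gamma[a][c][d] * Gamma[d][a][b]
    return sp.simplify(expr)

Ric = sp.Matrix([[ricci(b, c) for c in range(n)] for b in range(n)])
R = sp.simplify(sum(ginv[a, b] * Ric[a, b] for a in range(n) for b in range(n)))
G = Ric - R * g / 2                      # Einstein tensor G_{ab}

# Energy-momentum tensor (10.85) with m = 0, indices lowered
dphi = [sp.diff(field, xa) for xa in x]
grad2 = sum(ginv[a, b] * dphi[a] * dphi[b] for a in range(n) for b in range(n))
T = sp.Matrix([[dphi[a] * dphi[b] - g[a, b] * grad2 / 2 for b in range(n)]
               for a in range(n)])

# Residual of the Einstein equations (10.84); should vanish identically
einstein_residual = (G + Lam * g - kappa * T).applyfunc(sp.simplify)

# Massless Klein-Gordon operator via |g|^(-1/2) d_a(|g|^(1/2) g^{ab} d_b phi)
sqrt_g = sp.sqrt(-g.det())
kg_residual = sp.simplify(sum(sp.diff(sqrt_g * ginv[a, b] * dphi[b], x[a])
                              for a in range(n) for b in range(n)) / sqrt_g)

print(einstein_residual, kg_residual)
```

A machine check of this kind confirms the result but does not replace the hand computation, which shows how the factor 2/3 in (10.88) is forced by the radial component of the field equations.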

10.8 The contracted Bianchi identity revisited

We saw earlier that the Einstein tensor satisfies the contracted Bianchi identity,
$\nabla_\alpha G^{\alpha\beta} = 0$. Evidently, if one manages to find a metric and scalar field
satisfying the Einstein field equations (10.84) then the energy-momentum
tensor of the scalar field must be divergence free,
$$
\nabla_\alpha T^{\alpha\beta} = 0, \quad \text{for solutions to the field equations.} \tag{10.90}
$$
You can think of this as a necessary condition for a metric and scalar field to
satisfy the Einstein-scalar field system. Let me show you how this condition
on the energy-momentum tensor is guaranteed by the KG equation via the
diffeomorphism invariance of the action for the metric and scalar field.¹²
The analysis is very much like the one we did for the vacuum theory.
Return to the Lagrangian (10.82) and form the action integral over some
region M :
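$$
S[g, \phi] = \int_M \left[\frac{1}{2\kappa}(R - 2\Lambda) - \frac{1}{2}\left(g^{\alpha\beta}\nabla_\alpha\phi\nabla_\beta\phi + m^2\phi^2\right)\right]. \tag{10.91}
$$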
As before, consider a diffeomorphism Ψ : M → M supported away from the
boundary of M . The change in the metric and scalar field induced by a
diffeomorphism is via the pullback operation, discussed earlier. Because the
Lagrangian is natural, we have that diffeomorphisms are a symmetry group
of the action:
$$
S[\Psi^* g, \Psi^*\phi] = S[g, \phi]. \tag{10.92}
$$

¹² You should compare this discussion with its analog in scalar electrodynamics in §6.3.

Given a 1-parameter family of diffeomorphisms, we can define an infinitesimal
diffeomorphism via a vector field $v^\alpha$ on $M$. The infinitesimal changes in the
metric and scalar field are given by the Lie derivatives:
$$
\delta g_{\alpha\beta} = L_v g_{\alpha\beta} = \nabla_\alpha v_\beta + \nabla_\beta v_\alpha, \qquad \delta\phi = L_v\phi = v^\alpha\nabla_\alpha\phi. \tag{10.93}
$$
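
The equality $L_v g_{\alpha\beta} = \nabla_\alpha v_\beta + \nabla_\beta v_\alpha$ can be spot-checked symbolically. The sketch below, in Python with SymPy, does this for one example only: the flat metric in polar coordinates with an arbitrary vector field. The choice of example and all names are mine, and the coordinate formula used for the Lie derivative is the standard one, $(L_v g)_{\alpha\beta} = v^\gamma\partial_\gamma g_{\alpha\beta} + g_{\gamma\beta}\partial_\alpha v^\gamma + g_{\alpha\gamma}\partial_\beta v^\gamma$.

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = [r, th]
n = 2

g = sp.diag(1, r**2)                      # flat metric in polar coordinates
ginv = g.inv()

# Arbitrary vector field with components v^a(r, theta)
v = [sp.Function('v_r')(r, th), sp.Function('v_theta')(r, th)]
v_low = [sum(g[a, b] * v[b] for b in range(n)) for a in range(n)]  # v_a

def Gamma(c, a, b):
    """Christoffel symbol Gamma^c_{ab} of the metric g."""
    return sum(ginv[c, d] * (sp.diff(g[d, a], x[b]) + sp.diff(g[d, b], x[a])
                             - sp.diff(g[a, b], x[d])) for d in range(n)) / 2

def nabla_v(a, b):
    """Covariant derivative nabla_a v_b."""
    return sp.diff(v_low[b], x[a]) - sum(Gamma(c, a, b) * v_low[c] for c in range(n))

def lie_g(a, b):
    """Lie derivative (L_v g)_{ab} from the coordinate formula."""
    return (sum(v[c] * sp.diff(g[a, b], x[c]) for c in range(n))
            + sum(g[c, b] * sp.diff(v[c], x[a]) for c in range(n))
            + sum(g[a, c] * sp.diff(v[c], x[b]) for c in range(n)))

# Difference between the two sides of the identity, component by component
residual = sp.Matrix([[sp.simplify(lie_g(a, b) - nabla_v(a, b) - nabla_v(b, a))
                       for b in range(n)] for a in range(n)])
print(residual)
```

Replacing `g` by another symmetric, invertible matrix of functions of the coordinates lets you repeat the check for a different metric.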

For any variations of compact support the change in the action is

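$$
\delta S = \int_M \left[\left(\frac{1}{2\kappa}\left\{-G^{\alpha\gamma} - \Lambda g^{\alpha\gamma}\right\} + \frac{1}{2}T^{\alpha\gamma}\right)\delta g_{\alpha\gamma} + \left(\nabla^\alpha\nabla_\alpha\phi - m^2\phi\right)\delta\phi\right]. \tag{10.94}
$$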
For variations corresponding to infinitesimal diffeomorphisms the action is
unchanged by the argument we used earlier, so we have the identity
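\begin{align}
0 = \int_M \Big\{ & \left(\frac{1}{2\kappa}\left\{-G^{\alpha\gamma} - \Lambda g^{\alpha\gamma}\right\} + \frac{1}{2}T^{\alpha\gamma}\right)\left(\nabla_\alpha v_\gamma + \nabla_\gamma v_\alpha\right) \tag{10.95} \\
& + \left(\nabla^\alpha\nabla_\alpha\phi - m^2\phi\right)v^\alpha\nabla_\alpha\phi \Big\} \tag{10.96}
\end{align}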

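This identity must hold for all $v^\alpha$ of compact support. Integrating by parts
via the divergence theorem, with the boundary terms vanishing because $v^\alpha$
is of compact support, we can remove the derivatives of the vector field $v^\alpha$
to get
$$
\int_M \left[-\nabla_\alpha\!\left(\frac{1}{\kappa}\left\{-G^{\alpha\gamma} - \Lambda g^{\alpha\gamma}\right\} + T^{\alpha\gamma}\right)v_\gamma + \left(\nabla^\alpha\nabla_\alpha\phi - m^2\phi\right)v^\alpha\nabla_\alpha\phi\right] = 0, \tag{10.97}
$$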
which can be simplified (using the contracted Bianchi identity and covariant
constancy of the metric) to
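$$
\int_M \left[\nabla_\alpha T^{\alpha}{}_{\beta} - \left(\nabla^\alpha\nabla_\alpha\phi - m^2\phi\right)\nabla_\beta\phi\right]v^\beta = 0. \tag{10.98}
$$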

Since $v^\alpha$ is arbitrary on the interior of $M$, it follows that
$$
\nabla_\alpha T^{\alpha}{}_{\beta} - \left(\nabla^\alpha\nabla_\alpha\phi - m^2\phi\right)\nabla_\beta\phi = 0. \tag{10.99}
$$
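
The integration by parts that removes the derivatives of the vector field may go by quickly, so here is a sketch of it. Write $E^{\alpha\gamma}$ (a label introduced here for convenience, not the text's notation) for the symmetric tensor multiplying $\delta g_{\alpha\gamma}$ in (10.94), and assume the integral is taken against the metric volume element, so that the divergence theorem applies to covariant divergences of vector fields.

```latex
\begin{align*}
E^{\alpha\gamma}\left(\nabla_\alpha v_\gamma + \nabla_\gamma v_\alpha\right)
  &= 2E^{\alpha\gamma}\nabla_\alpha v_\gamma
  && \text{(symmetry of } E^{\alpha\gamma}\text{)} \\
  &= \nabla_\alpha\!\left(2E^{\alpha\gamma}v_\gamma\right) - 2\left(\nabla_\alpha E^{\alpha\gamma}\right)v_\gamma
  && \text{(Leibniz rule)}
\end{align*}
```

The first term on the last line is the covariant divergence of a vector field of compact support, so it integrates to zero. In the second term, the contracted Bianchi identity and $\nabla_\alpha g^{\alpha\gamma} = 0$ remove the curvature and cosmological pieces of $E^{\alpha\gamma}$, leaving only $-(\nabla_\alpha T^{\alpha\gamma})v_\gamma$. Combining this with the scalar field term and using the arbitrariness of $v^\alpha$ gives (10.99).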

Problem:
13. Show by direct calculation of the divergence of (10.85) that it satisfies
the identity (10.99).

From this identity it is easy to see that any solution (g, ϕ) to the KG
equation (10.80) will obey the compatibility condition (10.90). Recall that
(10.90) was a necessary condition implied by the Einstein field equations
(10.84) and the contracted Bianchi identity. In this way the coupled Einstein-scalar
field equations are compatible. But it is even more interesting to note
that any solution (g, ϕ) to the Einstein field equations (10.84) alone, provided
∇ϕ ≠ 0, will automatically satisfy the KG equation (10.80)! The logic goes
as follows. If (g, ϕ) define a solution to (10.84) then by the Bianchi identity
the energy-momentum tensor built from the solution (g, ϕ) has vanishing
covariant divergence. But then the identity (10.99) and the assumption that
ϕ is not constant mean that (g, ϕ) must satisfy (10.80). So, if we opt to
work on the space of non-constant functions ϕ only, which is reasonable
since nonzero constant functions cannot satisfy the KG equation with m ≠ 0, then
the equations of motion for matter are already contained in the Einstein field
equations!
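
Spelled out as a chain of implications, the argument looks as follows. This is only a sketch; it uses nothing beyond (10.84), the contracted Bianchi identity, covariant constancy of the metric, and the identity (10.99).

```latex
\begin{align*}
G^{\alpha\beta} + \Lambda g^{\alpha\beta} = \kappa T^{\alpha\beta}
  &\;\Longrightarrow\; \kappa\,\nabla_\alpha T^{\alpha\beta}
     = \nabla_\alpha G^{\alpha\beta} + \Lambda\,\nabla_\alpha g^{\alpha\beta} = 0 \\
  &\;\Longrightarrow\; \left(\nabla^\alpha\nabla_\alpha\phi - m^2\phi\right)\nabla^\beta\phi = 0
     \quad \text{by (10.99)} \\
  &\;\Longrightarrow\; \nabla^\alpha\nabla_\alpha\phi - m^2\phi = 0
     \quad \text{wherever } \nabla\phi \neq 0 .
\end{align*}
```

The last step is pointwise: at a point where ∇ϕ vanishes the identity gives no information, which is why the hypothesis on ∇ϕ is needed.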
The divergence condition $\nabla_\alpha T^{\alpha\beta} = 0$ is closely related to – but not quite
the same as – conservation of energy and momentum. You will recall that
in flat spacetime conservation of energy-momentum corresponds to the fact
that the energy-momentum tensor is divergence free, which is interpreted as
providing a divergence-free collection of currents. Here we do not have an
ordinary divergence of a vector field, but rather a covariant divergence of a
tensor field:
$$
\nabla_\alpha T^{\alpha\beta} = \partial_\alpha T^{\alpha\beta} + \Gamma^\alpha_{\alpha\sigma}T^{\sigma\beta} + \Gamma^\beta_{\alpha\sigma}T^{\alpha\sigma}. \tag{10.100}
$$
For this reason one cannot use the divergence theorem to convert (10.90) into
a true conservation law.¹³ Physically, this state of affairs reflects the fact that
the matter field ϕ can exchange energy and momentum with the gravitational
field; since the energy-momentum tensor pertains only to the matter field
it need not be conserved by itself. In a freely falling reference frame the
effects of gravity disappear to some extent and one might expect at least
an approximate conservation of energy-momentum of matter. This idea can
be made mathematically precise by working in geodesic normal coordinates
(defined earlier). At the origin of such coordinates the Christoffel symbols
vanish and the coordinate basis vectors can be used to define divergence-free
currents at the origin. So, in an infinitesimally small region around the
origin of such a coordinate system – physically, in a suitably small freely falling
reference frame – one can define an approximate set of conservation laws for
the energy-momentum (and angular momentum) of matter. But because
gravity can carry energy and momentum this conservation law cannot be
extended to any finite region. The crux of the matter is that there is no
useful way to define gravitational energy-momentum densities – there is no
suitable energy-momentum current for gravity. Indeed, if you had such a
current, you ought to be able to make it vanish at the origin of normal
coordinates, whence it can’t be a purely geometrical quantity or it should
vanish in any coordinates. In fact, it is possible to prove that any vector field
locally constructed from the metric and its derivatives to any order which
is divergence-free when the Einstein equations hold is a trivial conservation
law in the sense of §3.20.¹⁴

¹³ There are actually two reasons for this. First of all, as just mentioned, the ordinary
divergence is not zero! And, even if it were, there is no useful definition of the integral of a
vector (namely, the divergence of T) over a volume in the absence of additional structures,
such as preferred vector fields with which to take components.
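
To see the obstruction concretely, it may help to compare the covariant divergence of a vector field with that of a symmetric tensor. The following sketch uses the standard identity $\Gamma^\alpha_{\alpha\sigma} = \partial_\sigma\ln\sqrt{|g|}$. The current $J^\alpha$ and the vector field $\xi^\alpha$ are illustrative symbols introduced here, not objects defined in the text.

```latex
\begin{align*}
\nabla_\alpha J^\alpha
  &= \frac{1}{\sqrt{|g|}}\,\partial_\alpha\!\left(\sqrt{|g|}\,J^\alpha\right), \\
\nabla_\alpha T^{\alpha\beta}
  &= \frac{1}{\sqrt{|g|}}\,\partial_\alpha\!\left(\sqrt{|g|}\,T^{\alpha\beta}\right)
     + \Gamma^\beta_{\alpha\sigma}T^{\alpha\sigma}, \\
\nabla_\alpha\!\left(T^{\alpha\beta}\xi_\beta\right)
  &= \left(\nabla_\alpha T^{\alpha\beta}\right)\xi_\beta
     + \tfrac{1}{2}\,T^{\alpha\beta}\left(\nabla_\alpha\xi_\beta + \nabla_\beta\xi_\alpha\right).
\end{align*}
```

For a vector field the covariant divergence is a total derivative up to the factor $\sqrt{|g|}$, so a vanishing covariant divergence does yield a conservation law via the divergence theorem. For the energy-momentum tensor the extra Christoffel term spoils this. The third line shows one way out when extra structure is available: if the metric happens to admit a vector field with $\nabla_\alpha\xi_\beta + \nabla_\beta\xi_\alpha = 0$, then $J^\alpha = T^{\alpha\beta}\xi_\beta$ is a genuinely conserved current whenever the energy-momentum tensor is divergence free. A generic metric admits no such vector field.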
The upshot of these considerations is that gravitational energy, momentum,
and angular momentum will not occur via the usual mechanism of
conserved currents. Thanks to the equivalence principle, gravitational energy,
momentum, and angular momentum must be non-local in character.
You may recall that an analogous situation arose for the sources of the Yang-Mills
field. Despite our inability to localize gravitational energy and momentum,
it is possible to define a notion of conserved total energy, momentum,
and angular momentum for gravitational systems which are suitably isolated
from their surroundings. So, for example, it is possible to compute these
quantities for a star or galaxy if we ignore the rest of the universe. Based on
our preceding discussion you will not be surprised to hear that such quantities
are not constructed by volume integrals of densities. Perhaps a future
version of this text will explain how all that works.

¹⁴ I. Anderson and C. Torre, Phys. Rev. Lett. 70, 3525 (1993).

10.9 PROBLEMS
1. Show that (10.7) enforces $\nabla_\alpha g_{\beta\gamma} = 0$.

2. Show that $g_{\mu\nu}\frac{dq^\mu}{ds}\frac{dq^\nu}{ds}$ is a constant of motion for (10.10).

3. Perhaps the simplest example of a “generally covariant” differential
equation is the general form of the geodesic equation in 2-d Euclidean
space. It arises as the Euler-Lagrange equations for a curve (x(λ), y(λ))
of the Lagrangian:
$$
L = \sqrt{\dot{x}^2 + \dot{y}^2}. \tag{10.101}
$$
Show that the form of the EL equations does not depend upon the choice
of parameter λ. (Hint: consider a new parameter τ = f(λ); show that
the equations in terms of τ are of the same form as those in terms of
λ, irrespective of the choice of f.)

4. Show that the volume form (10.38) is a natural tensor.

5. Prove (10.45) and (10.46).

6. Derive equations (10.51)–(10.54).

7. Use your favorite tensor analysis software (e.g., DifferentialGeometry
in Maple) to verify that (10.63) is, for any m, a solution to the vacuum
Einstein equations (10.60) with cosmological constant Λ.

8. Derive (10.77) from (10.15).

9. The following line element represents a class of “big bang” cosmological
models, characterized by a scale factor a(t):
$$
ds^2 = -dt^2 + a(t)^2\left(dx^2 + dy^2 + dz^2\right). \tag{10.102}
$$

Compute the Klein-Gordon equation (10.80). Can you find a solution?

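10. With
$$
S_{KG}[g, \phi] = -\int_M \frac{1}{2}\left(g^{\alpha\beta}\nabla_\alpha\phi\nabla_\beta\phi + m^2\phi^2\right) \tag{10.103}
$$
compute the energy-momentum tensor of the scalar field via
$$
T^{\alpha\beta} = \frac{2}{\sqrt{|g|}}\frac{\delta S_{KG}}{\delta g_{\alpha\beta}} \tag{10.104}
$$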

and verify (10.85).


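11. Show that the following metric and scalar field, given in the coordinate
chart (t, r, θ, φ), define a solution to the Einstein-scalar field equations
for m = 0.
$$
g = -r^2\,dt\otimes dt + \frac{2}{1 - \frac{2}{3}\Lambda r^2}\,dr\otimes dr + r^2\left(d\theta\otimes d\theta + \sin^2\theta\,d\varphi\otimes d\varphi\right), \tag{10.105}
$$
$$
\phi = \pm\frac{1}{\sqrt{\kappa}}\,t + \text{const.} \tag{10.106}
$$
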
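12. Repeat the analysis of §10.7 and §10.8 for the case where the matter
field is the electromagnetic field defined by
$$
L_{EM} = -\frac{1}{4}g^{\alpha\gamma}g^{\beta\delta}F_{\alpha\beta}F_{\gamma\delta}, \tag{10.107}
$$
where
$$
F_{\alpha\beta} = \nabla_\alpha A_\beta - \nabla_\beta A_\alpha = \partial_\alpha A_\beta - \partial_\beta A_\alpha. \tag{10.108}
$$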

13. Show by direct calculation of the divergence of (10.85) that it satisfies
the identity (10.99):
$$
\nabla_\alpha T^{\alpha\beta} = \nabla^\beta\phi\left(\nabla^\alpha\nabla_\alpha\phi - m^2\phi\right). \tag{10.109}
$$

Chapter 11

Goodbye

If you have stayed with me to the end of this exhilarating mess, I congratulate
you. If you feel like there is a lot more you would like to learn about classical
field theory, then I agree with you. I have only given a simple introduction
to some of the possible topics. Slowly but surely I hope to add more to this
text, correct errors, add problems, etc. so check back every so often to see
if the version has been updated. Meanwhile, the next section has a list of
resources which you might want to look at, depending upon your interests.

11.1 Suggestions for Further Reading

General Relativity, R. M. Wald, University of Chicago Press

Spacetime and Geometry, S. Carroll, Addison-Wesley

Tensors, Differential Forms, and Variational Principles, D. Lovelock and H. Rund, Dover

Lecture notes on classical fields, J. Binney,
http://www-thphys.physics.ox.ac.uk/user/JamesBinney/classf.pdf

Classical Field Theory, D. Soper, Dover

Applications of Lie Groups to Differential Equations, P. Olver, Springer

Classical Theory of Fields, L. Landau and E. Lifshitz, Pergamon Press


Dynamical Theory of Groups and Fields, B. DeWitt, Routledge

Classical Field Theory and the Stress-Energy Tensor, S. Swanson, Morgan &
Claypool Publishers

Quantum Field Theory, M. Srednicki,
web.physics.ucsb.edu/~mark/ms-qft-DRAFT.pdf
