Notes2 PDF
TERENCE TAO
1. Complex interpolation
Proof We may assume that $(p_0, q_0) \neq (p_1, q_1)$, since otherwise the claim is trivial. We may then normalise $A_0 = A_1 = 1$ as in the previous week's notes. By duality and homogeneity it suffices to show that
$$\left| \int_Y (Tf)\, g \, d\mu_Y \right| \leq 1 \quad \text{whenever } \|f\|_{L^{p_\theta}(X)} = \|g\|_{L^{q_\theta'}(Y)} = 1$$
for all simple functions $f$, $g$ of finite measure support and all $0 \leq \theta \leq 1$. Note that the claim is already true by hypothesis when $\theta = 0, 1$.
The idea is to use the three lines lemma from last week's notes. The problem is that the inequality is not complex analytic in $\theta$ as stated. However we can fix this as follows. Observe that if $f$ is a simple function with $L^{p_\theta}(X)$ norm 1, then we can factorise
$$f = F_0^{1-\theta} F_1^{\theta} a$$
where $F_0$, $F_1$ are non-negative simple functions with $L^{p_0}(X)$ and $L^{p_1}(X)$ norms respectively equal to 1, and $a$ is a simple function of magnitude at most 1. Indeed we can set $a = \mathrm{sgn}(f)$ and $F_i = |f|^{p_\theta/p_i}$. (Some minor changes need to be made for the limiting case when one or both of the $p_i$ are equal to infinity; we leave this to the reader.) Similarly we can factorise
$$g = G_0^{1-\theta} G_1^{\theta} b$$
where $G_0$, $G_1$ are non-negative simple functions with $L^{q_0'}(Y)$ and $L^{q_1'}(Y)$ norms respectively equal to 1, and $b$ is a simple function of magnitude at most 1.
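The factorisation above can be checked numerically on a finite set with counting measure. The following is a minimal sketch, not part of the notes: it assumes the standard Riesz-Thorin relation $1/p_\theta = (1-\theta)/p_0 + \theta/p_1$ (the theorem statement is not reproduced in this section), finite exponents, and the helper names `lp_norm` and `factorise` are hypothetical.

```python
def lp_norm(values, p):
    """l^p norm of a finite sequence with respect to counting measure (finite p)."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def factorise(f, p0, p1, theta):
    """Return (F0, F1, a, p_theta) with f = F0^(1-theta) * F1^theta * a pointwise.

    Assumes 1/p_theta = (1-theta)/p0 + theta/p1 with p0, p1 finite.
    """
    p_theta = 1.0 / ((1.0 - theta) / p0 + theta / p1)
    F0 = [abs(v) ** (p_theta / p0) for v in f]        # F_0 = |f|^(p_theta/p_0)
    F1 = [abs(v) ** (p_theta / p1) for v in f]        # F_1 = |f|^(p_theta/p_1)
    a = [v / abs(v) if v != 0 else 0.0 for v in f]    # a = sgn(f)
    return F0, F1, a, p_theta


p0, p1, theta = 1.0, 4.0, 0.5
p_theta = 1.0 / ((1.0 - theta) / p0 + theta / p1)

# Normalise an arbitrary sequence so that its l^{p_theta} norm equals 1.
raw = [3.0, -1.0, 0.0, 2.0, -0.5]
scale = lp_norm(raw, p_theta)
f = [v / scale for v in raw]

F0, F1, a, _ = factorise(f, p0, p1, theta)
rebuilt = [x ** (1.0 - theta) * y ** theta * s for x, y, s in zip(F0, F1, a)]

print(lp_norm(F0, p0), lp_norm(F1, p1))               # both should be close to 1
print(max(abs(u - v) for u, v in zip(rebuilt, f)))    # should be close to 0
```

The exponents add up because $(1-\theta) p_\theta/p_0 + \theta p_\theta/p_1 = 1$, which is exactly why the product recovers $|f|$.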
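With these factorisations we consider the function $H(z) := \int_Y T(F_0^{1-z} F_1^{z} a)\, G_0^{1-z} G_1^{z} b \, d\mu_Y$ of a complex parameter $z$, so that $H(\theta) = \int_Y (Tf)\, g \, d\mu_Y$. Because $T$ is linear and all functions are simple, it is not difficult to see that $H$ is an entire function of $z$ of at most exponential growth. It is bounded by 1 on both sides of the strip $0 \leq \mathrm{Re}(z) \leq 1$, hence (by the three lines lemma) bounded by 1 within the strip as well. Setting $z = \theta$ we obtain the result.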
An important observation of Elias Stein is that the above proof can be generalised
to the case where the operator T itself varies analytically:
Proof We repeat the previous argument; the only observation to make is that the function
$$H(z) := \int_Y T_z(F_0^{1-z} F_1^{z} a)\, G_0^{1-z} G_1^{z} b \, d\mu_Y$$
continues to be complex-analytic; this is most easily seen by decomposing all the simple functions into indicator functions.
The real and complex interpolation methods provide a surprisingly powerful way
to prove estimates involving Lp norms: in order to prove a range of such estimates,
it suffices to do so for the extreme cases (possibly weakening strong type to weak
type or restricted weak type in the process). We shall shortly give several disparate
examples of this interpolation approach.
2. Duality
Both real and complex interpolation give a means to deduce new bounds on an operator $T$ from old ones. Another important technique is that of duality, which basically replaces the input function $f$ with an output function $T^* g$, and the output function $Tf$ with an input function $g$. Generally speaking, the situation is as follows. One has a domain $X$, and some class of test functions $D_X$ on $X$ (such as simple functions, or perhaps the Schwartz class), and one also has a domain $Y$ with a class $D_Y$. We shall require $D_X$ and $D_Y$ to be vector spaces, though we will not require any topological structure on these spaces (in particular, they do not need to be complete with respect to any particular topology). We can then define the dual space $D_Y^*$ of linear functionals $g \mapsto \langle g, h \rangle_Y$ on $D_Y$; we do not require these functionals to be continuous in any topology, and so $D_Y^*$ is in fact a very large space. In particular, it typically contains very general objects such as locally integrable functions, measures, or distributions, where we (formally at least) identify a function (or distribution, etc.) $h(y)$ with the linear functional
$$g \mapsto \langle g, h \rangle_Y := \int_Y g(y) h(y) \, d\mu_Y(y).$$
We also adopt the convention $\langle h, g \rangle_Y := \langle g, h \rangle_Y$. One can of course have similar conventions for $D_X$ and $D_X^*$.
The point of introducing the adjoint $T^*$ is that many bounds on the original operator $T$ are equivalent to those for $T^*$. We illustrate this point with the $L^p$ spaces, although from the proof one sees that the same would hold true for any spaces whose norm can be characterised by a duality relationship.
Theorem 2.1 (Duality). Let $1 \leq p, q \leq \infty$ and $A > 0$. Let $D_X$ and $D_Y$ be linear spaces of functions on $X$ and $Y$ which are dense in $L^p(X)$ and $L^{q'}(Y)$ respectively. Let $T : D_X \to D_Y^*$ be a linear transformation, and let $T^* : D_Y \to D_X^*$ be its adjoint. Suppose we make the (qualitative) assumptions that $T$ maps $D_X$ to $L^q(Y)$ and $T^*$ maps $D_Y$ to $L^{p'}(X)$. Then the following (quantitative) assertions are equivalent.
Proof The equivalence of (i) with (iii) follows from the dual characterisation of the $L^q(Y)$ norm from last week's notes (and the density of $D_Y$ in $L^{q'}(Y)$). Similarly for the equivalence of (ii) with (iv). The equivalence of (iii) with (iv) follows from the definition of $T^*$.
Remark 2.2. This theorem differs very slightly from the usual duality theorem, which asserts that if $T$ is continuous from a normed vector space $V$ to another $W$, then its adjoint $T^*$ is well defined from $W^*$ to $V^*$ and has the same operator norm. That duality theorem is fine for most applications, but runs into a slight difficulty when $V$ is an $L^\infty$ type space, as the dual can then get excessively large. (Having the domain space $W^*$ too large is of course not a problem.) In particular when dealing with non-reflexive spaces such as $L^1$ or $L^\infty$, one loses the equivalence of the two statements. By passing to dense classes and making the a priori qualitative hypotheses that $T$ maps $D_X$ to $L^q(Y)$ and $T^*$ maps $D_Y$ to $L^{p'}(X)$ (which are often easily verified in practice, since we can usually¹ take the dense classes $D_X$ and $D_Y$ to be very nice functions such as the Schwartz class), we avoid this difficulty.
3. Conditional expectation
Let us now start using our interpolation theorems. The first application is to the
useful operation of conditional expectation. This operation was first developed in
probability theory but has since proven to be useful in ergodic theory and har-
monic analysis; furthermore, several further operations in harmonic analysis (such
¹ One caveat comes when $L^\infty$ spaces are involved, because classes which decay at infinity, such as the Schwartz class, are usually not dense in $L^\infty$.
Let $(X, \mathcal{B}_{\max}, \mu)$ be a measure space (which need not necessarily be a probability space, though this is certainly an important special case), and let $\mathcal{B}$ be² a $\sigma$-finite sub-$\sigma$-algebra of $\mathcal{B}_{\max}$. Then $L^2(\mathcal{B}) = L^2(X, \mathcal{B}, \mu)$ is a closed subspace of the Hilbert space $L^2(\mathcal{B}_{\max}) = L^2(X, \mathcal{B}_{\max}, \mu)$, and thus has an orthogonal projection, which we shall denote by the map $f \mapsto E(f|\mathcal{B})$. Thus $E(f|\mathcal{B})$ is $\mathcal{B}$-measurable, and is equal to $f$ if and only if $f$ is also $\mathcal{B}$-measurable.
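Problem 3.1. Let $X = \mathbb{R}$ with Lebesgue measure $d\mu = dx$, and let $\mathcal{B}_{\max}$ be the Borel or Lebesgue $\sigma$-algebra. Let $\mathcal{B}$ be the sub-$\sigma$-algebra generated by the intervals $[n, n+1)$ for $n \in \mathbb{Z}$. Show that for $f \in L^2(\mathcal{B}_{\max})$, $E(f|\mathcal{B})$ is given by the formula
$$E(f|\mathcal{B})(x) := \int_{\lfloor x \rfloor}^{\lfloor x \rfloor + 1} f(y) \, dy$$
for all $x \in \mathbb{R}$, where $\lfloor x \rfloor$ is the greatest integer less than or equal to $x$. This model example is worth keeping in mind for the rest of this discussion.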
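As an informal companion to this model example, here is a minimal numerical sketch (not a solution to the problem). It replaces the integral over each unit interval by a midpoint-rule average on a uniform grid; the grid size and the function name `cond_exp_unit_intervals` are illustrative assumptions. On the grid, the averaging map is an exact orthogonal projection onto block-constant vectors, which is what the checks below probe.

```python
import math


def cond_exp_unit_intervals(samples, per_interval):
    """Replace each block of `per_interval` samples by its mean.

    This is a discrete stand-in for E(f|B) when B is generated by unit intervals.
    """
    out = []
    for start in range(0, len(samples), per_interval):
        block = samples[start:start + per_interval]
        mean = sum(block) / len(block)
        out.extend([mean] * len(block))
    return out


per_interval = 200
# Midpoints of a uniform grid on [0, 3).
grid = [(k + 0.5) / per_interval for k in range(3 * per_interval)]
f = [math.sin(2.0 * x) + x for x in grid]

Ef = cond_exp_unit_intervals(f, per_interval)
EEf = cond_exp_unit_intervals(Ef, per_interval)

# Projection properties: idempotence, orthogonality of the residual, contraction in L^2.
idempotence_gap = max(abs(u - v) for u, v in zip(EEf, Ef))
inner = sum((u - v) * v for u, v in zip(f, Ef)) / per_interval
norm_f_sq = sum(u * u for u in f) / per_interval
norm_Ef_sq = sum(v * v for v in Ef) / per_interval

# Exact value of the integral of sin(2y) + y over [0, 1).
exact_first = (1.0 - math.cos(2.0)) / 2.0 + 0.5
print(Ef[0], exact_first, idempotence_gap, inner)
```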
By linearity, we thus see that $E(\overline{f}|\mathcal{B}) = \overline{E(f|\mathcal{B})}$ and $E(\mathrm{Re} f|\mathcal{B}) = \mathrm{Re} E(f|\mathcal{B})$. Also, if $f$, $g$ are real-valued and $f \leq g$ pointwise, then $E(f|\mathcal{B}) \leq E(g|\mathcal{B})$ pointwise. For real-valued functions this already gives the triangle inequality $|E(f|\mathcal{B})| \leq E(|f| \,|\, \mathcal{B})$. For complex-valued functions, a naive splitting into real and imaginary parts will lose a factor of two: $|E(f|\mathcal{B})| \leq 2 E(|f| \,|\, \mathcal{B})$. This factor can eventually be recovered by a tensor power trick (try it!) but this is overkill. Instead, we exploit phase invariance to deduce the complex triangle inequality from the real one. Observe that the complex triangle inequality $|E(f|\mathcal{B})| \leq E(|f| \,|\, \mathcal{B})$ is invariant if we multiply $f$ by a $\mathcal{B}$-measurable phase $h$ (thus $|h| = 1$). Using this (and the $\mathcal{B}$-measurability of $E(f|\mathcal{B})$, and the module property) we can reduce to the case where $E(f|\mathcal{B})$ is real and non-negative. But in that case
$$|E(f|\mathcal{B})| = \mathrm{Re} E(f|\mathcal{B}) \leq E(\mathrm{Re} f|\mathcal{B}) \leq E(|f| \,|\, \mathcal{B})$$
as desired.
² A small subtlety here: the hypothesis that $\mathcal{B}$ is $\sigma$-finite does not automatically follow from the hypothesis that $\mathcal{B}$ is a sub-$\sigma$-algebra of $\mathcal{B}_{\max}$. Of course this is not an issue when $X$ has finite measure, which is for instance the case in probability theory.
For $1 \leq p < \infty$ we know that $L^2(\mathcal{B}_{\max}) \cap L^p(\mathcal{B}_{\max})$ (which contains all simple functions of finite measure support) is dense in $L^p$, and thus there is a unique continuous extension of conditional expectation to $L^p(\mathcal{B}_{\max})$, which is also a contraction. The same argument works when $p = \infty$ and $X$ has finite measure. When $X$ has infinite measure, the Hahn-Banach theorem gives an infinity of possible extensions to $L^\infty(\mathcal{B}_{\max})$ which are contractions; however, there is a unique such extension which still retains the module property (3); we leave the verification of this to the reader.
³ Real interpolation would also work, but then one needs the tensor product trick to eliminate the constant. This in turn requires some knowledge of product measures, and how conditional expectation reacts to tensor products.
Remark 3.3. When p = 2, E(f |B) is the closest B-measurable function to f in Lp
norm; however we caution that this claim is not true for other values of p.
Observe that the space of sets in $\mathcal{B}_{\max}$ which can be approximated in measure to arbitrary accuracy by a set in $\bigcup_{n=1}^\infty \mathcal{B}_n$ is a $\sigma$-algebra and thus contains $\bigvee_{n=1}^\infty \mathcal{B}_n$. In particular, every set in $\bigvee_{n=1}^\infty \mathcal{B}_n$ can be so approximated. This implies that simple functions in $L^p(\bigvee_{n=1}^\infty \mathcal{B}_n)$ can be approximated in $L^p$ norm by simple functions in $L^p(\mathcal{B}_n)$ for some $n$; since simple functions are themselves dense in $L^p$, this implies that $\bigcup_{n=1}^\infty L^p(\mathcal{B}_n)$ is dense in $L^p(\bigvee_{n=1}^\infty \mathcal{B}_n)$.
Now for $f \in \bigcup_{n=1}^\infty L^p(\mathcal{B}_n)$ it is clear that $E(f|\mathcal{B}_n)$ converges to $f$ in $L^p$ norm as $n \to \infty$. We also know that the conditional expectation operators $f \mapsto E(f|\mathcal{B}_n)$ are contractions on $L^p$, and in particular are uniformly bounded on $L^p$. The claim now follows from the previously mentioned density and a standard limiting argument.
Remarks 3.6. The claim is false at $p = \infty$; consider for instance the indicator function of $[0, 1/3)$ in Example 3.4. We shall also address the issue of pointwise almost everywhere convergence in next week's notes. The trick of using uniform bounds on operators, together with norm convergence on a dense class of functions, to deduce norm convergence on all functions in the space, is a common⁴ one and will be seen several times in these notes.
⁴ Indeed, the uniform boundedness principle implies, in some sense, that this is the only way to establish a norm convergence result.
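The contrast between finite $p$ and $p = \infty$ can be made concrete with exact rational arithmetic. The sketch below assumes (since Example 3.4 is not reproduced in this section) that $\mathcal{B}_n$ is the $\sigma$-algebra on $[0,1)$ generated by the dyadic intervals of length $2^{-n}$; the function name `dyadic_errors` is hypothetical. For $f = 1_{[0,1/3)}$, the difference $f - E(f|\mathcal{B}_n)$ is supported on the single dyadic interval containing $1/3$, so both errors can be computed in closed form.

```python
from fractions import Fraction


def dyadic_errors(n, p):
    """Return (L^p error, L^infinity error) of E(f|B_n) for f = 1_[0,1/3).

    B_n is assumed to be generated by the dyadic intervals of length 2^-n in [0,1).
    """
    third = Fraction(1, 3)
    k = (2 ** n) // 3                     # index of the dyadic interval containing 1/3
    left = Fraction(k, 2 ** n)
    width = Fraction(1, 2 ** n)
    c = (third - left) / width            # average of f over that interval
    # On [left, 1/3) the error is 1 - c; on [1/3, left + width) it is c; elsewhere 0.
    lp_error_p = (third - left) * (1 - c) ** p + (left + width - third) * c ** p
    sup_error = max(c, 1 - c)
    return float(lp_error_p) ** (1.0 / p), sup_error


results = [dyadic_errors(n, 2) for n in range(1, 13)]
for n, (l2_err, sup_err) in enumerate(results, start=1):
    print(n, l2_err, sup_err)
```

Under these assumptions the $L^2$ error decays like $2^{-n/2}$, while the $L^\infty$ error stays at $2/3$ for every $n$, since the average over the interval containing $1/3$ alternates between $2/3$ and $1/3$.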
4. Norm interchange
Consider a function $f(x, y)$ of two variables. More precisely, consider two measure spaces $X = (X, \mathcal{B}_X, \mu_X)$ and $Y = (Y, \mathcal{B}_Y, \mu_Y)$, and consider a function $f$ on the product space $X \times Y = (X \times Y, \mathcal{B}_X \times \mathcal{B}_Y, \mu_X \times \mu_Y)$. We can consider the mixed norms $L^p_x L^q_y(X \times Y)$ of $f$ for $1 \leq p, q \leq \infty$ by
$$\|f\|_{L^p_x L^q_y(X \times Y)} := \big\| \|f(x, \cdot)\|_{L^q_y(Y)} \big\|_{L^p_x(X)};$$
thus for instance when $p$, $q$ are finite we have
$$\|f\|_{L^p_x L^q_y(X \times Y)} = \left( \int_X \left( \int_Y |f(x, y)|^q \, d\mu_Y(y) \right)^{p/q} d\mu_X(x) \right)^{1/p}.$$
We can similarly define the interchanged norm $L^q_y L^p_x(X \times Y)$. By iterating the triangle inequality we can see that these are indeed norms. The relationship between the norms is as follows.
Proposition 4.1 (Interchange of norms). Let the notation and assumptions be as
above.
Proof Claim (i) follows from the Fubini-Tonelli theorem (with the case $p = q = \infty$ treated separately). For claim (ii), observe that the claim is already true for $q = p$, and is also true for $q = 1$ by Minkowski. For the intermediate values of $q$, we can modify the proof of the Riesz-Thorin interpolation theorem (adapted now to mixed norms) to conclude the argument; we leave the details to the reader as an exercise. Claim (iii) of course follows from claim (ii) by reversing $p$ and $q$.
A more direct proof of (ii) is as follows. We can take $p$, $q$ finite since the cases $p = \infty$ and $q = \infty$ are easy. Raising things to the $q$-th power, it suffices to prove
$$\||f|^q\|_{L^{p/q}_x L^1_y} \leq \||f|^q\|_{L^1_y L^{p/q}_x},$$
but this follows from Minkowski's inequality (since $L^{p/q}_x$ is a Banach space). This proof is shorter, but perhaps a little harder to remember as it relies on an ad hoc trick; in contrast, the interpolation proof, while more complicated, at least relies on a standard method and is thus easier to remember.
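The norm interchange can be tested numerically on finite sets with counting measure. The following is a minimal sketch; it assumes, as the direct proof above suggests, that claim (ii) is the inequality $\|f\|_{L^p_x L^q_y} \leq \|f\|_{L^q_y L^p_x}$ for $p \geq q$ (the statement of Proposition 4.1 is not reproduced in this section). The helper names `mixed_norm` and `transposed` are hypothetical.

```python
import random


def mixed_norm(f, outer, inner):
    """Mixed norm of a matrix f[i][j]: l^inner in j first, then l^outer in i."""
    rows = [sum(abs(v) ** inner for v in row) ** (1.0 / inner) for row in f]
    return sum(t ** outer for t in rows) ** (1.0 / outer)


def transposed(f):
    """Swap the roles of the two variables."""
    return [list(col) for col in zip(*f)]


random.seed(0)
p, q = 3.0, 2.0                                   # p >= q
f = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(4)]   # f[x][y]

lhs = mixed_norm(f, p, q)                         # || f ||_{L^p_x L^q_y}
rhs = mixed_norm(transposed(f), q, p)             # || f ||_{L^q_y L^p_x}
print(lhs, rhs)                                   # expect lhs <= rhs

# For a tensor product f(x, y) = a(x) b(y) the two norms coincide.
a, b = [1.0, 2.0, 3.0], [0.5, 1.5]
g = [[ai * bj for bj in b] for ai in a]
tensor_lhs = mixed_norm(g, p, q)
tensor_rhs = mixed_norm(transposed(g), q, p)
print(tensor_lhs, tensor_rhs)
```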
Problem 4.2. Let $1 \leq q < p < \infty$, and suppose that $\|f\|_{L^p_x L^q_y} = \|f\|_{L^q_y L^p_x} < \infty$. Show that $|f|$ is a tensor product, i.e. there exist functions $f_x \in L^p_x$ and $f_y \in L^q_y$ such that $|f(x, y)| = f_x(x) f_y(y)$. Thus we only expect interchanging norms to be a good idea when we believe the worst case example of $f$ in our problem to be roughly tensor product-like in nature.
Let us note two special (and easy) cases of norm interchange. Firstly, that maximal functions control individual functions:
$$\sup_n \|f_n\|_p \leq \big\| \sup_n |f_n| \big\|_p.$$
The moral here is that maximal functions are harder to upper-bound, but are
conversely more powerful for dominating other expressions; conversely, it is easier to
bound norms of sums than sums of norms, though the latter are good for dominating
other things.
Problem 4.3 (Duality for mixed norm spaces). Show that if $1 \leq p, q \leq \infty$ and $f \in L^p_x L^q_y$, then
$$\|f\|_{L^p_x L^q_y} = \sup \left\{ \left| \int_X \int_Y f(x, y) g(x, y) \, d\mu_X(x) d\mu_Y(y) \right| : \|g\|_{L^{p'}_x L^{q'}_y} \leq 1 \right\}.$$
Note that Proposition 4.1 is in some sense self-dual with respect to this duality relationship.
5. Schur's test
⁵ Unfortunately, the word "kernel" is also used for the null space of $T$; in cases where this could lead to confusion the full terminology "integral kernel" is of course recommended instead.
⁶ Later on we shall study singular integrals, in which $K$ is a distribution which typically does not obey any absolute integrability condition, but nevertheless for which the integral operator can make sense due to cancellation inherent in the kernel.
Remark 5.3. If $\|K(x, \cdot)\|_{L^q(Y)}$ is unbounded, then (5) strongly suggests that $T$ is not of strong type $(1, q)$. However proving this rigorously is remarkably difficult, because the lack of even qualitative control on $K$ (other than measurability) defeats the use of most limiting arguments. Nevertheless, for heuristic purposes at least, the above proposition gives a necessary and sufficient condition to map from $L^1$ to $L^q$ for any $q \geq 1$. A similar argument works with $L^q$ replaced by other Banach spaces, but it unfortunately does not work for quasi-normed spaces such as $L^{1,\infty}$; indeed, weak-type $(1, 1)$ estimates can be remarkably delicate to establish (or disprove).
⁷ This can be deduced from the triangle inequality by a monotone convergence argument, approximating $|K|$ and $|f|$ from below by simple functions. The corresponding claim for $K$ and $f$ then follows from a similar dominated convergence argument.
We leave the proof as an exercise to the reader; it is similar to the previous proposition but relies on Hölder's inequality (and its converse) rather than Minkowski's inequality.
The above propositions give useful necessary and sufficient criteria for strong-type $(p, q)$ bounds when $p = 1$ or $q = \infty$. For other cases, it does not seem possible to find such a simple criterion for strong-type $(p, q)$, although when the kernel $K$ is non-negative one can at least handle the $p = \infty$ or $q = 1$ cases easily:
Problem 5.5. Let $K : X \times Y \to \mathbb{R}^+$ be non-negative and $1 \leq q \leq \infty$. Then $T_K$ is of strong type $(\infty, q)$ if and only if $\int_X K(x, y) \, d\mu_X(x)$ lies in $L^q(Y)$, and in fact in this case we have
$$\|T_K\|_{L^\infty(X) \to L^q(Y)} = \left\| \int_X K(x, y) \, d\mu_X(x) \right\|_{L^q_y(Y)}.$$
Dually, if $1 \leq p \leq \infty$, then $T_K$ is of strong type $(p, 1)$ if and only if $\int_Y K(x, y) \, d\mu_Y(y)$ lies in $L^{p'}(X)$, and in fact in this case we have
$$\|T_K\|_{L^p(X) \to L^1(Y)} = \left\| \int_Y K(x, y) \, d\mu_Y(y) \right\|_{L^{p'}_x(X)}.$$
By combining Propositions 5.2, 5.4 with the Riesz-Thorin theorem we can obtain
a useful test for Lp boundedness:
Theorem 5.6 (Schur's test). Suppose that $K : X \times Y \to \mathbb{C}$ obeys the bounds
$$\int_X |K(x, y)| \, d\mu_X(x) \leq A \quad \text{for almost every } y \in Y$$
and
$$\int_Y |K(x, y)| \, d\mu_Y(y) \leq B \quad \text{for almost every } x \in X$$
for some $0 < A, B < \infty$. Then for every $1 \leq p \leq \infty$, the integral operator $T_K$ in (4) is well-defined (in the sense that the integral is absolutely integrable for almost every $y$) for all $f \in L^p(X)$, with
$$\|T_K f\|_{L^p(Y)} \leq A^{1/p'} B^{1/p} \|f\|_{L^p(X)}.$$
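For finite sets with counting measure, Schur's test becomes a statement about matrices, and can be sanity-checked directly. The sketch below is illustrative only: it assumes the convention $(T_K f)(y) = \sum_x K(x, y) f(x)$ (equation (4) is not reproduced in this section), under which $A$ bounds the sums over $x$ and governs the $L^\infty$ endpoint, while $B$ bounds the sums over $y$ and governs the $L^1$ endpoint. All function names are hypothetical.

```python
import random


def lp_norm(v, p):
    """l^p norm with respect to counting measure (finite p)."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def apply_kernel(K, f):
    """(T_K f)(y) = sum over x of K[x][y] * f[x]."""
    return [sum(K[x][y] * f[x] for x in range(len(K))) for y in range(len(K[0]))]


def schur_constants(K):
    """A = sup over y of sum_x |K(x,y)|, B = sup over x of sum_y |K(x,y)|."""
    A = max(sum(abs(K[x][y]) for x in range(len(K))) for y in range(len(K[0])))
    B = max(sum(abs(t) for t in row) for row in K)
    return A, B


def schur_bound(K, p):
    """The constant A^(1/p') * B^(1/p) for 1 < p < infinity."""
    A, B = schur_constants(K)
    p_dual = p / (p - 1.0)
    return A ** (1.0 / p_dual) * B ** (1.0 / p)


random.seed(0)
p = 3.0
K = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(6)]   # |X| = 6, |Y| = 4
f = [random.uniform(-1, 1) for _ in range(6)]
lhs = lp_norm(apply_kernel(K, f), p)
rhs = schur_bound(K, p) * lp_norm(f, p)
print(lhs, rhs)                                   # expect lhs <= rhs

# A one-point domain mapped to five points: here the bound is attained.
K_row = [[1.0] * 5]
sharp_lhs = lp_norm(apply_kernel(K_row, [2.0]), p)
sharp_rhs = schur_bound(K_row, p) * lp_norm([2.0], p)
print(sharp_lhs, sharp_rhs)
```

The second example has $A = 1$ and $B = 5$, and it distinguishes the constant $A^{1/p'} B^{1/p}$ from the version with the exponents swapped.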
⁸ Note how we are exploiting the fact that the domain $(X, \mu_X)$ and range $(Y, \mu_Y)$ are not required to be equal. Thus we see that generalising a proposition may in fact make it easier to prove, as it can introduce more symmetries or other structural features.
so the form $\int_Y (T_K f)\, g \, d\mu_Y$ makes sense and is absolutely integrable for all simple $f$, $g$.
From the preceding two propositions we know that the claim is true for $p = 1$ and $p = \infty$, and the general case follows from the Riesz-Thorin theorem. (One can also use real interpolation coupled with the tensor power trick.)
We can give an alternate proof of Schur's test, which is more elementary (using real convexity rather than complex analyticity) but relies on a non-obvious trick, as follows. Again we normalise $A = B = 1$. As the cases $p = 1, \infty$ are trivial (or can be obtained by limiting arguments) we assume $1 < p < \infty$. By duality and monotone convergence it suffices to show that
$$\int_X \int_Y |K(x, y)| |f(x)| |g(y)| \, d\mu_X(x) d\mu_Y(y) \leq \|f\|_{L^p(X)} \|g\|_{L^{p'}(Y)}$$
for all simple functions $f$, $g$. We can normalise $\|f\|_{L^p(X)} = \|g\|_{L^{p'}(Y)} = 1$. We now use the weighted arithmetic mean-geometric mean inequality
$$|f(x)| |g(y)| \leq \frac{1}{p} |f(x)|^p + \frac{1}{p'} |g(y)|^{p'}$$
which, if we rewrite it as
$$e^{\frac{1}{p} \log |f(x)|^p + \frac{1}{p'} \log |g(y)|^{p'}} \leq \frac{1}{p} e^{\log |f(x)|^p} + \frac{1}{p'} e^{\log |g(y)|^{p'}},$$
is seen to just be the convexity of the exponential function in disguise. We conclude that
$$\int_X \int_Y |K(x, y)| |f(x)| |g(y)| \, d\mu_X(x) d\mu_Y(y) \leq \frac{1}{p} \int_X \int_Y |K(x, y)| |f(x)|^p \, d\mu_X(x) d\mu_Y(y) + \frac{1}{p'} \int_X \int_Y |K(x, y)| |g(y)|^{p'} \, d\mu_X(x) d\mu_Y(y).$$
Computing the first term by integrating in $y$ first, and the second term by integrating in $x$ first, we obtain the claim.
Remark 5.7. A variant on this elementary approach is to factor $|K(x, y)| |f(x)| |g(y)|$ as $|K(x, y)|^{1/p} |f(x)|$ and $|K(x, y)|^{1/p'} |g(y)|$ and use Hölder's inequality, estimating these two factors in $L^p(X \times Y)$ and $L^{p'}(X \times Y)$ respectively. (We thank Kenley Jung for pointing out this simple argument.)
Problem 5.8. By observing what happens when one multiplies $K$, $\mu_X$, or $\mu_Y$ by positive constants, show that the hypotheses of Schur's test cannot be used to deduce a strong type $(p, q)$ (or even a restricted weak-type $(p, q)$) bound on $T$ when $p \neq q$.
Remark 5.9. An illustrative case of Schur's test is as follows: let $K$ be an $n \times n$ matrix whose entries are non-negative, and whose row and column sums are all bounded by $A$. Then the operator norm of $K$ is also bounded by $A$.
Schur's test can be sharp. Suppose that $X$, $Y$ have finite measure, that $K$ is non-negative, and that the hypotheses of Schur's test are satisfied exactly, in the sense that
$$\int_X |K(x, y)| \, d\mu_X(x) = A$$
When $K$ oscillates, one usually does not expect Schur's test to be efficient. For instance, consider the Fourier transform operator $f \mapsto \hat{f}$ on $\mathbb{R}^d$, which is an integral operator with kernel $K(x, y) = e^{-2\pi i x \cdot y}$. The values of $A$ and $B$ for this kernel are infinite, yet Plancherel's theorem (which we shall review below) shows that this operator is bounded in $L^2(\mathbb{R}^d)$. Handling oscillatory integrals is in fact a major challenge in harmonic analysis, requiring tools such as almost orthogonality and various wave packet decompositions; we shall return to these issues in later notes.
Remark 5.10. There is in fact a sense in which the weighted version of Schur's test is always sharp for non-negative kernels $K \geq 0$, provided one chooses the weights optimally. Indeed, suppose that $\|T_K\|_{L^p(X) \to L^p(Y)} = A$, and furthermore that this bound is attained by some non-zero $f$, thus $\|T_K f\|_{L^p(Y)} = A \|f\|_{L^p(X)}$. Since $K$ is non-negative, we can assume without loss of generality that $f$ is also non-negative; we can normalise $\int_X f^p \, d\mu_X = 1$, thus
$$\int_Y (T_K f)^p \, d\mu_Y = A^p.$$
Elementary calculus of variations using Lagrange multipliers then reveals that
$$T_K^*((T_K f)^{p-1}) = \lambda f^{p-1}$$
for some $\lambda \geq 0$; integrating this against $f$ reveals that $\lambda = A^p$. If we write $w = f$ and $v = (T_K f)^{p-1}$ we thus have
$$\int_X K(x, y) w(x) \, d\mu_X(x) = v(y)^{1/(p-1)}$$
and
$$\int_Y K(x, y) v(y) \, d\mu_Y(y) = A^p w(x)^{p-1}.$$
It is then not difficult to formulate a weighted version of Schur's test which gives $\|T_K\|_{L^p(X) \to L^p(Y)} \leq A$. Thus we see that by choosing the weights to be associated to the extremal functions for $T$, Schur's test does not lose any constants whatsoever.
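In the special case $p = 2$ with finite sets and counting measure, the two identities above reduce to the singular value equations $T_K w = v$ and $T_K^* v = A^2 w$, which can be verified numerically. The sketch below is only an illustration under those assumptions; the use of power iteration to locate the extremiser is our own choice of method and is not taken from the notes, and the small matrix `K` is arbitrary.

```python
def T(K, f):
    """(T_K f)(y) = sum over x of K[x][y] * f[x]."""
    return [sum(K[x][y] * f[x] for x in range(len(K))) for y in range(len(K[0]))]


def T_adj(K, g):
    """(T_K^* g)(x) = sum over y of K[x][y] * g[y]."""
    return [sum(K[x][y] * g[y] for y in range(len(K[0]))) for x in range(len(K))]


def l2(v):
    return sum(t * t for t in v) ** 0.5


# A non-negative kernel on |X| = 3, |Y| = 2.
K = [[2.0, 1.0], [1.0, 3.0], [0.5, 1.0]]

# Power iteration on T^* T converges to a non-negative extremiser w with ||w||_2 = 1.
w = [1.0, 1.0, 1.0]
for _ in range(200):
    w = T_adj(K, T(K, w))
    norm = l2(w)
    w = [t / norm for t in w]

v = T(K, w)          # for p = 2, v = (T_K w)^(p-1) = T_K w
A = l2(v)            # the operator norm, attained at w

# Second identity: sum_y K(x,y) v(y) = A^p w(x)^(p-1) with p = 2.
residual = max(abs(s - A * A * t) for s, t in zip(T_adj(K, v), w))
print(A * A, residual)
```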
6. Young's inequality
Proof By splitting K into real and imaginary parts, and then into positive and
negative parts, we may assume that K is real and non-negative. We may also
restrict the functions f we are testing to also be real and non-negative, for similar
reasons. By multiplying K by a constant we may normalise A = 1.
From the dual characterisation of $L^{r,\infty}(X)$ (Problem 6.9 of last week's notes) we know that the $L^{r,\infty}(X)$ quasi-norm is equivalent to a norm. One can then modify the proof of Proposition 5.2 to obtain the weak type $(1, r)$. A similar argument (modifying Proposition 5.4) gives us the restricted type $(r', \infty)$. The Lorentz claims then follow from Marcinkiewicz interpolation, and the final strong-type claim follows by setting $s = p$ and recalling that $L^{q,p}(Y)$ embeds into $L^q$.
Indeed, one simply sets $K(x, y) := \frac{1}{|x - y|^{d-s}}$ and $r := \frac{d}{d-s}$ and verifies the hypotheses of Proposition 6.1. The reason for the terminology "fractional integration" will be explained later in this course.
The quantity $\int_{\mathbb{R}^d} \frac{f(x)}{|x - y|^{d-s}} \, dx$ is of course a convolution of $f$ with the convolution kernel $\frac{1}{|x|^{d-s}}$. Convolutions can in fact be studied on more general groups than $\mathbb{R}^d$. Suppose we have a domain $X$ which is both a measure space, $X = (X, \mathcal{B}, \mu)$, and a multiplicative⁹ group. We assume that the measure structure and group structure are compatible in the following senses. First, we assume that the group operations $(x, y) \mapsto xy$ and $x \mapsto x^{-1}$ are measurable, and that the translations $x \mapsto xy$, $x \mapsto yx$ and reflection $x \mapsto x^{-1}$ are all measure preserving¹⁰. We can then define (formally, at least) the convolution $f * g$ of two functions $f, g : X \to \mathbb{C}$ by
$$f * g(x) := \int_X f(y) g(y^{-1} x) \, d\mu(y) = \int_X f(x y^{-1}) g(y) \, d\mu(y).$$
This operation is bilinear and (formally) associative (the associativity being justified from the Fubini-Tonelli theorem when $f$, $g$ are either both non-negative or both absolutely integrable), but is only commutative when the underlying group $X$ is also. The convolution algebra is a continuous version of the group algebra $\mathbb{C}X$ of $X$, which corresponds to the case when $\mu$ is counting measure and all functions are restricted to have finite support.
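The group algebra case can be explored concretely. The sketch below takes $X$ to be the symmetric group on three letters with counting measure (our choice of example, not one from the notes) and checks that the two formulas for $f * g$ agree, that convolution is associative, and that it fails to be commutative. Permutations are stored as tuples, and the helper names are hypothetical.

```python
from itertools import permutations

G = list(permutations(range(3)))          # the six elements of S_3


def mul(p, q):
    """Group product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    """Inverse permutation."""
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)


def conv(f, g):
    """f * g(x) = sum over y of f(y) g(y^{-1} x)."""
    return {x: sum(f[y] * g[mul(inv(y), x)] for y in G) for x in G}


def conv_alt(f, g):
    """f * g(x) = sum over y of f(x y^{-1}) g(y)."""
    return {x: sum(f[mul(x, inv(y))] * g[y] for y in G) for x in G}


def delta(a):
    """Indicator function of the single group element a."""
    return {x: (1 if x == a else 0) for x in G}


# Integer-valued test functions, so that all comparisons are exact.
f = {x: i + 1 for i, x in enumerate(G)}
g = {x: (i * i) % 5 - 2 for i, x in enumerate(G)}
h = {x: 3 - i for i, x in enumerate(G)}

a, b = (1, 0, 2), (0, 2, 1)               # two transpositions that do not commute
print(conv(delta(a), delta(b)) == delta(mul(a, b)))     # delta_a * delta_b = delta_{ab}
print(conv(delta(a), delta(b)) == conv(delta(b), delta(a)))
```

Since $\delta_a * \delta_b = \delta_{ab}$, commutativity of convolution fails as soon as $ab \neq ba$.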
⁹ We use multiplication whenever we do not wish to assume that the group is commutative. Of course, many important examples, such as Euclidean space $\mathbb{R}^d$, are additive (and thus commutative) groups.
¹⁰ This is for instance the case if $X$ is a Lie group, with a bi-invariant Haar measure. Such measures exist whenever $X$ is unimodular, which in particular occurs when $X$ is compact or abelian. Another example is if $\mu$ is counting measure and $X$ is at most countable (for $\sigma$-finiteness).
for all $f \in L^p_x(\mathbb{R}^d)$. Here $\langle x \rangle := (1 + |x|^2)^{1/2}$ is the Japanese bracket of $x$.
Problem 6.9. Show that the theorem fails for $p = \infty$, but is true when $L^\infty(\mathbb{R}^d)$ is replaced by the closed subspace $C^0(\mathbb{R}^d)$ of bounded continuous functions.
Problem 6.10. If $1 < p < \infty$, $f \in L^p(\mathbb{R}^d)$, and $g \in L^{p'}(\mathbb{R}^d)$, show that $f * g$ is continuous and decays to zero at infinity.
7. Hausdorff-Young
We will now very quickly introduce the Fourier transform on Rd in order to demon-
strate a classical application of interpolation theory, namely the Hausdorff-Young
inequality. In this section we shall present this Fourier transform in a rather un-
motivated way; much later in this course we shall systematically study the Fourier
transform on various groups in a more unified context.
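$$\mathcal{F} \mathrm{Trans}_{x_0} = \mathrm{Mod}_{-x_0} \mathcal{F}$$
$$\mathcal{F} \mathrm{Mod}_{\xi_0} = \mathrm{Trans}_{\xi_0} \mathcal{F}$$
$$\mathcal{F} \mathrm{Dil}^p_\lambda = \mathrm{Dil}^{p'}_{\lambda^{-1}} \mathcal{F}$$
where $\mathrm{Trans}_{x_0}$ for $x_0 \in \mathbb{R}^d$ is the spatial translation operator
$$\mathrm{Trans}_{x_0} f(x) := f(x - x_0),$$
$\mathrm{Mod}_{\xi_0}$ for $\xi_0 \in \mathbb{R}^d$ is the frequency modulation operator
$$\mathrm{Mod}_{\xi_0} f(x) := e^{2\pi i x \cdot \xi_0} f(x),$$
and $\mathrm{Dil}^p_\lambda$ for $\lambda > 0$ and $1 \leq p \leq \infty$ is the $L^p$-normalised dilation operator
$$\mathrm{Dil}^p_\lambda f(x) := \frac{1}{\lambda^{d/p}} f\left(\frac{x}{\lambda}\right).$$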
¹¹ To be truly finicky, the frequency variable $\xi$ should live in the dual space $(\mathbb{R}^d)^*$ of $\mathbb{R}^d$; this is of course canonically identifiable with $\mathbb{R}^d$ once one imposes the Euclidean inner product $x \cdot \xi$ on $\mathbb{R}^d$, but one could also work on a more abstract finite-dimensional vector space without a preferred inner product, in which case no canonical identification is available. It is occasionally useful to work in the latter abstract setting, in order to more easily exploit the $GL(\mathbb{R}^d)$ symmetry, but in many applications the Euclidean structure is already being exploited (e.g. through the use of balls, or of the magnitude function $x \mapsto |x|$), and so there is little point in trying to remove that structure from the Fourier transform.
Thus the Fourier transform interchanges translation with modulation, and reverses dilation. There is in fact a more general symmetry
$$\mathcal{F} \mathrm{Dil}^p_U = \mathrm{Dil}^{p'}_{(U^*)^{-1}} \mathcal{F}$$
for any invertible linear transformation $U : \mathbb{R}^d \to \mathbb{R}^d$ (i.e. $U \in GL(\mathbb{R}^d)$), where
$$\mathrm{Dil}^p_U f(x) := \frac{1}{|\det U|^{1/p}} f(U^{-1} x).$$
Thus, for instance, the Fourier transform commutes with orthogonal transformations such as rotations and reflections. Finally, we observe the commutation relations
$$\mathcal{F} \frac{\partial}{\partial x_j} = 2\pi i \xi_j \mathcal{F}$$
$$\mathcal{F}\, 2\pi i x_j = -\frac{\partial}{\partial \xi_j} \mathcal{F}$$
which can be viewed as infinitesimal versions of the translation and modulation symmetries. One consequence of these symmetries is the observation that $\mathcal{F}$ maps Schwartz functions to Schwartz functions, continuously in the Schwartz topology.
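The translation and modulation symmetries have an exact discrete analogue, which is easy to test. The sketch below works on the cyclic group $\mathbb{Z}/N\mathbb{Z}$ rather than $\mathbb{R}^d$, and assumes the sign convention $\hat{f}(k) = \sum_x f(x) e^{-2\pi i x k / N}$ (the definition of $\mathcal{F}$ is not reproduced in this section, so this is an assumption consistent with placing $2\pi$ in the exponent). The dilation symmetry has no equally clean analogue here and is omitted; all function names are hypothetical.

```python
import cmath

N = 8


def dft(f):
    """Discrete Fourier transform on Z/NZ with kernel exp(-2 pi i x k / N)."""
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * k / N) for x in range(N))
            for k in range(N)]


def trans(f, x0):
    """Translation: Trans_{x0} f(x) = f(x - x0), indices taken mod N."""
    return [f[(x - x0) % N] for x in range(N)]


def mod(f, k0):
    """Modulation: Mod_{k0} f(x) = exp(2 pi i x k0 / N) f(x)."""
    return [cmath.exp(2j * cmath.pi * x * k0 / N) * f[x] for x in range(N)]


def close(u, v, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(u, v))


f = [complex(x * x - 3, (-1) ** x * x) for x in range(N)]

# F Trans_{x0} = Mod_{-x0} F
print(close(dft(trans(f, 3)), mod(dft(f), -3)))
# F Mod_{k0} = Trans_{k0} F
print(close(dft(mod(f, 2)), trans(dft(f), 2)))
```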
for all $1 < p < 2$ and simple functions $f$ with compact support, and (thus by density) we may also extend $\mathcal{F}$ uniquely and continuously to $L^p(\mathbb{R}^d)$. (For $p > 2$, the Fourier transform of an $L^p(\mathbb{R}^d)$ function can still be defined as a distribution, but it need not correspond to a locally integrable function.)
Problem 7.1. Show that up to constant multiplication, the Fourier transform F is
the only continuous map from Schwartz space to itself which obeys the translation
and modulation symmetries.
Problem 7.2. Obtain the Hausdorff-Young inequality using real interpolation and
the tensor power trick.
The Hausdorff-Young inequality (7) gives an upper bound of 1 for the operator norm $\|\mathcal{F}\|_{L^p(\mathbb{R}^d) \to L^{p'}(\mathbb{R}^d)}$. In the converse direction, we have:
x
1/2p
This gives a lower bound of (pp )1/2p for the operator norm. This bound turns out
to be sharp (a famous result of Beckner) but we will not demonstrate it here.
One can ask whether the Fourier transform has any other $L^p$ mapping properties, i.e. to determine all the $(p, q)$ for which $\mathcal{F}$ is strong type (or restricted weak type, etc.). The scale invariance of the Fourier transform shows that the duality condition $q = p'$ is necessary for any of these type properties to hold. The Hausdorff-Young inequality also shows that this necessary condition is sufficient (for any of the types) for $p \leq 2$. Unfortunately the inequality fails for $p > 2$. In fact the failure is quite severe and can be demonstrated in a number of ways. One is by Littlewood's principle, which asserts that on non-compact domains (such as $\mathbb{R}^d$), it is not possible for an operator with any sort of translation symmetry to map high-exponent Lebesgue spaces to low-exponent spaces. (The higher exponents are always on the left.) We give one rigorous formulation of this principle in Q12.
¹³ This is one reason why we normalise the Fourier transform by placing the $2\pi$ in the exponent.
That principle does not directly apply here, but a variant of it will suffice. Let $N$ be a large integer, let $v \in \mathbb{R}^d$ have large magnitude, and consider the function
$$f(x) := \sum_{n=1}^{N} e^{2\pi i x \cdot nv} g(x - nv)$$
where $g$ is the standard Gaussian $g(x) := e^{-\pi |x|^2}$. Then the symmetries of the Fourier transform show that
N
X
f() := e2inv g( nv) = f ().
n=1
8. Kernel truncation
We continue our study of integral operators (4). Intuitively, the smaller one makes
the kernel K, the easier it should be to bound it. One basic result in this direction
is:
Theorem 8.1 (Positive domination). Let $K : X \times Y \to \mathbb{C}$, $K' : X \times Y \to \mathbb{R}^+$ be kernels, and let $T_K$, $T_{K'}$ be the corresponding integral operators. Suppose that we have the pointwise bound $|K| \leq K'$. Then if $T_{K'}$ is of strong type $(p, q)$, then $T_K$ is also (and the integral is a.e. absolutely convergent); in fact we have the bound
$$\|T_K\|_{L^p(X) \to L^q(Y)} \leq \|T_{K'}\|_{L^p(X) \to L^q(Y)}.$$
Proof From the triangle inequality we have the pointwise bound $|T_K(f)| \leq T_{K'}(|f|)$. The claim follows.
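The pointwise bound used in this proof is easy to observe for matrices. The following minimal sketch assumes finite sets with counting measure and the convention $(T_K f)(y) = \sum_x K(x, y) f(x)$; the dominating kernel `K_dom` (playing the role of the second kernel in the theorem) and the other names are hypothetical.

```python
import random

random.seed(1)
nx, ny = 5, 4

# A complex kernel K and a non-negative kernel K_dom with |K| <= K_dom pointwise.
K = [[complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(ny)]
     for _ in range(nx)]
K_dom = [[abs(K[x][y]) + random.uniform(0.0, 0.5) for y in range(ny)]
         for x in range(nx)]
f = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(nx)]


def apply_kernel(kernel, func):
    """(T f)(y) = sum over x of kernel[x][y] * func[x]."""
    return [sum(kernel[x][y] * func[x] for x in range(len(kernel)))
            for y in range(len(kernel[0]))]


lhs = [abs(v) for v in apply_kernel(K, f)]              # |T_K f|(y)
rhs = apply_kernel(K_dom, [abs(v) for v in f])          # T_{K_dom}(|f|)(y)
for l, r in zip(lhs, rhs):
    print(l, "<=", r)
```

Because the bound holds at every point $y$, it passes to any norm that is monotone in $|f|$, which is the content of the remark that follows.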
Note that similar results also hold for weak type, restricted type, etc. because of
the monotonicity property of the underlying function spaces.
On the other hand, certain truncations are acceptable for signed kernels. For instance, if $p = 1$ or $q = \infty$ then all truncations contract the operator norm, thanks to Propositions 5.2, 5.4. Secondly, it is clear that if $\Omega$ is a product set $\Omega = A \times B$, then from the¹⁴ identity
$$T_{K 1_\Omega}(f) = 1_B T_K(1_A f) \qquad (8)$$
we see that if $T_K$ is of strong-type $(p, q)$ then $T_{K 1_\Omega}$ is also, with the expected comparison principle $\|T_{K 1_\Omega}\|_{L^p(X) \to L^q(Y)} \leq \|T_K\|_{L^p(X) \to L^q(Y)}$. Also note in this case that $T_{K 1_\Omega}$ can also be viewed as an operator from $L^p(A)$ to $L^q(B)$, and
$$\|T_{K 1_\Omega}\|_{L^p(X) \to L^q(Y)} = \|T_{K 1_\Omega}\|_{L^p(A) \to L^q(Y)} = \|T_{K 1_\Omega}\|_{L^p(A) \to L^q(B)}.$$
¹⁴ Here we are implicitly assuming that there is enough regularity in $K$ and $f$ that the expressions are well-defined; we gloss over this subtlety here.
¹⁵ Strictly speaking, "block diagonal" refers to the case when $X = Y$ and $X_n = Y_n$. A more appropriate term here would be "block permutation", but we prefer "block diagonal" as it is more familiar (and covers the most important cases).
If $q \geq p$, show that
$$\|T\|_{L^p(X) \to L^q(Y)} = \sup_n \|T_n\|_{L^p(X_n) \to L^q(Y_n)}.$$
(Note that the bounded degree hypothesis ensures that this sum is well-defined.)
Then for any A > 0 the following are equivalent up to changes in the implied
constants.
kT kLp(X)Lq (Y ) .p,q A.
kTn,m kLp (Xn )Lq (Ym ) .p,q A for all n, m with n m.
Proof We will prove this by induction on $N$; however, one must be careful because of the use of the Vinogradov notation $\lesssim_{p,q}$ in the conclusion. To do things rigorously, let $A_{p,q}$ be a large constant depending only on $p$ and $q$ to be chosen later. Writing $\tilde{T}$ for the truncated operator in the lemma, we claim that for every $N$, we have the bound
$$\|\tilde{T}\|_{L^p(X) \to L^q(Y)} \leq A_{p,q} \|T\|_{L^p(X) \to L^q(Y)}.$$
The claim is trivial for $N = 1$ (if $A_{p,q}$ is large enough) so suppose inductively that $N > 1$ and the claim has already been proven for smaller values of $N$. We may normalise $\|T\|_{L^p(X) \to L^q(Y)} = 1$. Let us also choose $f \in L^p(X)$, normalised so that $\|f\|_{L^p(X)} = 1$. Our task is to show that
$$\|\tilde{T} f\|_{L^q(Y)} \leq A_{p,q}.$$
The idea is to divide-and-conquer $f$ in an intelligent fashion, adapted to both $f$ and the blocks $X_1, \ldots, X_N$. Consider the sequence of numbers $\|f 1_{X_1 \cup \ldots \cup X_n}\|^p_{L^p(X)}$ for $n = 0, 1, \ldots, N$. This sequence increases from 0 to 1, so we can find $0 \leq n_0 < N$ such that
$$\|f 1_{X_1 \cup \ldots \cup X_{n_0}}\|^p_{L^p(X)} \leq \frac{1}{2} < \|f 1_{X_1 \cup \ldots \cup X_{n_0+1}}\|^p_{L^p(X)}$$
and hence
$$\|f 1_{X_1 \cup \ldots \cup X_{n_0}}\|_{L^p(X)}, \ \|f 1_{X_{n_0+2} \cup \ldots \cup X_N}\|_{L^p(X)} \leq \frac{1}{2^{1/p}}.$$
Applying the induction hypothesis, we conclude that
$$\|\tilde{T}(f 1_{X_1 \cup \ldots \cup X_{n_0}})\|_{L^q(Y_1 \cup \ldots \cup Y_{n_0})}, \ \|\tilde{T}(f 1_{X_{n_0+2} \cup \ldots \cup X_N})\|_{L^q(Y_{n_0+2} \cup \ldots \cup Y_N)} \leq \frac{1}{2^{1/p}} A_{p,q}.$$
On the other hand, we have
$$\|\tilde{T}(f 1_{X_1 \cup \ldots \cup X_{n_0}})\|_{L^q(Y_{n_0+1} \cup \ldots \cup Y_N)} = \|T(f 1_{X_1 \cup \ldots \cup X_{n_0}})\|_{L^q(Y_{n_0+1} \cup \ldots \cup Y_N)} = O(1)$$
and
$$\|\tilde{T}(f 1_{X_{n_0+1}})\|_{L^q(Y)} \leq \|T(f 1_{X_{n_0+1}})\|_{L^q(Y)} = O(1)$$
by the normalisations on $T$ and $f$ and the definition of $\tilde{T}$. Also, $\tilde{T}(f 1_{X_{n_0+2} \cup \ldots \cup X_N})$ vanishes outside of $Y_{n_0+2} \cup \ldots \cup Y_N$. From the triangle inequality we conclude that
$$\|\tilde{T} f\|_{L^q(Y_1 \cup \ldots \cup Y_{n_0})}, \ \|\tilde{T} f\|_{L^q(Y_{n_0+1} \cup \ldots \cup Y_N)} \leq \frac{1}{2^{1/p}} A_{p,q} + O(1)$$
and thus
$$\|\tilde{T} f\|_{L^q(Y)} \leq \frac{2^{1/q}}{2^{1/p}} A_{p,q} + O(1).$$
If we choose $A_{p,q}$ large enough depending on $p$, $q$, then we can make the right-hand side less than $A_{p,q}$ (here we use the hypothesis $q > p$). The claim follows.
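The last step of the induction is a small piece of arithmetic that is worth making explicit. If the $O(1)$ term is bounded by some constant $C$ (a hypothetical name introduced here for illustration), the induction closes as soon as $2^{1/q - 1/p} A_{p,q} + C \leq A_{p,q}$, that is, $A_{p,q} \geq C / (1 - 2^{1/q - 1/p})$. The sketch below computes this threshold and shows how it degenerates as $q$ approaches $p$; the function name `closing_constant` is likewise an assumption.

```python
import math


def closing_constant(p, q, C):
    """Smallest A with 2^(1/q - 1/p) * A + C <= A; requires q > p.

    C stands for a bound on the O(1) term in the induction step.
    """
    if not q > p:
        raise ValueError("the induction only closes when q > p")
    gain = 2.0 ** (1.0 / q - 1.0 / p)     # strictly less than 1 when q > p
    return C / (1.0 - gain)


for q in (math.inf, 4.0, 2.5, 2.1, 2.01):
    print(q, closing_constant(2.0, q, 1.0))
```

The threshold blows up as $q \to p$, which is consistent with the failure of the lemma at $p = q$ noted in the text.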
In terms of kernels, the above lemma allows one to truncate the kernel to upper-diagonal regions $\bigcup_{n \leq m} X_n \times Y_m$ without losing too much in the operator norm, so long as $q > p$. The claim is unfortunately false when $p = q$ (except when $p = q = 1$ or $p = q = \infty$, where the claim is easy to verify), although producing an example will have to wait for our analysis of the Hilbert transform in next week's notes.
The fact that the partitions $X = \bigcup_n X_n$, $Y = \bigcup_n Y_n$ are arbitrary makes the Christ-Kiselev lemma quite powerful. Indeed, we can conclude
Corollary 8.8 (Christ-Kiselev lemma, maximal version). Let Q be a countable
ordered set, and for each Q let E be a set in X with the nesting property
E E whenever < . Let 1 p < q , and let T : Lp (X) Lq (Y ) be a
bounded linear operator. Define the maximal operator
T f (y) := sup |T (f 1E )(y)|. (9)
Q
Then the sublinear operator T is also bounded from Lp (X) to Lq (Y ), and in fact
kT kLp (X)Lq (Y ) .p,q kT kLp(X)Lq (Y ) .
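One way to see how the maximal version might follow from the lemma is the standard linearisation of the supremum. The sketch below assumes that $Q = \{\alpha_1 < \dots < \alpha_N\}$ is finite, that $f$ is supported in $E_{\alpha_N}$ (so that $X$ may be replaced by $E_{\alpha_N}$), and that the truncated operator of the lemma has the form $\tilde T = \sum_{n \le m} 1_{Y_m} T 1_{X_n}$. The names $\tilde T$, $m(y)$ and $E_{\alpha_0}$ are introduced here for illustration only.

```latex
% Assumptions: Q finite, f supported in E_{\alpha_N}, E_{\alpha_0} := \emptyset.
% Partition the domain by the increments of the nested sets:
\[
  X_n := E_{\alpha_n} \setminus E_{\alpha_{n-1}}, \qquad n = 1, \dots, N .
\]
% For each y let m(y) be the least index attaining the maximum, i.e.
\[
  |T(f 1_{E_{\alpha_{m(y)}}})(y)| = \max_{1 \le k \le N} |T(f 1_{E_{\alpha_k}})(y)| ,
  \qquad
  Y_m := \{\, y \in Y : m(y) = m \,\} .
\]
% Since E_{\alpha_m} = X_1 \cup \dots \cup X_m, for y in Y_m we get
\[
  T_* f(y) = \Big| \sum_{n \le m} T(f 1_{X_n})(y) \Big| = |\tilde T f(y)| .
\]
```

The sets $Y_m$ depend on $f$, but the bound in the lemma is uniform over all partitions and over $N$, so $\|T_* f\|_{L^q(Y)} = \|\tilde T f\|_{L^q(Y)} \lesssim_{p,q} \|T\|_{L^p(X) \to L^q(Y)} \|f\|_{L^p(X)}$. Passing from finite to countable $Q$ would then use monotone convergence.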
Let us give a sample application of this lemma. Given any locally integrable function $f : \mathbb{R} \to \mathbb{C}$, define the maximal Fourier transform $F_* f$ by
\[ F_* f(\xi) := \sup_I \Big| \int_I f(x) e^{-2\pi i x \xi}\, dx \Big| = \sup_I |\widehat{f 1_I}(\xi)| \]
where $I$ ranges over compact intervals in $\mathbb{R}$.
Corollary 8.9 (Menshov-Paley-Zygmund theorem, quantitative version). For any $1 \le p < 2$ we have
\[ \|F_*\|_{L^p(\mathbb{R}) \to L^{p'}(\mathbb{R})} \lesssim_p 1. \]
The claim is trivial if $f$ lies in $L^1(\mathbb{R}) \cap L^p(\mathbb{R})$, which is a dense subset of $L^p(\mathbb{R})$. The claim then follows from a limiting argument and Corollary 8.9 (and the Hausdorff-Young inequality).
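The quantitative bound in Corollary 8.9 can plausibly be traced back to Corollary 8.8 as follows. This is a sketch only: it assumes $f \in L^1(\mathbb{R}) \cap L^p(\mathbb{R})$ so that every integral is defined pointwise, it ignores the limiting argument, and the nested family $E_b$ is a choice made here rather than one fixed by the text.

```latex
% Assumptions: f \in L^1(\mathbb{R}) \cap L^p(\mathbb{R}); E_b := (-\infty, b] for rational b.
% A compact interval I = [a,b] is a difference of two half-lines:
\[
  \widehat{f 1_{[a,b]}}(\xi)
  = \widehat{f 1_{(-\infty,b]}}(\xi) - \widehat{f 1_{(-\infty,a)}}(\xi) ,
\]
% and b \mapsto \widehat{f 1_{(-\infty,b]}}(\xi) is continuous because f \in L^1, so
\[
  F_* f(\xi) \le 2 \sup_{b \in \mathbb{Q}} \big| \widehat{f 1_{E_b}}(\xi) \big| .
\]
% Corollary 8.8 with T the Fourier transform, Q = \mathbb{Q} and q = p' then gives
\[
  \| F_* f \|_{L^{p'}(\mathbb{R})}
  \lesssim_p \| f \mapsto \hat f \|_{L^p(\mathbb{R}) \to L^{p'}(\mathbb{R})} \, \| f \|_{L^p(\mathbb{R})}
  \le \| f \|_{L^p(\mathbb{R})} ,
\]
% where the last step is the Hausdorff-Young inequality.
```

The condition $p < q = p'$ required by Corollary 8.8 holds exactly when $p < 2$, which is where the restriction on $p$ in Corollary 8.9 comes from.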
Remark 8.11. The above results are also true for $p = 2$, but this is much more difficult to prove, and is known as Carleson's theorem.
One can pass from the finite version of the Christ-Kiselev lemma to an infinite version if one has enough regularity on the kernel. For instance, we have
Problem 8.12 (Christ-Kiselev lemma, upper diagonal version). Let $K : \mathbb{R} \times \mathbb{R} \to \mathbb{C}$ be a locally integrable kernel, and suppose that the operator $T_K$ (which is defined and locally integrable for compactly supported continuous functions $f \in C_c^0(\mathbb{R})$ at least) is bounded from $L^p(\mathbb{R})$ to $L^q(\mathbb{R})$ for some $1 \le p < q \le \infty$ (i.e. it is bounded on the dense subclass $C_c^0(\mathbb{R})$ and thus has a unique continuous extension). Let $K_>$ be the restriction of $K$ to the upper diagonal $\{(s,t) \in \mathbb{R} \times \mathbb{R} : t > s\}$. Then the operator $T_{K_>}$ (which is also defined on $C_c^0(\mathbb{R})$) is also of strong type $(p,q)$, with
\[ \|T_{K_>}\|_{L^p(\mathbb{R}) \to L^q(\mathbb{R})} \lesssim_{p,q} \|T_K\|_{L^p(\mathbb{R}) \to L^q(\mathbb{R})}. \]
Remark 8.13. A vector-valued version of this lemma, in which the input and output functions are not complex-valued, but instead take values in some infinite-dimensional vector space, is very useful in the study of nonlinear dispersive PDE, and in particular in the theory of Strichartz estimates, but we will not discuss that topic further here.
In the above results we have only discussed rough truncations, in which a kernel $K$ was multiplied by an indicator function $1_\Omega$. One can do somewhat better with smooth truncations, but this requires significantly more structure on the underlying domain.
Definition 8.14 (Adapted bump function). Let $\Omega \subset \mathbb{R}^d$ be a bounded open set, and let $L : \mathbb{R}^d \to \mathbb{R}^d$ be an invertible affine transformation (the composition of a general linear transformation and a translation), thus $L(\Omega)$ is another bounded open set in $\mathbb{R}^d$. We say that a function $\varphi : \mathbb{R}^d \to \mathbb{C}$ is a bump function adapted to $L(\Omega)$ if $\varphi$ is smooth, supported in $L(\Omega)$, and we have the bounds
\[ \nabla^k (\varphi \circ L)(x) = O_{k,\Omega}(1) \]
for all $k \ge 0$ and $x \in \mathbb{R}^d$ (actually one just needs $x \in \Omega$).
Remark 8.15. This definition only becomes meaningful if we hold $\Omega$ fixed and let $L$ vary. Typically $\Omega$ is a standard set, such as the unit ball, unit cube, unit cylinder, or unit annulus, so that $L(\Omega)$ describes balls, cubes, boxes, tubes, or annuli of various radius, eccentricity, and location; the point is then that the bump functions adapted to these sets are adapted uniformly in these additional parameters.
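As an illustration of the uniformity described in this remark, consider the simplest case of balls. The particular choice of $\Omega$ and $L$ below, and the notation $B(x_0, r)$ and $C_k$, are assumptions made for this example only.

```latex
% Assumptions: \Omega = B(0,1) is the unit ball and L(x) := x_0 + r x with r > 0,
% so that L(\Omega) = B(x_0, r). By the chain rule,
\[
  \nabla^k (\varphi \circ L)(x) = r^k \, (\nabla^k \varphi)(x_0 + r x) ,
\]
% so \varphi is a bump function adapted to B(x_0, r) exactly when it is smooth,
% supported in B(x_0, r), and satisfies
\[
  |\nabla^k \varphi(y)| \le C_k \, r^{-k}
  \qquad \text{for all } k \ge 0 \text{ and } y \in B(x_0, r) ,
\]
% with constants C_k that do not depend on x_0 or r.
```

In other words, each derivative is allowed to cost one factor of $r^{-1}$, which is the natural scaling for a function living on a ball of radius $r$.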
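Proposition 8.16 (Smooth cutoffs). Let $K : \mathbb{R}^d \times \mathbb{R}^{d'} \to \mathbb{C}$ be a locally integrable function, and suppose that $T_K$ is bounded from $L^p(\mathbb{R}^d)$ to $L^q(\mathbb{R}^{d'})$ for some $1 \le p, q \le \infty$. Let $L(\Omega) \subset \mathbb{R}^d$ and $L'(\Omega') \subset \mathbb{R}^{d'}$ be affine images of bounded open sets $\Omega \subset \mathbb{R}^d$, $\Omega' \subset \mathbb{R}^{d'}$, and let $\varphi : \mathbb{R}^d \times \mathbb{R}^{d'} \to \mathbb{C}$ be a bump function adapted to $L(\Omega) \times L'(\Omega')$. Then $T_{K\varphi}$ is bounded from $L^p(\mathbb{R}^d)$ to $L^q(\mathbb{R}^{d'})$, and
\[ \|T_{K\varphi}\|_{L^p(\mathbb{R}^d) \to L^q(\mathbb{R}^{d'})} \lesssim_{\Omega,\Omega',d,d'} \|T_K\|_{L^p(\mathbb{R}^d) \to L^q(\mathbb{R}^{d'})} \]
where we define $T_{K\varphi}$ first for test functions in $C_c^0(\mathbb{R}^d)$ and then extend by density.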
Proof The idea is to exploit the fact that while bump functions adapted to product sets such as $L(\Omega) \times L'(\Omega')$ are not necessarily tensor products of bump functions adapted to $L(\Omega)$ and $L'(\Omega')$ separately, they can be efficiently decomposed as a convergent sum of such.
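By rescaling by $L$ and $L'$ we may assume that these transformations are the identity. By rescaling a little more to shrink $\Omega$ and $\Omega'$ we may assume that these sets are contained in (say) the cubes $[-1/4, 1/4]^d$ and $[-1/4, 1/4]^{d'}$ respectively. In particular, $\varphi$ is now a smooth bump function supported on $[-1/4, 1/4]^d \times [-1/4, 1/4]^{d'}$. We can identify this set with a subset of the torus $(\mathbb{R}/\mathbb{Z})^d \times (\mathbb{R}/\mathbb{Z})^{d'}$ in the usual manner, thus creating a function $\varphi : (\mathbb{R}/\mathbb{Z})^d \times (\mathbb{R}/\mathbb{Z})^{d'} \to \mathbb{C}$ whose $k$-th derivatives are $O_{k,\Omega,\Omega'}(1)$. We now appeal to the theory of Fourier series to write
\[ \varphi(x, y) = \sum_{n \in \mathbb{Z}^d} \sum_{m \in \mathbb{Z}^{d'}} \hat\varphi(n, m) e^{2\pi i n \cdot x} e^{2\pi i m \cdot y} \qquad (10) \]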
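where $\hat\varphi(n, m)$ is the Fourier coefficient
\[ \hat\varphi(n, m) = \int_{(\mathbb{R}/\mathbb{Z})^d \times (\mathbb{R}/\mathbb{Z})^{d'}} \varphi(x, y) e^{-2\pi i n \cdot x} e^{-2\pi i m \cdot y}\, dx\, dy. \]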
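Since all derivatives of $\varphi$ are bounded, the Fourier coefficients $\hat\varphi(n, m)$ decay rapidly in $n$ and $m$, and in particular are absolutely summable with sum $O_{\Omega,\Omega',d,d'}(1)$. Inserting (10) into the definition of $T_{K\varphi}$, we obtain
\[ T_{K\varphi} = \sum_{n \in \mathbb{Z}^d} \sum_{m \in \mathbb{Z}^{d'}} \hat\varphi(n, m)\, 1_{[-1/4,1/4]^{d'}}\, e_m\, T_K\, e_n\, 1_{[-1/4,1/4]^d} \]
(at least when applied to test functions), where $e_m$ is the multiplier operator $f(y) \mapsto f(y) e^{2\pi i m \cdot y}$, and similarly for $e_n$. Since the operators $1_{[-1/4,1/4]^d}$, $e_n$ are contractions on $L^p(\mathbb{R}^d)$, and $1_{[-1/4,1/4]^{d'}}$, $e_m$ are contractions on $L^q(\mathbb{R}^{d'})$, the claim now follows from the triangle inequality.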
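The final triangle-inequality step can be spelled out as follows. This is a sketch under the assumption that each term of the Fourier expansion of $T_{K\varphi}$ is $\hat\varphi(n,m)$ times $T_K$ composed with the contractions named above; the specific decay exponent $d + d' + 1$ is a convenient choice made here, not one taken from the notes.

```latex
% Assumption: \varphi is the periodised bump function on the torus, with all
% derivatives bounded. Integrating by parts d + d' + 1 times in the formula
% for the Fourier coefficient gives
\[
  |\hat\varphi(n,m)| \lesssim_{\Omega,\Omega',d,d'} (1 + |n| + |m|)^{-(d+d'+1)} ,
\]
% and this is summable over the lattice Z^d x Z^{d'}, which has dimension d + d':
\[
  \sum_{n \in \mathbb{Z}^d} \sum_{m \in \mathbb{Z}^{d'}} (1 + |n| + |m|)^{-(d+d'+1)} < \infty .
\]
% Each term of the expansion is T_K composed with contractions, so
\[
  \| T_{K\varphi} f \|_{L^q(\mathbb{R}^{d'})}
  \le \sum_{n,m} |\hat\varphi(n,m)| \, \| T_K \|_{L^p \to L^q} \, \| f \|_{L^p(\mathbb{R}^d)}
  \lesssim_{\Omega,\Omega',d,d'} \| T_K \|_{L^p \to L^q} \, \| f \|_{L^p(\mathbb{R}^d)} .
\]
```

Any exponent strictly larger than $d + d'$ would do equally well; smoothness of $\varphi$ supplies as much decay as one likes.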
One of the morals of the above proposition is that smooth bump functions, when
applied to an integral kernel, have essentially a negligible impact on the behaviour
of the integral operator, other than to localise that operator to the support of that
bump function. In particular the precise choice of bump function used is almost
always irrelevant. The boundedness of an operator is instead controlled by those
features unaffected by multiplying by smooth bump functions, such as singularity
or rapid oscillation.
9. Exercises
for all $1 < p \le 2$ and all simple functions $f$. Show that we also have
\[ \|Tf\|_{L^1(X)} \lesssim \|f\|_{L \log L(X)} \]
for all simple functions $f$. (Hint: use Q10.)
Q12 (Littlewood's principle). Let $1 \le q < p \le \infty$, and suppose that $T : L^p(\mathbb{R}^d) \to L^q(\mathbb{R}^d)$ is a bounded linear operator which commutes with translations, thus $T \operatorname{Trans}_{x_0} = \operatorname{Trans}_{x_0} T$ for all $x_0 \in \mathbb{R}^d$. Show that $T$ is identically zero. (Hint: if $T$ is not identically zero, then there is a non-zero $f \in L^p(\mathbb{R}^d)$ such that $Tf$ is also not identically zero. Use monotone convergence to make $f$ and $Tf$ small in a suitable sense outside of a large ball. Now investigate the $L^p(\mathbb{R}^d)$ norm of $\sum_{n=1}^N \operatorname{Trans}_{x_n} f$, where the $x_n$ are widely separated points in space, as well as the $L^q$ norm of the image of that function under $T$.) Give a counterexample to show that Littlewood's principle fails if $\mathbb{R}^d$ is replaced with a compact domain such as the torus $\mathbb{R}^d/\mathbb{Z}^d$.
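Q13. Let $\mathcal{B}'_X$ be a $\sigma$-finite sub-$\sigma$-algebra of $\mathcal{B}_X$, and similarly let $\mathcal{B}'_Y$ be a $\sigma$-finite sub-$\sigma$-algebra of $\mathcal{B}_Y$. Show that if $K : X \times Y \to \mathbb{C}$ is bounded and measurable with respect to $\mathcal{B}'_X \times \mathcal{B}'_Y$, and $T_K$ extends to a continuous linear operator from $L^p(\mathcal{B}_X)$ to $L^q(\mathcal{B}_Y)$, then we have
\[ \|T_K\|_{L^p(\mathcal{B}_X) \to L^q(\mathcal{B}_Y)} = \|T_K\|_{L^p(\mathcal{B}_X) \to L^q(\mathcal{B}'_Y)} = \|T_K\|_{L^p(\mathcal{B}'_X) \to L^q(\mathcal{B}'_Y)}. \]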
Show that
\[ \|T\|_{L^p(X) \to L^p(Y)} \lesssim \log N \, \|T_K\|_{L^p(X) \to L^p(Y)}. \]
Q15 (Rademacher-Menshov inequality). Let $f_1, \dots, f_N$ be an orthonormal set of functions in $L^2(X)$ for some $N > 1$. Show that
\[ \Big\| \sup_{1 \le n \le N} \Big| \sum_{m=1}^n f_m \Big| \Big\|_{L^2(X)} \lesssim N^{1/2} \log N. \]