Linear Algebra, Signal Processing, and Wavelets. A Unified Approach. Python Version
Øyvind Ryan
List of Examples and Exercises
Exercise 8.6: Alternative QMF filter banks with additional sign
Example 8.7: Lifting factorization of the alternative piecewise linear wavelet
Exercise 8.8: Polyphase components for symmetric filters
Exercise 8.9: Implementing kernels transformations using lifting
Exercise 8.10: Lifting orthonormal wavelets
Exercise 8.11: 4 vanishing moments
Exercise 8.12: Wavelet based on piecewise quadratic scaling function
Exercise 8.13: Run forward and reverse transform
Exercise 8.14: Verify statement of filters
Exercise 8.15: Lifting
Example 9.1: Normalising the intensities
Example 9.2: Extracting the different colors
Example 9.3: Converting from color to grey-level
Example 9.4: Computing the negative image
Example 9.5: Increasing the contrast
Exercise 9.6: Generate black and white images
Exercise 9.7: Adjust contrast in images
Exercise 9.8: Adjust contrast with another function
Example 9.9: Smoothing an image
Example 9.10: Edge detection
Example 9.11: Second-order derivatives
Example 9.12: Chess pattern image
Exercise 9.13: Implement a tensor product
Exercise 9.14: Generate images
Exercise 9.15: Interpret tensor products
Exercise 9.16: Computational molecule of moving average filter
Exercise 9.17: Bilinearity of the tensor product
Exercise 9.18: Attempt to write as tensor product
Exercise 9.19: Computational molecules
Exercise 9.20: Computational molecules 2
Exercise 9.21: Comment on code
Exercise 9.22: Eigenvectors of tensor products
Exercise 9.23: The Kronecker product
Example 9.24: Change of coordinates with the DFT
Example 9.25: Change of coordinates with the DCT
Exercise 9.26: Implement DFT and DCT on blocks
Exercise 9.27: Implement two-dimensional FFT and DCT
Exercise 9.28: Zeroing out DCT coefficients
Exercise 9.29: Comment code
Example 10.1: Piecewise constant functions
Example 10.2: Piecewise linear functions
Example 10.3: Applying the Haar wavelet to a very simple example image
Example 10.4: Creating thumbnail images
Example 10.5: Detail and low-resolution approximations for different wavelets
Example 10.6: The Spline 5/3 wavelet and removing bands in the detail spaces
A major part of the information we receive and perceive every day is in the
form of audio. Most sounds are transferred directly from the source to our ears,
like when we have a face-to-face conversation with someone or listen to the
sounds in a forest or a street. However, a considerable part of the sounds we hear is
generated by loudspeakers in various kinds of audio machines like cell phones,
digital audio players, home cinemas, radios, television sets and so on. The sounds
produced by these machines are either generated from information stored inside,
or electromagnetic waves are picked up by an antenna, processed, and then
converted to sound.
What we perceive as sound corresponds to the physical phenomenon of slight
variations in air pressure near our ears. Air pressure is measured by the SI-unit
Pa (Pascal), which is equivalent to N/m² (force per area). In other words, a pressure
of 1 Pa corresponds to a force of 1 N exerted on an area of 1 m².
Larger variations mean louder sounds, while faster variations correspond
to sounds with a higher pitch.
Observation 1.1. Continuous Sound.
A sound can be represented as a function, corresponding to air pressure
measured over time. When a function represents a sound, it is often referred to
as a continuous sound.
Continuous sounds are defined for all time instances. On computers and
various kinds of media players, however, the sound is digital, i.e. it is represented
by a large number of function values, stored in a suitable number format. Such
digital sound is easier to manipulate and process on a computer. If the sound is
sampled at a rate of $f_s$ samples per second, the stored values are

$$x_k = f(k/f_s), \qquad \text{for } k = 0, 1, \ldots, N-1.$$
Note that the indexing convention for digital sound is not standard in
mathematics, where vector indices start at 1. The components in digital sound
are often referred to as samples, the time between successive samples is called
the sampling period, denoted $T_s$, and measuring the sound in this way is also referred to as
sampling the sound.
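As a small sketch (not from the original text), the samples of a pure tone can be computed directly from this formula; the frequency and sampling rate below are chosen arbitrarily for illustration.

from numpy import arange, sin, pi

fs = 44100                        # samples per second (arbitrary choice here)
N = 3*fs                          # three seconds of sound
k = arange(N)
x = sin(2*pi*440*k/float(fs))     # x_k = f(k/fs) with f(t) = sin(2*pi*440*t)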
The quality of digital sound is often measured by the bit rate (number of
bits per second), i.e. the product of the sampling rate and the number of bits
(binary digits) used to store each sample. Both the sample rate and the number
format influence the quality of the resulting sound. These are encapsulated in
digital sound formats. A couple of them are described below.
Telephony. For telephony it is common to sample the sound 8000 times per
second and represent each sample value as a 13-bit integer. These integers are
then converted to a kind of 8-bit floating-point format with a 4-bit significand.
Telephony therefore generates a bit rate of 64 000 bits per second, i.e. 64 kb/s.
Newer formats. Newer formats with higher quality are available. Music is
distributed in various formats on DVDs (DVD-video, DVD-audio, Super Audio
CD) with sampling rates up to 192 000 and up to 24 bits per sample. These
formats also support surround sound (up to seven channels in contrast to the
two stereo channels on a CD). In the following we will assume all sound to be
digital. Later we will return to how we reconstruct audible sound from digital
sound.
In the following we will briefly discuss the basic properties of sound: loudness
(the size of the variations), and frequency (the number of variations per second).
We will then address to what extent sounds can be decomposed as a sum of
different frequencies, and look at important operations on sound, called filters,
which preserve frequencies. We will also see how we can experiment with digital
sound.
The functionality for accessing sound in this chapter is collected in a module
called sound.
Figure 1.1: An audio signal shown in terms of air pressure (left), and in terms
of the difference from the ambient air pressure (right).
The right plot in Figure 1.1 shows another sound which displays a slow, cos-like
variation in air pressure, with some smaller and faster variations imposed on
this. This combination of several kinds of systematic oscillations in air pressure
is typical for general sounds. The size of the oscillations is directly related to the
loudness of the sound. The range of the oscillations is so big that it is common
to measure the loudness of a sound on a logarithmic scale:
Fact 1.3. Sound pressure and decibels.
It is common to relate a given sound pressure to the smallest sound pressure
that can be perceived, as a level on a decibel scale,
$$L_p = 10\log_{10}\left(\frac{p^2}{p_{\mathrm{ref}}^2}\right) = 20\log_{10}\left(\frac{p}{p_{\mathrm{ref}}}\right).$$
Here p is the measured sound pressure while pref is the sound pressure of a just
perceivable sound, usually considered to be 0.00002 Pa.
The square of the sound pressure appears in the definition of Lp since this
represents the power of the sound which is relevant for what we perceive as
loudness.
Figure 1.2: Variations in air pressure during parts of a song. The first 0.5
seconds, the first 0.02 seconds, and the first 0.002 seconds.
The sounds in Figure 1.1 are synthetic in that they were constructed from
mathematical formulas. The sounds in Figure 1.2 on the other hand show the
variation in air pressure for a song, where there is no mathematical formula
involved. In the first half second there are so many oscillations that it is
impossible to see the details, but if we zoom in on the first 0.002 seconds we
can see that there is a continuous function behind all the ink. In reality the
air pressure varies more than this, even over this short time period, but the
measuring equipment may not be able to pick up those variations, and it is also
doubtful whether we would be able to perceive such rapid variations.
A real function $f$ is said to be periodic with period $T$ if $f(t + T) = f(t)$ for all $t$.
Note that all the values of a periodic function f with period T are known if
f (t) is known for all t in the interval [0, T ). The following will be our prototype
for periodic functions:
Observation 1.5. Frequency.
If ν is a real number, the function f (t) = sin(2πνt) is periodic with period
T = 1/ν. When t varies in the interval [0, 1], this function covers a total of
ν periods. This is expressed by saying that f has frequency ν. Frequency is
measured in Hz (Hertz) which is the same as s−1 (the time t is measured in
seconds). The function sin(2πνt) is also called a pure tone.
Clearly sin(2πνt) and cos(2πνt) have the same frequency, and they are simply
shifted versions of one another (since cos(2πνt) = sin(2πνt + π/2)). Both, as well
as linear combinations of them, are called pure tones with frequency ν. Due to
this, the complex functions $e^{\pm 2\pi i\nu t} = \cos(2\pi\nu t) \pm i\sin(2\pi\nu t)$ will also be called
pure tones. They will also turn out to be useful in the following.
If we are to perceive variations in air pressure as sound, they must fall within
a certain range. It turns out that, for a human with good hearing to perceive a
sound, the number of variations per second must be in the range 20–20 000.
There is a simple way to change the period of a periodic function, namely by
multiplying the argument by a constant. Figure 1.3 illustrates this. The function
in the upper left is the plain sin t which covers one period when t varies in the
interval [0, 2π]. By multiplying the argument by 2π, the period is squeezed into
the interval [0, 1] so the function sin(2πt) has frequency ν = 1. Then, by also
multiplying the argument by 2, we push two whole periods into the interval [0, 1],
so the function sin(2π2t) has frequency ν = 2. In the lower right the argument
has been multiplied by 5 — hence the frequency is 5 and there are five whole
periods in the interval [0, 1].
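A short sketch (not part of the original text) of how plots like those in Figure 1.3 can be produced with matplotlib:

from numpy import linspace, sin, pi
import matplotlib.pyplot as plt

t1 = linspace(0, 2*pi, 100)
t2 = linspace(0, 1, 100)
plt.subplot(2, 2, 1); plt.plot(t1, sin(t1));        plt.title('sin(t)')
plt.subplot(2, 2, 2); plt.plot(t2, sin(2*pi*t2));   plt.title('sin(2 pi t)')
plt.subplot(2, 2, 3); plt.plot(t2, sin(2*pi*2*t2)); plt.title('sin(2 pi 2t)')
plt.subplot(2, 2, 4); plt.plot(t2, sin(2*pi*5*t2)); plt.title('sin(2 pi 5t)')
plt.show()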
Figure 1.3: Versions of sin with different frequencies.
These functions can be found in the module sound. Note that the method play
does not block - if we play several sounds in succession they will be played
simultaneously. To avoid this, we can block the program ourselves using the
raw_input function, in order to wait for input from the terminal.
play basically sends the array of sound samples x and sample rate fs to the
sound card, which uses some method for reconstructing the sound to an analog
sound signal. This analog signal is then sent to the loudspeakers and we hear
the sound.
The sound samples can have different data types. We will always assume that
they are of type double. The computer requires that they have values between
−1 and 1 (0 corresponding to no variation in air pressure from ambience, and
−1 and 1 the largest variations in air pressure). If they are not, the behaviour
when the sound is played is undefined.
You can also create the vector x you play on your own, without reading it
from file. Below we do this for pure tones.
You may not hear a difference between the two channels. There may still be
differences, however, they may only be notable when the channels are sent to
different loudspeakers.
We will later apply different operations to sound. It is possible to apply these
operations to the sound channels simultaneously, and we will mostly do this.
Sounds we generate on our own, such as pure tones, will mostly be generated in
one channel.
$$y_i = x_{N-i-1}, \qquad \text{for } i = 0, 1, \ldots, N-1.$$
When we reverse the sound samples, we have to reverse the elements in both
sound channels. For our audio sample file this can be performed as follows.
z = x[::(-1), :]
play(z, fs)
Performing this on our sample file generates the sound you can find in the
file castanetsreverse.wav.
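A pure tone can be generated and played as follows (a minimal sketch, assuming the play function from the sound module; the variable names match the description below).

from numpy import linspace, sin, pi
from sound import play   # play(x, fs) is provided by the sound module

f = 440        # frequency in Hz
antsec = 3     # length of the sound in seconds
fs = 44100     # sampling rate

t = linspace(0, antsec, fs*antsec)
x = sin(2*pi*f*t)
play(x, fs)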
Here f is the frequency, antsec the length in seconds, and fs the sampling rate.
A pure tone with frequency 440 Hz can be found in the file puretone440.wav,
and a pure tone with frequency 1500 Hz can be found in the file puretone1500.wav.
Figure 1.4: The first five periods of the square wave and the triangle wave.
Let us also listen to the square wave. We can first create the samples
for one period.
antsec = 3
samplesperperiod = int(fs/f)  # The number of samples for one period
oneperiod = hstack([ones(samplesperperiod//2, dtype=float), \
                    -ones(samplesperperiod//2, dtype=float)])
Then we repeat one period to obtain a sound with the desired length, and play
it as follows.
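A minimal sketch of this step (assuming numpy's tile, the play function from the sound module, and the variables defined above):

from numpy import tile
from sound import play   # play(x, fs) is provided by the sound module

x = tile(oneperiod, int(antsec*f))   # repeat one period antsec*f times
play(x, fs)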
You can listen to this square wave in the file square440.wav. We hear a sound
which seems to have the same "base frequency" as sin(2π440t), but the square
wave is less pleasant to listen to: There seem to be some "sharp corners" in the
sound, translating into a rather shrieking, piercing sound. We will later explain
this by the fact that the square wave can be viewed as a sum of many frequencies,
and that many frequencies pollute the sound so that it is not pleasant to listen
to.
z = x + c*(2*random.random(shape(x))-1)
This adds noise to all channels. The function for returning random numbers
returns numbers between 0 and 1, and above we have adjusted these so that
they are between −1 and 1 instead, as for other sound which can be played by
the computer. c is a constant (usually smaller than 1) that dampens the noise.
Write code which adds noise to the audio sample file, and listen to the result
for damping constants c=0.4 and c=0.1. Remember to scale the sound values
after you have added noise, since they may be outside [−1, 1] then.
Proof. Since $\langle f - g, h\rangle = 0$ for all $h \in W$.

If we have an orthogonal basis $\boldsymbol{\phi} = \{\phi_i\}_{i=1}^{m}$ for $W$, the orthogonal decomposition
theorem states that the best approximation from $W$ is
$$g = \sum_{i=1}^{m}\frac{\langle f, \phi_i\rangle}{\langle \phi_i, \phi_i\rangle}\phi_i. \qquad (1.5)$$
1 See Section 6.1 in [25] for a review of inner products and orthogonality.
2 See Section 6.7 in [25] for a review of function spaces as inner product spaces.
3 See Section 6.3 in [25] for a review of projections and least squares approximations.
We will consider a nested sequence of subspaces $V_1 \subset V_2 \subset \cdots \subset V_n \subset \cdots$
of increasing dimensions so that most sounds can be approximated arbitrarily
well by choosing n large enough, and use the orthogonal decomposition theorem
to compute the approximations. It turns out that pure tones can be used for
this purpose:
Definition 1.8. Fourier series.
Let $V_{N,T}$ be the subspace of $C[0, T]$ spanned by the set of functions
$$\mathcal{D}_{N,T} = \{1, \cos(2\pi t/T), \sin(2\pi t/T), \ldots, \cos(2\pi Nt/T), \sin(2\pi Nt/T)\}.$$
The space VN,T is called the N ’th order Fourier space. The N th-order Fourier
series approximation of f , denoted fN , is defined as the best approximation of f
from VN,T with respect to the inner product defined by (1.3).
We see that pure tones at frequencies 1/T , 2/T ,..., N/T are a basis for VN,T .
The best approximation at these frequencies, as described above, will be called a
Fourier series. Fourier series are similar to Taylor series, where polynomials are
used in the approximation instead, but we will see that there is a major difference in
how the two approximations are computed. The theory of approximation of
functions with Fourier series is referred to as Fourier analysis, and is a central
tool in practical fields like image- and signal processing, but is also an important
field of research within pure mathematics. The approximation fN ∈ VN,T can
serve as a compressed version of f if many of the coefficients can be set to 0
without the error becoming too big.
Note that all the functions in the set DN,T are periodic with period T , but
most have an even shorter period (cos(2πnt/T ) also has period T /n). In general,
the term fundamental frequency is used to denote the lowest frequency of a given
periodic function.
The next theorem explains that the DN,T actually forms a basis for the
Fourier spaces, and also how to obtain the coefficients in this basis.
Theorem 1.9. Fourier coefficients.
The set DN,T is an orthogonal basis for VN,T . In particular, the dimension
of VN,T is 2N + 1, and if f is a function in L2 [0, T ], we denote by a0 , . . . , aN
and b1 , . . . , bN the coordinates of fN in the basis DN,T , i.e.
$$f_N(t) = a_0 + \sum_{n=1}^{N}\left(a_n\cos(2\pi nt/T) + b_n\sin(2\pi nt/T)\right). \qquad (1.7)$$
These coordinates are given by
$$a_0 = \langle f, 1\rangle = \frac{1}{T}\int_0^T f(t)\,dt, \qquad (1.8)$$
$$a_n = 2\langle f, \cos(2\pi nt/T)\rangle = \frac{2}{T}\int_0^T f(t)\cos(2\pi nt/T)\,dt \quad \text{for } n \geq 1, \qquad (1.9)$$
$$b_n = 2\langle f, \sin(2\pi nt/T)\rangle = \frac{2}{T}\int_0^T f(t)\sin(2\pi nt/T)\,dt \quad \text{for } n \geq 1. \qquad (1.10)$$
Proof. To prove orthogonality, assume first that $m \neq n$. We compute
\begin{align*}
\langle\cos(2\pi mt/T), \cos(2\pi nt/T)\rangle
&= \frac{1}{T}\int_0^T \cos(2\pi mt/T)\cos(2\pi nt/T)\,dt\\
&= \frac{1}{2T}\int_0^T \left(\cos(2\pi mt/T + 2\pi nt/T) + \cos(2\pi mt/T - 2\pi nt/T)\right)dt\\
&= \frac{1}{2T}\left[\frac{T}{2\pi(m+n)}\sin(2\pi(m+n)t/T) + \frac{T}{2\pi(m-n)}\sin(2\pi(m-n)t/T)\right]_0^T\\
&= 0.
\end{align*}
Here we have added the two identities $\cos(x\pm y) = \cos x\cos y \mp \sin x\sin y$ together
to obtain an expression for $\cos(2\pi mt/T)\cos(2\pi nt/T)$ in terms of $\cos(2\pi mt/T + 2\pi nt/T)$
and $\cos(2\pi mt/T - 2\pi nt/T)$. By testing all other combinations of sin
and cos also, we obtain the orthogonality of all functions in $\mathcal{D}_{N,T}$. We also
obtain that
\begin{align*}
\langle\cos(2\pi mt/T), \cos(2\pi mt/T)\rangle &= \frac{1}{2}\\
\langle\sin(2\pi mt/T), \sin(2\pi mt/T)\rangle &= \frac{1}{2}\\
\langle 1, 1\rangle &= 1.
\end{align*}
From the orthogonal decomposition theorem (1.5) it now follows that
\begin{align*}
f_N(t) &= \frac{\langle f, 1\rangle}{\langle 1, 1\rangle}1
+ \sum_{n=1}^{N}\frac{\langle f, \cos(2\pi nt/T)\rangle}{\langle\cos(2\pi nt/T), \cos(2\pi nt/T)\rangle}\cos(2\pi nt/T)
+ \sum_{n=1}^{N}\frac{\langle f, \sin(2\pi nt/T)\rangle}{\langle\sin(2\pi nt/T), \sin(2\pi nt/T)\rangle}\sin(2\pi nt/T)\\
&= \frac{\frac{1}{T}\int_0^T f(t)\,dt}{1}
+ \sum_{n=1}^{N}\frac{\frac{1}{T}\int_0^T f(t)\cos(2\pi nt/T)\,dt}{\frac{1}{2}}\cos(2\pi nt/T)
+ \sum_{n=1}^{N}\frac{\frac{1}{T}\int_0^T f(t)\sin(2\pi nt/T)\,dt}{\frac{1}{2}}\sin(2\pi nt/T)\\
&= \frac{1}{T}\int_0^T f(t)\,dt
+ \sum_{n=1}^{N}\left(\frac{2}{T}\int_0^T f(t)\cos(2\pi nt/T)\,dt\right)\cos(2\pi nt/T)
+ \sum_{n=1}^{N}\left(\frac{2}{T}\int_0^T f(t)\sin(2\pi nt/T)\,dt\right)\sin(2\pi nt/T).
\end{align*}
Figure 1.5: The cubic polynomial $f(x) = -\frac{1}{3}x^3 + \frac{1}{2}x^2 - \frac{3}{16}x + 1$ on the interval
[0, 1], together with its Fourier series approximation from $V_{9,1}$. The function and
its Fourier series are shown left. The Fourier series on a larger interval is shown
right.
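As a sketch (not from the original text), the coefficients in Equations (1.8)-(1.10) can also be computed numerically, here for the cubic polynomial in Figure 1.5 with T = 1, assuming scipy is available:

from numpy import cos, sin, pi
from scipy.integrate import quad

T = 1.0
f = lambda t: -t**3/3.0 + t**2/2.0 - 3*t/16.0 + 1

a0 = quad(f, 0, T)[0]/T                                                          # Equation (1.8)
a = [2*quad(lambda t: f(t)*cos(2*pi*n*t/T), 0, T)[0]/T for n in range(1, 10)]    # Equation (1.9)
b = [2*quad(lambda t: f(t)*sin(2*pi*n*t/T), 0, T)[0]/T for n in range(1, 10)]    # Equation (1.10)
print(a0, a, b)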
For the square wave $f_s$ (which equals 1 on $[0, T/2)$ and $-1$ on $[T/2, T)$), Equations (1.8) and (1.9) give
$$a_0 = \frac{1}{T}\int_0^T f_s(t)\,dt = \frac{1}{T}\int_0^{T/2} dt - \frac{1}{T}\int_{T/2}^{T} dt = 0,$$
\begin{align*}
a_n &= \frac{2}{T}\int_0^T f_s(t)\cos(2\pi nt/T)\,dt
= \frac{2}{T}\int_0^{T/2}\cos(2\pi nt/T)\,dt - \frac{2}{T}\int_{T/2}^{T}\cos(2\pi nt/T)\,dt\\
&= \frac{2}{T}\frac{T}{2\pi n}\left[\sin(2\pi nt/T)\right]_0^{T/2} - \frac{2}{T}\frac{T}{2\pi n}\left[\sin(2\pi nt/T)\right]_{T/2}^{T}\\
&= \frac{2}{T}\frac{T}{2\pi n}\bigl((\sin(n\pi) - \sin 0) - (\sin(2n\pi) - \sin(n\pi))\bigr) = 0.
\end{align*}
Finally, using Equation (1.10) we obtain
\begin{align*}
b_n &= \frac{2}{T}\int_0^T f_s(t)\sin(2\pi nt/T)\,dt
= \frac{2}{T}\int_0^{T/2}\sin(2\pi nt/T)\,dt - \frac{2}{T}\int_{T/2}^{T}\sin(2\pi nt/T)\,dt\\
&= \frac{2}{T}\left[-\frac{T}{2\pi n}\cos(2\pi nt/T)\right]_0^{T/2} + \frac{2}{T}\left[\frac{T}{2\pi n}\cos(2\pi nt/T)\right]_{T/2}^{T}\\
&= \frac{2}{T}\frac{T}{2\pi n}\bigl((-\cos(n\pi) + \cos 0) + (\cos(2n\pi) - \cos(n\pi))\bigr)\\
&= \frac{2(1 - \cos(n\pi))}{n\pi}
= \begin{cases} 0, & \text{if } n \text{ is even};\\ 4/(n\pi), & \text{if } n \text{ is odd}.\end{cases}
\end{align*}
In other words, only the bn -coefficients with n odd in the Fourier series are
nonzero. This means that the Fourier series of the square wave is
$$\frac{4}{\pi}\sin(2\pi t/T) + \frac{4}{3\pi}\sin(2\pi 3t/T) + \frac{4}{5\pi}\sin(2\pi 5t/T) + \frac{4}{7\pi}\sin(2\pi 7t/T) + \cdots. \qquad (1.11)$$
With N = 20, there are 10 trigonometric terms in this sum. The corresponding
Fourier series can be plotted over one period with the following code.
from numpy import *
import matplotlib.pyplot as plt

N = 20
T = 1/440.
t = linspace(0, T, 100)
x = zeros(len(t))
for k in range(1, N + 1, 2):          # only the odd terms contribute
    x += (4/(k*pi))*sin(2*pi*k*t/T)
plt.figure()
plt.plot(t, x, 'k-')
The left plot in Figure 1.6 shows the resulting plot. In the right plot the values of
the first 100 Fourier coefficients bn are shown, to see that they actually converge
to zero. This is clearly necessary in order for the Fourier series to converge.
Even though f oscillates regularly between −1 and 1 with period T , the
discontinuities mean that it is far from the simple sin(2πt/T ) which corresponds
to a pure tone of frequency 1/T . Clearly b1 sin(2πt/T ) is the dominant term in
the Fourier series. This is not surprising since the square wave has the same
period as this term, but the additional terms in the Fourier series pollute the
pure sound. As we include more and more of these, we gradually approach the
square wave.
There is a connection between how fast the Fourier coefficients go to zero, and
how we perceive the sound. A pure sine sound has only one nonzero coefficient,
while the square wave Fourier coefficients decrease as 1/n, making the sound
Figure 1.6: The Fourier series of the square wave with N = 20, and the values
for the first 100 Fourier coefficients bn .
less pleasant. This explains what we heard when we listened to the sound in
Example 1.4. Also, it explains why we heard the same pitch as the pure tone,
since the first frequency in the Fourier series has the same frequency as the pure
tone we listened to, and since this had the highest value.
Let us listen to the Fourier series approximations of the square wave. For
N = 1 the sound can be found in the file square440s1.wav. This sounds exactly
like the pure sound with frequency 440Hz, as noted above. For N = 5 the sound
can be found in the file square440s5.wav, and for N = 9 it can be found in the
file square440s9.wav. The latter sounds are more like the square wave itself. As
we increase N we can hear how the introduction of more frequencies gradually
pollutes the sound more.
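A minimal sketch of how such an N-term approximation can be synthesized and played (assuming the play function from the sound module):

from numpy import linspace, zeros, sin, pi
from sound import play   # play(x, fs) is provided by the sound module

fs = 44100
T = 1/440.
antsec = 3
N = 9

t = linspace(0, antsec, fs*antsec)
x = zeros(len(t))
for k in range(1, N + 1, 2):          # only the odd terms are nonzero
    x += (4/(k*pi))*sin(2*pi*k*t/T)
play(x/abs(x).max(), fs)              # scale to [-1, 1] before playing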
For the triangle wave $f_t$, the coefficients $b_n$ vanish:
$$b_n = \frac{2}{T}\int_0^{T/2} f_t(t)\sin(2\pi nt/T)\,dt + \frac{2}{T}\int_{T/2}^{T} f_t(t)\sin(2\pi nt/T)\,dt = 0.$$
For the final coefficients, since both f and cos(2πnt/T ) are symmetric about
T /2, we get for n ≥ 1,
\begin{align*}
a_n &= \frac{2}{T}\int_0^{T/2} f_t(t)\cos(2\pi nt/T)\,dt + \frac{2}{T}\int_{T/2}^{T} f_t(t)\cos(2\pi nt/T)\,dt\\
&= \frac{4}{T}\int_0^{T/2} f_t(t)\cos(2\pi nt/T)\,dt
= \frac{4}{T}\int_0^{T/2}\frac{4}{T}\left(t - \frac{T}{4}\right)\cos(2\pi nt/T)\,dt\\
&= \frac{16}{T^2}\int_0^{T/2} t\cos(2\pi nt/T)\,dt - \frac{4}{T}\int_0^{T/2}\cos(2\pi nt/T)\,dt\\
&= \frac{4}{n^2\pi^2}\left(\cos(n\pi) - 1\right)
= \begin{cases} 0, & \text{if } n \text{ is even};\\ -8/(n^2\pi^2), & \text{if } n \text{ is odd},\end{cases}
\end{align*}
where we have dropped the final tedious calculations (use integration by parts).
From this it is clear that the Fourier series of the triangle wave is
$$-\frac{8}{\pi^2}\cos(2\pi t/T) - \frac{8}{3^2\pi^2}\cos(2\pi 3t/T) - \frac{8}{5^2\pi^2}\cos(2\pi 5t/T) - \frac{8}{7^2\pi^2}\cos(2\pi 7t/T) + \cdots. \qquad (1.12)$$
In Figure 1.7 we have repeated the plots used for the square wave, for the triangle
wave. The figure indicates that the Fourier series coefficients decay faster.
Figure 1.7: The Fourier series of the triangle wave with N = 20, and the values
for the first 100 Fourier coefficients an .
For the function which is 1 on $[0, T_0]$ and 0 on the rest of $[0, T)$, we similarly compute
$$a_n = \frac{2}{T}\int_0^{T_0}\cos(2\pi nt/T)\,dt = \frac{1}{\pi n}\left[\sin(2\pi nt/T)\right]_0^{T_0} = \frac{\sin(2\pi nT_0/T)}{\pi n}$$
for $n \geq 1$. Similar computations hold for $b_n$. We see that $|a_n|$ is of the order
$1/(\pi n)$, and that infinitely many $n$ contribute. This function may be thought
of as a simple building block, corresponding to a small time segment. However,
we see that it is not a simple building block in terms of trigonometric functions.
This time segment building block may be useful for restricting a function to
smaller time segments, and later on we will see that it still can be useful.
The point is that the square wave is antisymmetric about 0, and the triangle
wave is symmetric about 0.
Proof. Note first that we can write
$$a_n = \frac{2}{T}\int_{-T/2}^{T/2} f(t)\cos(2\pi nt/T)\,dt \qquad b_n = \frac{2}{T}\int_{-T/2}^{T/2} f(t)\sin(2\pi nt/T)\,dt,$$
i.e. we can change the integration bounds from [0, T ] to [−T /2, T /2]. This
follows from the fact that all f (t), cos(2πnt/T ) and sin(2πnt/T ) are periodic
with period T .
Suppose first that f is symmetric. We obtain
\begin{align*}
b_n &= \frac{2}{T}\int_{-T/2}^{T/2} f(t)\sin(2\pi nt/T)\,dt\\
&= \frac{2}{T}\int_{-T/2}^{0} f(t)\sin(2\pi nt/T)\,dt + \frac{2}{T}\int_{0}^{T/2} f(t)\sin(2\pi nt/T)\,dt\\
&= \frac{2}{T}\int_{-T/2}^{0} f(t)\sin(2\pi nt/T)\,dt - \frac{2}{T}\int_{0}^{-T/2} f(-t)\sin(-2\pi nt/T)\,dt\\
&= \frac{2}{T}\int_{-T/2}^{0} f(t)\sin(2\pi nt/T)\,dt - \frac{2}{T}\int_{-T/2}^{0} f(t)\sin(2\pi nt/T)\,dt = 0,
\end{align*}
where we have made the substitution u = −t, and used that sin is antisymmetric.
The case when f is antisymmetric can be proved in the same way, and is left as
an exercise.
$$\frac{f_N(t) + f_N(-t)}{2} = a_0 + \sum_{n=1}^{N} a_n\cos(2\pi nt/T)$$
$$\frac{f_N(t) - f_N(-t)}{2} = \sum_{n=1}^{N} b_n\sin(2\pi nt/T).$$
It is often easier to work with complex numbers, even though the given setting is
real numbers. This is definitely the case in Fourier analysis. More precisely, we will make the substitutions
$$\cos(2\pi nt/T) = \frac{1}{2}\left(e^{2\pi int/T} + e^{-2\pi int/T}\right) \qquad (1.13)$$
$$\sin(2\pi nt/T) = \frac{1}{2i}\left(e^{2\pi int/T} - e^{-2\pi int/T}\right) \qquad (1.14)$$
in Definition 1.8. From these identities it is clear that the set of complex
exponential functions e2πint/T also is a basis of periodic functions (with the same
period) for VN,T . We may therefore reformulate Definition 1.8 as follows:
Definition 1.11. Complex Fourier basis.
We define the set of functions $\mathcal{F}_{N,T} = \{e^{2\pi int/T}\}_{n=-N}^{N}$,
and call this the order $N$ complex Fourier basis for $V_{N,T}$.
The function e2πint/T is also called a pure tone with frequency n/T , just
as sines and cosines are. We would like to show that these functions also are
orthogonal. To show this, we need to say more on the inner product we have
defined by Equation (1.3). A weakness with this definition is that we have
assumed real functions f and g, so that this can not be used for the complex
exponential functions e2πint/T . For general complex functions we will extend
the definition of the inner product as follows:
$$\langle f, g\rangle = \frac{1}{T}\int_0^T f\bar{g}\,dt. \qquad (1.17)$$
The associated norm now becomes
$$\|f\| = \sqrt{\frac{1}{T}\int_0^T |f(t)|^2\,dt}. \qquad (1.18)$$
The motivation behind Equation (1.17), where we have conjugated the second
function, lies in the definition of an inner product for vector spaces over complex
numbers. From before we are used to vector spaces over real numbers, but vector
spaces over complex numbers are defined through the same set of axioms as
for real vector spaces, only replacing real numbers with complex numbers. For
complex vector spaces, the axioms defining an inner product are the same as for
real vector spaces, except that the symmetry axiom $\langle f, g\rangle = \langle g, f\rangle$ is replaced by the requirement $\langle f, g\rangle = \overline{\langle g, f\rangle}$.
\begin{align*}
f_N(t) &= \sum_{n=-N}^{N}\frac{\langle f, e^{2\pi int/T}\rangle}{\langle e^{2\pi int/T}, e^{2\pi int/T}\rangle}e^{2\pi int/T}
= \sum_{n=-N}^{N}\langle f, e^{2\pi int/T}\rangle e^{2\pi int/T}\\
&= \sum_{n=-N}^{N}\left(\frac{1}{T}\int_0^T f(t)e^{-2\pi int/T}\,dt\right)e^{2\pi int/T}.
\end{align*}
The $y_n$ are called the complex Fourier coefficients of $f$, and they are given by
$$y_n = \langle f, e^{2\pi int/T}\rangle = \frac{1}{T}\int_0^T f(t)e^{-2\pi int/T}\,dt. \qquad (1.22)$$
$$\|f\|^2 = \|f - f_N\|^2 + \|f_N\|^2 \geq \|f_N\|^2,$$
In particular the Fourier coefficients go to zero. The result does not say
that $\sum_{n=-N}^{N}|y_n|^2 \to \|f\|^2$, which would imply that $\|f - f_N\| \to 0$. This is more
difficult to analyze, and we will only prove a particular case of it in Section 1.6.
If we reorder the real and complex Fourier bases so that the two functions
{cos(2πnt/T), sin(2πnt/T)} and {e^{2πint/T}, e^{-2πint/T}} have the same index in
the bases, equations (1.13)-(1.14) give us that the change of coordinates matrix
from $\mathcal{D}_{N,T}$ to $\mathcal{F}_{N,T}$, denoted $P_{\mathcal{F}_{N,T}\leftarrow\mathcal{D}_{N,T}}$, is represented by repeating the
matrix
$$\frac{1}{2}\begin{pmatrix} 1 & 1/i \\ 1 & -1/i \end{pmatrix}$$
along the diagonal (with an additional 1 for the constant function 1). In other
words, since $a_n$, $b_n$ are coefficients relative to the real basis and $y_n$, $y_{-n}$ the
corresponding coefficients relative to the complex basis, we have for $n > 0$,
$$\begin{pmatrix} y_n \\ y_{-n}\end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1/i \\ 1 & -1/i \end{pmatrix}\begin{pmatrix} a_n \\ b_n \end{pmatrix}.$$
This can be summarized by the following theorem:
Theorem 1.15. Change of coordinates between real and complex Fourier bases.
The complex Fourier coefficients yn and the real Fourier coefficients an , bn of
a function f are related by
$$y_0 = a_0, \qquad y_n = \frac{1}{2}(a_n - ib_n), \qquad y_{-n} = \frac{1}{2}(a_n + ib_n),$$
for $n = 1, \ldots, N$.
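A small sketch (not from the original text) of this change of coordinates for a single index n > 0; for a real function f the returned real coefficients have zero imaginary part.

def real_to_complex(an, bn):
    # y_n and y_{-n} from a_n and b_n, cf. Theorem 1.15
    return (an - 1j*bn)/2, (an + 1j*bn)/2

def complex_to_real(yn, ynm):
    # a_n and b_n from y_n and y_{-n}
    return yn + ynm, 1j*(yn - ynm)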
Combining with Theorem 1.10, Theorem 1.15 can help us state properties of
complex Fourier coefficients for symmetric- and antisymmetric functions. We
look into this in Exercise 1.34.
Due to the somewhat nicer formulas for the complex Fourier coefficients when
compared to the real Fourier coefficients, we will write most Fourier series in
complex form in the following.
Let us consider some examples where we compute complex Fourier series.
4 See Section 4.7 in [25], to review the mathematics behind change of coordinates.
For $f(t) = e^{2\pi it/T_2}$ we compute
\begin{align*}
y_n &= \frac{1}{T}\int_0^T e^{2\pi it/T_2}e^{-2\pi int/T}\,dt
= \frac{1}{2\pi iT(1/T_2 - n/T)}\left[e^{2\pi it(1/T_2 - n/T)}\right]_0^T\\
&= \frac{1}{2\pi i(T/T_2 - n)}\left(e^{2\pi iT/T_2} - 1\right).
\end{align*}
Here it is only the term $1/(T/T_2 - n)$ which depends on $n$, so that $y_n$ can only be
large when $n$ is close to $T/T_2$. In Figure 1.8 we have plotted $|y_n|$ for two different
combinations of T, T2 .
Figure 1.8: Plot of |yn | when f (t) = e2πit/T2 , and T2 > T . Left: T /T2 = 0.5.
Right: T /T2 = 0.9.
$$f(t) = \begin{cases} e^{2\pi in_1t/T} & \text{on } [0, T/2) \\ e^{2\pi in_2t/T} & \text{on } [T/2, T). \end{cases}$$
When $n \neq n_1, n_2$ we have that
\begin{align*}
y_n &= \frac{1}{T}\left(\int_0^{T/2} e^{2\pi in_1t/T}e^{-2\pi int/T}\,dt + \int_{T/2}^{T} e^{2\pi in_2t/T}e^{-2\pi int/T}\,dt\right)\\
&= \frac{1}{T}\left(\left[\frac{T}{2\pi i(n_1-n)}e^{2\pi i(n_1-n)t/T}\right]_0^{T/2} + \left[\frac{T}{2\pi i(n_2-n)}e^{2\pi i(n_2-n)t/T}\right]_{T/2}^{T}\right)
\end{align*}
Figure 1.9: Plot of |yn | when we have two different pure tones at the different
parts of a period. Left: n1 = 10, n2 = 12. Right: n1 = 2, n2 = 20.
We see that, when n1 , n2 are close, the Fourier coefficients are close to those
of a pure tone with n ≈ n1 , n2 , but that also other frequencies contribute. When
n1 , n2 are further apart, we see that the Fourier coefficients are like the sum of
the two base frequencies, but that other frequencies contribute also here.
There is an important lesson to be learned from this as well: We should
be aware of changes in a sound over time, and it may not be smart to use
a frequency representation over a large interval when we know that there are
simpler frequency representations on the smaller intervals. The following example
shows that, in some cases it is not necessary to compute the Fourier integrals at
all, in order to compute the Fourier series.
\begin{align*}
\cos^3(2\pi t/T) &= \left(\frac{1}{2}\left(e^{2\pi it/T} + e^{-2\pi it/T}\right)\right)^3\\
&= \frac{1}{8}\left(e^{2\pi i3t/T} + 3e^{2\pi it/T} + 3e^{-2\pi it/T} + e^{-2\pi i3t/T}\right)\\
&= \frac{1}{8}e^{2\pi i3t/T} + \frac{3}{8}e^{2\pi it/T} + \frac{3}{8}e^{-2\pi it/T} + \frac{1}{8}e^{-2\pi i3t/T}.
\end{align*}
From this we see that the complex Fourier series is given by $y_1 = y_{-1} = \frac{3}{8}$, and
that $y_3 = y_{-3} = \frac{1}{8}$. In other words, it was not necessary to compute the Fourier
integrals in this case, and we see that the function lies in V3,T , i.e. there are
finitely many terms in the Fourier series. In general, if the function is some
trigonometric function, we can often use trigonometric identities to find an
expression for the Fourier series.
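As a quick numerical check (a sketch assuming scipy is available), the coefficients y1 = 3/8 and y3 = 1/8 can be verified from Equation (1.22):

from numpy import cos, sin, pi
from scipy.integrate import quad

T = 1.0
f = lambda t: cos(2*pi*t/T)**3

for n in (1, 3):
    re = quad(lambda t: f(t)*cos(2*pi*n*t/T), 0, T)[0]/T
    im = -quad(lambda t: f(t)*sin(2*pi*n*t/T), 0, T)[0]/T
    print(n, re, im)   # approximately 0.375 and 0.125, with imaginary parts near 0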
$$1 \to e_0 = (1, 0, 0, 0, \ldots)$$
$$e^{2\pi int/T} \to e_n = (0, 0, \ldots, 1, 0, 0, \ldots)$$
$$\chi_{-a,a} \to \frac{\sin(2\pi na/T)}{\pi n}.$$
The 1 in $e_n$ is at position $n$ and the function $\chi_{-a,a}$ is the characteristic function
of the interval $[-a, a]$, defined by
$$\chi_{-a,a}(t) = \begin{cases} 1, & \text{if } t \in [-a, a];\\ 0, & \text{otherwise}.\end{cases}$$
The first two pairs are easily verified, so the proofs are omitted. The case for
χ−a,a is very similar to the square wave, but easier to prove, and therefore also
omitted.
Theorem 1.17. Fourier series properties.
The mapping $f \to y_n$ is linear: if $f \to x_n$, $g \to y_n$, then
$$af + bg \to ax_n + by_n$$
for all $n$. Moreover, if $f$ is real and periodic with period $T$, the following
properties hold:
Proof. The proof of linearity is left to the reader. Property 1 follows immediately
by writing
$$\overline{y_n} = \overline{\frac{1}{T}\int_0^T f(t)e^{-2\pi int/T}\,dt}
= \frac{1}{T}\int_0^T f(t)e^{2\pi int/T}\,dt
= \frac{1}{T}\int_0^T f(t)e^{-2\pi i(-n)t/T}\,dt = y_{-n}.$$
If $g(t) = f(-t)$, the Fourier coefficients of $g$ are
$$\frac{1}{T}\int_0^T g(t)e^{-2\pi int/T}\,dt
= \frac{1}{T}\int_0^T f(-t)e^{-2\pi int/T}\,dt
= -\frac{1}{T}\int_0^{-T} f(t)e^{2\pi int/T}\,dt
= \frac{1}{T}\int_0^{T} f(t)e^{2\pi int/T}\,dt = \overline{y_n}.$$
The first part of property 2 follows from this. The second part follows directly
by noting that
$$\frac{1}{T}\int_0^T g(t)e^{-2\pi int/T}\,dt
= \frac{1}{T}\int_0^T f(t-d)e^{-2\pi int/T}\,dt
= \frac{1}{T}\int_0^T f(t)e^{-2\pi in(t+d)/T}\,dt
= e^{-2\pi ind/T}\frac{1}{T}\int_0^T f(t)e^{-2\pi int/T}\,dt = e^{-2\pi ind/T}y_n.$$
For property 5 we observe that the Fourier coefficients of g(t) = e2πidt/T f (t) are
$$\frac{1}{T}\int_0^T g(t)e^{-2\pi int/T}\,dt
= \frac{1}{T}\int_0^T e^{2\pi idt/T}f(t)e^{-2\pi int/T}\,dt
= \frac{1}{T}\int_0^T f(t)e^{-2\pi i(n-d)t/T}\,dt = y_{n-d}.$$
The first property says that the positive and negative frequencies in a (real)
sound essentially are the same. The second says that, when we play a sound
backwards, the frequency content is essentially the same. This is certainly the
case for all pure sounds. The third property says that, if we delay a sound, the
frequency content also is essentially the same. This also matches our intuition
on sound, since we think of the frequency representation as something which
is time-independent. The fourth property says that, if we multiply a sound
with a pure tone, the frequency representation is shifted (delayed), according
to the value of the frequency. This is something we see in early models for the
transmission of audio, where an audio signal is transmitted after having been
multiplied with what is called a 'carrier wave'. You can think of the carrier signal
as a pure tone. The result is a signal where the frequencies have been shifted
with the frequency of the carrier wave. The point of shifting the frequency of
the transmitted signal is to make it use a frequency range in which one knows
that other signals do not interfere. The last property looks a bit mysterious. We
will not have use for this property before the next chapter.
From Theorem 1.17 we also see that there exist several cases of duality
between a function and its Fourier series:
Actually, one can show that these dualities are even stronger if we had considered
Fourier series of complex functions instead of real functions. We will not go into
this.
Lemma 1.18. The order of computing Fourier series and differentiation does
not matter.
Assume that $f$ is differentiable. Then $(f_N)'(t) = (f')_N(t)$. In other words,
the derivative of the Fourier series equals the Fourier series of the derivative.
Proof. Using integration by parts, and that the boundary term vanishes since $f$ is periodic, we obtain
\begin{align*}
\langle f, e^{2\pi int/T}\rangle &= \frac{1}{T}\int_0^T f(t)e^{-2\pi int/T}\,dt\\
&= \frac{1}{T}\left(\left[-\frac{T}{2\pi in}f(t)e^{-2\pi int/T}\right]_0^T + \frac{T}{2\pi in}\int_0^T f'(t)e^{-2\pi int/T}\,dt\right)\\
&= \frac{T}{2\pi in}\frac{1}{T}\int_0^T f'(t)e^{-2\pi int/T}\,dt = \frac{T}{2\pi in}\langle f', e^{2\pi int/T}\rangle.
\end{align*}
From this we get that
\begin{align*}
(f_N)'(t) &= \left(\sum_{n=-N}^{N}\langle f, e^{2\pi int/T}\rangle e^{2\pi int/T}\right)'
= \sum_{n=-N}^{N}\frac{2\pi in}{T}\langle f, e^{2\pi int/T}\rangle e^{2\pi int/T}\\
&= \sum_{n=-N}^{N}\langle f', e^{2\pi int/T}\rangle e^{2\pi int/T} = (f')_N(t),
\end{align*}
where we substituted the connection between the inner products we just found.
Since the derivative of the triangle wave is $4/T$ times the square wave, Lemma 1.18 gives that
$$((f_t)')_N(t) = \frac{4}{T}\left(\frac{4}{\pi}\sin(2\pi t/T) + \frac{4}{3\pi}\sin(2\pi 3t/T) + \frac{4}{5\pi}\sin(2\pi 5t/T) + \cdots\right).$$
If we integrate this term by term we obtain
$$(f_t)_N(t) = -\frac{8}{\pi^2}\left(\cos(2\pi t/T) + \frac{1}{3^2}\cos(2\pi 3t/T) + \frac{1}{5^2}\cos(2\pi 5t/T) + \cdots\right) + C.$$
Choosing $C = 0$ we
arrive at the same expression as in Equation (1.12) for the Fourier series of the
triangle wave. This approach clearly had less computations involved. There
is a minor point here which we have not addressed: the triangle wave is not
differentiable at two points, as required by Lemma 1.18. It is, however, not too
difficult to see that this result still holds in cases where we have a finite number
of nondifferentiable points only.
We get the following corollary to Lemma 1.18:
Corollary 1.19. Connection between the Fourier coefficients of $f(t)$ and $f'(t)$.
If the complex Fourier coefficients of $f$ are $y_n$ and $f$ is differentiable, then
the Fourier coefficients of $f'(t)$ are $\frac{2\pi in}{T}y_n$.
If we turn this around, we note that the Fourier coefficients of $f(t)$ are
$T/(2\pi in)$ times those of $f'(t)$. If $f$ is $s$ times differentiable, we can repeat this
argument to show that the Fourier coefficients of $f(t)$ are $\left(T/(2\pi in)\right)^s$ times
those of $f^{(s)}(t)$. In other words, the Fourier coefficients of a function which is
many times differentiable decay to zero very fast.
Observation 1.20. Convergence speed of differentiable functions.
The Fourier series converges quickly when the function is many times differ-
entiable.
An illustration is found in examples 1.12 and 1.13, where we saw that the
Fourier series coefficients for the triangle wave converged more quickly to zero
than those of the square wave. This is explained by the fact that the square
wave is discontinuous, while the triangle wave is continuous with a discontinuous
first derivative. Also, the functions considered in examples 1.24 and 1.25 are not
continuous, which partially explains why we there saw contributions from many
frequencies.
The requirement of continuity in order to obtain quickly converging Fourier
series may seem like a small problem. However, often the function is not defined
on the whole real line: it is often only defined on the interval [0, T ). If we
extend this to a periodic function on the whole real line, by repeating one
period as shown in the left plot in Figure 1.10, there is no reason why the
new function should be continuous at the boundaries 0, T, 2T etc., even though
the function we started with may be continuous on [0, T ). This would require
that f (0) = limt→T f (t). If this does not hold, the function may not be well
approximated with trigonometric functions, due to a slowly converging Fourier
series.
We can therefore ask ourselves the following question:
Idea 1.21. Continuous Extension.
Assume that f is continuous on [0, T ). Can we construct another periodic
function which agrees with f on [0, T ], and which is both continuous and periodic
(maybe with period different from T )?
If this is possible the Fourier series of the new function could produce better
approximations for f . It turns out that the following extension strategy does
the job:
Figure 1.10: Two different extensions of f to a periodic function on the whole
real line. Periodic extension (left) and symmetric extension (right).
For the periodic extension of the function $f(t) = \frac{2}{T}\left(t - \frac{T}{2}\right)$ on $[0, T)$, Equation (1.10) gives
\begin{align*}
b_n &= \frac{2}{T}\int_0^T \frac{2}{T}\left(t - \frac{T}{2}\right)\sin(2\pi nt/T)\,dt
= \frac{4}{T^2}\int_0^T\left(t - \frac{T}{2}\right)\sin(2\pi nt/T)\,dt\\
&= \frac{4}{T^2}\int_0^T t\sin(2\pi nt/T)\,dt - \frac{2}{T}\int_0^T\sin(2\pi nt/T)\,dt = -\frac{2}{\pi n},
\end{align*}
so that
$$f_N(t) = -\sum_{n=1}^{N}\frac{2}{n\pi}\sin(2\pi nt/T),$$
which indeed converges slowly to 0. Let us now instead consider the symmetric
extension of $f$. Clearly this is the triangle wave with period $2T$, and the Fourier
series of this was
$$(\breve{f})_N(t) = \sum_{n\leq N,\ n \text{ odd}} -\frac{8}{n^2\pi^2}\cos(2\pi nt/(2T)).$$
The second series clearly converges faster than the first, since its Fourier coefficients
are $a_n = -8/(n^2\pi^2)$ (with $n$ odd), while the Fourier coefficients in the
first series are $b_n = -2/(n\pi)$.
If we use T = 1/440, the symmetric extension has period 1/220, which gives
a triangle wave where the first term in the Fourier series has frequency 220Hz.
Listening to this we should hear something resembling a 220Hz pure tone, since
the first term in the Fourier series is the most dominating in the triangle wave.
Listening to the periodic extension we should hear a different sound. The first
term in the Fourier series has frequency 440Hz, but this drowns a bit in the
contribution of the other terms in the Fourier series, due to the slow convergence
of the Fourier series, just as for the square wave.
Let us plot the Fourier series with N = 7 terms for f . These are shown in
Figure 1.11.
It is clear from the plot that the Fourier series for f itself is not a very good
approximation, while we cannot differentiate between the Fourier series and the
function itself for the symmetric extension.
Figure 1.11: The Fourier series with N = 7 terms of the periodic (left) and
symmetric (right) extensions of the function in Example 1.35.
$$f(t) = \begin{cases} 1, & \text{if } -T/4 \leq t < T/4;\\ -1, & \text{if } T/4 \leq |t| < T/2.\end{cases}$$
f is just the square wave, delayed with d = −T /4. Compute the Fourier
coefficients of f directly, and use Property 4 in Theorem 1.17 to verify your
result.
Hint. Attempt to use one of the properties in Theorem 1.17 on the Fourier
series of the square wave.
It may be desirable to change the frequency content of a sound, for instance to transform,
as we have seen, the unpleasant square wave towards the pure tone sin(2π440t). Doing
so is an example of an important operation on sound called a filter:
Definition 1.24. Analog filters.
An operation on sound is called a filter if it preserves the different frequencies
in the sound. In other words, $s$ is a filter if, for any sound on the form
$f = \sum_\nu c(\nu)e^{2\pi i\nu t}$, the output $s(f)$ is a sound which can be written on the form
$$s(f) = s\left(\sum_\nu c(\nu)e^{2\pi i\nu t}\right) = \sum_\nu c(\nu)\lambda_s(\nu)e^{2\pi i\nu t}.$$
$\lambda_s(\nu)$ is a function describing how $s$ treats the different frequencies, and is also
called the frequency response of $s$.
By definition any pure tone is an eigenvector of s, with the frequency response
providing the eigenvalue. The notion of a filter makes sense for both periodic
and non-periodic input functions. The problem is, however, that a function may
be an infinite sum of frequencies, for which a sum on the form $\sum_\nu c(\nu)e^{2\pi i\nu t}$
may not converge.
This general definition of filters may not be useful in practice. If we
restrict to Fourier spaces, however, we restrict ourselves to finite sums. We then
clearly have that s(f ) ∈ VN,T whenever f ∈ VN,T , so that the computation can
be performed in finite dimensions. Let us now see how we can construct useful
such filters.
Theorem 1.25. Convolution kernels.
Assume that g is a bounded Riemann-integrable function with compact
support (i.e. that there exists an interval [a, b] so that g = 0 outside [a, b]). The
operation
$$f(t) \to h(t) = \int_{-\infty}^{\infty} g(u)f(t-u)\,du \qquad (1.23)$$
is a filter. Also, the frequency response of the filter is $\lambda_s(\nu) = \int_{-\infty}^{\infty} g(s)e^{-2\pi i\nu s}\,ds$.
The function g is also called the kernel of s.
Note that the requirement that g is bounded with compact support is just
made for convenience, to ensure that the integral exists. Many weaker conditions
can be put on g to ensure that the integral exists. In case of compact support
there exist constants $a$ and $b$ so that the filter takes the form
$f(t) \to h(t) = \int_a^b g(s)f(t-s)\,ds$.
Proof. We compute
$$s(e^{2\pi i\nu t}) = \int_{-\infty}^{\infty} g(s)e^{2\pi i\nu(t-s)}\,ds = \int_{-\infty}^{\infty} g(s)e^{-2\pi i\nu s}\,ds\, e^{2\pi i\nu t} = \lambda_s(\nu)e^{2\pi i\nu t},$$
so that any pure tone is preserved, with the stated frequency response.
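The integral in (1.23) can be approximated numerically by a Riemann sum over the sample points. The following sketch (not from the original text) filters a signal with the kernel g(s) = 1/a on [0, a], which simply computes a moving average:

from numpy import linspace, ones, convolve, sin, pi

fs = 44100
t = linspace(0, 0.1, int(0.1*fs))
x = sin(2*pi*440*t) + 0.3*sin(2*pi*8000*t)   # a low and a high frequency

a = 0.001                   # kernel support: g(s) = 1/a on [0, a], 0 elsewhere
L = int(a*fs)
g = ones(L)/a               # samples of the kernel g

# Riemann sum approximation of h(t) = int_0^a g(s) f(t - s) ds
h = convolve(x, g)[:len(x)]/fs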
$$S_N(t) - f(t) = \frac{1}{2T}\int_0^T (f(t+u) + f(t-u) - 2f(t))F_N(u)\,du. \qquad (1.24)$$
We have written SN (t) − f (t) on this form since now the integrand is continuous
at u = 0 as a function of u: This is obvious if f is continuous at t. If on the other
hand t is one of the discontinuities, we have that f (t + u) + f (t − u) → 2f (t) as
$u \to 0$. Given $\epsilon > 0$, we can therefore find a $\delta > 0$ so that
$$|f(t+u) + f(t-u) - 2f(t)| < \epsilon$$
whenever $|u| < \delta$. Now, split the integral (1.24) in three: $\int_0^T = \int_{-T/2}^{-\delta} + \int_{-\delta}^{\delta} + \int_{\delta}^{T/2}$.
For the second of these we have
$$\frac{1}{2T}\int_{-\delta}^{\delta}|(f(t+u) + f(t-u) - 2f(t))F_N(u)|\,du
\leq \frac{\epsilon}{2T}\int_{-\delta}^{\delta}F_N(u)\,du
\leq \frac{\epsilon}{2T}\int_{-T/2}^{T/2}F_N(u)\,du = \frac{\epsilon}{2}.$$
For the third integral we have
$$\frac{1}{2T}\int_{\delta}^{T/2}|(f(t+u) + f(t-u) - 2f(t))F_N(u)|\,du
\leq \frac{1}{2T}\int_{\delta}^{T/2}4\|f\|_\infty\frac{T^2}{4(N+1)u^2}\,du
\leq \frac{\|f\|_\infty T^2}{4(N+1)\delta^2},$$
where $\|f\|_\infty = \max_{x\in[0,T]}|f(x)|$. A similar calculation can be done for the first
integral. Clearly then we can choose $N$ so big that the sum of the first and third
integrals is less than $\epsilon/2$, and we then get that $|S_N(t) - f(t)| < \epsilon$. This shows
that $S_N(t) \to f(t)$ as $N \to \infty$ for any $t$. For the final statement, if $[a, b]$ is an
interval where f is continuous, choose the δ above so small that [a − δ, b + δ] still
contains no discontinuities. Since continuous functions are uniformly continuous
on compact intervals, it is not too hard to see that the convergence of $S_N$ to $f$
on [a, b] is uniform. This completes the proof.
Since $S_N(t) = \frac{1}{T}\int_0^T f(t-u)F_N(u)\,du \in V_{N,T}$, and $f_N$ is a best approximation
from $V_{N,T}$, we have that $\|f_N - f\| \leq \|S_N - f\|$. If $f$ is continuous, the result says
that $\|f - S_N\|_\infty \to 0$, which implies that $\|f - S_N\| \to 0$, so that $\|f - f_N\| \to 0$.
Therefore, for $f$ continuous, both $\|f - S_N\|_\infty \to 0$ and $\|f - f_N\| \to 0$ hold, so
that we have established both modes of convergence. If $f$ has a discontinuity, it is
obvious that $\|f - S_N\|_\infty \to 0$ can not hold, since $S_N$ is continuous. $\|f - f_N\| \to 0$
holds, however, even with discontinuities. The reason is that any function with
only a finite number of discontinuities can be approximated arbitrarily well with
continuous functions w.r.t. $\|\cdot\|$. The proof of this is left as an exercise.
Both the square wave and the triangle wave are piecewise continuous (at least if
we redefined the value of the square wave at the discontinuity). Therefore both
their Fourier series converge to $f$ in $\|\cdot\|$. Since the triangle wave is continuous,
$S_N$ also converges uniformly to $f_t$.
The result above states that SN converges pointwise to f - it does not say
that fN converges pointwise to f . This suggests that SN may be better suited
to approximate f . In Figure 1.12 we have plotted SN (t) and fN (t) for the
square wave. Clearly the approximations are very different. The pointwise
convergence of fN (t) is more difficult to analyze, so we will make some additional
assumptions.
Figure 1.12: The functions $S_N(t)$ and $f_N(t)$ for the square wave.
Assume that the one-sided limits
$$D^+f(t) = \lim_{h\to 0^+}\frac{f(t+h) - f(t^+)}{h} \qquad D^-f(t) = \lim_{h\to 0^-}\frac{f(t+h) - f(t^-)}{h}$$
exist. Then $\lim_{N\to\infty} f_N(t) = f(t)$.
Proof. In Exercise 1.41 we construct another kernel DN (t), called the Dirichlet
kernel. This satisfies only two of the properties of a summability kernel, but this
will turn out to be enough for our purposes due to the additional assumption on
the one-sided limits for the derivative. A formula similar to (1.24) can be easily
proved using the same substitution v = −u:
$$f_N(t) - f(t) = \frac{1}{2T}\int_0^T (f(t+u) + f(t-u) - 2f(t))D_N(u)\,du. \qquad (1.25)$$
Substituting the expression for the Dirichlet kernel obtained in Exercise 1.42, the
integrand can be written as $h(u)\sin(\pi(2N+1)u/T)$, where
$$h(u) = \frac{f(t+u) + f(t-u) - 2f(t)}{\sin(\pi u/T)}.$$
We have that
$$\lim_{u\to 0^+}\frac{f(t+u) - f(t^+)}{\sin(\pi u/T)} = \frac{T}{\pi}D^+f(t),$$
and similarly for $\frac{f(t-u) - f(t^-)}{\sin(\pi u/T)}$. It follows that the function $h$ defined above is a
piecewise continuous function in $u$. The proof will be done if we can show that
$\int_0^T h(u)\sin(\pi(2N+1)u/T)\,du \to 0$ as $N \to \infty$ for any piecewise continuous $h$.
Since
$$\sin(\pi(2N+1)u/T) = \sin(2\pi Nu/T)\cos(\pi u/T) + \cos(2\pi Nu/T)\sin(\pi u/T),$$
and since $h(u)\cos(\pi u/T)$ and $h(u)\sin(\pi u/T)$ also are piecewise continuous, it is
enough to show that $\int h(u)\sin(2\pi Nu/T)\,du \to 0$ and $\int h(u)\cos(2\pi Nu/T)\,du \to 0$.
These are simply the order $N$ Fourier coefficients of $h$. Since $h$ is in particular
square integrable, it follows from Bessel's inequality (Theorem 1.14) that the
Fourier coefficients of $h$ go to zero, and the proof is done.
The requirement on the one-sided limits of the derivative above can be
replaced by less strict conditions. This gives rise to what is known as Dini's test.
One can also replace it with the less strict requirement that f has a finite number
of local minima and maxima. This is referred to as Dirichlet's theorem, after
Dirichlet, who proved it in 1829. There also exist much more general conditions
that secure pointwise convergence of the Fourier series. The most general results
require deep mathematical theory to prove.
Both the square wave and the triangle wave have one-sided limits for the
derivative. Therefore both their Fourier series converge to f pointwise.
For the Dirichlet kernel we saw that $f_N(t) = \frac{1}{T}\int_0^T D_N(u)f(t-u)\,du$. From
this it follows in the same way that the frequency response corresponding to
filtering with the Dirichlet kernel is given by the mapping $n/T \to 1$, i.e. it is one
on $[-N/T, N/T]$ and 0 elsewhere.
Figure 1.14: The frequency responses for the filters with Fejer and Dirichlet
kernels, N = 20.
The two frequency responses are shown in Figure 1.14. Both filters above are
what is called lowpass filters: They annihilate high frequencies. More precisely,
if $|\nu| > N/T$, then the frequency response at $\nu$ is zero. The lowest frequency
$\nu = 0$ is treated in the same way by the two filters, but the higher frequencies are
treated differently: The Dirichlet kernel keeps them, while the Fejer kernel attenuates them,
i.e. does not include all the frequency content at the higher frequencies. That
filtering with the Fejer kernel gave something (SN (t)) with better convergence
properties can be interpreted as follows: We should be careful when we include
the contribution from the higher frequencies, as this may affect the convergence.
Hint. Use that $\frac{2}{\pi}|u| \leq |\sin u|$ when $u \in [-\pi/2, \pi/2]$.
b) Show that FN (t) satisfies the three properties of a summability kernel.
c) Show that $\frac{1}{T}\int_0^T f(t-u)F_N(u)\,du = \frac{1}{N+1}\sum_{n=0}^{N} f_n$.
Hint. Show that $F_N(t) = \frac{1}{N+1}\sum_{n=0}^{N} D_n(t)$, and use Exercise 1.41 b).
d) Write a function which takes N and T as arguments, and plots FN (t) over
[−T /2, T /2].
can be removed, even when they are quite large. If a sound is below the
masking threshold, it is simply omitted by the encoder, since the model says
that the sound should be inaudible.
Masking effects are just one example of what is called psycho-acoustic effects,
and all such effects can be taken into account in a psycho-acoustic model. Another
obvious such effect regards computing the scale factors: the human auditory
system can only perceive frequencies in the range 20 Hz - 20 000 Hz. An obvious
way to do compression is therefore to remove frequencies outside this range,
although there are indications that these frequencies may influence the listening
experience inaudibly. The computed scaling factors tell the encoder about the
precision to be used for each frequency band: If the model decides that one band
is very important for our perception of the sound, it assigns a big scale factor to
it, so that more effort is put into encoding it by the encoder (i.e. it uses more
bits to encode this band).
Using appropriate scale factors and masking thresholds provide compression,
since bits used to encode the sound are spent on parts important for our percep-
tion. Developing a useful psycho-acoustic model requires detailed knowledge of
human perception of sound. Different MP3 encoders use different such models,
so they may produce very different results, worse or better.
The information remaining after frequency analysis and using a psycho-
acoustic model is coded efficiently with (a variant of) Huffman coding. MP3
supports bit rates from 32 to 320 kb/s and the sampling rates 32, 44.1, and 48
kHz. The format also supports variable bit rates (the bit rate varies in different
parts of the file). An MP3 encoder also stores metadata about the sound, such
as the title of the audio piece, album and artist name and other relevant data.
MP3 too has evolved in the same way as MPEG, from MP1 to MP2, and to
MP3, each one more sophisticated than the other, providing better compression.
MP3 is not the latest development of audio coding in the MPEG family: AAC
(Advanced Audio Coding) is presented as the successor of MP3 by its principal
developer, Fraunhofer Society, and can achieve better quality than MP3 at the
same bit rate, particularly for bit rates below 192 kb/s. AAC became well
known in April 2003 when Apple introduced this format (at 128 kb/s) as the
standard format for their iTunes Music Store and iPod music players. AAC is
also supported by many other music players, including the most popular mobile
phones.
The technologies behind AAC and MP3 are very similar. AAC supports
more sample rates (from 8 kHz to 96 kHz) and up to 48 channels. AAC uses the
same transformation as MP3, but AAC processes 1 024 samples at a time. AAC
also uses much more sophisticated processing of frequencies above 16 kHz and
has a number of other enhancements over MP3. AAC, as MP3, uses Huffman
coding for efficient coding of the transformed values. Tests seem quite conclusive
that AAC is better than MP3 for low bit rates (typically below 192 kb/s), but
for higher rates it is not so easy to differentiate between the two formats. As
for MP3 (and the other formats mentioned here), the quality of an AAC file
depends crucially on the quality of the encoding program.
1.8 Summary
We defined digital sound, and demonstrated how we could perform simple
operations on digital sound such as adding noise, playing at different rates, etc.
Digital sound could be obtained by sampling continuous sounds.
We discussed the basic question of what sound is, and concluded that
sound could be modeled as a sum of frequency components. If the function
was periodic we could define its Fourier series, which can be thought of as an
approximation scheme for periodic functions using finite-dimensional spaces of
trigonometric functions. We established the basic properties of Fourier series,
and some duality relationships between the function and its Fourier series. We
have also computed the Fourier series of the square wave and the triangle wave,
and we saw that we could speed up the convergence of the Fourier series by
instead considering the symmetric extension of the function.
We also discussed the MP3 standard for compression of sound, and its relation
to a psychoacoustic model which describes how the human auditory system
perceives sound. There exist a wide variety of documents on this standard. In
[33], an overview is given, which, although written in a signal processing friendly
language and representing most relevant theory such as for the psychoacoustic
model, does not dig into all the details.
We also defined analog filters, which are operations on continuous sound,
without any assumption on periodicity. In signal processing
literature one defines the Continuous-time Fourier transform, or CTFT. We will
not use this concept in this book. We have instead disguised this concept as the
frequency response of an analog filter. To be more precise: in the literature, the
CTFT of g is nothing but the frequency response of an analog filter with g as
convolution kernel.
To illustrate this, we have in Figure 2.1 shown a vector x and its periodic extension.
Figure 2.1: A vector and its periodic extension.
At the outset our vectors will have real components, but since we use complex
exponentials we must be able to work with complex vectors also. We therefore
first need to define the standard inner product and norm for complex vectors.
Definition 2.1. Euclidean inner product.
For complex vectors of length N the Euclidean inner product is given by
$$\langle x, y\rangle = \sum_{k=0}^{N-1} x_k\overline{y_k}. \qquad (2.1)$$
In the previous chapter we saw that, using a Fourier series, a function with
period $T$ could be approximated by linear combinations of the functions (the
pure tones) $\{e^{2\pi int/T}\}_{n=0}^{N}$. This can be generalized to vectors (digital sounds),
but then the pure tones must of course also be vectors.
Definition 2.2. Discrete Fourier analysis.
In Discrete Fourier analysis, a vector x = (x0 , . . . , xN −1 ) is represented as a
linear combination of the N vectors
$$\phi_n = \frac{1}{\sqrt{N}}\left(1, e^{2\pi in/N}, e^{2\pi i2n/N}, \ldots, e^{2\pi ikn/N}, \ldots, e^{2\pi in(N-1)/N}\right).$$
These vectors are called the normalised complex exponentials, or the pure
digital tones of order $N$. $n$ is also called the frequency index. The whole collection
$\mathcal{F}_N = \{\phi_n\}_{n=0}^{N-1}$ is called the $N$-point Fourier basis.
Note that pure digital tones can be considered as samples of a pure tone,
taken uniformly over one period: If $f(t) = e^{2\pi int/T}/\sqrt{N}$ is the pure tone with
frequency $n/T$, then $f(kT/N) = e^{2\pi in(kT/N)/T}/\sqrt{N} = e^{2\pi ink/N}/\sqrt{N} = (\phi_n)_k$.
When mapping a pure tone to a digital pure tone, the index n corresponds to
frequency ν = n/T , and N the number of samples takes over one period. Since
T fs = N , where fs is the sampling frequency, we have the following connection
between frequency and frequency index:
nfs νN
ν= and n = (2.3)
N fs
The following lemma shows that the vectors in the Fourier basis are orthonormal, so they do indeed form a basis.
Lemma 2.3. Complex exponentials are an orthonormal basis.
The normalized complex exponentials \{\phi_n\}_{n=0}^{N-1} of order N form an orthonormal basis in \mathbb{C}^N.
Proof. Let n_1 and n_2 be two distinct integers in the range [0, N − 1]. The inner product of \phi_{n_1} and \phi_{n_2} is then given by

\langle \phi_{n_1}, \phi_{n_2}\rangle = \frac{1}{N}\sum_{k=0}^{N-1} e^{2\pi in_1k/N} e^{-2\pi in_2k/N} = \frac{1}{N}\sum_{k=0}^{N-1} e^{2\pi i(n_1-n_2)k/N} = \frac{1}{N}\,\frac{1 - e^{2\pi i(n_1-n_2)}}{1 - e^{2\pi i(n_1-n_2)/N}} = 0,

since e^{2\pi i(n_1-n_2)} = 1. In particular, this orthogonality means that the complex exponentials form a basis. Clearly also \langle \phi_n, \phi_n\rangle = 1, so that the N-point Fourier basis is in fact an orthonormal basis.
Note that the normalizing factor 1/\sqrt{N} was not present for pure tones in the previous chapter. Also, the normalizing factor 1/T from the last chapter is not part of the definition of the inner product in this chapter. These are small differences which have to do with slightly different notation for functions and vectors, and which will not cause confusion in what follows.
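As a small numerical sanity check (a sketch; numpy and the choice N = 8 are our own here, not part of the text), the orthonormality can be verified directly:

import numpy as np

# Check that phi_0, ..., phi_{N-1} from Definition 2.2 are orthonormal
# with respect to the inner product (2.1), for N = 8.
N = 8
Phi = np.array([np.exp(2j*np.pi*n*np.arange(N)/N)/np.sqrt(N) for n in range(N)]).T
print(np.allclose(np.conj(Phi.T).dot(Phi), np.eye(N)))  # should print True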
The focus in Discrete Fourier analysis is to change coordinates from the standard basis to the Fourier basis, perform some operations on this “Fourier representation”, and then change coordinates back to the standard basis. Such operations are of crucial importance, and in this section we study some of their basic properties. We start with the following definition.
x = y_0\phi_0 + y_1\phi_1 + \cdots + y_{N-1}\phi_{N-1} = \begin{pmatrix} \phi_0 & \phi_1 & \cdots & \phi_{N-1} \end{pmatrix} y = F_N^{-1} y, \qquad (2.4)
where we have used the inverse of the defining relation y = FN x, and that the
φn are the columns in FN−1 (this follows from the fact that FN−1 is the change of
coordinates matrix from the Fourier basis to the standard basis, and the Fourier
basis vectors are clearly the columns in this matrix). Equation (2.4) is also called
the synthesis equation.
Let us find an expression for the matrix F_N. From Lemma 2.3 we know that the columns of F_N^{-1} are orthonormal. If the matrix was real, it would have been called orthogonal, and the inverse matrix could have been obtained by transposing. F_N^{-1} is complex, however, and it is easy to see that the conjugation present in the definition of the inner product (2.1) implies that the inverse of F_N can be obtained if we also conjugate, in addition to transpose, i.e. (F_N)^{-1} = (\overline{F_N})^T. We call (\overline{A})^T the conjugate transpose of A, and denote this by A^H. We thus have that (F_N)^{-1} = (F_N)^H. Matrices which satisfy A^{-1} = A^H are called unitary. For complex matrices, this is the parallel to orthogonal matrices.
Theorem 2.5. Fourier matrix is unitary.
The Fourier matrix F_N is the unitary N × N-matrix with entries given by

(F_N)_{nk} = \frac{1}{\sqrt{N}} e^{-2\pi ink/N},

for 0 ≤ n, k ≤ N − 1.
Since the Fourier matrix is easily inverted, the DFT is also easily inverted. Note that, since (F_N)^T = F_N, we have that (F_N)^{-1} = \overline{F_N}. Let us make the following definition.
y_n = \sum_{k=0}^{N-1} x_k e^{-2\pi ink/N} \qquad\qquad x_k = \frac{1}{N}\sum_{n=0}^{N-1} y_n e^{2\pi ink/N} \qquad (2.5)
In applied fields such as signal processing, it is more common to state the DFT and IDFT in these component forms, rather than in the matrix forms y = DFT_N x and x = IDFT_N y.
Let us now see how these formulas work out in practice by considering some
examples.
x_{-L} = \ldots = x_{-1} = x_0 = x_1 = \ldots = x_L = 1,
while all other values are 0. This is similar to a square wave, with some
modifications: First of all we assume symmetry around 0, while the square wave
The DFT_4 matrix is

\mathrm{DFT}_4 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{pmatrix}.

We now can compute the DFT of a vector like (1, 2, 3, 4)^T simply as

\mathrm{DFT}_4 \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 1 + 2 + 3 + 4 \\ 1 - 2i - 3 + 4i \\ 1 - 2 + 3 - 4 \\ 1 + 2i - 3 - 4i \end{pmatrix} = \begin{pmatrix} 10 \\ -2 + 2i \\ -2 \\ -2 - 2i \end{pmatrix}.
In general, computing the DFT implies using floating point multiplication. For
N = 4, however, we see that there is no need for floating point multiplication at
all, since DFT4 has unit entries which are either real or purely imaginary.
from numpy import zeros_like, exp, pi, arange, dot

def DFTImpl(x):
    # Compute the DFT of x directly from the definition, one row at a time
    y = zeros_like(x).astype(complex)
    N = len(x)
    for n in range(N):
        D = exp(-2*pi*n*1j*arange(float(N))/N)  # row n of the DFT matrix
        y[n] = dot(D, x)
    return y
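As a quick check (a sketch), the function can be tested against the hand computation above:

from numpy import array

x = array([1., 2., 3., 4.])
print(DFTImpl(x))  # should print approximately [10, -2+2j, -2, -2-2j]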
1. (\hat{x})_{N-n} = \overline{(\hat{x})_n} for 0 ≤ n ≤ N − 1.
2. If x_k = x_{N-k} for all k (so x is symmetric), then \hat{x} is a real vector.
Proof. The methods used in the proof are very similar to those used in the proof of Theorem 1.17. From the definition of the DFT we have

(\hat{x})_{N-n} = \sum_{k=0}^{N-1} e^{-2\pi ik(N-n)/N} x_k = \sum_{k=0}^{N-1} e^{2\pi ikn/N} x_k = \overline{\sum_{k=0}^{N-1} e^{-2\pi ikn/N} x_k} = \overline{(\hat{x})_n}
(\hat{z})_n = \sum_{k=0}^{N-1} z_k e^{-2\pi ikn/N} = \sum_{k=0}^{N-1} x_{N-k} e^{-2\pi ikn/N} = \sum_{u=1}^{N} x_u e^{-2\pi i(N-u)n/N}
= \sum_{u=0}^{N-1} x_u e^{2\pi iun/N} = \overline{\sum_{u=0}^{N-1} x_u e^{-2\pi iun/N}} = \overline{(\hat{x})_n}.
(\hat{z})_n = \sum_{k=0}^{N-1} x_{k-d} e^{-2\pi ikn/N} = \sum_{k=0}^{N-1} x_k e^{-2\pi i(k+d)n/N}
= e^{-2\pi idn/N} \sum_{k=0}^{N-1} x_k e^{-2\pi ikn/N} = e^{-2\pi idn/N} (\hat{x})_n.
(F_N z)_n = \sum_{k=0}^{N-1} z_k e^{-2\pi ikn/N} = \sum_{k=0}^{N-1} e^{2\pi i2k/N} x_k e^{-2\pi ikn/N}
= \sum_{k=0}^{N-1} x_k e^{-2\pi ik(n-2)/N} = (F_N(x))_{n-2}.
y_n = \frac{1 - c^N}{1 - ce^{-2\pi in/N}}

for n = 0, \ldots, N − 1.
Hint. Split into real and imaginary parts, and use linearity of the DFT.
• The function also takes a second parameter called forward. If this is true the DFT is applied. If it is false, the IDFT is applied. If this parameter is not present, then the forward transform should be assumed.
• If the input x is two-dimensional (i.e. a matrix), the DFT/IDFT should be applied to each column of x. This ensures that, in the case of sound, the FFT is applied to each channel in the sound when the entire sound is used as input, as we are used to when applying different operations to sound. A minimal sketch of such a function is shown below.
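A minimal sketch of such a wrapper could look as follows, here simply using numpy's FFT/IFFT as the underlying transform (the function name transform and the exact handling of the default are our own choices, not fixed by the exercise):

import numpy as np

def transform(x, forward=True):
    # Apply the DFT (forward=True) or the IDFT (forward=False) along the
    # first axis, i.e. to each column when x is a matrix of sound channels.
    x = np.asarray(x, dtype=complex)
    return np.fft.fft(x, axis=0) if forward else np.fft.ifft(x, axis=0)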
• xn = 0 for all odd n if and only if yk+N/2 = yk for all 0 ≤ k < N/2.
• xn = 0 for all even n if and only if yk+N/2 = −yk for all 0 ≤ k < N/2.
(F_N(x_1))_k = \frac{1}{2}\left((F_N(x))_k + \overline{(F_N(x))_{N-k}}\right)
(F_N(x_2))_k = \frac{1}{2i}\left((F_N(x))_k - \overline{(F_N(x))_{N-k}}\right)
This shows that we can compute two DFT’s on real data from one DFT on
complex data, and 2N extra additions.
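These relations are easy to check numerically (a sketch; the np.roll manipulation below is just our way of picking out the entries (F_N(x))_{N-k} for k = 0, ..., N-1):

import numpy as np

N = 8
x1, x2 = np.random.random(N), np.random.random(N)
y = np.fft.fft(x1 + 1j*x2)             # one DFT applied to complex data
yrev = np.conj(np.roll(y[::-1], 1))    # entry k equals the conjugate of y_{N-k}
print(np.allclose(np.fft.fft(x1), (y + yrev)/2))     # should print True
print(np.allclose(np.fft.fft(x2), (y - yrev)/(2j)))  # should print True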
f(t) = f_M(t) = \sum_{n=-M}^{M} z_n e^{2\pi int/T}, \quad\text{where}\quad z_n = \frac{1}{T}\int_0^T f(t)e^{-2\pi int/T}\,dt. \qquad (2.6)
We here have changed our notation for the Fourier coefficients from yn to zn , in
order not to confuse them with the DFT coefficients. We recall that in order to
represent the frequency n/T fully, we need the corresponding exponentials with
both positive and negative arguments, i.e., both e2πint/T and e−2πint/T .
Fact 2.8. frequency vs. Fourier coefficients.
Suppose f is given by its Fourier series (2.6). Then the total frequency
content for the frequency n/T is given by the two coefficients zn and z−n .
(z_0, z_1, \ldots, z_M, \underbrace{0, \ldots, 0}_{N-(2M+1)}, z_{-M}, z_{-M+1}, \ldots, z_{-1}) = \frac{1}{N}\,\mathrm{DFT}_N x. \qquad (2.7)
Inserting the sample points t = kT/N into the Fourier series, we must have that

x_k = f(kT/N) = \sum_{n=-M}^{M} z_n e^{2\pi ink/N} = \sum_{n=-M}^{-1} z_n e^{2\pi ink/N} + \sum_{n=0}^{M} z_n e^{2\pi ink/N}
= \sum_{n=N-M}^{N-1} z_{n-N} e^{2\pi i(n-N)k/N} + \sum_{n=0}^{M} z_n e^{2\pi ink/N}
= \sum_{n=0}^{M} z_n e^{2\pi ink/N} + \sum_{n=N-M}^{N-1} z_{n-N} e^{2\pi ink/N}.
The theorem says that any f ∈ V_{M,T} can be reconstructed from its samples (since we can write down its Fourier series), as long as N > 2M.
Figure 2.2: An example on how the samples are picked from an underlying
continuous time function (left), and the samples on their own (right).
That f ∈ V_{M,T} is important. From Figure 2.2 it is clear that information is lost in the right plot when we discard everything but the sample values from the left plot.
Here the function is f (t) = sin(2π8t) ∈ V8,1 , so that we need to choose N
so that N > 2M = 16 samples. Here N = 23 samples were taken, so that
reconstruction from the samples is possible. That the condition N > 2M is also necessary can easily be observed in Figure 2.3.
Figure 2.3: Sampling sin(2πt) with two points (left), and sampling sin(2π4t)
with eight points (right).
Figure 2.4: How we can interpolate f from VM,T with help of the DFT. The
left vertical arrow represents sampling. The right vertical arrow represents
interpolation, i.e. computing Equation (2.9).
The new function f˜ has the same values as f in the sample points. This is
usually not the case for fM , so that f˜ and fM are different approximations to f .
Let us summarize as follows.
Idea 2.11. f˜ as approximation to f .
The function f˜ resulting from sampling, taking the DFT, and interpolation, as
shown in Figure 2.4, also gives an approximation to f . f˜ is a worse approximation
in the mean square sense (since fM is the best such), but it is much more useful
since it avoids evaluation of the Fourier integrals, depends only on the samples,
and is easily computed.
The condition N > 2M in Proposition 2.9 can also be written as N/T >
2M/T . The left side is now the sampling rate fs , while the right side is the
double of the highest frequency in f . The result can therefore also be restated
as follows
Proposition 2.12. Reconstruction from samples.
Any f ∈ V_{M,T} can be reconstructed uniquely from a uniform set of samples \{f(kT/N)\}_{k=0}^{N-1}, as long as f_s > 2|ν|, where ν denotes the highest frequency in f.
We also refer to f_s = 2|ν| as the critical sampling rate, since it is the minimum sampling rate we need in order to reconstruct f from its samples. If f_s is substantially larger than 2|ν| we say that f is oversampled, since we have taken more samples than we really need. Similarly we say that f is undersampled if f_s is smaller than 2|ν|, since we have not taken enough samples in order to reconstruct f. Clearly Proposition 2.9 gives one formula for the reconstruction.
In the literature another formula can be found, which we will now deduce. This alternative version of Proposition 2.9 is also called the sampling theorem. We start by substituting N = T/T_s (i.e. T = NT_s, with T_s being the sampling period) in the Fourier series for f:
f(kT_s) = \sum_{n=-M}^{M} z_n e^{2\pi ink/N} \qquad -M ≤ k ≤ M.
Equation (2.7) said that the Fourier coefficients could be found from the samples from

(z_0, z_1, \ldots, z_M, \underbrace{0, \ldots, 0}_{N-(2M+1)}, z_{-M}, z_{-M+1}, \ldots, z_{-1}) = \frac{1}{N}\,\mathrm{DFT}_N x,

so that

z_n = \frac{1}{N}\sum_{k=0}^{N-1} f(kT_s)e^{-2\pi ink/N} = \frac{1}{N}\sum_{k=-M}^{M} f(kT_s)e^{-2\pi ink/N}, \qquad -M ≤ n ≤ M.
f(t) = \sum_{n=-M}^{M}\left(\frac{1}{N}\sum_{k=-M}^{M} f(kT_s)e^{-2\pi ink/N}\right)e^{2\pi int/T}
= \sum_{k=-M}^{M} f(kT_s)\left(\frac{1}{N}\sum_{n=-M}^{M} e^{2\pi in(t/T - k/N)}\right)
= \sum_{k=-M}^{M} f(kT_s)\,\frac{1}{N}\,e^{-2\pi iM(t/T - k/N)}\,\frac{1 - e^{2\pi i(2M+1)(t/T - k/N)}}{1 - e^{2\pi i(t/T - k/N)}}
= \sum_{k=-M}^{M} f(kT_s)\,\frac{1}{N}\,\frac{\sin(\pi(t - kT_s)/T_s)}{\sin(\pi(t - kT_s)/T)}
Theorem 2.13. Sampling theorem and the ideal interpolation formula for periodic functions.
Let f be a periodic function with period T, and assume that f has no frequencies higher than ν Hz. Then f can be reconstructed exactly from its samples f(−MT_s), \ldots, f(MT_s) (where T_s is the sampling period, N = T/T_s is the number of samples per period, and N = 2M + 1) when the sampling rate f_s = 1/T_s is bigger than 2ν. Moreover, the reconstruction can be performed through the formula

f(t) = \sum_{k=-M}^{M} f(kT_s)\,\frac{1}{N}\,\frac{\sin(\pi(t - kT_s)/T_s)}{\sin(\pi(t - kT_s)/T)}. \qquad (2.10)
Formula (2.10) is also called the ideal interpolation formula for periodic
functions. Such formulas, where one reconstructs a function based on a weighted
sum of the sample values, are more generally called interpolation formulas. The
function \frac{1}{N}\frac{\sin(\pi(t - kT_s)/T_s)}{\sin(\pi(t - kT_s)/T)} is also called an interpolation kernel. Note that f
itself may not be equal to a finite Fourier series, and reconstruction is in general
not possible then. The ideal interpolation formula can in such cases still be used,
but the result we obtain may be different from f (t).
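A direct implementation of formula (2.10) could look as follows (a sketch; the function name and the handling of the points where the denominator vanishes are our own choices):

import numpy as np

def ideal_interpolation_periodic(samples, Ts, T, t):
    # Evaluate formula (2.10) at the points t, given the 2M + 1 samples
    # f(-M*Ts), ..., f(M*Ts). Assumes N = T/Ts is an odd integer, N = 2M + 1.
    N = int(round(T/Ts))
    M = (N - 1)//2
    t = np.asarray(t, dtype=float)
    f = np.zeros_like(t)
    for k in range(-M, M + 1):
        arg = np.pi*(t - k*Ts)
        num = np.sin(arg/Ts)
        den = N*np.sin(arg/T)
        # The interpolation kernel has the limit value 1 where den = 0.
        kernel = np.where(np.abs(den) < 1e-12, 1.0,
                          num/np.where(np.abs(den) < 1e-12, 1.0, den))
        f += samples[k + M]*kernel
    return f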
In fact, the following more general result holds, which we will not prove. The
result is also valid for functions which are not periodic, and is frequently stated
in the literature:
Theorem 2.14. Sampling theorem and the ideal interpolation formula, general version.
Assume that f has no frequencies higher than ν Hz. Then f can be reconstructed exactly from its samples \ldots, f(−2T_s), f(−T_s), f(0), f(T_s), f(2T_s), \ldots
when the sampling rate is bigger than 2ν. Moreover, the reconstruction can be
performed through the formula
f(t) = \sum_{k=-\infty}^{\infty} f(kT_s)\,\frac{\sin(\pi(t - kT_s)/T_s)}{\pi(t - kT_s)/T_s}. \qquad (2.11)
repeats the values in the block in periods, while the old signal consists of one
much bigger block. What are the differences in the frequency representations of
the two signals?
Assume that the entire sound has length M. The frequency representation of this is computed as an M-point DFT (the signal is actually repeated with period M), and we write the sound samples as a sum of frequencies: x_k = \frac{1}{M}\sum_{n=0}^{M-1} y_n e^{2\pi ikn/M}. Let us consider the effect of restricting to a block for each of the contributing pure tones e^{2\pi ikn_0/M}, 0 ≤ n_0 ≤ M − 1. When we restrict this to a block of size N, we get the signal \{e^{2\pi ikn_0/M}\}_{k=0}^{N-1}. Depending on n_0, this may not be a Fourier basis vector! Its N-point DFT gives us its frequency representation, and the absolute value of this is
|y_n| = \left|\sum_{k=0}^{N-1} e^{2\pi ikn_0/M} e^{-2\pi ikn/N}\right| = \left|\sum_{k=0}^{N-1} e^{2\pi ik(n_0/M - n/N)}\right|
= \left|\frac{1 - e^{2\pi iN(n_0/M - n/N)}}{1 - e^{2\pi i(n_0/M - n/N)}}\right| = \left|\frac{\sin(\pi N(n_0/M - n/N))}{\sin(\pi(n_0/M - n/N))}\right|. \qquad (2.12)
The explanation is that the pure tone is not a pure tone when N = 64 and
N = 256, since at this scale such frequencies are too high to be represented
exactly. The closest pure tone in frequency is n = 0, and we see that this
has the biggest contribution, but other frequencies also contribute. The other
frequencies contribute much more when N = 256, as can be seen from the peak
in the closest frequency n = 0. In conclusion, when we split into blocks, the
frequency representation may change in an undesirable way. This is a common
Figure 2.6: Experimenting with the DFT on a small part of a song.
Note that using a neglection threshold in this way is too simple in practice:
The neglection threshold in general should depend on the frequency, since the
human auditory system is more sensitive to certain frequencies.
In practice this quantization procedure is also too simple, since the human
auditory system is more sensitive to certain frequency information, and should
thus allocate a higher number of bits for such frequencies. Modern audio
standards take this into account, but we will not go into details on this.
x = x[0:2**17]
y = fft.fft(x, axis=0)
y[(2**17//4):(3*2**17//4)] = 0
newx = abs(fft.ifft(y))
newx /= abs(newx).max()
play(newx, fs)
Comment in particular why we adjust the sound samples by dividing with the
maximum value of the sound samples. What changes in the sound do you expect
to hear?
• at the top you define the function f(x) = \cos^6(x), and M = 3,
• compute the unique interpolant from VM,T (i.e. by taking N = 2M + 1
samples over one period), as guaranteed by Proposition 2.9,
• plot the interpolant against f over one period.
Finally run the code also for M = 4, M = 5, and M = 6. Explain why the plots
coincide for M = 6, but not for M < 6. Does increasing M above M = 6 have
any effect on the plots?
factored in a way that leads to much more efficient algorithms, and this is the
topic of the present section. We will discuss the most widely used implementation
of the DFT, usually referred to as the Fast Fourier Transform (FFT). The FFT
has been stated as one of the ten most important inventions of the 20’th century,
and its invention made the DFT computationally feasible in many fields. The
FFT is for instance widely used in real time processing, such as processing and compression of sound, images, and video. The MP3 standard uses the FFT to find frequency components in sound, and matches this information with a psychoacoustic model, in order to find the best way to compress the data.
FFT-based functionality is collected in a module called fft.
Let us start with the most basic FFT algorithm, which applies for a general
complex input vector x, with length N being an even number.
where x^{(e)}, x^{(o)} ∈ \mathbb{R}^{N/2} consist of the even- and odd-indexed entries of x, respectively, i.e.

x^{(e)} = (x_0, x_2, \ldots, x_{N-2}), \qquad x^{(o)} = (x_1, x_3, \ldots, x_{N-1}).

Proof. We have that

y_n = \sum_{k=0}^{N-1} x_k e^{-2\pi ink/N} = \sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi in2k/N} + \sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi in(2k+1)/N}
= \sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi ink/(N/2)} + e^{-2\pi in/N}\sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi ink/(N/2)}
= \left(\mathrm{DFT}_{N/2}\, x^{(e)}\right)_n + e^{-2\pi in/N}\left(\mathrm{DFT}_{N/2}\, x^{(o)}\right)_n,
where we have substituted x(e) and x(o) as in the text of the theorem, and
recognized the N/2-point DFT in two places. Assembling this for 0 ≤ n <
N/2 we obtain Equation (2.13). For the second half of the DFT coefficients,
i.e. {yN/2+n }0≤n≤N/2−1 , we similarly have
y_{N/2+n} = \sum_{k=0}^{N-1} x_k e^{-2\pi i(N/2+n)k/N} = \sum_{k=0}^{N-1} x_k e^{-\pi ik} e^{-2\pi ink/N}
= \sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi in2k/N} - \sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi in(2k+1)/N}
= \sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi ink/(N/2)} - e^{-2\pi in/N}\sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi ink/(N/2)}
= \left(\mathrm{DFT}_{N/2}\, x^{(e)}\right)_n - e^{-2\pi in/N}\left(\mathrm{DFT}_{N/2}\, x^{(o)}\right)_n.
(y_0, y_1, \ldots, y_{N/2-1}) = \begin{pmatrix} \mathrm{DFT}_{N/2} & D_{N/2}\mathrm{DFT}_{N/2} \end{pmatrix}\begin{pmatrix} x^{(e)} \\ x^{(o)} \end{pmatrix}
(y_{N/2}, y_{N/2+1}, \ldots, y_{N-1}) = \begin{pmatrix} \mathrm{DFT}_{N/2} & -D_{N/2}\mathrm{DFT}_{N/2} \end{pmatrix}\begin{pmatrix} x^{(e)} \\ x^{(o)} \end{pmatrix}.
where the vectors x(e) and x(o) have been further split into even- and odd-indexed
entries. Clearly, if this factorization is repeated, we obtain a factorization
\mathrm{DFT}_N = \left(\prod_{k=1}^{\log_2 N}
\begin{pmatrix}
I & D_{N/2^k} & 0 & 0 & \cdots & 0 & 0 \\
I & -D_{N/2^k} & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & I & D_{N/2^k} & \cdots & 0 & 0 \\
0 & 0 & I & -D_{N/2^k} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & I & D_{N/2^k} \\
0 & 0 & 0 & 0 & \cdots & I & -D_{N/2^k}
\end{pmatrix}\right) P. \qquad (2.18)
The factorization has been repeated until we have a final diagonal matrix with DFT_1 on the diagonal, but clearly DFT_1 = 1, so we do not need any DFT-
matrices in the final factor. Note that all matrices in this factorization are
sparse. A factorization into a product of sparse matrices is the key to many
efficient algorithms in linear algebra, such as the computation of eigenvalues and
eigenvectors. When we later compute the number of arithmetic operations in
this factorization, we will see that this is the case also here.
In Equation (2.18), P is a permutation matrix which secures that the even-
indexed entries come first. Since the even-indexed entries have 0 as the last
bit, this is the same as letting the last bit become the first bit. Since we here
recursively place even-indexed entries first, it is not too difficult to see that P
permutes the elements of x by performing a bit-reversal of the indices, i.e.
P(e_i) = e_j, \quad\text{where } i = d_1d_2\ldots d_n \text{ and } j = d_nd_{n-1}\ldots d_1,
which is responsible for the bit-reversal of the input vector x. Then the matrices in the factorization (2.18) are applied in a “kernel FFT function” (and we will have many such kernels), which assumes that the input has been bit-reversed. A simple implementation of the general function can be as follows.
def FFTImpl(x, FFTKernel):
    bitreverse(x)
    FFTKernel(x)
A simple implementation of the kernel FFT function, based on the first FFT
algorithm we stated, can be as follows.
from numpy import exp, pi, arange, concatenate

def FFTKernelStandard(x):
    N = len(x)
    if N > 1:
        xe, xo = x[0:(N//2)], x[(N//2):]
        FFTKernelStandard(xe)
        FFTKernelStandard(xo)
        D = exp(-2*pi*1j*arange(float(N//2))/N)
        xo *= D
        x[:] = concatenate([xe + xo, xe - xo])
FFTImpl(x, FFTKernelStandard)
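To check the implementation (a sketch), one can compare against numpy's FFT; the simple bit-reversal function below is only a stand-in for the bitreverse function referred to above:

import numpy as np

def bitreverse(x):
    # Reorder x (length a power of 2) so that the element at index i is
    # moved to the index obtained by reversing the bits of i.
    N = len(x)
    nbits = int(np.log2(N))
    indices = [int(format(i, '0%db' % nbits)[::-1], 2) for i in range(N)]
    x[:] = x[indices]

x0 = np.random.random(16) + 1j*np.random.random(16)
x = x0.copy()
FFTImpl(x, FFTKernelStandard)          # computes the DFT of x in-place
print(np.allclose(x, np.fft.fft(x0)))  # should print True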
\lim_{N\to\infty} \frac{R_N}{f(N)} = 1.
Note that e−2πi/N may be computed once and for all and outside the algorithm,
and this is the reason why we have not counted these operations.
The following example shows how the difference equations (2.19) can be solved. It is not too difficult to argue that M_N = O(2N\log_2 N) and A_N = O(3N\log_2 N), by noting that there are \log_2 N levels in the FFT, with 2N real multiplications and 3N real additions at each level. But for N = 2 and N = 4 we may actually
The homogeneous equation x_{r+1} − 2x_r = 0 has the general solution x_r^h = C2^r. Since the base in the power on the right hand side equals the root in the homogeneous equation, we should in each case guess for a particular solution on the form (x_p)_r = Ar2^r. If we do this we find that the first equation has particular solution (x_p)_r = 2r2^r, while the second has particular solution (x_p)_r = 3r2^r. The general solutions are thus on the form x_r = 2r2^r + C2^r for multiplications, and x_r = 3r2^r + C2^r for additions.
Now let us state initial conditions for the number of additions and multiplications. Example 2.3 showed that floating point multiplication can be avoided completely for N = 4. We can therefore use M_4 = x_2 = 0 as an initial value. This gives x_r = 2r2^r − 4\cdot 2^r, so that M_N = 2N\log_2 N − 4N.
For additions we can use A_2 = x_1 = 4 as initial value (since \mathrm{DFT}_2(x_1, x_2) = (x_1 + x_2, x_1 − x_2)), which gives x_r = 3r2^r − 2^r, so that A_N = 3N\log_2 N − N. Our FFT algorithm thus requires slightly more additions than multiplications. FFT algorithms are often characterized by their operation count, i.e. the total number of real additions and real multiplications, i.e. R_N = M_N + A_N. We see that R_N = 5N\log_2 N − 5N. The order of the operation count of our algorithm can thus be written as O(5N\log_2 N), since \lim_{N\to\infty} \frac{5N\log_2 N - 5N}{5N\log_2 N} = 1.
In practice one can reduce the number of multiplications further, since e^{-2\pi in/N} takes the simple values 1, −1, −i, i for some n. One can also use that e^{-2\pi in/N} can take the simple values \pm 1/\sqrt{2} \pm 1/\sqrt{2}\,i = \frac{1}{\sqrt{2}}(\pm 1 \pm i), which also saves some floating point multiplication, since we can factor out 1/\sqrt{2}. These observations do not give big reductions in the arithmetic complexity, however, and one can show that the operation count is still O(5N\log_2 N) after using these observations.
It is straightforward to show that the IFFT implementation requires the
same operation count as the FFT algorithm.
In contrast, the direct implementation of the DFT requires N^2 complex multiplications and N(N−1) complex additions. This results in 4N^2 real multiplications and 2N^2 + 2N(N−1) = 4N^2 − 2N real additions. The total
operation count is thus 8N^2 − 2N. In other words, the FFT and IFFT significantly reduce the number of arithmetic operations. In Exercise 2.29 we present another algorithm, called the split-radix algorithm, which reduces the number of operations even further. We will see, however, that the reduction obtained with the split-radix algorithm is about 20%. Let us summarize our findings as follows.
Often we apply the DFT for real data, so we would like to have FFT-
algorithms tailored to this, with reduced complexity (since real data has half
the dimension of general complex data). By some it has been argued that one
can find improved FFT algorithms when one assumes that the data is real. In
Exercise 2.27 we address this issue, and conclude that there is little to gain from
assuming real input: The general algorithm for complex input can be tailored
for real input so that it uses half the number of operations, which harmonizes
with the fact that real data has half the dimension of complex data.
Another reason why the FFT is efficient is that, since the FFT splits the
calculation of the DFT into computing two DFT’s of half the size, the FFT
is well suited for parallel computing: the two smaller FFT’s can be performed
independently of one another, for instance in two different computing cores
on the same computer. Besides reducing the number of arithmetic operations,
FFT implementation can also apply several programming tricks to speed up
computation, see for instance http://cnx.org/content/m12021/latest/ for an
overview.
The previous vectors x(e) and x(o) can be seen as special cases of polyphase
components. Polyphase components will also be useful later. Using the polyphase
notation, we can write
(F_N x)_{N_2q+n} = \sum_{k=0}^{N-1} x_k e^{-2\pi i(N_2q+n)k/N} = \sum_{p=0}^{N_1-1}\sum_{k=0}^{N_2-1} e^{-2\pi i(N_2q+n)p/N}(x^{(p)})_k e^{-2\pi i(N_2q+n)k/N_2}
= \sum_{p=0}^{N_1-1} e^{-2\pi iqp/N_1} e^{-2\pi inp/N}\sum_{k=0}^{N_2-1}(x^{(p)})_k e^{-2\pi ink/N_2}.
This means that the sum can be written as component (n, q) in the matrix
Y FN1 . Clearly Y FN1 is the matrix where the DFT is applied to all rows of Y .
We have thus shown that component N2 q + n of FN x equals (Y FN1 )n,q . This
means that FN x can be obtained by stacking the columns of Y FN1 on top of
one another. We can thus summarize our procedure as follows, which gives a
recipe for splitting an FFT into smaller FFT’s when N is not a prime number.
Theorem 2.22. FFT algorithm when N is composite.
When N = N1 N2 , the FFT of a vector x can be computed as follows
From the algorithm one easily deduces how the IDFT can be computed also:
All steps are invertible, and can be performed by IFFT or multiplication. We
thus only need to perform the inverse steps in reverse order.
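As an illustration, the procedure can be sketched in code as follows (using numpy's FFT for the smaller transforms; the function name and variable names are our own):

import numpy as np

def fft_composite(x, N1, N2):
    # DFT of x (length N = N1*N2) via the splitting described above:
    # FFTs of the N1 polyphase components of length N2, twiddle factors,
    # FFTs of length N1 along the rows, and stacking of the columns.
    N = N1*N2
    Z = np.array([np.fft.fft(x[p::N1]) for p in range(N1)]).T  # N2 x N1 matrix
    n = np.arange(N2).reshape(-1, 1)
    p = np.arange(N1).reshape(1, -1)
    Y = np.exp(-2j*np.pi*n*p/N)*Z          # twiddle factors e^{-2 pi i np/N}
    return np.ravel(np.fft.fft(Y, axis=1), order='F')

x = np.random.random(12)
print(np.allclose(fft_composite(x, 3, 4), np.fft.fft(x)))  # should print True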
We see here that, if we already have computed DFTN/2 x(e) and DFTN/2 x(o) ,
we need one additional complex multiplication for each yn with 0 ≤ n < N/4
(since e−2πin/N and (DFTN/2 x(o) )n are complex). No further multiplications
are needed in order to compute yN/2−n , since we simply conjugate terms before
adding them. Again yN/2 must be handled explicitly with this approach. For
this we can use the formula
\begin{pmatrix} I & D_{N/2^k} \\ I & -D_{N/2^k} \end{pmatrix}.
It may be a good idea to start by implementing multiplication with such a simple
matrix first as these are the building blocks in the algorithm (also attempt to do
this so that everything is computed in-place). Also compare the execution times
with our original FFT algorithm, as we did in Exercise 2.24, and try to explain
what you see in this comparison.
\mathrm{DFT}_N x = \begin{pmatrix}
\mathrm{DFT}_{N/2} & D_{N/2}\begin{pmatrix} \mathrm{DFT}_{N/4} & D_{N/4}\mathrm{DFT}_{N/4} \\ \mathrm{DFT}_{N/4} & -D_{N/4}\mathrm{DFT}_{N/4} \end{pmatrix} \\
\mathrm{DFT}_{N/2} & -D_{N/2}\begin{pmatrix} \mathrm{DFT}_{N/4} & D_{N/4}\mathrm{DFT}_{N/4} \\ \mathrm{DFT}_{N/4} & -D_{N/4}\mathrm{DFT}_{N/4} \end{pmatrix}
\end{pmatrix}
\begin{pmatrix} x^{(e)} \\ x^{(oe)} \\ x^{(oo)} \end{pmatrix}. \qquad (2.20)
The term radix describes how an FFT is split into FFT’s of smaller sizes, i.e. how
the sum in an FFT is split into smaller sums. The FFT algorithm we started
this section with is called a radix 2 algorithm, since it splits an FFT of length
N into FFT’s of length N/2. If an algorithm instead splits into FFT’s of length
N/4, it is called a radix 4 FFT algorithm. The algorithm we go through here is
called the split radix algorithm, since it uses FFT’s of both length N/2 and N/4.
a) Let G_{N/4} be the (N/4)\times(N/4) diagonal matrix with e^{-2\pi in/N} on the diagonal. Show that D_{N/2} = \begin{pmatrix} G_{N/4} & 0 \\ 0 & -iG_{N/4} \end{pmatrix}.
b) Let H_{N/4} be the (N/4)\times(N/4) diagonal matrix G_{N/4}D_{N/4}. Verify the following rewriting of Equation (2.20):
\mathrm{DFT}_N x = \begin{pmatrix}
\mathrm{DFT}_{N/2} & \begin{pmatrix} G_{N/4}\mathrm{DFT}_{N/4} & H_{N/4}\mathrm{DFT}_{N/4} \\ -iG_{N/4}\mathrm{DFT}_{N/4} & iH_{N/4}\mathrm{DFT}_{N/4} \end{pmatrix} \\
\mathrm{DFT}_{N/2} & \begin{pmatrix} -G_{N/4}\mathrm{DFT}_{N/4} & -H_{N/4}\mathrm{DFT}_{N/4} \\ iG_{N/4}\mathrm{DFT}_{N/4} & -iH_{N/4}\mathrm{DFT}_{N/4} \end{pmatrix}
\end{pmatrix}
\begin{pmatrix} x^{(e)} \\ x^{(oe)} \\ x^{(oo)} \end{pmatrix}

= \begin{pmatrix}
I & 0 & G_{N/4} & H_{N/4} \\
0 & I & -iG_{N/4} & iH_{N/4} \\
I & 0 & -G_{N/4} & -H_{N/4} \\
0 & I & iG_{N/4} & -iH_{N/4}
\end{pmatrix}
\begin{pmatrix} \mathrm{DFT}_{N/2} & 0 & 0 \\ 0 & \mathrm{DFT}_{N/4} & 0 \\ 0 & 0 & \mathrm{DFT}_{N/4} \end{pmatrix}
\begin{pmatrix} x^{(e)} \\ x^{(oe)} \\ x^{(oo)} \end{pmatrix}

= \begin{pmatrix}
I & \begin{pmatrix} G_{N/4} & H_{N/4} \\ -iG_{N/4} & iH_{N/4} \end{pmatrix} \\
I & -\begin{pmatrix} G_{N/4} & H_{N/4} \\ -iG_{N/4} & iH_{N/4} \end{pmatrix}
\end{pmatrix}
\begin{pmatrix} \mathrm{DFT}_{N/2}\,x^{(e)} \\ \mathrm{DFT}_{N/4}\,x^{(oe)} \\ \mathrm{DFT}_{N/4}\,x^{(oo)} \end{pmatrix}

= \begin{pmatrix}
\mathrm{DFT}_{N/2}\,x^{(e)} + \begin{pmatrix} G_{N/4}\mathrm{DFT}_{N/4}\,x^{(oe)} + H_{N/4}\mathrm{DFT}_{N/4}\,x^{(oo)} \\ -i\left(G_{N/4}\mathrm{DFT}_{N/4}\,x^{(oe)} - H_{N/4}\mathrm{DFT}_{N/4}\,x^{(oo)}\right) \end{pmatrix} \\
\mathrm{DFT}_{N/2}\,x^{(e)} - \begin{pmatrix} G_{N/4}\mathrm{DFT}_{N/4}\,x^{(oe)} + H_{N/4}\mathrm{DFT}_{N/4}\,x^{(oo)} \\ -i\left(G_{N/4}\mathrm{DFT}_{N/4}\,x^{(oe)} - H_{N/4}\mathrm{DFT}_{N/4}\,x^{(oo)}\right) \end{pmatrix}
\end{pmatrix}
c) Explain from the above expression why, once the three FFT’s above have
been computed, the rest can be computed with N/2 complex multiplications,
and 2 × N/4 + N = 3N/2 complex additions. This is equivalent to 2N real
multiplications and N + 3N = 4N real additions.
Hint. It is important that GN/4 DFTN/4 x(oe) and HN/4 DFTN/4 x(oo) are com-
puted first, and the sum and difference of these two afterwards.
d) Due to what we just showed, our new algorithm leads to real multiplication
and addition counts which satisfy
j = 0
for i in range(N-1):
    print(j)
    m = N//2
    while (m >= 1 and j >= m):
        j -= m
        m //= 2
    j += m
Explain that the code prints all numbers in [0, N − 1] in bit-reversed order (i.e. j).
Verify this by running the program, and writing down the bits for all numbers
for, say N = 16. In particular explain the decrements and increments made to
the variable j. The code above thus produces pairs of numbers (i, j), where j is
the bit-reverse of i. As can be seen, bitreverse applies similar code, and then
swaps the values xi and xj in x, as it should.
Since bit-reverse is its own inverse (i.e. P 2 = I), it can be performed by
swapping elements i and j. One way to secure that bit-reverse is done only once,
is to perform it only when j > i. You see that bitreverse includes this check.
b) Explain that N − j − 1 is the bit-reverse of N − i − 1. Due to this, when i, j < N/2, we have that N − i − 1, N − j − 1 ≥ N/2, and that bitreversal can swap them. Moreover, all swaps where i, j ≥ N/2 can be performed immediately when pairs where i, j < N/2 are encountered. Explain also that j < N/2 if and only if i is even. In the code you can see that the swaps (i, j) and (N − i − 1, N − j − 1) are performed together when i is even, due to this.
c) Assume that i < N/2 is odd. Explain that j ≥ N/2, so that j > i. This says that when i < N/2 is odd, we can always swap i and j (this is the last swap performed in the code). All swaps where 0 ≤ i < N/2 and N/2 ≤ j < N can be performed in this way.
In bitreversal, you can see that the bit-reversal of 2r and 2r +1 are handled
together (i.e. i is increased with 2 in the for-loop). The effect of this is that the
number of if-tests can be reduced, due to the observations from b) and c).
2.4 Summary
We considered the analog of Fourier series for digital sound, which is called
the Discrete Fourier Transform, and looked at its properties and its relation to
Fourier series. We also saw that the sampling theorem guaranteed that there is
no loss in considering the samples of a function, as long as the sampling rate is
high enough compared to the highest frequency in the sound.
We obtained an implementation of the DFT, called the FFT, which is
more efficient in terms of the number of arithmetic operations than a direct
implementation of the DFT. The FFT has been cited as one of the ten most
important algorithms of the 20’th century [5]. The original paper [8] by Cooley
and Tukey dates back to 1965, and handles the case when N is composite. In the
literature, one has been interested in the FFT algorithms where the number of
(real) additions and multiplications (combined) is as low as possible. This number
is also called the flop count. The presentation in this book thus differs from
the literature in that we mostly count only the number of multiplications. The split-radix algorithm [51, 14], which we reviewed in Exercise 2.29, held the record for the lowest flop count until quite recently. In [22], Frigo and Johnson showed that the operation count can be reduced to O(34N\log_2(N)/9), which clearly is less than the O(4N\log_2 N) we obtained for the split-radix algorithm.
It may seem strange that the total number of additions and multiplications
are considered: Aren’t multiplications more time-consuming than additions?
When you consider how this is done mechanically, this is certainly the case:
In fact, floating point multiplication can be considered as a combination of
many floating point additions. Due to this, one can find many places in the
literature where expressions are rewritten so that the multiplication count is
reduced, at the cost of a higher addition count. Winograd’s algorithm [50] is
an example of this, where the number of additions is much higher than the
number of multiplications. However, most modern CPUs have more complex hardware dedicated to computing multiplications, with the result that one floating point multiplication can be performed in one cycle, just as one addition can. Another thing is that modern CPUs typically can perform many additions and multiplications in parallel, and the higher complexity in the multiplication hardware may mean that the CPU can run fewer multiplications in parallel, compared to additions. In other words, if we run a test program on a computer, it may be difficult to detect any differences in performance between addition and multiplication, even though complex big-scale computing should in theory show
some differences. There are also other important aspects of the FFT, besides
the flop count. Another is memory use. It is possible to implement the FFT so
that the output is computed into the same memory as the input, so that the
FFT algorithm does not require extra memory besides the input buffer. Clearly,
one should bit-reverse the input buffer in order to achieve this.
We have now defined two types of transforms to the frequency domain: Fourier
series for continuous, periodic functions, and the DFT, for periodic vectors. In
the literature there are two other transforms also: The Continuous time Fourier
transform (CTFT) we have already mentioned at the end of Chapter 1. We also
have the Discrete time Fourier transform (DTFT)) for vectors which are not
periodic [37]. In this book we will deliberately avoid the DTFT as well, since it
assumes that the signal to transform is of infinite duration, while we in practice
analyze signals with a limited time scope.
The sampling theorem is also one of the most important results of the last
century. It was discovered by Harry Nyquist and Claude Shannon [42], but also
by others independently. One can show that the sampling theorem holds also
for functions which are not periodic, as long as we have the same bound on the
highest frequency. This is more common in the literature. In fact, the proof seen
here where we restrict to periodic functions is not common. The advantage of
the proof seen here is that we remain in a finite dimensional setting, and that
we only need the DFT. More generally, proofs of the sampling theorem in the
literature use the DTFT and the CTFT.
z_{n+N} = \frac{1}{4}(x_{n+N-1} + 2x_{n+N} + x_{n+N+1}) = \frac{1}{4}(x_{n-1} + 2x_n + x_{n+1}) = z_n.
The filter is also clearly a linear transformation and may therefore be represented
by an N × N matrix S that maps the vector x = (x0 , x1 , . . . , xN −1 ) to the vector
z = (z0 , z1 , . . . , zN −1 ), i.e., we have z = Sx. To find S, for 1 ≤ n ≤ N − 2 it is
clear from Equation (3.1) that row n has the value 1/4 in column n − 1, the
value 1/2 in column n, and the value 1/4 in column n + 1. For row 0 we must be
a bit more careful, since the index −1 is outside the legal range of the indices.
This is where the periodicity helps us out so that
z_0 = \frac{1}{4}(x_{-1} + 2x_0 + x_1) = \frac{1}{4}(x_{N-1} + 2x_0 + x_1) = \frac{1}{4}(2x_0 + x_1 + x_{N-1}).
From this we see that row 0 has the value 1/4 in columns 1 and N − 1, and the
value 1/2 in column 0. In exactly the same way we can show that row N − 1
has the entry 1/4 in columns 0 and N − 2, and the entry 1/2 in column N − 1.
In summary, the matrix of the smoothing filter is given by
S = \frac{1}{4}\begin{pmatrix}
2 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 & 1 \\
1 & 2 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 2 & 1 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 2 & 1 \\
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 2
\end{pmatrix} \qquad (3.2)
A matrix on this form is called a Toeplitz matrix. The general definition is as
follows and may seem complicated, but is in fact quite straightforward:
Definition 3.1. Toeplitz matrices.
An N × N -matrix S is called a Toeplitz matrix if its elements are constant
along each diagonal. More formally, Sk,l = Sk+s,l+s for all nonnegative integers
k, l, and s such that both k + s and l + s lie in the interval [0, N − 1]. A Toeplitz
matrix is said to be circulant if in addition

S_{(k+s) \bmod N,\,(l+s) \bmod N} = S_{k,l}

for all integers k, l in the interval [0, N − 1], and all s (here mod denotes the remainder modulo N).
Toeplitz matrices are very popular in the literature and have many applica-
tions. A Toeplitz matrix is constant along each diagonal, while the additional
property of being circulant means that each row and column of the matrix
’wraps over’ at the edges. Clearly the matrix given by Equation (3.2) satisfies
Definition 3.1 and is a circulant Toeplitz matrix. A Toeplitz matrix is uniquely
identified by the values on its nonzero diagonals, and a circulant Toeplitz matrix
is uniquely identified by the values on the main diagonal, and on the diagonals
above (or under) it. Toeplitz matrices show up here in the context of filters, but
they will also show up later in the context of wavelets.
Equation (3.1) leads us to the more general expression
z_n = \sum_k t_k x_{n-k}. \qquad (3.3)
If t has infinitely many nonzero entries, the sum is an infinite one, and may
diverge. We will, however, mostly assume that t has a finite number of nonzero
entries. This general expression opens up for defining many types of operations.
The values tk will be called filter coefficients. The range of k is not specified,
but is typically an interval around 0, since zn usually is calculated by combining
xk ’s with indices close to n. Both positive and negative indices are allowed.
As an example, for formula (3.1) k ranges over −1, 0, and 1, and we have that
t_{-1} = t_1 = 1/4, and t_0 = 1/2. Since Equation (3.3) needs to be computed for each n, if only t_0, \ldots, t_{k_{max}} are nonzero, we need to go through the following for-loop to compute z_{k_{max}}, \ldots, z_{N-1}:
z = zeros_like(x)
for n in range(kmax, N):
    for k in range(kmax + 1):
        z[n] += t[k]*x[n - k]
It is clearly possible to vectorize the inner loop here, since it takes the form of a
dot product. Another possible way to vectorize is to first change the order of
summation, and then vectorize as follows
z = zeros_like(x)
for k in range(kmax + 1):
    z[kmax:N] += t[k]*x[(kmax-k):(N-k)]
In the following we will avoid such implementations, since for-loops can be very
slow. We will see that an efficient built-in function exists for computing this,
and use this instead.
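For example, numpy's convolve computes exactly the sums in Equation (3.3) for vectors with finitely many nonzero entries. A small check (a sketch with arbitrary example values) that it agrees with the loop above on the indices the loop computes:

import numpy as np

t = np.array([0.25, 0.5, 0.25])  # example filter coefficients t_0, t_1, t_2
x = np.random.random(20)
kmax, N = len(t) - 1, len(x)

z = np.zeros_like(x)
for k in range(kmax + 1):
    z[kmax:N] += t[k]*x[(kmax - k):(N - k)]

zfull = np.convolve(t, x)                     # full convolution, length M + N - 1
print(np.allclose(z[kmax:N], zfull[kmax:N]))  # should print True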
By following the same argument as above, the following is clear:
Proposition 3.2. Filters as matrices.
z_n = \sum_{k=k_{min}}^{k_{max}} t_k x_{n-k}. \qquad (3.5)
For all k different from 0, 1, and N − 1, we have that sk = 0. Clearly this gives
the matrix in Equation (3.2).
\begin{pmatrix} 2 & 1 & 0 & 3 \\ 3 & 2 & 1 & 0 \\ 0 & 3 & 2 & 1 \\ 1 & 0 & 3 & 2 \end{pmatrix}
This is a circulant Toeplitz matrix with N = 4, and we see that s0 = 2, s1 = 3,
s2 = 0, and s3 = 1. The first equation in (3.4) gives that t0 = s0 = 2, and
t1 = s1 = 3. The second equation in (3.4) gives that t−2 = s2 = 0, and
t−1 = s3 = 1. By including only the tk which are nonzero, the operation can be
written as
3.1.1 Convolution
Applying a filter to a vector x is also called taking the convolution of the two
vectors t and x. Convolution is usually defined without the assumption that the input vector is periodic, and without any assumption on the vector lengths (i.e. they may be sequences of infinite length). The case where both vectors t and x have a finite number of nonzero elements deserves extra attention. Assume that t_0, \ldots, t_{M-1} and x_0, \ldots, x_{N-1} are the only nonzero elements in t and x (i.e. we can view them as vectors in \mathbb{R}^M and \mathbb{R}^N, respectively). It is clear from the expression z_n = \sum_k t_k x_{n-k} that only z_0, \ldots, z_{M+N-2} can be nonzero. This motivates the following definition.
S = \{t_{-L}, \ldots, t_0, \ldots, t_L\}.
If x ∈ RN , then Sx can be computed as follows:
Figure 3.1: Matrix for the operation x → t ∗ x (left), as well as this matrix
with the first and last 2L rows dropped (right).
We need to show that Sx = S̃ x̃. We have that S̃ x̃ equals the matrix shown in
the left part of Figure 3.2 multiplied with (xN −L , . . . , xN −1 , x0 , . . . , xN −1 , x0 , . . . , xL−1 )
(we inserted extra vertical lines in the matrix where circulation occurs), which
equals the matrix shown in the right part of Figure 3.2 multiplied with (x0 , . . . , xN −1 ).
We see that this is Sx, and the proof is complete.
degrees of the individual polynomials. Of course we can make the same addition
of degrees when we multiply polynomials. Clearly the polynomial associated
with t is the frequency response, when we insert x = e−iω . Also, applying two
filters in succession is equivalent to applying the convolution of the filters, so
that two filtering operations can be combined to one.
Since the number of nonzero filter coefficients is typically much less than N (the period of the input vector), the matrix S has many entries which are zero. Multiplication with such matrices requires fewer additions and multiplications than for other matrices: If S has k nonzero filter coefficients, S has Nk nonzero entries, so that kN multiplications and (k−1)N additions are needed to compute Sx. This is much less than the N^2 multiplications and (N−1)N additions
needed in the general case. Perhaps more important is that we need not form
the entire matrix, we can perform the matrix multiplication directly in a loop.
For large N we risk running into out of memory situations if we had to form the
entire matrix.
Since the Fourier basis vectors are orthogonal vectors, S is clearly orthogonally
diagonalizable. Since also the Fourier basis vectors are the columns in (FN )H ,
we have that
contains as columns an orthonormal set of eigenvectors, and D is diagonal with the eigenvalues listed on the diagonal (see Section 7.1 in [25]).
Clearly also S1 + S2 is a filter when S1 and S2 are. The set of all filters is thus
a vector space, which also is closed under multiplication. Such a space is called
an algebra. Since all filters commute, this algebra is also called a commutative
algebra.
• S = (FN )H DFN for a diagonal matrix D, i.e. the Fourier basis is a basis
of eigenvectors for S.
• S is a circulant Toeplitz matrix.
Proof. If S is a filter, then SE_d = E_dS for all d since all filters commute, so that S is time-invariant. This proves 1. → 3.
Assume that S is time-invariant. Note that E_d e_0 = e_d, and since SE_d e_0 = E_d S e_0 we have that Se_d = E_d s, where s is the first column of S. This also says that column d of S can be obtained by delaying the first column of S with d elements. But then S is a circulant Toeplitz matrix. This proves 3. → 2.
Finally, any circulant Toeplitz matrix can be written on the form \sum_{d=0}^{N-1} s_d E_d (by splitting the matrix into a sum of its diagonals). Since all E_d are filters, it is clear that any circulant Toeplitz matrix is a filter. This proves 2. → 1.
Due to this result, filters are also called LTI filters, LTI standing for Linear,
Time-Invariant. Also, operations defined by (3.3) are digital filters, when re-
stricted to vectors with period N. The following result enables us to compute the eigenvalues/frequency response easily through the DFT, so that we do not need to form the characteristic polynomial and find its roots:
Theorem 3.11. Connection between frequency response and the matrix.
Any digital filter is uniquely characterized by the values in the first column
of its matrix. Moreover, if s is the first column in S, the frequency response of
S is given by
λS = DFTN s. (3.9)
Conversely, if we know the frequency response λS , the first column s of S is
given by
s = IDFTN λS . (3.10)
Proof. If we replace S by (F_N)^H D F_N we find that

\mathrm{DFT}_N s = \sqrt{N}\,F_N s = \sqrt{N}\,F_N S \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \sqrt{N}\,F_N (F_N)^H D F_N \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
= \sqrt{N}\,D F_N \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = D \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} = \lambda_S,

where we have used that the first column in F_N has all entries equal to 1/\sqrt{N}, and that the diagonal matrix D has all the eigenvalues of S on its diagonal, so that the last expression is the vector of eigenvalues \lambda_S. This proves (3.9). Equation (3.10) follows directly by applying the inverse DFT to (3.9).
The first column s, which thus characterizes the filter, is also called the
impulse response. This name stems from the fact that we can write s = Se0 ,
i.e. the vector s is the output (often called response) to the vector e0 (often
called an impulse). Equation (3.9) states that the frequency response can be
written as
\lambda_{S,n} = \sum_{k=0}^{N-1} s_k e^{-2\pi ink/N}, \quad\text{for } n = 0, 1, \ldots, N − 1, \qquad (3.11)
\sqrt{N}\,\phi_5 = \left(e^{2\pi i5\cdot 0/N}, e^{2\pi i5\cdot 1/N}, \ldots, e^{2\pi i5\cdot(N-1)/N}\right)
\sqrt{N}\,\phi_{N-5} = \left(e^{-2\pi i5\cdot 0/N}, e^{-2\pi i5\cdot 1/N}, \ldots, e^{-2\pi i5\cdot(N-1)/N}\right).

Since e^{2\pi i5k/N} + e^{-2\pi i5k/N} = 2\cos(2\pi 5k/N), we get by adding the two vectors that x = \frac{1}{2}\sqrt{N}(\phi_5 + \phi_{N-5}). Since the \phi_n are eigenvectors, we have expressed x
as a sum of eigenvectors. The corresponding eigenvalues are given by the vector
frequency response, so let us compute this. If N = 8, computing Sx means to
multiply with the 8 × 8 circulant Toeplitz matrix
\frac{1}{6}\begin{pmatrix}
6 & 4 & 1 & 0 & 0 & 0 & 1 & 4 \\
4 & 6 & 4 & 1 & 0 & 0 & 0 & 1 \\
1 & 4 & 6 & 4 & 1 & 0 & 0 & 0 \\
0 & 1 & 4 & 6 & 4 & 1 & 0 & 0 \\
0 & 0 & 1 & 4 & 6 & 4 & 1 & 0 \\
0 & 0 & 0 & 1 & 4 & 6 & 4 & 1 \\
1 & 0 & 0 & 0 & 1 & 4 & 6 & 4 \\
4 & 1 & 0 & 0 & 0 & 1 & 4 & 6
\end{pmatrix}
We now see that
\lambda_{S,n} = \frac{1}{6}\left(6 + 4e^{-2\pi in/N} + e^{-2\pi i2n/N} + e^{-2\pi i(N-2)n/N} + 4e^{-2\pi i(N-1)n/N}\right)
= \frac{1}{6}\left(6 + 4e^{2\pi in/N} + 4e^{-2\pi in/N} + e^{2\pi i2n/N} + e^{-2\pi i2n/N}\right)
= 1 + \frac{4}{3}\cos(2\pi n/N) + \frac{1}{3}\cos(4\pi n/N).
The two values of this we need are

\lambda_{S,5} = 1 + \frac{4}{3}\cos(2\pi 5/N) + \frac{1}{3}\cos(4\pi 5/N)
\lambda_{S,N-5} = 1 + \frac{4}{3}\cos(2\pi(N-5)/N) + \frac{1}{3}\cos(4\pi(N-5)/N) = 1 + \frac{4}{3}\cos(2\pi 5/N) + \frac{1}{3}\cos(4\pi 5/N).
Since these are equal, x is a sum of eigenvectors with equal eigenvalues. This means that x itself also is an eigenvector, with the same eigenvalue, so that

Sx = \left(1 + \frac{4}{3}\cos(2\pi 5/N) + \frac{1}{3}\cos(4\pi 5/N)\right)x.
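This conclusion is easily verified numerically (a sketch, with N = 8 as above):

import numpy as np

N = 8
s = np.array([6, 4, 1, 0, 0, 0, 1, 4])/6.0  # first column of S
S = np.array([[s[(k - l) % N] for l in range(N)] for k in range(N)])
x = np.cos(2*np.pi*5*np.arange(N)/N)
lam = 1 + 4/3*np.cos(2*np.pi*5/N) + 1/3*np.cos(4*np.pi*5/N)
print(np.allclose(S.dot(x), lam*x))  # should print True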
x = (f (0 · T /N ), f (1 · T /N ), . . . , f ((N − 1)T /N ))
z = (s(f )(0 · T /N ), s(f )(1 · T /N ), . . . , s(f )((N − 1)T /N ))
Figure 3.3: The connections between filters and digital filters, sampling and
interpolation, provided by Theorem 3.12. The left vertical arrow represents
sampling, the right vertical arrow represents interpolation.
are determined from their samples then. If there is a bound on the highest
frequency in f , then f lies in VM,T for large enough M , so that we recover s(f )
as our approximation using N = 2M + 1. Finally, what happens when there is
no bound on the highest frequency? We know that s(fN ) = (s(f ))N . Since fN
is a good approximation to f , the samples x of f are close to the samples of fN .
By continuity of the digital filter, z = Sx will also be close to the samples of
(s(f ))N = s(fN ), so that (also by continuity) interpolating with z gives a good
approximation to (s(f))_N, which is again a good approximation to s(f). From this it follows that the digital filter is a better approximation when N is high.
\lambda_{S,n} = \sum_{k=0}^{N-1} s_k e^{-2\pi ink/N} = \sum_{0\le k<N/2} s_k e^{-2\pi ink/N} + \sum_{N/2\le k\le N-1} s_k e^{-2\pi ink/N}
= \sum_{0\le k<N/2} t_k e^{-2\pi ink/N} + \sum_{N/2\le k\le N-1} t_{k-N} e^{-2\pi ink/N}
= \sum_{0\le k<N/2} t_k e^{-2\pi ink/N} + \sum_{-N/2\le k\le -1} t_k e^{-2\pi in(k+N)/N}
= \sum_{0\le k<N/2} t_k e^{-2\pi ink/N} + \sum_{-N/2\le k\le -1} t_k e^{-2\pi ink/N}
= \sum_{-N/2\le k<N/2} t_k e^{-2\pi ink/N} = \lambda_S(2\pi n/N).
from numpy import pi, arange, concatenate, zeros, fft
from matplotlib.pyplot import plot

omega = 2*pi*arange(0, N)/float(N)
s = concatenate([t, zeros(N - len(t))])
plot(omega, abs(fft.fft(s)))
When plotting the frequency response on [0, 2π), angular frequencies near 0 and 2π correspond to low frequencies, and angular frequencies near π correspond to high frequencies.
λS may also be viewed as a function defined on the interval [−π, π). Plotting
on [−π, π] is often done in practice, since it makes clearer what corresponds to
lower frequencies, and what corresponds to higher frequencies:
Observation 3.16. Higher and lower frequencies.
When plotting the frequency response on [−π, π), angular frequencies near 0
correspond to low frequencies, angular frequencies near ±π correspond to high
frequencies.
To see this, note first that S has frequency response \lambda_{S,n} = \lambda_s(n/T) = \lambda_s(f), where f = n/T. We then rewrite \lambda_{S,n} = \lambda_S(2\pi n/N) = \lambda_S(2\pi fT/N) = \lambda_S(2\pi f/f_s).
Since the frequency response is essentially a DFT, it inherits several properties
from Theorem 2.7. We will mostly use the continuous frequency response to
express these properties.
Theorem 3.18. Properties of the frequency response.
We have that
D^{-1} is a diagonal matrix with the values 1/\lambda_{S,n} on the diagonal. Clearly then S^{-1} is also a digital filter, and its frequency response is \lambda_{S^{-1},n} = 1/\lambda_{S,n}, which proves 4. The last property follows in the same way as we showed that filters commute:
Figure 3.4: |λS (ω)| of the moving average filter of Formula (3.1), plotted over
[0, 2π] and [−π, π].
Both the continuous frequency response and the vector frequency response
for N = 51 are shown. The right part shows clearly how the high frequencies
are softened by the filter.
\lambda_{S_1}(\omega) = \frac{1}{2}(e^{2i\omega} + e^{-2i\omega}) = \frac{1}{2}e^{2i\omega} + \frac{1}{2}e^{-2i\omega}
\lambda_{S_2}(\omega) = 1 + \frac{3}{2}(e^{i\omega} + e^{-i\omega}) = \frac{3}{2}e^{i\omega} + 1 + \frac{3}{2}e^{-i\omega}.

We now get that

\lambda_{S_1S_2}(\omega) = \lambda_{S_1}(\omega)\lambda_{S_2}(\omega) = \left(\frac{1}{2}e^{2i\omega} + \frac{1}{2}e^{-2i\omega}\right)\left(\frac{3}{2}e^{i\omega} + 1 + \frac{3}{2}e^{-i\omega}\right)
= \frac{3}{4}e^{3i\omega} + \frac{1}{2}e^{2i\omega} + \frac{3}{4}e^{i\omega} + \frac{3}{4}e^{-i\omega} + \frac{1}{2}e^{-2i\omega} + \frac{3}{4}e^{-3i\omega}
From this expression we see that the filter coefficients of S are t±1 = 3/4,
t_{\pm 2} = 1/2, t_{\pm 3} = 3/4. All other filter coefficients are 0. Using Proposition 3.2, we get that s_1 = 3/4, s_2 = 1/2, and s_3 = 3/4, while s_{N-1} = 3/4, s_{N-2} = 1/2, and s_{N-3} = 3/4 (all other s_k are 0). This gives us the matrix representation of S.
y_n = \frac{1}{N}\sum_{k=0}^{N-1} e^{ik\omega_0} e^{-ik\omega} = \frac{1}{N}\sum_{k=0}^{N-1} e^{-ik(\omega-\omega_0)}
(here yn were the DFT components of the sound after we had restricted to a
block). This expression states that, when we restrict to a block of length N in
the signal by discarding the other samples, a pure tone of angular frequency
ω0 suddenly gets a frequency contribution at angular frequency ω also, and the
contribution is given by this formula. The expression is seen to be the same as
the frequency response of the filter \frac{1}{N}\{1, 1, \ldots, 1\} (where 1 is repeated N times),
evaluated at ω − ω0 . This filter is nothing but a (delayed) moving average filter.
The frequency response of a moving average filter thus governs how the different
frequencies pollute when we limit ourselves to a block of the signal. Since this
y_n = \frac{1}{N}\sum_{k=0}^{N-1} w_k e^{ik\omega_0} e^{-ik\omega} = \frac{1}{N}\sum_{k=0}^{N-1} w_k e^{-ik(\omega-\omega_0)}.
This is the frequency response of \frac{1}{N}w. In order to limit the pollution from other frequencies, we thus need to construct a window with a frequency response with smaller values than that of the rectangular window away from 0. Let us summarize our findings as follows:
Observation 3.20. Constructing a window.
Assume that we would like to construct a window of length N . It is desirable
that the frequency response of the window has small values away from zero.
We will not go into techniques for how such frequency responses can be
constructed, but only consider one example different from the rectangular window.
We define the Hamming window by
Figure 3.5: The frequency responses of the rectangular and Hamming windows,
which we considered for restricting to a block of the signal.
Figure 3.6: The coefficients of the rectangular and Hamming windows, which
we considered for restricting to a block of the signal.
\lambda_{S_2}(\omega) = \sum_k (-1)^k t_k e^{-i\omega k} = \sum_k (e^{-i\pi})^k t_k e^{-i\omega k}
= \sum_k e^{-i\pi k} t_k e^{-i\omega k} = \sum_k t_k e^{-i(\omega+\pi)k} = \lambda_{S_1}(\omega + \pi),

where we have used that e^{-i\pi} = −1 (note that this is nothing but Property 4. in
Theorem 2.7, with d = N/2). Now, for a low-pass filter S1 , λS1 (ω) has large
values when ω is close to 0 (the low frequencies), and values near 0 when ω is
close to π (the high frequencies). For a high-pass filter S2 , λS2 (ω) has values
near 0 when ω is close to 0 (the low frequencies), and large values when ω is
close to π (the high frequencies). Therefore, the relation λS2 (ω) = λS1 (ω + π)
says that S1 is low-pass when S2 is high-pass, and vice versa.
from numpy import shape, zeros

N, nchannels = shape(x)
z = zeros((N, nchannels))
z[0:d] = x[0:d]  # No echo at the beginning of the signal
z[d:N] = x[d:N] + c*x[0:(N-d)]
z /= abs(z).max()
S = \{1, 0, \ldots, 0, c\},

where the damping factor c appears after the delay d. The frequency response of this is \lambda_S(\omega) = 1 + ce^{-id\omega}, which is not real, so that the filter is not symmetric.
In Figure 3.7 we have plotted the magnitude of this frequency response with
c = 0.1 and d = 10.
Figure 3.7: The frequency response of a filter which adds an echo with damping
factor c = 0.1 and delay d = 10.
We see that the response varies between 0.9 and 1.1. The deviation from 1 is
controlled by the damping factor c, and the oscillation is controlled by the delay
d.
y_n = \frac{\sin(\pi n(2L+1)/N)}{\sin(\pi n/N)}.

Since s = x/(2L+1) and \lambda_S = \mathrm{DFT}_N s, the frequency response of S is

\lambda_{S,n} = \frac{1}{2L+1}\,\frac{\sin(\pi n(2L+1)/N)}{\sin(\pi n/N)},

so that

\lambda_S(\omega) = \frac{1}{2L+1}\,\frac{\sin((2L+1)\omega/2)}{\sin(\omega/2)}.

We clearly have

\left|\frac{1}{2L+1}\,\frac{\sin((2L+1)\omega/2)}{\sin(\omega/2)}\right| \le 1,
and this frequency response approaches 1 as ω → 0. The frequency response
thus peaks at 0, and this peak gets narrower and narrower as L increases, i.e. as
we use more and more samples in the averaging process. This filter thus “keeps”
only the lowest frequencies. When it comes to the highest frequencies it is seen
that the frequency response is small for ω ≈ π. In fact it is straightforward
to see that |λS (π)| = 1/(2L + 1). In Figure 3.8 we have plotted the frequency
response for moving average filters with L = 1, L = 5, and L = 20.
Figure 3.8: The frequency response of moving average filters with L = 1, L = 5,
and L = 20.
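The plots in Figure 3.8 can be reproduced with a few lines (a sketch using matplotlib):

import numpy as np
import matplotlib.pyplot as plt

omega = np.linspace(0.01, 2*np.pi - 0.01, 1000)  # avoid the removable singularity at 0
for L in [1, 5, 20]:
    lam = np.sin((2*L + 1)*omega/2)/((2*L + 1)*np.sin(omega/2))
    plt.plot(omega, lam, label='L = %d' % L)
plt.legend()
plt.show()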
Unfortunately, the frequency response is far from a filter which keeps some
frequencies unaltered, while annihilating others: Although the filter distinguishes
between high and low frequencies, it slightly changes the small frequencies.
Moreover, the higher frequencies are not annihilated, even when we increase L
to high values.
we computed the DFT of this vector, and it followed from Theorem 2.7 that
the IDFT of this vector equals its DFT. This means that we can find the filter
coefficients by using Equation (3.10), i.e. we take an IDFT. We then get the
filter coefficients
\frac{1}{N}\,\frac{\sin(\pi k(2L+1)/N)}{\sin(\pi k/N)}.
This means that the filter coefficients lie as N points uniformly spaced on the curve \frac{1}{N}\frac{\sin(\pi t(2L+1)/2)}{\sin(\pi t/2)} between 0 and 1. This curve has been encountered many other places in these notes. The filter which keeps only the frequency \omega_c = 0 has all filter coefficients equal to \frac{1}{N} (set L = 0), and when we include all frequencies (set 2L + 1 = N) we get the filter where s_0 = 1 and all other filter coefficients are 0. When we are between these two cases, we get a filter where s_0 is the biggest coefficient, while the others decrease towards 0 along the curve we have computed. The bigger L and N are, the quicker they decrease to zero. All filter coefficients are usually nonzero for this filter, since this curve is zero only at certain points. This is unfortunate, since it means that the filter is time-consuming to compute.
The two previous examples show an important duality between vectors which are 1 on some elements and 0 on others (also called window vectors), and the vector \frac{1}{N}\frac{\sin(\pi k(2L+1)/N)}{\sin(\pi k/N)} (also called a sinc): filters of the one type correspond to frequency responses of the other type, and vice versa. The examples also show that, in some cases only the filter coefficients are known, while in other cases only the frequency response is known. In any case we can deduce the one from the other, and both cases are important.
Filters are much more efficient when there are few nonzero filter coefficients.
In this respect the second example displays a problem: in order to create filters
with particularly nice properties (such as being an ideal low-pass filter), one may
need to sacrifice computational complexity by increasing the number of nonzero
filter coefficients. The trade-off between computational complexity and desirable
filter properties is a very important issue in filter design theory.
Hopefully this gives us a filter where the frequency response is not that different
from the ideal low-pass filter. Let us set N = 128, L = 32, so that the filter
removes all frequencies ω > π/2. In Figure 3.9 we show the corresponding
frequency responses. N0 has been chosen so that the given percentage of all
coefficients are included.
This shows that we should be careful when we omit filter coefficients: if we
drop too many, the frequency response is far away from that of an ideal bandpass
Figure 3.9: The frequency response which results from including the first 1/32, the first 1/16, the first 1/4, and all of the filter coefficients for the ideal low-pass filter.
filter. In particular, we see that the new frequency response oscillates wildly
near the discontinuity of the ideal low-pass filter. Such oscillations are called
Gibbs oscillations.
Figure 3.10: Frequency responses of some filters used in the MP3 standard.
The prototype filter is shown left. The other frequency responses at right are
simply shifted copies of this.
of the undesirable effects from the previous example have been eliminated: the oscillations near the discontinuities are much smaller, and the values are lower
away from 0. Using Property 4 in Theorem 2.7, it is straightforward to construct
filters with similar frequency responses, but centered around different frequencies:
We simply need to multiply the filter coefficients with a complex exponential, in
order to obtain a filter where the frequency response has been shifted to the left
or right. In the MP3 standard, this observation is used to construct 32 filters,
each having a frequency response which is a shifted copy of that of the prototype
filter, so that all filters together cover the entire frequency range. 5 of these
frequency responses are shown in the right plot in Figure 3.10. To understand the
effects of the different filters, let us apply them to our sample sound. If you apply
all filters in the MP3 standard in successive order with the most low-pass filters
first, the result can be found in the file mp3bands.wav. You should interpret the
result as low frequencies first, followed by the high frequencies. π corresponds
to the frequency 22.05KHz (i.e. the highest representable frequency equals half
the sampling rate on 44.1KHz. The different filters are concentrated on 1/32 of
these frequencies each, so that the angular frequencies you here are [π/64, 3π/64],
[3π/64, 5π/64], [5π/64, 7π/64], and so on, in that order.
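A sketch of the shifting trick (with a simple moving average standing in for the actual MP3 prototype filter): multiplying the coefficients t_k by e^{ikω0} shifts the frequency response by ω0.

import numpy as np

t = np.ones(16)/16.0                            # a stand-in low-pass prototype (assumption)
k = np.arange(len(t))
response = lambda w, coeff: np.sum(coeff*np.exp(-1j*w*k))

omega0 = np.pi/2                                # the desired shift
t_shifted = t*np.exp(1j*omega0*k)               # coefficients multiplied by a complex exponential

w = 1.3                                         # any test frequency
print(np.allclose(response(w, t_shifted), response(w - omega0, t)))   # True: response is shifted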
In Section 3.3.1 we mentioned that the psychoacoustic model of the MP3 standard applied a window to the sound data, followed by an FFT of that data. This is actually performed in parallel on the same sound data. Applying
two different operations in parallel to the sound data may seem strange. In the
MP3 standard [20] (p. 109) this is explained by “the lack of spectral selectivity
obtained at low frequencies“ by the filters above. In other words, the FFT can
give more precise frequency information than the filters can. This more precise
information is then used to compute psychoacoustic information such as masking
thresholds, and this information is applied to the output of the filters.
Figure 3.11: The frequency response of filters corresponding to iterating the
moving average filter {1/2, 1/2} k = 5 and k = 30 times (i.e. using row k in
Pascal’s triangle).
filters than this also, and the frequency responses we plotted for the filters used
in the MP3 standard gives an indication to this.
Let us now see how to implement the filters S k . Since convolution corresponds
to multiplication of polynomials, we can obtain their filter coefficients with the
following code
t = [1.]
for kval in range(k):
    t = convolve(t, [1/2., 1/2.]) # repeated convolution with {1/2, 1/2}
Note that S^k has k + 1 filter coefficients, and that S^k corresponds to the filter coefficients of a symmetric filter when k is even. Having computed t, we can simply compute the convolution of the input x and t. In using convolve we disregard the circularity of S, and we introduce a time delay. These issues will, however, not be audible when we listen to the output. An example of the result of smoothing is shown in Figure 3.12.
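As a small, self-contained sketch of this (sample values as in the example, the helper code is ours), we can smooth the 440 Hz tone with the filter built from row 4 of Pascal's triangle:

import numpy as np

fs, f0 = 4400, 440                              # sampling rate and tone from the example
x = np.sin(2*np.pi*f0*np.arange(0, 0.01, 1.0/fs))

k = 4
t = np.array([1.0])
for _ in range(k):
    t = np.convolve(t, [0.5, 0.5])              # coefficients from row k of Pascal's triangle / 2^k

z = np.convolve(t, x)                           # the smoothed signal (with a small delay)
print(t)                                        # [1, 4, 6, 4, 1]/16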
Figure 3.12: Reducing the treble. The original sound signal is shown left, the
result after filtering using row 4 in Pascal’s triangle is shown right.
The left plot shows the samples of the pure sound with frequency 440Hz
(with sampling frequency fs = 4400Hz). The right plot shows the result of
Figure 3.13: The result of applying the bass-reducing filter deduced from row 4 in Pascal's triangle to the pure sound in the left plot of Figure 3.12.
We observe that the samples oscillate much more than the samples of the
original sound. In Exercise 3.39 you will be asked to implement reducing the
bass in our sample audio file. The new sound will be difficult to hear for large
k, and we will explain why later. For k = 1 the sound can be found in the file
castanetsbass1.wav, for k = 2 it can be found in the file castanetsbass2.wav.
Even if the sound is quite low, you can hear that more of the bass has disappeared
for k = 2.
Figure 3.14: The frequency response of the bass reducing filter, which corre-
sponds to row 5 of Pascal’s triangle.
• First a time delay filter with delay d1 = 2, due to internal transfer of data
in the system,
Hint. Use the expressions (E_{d1} x)_n = x_{n−d1}, (T x)_n = (1/4)x_{n+1} + (1/2)x_n + (1/4)x_{n−1}, (E_{d2} x)_n = x_{n−d2}, and compute first (E_{d1} x)_n, then (T E_{d1} x)_n, and finally (T2 x)_n = (E_{d2} T E_{d1} x)_n. From the last expression you should be able to read out the filter coefficients.
c) Assume that N = 8. Write down the 8 × 8-circulant Toeplitz matrix for the
filter T2 .
x, fs = audioread('sounds/castanets.wav')
N, nchannels = shape(x)
z = zeros((N, nchannels))
for n in range(1,N-1):
    z[n] = 2*x[n+1] + 4*x[n] + 2*x[n-1]
z[0] = 2*x[1] + 4*x[0] + 2*x[N-1]
z[N-1] = 2*x[0] + 4*x[N-1] + 2*x[N-2]
z = z/abs(z).max()
play(z, fs)
Comment in particular on what happens in the three lines directly after the
for-loop, and why we do this. What kind of changes in the sound do you expect
to hear?
b) Write down the compact filter notation for the filter which is used in the
code, and write down a 5 × 5 circulant Toeplitz matrix which corresponds to
this filter. Plot the (continuous) frequency response. Is the filter a low-pass- or
a high-pass filter?
c) Another filter is given by the circulant Toeplitz matrix
     4 −2  0  0 −2
    −2  4 −2  0  0
     0 −2  4 −2  0
     0  0 −2  4 −2
    −2  0  0 −2  4
Express a connection between the frequency responses of this filter and the filter
from b). Is the new filter a low-pass- or a high-pass filter?
For most filters we have looked at, we had a limited number of nonzero tk , and this
enabled us to compute them on a computer using a finite number of additions and
multiplications. Filters which have a finite number of nonzero filter coefficients
are also called FIR-filters (FIR is short for Finite Impulse Response. Recall
that the impulse response of a filter can be found from the filter coefficients).
However, there exist many useful filters which are not FIR filters, i.e. where
the sum above is infinite. The ideal lowpass filter from Example 3.32 was one
example. It turns out that many such cases can be made computable if we
change our procedure slightly. The old procedure for computing a filter is to
compute z = Sx. Consider the following alternative:
It turns out that there also are highly computable filters where neither the
filter nor its inverse have a finite number of filter coefficients. Consider the
following idea:
Idea 3.24. More general filters (2).
Let x be the input to a filter, and let U and V be filters. By solving the
system U z = V x for z we get another filter, which we denote by S. The filter S
can be implemented in two steps: first we compute the right hand side y = V x,
and then we solve the equation U z = y.
If both U and V are invertible we have that the filter is S = U −1 V , and this
is invertible with inverse S −1 = V −1 U . The point is that, when U and V have
a finite number of filter coefficients, both S and its inverse will typically have
an infinite number of filter coefficients. The filters from this idea are thus more
general than the ones from the previous idea, and the new idea makes a wider
class of filters implementable using row reduction of sparse matrices. Computing
a filter by solving U z = V x may also give meaning when the matrices U and
V are singular: The matrix system can have a solution even if U is singular.
Therefore we should be careful in using the form S = U^{−1}V.
We have the following result concerning the frequency responses:
Theorem 3.25. Frequency response of IIR filters.
Assume that S is the filter defined from the equation U z = V x. Then we have that λ_S(ω) = λ_V(ω)/λ_U(ω) whenever λ_U(ω) ≠ 0.
The following example clarifies the points made above, and how one may
construct U and V from S. The example also shows that, in addition to making
some filters with infinitely many filter coefficients computable, the procedure
U z = V x for computing a filter can also reduce the complexity in some filters
where we already have a finite number of filter coefficients.
z_{n+1} = (1/(2L + 1)) (x_{n+1+L} + · · · + x_{n+1} + · · · + x_{n+1−L})
        = (1/(2L + 1)) (x_{n+L} + · · · + x_n + · · · + x_{n−L}) + (1/(2L + 1)) (x_{n+1+L} − x_{n−L})
        = z_n + (1/(2L + 1)) (x_{n+1+L} − x_{n−L}).
This means that we can also compute the output from the formula

z_{n+1} − z_n = (1/(2L + 1)) (x_{n+1+L} − x_{n−L}),

which can be written on the form U z = V x with U = {1, −1} and V = (1/(2L + 1)) {1, 0, . . . , 0, −1}, where the 1 is placed at index −L − 1 and the −1 is placed at index L. We now perform only 2N additions in computing the right hand
side, and solving the equation system requires only 2(N − 1) additions. The
total number of additions is thus 2N + 2(N − 1) = 4N − 2, which is much less
than the previous 2LN when L is large.
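A sketch of this idea in code (the helper below is ours, with circular boundary handling so that it matches the circular filter exactly):

import numpy as np

def moving_average_iir(x, L):
    # Solve U z = V x via the update z_{n+1} = z_n + (x_{n+1+L} - x_{n-L})/(2L+1)
    N = len(x)
    z = np.zeros(N)
    z[0] = x[np.arange(-L, L + 1) % N].sum()/(2*L + 1)    # z_0 computed directly
    for n in range(N - 1):
        z[n + 1] = z[n] + (x[(n + 1 + L) % N] - x[(n - L) % N])/(2*L + 1)
    return z

x = np.random.rand(64)
L = 5
z_direct = np.array([x[np.arange(n - L, n + L + 1) % 64].sum()/(2*L + 1) for n in range(64)])
print(np.allclose(moving_average_iir(x, L), z_direct))    # should print True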
A perhaps easier way to find U and V is to consider the frequency response of the moving average filter, which is

(1/(2L + 1)) (e^{−Liω} + . . . + e^{Liω}) = (1/(2L + 1)) e^{−Liω} (1 − e^{(2L+1)iω}) / (1 − e^{iω})
                                         = (1/(2L + 1)) (e^{−Liω} − e^{(L+1)iω}) / (1 − e^{iω}),

where we have used the formula for the sum of a geometric series. From here we easily see the frequency responses of U and V from the numerator and the denominator.
Filters with an infinite number of filter coefficients are also called IIR filters
(IIR stands for Infinite Impulse Response). Thus, we have seen that some IIR
filters may still have efficient implementations.
From this it is clear that (Sx)_t only depends on x_{t−(k−k0−1)}, . . . , x_{t+k0}. This means that, if we restrict the computation of S to x_{t−(k−k0−1)}, . . . , x_{t+M−1+k0}, the outputs x_t, . . . , x_{t+M−1} will be the same as without this restriction. This means that we can compute the output M elements at a time, at each step multiplying with a circulant Toeplitz matrix of size (M + k − 1) × (M + k − 1). If we choose M so that M + k − 1 = 2^r, we can use the FFT and IFFT algorithms to compute S = F_N^H D F_N, and we require O(r 2^r) multiplications for every block of length M. The total number of multiplications is (N/M) · r 2^r = r 2^r N / (2^r − k + 1). If k = 128, you can check on your calculator that the smallest value is for r = 10, with value 11.4158 × N. Since the direct implementation gives kN multiplications, the new approach clearly gives a benefit: roughly a 90% decrease in the number of multiplications.
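The claim about r = 10 is easy to check with a few lines (a sketch, assuming the operation count r 2^r per block as above):

k = 128
for r in range(8, 14):
    M = 2**r - k + 1                 # block length chosen so that M + k - 1 = 2^r
    print(r, r*2**r/M)               # multiplications per input sample; smallest at r = 10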
filters, and then applying each filter in turn. Since the frequency response of
the product of filters equals the product of the frequency responses, we get the
following idea:
Idea 3.26. Factorizing a filter.
Let S be a filter with real coefficients. Assume that
3.7 Summary
We defined digital filters, which do the same job for digital sound as analog filters
do for (continuous) sound. Digital filters turned out to be linear transformations
diagonalized by the DFT. We proved several other equivalent characterizations
• Simple examples of filters, such as time delay filters and filters which add
echo.
• Low-pass and high-pass filters and their frequency responses, and their
interpretation as treble- and bass-reducing filters. Moving average filters,
and filters arising from rows in Pascal’s triangle, as examples of such filters.
The result from the first step lies in an N -dimensional subspace of all vectors in
R2N , which we will call the space of symmetric vectors. To account for the fact
that a periodic vector can have a different symmetry point than N − 1/2, let us
make the following general definition:
Definition 4.2. Symmetric vector.
We say that a periodic vector x is symmetric if there exists a number d so
that xd+k = xd−k for all k so that d + k and d − k are integers. d is called the
symmetry point of x.
Proof. Assume first that d = 0. It follows in this case from property 2a) of Theorem 2.7 that x̂ is a real vector. Combining this with property 1 of Theorem 2.7 we see that x̂, just as x, also must be a real vector symmetric about 0. Since the DFT is one-to-one, it follows that x is real and symmetric about 0 if and only if x̂ is. From property 3 of Theorem 2.7 it follows that, when d is an integer, x is real and symmetric about d if and only if (x̂)_n = z_n e^{−2πidn/N}, where z_n is real and symmetric about 0. This completes the proof.
Symmetric extensions were here defined by having the non-integer symmetry
point N − 1/2, however. For these we prove the following, which is slightly more
difficult.
Theorem 4.4. Symmetric vectors with non-integer symmetry points.
Let d be an odd multiple of 1/2. The following are equivalent
(x̂)_n = (1/√N) Σ_{k=0}^{N−1} x_k e^{−2πikn/N}
      = (1/√N) ( Σ_{s≥0} x_{d+s} e^{−2πi(d+s)n/N} + Σ_{s≥0} x_{d−s} e^{−2πi(d−s)n/N} )
      = (1/√N) Σ_{s≥0} x_{d+s} ( e^{−2πi(d+s)n/N} + e^{−2πi(d−s)n/N} )
      = (1/√N) e^{−2πidn/N} Σ_{s≥0} x_{d+s} ( e^{−2πisn/N} + e^{2πisn/N} )
      = (1/√N) e^{−2πidn/N} Σ_{s≥0} 2x_{d+s} cos(2πsn/N).
Here s runs through odd multiples of 1/2. Since z_n = (1/√N) Σ_{s≥0} 2x_{d+s} cos(2πsn/N) is a real number, we can write the result as z_n e^{−2πidn/N}. Substituting N − n for n, we get
for n, we get
(x̂)_{N−n} = (1/√N) e^{−2πid(N−n)/N} Σ_{s≥0} 2x_{d+s} cos(2πs(N − n)/N)
          = (1/√N) e^{−2πid(N−n)/N} Σ_{s≥0} 2x_{d+s} cos(−2πsn/N + 2πs)
          = −(1/√N) e^{−2πid(N−n)/N} Σ_{s≥0} 2x_{d+s} cos(2πsn/N) = −z_n e^{−2πid(N−n)/N}.
This shows that z_{N−n} = −z_n, and this completes one way of the proof. The other way, we can write

x_k = (1/√N) Σ_{n=0}^{N−1} (x̂)_n e^{2πikn/N}
{ e_0 , (1/√2)( e^{−2πi(N−1/2)n/(2N)} e_n − e^{−2πi(N−1/2)(2N−n)/(2N)} e_{2N−n} ) }_{n=1}^{N−1} .

We compute

(1/√2)( e^{−2πi(N−1/2)n/(2N)} e_n − e^{−2πi(N−1/2)(2N−n)/(2N)} e_{2N−n} )
   = (1/√2)( e^{−πin} e^{πin/(2N)} e_n + e^{πin} e^{−πin/(2N)} e_{2N−n} )
   = (1/√2) e^{πin} ( e^{πin/(2N)} e_n + e^{−πin/(2N)} e_{2N−n} ).
This also means that
{ e_0 , (1/√2)( e^{πin/(2N)} e_n + e^{−πin/(2N)} e_{2N−n} ) }_{n=1}^{N−1}
is an orthonormal basis.
{ (1/√(2N)) cos(2π(0/(2N))(k + 1/2)) , (1/√N) cos(2π(n/(2N))(k + 1/2)) }_{n=1}^{N−1}     (4.2)
is an orthonormal basis for the set of vectors symmetric around N − 1/2 in R2N .
Moreover, the n’th vector in this basis has frequency contribution only from the
indices n and 2N − n.
Proof. Since the IDFT is unitary, the IDFT applied to the vectors above gives
an orthonormal basis for the set of symmetric extensions. We get that
(F_{2N})^H (e_0) = ( 1/√(2N), 1/√(2N), . . . , 1/√(2N) ) = (1/√(2N)) cos(2π(0/(2N))(k + 1/2)).

Further,

(F_{2N})^H ( (1/√2)( e^{πin/(2N)} e_n + e^{−πin/(2N)} e_{2N−n} ) )
   = (1/√2)( e^{πin/(2N)} (1/√(2N)) e^{2πink/(2N)} + e^{−πin/(2N)} (1/√(2N)) e^{2πi(2N−n)k/(2N)} )
   = (1/√2)( e^{πin/(2N)} (1/√(2N)) e^{2πink/(2N)} + e^{−πin/(2N)} (1/√(2N)) e^{−2πink/(2N)} )
   = (1/(2√N))( e^{2πi(n/(2N))(k+1/2)} + e^{−2πi(n/(2N))(k+1/2)} ) = (1/√N) cos(2π(n/(2N))(k + 1/2)).
Since F2N is unitary, and thus preserves the scalar product, the given vectors
are orthonormal.
We need to address one final thing before we can define the DCT: The vector
x we start with is in RN , but the vectors above are in R2N . We would like
to have orthonormal vectors in RN , so that we can use them to decompose
x. It is possible to show with a direct argument that, when we restrict the
vectors above to the first N elements, they are still orthogonal. We will, however,
apply a more instructive argument to show this, which gives us some intuition
into the connection with symmetric filters. We start with the following result,
which shows that a filter preserves symmetric vectors if and only if the filter is
symmetric.
Theorem 4.7. Criteria for preserving symmetric vectors.
Let S be a filter. The following are equivalent
• The vector of filter coefficients has an integer symmetry point if and only
if the input and output have the same type (integer or non-integer) of
symmetry point.
• The input and output have the same symmetry point if and only if the
filter is symmetric.
Proof. Assume that the filter S maps a symmetric vector with symmetry at d1
to another symmetric vector. Let x be the symmetric vector so that (x̂)_n = e^{−2πid1 n/N} for n < N/2. Since the output is a symmetric vector, we must have
that
S_r x = ( S_1  S_2 ) ( x_0, . . . , x_{N−1}, x_{N−1}, . . . , x_0 )^T
      = S_1 ( x_0, . . . , x_{N−1} )^T + S_2 ( x_{N−1}, . . . , x_0 )^T
      = S_1 x + (S_2)^f x = (S_1 + (S_2)^f) x,
S( cos(2π(n/(2N))(k + 1/2)) )
  = S( (1/2)( e^{2πi(n/(2N))(k+1/2)} + e^{−2πi(n/(2N))(k+1/2)} ) )
  = (1/2)( e^{πin/(2N)} S(e^{2πink/(2N)}) + e^{−πin/(2N)} S(e^{−2πink/(2N)}) )
  = (1/2)( e^{πin/(2N)} λ_{S,n} e^{2πink/(2N)} + e^{−πin/(2N)} λ_{S,2N−n} e^{−2πink/(2N)} )
  = (1/2)( λ_{S,n} e^{2πi(n/(2N))(k+1/2)} + λ_{S,2N−n} e^{−2πi(n/(2N))(k+1/2)} )
  = λ_{S,n} (1/2)( e^{2πi(n/(2N))(k+1/2)} + e^{−2πi(n/(2N))(k+1/2)} )
  = λ_{S,n} cos(2π(n/(2N))(k + 1/2)),
To see why these vectors are orthogonal, choose at the outset a symmetric filter where {λ_{S,n}}_{n=0}^{N−1} are distinct. Then the cosine-vectors of length N are also eigenvectors with distinct eigenvalues, and they must be orthogonal since S_r is symmetric. Moreover, since
Σ_{k=0}^{2N−1} cos²(2π(n/(2N))(k + 1/2))
  = Σ_{k=0}^{N−1} cos²(2π(n/(2N))(k + 1/2)) + Σ_{k=N}^{2N−1} cos²(2π(n/(2N))(k + 1/2))
  = Σ_{k=0}^{N−1} cos²(2π(n/(2N))(k + 1/2)) + Σ_{k=0}^{N−1} cos²(2π(n/(2N))(k + N + 1/2))
  = Σ_{k=0}^{N−1} cos²(2π(n/(2N))(k + 1/2)) + (−1)^{2n} Σ_{k=0}^{N−1} cos²(2π(n/(2N))(k + 1/2))
  = 2 Σ_{k=0}^{N−1} cos²(2π(n/(2N))(k + 1/2)),
where we used that cos(x + nπ) = (−1)^n cos x. This means that the norm of the vector (cos(2π(n/(2N))(k + 1/2)))_{k=0}^{2N−1} equals √2 times the norm of (cos(2π(n/(2N))(k + 1/2)))_{k=0}^{N−1}. Thus, in order to make the vectors orthonormal when we consider the first N elements instead of all 2N elements, we need to multiply with √2. This gives
us the vectors dn as defined in the text of the theorem. This completes the
proof.
We now clearly see the analogy between symmetric functions and vectors:
while the first can be written as a sum of cosine-functions, the second can be
written as a sum of cosine-vectors. The orthogonal basis we have found is given
its own name:
DCT_N = √(2/N) · diag(1/√2, 1, . . . , 1) · ( cos(2π(n/(2N))(k + 1/2)) ).     (4.3)
Since this matrix is orthogonal, it is immediate that
( cos(2π(n/(2N))(k + 1/2)) )^{−1} = (2/N) · ( cos(2π((n + 1/2)/(2N))k) ) · diag(1/2, 1, . . . , 1)     (4.4)

( cos(2π((n + 1/2)/(2N))k) )^{−1} = (2/N) · diag(1/2, 1, . . . , 1) · ( cos(2π(n/(2N))(k + 1/2)) ).     (4.5)
In other words, not only can DCTN be directly expressed in terms of a cosine-
matrix, but our developments helped us to express the inverse of a cosine
matrix in terms of other cosine-matrices. In the literature different types of
cosine-matrices have been useful:
I Cosine-matrices with entries cos(2πnk/(2(N − 1))).
II Cosine-matrices with entries cos(2πn(k + 1/2)/(2N )).
III Cosine-matrices with entries cos(2π(n + 1/2)k/(2N )).
IV Cosine-matrices with entries cos(2π(n + 1/2)(k + 1/2)/(2N )).
We will call these type-I, type-II, type-III, and type-IV cosine-matrices, respec-
tively. What we did above handles the case of type-II cosine-matrices. It will
turn out that not all of these cosine-matrices are orthogonal, but that we in all
cases, as we did above for type-II cosine matrices, can express the inverse of a
cosine-matrix of one type in terms of a cosine-matrix of another type, and that
any cosine-matrix is easily expressed in terms of an orthogonal matrix. These orthogonal matrices will be called DCT_N^{(I)}, DCT_N^{(II)}, DCT_N^{(III)}, and DCT_N^{(IV)}, respectively, and they are all called DCT-matrices. The DCT_N we constructed above is thus DCT_N^{(II)}. The type-II DCT matrix is the most commonly used, and the type is therefore often dropped when referring to these. We will consider
the other cases of cosine-matrices at different places in this book: In the next
chapter we will run into type-I cosine matrices, in connection with a different ex-
tension strategy used for wavelets. Type-IV cosine-matrices will be encountered
in exercises 4.5 and 4.6 at the end of this section.
As with the Fourier basis vectors, the DCT basis vectors are called synthesis
vectors, since we can write
x = y0 d0 + y1 d1 + · · · + yN −1 dN −1 (4.6)
in the same way as for the DFT. Following the same reasoning as for the DFT, DCT_N^{−1} is the matrix where the d_n are the columns. But since these vectors are real and orthonormal, DCT_N must be the matrix where the d_n are the rows. Moreover, since Theorem 4.9 also states that the same vectors are eigenvectors for filters which preserve symmetric extensions, we can state the following:
Theorem 4.13. The DCT is orthogonal.
DCT_N is the orthogonal matrix where the rows are the d_n. Moreover, for any digital filter S which preserves symmetric extensions, (DCT_N)^T diagonalizes S_r, i.e. S_r = DCT_N^T D DCT_N where D is a diagonal matrix.
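A short numerical check of Theorem 4.13 (a sketch, building DCT_N directly from Equation (4.3)):

import numpy as np

N = 8
n, k = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
DCT = np.sqrt(2.0/N)*np.cos(2*np.pi*n*(k + 0.5)/(2*N))
DCT[0, :] /= np.sqrt(2)                        # the scaling factor of the first row
print(np.allclose(DCT @ DCT.T, np.eye(N)))     # True: DCT_N is orthogonal
print(np.allclose(np.linalg.inv(DCT), DCT.T))  # True: the inverse is the transpose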
f(kT/N) = Σ_{n=0}^{N−1} y_n d_{n,N} cos(2π(n/(2N))(k + 1/2)),   0 ≤ k ≤ N − 1.
This gives us an equation system for finding the yn with the invertible DCT
matrix as coefficient matrix, and the result follows.
This gives a slight difference from how we applied the DFT, due to the subtle change in the sample points, from kT/N for the DFT to (2k + 1)T/(2N)
for the DCT. The sample points for the DCT are thus the midpoints on the
intervals in a uniform partition of [0, T ] into N intervals, while they for the DFT
are the start points on the intervals. Also, the frequencies are divided by 2. In
Figure 4.2 we have plotted the sinusoids of Theorem 4.15 for T = 1, as well as
the sample points used in that theorem.
The sample points in the upper left plot correspond to the first column in the
DCT matrix, the sample points in the upper right plot to the second column of
the DCT matrix, and so on (up to normalization with dn,N ). As n increases, the
functions oscillate more and more. As an example, y5 says how much content of
maximum oscillation there is. In other words, the DCT of an audio signal shows
the proportion of the different frequencies in the signal, and the two formulas
y = DCTN x and x = (DCTN )T y allow us to switch back and forth between
the time domain representation and the frequency domain representation of the
sound. In other words, once we have computed y = DCTN x, we can analyse
the frequency content of x. If we want to reduce the bass we can decrease the
y-values with small indices and if we want to increase the treble we can increase
the y-values with large indices.
DCT_2 = ( (1/√2) cos(0)          (1/√2) cos(0)
          cos((π/2)(0 + 1/2))    cos((π/2)(1 + 1/2)) )  =  ( 1/√2   1/√2
                                                             1/√2  −1/√2 )
The DCT of the same vector as in Example 2.3 can now be computed as:
Figure 4.2: The 6 different sinusoids used in the DCT for N = 6, i.e. cos(2π(n/2)t), 0 ≤ n < 6. The plots also show piecewise linear functions (in red) between the sample points (2k + 1)/(2N), 0 ≤ k < 6, since only the values at these points are used in Theorem 4.15.
DCT_2 ( 1 , 2 )^T = ( 3/√2 , −1/√2 )^T.
S = (1/3) ·
    ( 2 1 0 0 0 0
      1 1 1 0 0 0
      0 1 1 1 0 0
      0 0 1 1 1 0
      0 0 0 1 1 1
      0 0 0 0 1 2 )
a) Compute the eigenvalues and eigenvectors of S using the results of this
section. You should only need to perform one DFT or one DCT in order to
achieve this.
b) Use a computer to compute the eigenvectors and eigenvalues of S also. What
are the differences from what you found in a)?
c) Find a filter T so that S = Tr . What kind of filter is T ?
a) Show that

M = √(N/2) DCT_N^{(IV)} ( 0  A ; B  0 ),

where A and B are the (N/2) × N matrices

A = ( −(I_{N/2})^f   −I_{N/2} ),     B = ( I_{N/2}   −(I_{N/2})^f ),

i.e. each row of A contains two entries equal to −1, placed symmetrically about the middle of the row, while each row of B contains a 1 in the first half and a −1 placed symmetrically in the second half.

Due to this expression, any algorithm for the DCT-IV can be used to compute the MDCT.
b) The MDCT is not invertible, since it is not a square matrix. We will show here that it still can be used in connection with invertible transformations. We first define the IMDCT as the matrix M^T/N. Transposing the matrix expression we obtained in a) gives

(1/√(2N)) ( 0  B^T ; A^T  0 ) DCT_N^{(IV)}

for the IMDCT, which thus also has an efficient implementation. Show that if

y_{0,1} = M ( x_0 ; x_1 )   and   y_{1,2} = M ( x_1 ; x_2 )

(i.e. we compute two MDCT's where half of the data overlap), then

x_1 = {IMDCT(y_{0,1})}_{k=N}^{2N−1} + {IMDCT(y_{1,2})}_{k=0}^{N−1}.

Even though the MDCT itself is not invertible, the input can still be recovered from overlapping MDCT's.
Since s(f˘) agrees with s(f ) except near the boundaries, we can thus conclude
that s((f˘)N ) is a better approximation to s(f ) than what s(fN ) is.
We have seen that the restriction of s to VM,T is equivalent to an N × N
digital filter S, where N = 2M + 1. Let x be the samples of f , x̆ the samples of
f˘. Turning around the fact that (f˘)N is a better approximation to f˘, compared
to what fN is to f , the following is clear.
Observation 4.17. Using symmetric extensions for approximations.
The samples x̆ are a better approximation to the samples of (f˘)N , than the
samples x are to the samples of fN .
Now, let z = Sx, and z̆ = S x̆. The following is also clear from the preceding
observation, due to continuity of the digital filter S.
Observation 4.18. Using symmetric extensions for approximations.
z̆ is a better approximation to S(samples of (f˘)N ) = samples of s((f˘)N ),
than z is to S(samples of fN ) = samples of s(fN ).
Figure 4.3: The connections between the new mapping S_r, sampling, and interpolation. The right vertical arrow represents interpolation with the DCT, i.e. that we compute Σ_{n=0}^{N−1} y_n d_{n,N} cos(2π(n/2)t/T) for values of t.
in that theorem, i.e. the samples are the midpoints on all intervals. This new
sampling procedure is not indicated in Figure 4.3.
Figure 4.3 can be further simplified to that shown in Figure 4.4.
Figure 4.4: Simplification of Figure 4.3. The left vertical arrow represents sampling as dictated by the DCT.
Figure 4.5: How we can approximate a function from its samples with the DCT.
(Sx)_n = Σ_{k=0}^{N−1} s_k x_{(n−k) mod N}
       = s_0 x_n + Σ_{k=1}^{(N−1)/2} s_k x_{(n−k) mod N} + Σ_{k=(N+1)/2}^{N−1} s_k x_{(n−k) mod N}
       = s_0 x_n + Σ_{k=1}^{(N−1)/2} s_k x_{(n−k) mod N} + Σ_{k=1}^{(N−1)/2} s_k x_{(n−(N−k)) mod N}
       = s_0 x_n + Σ_{k=1}^{(N−1)/2} s_k ( x_{(n−k) mod N} + x_{(n+k) mod N} ).     (4.7)
If we compare the first and last expressions here, we need the same number of
summations, but the number of multiplications needed in the latter expression
has been halved.
Observation 4.20. Reducing arithmetic operations for symmetric filters.
Assume that a symmetric filter has 2s + 1 filter coefficients. The filter applied
to a vector of length N can then be implemented using (s + 1)N multiplications
and 2sN additions. This gives a reduced number of arithmetic operations when compared to a filter with the same number of coefficients which is not symmetric, where a direct implementation requires (2s + 1)N multiplications and 2sN additions.
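A sketch of Equation (4.7) in code (the function below is ours; the filter used in the test is the symmetric filter {1, 4, 6, 4, 1}/16):

import numpy as np

def symmetric_filter(x, t):
    # t = (t_0, t_1, ..., t_s): one half of a symmetric filter with 2s+1 coefficients.
    # Uses (s+1)*N multiplications, as in Observation 4.20. Circular boundary.
    z = t[0]*x.astype(float)
    for k in range(1, len(t)):
        z += t[k]*(np.roll(x, -k) + np.roll(x, k))   # x_{(n+k) mod N} + x_{(n-k) mod N}
    return z

N = 16
x = np.random.rand(N)
t = np.array([6.0, 4.0, 1.0])/16
full = np.array([1.0, 4.0, 6.0, 4.0, 1.0])/16        # s_k for k = -2, ..., 2
z_direct = np.array([sum(full[j]*x[(n - (j - 2)) % N] for j in range(5)) for n in range(N)])
print(np.allclose(symmetric_filter(x, t), z_direct)) # should print True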
Similarly to as in Section 3.6.2, a symmetric filter can be factored into a
product of symmetric filters. To see how, note first that a real polynomial is
symmetric if and only if 1/a is a root whenever a is. If we pair together the
factors for the roots a, 1/a when a is real we get a component in the frequency
response of degree 2. If we pair the factors for the roots a, 1/a, ā, 1/ā when a is complex, we get a component in the frequency response of degree 4. We thus
get the following idea:
Idea 4.21. Factorizing symmetric filters.
Let S be a symmetric filter with real coefficients. There exist constants K,
a1 , . . . , am , b1 , c1 , . . . , bn , cn so that
a) 0 ≤ n < L:

z_n = t_0 x_n + Σ_{k=1}^{n} t_k (x_{n+k} + x_{n−k}) + Σ_{k=n+1}^{L} t_k (x_{n+k} + x_{n−k+N}).     (4.8)

b) L ≤ n < N − L:

z_n = t_0 x_n + Σ_{k=1}^{L} t_k (x_{n+k} + x_{n−k}).     (4.9)

c) N − L ≤ n < N:

z_n = t_0 x_n + Σ_{k=1}^{N−1−n} t_k (x_{n+k} + x_{n−k}) + Σ_{k=N−n}^{L} t_k (x_{n+k−N} + x_{n−k}).     (4.10)
The convolve function may not pick up this reduction in the number of
multiplications, since it does not assume that the filter is symmetric. We will
still use the convolve function in implementations, however, due to its heavy
optimization.
result is much used in practical implementations of the DCT, and can also be used for practical implementation of the DFT, as we will see in Exercise 4.9. Note that the result, and the following results in this section, are stated in terms of the cosine matrix C_N (where the entries are (C_N)_{n,k} = cos(2π(n/(2N))(k + 1/2))), rather than the DCT_N matrix (which uses the additional scaling factor d_{n,N} for the rows). The reason is that C_N appears to be most practical for stating algorithms. When computing the DCT, we simply need to scale with the d_{n,N} at the end, after using the statements below.
Theorem 4.22. DCT algorithm.
Let y = CN x. Then we have that
y_n = cos(πn/(2N)) ℜ((DFT_N x^(1))_n) + sin(πn/(2N)) ℑ((DFT_N x^(1))_n),     (4.11)

where x^(1) ∈ R^N is defined by

y_n = Σ_{k=0}^{N−1} x_k cos(2π(n/(2N))(k + 1/2))
    = Σ_{k=0}^{N/2−1} x_{2k} cos(2π(n/(2N))(2k + 1/2)) + Σ_{k=0}^{N/2−1} x_{2k+1} cos(2π(n/(2N))(2k + 1 + 1/2)).
If we then also shift the indices with N/2 in this sum, we get
Σ_{k=N/2}^{N−1} x_{2N−2k−1} cos(2π(n/(2N))(2N − 2k − 1 + 1/2))
   = Σ_{k=N/2}^{N−1} x_{2N−2k−1} cos(2π(n/(2N))(2k + 1/2)),
where we used that cos is symmetric and periodic with period 2π. We see that
we now have the same cos-terms in the two sums. If we thus define the vector
x(1) as in the text of the theorem, we see that we can write
y_n = Σ_{k=0}^{N−1} (x^(1))_k cos(2π(n/(2N))(2k + 1/2))
    = ℜ( Σ_{k=0}^{N−1} (x^(1))_k e^{−2πin(2k+1/2)/(2N)} )
    = ℜ( e^{−πin/(2N)} Σ_{k=0}^{N−1} (x^(1))_k e^{−2πink/N} )
    = ℜ( e^{−πin/(2N)} (DFT_N x^(1))_n )
    = cos(πn/(2N)) ℜ((DFT_N x^(1))_n) + sin(πn/(2N)) ℑ((DFT_N x^(1))_n),
where we have recognized the N -point DFT. This completes the proof.
With the result above we have avoided computing a DFT of double size. If we in the proof above define the N × N diagonal matrix Q_N by Q_{n,n} = e^{−πin/(2N)}, the result can also be written on the more compact form

y = C_N x = ℜ( Q_N DFT_N x^(1) ).

We will, however, not use this form, since there is complex arithmetic involved, contrary to Equation (4.11). Code which uses Equation (4.11) to compute the DCT, using the function FFTImpl from Section 2.3, can look as follows:
def DCTImpl(x):
    """
    Compute the DCT of the vector x

    x: a vector
    """
    N = len(x)
    if N > 1:
        x1 = concatenate([x[0::2], x[-1:0:-2]]).astype(complex)
        FFTImpl(x1, FFTKernelStandard)
        cosvec = cos(pi*arange(float(N))/(2*N))
        sinvec = sin(pi*arange(float(N))/(2*N))
        if ndim(x) == 1:
            x[:] = cosvec*real(x1) + sinvec*imag(x1)
        else:
            for s2 in xrange(shape(x)[1]):
                x[:, s2] = cosvec*real(x1[:, s2]) \
                           + sinvec*imag(x1[:, s2])
        x[0] *= sqrt(1/float(N))
        x[1:] *= sqrt(2/float(N))
In the code, the vector x(1) is created first by rearranging the components, and
it is sent as input to FFTImpl. After this we take real parts and imaginary parts,
and multiply with the cos- and sin-terms in Equation (4.11).
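For a self-contained check of Equation (4.11) one can replace the book's FFTImpl with numpy's FFT; the sketch below (function name is ours, and N is assumed even) computes DCT_N x this way and compares with a direct matrix computation:

import numpy as np

def dct_via_fft(x):
    # Equation (4.11), with numpy.fft.fft standing in for FFTImpl
    N = len(x)
    x1 = np.concatenate([x[0::2], x[-1:0:-2]])              # the reordered vector x^(1)
    X1 = np.fft.fft(x1)
    n = np.arange(N)
    y = np.cos(np.pi*n/(2*N))*X1.real + np.sin(np.pi*n/(2*N))*X1.imag   # C_N x
    y[0] *= np.sqrt(1.0/N)                                   # scale with d_{0,N}
    y[1:] *= np.sqrt(2.0/N)                                  # scale with d_{n,N}, n >= 1
    return y

N = 8
x = np.random.rand(N)
n, k = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
D = np.sqrt(2.0/N)*np.cos(2*np.pi*n*(k + 0.5)/(2*N)); D[0, :] /= np.sqrt(2)
print(np.allclose(dct_via_fft(x), D @ x))                    # should print True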
y_n = cos(πn/(2N)) ℜ((DFT_N x^(1))_n) + sin(πn/(2N)) ℑ((DFT_N x^(1))_n)
y_{N−n} = cos(π(N − n)/(2N)) ℜ((DFT_N x^(1))_{N−n}) + sin(π(N − n)/(2N)) ℑ((DFT_N x^(1))_{N−n})
        = sin(πn/(2N)) ℜ((DFT_N x^(1))_n) − cos(πn/(2N)) ℑ((DFT_N x^(1))_n),     (4.12)
where we have used the symmetry of DFTN for real signals. These two equations
enable us to determine <((DFTN x(1) )n ) and =((DFTN x(1) )n ) from yn and
yN −n . We get
cos(πn/(2N)) y_n + sin(πn/(2N)) y_{N−n} = ℜ((DFT_N x^(1))_n)
sin(πn/(2N)) y_n − cos(πn/(2N)) y_{N−n} = ℑ((DFT_N x^(1))_n).
Adding we get

(DFT_N x^(1))_n = cos(πn/(2N)) y_n + sin(πn/(2N)) y_{N−n} + i( sin(πn/(2N)) y_n − cos(πn/(2N)) y_{N−n} )
               = ( cos(πn/(2N)) + i sin(πn/(2N)) )( y_n − i y_{N−n} ) = e^{πin/(2N)} ( y_n − i y_{N−n} ).

This means that (DFT_N x^(1))_n = e^{πin/(2N)}(y_n − i y_{N−n}) = (y_n − i y_{N−n})/Q_{n,n} for n ≥ 1. Since ℑ((DFT_N x^(1))_0) = 0 we have that (DFT_N x^(1))_0 = y_0 = y_0/Q_{0,0}. This means that x^(1) can be recovered by taking the IDFT of the vector z with component 0 being y_0/Q_{0,0}, and the remaining components being (y_n − i y_{N−n})/Q_{n,n}:

x^(1) = IDFT_N z,

where x^(1) is defined as in Theorem 4.22.
The implementation of IDCT can thus go as follows:
def IDCTImpl(y):
    """
    Compute the IDCT of the vector y

    y: a vector
    """
    N = len(y)
    if N > 1:
        y[0] /= sqrt(1/float(N))
        y[1:] /= sqrt(2/float(N))
        Q = exp(-pi*1j*arange(float(N))/(2*N))
        y1 = zeros_like(y).astype(complex)
        y1[0] = y[0]/Q[0]
        if ndim(y) == 1:
            y1[1:] = (y[1:] - 1j*y[-1:0:-1])/Q[1:]
        else:
            for s2 in xrange(shape(y)[1]):
                y1[1:, s2] = (y[1:, s2] - 1j*y[-1:0:-1, s2])/Q[1:]
        FFTImpl(y1, FFTKernelStandard, 0)
        y[0::2] = real(y1[0:(N/2)])
        y[1::2] = real(y1[-1:(N/2-1):-1])
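The same substitution works for the inverse; below is a sketch of the IDCT along the lines of the derivation above, again with numpy's FFT in place of FFTImpl (names are ours, N assumed even):

import numpy as np

def idct_via_ifft(y):
    # Recover x from y = DCT_N x
    N = len(y)
    y = y.astype(float).copy()
    y[0] /= np.sqrt(1.0/N)                        # undo the scaling, so that y = C_N x
    y[1:] /= np.sqrt(2.0/N)
    Q = np.exp(-np.pi*1j*np.arange(N)/(2*N))      # Q_{n,n} = e^{-pi i n/(2N)}
    z = np.empty(N, dtype=complex)
    z[0] = y[0]/Q[0]
    z[1:] = (y[1:] - 1j*y[-1:0:-1])/Q[1:]         # (y_n - i y_{N-n})/Q_{n,n}
    x1 = np.fft.ifft(z).real                      # x^(1) = IDFT_N z
    x = np.empty(N)
    x[0::2] = x1[:N//2]                           # undo the reordering into x^(1)
    x[1::2] = x1[-1:N//2-1:-1]
    return x

N = 8
x = np.random.rand(N)
n, k = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
D = np.sqrt(2.0/N)*np.cos(2*np.pi*n*(k + 0.5)/(2*N)); D[0, :] /= np.sqrt(2)
print(np.allclose(idct_via_ifft(D @ x), x))       # should print True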
y_n = cos(πn/(2N)) ℜ((DFT_N x^(1))_n) + sin(πn/(2N)) ℑ((DFT_N x^(1))_n)
y_{N−n} = sin(πn/(2N)) ℜ((DFT_N x^(1))_n) − cos(πn/(2N)) ℑ((DFT_N x^(1))_n)

for the n'th and N − n'th coefficient of the DCT. This can also be rewritten as

y_n = ( ℜ((DFT_N x^(1))_n) + ℑ((DFT_N x^(1))_n) ) cos(πn/(2N)) − ℑ((DFT_N x^(1))_n)( cos(πn/(2N)) − sin(πn/(2N)) )
y_{N−n} = −( ℜ((DFT_N x^(1))_n) + ℑ((DFT_N x^(1))_n) ) cos(πn/(2N)) + ℜ((DFT_N x^(1))_n)( sin(πn/(2N)) + cos(πn/(2N)) ).

Explain that the first two equations require 4 multiplications to compute y_n and y_{N−n}, and that the last two equations require 3 multiplications to compute y_n and y_{N−n}.
b) Explain why the trick in a) reduces the number of additional multiplications
in a DCT, from 2N to 3N/2.
c) Explain why the trick in a) can be used to reduce the number of additional
multiplications in an IDCT with the same number.
Hint. Match the expression e^{πin/(2N)}(y_n − i y_{N−n}) you encountered in the IDCT with the rewriting you did in b).
d) Show that the penalty of the trick we here have used to reduce the number
of multiplications, is an increase in the number of additional additions from N
to 3N/2. Why can this trick still be useful?
ℜ(y_n) = ℜ((DFT_{N/2} x^(e))_n) + (C_{N/4} z)_n              0 ≤ n ≤ N/4 − 1
ℜ(y_n) = ℜ((DFT_{N/2} x^(e))_n)                              n = N/4
ℜ(y_n) = ℜ((DFT_{N/2} x^(e))_n) − (C_{N/4} z)_{N/2−n}        N/4 + 1 ≤ n ≤ N/2 − 1     (4.13)

ℑ(y_n) = ℑ((DFT_{N/2} x^(e))_n)                              n = 0
ℑ(y_n) = ℑ((DFT_{N/2} x^(e))_n) + (C_{N/4} w)_{N/4−n}        1 ≤ n ≤ N/4 − 1
ℑ(y_n) = ℑ((DFT_{N/2} x^(e))_n) + (C_{N/4} w)_{n−N/4}        N/4 ≤ n ≤ N/2 − 1     (4.14)
Explain from this how you can make an algorithm which reduces an FFT of length N to an FFT of length N/2 (on x^(e)), and two DCT's of length N/4 (on z and w). We will call this algorithm the revised FFT algorithm.

a) says nothing about the coefficients y_n for n > N/2. These are obtained in the same way as before through symmetry. a) also says nothing about y_{N/2}. This can be obtained with the same formula as in Theorem 2.15.
Let us now compute the number of arithmetic operations our revised algorithm needs. Denote by M_N the number of real multiplications needed by the revised N-point FFT algorithm.
b) Explain from the algorithm in a) that
Hint. 3N/8 should come from the extra additions/multiplications (see Exercise 4.8) you need to compute when you run the algorithm from Theorem 4.22 for C_{N/4}. Note also that the equations in a) require no extra multiplications, but that there are six equations involved, each needing N/4 additions, so that we need 6N/4 = 3N/2 extra additions.
c) Explain why x_r = M_{2^r} is the solution to the difference equation
the revised DCT. The total number of operations is thus O(2N log_2 N), i.e. half the operation count of the split-radix algorithm. The orders of these algorithms are thus the same, since we here have adapted to real data.
e) Explain that, if you had not employed the trick from Exercise 4.8, we would instead have obtained M_N = O((2N/3) log_2 N) and A_N = O((4N/3) log_2 N), which equal the orders for the number of multiplications/additions for the split-radix algorithm. In particular, the order of the operation count remains the same, but the trick from Exercise 4.8 turned a bigger percentage of the arithmetic operations into additions.
The algorithm we here have developed thus is constructed from the beginning
to apply for real data only. Another advantage of the new algorithm is that it
can be used to compute both the DCT and the DFT.
and explain how one can compute x(e) from this using an IFFT of length N/2.
4.4 Summary
We started this chapter by extending a previous result which had to do with that
the Fourier series of a symmetric function converged quicker. To build on this
we first needed to define symmetric extensions of vectors and symmetric vectors,
before we classified symmetric extensions in the frequency domain. From this
we could find a nice, orthonormal basis for the symmetric extensions, which led us to the definition of the DCT. We also saw a connection with symmetric filters: these are exactly the filters which preserve symmetric extensions, and we could characterize symmetric filters restricted to symmetric extensions as an N-dimensional mapping. We also showed that it is smart to replace the DFT with the DCT when we work with filters which are known to be symmetric. Among other things, this led to a better way of approximating analog filters, and better interpolation of functions.
We also showed how to obtain an efficient implementation of the DCT, which
could reuse the FFT implementation. The DCT has an important role in the
MP3 standard. As we have explained, the MP3 standard applies several filters
to the sound, in order to split it into bands concentrating on different frequency
ranges. Later we will look closer at how these filters can be implemented and
constructed. The implementation can use transforms similar to the MDCT, as
explained in Exercise 4.6. The MDCT is also used in the more advanced version
of the MP3 standard (layer III). Here it is applied to the filtered data to obtain
a higher spectral resolution of the sound. The MDCT is applied to groups of 576
(in special circumstances 192) samples. The MP3 standard document [20] does not dig into the theory for this, only presenting what is needed in order to make an implementation. It is somewhat difficult to read this document, since it is written in quite a different language, familiar mainly to those working with international standards.
The different type of cosine-matrices can all be associated with some extension
strategy for the signal. [34] contains a review of these.
The DCT is particularly popular for processing sound data before they are
compressed with lossless techniques such as Huffman coding or arithmetic coding.
The reason is, as mentioned, that the DCT provides a better approximation
from a low-dimensional space than the DFT does, and that it has a very efficient
implementation. Libraries exist which go to great lengths to provide efficient implementations of the FFT and the DCT. FFTW, short for Fastest Fourier Transform in the West [17], is perhaps the best known of these.
Signal processing literature often does not motivate digital filters by explaining where they come from, and where the input to the filters comes from. Using analog filters to motivate this, and to argue for improvements in using the DCT and symmetric extensions, is not that common. Much literature simply says that the property of linear phase is good, without elaborating on this further.
Chapter 5

Motivation for wavelets and some simple examples
In the first part of the book our focus was to approximate functions or vectors
with trigonometric functions. We saw that the Discrete Fourier transform could
be used to obtain a representation of a vector in terms of such functions, and
that computations could be done efficiently with the FFT algorithm. This was
useful for analyzing, filtering, and compressing sound and other discrete data.
The approach with trigonometric functions has some limitations, however. One
of these is that, in a representation with trigonometric functions, the frequency
content is fixed over time. This is in contrast with most sound data, where
the characteristics are completely different in different parts. We have also
seen that, even if a sound has a simple representation in terms of trigonometric
functions on two different parts, the representation of the entire sound may not
be simple. In particular, if the function is nonzero only on a very small interval,
a representation of it in terms of trigonometric functions is not so simple.
In this chapter we are going to introduce the basic properties of an alternative
to Fourier analysis for representing functions. This alternative is called wavelets.
Similar to Fourier analysis, wavelets are also based on the idea of expressing a
function in some basis. But in contrast to Fourier analysis, where the basis is
fixed, wavelets provide a general framework with many different types of bases.
In this chapter we first give a motivation for wavelets, before we continue by
introducing some very simple wavelets. The first wavelet we look at can be
interpreted as an approximation scheme based on piecewise constant functions.
The next wavelet we look at is similar, but with piecewise linear functions used
instead. Following these examples we will establish a more general framework,
based on experiences from the simple wavelets. In the following chapters we will
interpret this framework in terms of filters, and use this connection to construct
even more interesting wavelets.
Core functions in this chapter are collected in a module called dwt.
Figure 5.1: A view of Earth from space, together with versions of the image
where we have zoomed in.
The startup image in Google Earth™, a program for viewing satellite images, maps and other geographic information, is very similar to this. In the middle image we have zoomed in on the Gulf of Mexico, as marked with a rectangle in the left image. Similarly, in the right image we have further zoomed in on Cuba and a small portion of Florida, as marked with a rectangle in the middle image. There is clearly an amazing amount of information available behind a program like Google Earth™, since we there can zoom further in, and obtain enough detail to differentiate between buildings and even trees or cars all over the Earth. So, when the Earth is spinning in the opening screen of Google Earth™, do all the Earth's buildings appear to be spinning with it? If this was the case, the Earth would not be spinning on the screen, since there would just be so much information to process that a laptop would not be able to display a rotating Earth.
There is a simple reason that the globe can be shown spinning in spite of
the huge amounts of information that need to be handled. We are going to see
later that a digital image is just a rectangular array of numbers that represent
the color at a dense set of points. As an example, the images in Figure 5.1 are
made up of a grid of 1064 × 1064 points, which gives a total of 1 132 096 points.
The color at a point is represented by three eight-bit integers, which means that
the image files contain a total of 3 396 288 bytes each. So regardless of how
close to the surface of the Earth our viewpoint is, the resulting image always
contains the same number of points. This means that when we are far away
from the Earth we can use a very coarse model of the geographic information
that is being displayed, but as we zoom in, we need to display more details and
therefore need a more accurate model.
Observation 5.1. Images model.
When discrete information is displayed in an image, there is no need to use a
mathematical model that contains more detail than what is visible in the image.
Figure 5.2: A piecewise constant function.
Figure 5.3: The basis functions φ2 and φ7 from φ0 .
Figure 5.4: Examples of functions from V0 . The square wave in V0 (left), and
an approximation to cos t from V0 (right).
In our discussion of Fourier analysis, the starting point was the function
sin(2πt) that has frequency 1. We can think of the space V0 as being analogous
to this function: the function Σ_{n=0}^{N−1} (−1)^n φ_n(t) is (part of the) square wave that we discussed in Chapter 1, and which also oscillates regularly like the sine
function, see the left plot in Figure 5.4. The difference is that we have more
flexibility since we have a whole space at our disposal instead of just one function
— the right plot in Figure 5.4 shows another function in V0 .
In Fourier analysis we obtained a linear space of possible approximations by
including sines of frequency 1, 2, 3, . . . , up to some maximum. We use a similar
approach for constructing wavelets, but we double the frequency each time and
label the spaces as V0 , V1 , V2 , . . .
Figure 5.5: Piecewise constant approximations to cos t on the interval [0, 10] in
the spaces V1 , V2 , and V3 . The lower right plot shows the square wave in V2 .
Proof. The functions given by Equation (5.4) are nonzero on the subintervals [n/2^m, (n + 1)/2^m) which we referred to in Definition 5.4, so that φ_{m,n1} φ_{m,n2} = 0 when n1 ≠ n2, since these intervals are disjoint. The only mysterious thing may be the normalisation factor 2^{m/2}. This comes from the fact that

∫_0^N φ(2^m t − n)^2 dt = ∫_{n/2^m}^{(n+1)/2^m} φ(2^m t − n)^2 dt = 2^{−m} ∫_0^1 φ(u)^2 du = 2^{−m}.
|f(t) − f(t_{m,n+1/2})| ≤ ε, where t_{m,n+1/2} = (n + 1/2)2^{−m} is the midpoint of the subinterval. For t in this subinterval we then obviously have |f(t) − g(t)| ≤ ε, and since these intervals cover [0, N], the conclusion holds for all t ∈ [0, N].
Theorem 5.6 does not tell us how to find the approximation g although the proof makes use of an approximation that interpolates f at the midpoint of each subinterval. Note that if we measure the error in the L2-norm, we have

‖f − g‖² = ∫_0^N |f(t) − g(t)|² dt ≤ N ε²,

so ‖f − g‖ ≤ ε√N. We therefore have the following corollary.
lim_{m→∞} ‖f − proj_{V_m}(f)‖ = 0.
Figure 5.6 illustrates how some of the approximations of the function f (x) =
x2 from the resolution spaces for the interval [0, 1] improve with increasing m.
Figure 5.6: Comparison of the function defined by f (t) = t2 on [0, 1] with the
projection onto V2 , V4 , and V6 , respectively.
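A small sketch of how such projections can be computed (our own code, using that the value of proj_{V_m}(f) on each subinterval is the average of f there):

import numpy as np
from scipy.integrate import quad

f = lambda t: t**2

for m in (2, 4, 6):
    h = 2.0**(-m)
    # value of the projection on subinterval n: the average of f over [n*h, (n+1)*h]
    avg = [quad(f, n*h, (n+1)*h)[0]/h for n in range(2**m)]
    err2 = sum(quad(lambda t, a=a: (f(t) - a)**2, n*h, (n+1)*h)[0]
               for n, a in enumerate(avg))
    print(m, np.sqrt(err2))          # the L2 error decreases as m grows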
V0 ⊂ V1 ⊂ V2 ⊂ · · · ⊂ Vm · · · .
This means that it is meaningful to project Vk+1 onto Vk . The next step is to
characterize the projection from V1 onto V0 , and onto the orthogonal complement
of V0 in V1 . Before we do this, let us make the following definitions.
Definition 5.9. Detail spaces.
The orthogonal complement of Vm−1 in Vm is denoted Wm−1 . All the spaces
Wk are also called detail spaces, or error spaces.
The name detail space is used since the projection from V_m onto V_{m−1} is considered as a (low-resolution) approximation, and the error, which lies in W_{m−1}, is the detail which is left out when we replace with this approximation.
We will also write gm = gm−1 + em−1 when we split gm ∈ Vm into a sum of a
low-resolution approximation and a detail component. In the context of our
Google EarthTM example, in Figure 5.1 you should interpret g0 as the left image,
the middle image as an excerpt of g1 , and e0 as the additional details which are
needed to reproduce the middle image from the left image.
Since V0 and W0 are mutually orthogonal spaces they are also linearly
independent spaces. When U and V are two such linearly independent spaces,
we will write U ⊕ V for the vector space consisting of all vectors of the form
u + v, with u ∈ U , v ∈ V . U ⊕ V is also called the direct sum of U and V . This
also makes sense if we have more than two vector spaces (such as U ⊕ V ⊕ W ),
V_m = V_0 ⊕ W_0 ⊕ W_1 ⊕ · · · ⊕ W_{m−1},     (5.7)

where the spaces on the right hand side have dimension N, N, 2N, . . . , 2^{m−1}N. This decomposition will be important for our purposes. It says that the resolution space V_m can be written as the sum of a lower order resolution space V_0 and m detail spaces W_0, . . . , W_{m−1}. We will later interpret this splitting into a low-resolution component and m detail components.
It turns out that the following function will play the same role for the detail
space Wk as the function φ plays for the resolution space Vk .
Definition 5.10. The function ψ.
We define

ψ(t) = ( φ_{1,0}(t) − φ_{1,1}(t) ) / √2 = φ(2t) − φ(2t − 1),     (5.8)

and also that {ψ_{m,n}}_{n=0}^{2^m N−1} is orthonormal for any m. We will write ψ_m for the orthonormal basis {ψ_{m,n}}_{n=0}^{2^m N−1}, and we will always denote the coordinates
in the basis ψm by wm,n . The next result motivates the definition of ψ, and
states how we can project from V1 onto V0 and W0 , i.e. find the low-resolution
approximation and the detail component of g1 ∈ V1 .
Lemma 5.11. Orthonormal bases.
For 0 ≤ n < N we have that

proj_{V0}(φ_{1,n}) = φ_{0,n/2}/√2          if n is even;
                   = φ_{0,(n−1)/2}/√2      if n is odd.     (5.11)

proj_{W0}(φ_{1,n}) = ψ_{0,n/2}/√2          if n is even;
                   = −ψ_{0,(n−1)/2}/√2     if n is odd.     (5.12)

proj_{V0}(g_1) = Σ_{n=0}^{N−1} c_{0,n} φ_{0,n},   where c_{0,n} = (c_{1,2n} + c_{1,2n+1})/√2     (5.13)

proj_{W0}(g_1) = Σ_{n=0}^{N−1} w_{0,n} ψ_{0,n},   where w_{0,n} = (c_{1,2n} − c_{1,2n+1})/√2.     (5.14)
Proof. We first observe that φ_{1,n}(t) ≠ 0 if and only if n/2 ≤ t < (n + 1)/2. Suppose that n is even. Then the intersection

[n/2, (n + 1)/2) ∩ [n_1, n_1 + 1)     (5.15)

is nonempty only if n_1 = n/2. Using the orthogonal decomposition formula we get
proj_{V0}(φ_{1,n}) = Σ_{k=0}^{N−1} ⟨φ_{1,n}, φ_{0,k}⟩ φ_{0,k} = ⟨φ_{1,n}, φ_{0,n_1}⟩ φ_{0,n_1}
                   = ( ∫_{n/2}^{(n+1)/2} √2 dt ) φ_{0,n/2} = (1/√2) φ_{0,n/2}.
proj_{W0}(φ_{1,n}) = φ_{1,n} − (1/√2) φ_{0,n/2} = φ_{1,n} − (1/√2)( (1/√2)φ_{1,n} + (1/√2)φ_{1,n+1} )
                   = (1/2)φ_{1,n} − (1/2)φ_{1,n+1} = ψ_{0,n/2}/√2.
This proves the expressions for both projections when n is even. When n is
odd, the intersection (5.15) is nonempty only if n1 = (n − 1)/2, which gives the
expressions for both projections when n is odd in the same way. In particular
we get
proj_{W0}(φ_{1,n}) = φ_{1,n} − φ_{0,(n−1)/2}/√2 = φ_{1,n} − (1/√2)( (1/√2)φ_{1,n−1} + (1/√2)φ_{1,n} )
                   = (1/2)φ_{1,n} − (1/2)φ_{1,n−1} = −ψ_{0,(n−1)/2}/√2.
ψ0 must be an orthonormal basis for W0 since ψ0 is contained in W0 , and both
have dimension N .
We project the function g1 in V1 using the formulas in (5.11). We first split
the sum into even and odd values of n,
g_1 = Σ_{n=0}^{2N−1} c_{1,n} φ_{1,n} = Σ_{n=0}^{N−1} c_{1,2n} φ_{1,2n} + Σ_{n=0}^{N−1} c_{1,2n+1} φ_{1,2n+1}.     (5.16)
proj_{V0}(g_1) = proj_{V0}( Σ_{n=0}^{N−1} c_{1,2n} φ_{1,2n} + Σ_{n=0}^{N−1} c_{1,2n+1} φ_{1,2n+1} )
              = Σ_{n=0}^{N−1} c_{1,2n} proj_{V0}(φ_{1,2n}) + Σ_{n=0}^{N−1} c_{1,2n+1} proj_{V0}(φ_{1,2n+1})
              = Σ_{n=0}^{N−1} c_{1,2n} φ_{0,n}/√2 + Σ_{n=0}^{N−1} c_{1,2n+1} φ_{0,n}/√2
              = Σ_{n=0}^{N−1} ((c_{1,2n} + c_{1,2n+1})/√2) φ_{0,n}
Figure 5.8: The projection of φ1,0 ∈ V1 onto V0 and W0 .
proj_{W_{m−1}}(φ_{m,n}) = ψ_{m−1,n/2}/√2          if n is even;
                        = −ψ_{m−1,(n−1)/2}/√2     if n is odd.     (5.17)
From this it follows as before that ψm is an orthonormal basis for Wm . If {Bi }ni=1
are mutually independent bases, we will in the following write (B1 , B2 , . . . , Bn )
for the basis where the basis vectors from Bi are included before Bj when i < j.
With this notation, the decomposition in Equation (5.7) can be restated as
follows
Theorem 5.13. Bases for Vm .
φm and (φ0 , ψ0 , ψ1 , · · · , ψm−1 ) are both bases for Vm .
The function ψ thus has the property that its dilations and translations
together span the detail components. Later we will encounter other functions,
which also will be denoted by ψ, and have similar properties. In the theory of
wavelets, such ψ are called mother wavelets. There is one important property of
ψ, which we will return to:
Observation 5.14. Vanishing moment.
We have that ∫_0^N ψ(t) dt = 0.

This can be seen directly from the plot in Figure 5.7, since the parts of the graph above and below the x-axis cancel. In general we say that ψ has k vanishing moments if the integrals ∫ t^l ψ(t) dt = 0 for all 0 ≤ l ≤ k − 1. Due to
Observation 5.14, ψ has one vanishing moment. In Chapter 7 we will show that
mother wavelets with many vanishing moments are very desirable when it comes
to approximation of functions.
We now have all the tools needed to define the Discrete Wavelet Transform.
g_m = Σ_{n=0}^{2^m N−1} c_{m,n} φ_{m,n} ∈ V_m,

g_{m−1} = Σ_{n=0}^{2^{m−1}N−1} c_{m−1,n} φ_{m−1,n} ∈ V_{m−1}     e_{m−1} = Σ_{n=0}^{2^{m−1}N−1} w_{m−1,n} ψ_{m−1,n} ∈ W_{m−1},
then the change of coordinates from φm to (φm−1 , ψm−1 ) (i.e. first stage in a
DWT) is given by
( c_{m−1,n} )   =   ( 1/√2    1/√2 ) ( c_{m,2n}   )
( w_{m−1,n} )       ( 1/√2   −1/√2 ) ( c_{m,2n+1} )     (5.19)

Conversely, the change of coordinates from (φ_{m−1}, ψ_{m−1}) to φ_m (i.e. the last stage in an IDWT) is given by

( c_{m,2n}   )   =   ( 1/√2    1/√2 ) ( c_{m−1,n} )
( c_{m,2n+1} )       ( 1/√2   −1/√2 ) ( w_{m−1,n} )     (5.20)
φ_{m−1,n} = φ_{m,2n}/√2 + φ_{m,2n+1}/√2     ψ_{m−1,n} = φ_{m,2n}/√2 − φ_{m,2n+1}/√2.

The change of coordinate matrix from the basis {φ_{m−1,n}, ψ_{m−1,n}} to {φ_{m,2n}, φ_{m,2n+1}} is thus

( 1/√2    1/√2 )
( 1/√2   −1/√2 ).

This proves Equation (5.20). Equation (5.19) follows immediately since this matrix equals its inverse.
Above we assumed that N is even. In Exercise 5.8 we will see how we can
handle the case when N is odd.
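A sketch of one DWT/IDWT stage for the Haar wavelet, directly from Equations (5.19) and (5.20) (the helper functions are ours, not the dwt module's; N assumed even):

import numpy as np

def haar_dwt_stage(c):
    # Equation (5.19): from the coordinates c_{m,n} to (c_{m-1,n}, w_{m-1,n})
    c_even, c_odd = c[0::2], c[1::2]
    return (c_even + c_odd)/np.sqrt(2), (c_even - c_odd)/np.sqrt(2)

def haar_idwt_stage(c_low, w_high):
    # Equation (5.20): back to the coordinates c_{m,2n}, c_{m,2n+1}
    c = np.zeros(2*len(c_low))
    c[0::2] = (c_low + w_high)/np.sqrt(2)
    c[1::2] = (c_low - w_high)/np.sqrt(2)
    return c

c1 = np.array([1.0, 2.0, 3.0, 4.0])
c0, w0 = haar_dwt_stage(c1)
print(np.allclose(haar_idwt_stage(c0, w0), c1))   # should print True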
From Theorem 5.16, we see that, if we had defined
is repeated along the main diagonal 2^{m−1}N times. Also, from Equation (5.19) it is apparent that H = P_{C_m ← φ_m} is the same matrix. Such matrices are called block diagonal matrices. This particular block diagonal matrix is clearly orthogonal.
Let us make the following definition.
Definition 5.17. DWT and IDWT kernel transformations.
The matrices H = PCm ←φm and G = Pφm ←Cm are called the DWT and
IDWT kernel transformations. The DWT and the IDWT can be expressed in
terms of these kernel transformations by
respectively, where
• P(φm−1 ,ψm−1 )←Cm is a permutation matrix which groups the even elements
first, then the odd elements,
• PCm ←(φm−1 ,ψm−1 ) is a permutation matrix which places the first half at
the even indices, the last half at the odd indices.
Clearly, the kernel transformations H and G also invert each other. The point
of using the kernel transformations is that they compute the output sequentially,
similarly to how a filter does. Clearly also the kernel transformations are very
similar to a filter, and we will return to this in the next chapter.
At each level in a DWT, Vk is split into one low-resolution component from
V_{k−1}, and one detail component from W_{k−1}. We have illustrated this in Figure 5.9,
where the arrows represent changes of coordinates.
The detail component from Wk−1 is not subject to further transformation.
This is seen in the figure since ψk−1 is a leaf node, i.e. there are no arrows going
out from ψm−1 . In a similar illustration for the IDWT, the arrows would go the
opposite way.
The Discrete Wavelet Transform is the analogue in a wavelet setting to the
Discrete Fourier transform. When applying the DFT to a vector of length N ,
" # #
ψ m−1 ψ m−2 ψ m−3 ψ0
one starts by viewing this vector as coordinates relative to the standard basis.
When applying the DWT to a vector of length N , one instead views the vector
as coordinates relative to the basis φm . This makes sense in light of Exercise 5.1.
for any f . Show also that the first part of Proposition 5.12 follows from this.
$$\Big\|\sum_n \Big(\int_n^{n+1} f(t)\,dt\Big)\phi_{0,n}(t) - f\Big\|^2 = \langle f, f\rangle - \sum_n\Big(\int_n^{n+1} f(t)\,dt\Big)^2.$$
This, together with the previous exercise, gives us an expression for the least-squares error for f from V0 (at least after taking square roots).
$$\mathrm{proj}_{W_0}(f) = \sum_{n=0}^{N-1}\left(\int_n^{n+1/2} f(t)\,dt - \int_{n+1/2}^{n+1} f(t)\,dt\right)\psi_{0,n}(t) \quad (5.23)$$
for any f . Show also that the second part of Proposition 5.12 follows from this.
$$(\phi_{0,0}, \psi_{0,0}, \phi_{0,1}, \psi_{0,1}, \ldots, \phi_{0,(N-1)/2}, \psi_{0,(N-1)/2}, \phi_{0,(N+1)/2}).$$
Since all functions are assumed to have period N, we have that
$$\phi_{0,(N+1)/2} = \frac{1}{\sqrt{2}}(\phi_{1,N-1} + \phi_{1,N}) = \frac{1}{\sqrt{2}}(\phi_{1,0} + \phi_{1,N-1}).$$
From this relation one can find the last column in the change of coordinate
matrix from φ0 to (φ1 , ψ1 ), i.e. the IDWT matrix. In particular, when N is
odd, we see that the last column in the IDWT matrix circulates to the upper
right corner. In terms of coordinates, we thus have that
$$c_{1,0} = \frac{1}{\sqrt{2}}(c_{0,0} + w_{0,0} + c_{0,(N+1)/2}) \qquad c_{1,N-1} = \frac{1}{\sqrt{2}} c_{0,(N+1)/2}. \quad (5.24)$$
$$c_{0,0} = \frac{1}{\sqrt{2}}(c_{1,0} + c_{1,1} - c_{1,N-1})$$
$$w_{0,0} = \frac{1}{\sqrt{2}}(c_{1,0} - c_{1,1} - c_{1,N-1})$$
$$c_{0,(N+1)/2} = \frac{1}{\sqrt{2}}\,2c_{1,N-1}. \quad (5.25)$$
b) Explain that the DWT matrix is orthogonal if and only if N is even. Also
explain that it is only the last column which spoils the orthogonality.
The code above accepts two-dimensional data, just as our function FFTImpl.
Thus, the function may be applied simultaneously to all channels in a sound.
The reason for using a general kernel function will be apparent later, when we
change to different types of wavelets. You are not meant to call this kernel function directly. Instead, every time you apply the DWT, call the function DWTImpl.
x is the input to the DWT, and m is the number of levels. The three last
parameters will be addressed later in the book (the bd_mode-parameter addresses
how the boundary should be handled). The function also sets meaningful default
values for the three last parameters, so that you mostly only need to provide the
three first parameters.
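A call could then look as follows. This is only a sketch of the usage described above: the book's code base is assumed to be available, and the default values of the three last parameters are set by the function itself.

from numpy import hstack, ones, zeros

# Assumed form of the call: DWTImpl(x, m, wave_name, bd_mode=..., dual=..., transpose=...)
x = hstack([ones(512), zeros(512)])
DWTImpl(x, 2, 'Haar')   # two-level Haar DWT of x, computed in place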
We will later construct other wavelets, and we will distinguish them by using different names. This is the purpose of the wave_name parameter. This
parameter is sent to a function called find_kernel which looks up a kernel
function by name (find_kernel also uses the dual and transpose parameters
to take a decision on which kernel to choose). The Haar wavelet is identified
with the name "Haar". When this is input to DWTImpl, find_kernel returns
the dwt_kernel_haar kernel. The kernel is then used as input to the following
function:
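The original listing is not reproduced here; the following is a minimal sketch of such a function, under the assumptions that x is a NumPy array of floats (so that strided slices are views that the kernel can modify in place) and that the helper name is hypothetical.

def dwt_impl_internal(x, m, dwt_kernel, bd_mode):
    # Apply the kernel once per resolution level. After level res only every
    # 2**(res+1)-th coordinate is still a phi-coordinate, so the next level
    # works on a strided view of x.
    for res in range(m):
        if x.ndim == 1:
            dwt_kernel(x[0::2**res], bd_mode)
        else:                      # two-dimensional data: apply to all columns
            for col in range(x.shape[1]):
                dwt_kernel(x[0::2**res, col], bd_mode)
    # Finally the coordinates are reordered; the book delegates this to
    # reorganize_coeffs_forward.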
The code is applied to all columns if the data is two-dimensional, and we see
that the kernel function is invoked one time for each resolution. To reorder
coordinates in the same order as (φm , ψm ), note that the coordinates from φm
above end up at indices $k2^m$, where m represents the current stage, and k runs
through the indices. The function reorganize_coeffs_forward uses this to
reorder the coordinates (you will be spared the details in this implementation).
Although the DWT requires this reorganization, this may not be required in
practice. In Exercise 5.27 we go through some aspects of this implementation.
The implementation is not recursive, as the for-loop runs through the different
stages.
In this implementation, note that the first levels require the most operations,
since the latter levels leave an increasing part of the coordinates unchanged. Note
also that the change of coordinates matrix is a very sparse matrix: At each level
a coordinate can be computed from only two of the other coordinates, so that
this matrix has only two nonzero elements in each row/column. The algorithm
clearly shows that there is no need to perform a full matrix multiplication to
perform the change of coordinates.
There is a similar setup for the IDWT:
If the wave_name-parameter is set to "Haar", also this function will use the
find_kernel function to look up another kernel function, idwt_kernel_haar
(when N is even, this uses the exact same code as dwt_kernel_haar; for N odd, see Exercises 5.8 and 5.26). This is then sent as input to
Here the steps are simply performed in the reverse order, and by iterating
Equation (5.20).
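For concreteness, a minimal one-dimensional sketch of what such kernels could look like for even length is shown below. It is an illustration consistent with equations (5.19) and (5.20), not the code base's actual listing, and it assumes that x is a NumPy array of floats; the bd_mode parameter is unused since no boundary handling is needed in the even-length periodic case.

from numpy import sqrt

def dwt_kernel_haar(x, bd_mode):
    # One stage of the Haar DWT, Equation (5.19), for even-length x, in place.
    x /= sqrt(2)
    for k in range(0, len(x) - 1, 2):
        x[k], x[k + 1] = x[k] + x[k + 1], x[k] - x[k + 1]

def idwt_kernel_haar(x, bd_mode):
    # For even length the inverse stage, Equation (5.20), is the same computation.
    dwt_kernel_haar(x, bd_mode)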
In the next sections we will consider other cases where the underlying function
φ may be something else, and not necessarily piecewise constant. It will turn
out that much of the analysis we have done makes sense for other functions φ as
well, giving rise to other structures which we also will refer to as wavelets. The
wavelet resulting from piecewise constant functions is thus simply one example
out of many, and it is commonly referred to as the Haar wavelet. Let us round
off this section with some important examples.
$$f(t) = \frac{32}{\sqrt{2}}\phi_{1,0}(t) = \frac{32}{\sqrt{2}}\frac{1}{\sqrt{2}}(\phi_{0,0} + \psi_{0,0}) = 16\phi_{0,0} + 16\psi_{0,0}.$$
From this we see that the coordinate vector of f in $(\phi_0, \psi_0, \ldots, \psi_9)$, i.e. the 10-level DWT of x, is (16, 16, 0, 0, . . . , 0). Note that here V0 and W0 are both 1-dimensional, since V10 was assumed to be of dimension $2^{10}$ (in particular, N = 1).
It is straightforward to verify what we found using the algorithm above:
x = hstack([ones(512), zeros(512)])
DWTImpl(x, 10, ’Haar’)
print(x)
The reason why the method from this example worked was that the vector we started with had a simple representation in the wavelet basis; in fact, it was a multiple of the coordinate vector of a basis function in φ1. Usually this is not the case, and our
only possibility then is to run the DWT on a computer.
The first part of the DWT plot represents the low resolution part, the second
the detail.
Since $\phi(2^m t - n) \in V_m$ oscillates more quickly than $\phi(t - n) \in V_0$, one is led to believe that coefficients from lower order resolution spaces correspond to lower frequencies. The functions φm,n do not correspond to pure tones in the setting
of wavelets, however, but let us nevertheless listen to sound from the different
resolution spaces. The code base includes a function forw_comp_rev_DWT which
runs an m-level DWT on the first samples of the audio sample file, extracts the
detail or the low-resolution approximation, and runs an IDWT to reconstruct
the sound. Since the returned values may lie outside the legal range [−1, 1], the
values are normalized at the end.
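The listing of forw_comp_rev_DWT is not reproduced here. A rough sketch of what the description above amounts to is shown below; the loader audioread, the file name, the number of samples, and the default and meaning of the third parameter (1 for the low-resolution part, 0 for the detail) are assumptions made for illustration only.

def forw_comp_rev_DWT(m, wave_name, lowres=1):
    # Hypothetical sketch: read the audio sample file, run an m-level DWT,
    # keep either the low-resolution part or the detail, reconstruct,
    # and normalize to [-1, 1].
    x, fs = audioread('castanets.wav')     # hypothetical loader and file name
    N = 2**17
    x = x[0:N].astype(float)
    DWTImpl(x, m, wave_name)
    if lowres:
        x[(N >> m):] = 0                   # zero out the detail
    else:
        x[0:(N >> m)] = 0                  # zero out the low-resolution part
    IDWTImpl(x, m, wave_name)
    return x / abs(x).max(), fs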
x, fs = forw_comp_rev_DWT(m, ’Haar’)
play(x, fs)
x, fs = forw_comp_rev_DWT(1, ’Haar’, 0)
play(x, fs)
We see that the detail is quite significant, so that the first order wavelet approxi-
mation does not give a very good approximation. For m = 2 the detail can be
played as follows
x, fs = forw_comp_rev_DWT(2, ’Haar’, 0)
play(x, fs)
Figure 5.11: The detail in our audio sample file, for m = 1 (left) and m = 2 (right).
The errors are shown in Figure 5.11. The error is larger when two levels of
the DWT are performed, as one would suspect. It is also seen that the error
is larger in the part of the file where there are bigger variations. Since more
and more information is contained in the detail components as we increase m,
we here see the opposite effect: The sound gradually improves in quality as we
increase m.
The previous example illustrates that wavelets as well may be used to perform
operations on sound. As we will see later, however, our main application for
wavelets will be images, where they have found a more important role than
for sound. Images typically display variations which are less abrupt than the
ones found in sound. Just as the functions above had smaller errors in the
corresponding resolution spaces than the sound had, images are thus more suited for use with wavelets. The main idea behind why wavelets are so useful
comes from the fact that the detail, i.e., wavelet coefficients corresponding to the
spaces Wk , are often very small. After a DWT one is therefore often left with a
couple of significant coefficients, while most of the coefficients are small. The
approximation from V0 can be viewed as a good approximation, even though
it contains much less information. This gives another reason why wavelets
are popular for images: Detailed images can be very large, but when they are downloaded to a web browser, the browser can at an early stage show a low-resolution version of the image, while waiting for the rest of the details in the image to be downloaded.
When we later look at how wavelets are applied to images, we will need to handle
one final hurdle, namely that images are two-dimensional.
In these cases, we see that we require large m before the detail/error becomes
significant. We see also that there is no error for the square wave. The reason
is that the square wave is a piecewise constant function, so that it can be
represented exactly by the φ-functions. For the other functions, however, this is
not the case, so we here get an error.
$$w_{m,n} = \langle f, \psi_{m,n}\rangle = \int_0^N f(t)\psi_{m,n}(t)\,dt = \int_0^N (1 - t/N)\psi_{m,n}(t)\,dt.$$
Using the definition of ψm,n we see that this can also be written as
$$2^{m/2}\int_0^N (1 - t/N)\psi(2^m t - n)\,dt = 2^{m/2}\left(\int_0^N \psi(2^m t - n)\,dt - \int_0^N \frac{t}{N}\psi(2^m t - n)\,dt\right).$$
Using Observation 5.14 we get that $\int_0^N \psi(2^m t - n)\,dt = 0$, so that the first term above vanishes. Moreover, $\psi(2^m t - n)$ is nonzero only on $[2^{-m}n, 2^{-m}(n+1))$, and is 1 on $[2^{-m}n, 2^{-m}(n+1/2))$, and $-1$ on $[2^{-m}(n+1/2), 2^{-m}(n+1))$. We therefore get
$$[T_1\oplus T_2\oplus\cdots\oplus T_n]_{(\mathcal{B}_1,\mathcal{B}_2,\ldots,\mathcal{B}_n)} = [T_1]_{\mathcal{B}_1}\oplus[T_2]_{\mathcal{B}_2}\oplus\cdots\oplus[T_n]_{\mathcal{B}_n},$$
Here two new concepts are used: a direct sum of matrices, and a direct sum of
linear transformations.
Figure 5.13: Two vectors x1 and x2 which seem equal, but where the DWTs are very different.
You see that the two DWTs are very different: for the first vector we see
that there is much detail present (the second part of the plot), while for the
second vector there is no detail present. Attempt to explain why this is the case.
Based on your answer, also attempt to explain what can happen if you change
the point of discontinuity for the piecewise constant function in the left part of
Figure 5.11 to something else.
orthonormality we had for the Haar wavelet. On the other hand, we will see
that the new scaling functions and mother wavelets are symmetric functions.
We will later see that this implies that the corresponding DWT and IDWT have
simple implementations with higher precision. Our experience from deriving
Haar wavelets will guide us in the construction of piecewise linear wavelets. The
first task is to define the new resolution spaces.
Figure 5.14: A piecewise linear function and the two functions φ(t) and φ(t − 3).
Any $f \in V_m$ is uniquely determined by its values in the points $\{2^{-m}n\}_{n=0}^{2^m N - 1}$. The linear mapping which sends f to these samples is thus an isomorphism from $V_m$ onto $\mathbb{R}^{2^m N}$, so that the dimension of $V_m$ is $2^m N$. The left plot in Figure 5.14 shows an example of a piecewise linear function in V0 on the interval [0, 10]. We
note that a piecewise linear function in V0 is completely determined by its value
at the integers, so the functions that are 1 at one integer and 0 at all others are
particularly simple and therefore interesting, see the right plot in Figure 5.14.
These simple functions are all translates of each other and can therefore be built
from one scaling function, as is required for a multiresolution analysis.
Lemma 5.19. The function φ.
Let the function φ be defined by
$$\phi(t) = \begin{cases} 1 - |t|, & \text{if } -1 \le t \le 1; \\ 0, & \text{otherwise}; \end{cases} \quad (5.26)$$
and for any m ≥ 0 set
Figure 5.15: How φ(t) can be decomposed as a linear combination of φ1,−1, φ1,0, and φ1,1.
It turns out that this strategy is less appealing in the case of piecewise linear
functions. The reason is that the functions φ0,n are not orthogonal anymore
(see Exercise 5.32). Due to this we have no simple, orthogonal basis for the
set of piecewise linear functions, so that the orthogonal decomposition theorem
fails to give us the projection onto V0 in a simple way. There is therefore no reason to use the orthogonal complement of V0 in V1 as our error space, since it is
hard to write a piecewise linear function as a sum of two other piecewise linear
functions which are orthogonal. Instead of using projections to find low-resolution
approximations, and orthogonal complements to find error functions, we will
attempt the following simple approximation method:
Definition 5.23. Alternative projection.
Let g1 be a function in V1 given by
$$g_1 = \sum_{n=0}^{2N-1} c_{1,n}\phi_{1,n}. \quad (5.29)$$
The approximation g0 = P (g1 ) in V0 is defined as the unique function in V0
which has the same values as g1 at the integers, i.e.
$$\psi(t) = \frac{1}{\sqrt{2}}\phi_{1,1}(t) \qquad \psi_{m,n}(t) = 2^{m/2}\psi(2^m t - n). \quad (5.31)$$
Suppose that g1 ∈ V1 and that g0 = P (g1 ). Then
Proof. Since g0 (n) = g1 (n) for all integers n, e0 (n) = (g1 − g0 )(n) = 0, so that
e0 ∈ W0 . This proves the first statement.
For the second statement, note first that
$$\psi_{0,n}(t) = \psi(t-n) = \frac{1}{\sqrt{2}}\phi_{1,1}(t-n) = \phi(2(t-n)-1) = \phi(2t-(2n+1)) = \frac{1}{\sqrt{2}}\phi_{1,2n+1}(t). \quad (5.32)$$
ψ0 is thus a linearly independent set of dimension N , since it corresponds to a
subset of φ1 . Since φ1,2n+1 is nonzero only on (n, n + 1), it follows that all of ψ0
lies in W0 . Clearly then ψ0 is also a basis for W0 , since W0 also has dimension
N (its image under L1 consists of points where every second component is zero).
Consider finally a linear combination from φ0 and ψ0 which gives zero:
$$\sum_{n=0}^{N-1} a_n\phi_{0,n} + \sum_{n=0}^{N-1} b_n\psi_{0,n} = 0.$$
$$\phi_m = \{\phi_{m,n}\}_{n=0}^{2^m N-1}, \quad\text{and}\quad (\phi_{m-1},\psi_{m-1}) = \{\phi_{m-1,n}\}_{n=0}^{2^{m-1}N-1},\ \{\psi_{m-1,n}\}_{n=0}^{2^{m-1}N-1}.$$
With this result we can define the DWT and the IDWT with their stages as before, but the matrices themselves are now different. For the IDWT (i.e. $P_{\phi_1\leftarrow(\phi_0,\psi_0)}$), the columns in the matrix can be found from equations (5.28) and (5.32), i.e.
$$\phi_{0,n} = \frac{1}{\sqrt{2}}\left(\frac{1}{2}\phi_{1,2n-1} + \phi_{1,2n} + \frac{1}{2}\phi_{1,2n+1}\right)$$
$$\psi_{0,n} = \frac{1}{\sqrt{2}}\phi_{1,2n+1}. \quad (5.33)$$
This states that
$$G = P_{\phi_m\leftarrow\mathcal{C}_m} = \frac{1}{\sqrt{2}}\begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0\\
1/2 & 1 & 1/2 & 0 & \cdots & 0 & 0 & 0\\
0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 0\\
1/2 & 0 & 0 & 0 & \cdots & 0 & 1/2 & 1
\end{pmatrix}. \quad (5.34)$$
$$H = P_{\mathcal{C}_m\leftarrow\phi_m} = \sqrt{2}\,B_{-1/2} \quad (5.36)$$
$$G = P_{\phi_m\leftarrow\mathcal{C}_m} = \frac{1}{\sqrt{2}}B_{1/2}. \quad (5.37)$$
In the exercises you will be asked to implement a function lifting_odd_symm
which computes Bλ . Using this the DWT kernel transformation for the piecewise
linear wavelet can be applied to a vector x as follows.
x *= sqrt(2)
lifting_odd_symm(-0.5, x, symm)
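For reference, a minimal sketch of such a lifting step is shown below. It only handles the periodic wrap-around corresponding to the matrix in Equation (5.34), while the exercise also asks for symmetric boundary handling; x is assumed to be a NumPy array of floats of even length.

from numpy import roll

def lifting_odd_symm(lmbda, x, bd_mode):
    # Elementary lifting of odd type: add lmbda times the two even-indexed
    # neighbours to each odd-indexed entry (periodic wrap-around only).
    x[1::2] += lmbda * (x[0::2] + roll(x[0::2], -1))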
x, fs = forw_comp_rev_DWT(m, ’pwl0’)
play(x, fs)
There is a new and undesired effect when we increase m here: the castanet sound starts to sound strange. The sound from the castanets is perhaps the part of the sound with the highest frequencies.
Now for the detail. For m = 1 this can be played as follows
x, fs = forw_comp_rev_DWT(1, ’pwl0’, 0)
play(x, fs)
x, fs = forw_comp_rev_DWT(2, ’pwl0’, 0)
play(x, fs)
Figure 5.16: The detail in our audio sample file for the piecewise linear wavelet, for m = 1 (left) and m = 2 (right).
The errors are shown in Figure 5.16. When comparing with Example 5.10
we see much of the same, but it seems here that the error is bigger than before.
In the next section we will try to explain why this is the case, and construct
another wavelet based on piecewise linear functions which remedies this.
Figure 5.17 shows the new plot. With the square wave we see now that
there is an error. The reason is that a piecewise constant function can not be
represented exactly by piecewise linear functions, due to discontinuity. For the
second function we see that there is no error. The reason is that this function is
piecewise linear, so there is no error when we represent the function from the
space V0 . With the third function, however, we see an error.
$$\int_0^{1/2}(x_0 + (x_1-x_0)t)\phi_{1,1}(t)\,dt = 2\sqrt{2}\left(\frac{1}{12}x_0 + \frac{1}{24}x_1\right)$$
$$\int_{1/2}^{1}(x_0 + (x_1-x_0)t)\phi_{1,1}(t)\,dt = 2\sqrt{2}\left(\frac{1}{24}x_0 + \frac{1}{12}x_1\right).$$
$$\begin{aligned}
\int_0^N\Big(\phi_{1,1}(t) - \sum_{n=0}^{N-1}x_n\phi_{0,n}(t)\Big)^2 dt
&= \int_0^1\phi_{1,1}(t)^2\,dt - 2\int_0^{1/2}(x_0+(x_1-x_0)t)\phi_{1,1}(t)\,dt - 2\int_{1/2}^1(x_0+(x_1-x_0)t)\phi_{1,1}(t)\,dt\\
&\quad + \sum_{n=0}^{N-1}\int_n^{n+1}(x_n + (x_{n-1}-x_n)t)^2\,dt
\end{aligned}$$
and a) and b) to find an expression for $\big\|\phi_{1,1}(t) - \sum_{n=0}^{N-1}x_n\phi_{0,n}(t)\big\|^2$.
d) To find the minimum least squares error, we can set the gradient of the expression in c) to zero, and thus find the expression for the projection of φ1,1 onto V0. Show that the values $\{x_n\}_{n=0}^{N-1}$ can be found by solving the equation $Sx = b$, where $S = \frac{1}{3}\{1, 4, 1\}$ is an $N\times N$ symmetric filter, and b is the vector with components $b_0 = b_1 = \sqrt{2}/2$, and $b_k = 0$ for $k \ge 2$.
e) Solve the system in d) for some values of N to verify that the projection of φ1,1 onto V0 is nonzero, and that its support covers the entire [0, N].
$$\langle\phi_{0,n},\phi_{0,n}\rangle = \frac{2}{3} \qquad \langle\phi_{0,n},\phi_{0,n\pm1}\rangle = \frac{1}{6} \qquad \langle\phi_{0,n},\phi_{0,n\pm k}\rangle = 0 \text{ for } k > 1.$$
As a consequence, the $\{\phi_{0,n}\}_n$ are neither orthogonal, nor have norm 1.
lifting_odd_symm(lambda, x, bd_mode)
Show that we can obtain the piecewise linear φ we have defined as φ = χ[−1/2,1/2) ∗
χ[−1/2,1/2) (recall that χ[−1/2,1/2) is the function which is 1 on [−1/2, 1/2) and
0 elsewhere). This gives us a nice connection between the piecewise constant
scaling function (which is similar to χ[−1/2,1/2) ) and the piecewise linear scaling
function in terms of convolution.
$$\int_0^N \hat\psi(t)\,dt = \int_0^N t\hat\psi(t)\,dt = 0, \quad (5.39)$$
and define $\psi_m = \{\hat\psi_{m,n}\}_{n=0}^{2^m N-1}$, and $W_m$ as the space spanned by $\psi_m$.
We thus have two free variables α, β in Equation (5.38), to enforce the two
conditions in Equation (5.39). In Exercise 5.38 you are taken through the details
of solving this as two linear equations in the two unknowns α and β, and this
gives the following result:
Lemma 5.28. The new function $\hat\psi$.
The function
$$\hat\psi(t) = \psi(t) - \frac{1}{4}\big(\phi_{0,0}(t) + \phi_{0,1}(t)\big) \quad (5.40)$$
satisfies the conditions (5.39).
Using Equation (5.28), which stated that
$$\phi_{0,n} = \frac{1}{\sqrt{2}}\left(\frac{1}{2}\phi_{1,2n-1} + \phi_{1,2n} + \frac{1}{2}\phi_{1,2n+1}\right), \quad (5.41)$$
we get
$$\begin{aligned}
\hat\psi_{0,n} &= \psi_{0,n} - \frac{1}{4}\big(\phi_{0,n} + \phi_{0,n+1}\big)\\
&= \frac{1}{\sqrt{2}}\phi_{1,2n+1} - \frac{1}{4}\frac{1}{\sqrt{2}}\left(\frac{1}{2}\phi_{1,2n-1} + \phi_{1,2n} + \frac{1}{2}\phi_{1,2n+1}\right) - \frac{1}{4}\frac{1}{\sqrt{2}}\left(\frac{1}{2}\phi_{1,2n+1} + \phi_{1,2n+2} + \frac{1}{2}\phi_{1,2n+3}\right)\\
&= \frac{1}{\sqrt{2}}\left(-\frac{1}{8}\phi_{1,2n-1} - \frac{1}{4}\phi_{1,2n} + \frac{3}{4}\phi_{1,2n+1} - \frac{1}{4}\phi_{1,2n+2} - \frac{1}{8}\phi_{1,2n+3}\right). \quad (5.42)
\end{aligned}$$
In summary we have
$$\begin{aligned}
\phi_{0,n} &= \frac{1}{\sqrt{2}}\left(\frac{1}{2}\phi_{1,2n-1} + \phi_{1,2n} + \frac{1}{2}\phi_{1,2n+1}\right)\\
\hat\psi_{0,n} &= \frac{1}{\sqrt{2}}\left(-\frac{1}{8}\phi_{1,2n-1} - \frac{1}{4}\phi_{1,2n} + \frac{3}{4}\phi_{1,2n+1} - \frac{1}{4}\phi_{1,2n+2} - \frac{1}{8}\phi_{1,2n+3}\right), \quad (5.43)
\end{aligned}$$
Figure 5.18: The function $\hat\psi$ we constructed as an alternative wavelet for piecewise linear functions.
$$G = P_{\phi_m\leftarrow\hat{\mathcal{C}}_m} = P_{\phi_m\leftarrow\mathcal{C}_m}P_{\mathcal{C}_m\leftarrow\hat{\mathcal{C}}_m} = \frac{1}{\sqrt{2}}B_{1/2}A_{-1/4}.$$
This gives us a factorization of the IDWT in terms of lifting matrices. The inverse
of elementary lifting matrices of even type can be found similarly to how we
found the inverse of elementary lifting matrices of odd type, i.e. (Aλ )−1 = A−λ .
This means that the matrix for the DWT is easily found also in this case,
and
$$H = P_{\hat{\mathcal{C}}_m\leftarrow\phi_m} = \sqrt{2}\,A_{1/4}B_{-1/2} \quad (5.45)$$
$$G = P_{\phi_m\leftarrow\hat{\mathcal{C}}_m} = \frac{1}{\sqrt{2}}B_{1/2}A_{-1/4}. \quad (5.46)$$
Note that equations (5.43) also give us the matrix G, but we will rather use these factorizations, since the elementary lifting operations are already implemented in the exercises. We will also explain later why such a factorization
is attractive in terms of saving computations. In the exercises you will be asked
to implement a function lifting_even_symm which computes Aλ . Using this
the DWT kernel transformation for the alternative piecewise linear wavelet can
be applied to a vector x as follows.
x *= sqrt(2)
lifting_odd_symm(-0.5, x, symm)
lifting_even_symm(0.25, x, symm)
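A corresponding sketch of the even lifting step is shown below, again only for the periodic wrap-around case, as an illustration of what the exercise asks for.

from numpy import roll

def lifting_even_symm(lmbda, x, bd_mode):
    # Elementary lifting of even type: add lmbda times the two odd-indexed
    # neighbours to each even-indexed entry (periodic wrap-around only).
    x[0::2] += lmbda * (x[1::2] + roll(x[1::2], 1))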
x, fs = forw_comp_rev_DWT(m, ’pwl2’)
play(x, fs)
The new, undesired effect in the castanets from Example 5.28 now seems to be gone. The detail for m = 1 can be played as follows
x, fs = forw_comp_rev_DWT(1, ’pwl2’, 0)
play(x, fs)
x, fs = forw_comp_rev_DWT(2, ’pwl2’, 0)
play(x, fs)
Figure 5.19: The detail in our audio sample file for the alternative piecewise linear wavelet, for m = 1 (left) and m = 2 (right).
The errors are shown in Figure 5.19. Again, when comparing with Example 5.10 we see much of the same, and it is difficult to see an improvement over the Haar wavelet from this figure. However, the error is clearly smaller than for the piecewise linear wavelet in Figure 5.16. A partial explanation is that the wavelet we now have constructed has two vanishing moments, while the other had none.
lifting_even_symm(lambda, x, bd_mode)
Figure 5.20: The error (i.e. the contribution from W0 ⊕ W1 ⊕ · · · ⊕ Wm−1) for N = 1025 when f is a square wave, the linear function f(t) = 1 − 2|1/2 − t/N|, and the trigonometric function f(t) = 1/2 + cos(2πt/N)/2, respectively. The detail is indicated for m = 6 and m = 8.
$$\int_0^N\hat\psi(t)\,dt = \frac{1}{2}-\alpha-\beta, \qquad \int_0^N t\hat\psi(t)\,dt = \frac{1}{4}-\beta.$$
c) Explain why there is a unique function of the form given by Equation (5.38) which has two vanishing moments, and that this function is given by Equation (5.40).
$$a_k = \int_{-1}^1 t^k(1-|t|)\,dt, \qquad b_k = \int_0^2 t^k(1-|t-1|)\,dt, \qquad e_k = \int_0^1 t^k(1-2|t-1/2|)\,dt,$$
Hint. You can integrate functions in Python with the function quad in the package scipy.integrate. As an example, the function φ(t), which is nonzero only on [−1, 1], can be integrated as follows:
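(The original listing is not reproduced here; the following is a minimal sketch of such a computation.)

from scipy.integrate import quad

phi = lambda t: (1 - abs(t)) if abs(t) <= 1 else 0
print(quad(phi, -1, 1))   # (1.0, ...): the value of the integral and an error estimate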
$$g_k = \int_{-2}^0 t^k(1-|t+1|)\,dt, \qquad d_k = \int_1^3 t^k(1-|t-2|)\,dt$$
Hint. The placement of −γ may seem a bit strange here, and has to do with the fact that $\phi_{0,-1}$ is not one of the basis functions $\{\phi_{0,n}\}_{n=0}^{N-1}$. However, we have that $\phi_{0,-1} = \phi_{0,N-1}$, i.e. $\phi(t + 1) = \phi(t - N + 1)$, since we always assume that the functions we work with have period N.
e) Sketch a more general procedure than the one you found in b), which can be used to find wavelet bases where we have even more vanishing moments.
a) Show that $\hat\psi$ has k vanishing moments if and only if $a_0, \ldots, a_{k-1}$ solves the equation
$$\begin{pmatrix} c_{0,0} & c_{0,1} & \cdots & c_{0,k-1}\\ c_{1,0} & c_{1,1} & \cdots & c_{1,k-1}\\ \vdots & \vdots & \ddots & \vdots\\ c_{k-1,0} & c_{k-1,1} & \cdots & c_{k-1,k-1}\end{pmatrix}\begin{pmatrix}a_0\\ a_1\\ \vdots\\ a_{k-1}\end{pmatrix} = \begin{pmatrix}e_0\\ e_1\\ \vdots\\ e_{k-1}\end{pmatrix} \quad (5.48)$$
V0 ⊂ V1 ⊂ V2 ⊂ · · · ⊂ Vm ⊂ · · · .
We also showed that continuous functions could be approximated arbitrarily
well from Vm , as long as m was chosen large enough. Moreover, the space V0 is
closed under all translates, at least if we view the functions in V0 as periodic
with period N . In the following we will always identify a function with this
periodic extension, just as we did in Fourier analysis. When performing this
identification, we also saw that f (t) ∈ Vm if and only if g(t) = f (2t) ∈ Vm+1 .
We have therefore shown that the scaling functions we have considered fit into
the following general framework.
Definition 5.29. Multiresolution analysis.
A Multiresolution analysis, or MRA, is a nested sequence of function spaces
V0 ⊂ V1 ⊂ V2 ⊂ · · · ⊂ Vm ⊂ · · · , (5.49)
called resolution spaces, so that
• Find a function φ which can serve as the scaling function for an MRA,
• Find a function ψ so that ψ = {ψ(t − n)}0≤n<N and φ = {φ(t − n)}0≤n<N
together form an orthonormal basis for V1 . The function ψ is also called a
mother wavelet.
With V0 the space spanned by φ = {φ(t − n)}0≤n<N , and W0 the space spanned
by ψ = {ψ(t − n)}0≤n<N , φ and ψ should be chosen so that we easily can
compute the decomposition of g1 ∈ V1 into g0 + e0 , where g0 ∈ V0 and e0 ∈ W0 .
If we can achieve this, the Discrete Wavelet Transform is defined as the change
of coordinates from φ1 to (φ0 , ψ0 ).
More generally, if
$$f(t) = \sum_n c_{m,n}\phi_{m,n} = \sum_n c_{0,n}\phi_{0,n} + \sum_{m'<m,\,n} w_{m',n}\psi_{m',n},$$
While there are in general many possible choices of detail spaces, in the
case of an orthonormal wavelet we saw that it was natural to choose the detail
space Wm−1 as the orthogonal complement of Vm−1 in Vm , and obtain the
mother wavelet by projecting the scaling function onto the detail space. Thus,
for orthonormal MRA’s, the low-resolution approximation and the detail can be
obtained by computing projections, and the least squares approximation of f
from Vm can be computed as
$$\mathrm{proj}_{V_m}(f) = \sum_n\langle f,\phi_{m,n}\rangle\,\phi_{m,n}(t).$$
Working with the samples of f rather than f itself: The first crime of
wavelets. In Exercise 5.1 we saw that for the piecewise constant wavelet the
coordinate vector of f in Φm equaled the sample vector of f . In Exercise 5.30 we
saw that the same held for the piecewise linear wavelet. The general statement
is false, however: the coordinate vector of f in Φ0 may not equal the samples (f(0), f(1), ...), so that $\sum_n f(n)\phi_{0,n}$ and f are two different functions.
In most applications, a function is usually only available through its samples. In many books on wavelets, one starts with these samples, and computes their DWT. This means that the underlying function is $\sum_n f(n)\phi_{0,n}$, and since this is different from f in general, we compute something completely different than we want. This shows that many books apply a wrong procedure when computing the DWT. This kind of error is also called the first crime of wavelets.
So, how bad is this crime? We will address this with two results. First we will see how the samples are related to the wavelet coefficients. Then we will see how the function $\sum_s f(s/2^m)\phi_{m,s}(t)$ is related to f (the wavelet crime assumes equality).
Theorem 5.31. Relation between samples and wavelet coefficients.
Assume that $\tilde\phi$ has compact support and is absolutely integrable, i.e. $\int_0^N|\tilde\phi(t)|\,dt < \infty$. Assume also that f is continuous and has wavelet coefficients $c_{m,n}$. Then we have that
$$\lim_{m\to\infty} 2^{m/2}c_{m,n2^{m-m_0}} = f(n/2^{m_0})\int_0^N\tilde\phi(t)\,dt.$$
$$\begin{aligned}
c_{m,n2^{m-m_0}} &= \int_0^N f(t)\tilde\phi_{m,n2^{m-m_0}}(t)\,dt\\
&= f(n/2^{m_0})\int_0^N \tilde\phi_{m,n2^{m-m_0}}(t)\,dt + \int_0^N r(t)\tilde\phi_{m,n2^{m-m_0}}(t)\,dt\\
&\le 2^{-m/2}f(n/2^{m_0})\int_0^N\tilde\phi(t)\,dt + \epsilon\int_0^N|\tilde\phi_{m,n2^{m-m_0}}(t)|\,dt\\
&= 2^{-m/2}f(n/2^{m_0})\int_0^N\tilde\phi(t)\,dt + 2^{-m/2}\epsilon\int_0^N|\tilde\phi(t)|\,dt.
\end{aligned}$$
From this it follows that $\lim_{m\to\infty}2^{m/2}c_{m,n2^{m-m_0}} = f(n/2^{m_0})\int_0^N\tilde\phi(t)\,dt$, since $\epsilon$ was arbitrary, and $\tilde\phi(t)$ was assumed to be absolutely integrable.
This result has an important application. It turns out that there is usually
no way to find analytical expressions for the scaling function and the mother
wavelet. Their coordinates in (φ0 , ψ0 ) are simple, however, since there is only
one non-zero coordinate:
This says that, up to the constant factor $c = \sum_n\phi(n)$, the functions $f_m \in V_m$ with coordinates $2^{-m/2}(f(0/2^m), f(1/2^m), \ldots)$ in $\Phi_m$ converge pointwise to f as $m \to \infty$ (even though the samples of $f_m$ may not equal those of f).
Proof. With $t = n/2^{m_0}$, for $m > m_0$ we have that
$$\phi_{m,s}(t) = \phi_{m,s}(n2^{m-m_0}/2^m) = 2^{m/2}\phi(2^m n2^{m-m_0}/2^m - s) = 2^{m/2}\phi(n2^{m-m_0} - s).$$
Our algorithm can take as input whether we want to plot the φ or the ψ function
(and thereby choose among these sets of coordinates), and also the value of the
dual parameter, which we will return to. The following algorithm can be used
for all this
5.7 Summary
We started this chapter by motivating the theory of wavelets as a different
function approximation scheme, which solved some of the shortcomings of Fourier
series. While one approximates functions with trigonometric functions in Fourier
theory, with wavelets one instead approximates a function in several stages,
where one at each stage attempts to capture information at a given resolution,
using a function prototype. This prototype is localized in time, contrary to
the Fourier basis functions, and this makes the theory of wavelets suitable for
time-frequency representations of signals. We used an example based on Google
Earth to illustrate that the wavelet-based scheme can represent an image at
different resolutions in a scalable way, so that passing from one resolution to
another simply amounts to adding some detail information to the lower resolution
version of the image. This also made wavelets useful for compression, since the
images at different resolutions can serve as compressed versions of the image.
We defined the simplest wavelet, the Haar wavelet, which is a function
approximation scheme based on piecewise constant functions, and deduced its
properties. We defined the Discrete Wavelet Transform (DWT) as a change of
coordinates corresponding to the function spaces we defined. This transform is
the crucial object to study when it comes to more general wavelets also, since
it is the object which makes wavelets useful for computation. In the following
chapters, we will see that reordering of the source and target bases of the
DWT will aid in expressing connections between wavelets and filters, and in
constructing optimized implementations of the DWT.
We then defined another wavelet, which corresponded to a function approxi-
mation scheme based on piecewise linear functions, instead of piecewise constant
functions. There were several differences with the new wavelet when compared
to the previous one. First of all, the basis functions were not orthonormal, and
we did not attempt to make them orthonormal. The resolution spaces we now
defined were not defined in terms of orthogonal bases, and we had some freedom
on how we defined the detail spaces, since they are not defined as orthogonal
complements anymore. Similarly, we had some freedom on how we define the
mother wavelet, and we mentioned that we could define it so that it is more
suitable for approximation of functions, by adding what we called vanishing
moments.
From these examples of wavelets and their properties we made a generalization
to what we called a multiresolution analysis (MRA). In an MRA we construct
successively refined spaces of functions that may be used to approximate functions
arbitrarily well. We will continue in the next chapter to construct even more
general wavelets, within the MRA framework.
The book [29] goes through developments for wavelets in detail. While wavelets have been recognized for quite some time, it was with the important work of Daubechies [12, 13] that they found new arenas in the 1980s. Since then they have found many important applications. The main application we will focus on in later chapters is image processing.
Previously we saw that analog filters restricted to the Fourier spaces gave rise to
digital filters. These digital filters sent the samples of the input function to the
samples of the output function, and are easily implementable, in contrast to the
analog filters. We have also seen that wavelets give rise to analog filters. This
leads us to believe that the DWT also can be implemented in terms of digital
filters. In this chapter we will prove that this is in fact the case.
There are some differences between the Fourier and wavelet settings, however. Due to these differences, the way we realize the DWT in terms of filters will
be a bit different. Despite the differences, this chapter will make it clear
that the output of a DWT can be interpreted as the combined output of two
different filters, and each filter will have an interpretation in terms of frequency
representations. We will also see that the IDWT has a similar interpretation in
terms of filters.
In this chapter we will also see that expressing the DWT in terms of filters enables us to define more general transforms, where even more filters are used. It is fruitful to think of each filter as concentrating on a particular frequency range, so that these transforms simply split the input into different frequency bands. Such transforms have important applications to the
processing and compression of sound, and we will show that the much used MP3 standard for compression of sound makes use of such transforms.
$$(Hc_m)_k = \begin{cases}(H_0c_m)_k & \text{when } k \text{ is even}\\ (H_1c_m)_k & \text{when } k \text{ is odd},\end{cases}$$
since the left hand side depends only on row k in the matrix H, and this is equal
to row k in H0 (when k is even) or row k in H1 (when k is odd). This means
that Hcm can be computed with the help of H0 and H1 as follows:
Theorem 6.3. DWT expressed in terms of filters.
Let cm be the coordinates in φm, and let H0, H1 be defined as above. Any stage in a DWT can be implemented in terms of filters as follows:
This gives an important connection between wavelets and filters: The DWT
corresponds to applying two filters, H0 and H1 , and the result from the DWT
is produced by assembling half of the coordinates from each. Keeping only
every second coordinate is called downsampling (with a factor of two). Had
we not performed downsampling, we would have ended up with twice as many
coordinates as we started with. Downsampling with a factor of two means that
we end up with the same number of samples as we started with. We also say that
the output of the two filters is critically sampled. Due to the critical sampling, it
is inefficient to compute the full application of the filters. We will return to the
issue of making efficient implementations of critically sampled filter banks later.
We can now complement Figure 5.9 by giving names to the arrows as follows:
[Diagram: φm →H0 φm−1 →H0 φm−2 →H0 · · · →H0 φ1 →H0 φ0, with an H1-labeled arrow from each φk down to ψk−1.]
Let us make a similar analysis for the IDWT, and let us first make the following definition:
Definition 6.4. G0 and G1 .
We denote by G0 the (unique) filter with the same first column as G, and by
G1 the (unique) filter with the same second column as G. G0 and G1 are also
called the IDWT filter components.
These filters are uniquely determined, since any filter is uniquely determined
from one of its columns. We can now write
$$c_m = G\begin{pmatrix}c_{m-1,0}\\ w_{m-1,0}\\ c_{m-1,1}\\ w_{m-1,1}\\ \vdots\\ c_{m-1,2^{m-1}N-1}\\ w_{m-1,2^{m-1}N-1}\end{pmatrix}
= G\begin{pmatrix}c_{m-1,0}\\ 0\\ c_{m-1,1}\\ 0\\ \vdots\\ c_{m-1,2^{m-1}N-1}\\ 0\end{pmatrix}
+ G\begin{pmatrix}0\\ w_{m-1,0}\\ 0\\ w_{m-1,1}\\ \vdots\\ 0\\ w_{m-1,2^{m-1}N-1}\end{pmatrix}
= G_0\begin{pmatrix}c_{m-1,0}\\ 0\\ c_{m-1,1}\\ 0\\ \vdots\\ c_{m-1,2^{m-1}N-1}\\ 0\end{pmatrix}
+ G_1\begin{pmatrix}0\\ w_{m-1,0}\\ 0\\ w_{m-1,1}\\ \vdots\\ 0\\ w_{m-1,2^{m-1}N-1}\end{pmatrix}.$$
Here we have split a vector into its even-indexed and odd-indexed elements,
which correspond to the coefficients from φm−1 and ψm−1 , respectively. In
the last equation, we replaced with G0 , G1 , since the multiplications with G
depend only on the even and odd columns in that matrix (due to the zeros
inserted), and these columns are equal in G0 , G1 . We can now state the following
characterization of the inverse Discrete Wavelet transform:
Note that the filters G0, G1 were defined in terms of the columns of G, while the filters H0, H1 were defined in terms of the rows of H. The computations above show that this difference comes from the fact that the change of coordinates one way splits the coordinates into two parts, while the inverse change of coordinates performs the opposite. Let us summarize what we have found as follows.
[Diagram: c1 is filtered with H1, downsampled (↓2) to give w0, then upsampled (↑2) to (0, wm−1,0, 0, wm−1,1, · · · ); the two channels are filtered with G0 and G1 and added (⊕) to recover c1.]
the world of filters in the world of wavelets, and to give useful interpretations
of the wavelet transform in terms of frequencies. Secondly, and perhaps most
important, it enables us to reuse efficient implementations of filters in order
to compute wavelet transformations. A lot of work has been done in order to
establish efficient implementations of filters, due to their importance.
In Example 5.10 we argued that the elements in Vm−1 correspond to lower frequencies than those in Vm, since V0 = Span({φ0,n}n) should be interpreted as content of lower frequency than the φ1,n, with W0 = Span({ψ0,n}n) the remaining high frequency detail. To elaborate more on this, we have that
$$\phi(t) = \sum_{n=0}^{2N-1}(G_0)_{n,0}\,\phi_{1,n}(t) \quad (6.2)$$
$$\psi(t) = \sum_{n=0}^{2N-1}(G_1)_{n-1,1}\,\phi_{1,n}(t), \quad (6.3)$$
where (Gk )i,j are the entries in the matrix Gk . Similar equations are true for
φ(t − k), ψ(t − k). Due to Equation (6.2), the filter G0 should have lowpass
characteristics, since it extracts the information at lower frequencies. Similarly,
G1 should have highpass characteristics due to Equation (6.3).
repeated along the diagonal. The filters G0 and G1 can be found directly from
these columns:
$$G_0 = \{1/\sqrt{2}, 1/\sqrt{2}\}$$
$$G_1 = \{1/\sqrt{2}, -1/\sqrt{2}\}.$$
$$\lambda_{G_0}(\omega) = \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}e^{-i\omega} = \sqrt{2}\,e^{-i\omega/2}\cos(\omega/2)$$
$$\lambda_{G_1}(\omega) = \frac{1}{\sqrt{2}}e^{i\omega} - \frac{1}{\sqrt{2}} = \sqrt{2}\,ie^{i\omega/2}\sin(\omega/2).$$
By considering the filters where the rows are as in Equation (6.4), it is clear that
$$H_0 = \{1/\sqrt{2}, 1/\sqrt{2}\}$$
$$H_1 = \{-1/\sqrt{2}, 1/\sqrt{2}\},$$
so that the frequency responses for the DWT have the same lowpass/highpass characteristics.
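As a small numerical illustration (not part of the original text), one stage of the Haar DWT can be computed by filtering with H0 and H1 and then downsampling, as in Theorem 6.3. Periodic extension, an even length, and the convention that H0 reproduces the even rows and H1 the odd rows of the DWT kernel matrix are assumed here.

import numpy as np

x = np.random.rand(8)

# Apply the two Haar filters to the full vector (periodic extension):
# (H0 x)_k = (x_k + x_{k+1}) / sqrt(2),  (H1 x)_k = (x_{k-1} - x_k) / sqrt(2).
H0x = (x + np.roll(x, -1)) / np.sqrt(2)
H1x = (np.roll(x, 1) - x) / np.sqrt(2)

# Downsampling: keep the even-indexed outputs of H0 and the odd-indexed outputs of H1.
y = np.empty_like(x)
y[0::2] = H0x[0::2]
y[1::2] = H1x[1::2]

# This agrees with the direct kernel computation from Equation (5.19).
c = (x[0::2] + x[1::2]) / np.sqrt(2)
w = (x[0::2] - x[1::2]) / np.sqrt(2)
print(np.allclose(y[0::2], c), np.allclose(y[1::2], w))   # True True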
$$G_0 = \frac{1}{\sqrt{2}}\{1/2, 1, 1/2\}$$
$$G_1 = \frac{1}{\sqrt{2}}\{1\}. \quad (6.5)$$
G0 is again a filter we have seen before: Up to multiplication with a constant, it
is the treble-reducing filter with values from row 2 of Pascal’s triangle. We see
something different here when compared to the Haar wavelet, in that the filter
G1 is not the highpass filter corresponding to G0 . The frequency responses are
now
$$\lambda_{G_0}(\omega) = \frac{1}{2\sqrt{2}}e^{i\omega} + \frac{1}{\sqrt{2}} + \frac{1}{2\sqrt{2}}e^{-i\omega} = \frac{1}{\sqrt{2}}(\cos\omega + 1)$$
$$\lambda_{G_1}(\omega) = \frac{1}{\sqrt{2}}.$$
$\lambda_{G_1}(\omega)$ thus has magnitude $\frac{1}{\sqrt{2}}$ at all points. Comparing with Figure 6.5 we see that here also the frequency response has a zero at π. The frequency response seems also to be flatter around π. For the DWT we have that
$$H_0 = \sqrt{2}\{1\}$$
$$H_1 = \sqrt{2}\{-1/2, 1, -1/2\}. \quad (6.6)$$
Even though G1 was not the highpass filter corresponding to G0, we see that, up to a constant, H1 is (it is a bass-reducing filter with values taken from row 2 of Pascal's triangle).
$$G_0 = \frac{1}{\sqrt{2}}\{1/2, 1, 1/2\}$$
$$G_1 = \frac{1}{\sqrt{2}}\{-1/8, -1/4, 3/4, -1/4, -1/8\}. \quad (6.7)$$
Here G0 was as for the wavelet of piecewise linear functions since we use the
same scaling function. G1 was changed, however. Clearly, G1 now has highpass
characteristics, while the lowpass characteristic of G0 has been preserved.
The filters G0, G1, H0, H1 are particularly important in applications: apart from the scaling factors $1/\sqrt{2}$, $\sqrt{2}$ in front, we see that the filter coefficients are all dyadic fractions, i.e. they are of the form $\beta/2^j$. Arithmetic operations with
dyadic fractions can be carried out exactly on a computer, due to representations
as binary numbers in computers. These filters are thus important in applications,
since they can be used as transformations for lossless coding. The same argument
can be made for the Haar wavelet, but this wavelet had one less vanishing moment.
Note that H1 plays the role of the highpass filter corresponding to G0 in both of the previous examples. We will prove in the next chapter that this is a much more general result which holds for all wavelets, not only for the orthonormal ones.
6.1.1 The dual filter bank transform and the dual parameter
Since the reverse transform inverts the forward transform, GH = I. If we
transpose this expression we get that H T GT = I. Clearly H T is a reverse
filter bank transform with filters (H0 )T , (H1 )T , and GT is a forward filter bank
transform with filters (G0 )T , (G1 )T . Due to their usefulness, these transforms
have their own name:
• If the dual parameter is false, the DWT is computed as the forward filter
bank transform with filters H0 , H1 , and the IDWT is computed as the
reverse filter bank transform with filters G0 , G1 .
• If the dual parameter is true, the DWT is computed as the forward filter
bank transform with filters (G0 )T , (G1 )T , and the IDWT is computed as
the reverse filter bank transform with filters (H0 )T , (H1 )T .
This means that we can differ between the DWT, IDWT, and their duals as
follows.
Note that, even though the reverse filter bank transform G can be associated
with certain function bases, it is not clear if the reverse filter bank transform
H T also can be associated with such bases. We will see in the next chapter that
such bases can in many cases be found. We will also denote these bases as dual
bases.
The construction of the dual wavelet transform was function-free: we have no reason to believe that they correspond to scaling functions and mother wavelets.
In the next chapter we will show that such dual scaling functions and dual
mother wavelets exist in many cases. We can set the dual parameter to True in
the implementation of the cascade algorithm in Example 5.43 to see how the
functions must look. In Figure 6.4 we have plotted the result. We see that these
functions look very irregular. Also, they are very different from the original scaling function and mother wavelet. We will later argue that this is bad: it would be much better if φ ≈ φ̃ and ψ ≈ ψ̃.
Figure 6.4: Dual functions for the two piecewise linear wavelets.
plt.figure()
plt.plot(omega, abs(fft.fft(g)), ’k-’)
If the parameter dual is set to True, the dual filters $(H_0)^T$ and $(H_1)^T$ are plotted instead. If the filters have real coefficients, $|\lambda_{H_i^T}(\omega)| = |\lambda_{H_i}(\omega)|$, so the correct frequency responses are shown.
To plot the same frequency response for the alternative piecewise linear wavelet,
we can write
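The original listing is not reproduced here. A sketch of what such a computation could look like for G1 of the alternative piecewise linear wavelet is shown below; the filter coefficients are taken from Equation (6.7) and placed in the first column of a circulant matrix of (arbitrarily chosen) length N = 128.

import numpy as np
import matplotlib.pyplot as plt

N = 128
g = np.zeros(N)
# Coefficients of G1, centred at index 0, with negative indices wrapping around.
g[[0, 1, 2, -2, -1]] = np.array([3/4, -1/4, -1/8, -1/8, -1/4]) / np.sqrt(2)
omega = 2 * np.pi * np.arange(N) / N
plt.figure()
plt.plot(omega, abs(np.fft.fft(g)), 'k-')
plt.show()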
the function with the leftmost support is φ1,M0 , while the one with the rightmost
one is φ1,M1 . These supports are [(m0 + M0 )/2, (m1 + M1 )/2]. In order for the
supports of the two sides to match we clearly must have
Figure 6.5: The frequency responses λG0(ω) and λG1(ω) for the Haar wavelet (top), and for the alternative piecewise linear wavelet (bottom).
There are two special cases of the above we will run into.
Wavelets with symmetric filters. The results then say the support of φ is
[−M1 , M1 ] (i.e. symmetric around 0), and the support of ψ is 1/2 + [−(M1 +
N1 )/2, (M1 + N1 )/2], i.e. symmetric around 1/2. The wavelet with most such
filter coefficients we will consider has 7 and 9 filter coefficients, respectively, so
that the support of φ is [−3, 3], and the support of ψ is [−3, 4]. This is why
we have plotted these functions over [−4, 4], so that the entire function can be
seen. For the alternative piecewise linear wavelet the same argument gives that
support of φ is [−1, 1], and the support of ψ is [−1, 2] (which we already knew
from Figure 5.18). For the piecewise linear wavelet the support of ψ is deduced
to be [0, 1].
Orthonormal wavelets. For these wavelets it will turn out that G0 has filter coefficients evenly distributed around 1/2, and G1 has equally many, evenly distributed around −1/2. It is straightforward to check that the filters for the
Haar wavelet are of this kind, and this will turn out to be the simplest case of an
orthonormal wavelet. For such supports Theorem 6.9 says that both supports are
symmetric around 1/2, and that both φ, ψ, G0 and G1 have the same support
lengths. This can also be verified from the plots for the Haar wavelet. We
will only consider orthonormal wavelets with at most 8 filter coefficients. This
number of filter coefficients is easily seen to give the support [−3, 4], which is
why we have used [−4, 4] as a common range when we plot functions on this
form.
This is different from the symmetric extension given by Definition 4.1. Note
that (f˘(0), f˘(1), ..., f˘(N − 1), f˘(N ), f˘(N + 1), ..., f˘(2N − 1)) ∈ R2N is now the
symmetric extension of (f (0), f (1), ..., f (N )), so that this way of defining sym-
metric extensions is perhaps the most natural when it comes to sampling which
includes the boundaries.
Figure 6.6: A vector and its symmetric extension. Note that the period of the vector is now 2N − 2, while it was 2N for the vector shown in Figure 4.1.
$$(H\breve{x})_{N-1-k} = (H_i\breve{x})_{N-1-k} = (H_i\breve{x})_{N-1+k} = (H\breve{x})_{N-1+k}.$$
It follows that H preserves the same type of symmetric extensions, i.e. there exists an $N\times N$-matrix $H_r$ so that $H\breve{x} = \breve{H_rx}$. Moreover, the entries in $H_rx$ are assembled from the entries in $(H_i)_rx$, in the same way as the entries in $Hx$ are assembled from the entries in $H_ix$.
Note also that setting every second element to zero in a symmetric extension only creates a new symmetric extension, so that G also preserves symmetric extensions. It follows that there exist $N\times N$-matrices $G_r$, $(G_0)_r$, $(G_1)_r$ so that $G\breve{x} = \breve{G_rx}$, and so that the entries $0,\ldots,N-1$ in the output of $G_r$ are obtained by combining $(G_0)_r$ and $(G_1)_r$ as in Theorem 6.5.
Theorem 6.11. Symmetric filters and symmetric extensions.
If the filters H0 , H1 , G0 , and G1 in a wavelet transform are symmetric, then
the DWT/IDWT preserve symmetric extensions (as defined in Definition 6.10).
Also, applying the filters H0 , H1 , G0 , and G1 to x̆ ∈ R2N −2 in the DWT/IDWT
is equivalent to applying (H0 )r , (H1 )r , (G0 )r , and (G1 )r to x ∈ RN as described
in theorems 6.3 and 6.5.
• The transpose of the dual DWT can be computed with an IDWT with the
kernel of the IDWT
• The transpose of the IDWT can be computed with a DWT with the kernel
of the dual DWT
• The transpose of the dual IDWT can be computed with a DWT with the
kernel of the DWT
$$(A_\lambda)_r = \begin{pmatrix}
1 & 2\lambda & 0 & 0 & \cdots & 0 & 0 & 0\\
0 & 1 & 0 & 0 & \cdots & 0 & 0 & 0\\
0 & \lambda & 1 & \lambda & \cdots & 0 & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
0 & 0 & 0 & 0 & \cdots & \lambda & 1 & \lambda\\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix} \quad (6.9)$$
$$(B_\lambda)_r = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0\\
\lambda & 1 & \lambda & 0 & \cdots & 0 & 0 & 0\\
0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 0\\
0 & 0 & 0 & 0 & \cdots & 0 & 2\lambda & 1
\end{pmatrix} \quad (6.10)$$
(S2 )f
Sr = S1 + 0 0 .
Use the proof of Theorem 4.9 as a guide.
is an orthonormal basis for the vectors of the form $\hat{x}$, with $x \in \mathbb{R}^{2N-2}$ a symmetric extension.
c) Show that
$$\left\{\frac{1}{\sqrt{2N-2}}\cos\left(2\pi\frac{0}{2N-2}k\right)\right\},\ \left\{\frac{1}{\sqrt{N-1}}\cos\left(2\pi\frac{n}{2N-2}k\right)\right\}_{n=1}^{N-2},\ \left\{\frac{1}{\sqrt{2N-2}}\cos\left(2\pi\frac{N-1}{2N-2}k\right)\right\} \quad (6.12)$$
$$\left\{\cos\left(2\pi\frac{n}{2N-2}k\right)\right\}_{n=0}^{N-1} \quad (6.13)$$
$$\sum_{k=0}^{2N-3}\cos\left(2\pi\frac{n_1}{2N-2}k\right)\cos\left(2\pi\frac{n_2}{2N-2}k\right) = (N-1)\times\begin{cases}2 & \text{if } n_1 = n_2\in\{0,N-1\}\\ 1 & \text{if } n_1=n_2\notin\{0,N-1\}\\ 0 & \text{if } n_1\ne n_2\end{cases} \quad (6.14)$$
a) Show that
$$\begin{aligned}
(N-1)\times\begin{cases}1 & \text{if } n_1=n_2\in\{0,N-1\}\\ \tfrac{1}{2} & \text{if } n_1=n_2\notin\{0,N-1\}\\ 0 & \text{if } n_1\ne n_2\end{cases}
&= \frac{1}{\sqrt{2}}\cos\left(2\pi\frac{n_1}{2N-2}\cdot 0\right)\frac{1}{\sqrt{2}}\cos\left(2\pi\frac{n_2}{2N-2}\cdot 0\right)\\
&\quad + \sum_{k=1}^{N-2}\cos\left(2\pi\frac{n_1}{2N-2}k\right)\cos\left(2\pi\frac{n_2}{2N-2}k\right)\\
&\quad + \frac{1}{\sqrt{2}}\cos\left(2\pi\frac{n_1}{2N-2}(N-1)\right)\frac{1}{\sqrt{2}}\cos\left(2\pi\frac{n_2}{2N-2}(N-1)\right).
\end{aligned}$$
$$d_{n,N}\left(\frac{1}{\sqrt{2}}\cos\left(2\pi\frac{n}{2N-2}\cdot 0\right),\ \left\{\cos\left(2\pi\frac{n}{2N-2}k\right)\right\}_{k=1}^{N-2},\ \frac{1}{\sqrt{2}}\cos\left(2\pi\frac{n}{2N-2}(N-1)\right)\right),$$
and define $d^{(I)}_{0,N} = d^{(I)}_{N-1,N} = 1/\sqrt{N-1}$, and $d^{(I)}_{n,N} = \sqrt{2/(N-1)}$ when $n > 1$.
The orthogonal $N\times N$ matrix where the rows are $d_n^{(I)}$ is called the DCT-I, and we will denote it by $D_N^{(I)}$. DCT-I is also much used, just as the DCT-II of Chapter 4. The main difference from the previous cosine vectors is that 2N has been replaced by 2N − 2.
b) Explain that the vectors $d_n^{(I)}$ are orthonormal, and that the matrix
$$\sqrt{\frac{2}{N-1}}\,\begin{pmatrix}1/\sqrt{2} & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & 1 & 0\\ 0 & 0 & \cdots & 0 & 1/\sqrt{2}\end{pmatrix}\left(\cos\left(2\pi\frac{n}{2N-2}k\right)\right)\begin{pmatrix}1/\sqrt{2} & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & 1 & 0\\ 0 & 0 & \cdots & 0 & 1/\sqrt{2}\end{pmatrix}$$
is orthogonal.
c) Explain from b) that $\left(\cos\left(2\pi\frac{n}{2N-2}k\right)\right)^{-1}$ can be written as
$$\frac{2}{N-1}\begin{pmatrix}1/2 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & 1 & 0\\ 0 & 0 & \cdots & 0 & 1/2\end{pmatrix}\left(\cos\left(2\pi\frac{n}{2N-2}k\right)\right)\begin{pmatrix}1/2 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & 1 & 0\\ 0 & 0 & \cdots & 0 & 1/2\end{pmatrix}$$
$$H = \begin{pmatrix}
1/5 & 1/5 & 1/5 & 0 & 0 & 0 & \cdots & 0 & 1/5 & 1/5\\
-1/3 & 1/3 & -1/3 & 0 & 0 & 0 & \cdots & 0 & 0 & 0\\
1/5 & 1/5 & 1/5 & 1/5 & 1/5 & 0 & \cdots & 0 & 0 & 0\\
0 & 0 & -1/3 & 1/3 & -1/3 & 0 & \cdots & 0 & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots
\end{pmatrix}$$
Write down the compact form for the corresponding filters H0, H1, and compute and plot the frequency responses. Are the filters symmetric?
c = (x[0::2] + x[1::2])/sqrt(2)
w = (x[0::2] - x[1::2])/sqrt(2)
newx = concatenate([c, w])
newx /= abs(newx).max()
play(newx,44100)
a) Comment the code and explain what happens. Which wavelet is used? What
do the vectors c and w represent? Describe the sound you believe you will hear.
b) Assume that we add lines in the code above which sets the elements in the
vector w to 0 before we compute the inverse operation. What will you hear if
you play the new sound you then get?
which return the DWT and IDWT kernels using theorems 6.3 and 6.5, respectively. These functions thus assume that the filters of the wavelet are known. The functions should call the function filterS from a). Recall also the definition of the parameter dual from this section.
With the functions defined in b) you can now define standard DWT and
IDWT kernels in the following way, once the filters are known.
$$GH\phi_n = c_n\phi_n,$$
for some constant $c_n$ (i.e. GH is a filter). If all $c_n$ are real, we say that we have no phase distortion. If GH = I (i.e. $c_n = 1$ for all n) we say that we have perfect reconstruction. If all $c_n$ are close to 1, we say that we have near-perfect reconstruction.
In signal processing, one also says that we have perfect or near-perfect reconstruction when GH equals $E_d$, or is close to $E_d$ (i.e. the overall result is a delay). The reason why a delay occurs has to do with the fact that the transforms are used in real-time processing, for which we may not be able to compute the output at a given time instance before we know some of the following samples. Clearly the delay is unproblematic, since one can still reconstruct the input from the output. We will encounter a useful example of near-perfect reconstruction soon in the MP3 standard.
Let us now find a criterion for alias cancellation: when do we have that $GH e^{2\pi irk/N}$ is a multiple of $e^{2\pi irk/N}$, for any r? We first remark that
$$H(e^{2\pi irk/N}) = \begin{cases}\lambda_{H_0,r}\,e^{2\pi irk/N} & k \text{ even}\\ \lambda_{H_1,r}\,e^{2\pi irk/N} & k \text{ odd}.\end{cases}$$
The frequency response of $H(e^{2\pi irk/N})$ is
$$\begin{aligned}
&\sum_{k=0}^{N/2-1}\lambda_{H_0,r}e^{2\pi ir(2k)/N}e^{-2\pi i(2k)n/N} + \sum_{k=0}^{N/2-1}\lambda_{H_1,r}e^{2\pi ir(2k+1)/N}e^{-2\pi i(2k+1)n/N}\\
&= \sum_{k=0}^{N/2-1}\lambda_{H_0,r}e^{2\pi i(r-n)(2k)/N} + \sum_{k=0}^{N/2-1}\lambda_{H_1,r}e^{2\pi i(r-n)(2k+1)/N}\\
&= \big(\lambda_{H_0,r} + \lambda_{H_1,r}e^{2\pi i(r-n)/N}\big)\sum_{k=0}^{N/2-1}e^{2\pi i(r-n)k/(N/2)}.
\end{aligned}$$
Clearly, $\sum_{k=0}^{N/2-1}e^{2\pi i(r-n)k/(N/2)} = N/2$ if $n = r$ or $n = r + N/2$, and 0 otherwise. The frequency response is thus the vector
$$\frac{N}{2}(\lambda_{H_0,r} + \lambda_{H_1,r})e_r + \frac{N}{2}(\lambda_{H_0,r} - \lambda_{H_1,r})e_{r+N/2},$$
so that
$$H(e^{2\pi irk/N}) = \frac{1}{2}(\lambda_{H_0,r} + \lambda_{H_1,r})e^{2\pi irk/N} + \frac{1}{2}(\lambda_{H_0,r} - \lambda_{H_1,r})e^{2\pi i(r+N/2)k/N}. \quad (6.15)$$
Let us now turn to the reverse filter bank transform. We can write
$$(e^{2\pi ir\cdot0/N}, 0, e^{2\pi ir\cdot2/N}, 0, \ldots, e^{2\pi ir(N-2)/N}, 0) = \frac{1}{2}\big(e^{2\pi irk/N} + e^{2\pi i(r+N/2)k/N}\big)$$
$$(0, e^{2\pi ir\cdot1/N}, 0, e^{2\pi ir\cdot3/N}, \ldots, 0, e^{2\pi ir(N-1)/N}) = \frac{1}{2}\big(e^{2\pi irk/N} - e^{2\pi i(r+N/2)k/N}\big).$$
$$\begin{aligned}
G(e^{2\pi irk/N}) &= \frac{1}{2}G_0\big(e^{2\pi irk/N} + e^{2\pi i(r+N/2)k/N}\big) + \frac{1}{2}G_1\big(e^{2\pi irk/N} - e^{2\pi i(r+N/2)k/N}\big)\\
&= \frac{1}{2}\big(\lambda_{G_0,r}e^{2\pi irk/N} + \lambda_{G_0,r+N/2}e^{2\pi i(r+N/2)k/N}\big) + \frac{1}{2}\big(\lambda_{G_1,r}e^{2\pi irk/N} - \lambda_{G_1,r+N/2}e^{2\pi i(r+N/2)k/N}\big)\\
&= \frac{1}{2}(\lambda_{G_0,r} + \lambda_{G_1,r})e^{2\pi irk/N} + \frac{1}{2}(\lambda_{G_0,r+N/2} - \lambda_{G_1,r+N/2})e^{2\pi i(r+N/2)k/N}. \quad (6.16)
\end{aligned}$$
Now, if we combine equations (6.15) and (6.16), we get
$$\begin{aligned}
GH(e^{2\pi irk/N}) &= \frac{1}{2}(\lambda_{H_0,r} + \lambda_{H_1,r})G(e^{2\pi irk/N}) + \frac{1}{2}(\lambda_{H_0,r} - \lambda_{H_1,r})G(e^{2\pi i(r+N/2)k/N})\\
&= \frac{1}{2}(\lambda_{H_0,r} + \lambda_{H_1,r})\Big(\frac{1}{2}(\lambda_{G_0,r} + \lambda_{G_1,r})e^{2\pi irk/N} + \frac{1}{2}(\lambda_{G_0,r+N/2} - \lambda_{G_1,r+N/2})e^{2\pi i(r+N/2)k/N}\Big)\\
&\quad + \frac{1}{2}(\lambda_{H_0,r} - \lambda_{H_1,r})\Big(\frac{1}{2}(\lambda_{G_0,r+N/2} + \lambda_{G_1,r+N/2})e^{2\pi i(r+N/2)k/N} + \frac{1}{2}(\lambda_{G_0,r} - \lambda_{G_1,r})e^{2\pi irk/N}\Big)\\
&= \frac{1}{4}\big((\lambda_{H_0,r} + \lambda_{H_1,r})(\lambda_{G_0,r} + \lambda_{G_1,r}) + (\lambda_{H_0,r} - \lambda_{H_1,r})(\lambda_{G_0,r} - \lambda_{G_1,r})\big)e^{2\pi irk/N}\\
&\quad + \frac{1}{4}\big((\lambda_{H_0,r} + \lambda_{H_1,r})(\lambda_{G_0,r+N/2} - \lambda_{G_1,r+N/2}) + (\lambda_{H_0,r} - \lambda_{H_1,r})(\lambda_{G_0,r+N/2} + \lambda_{G_1,r+N/2})\big)e^{2\pi i(r+N/2)k/N}\\
&= \frac{1}{2}(\lambda_{H_0,r}\lambda_{G_0,r} + \lambda_{H_1,r}\lambda_{G_1,r})e^{2\pi irk/N} + \frac{1}{2}(\lambda_{H_0,r}\lambda_{G_0,r+N/2} - \lambda_{H_1,r}\lambda_{G_1,r+N/2})e^{2\pi i(r+N/2)k/N}.
\end{aligned}$$
If we also replace with the continuous frequency response, we obtain the following:
Theorem 6.14. Expression for aliasing.
We have that
$$GH(e^{2\pi irk/N}) = \frac{1}{2}(\lambda_{H_0,r}\lambda_{G_0,r} + \lambda_{H_1,r}\lambda_{G_1,r})e^{2\pi irk/N} + \frac{1}{2}(\lambda_{H_0,r}\lambda_{G_0,r+N/2} - \lambda_{H_1,r}\lambda_{G_1,r+N/2})e^{2\pi i(r+N/2)k/N}. \quad (6.17)$$
$$\lambda_{H_1}(\omega) = \sum_k(H_1)_ke^{-ik\omega} = \sum_k(-1)^k\alpha^{-1}(G_0)_{k-2d}e^{-ik\omega} = \alpha^{-1}\sum_k(-1)^k(G_0)_ke^{-i(k+2d)\omega} = \alpha^{-1}e^{-2id\omega}\sum_k(G_0)_ke^{-ik(\omega+\pi)} = \alpha^{-1}e^{-2id\omega}\lambda_{G_0}(\omega+\pi).$$
We have a similar computation for λG1 (ω). We can thus state the following:
Theorem 6.16. Criteria for perfect reconstruction.
The following statements are equivalent for FIR filters H0 , H1 , G0 , G1 :
Proof. Let us prove first that equations (6.23)-(6.25) for FIR filters imply that we have perfect reconstruction. Equations (6.23)-(6.24) mean that the alias cancellation condition (6.18) is satisfied, since
which is Equation (6.19), so that equations (6.23)- (6.25) imply perfect recon-
struction. We therefore only need to prove that any set of FIR filters which give
perfect reconstruction, also satisfy these equations. Due to the calculation above,
it is enough to prove that equations (6.23)-(6.24) are satisfied. The proof of this
will wait till Section 8.1, since it uses some techniques we have not introduced
yet.
When constructing a wavelet it may be that we know one of the two pairs (G0, G1), (H0, H1), and that we would like to construct the other two. This can be achieved if we can find the constants d and α from above. If the filters are symmetric we just saw that d = 0. If G0, G1 are known, it follows from equations (6.20) and (6.21) that
$$1 = \sum_n (G_1)_n(H_1)_n = \sum_n (G_1)_n\alpha^{-1}(-1)^n(G_0)_n = \alpha^{-1}\sum_n(-1)^n(G_0)_n(G_1)_n,$$
$$1 = \sum_n (G_1)_n(H_1)_n = \sum_n \alpha(-1)^n(H_0)_n(H_1)_n = \alpha\sum_n(-1)^n(H_0)_n(H_1)_n,$$
so that $\alpha = 1/\big(\sum_n(-1)^n(H_0)_n(H_1)_n\big)$. Let us use these observations to state the filters for the alternative wavelet of piecewise linear functions, which is the only wavelet we have gone through for which we have not yet computed the filters and the frequency response.
Let us use Theorem 6.16 to compute the filters H0 and H1 for the alternative
piecewise linear wavelet. These filters are also symmetric, since G0 , G1 were. We
get that
$$\alpha = \sum_n(-1)^n(G_0)_n(G_1)_n = \frac{1}{2}\left(-\frac{1}{2}\cdot\left(-\frac{1}{4}\right) + 1\cdot\frac{3}{4} - \frac{1}{2}\cdot\left(-\frac{1}{4}\right)\right) = \frac{1}{2}.$$
We now get
so that
$$H_0 = \sqrt{2}\{-1/8, 1/4, 3/4, 1/4, -1/8\}$$
$$H_1 = \sqrt{2}\{-1/2, 1, -1/2\}. \quad (6.27)$$
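As a small numerical check added here for illustration, the filters (6.7) and (6.27) can be verified against the perfect reconstruction and alias cancellation requirements, assuming these take the forms $\lambda_{H_0}(\omega)\lambda_{G_0}(\omega)+\lambda_{H_1}(\omega)\lambda_{G_1}(\omega)=2$ and that the second term in (6.17) vanishes:

import numpy as np

def freq_resp(coeffs, center, omega):
    # Frequency response sum_k t_k e^{-ik*omega}, where coeffs[j] = t_{j-center}.
    return sum(c * np.exp(-1j * (j - center) * omega) for j, c in enumerate(coeffs))

omega = np.linspace(0, 2 * np.pi, 1000)
lG0 = freq_resp(np.array([1/2, 1, 1/2]) / np.sqrt(2), 1, omega)
lG1 = freq_resp(np.array([-1/8, -1/4, 3/4, -1/4, -1/8]) / np.sqrt(2), 2, omega)
lH0 = freq_resp(np.sqrt(2) * np.array([-1/8, 1/4, 3/4, 1/4, -1/8]), 2, omega)
lH1 = freq_resp(np.sqrt(2) * np.array([-1/2, 1, -1/2]), 1, omega)
print(np.allclose(lH0 * lG0 + lH1 * lG1, 2))        # True: perfect reconstruction

lG0s = freq_resp(np.array([1/2, 1, 1/2]) / np.sqrt(2), 1, omega + np.pi)
lG1s = freq_resp(np.array([-1/8, -1/4, 3/4, -1/4, -1/8]) / np.sqrt(2), 2, omega + np.pi)
print(np.allclose(lH0 * lG0s - lH1 * lG1s, 0))      # True: the aliasing term vanishes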
Note that, even though conditions (6.23) and (6.24) together ensure that the
alias cancellation condition is satisfied, alias cancellation can occur also if these
conditions are not satisfied. Conditions (6.23) and (6.24) thus give a stronger
requirement than alias cancellation. We will be particularly concerned with
wavelets where the filters are symmetric, for which we can state the following
corollary.
Corollary 6.17. Criteria for perfect reconstruction.
The following statements are equivalent:
This shows that G1 is symmetric about −2d, in addition to being symmetric about 0 (by assumption). We must thus have that d = 0, so that $(H_1)_n = (-1)^n\alpha(G_0)_n$ and $(G_1)_n = (-1)^n\alpha^{-1}(H_0)_n$. We now get that
$$\lambda_{H_1}(\omega) = \sum_k(H_1)_ke^{-ik\omega} = \alpha^{-1}\sum_k(-1)^k(G_0)_ke^{-ik\omega} = \alpha^{-1}\sum_ke^{-ik\pi}(G_0)_ke^{-ik\omega} = \alpha^{-1}\sum_k(G_0)_ke^{-ik(\omega+\pi)} = \alpha^{-1}\lambda_{G_0}(\omega+\pi),$$
$$2 = \lambda_{H_0}(\omega)\lambda_{G_0}(\omega) + \lambda_{H_1}(\omega)\lambda_{G_1}(\omega) = \lambda_{H_0}(\omega)^2 + \lambda_{H_0}(\omega+\pi)^2. \quad (6.32)$$
The perfect reconstruction condition for an alternative QMF filter bank can be written as
$$2 = \lambda_{H_0}(\omega)\lambda_{G_0}(\omega) + \lambda_{H_1}(\omega)\lambda_{G_1}(\omega) = \lambda_{H_0}(\omega)\overline{\lambda_{H_0}(\omega)} + \lambda_{H_0}(\omega+\pi)\overline{\lambda_{H_0}(\omega+\pi)} = |\lambda_{H_0}(\omega)|^2 + |\lambda_{H_0}(\omega+\pi)|^2.$$
We see that the perfect reconstruction properties of the two definitions of QMF filter banks differ only in that the latter takes absolute values. It turns out that the latter also has many interesting solutions, as we will see in Chapter 7. If we in condition (6.23) substitute G0 = (H0)T we get
• ...
• $z_{iM+(M-1)} = (H_{M-1}x)_{iM+(M-1)}$ for any i so that $0 \le iM+(M-1) < N$.
$$x = G_0z_0 + G_1z_1 + \cdots + G_{M-1}z_{M-1}. \quad (6.34)$$
G0 , G1 , . . . , GM −1 are also called synthesis filter components.
Again, this generalizes the IDWT and its synthesis filters, and the IDWT
can be seen as a 2-channel reverse filter bank transform. Also, in the matrix of
a reverse filter bank transform, the columns repeat cyclically with period M ,
similarly to MRA-matrices. Also in this more general setting the filters Gi are
in general different from the filters Hi . But we will see that, just as we saw
for the Haar wavelet, there are important special cases where the analysis and
synthesis filters are equal, and where their frequency responses are simply shifts
of oneanother. It is clear that definitions 6.21 and 6.22 give the diagram for
computing forward and reverse filter bank transforms shown in Figure 6.7:
Here ↓M and ↑M means that we extract every M ’th element in the vector,
and add M − 1 zeros between the elements, respectively, similarly to how we
previously defined ↓2 and ↑2 . Comparing Figure 6.3 with Figure 6.7 makes the
similarities between wavelet transformations and the transformation used in
the MP3 standard very visible: Although the filters used are different, they are
subject to the same kind of processing, and can therefore be subject to the same
implementations.
In general it may be that the synthesis filters do not invert exactly the
analysis filters. If the synthesis system exactly inverts the analysis system, we
say that we have a perfect reconstruction filter bank. Since the analysis system
introduces undesired frequencies in the different channels, these have to cancel
in the inverse transform, in order to reconstruct the input exactly.
We will have use for the following simple connection between forward and
reverse filter bank transforms, which follows imemdiately from the definitions.
Theorem 6.23. Connection between forward and reverse filter bank transforms.
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 243
↓M ↑M
HE 0 x / ziM / z0
↓M ↑M
H / ziM +1 / z1
= 1x
G0
G1
.. .. ..
x . . . ⊕ /x
>F
GM −2
! ↓M ↑M
HM −2 x / ziM +(M −2) / z M −2 GM −1
↓M ↑M
HM −1 x / ziM +(M −1) / z M −1
The standard does not motivate these steps, and does not put them into the
filter bank transform framework which we have established. Also, the standard
does not explain how the values in the vector C have been constructed.
Let us start by proving that the steps above really corresponds to applying a
forward filter bank transform, and let us state the corresponding filters of this
transform. The procedure computes 32 outputs in each iteration, and each of
them is associated with a subband. Therefore, from the standard we would guess
that we have M = 32 channels, and we would like to find the corresponding 32
filters H0 , H1 , . . . , H31 .
It may seem strange to use the name matrixing here, for something which
obviously is matrix multiplication. The reason for this name must be that the
at the origin of the procedure come from outside a linear algebra framework.
The name windowing is a bit strange, too. This really does not correspond to
applying a window to the sound samples as we explained in Section 3.3.1. We
will see that it rather corresponds to applying a filter coefficient to a sound
sample. A third and final thing which seems a bit strange is that the order of the
input samples is reversed, since we are used to having the first sound samples
in time with lowest index. This is perhaps more usual to do in an engineering
context, and not so usual in a mathematical context. FIFO.
Clearly, the procedure above defines a linear transformation, and we need to
show that this linear transformation coincides with the procedure we defined for a
forward filter bank transform, for a set of 32 filters. The input to the transforma-
tion are the audio samples, which we will denote by a vector x. At iteration s of
the procedure above the input audio samples are x32s−512 , x32s−510 , . . . , x32s−1 ,
and Xi = x32s−i−1 due to the reversal of the input samples. The output to the
transformation at iteration s of the procedure are the S0 , . . . , S31 . We assem-
ble these into a vector z, so that the output at iteration s are z32(s−1) = S0 ,
z32(s−1)+1 = S1 ,. . . ,z32(s−1)+31 = S31 .
We will have use for the following cosine-properties, which are easily verified:
With the terminology above and using Property (6.35) the transformation can
be written as
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 245
63
X 63
X 7
X
z32(s−1)+n = cos((2n + 1)(k − 16)π/64)Yk = cos((2n + 1)(k − 16)π/64) Zk+64j
k=0 k=0 j=0
63 X
X 7
= (−1)j cos((2n + 1)(k + 64j − 16)π/64)Zk+64j
k=0 j=0
63 X
X 7
= cos((2n + 1)(k + 64j − 16)π/64)(−1)j Ck+64j Xk+64j
k=0 j=0
63 X
X 7
= cos((2n + 1)(k + 64j − 16)π/64)(−1)j Ck+64j x32s−(k+64j)−1 .
k=0 j=0
j
Now, if we define {hr }511
r=0 by hk+64j = (−1) Ck+64j , 0 ≤ j < 8, 0 ≤ k < 64, and
h(n) as the filter with coefficients {cos((2n + 1)(k − 16)π/64)hk }511
k=0 , the above
can be simplified as
511
X 511
X
z32(s−1)+n = cos((2n + 1)(k − 16)π/64)hk x32s−k−1 = (h(n) )k x32s−k−1
k=0 k=0
(n) (n)
= (h x)32s−1 = (En−31 h x)32(s−1)+n .
This means that the output of the procedure stated in the MP3 standard can
be computed as a forward filter bank transform, and that we can choose the
analysis filters as Hn = En−31 h(n) .
Theorem 6.24. Forward filter bank transform for the MP3 standard.
j
Define {hr }511
r=0 by hk+64j = (−1) Ck+64j , 0 ≤ j < 8, 0 ≤ k < 64, and h
(n)
511
as the filter with coefficients {cos((2n + 1)(k − 16)π/64)hk }k=0 . If we define
Hn = En−31 h(n) , the procedure stated in the MP3 standard corresponds to
applying the corresponding forward filter bank transform.
The filters Hn were shown in Example 3.34 as examples of filters which
concentrate on specific frequency ranges. The hk are the filter coefficients of
what is called a prototype filter. This kind of filter bank is also called a cosine-
modulated filter. The multiplication with cos (2π(n + 1/2)(k − 16)/(2N )) hk ,
modulated the filter coefficients so that the new filter has a frequency response
which is simply shifted in frequency in a symmetric manner: In Exercise 3.44,
we saw that, by multiplying with a cosine, we could contruct new filters with
real filter coefficients, which also corresponded to shifting a prototype filter in
frequency. Of course, multiplication with a complex exponential would also shift
the frequency response (such filter banks are called DFT-modulated filter banks),
but the problem with this is that the new filter has complex coefficients: It will
turn out that cosine-modulated filter banks can also be constructed so that they
are invertible, and that one can find such filter banks where the inverse is easily
found.
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 246
The effect of the delay in the definition of Hn is that, for each n, the
multiplications with the vector x are “aligned”, so that we can save a lot of
multiplications by performing this multiplication first, and summing these. We
actually save even more multiplications in the sum where j goes from 0 to 7, since
we here multiply with the same cosines. The steps defined in the MP3 standard
are clearly motivated by the desire to reduce the number of multiplications due
to these facts. A simple arithmetic count illutrates these savings: For every 32
output samples, we have the following number of multiplications:
The standard says nothing about how the matrix multiplication in the third
step can be implemented. A direct multiplication would yield 32 × 64 = 2048
multiplications, leaving a total number of multiplications at 2560. In a direct
implementation of the forward filter bank transform, the computation of 32
samples would need 32 × 512 = 16384 multiplications, so that the procedure
sketched in the standard gives a big reduction.
The standard does not mention all possibilities for saving multiplications,
however: We can reduce the number of multiplications even further, since clearly
a DCT-type implementation can be used for the matrixing operation. We already
have an efficient implementation for multiplication with a 32 × 32 type-III cosine
matrix (this is simply the IDCT). We have seen that this implementation can
be chosen to reduce the number of multiplications to N log2 N/2 = 80, so that
the total number of multiplications is 512 + 80 = 592. Clearly then, when we
use the DCT, the first step is the computationally most intensive part.
15
X 15
X
x32(s−1)+j = W32i+j = D32i+j U32i+j
i=0 i=0
7
X 7
X
= D64i+j U64i+j + D64i+32+j U64i+32+j
i=0 i=0
7
X 7
X
= D64i+j V128i+j + D64i+32+j V128i+96+j . (6.37)
i=0 i=0
V64r z32(s−r−1)
V64r+1 z32(s−r−1)+1
.. = N ,
..
. .
V64r+63 z32(s−r−1)+31
so that
31
X
V64r+j = cos((16 + j)(2k + 1)π/64)z32(s−r−1)+k
k=0
7
X 31
X
D64i+j cos((16 + j)(2k + 1)π/64)z32(s−2i−1)+k
i=0 k=0
7
X 31
X
+ D64i+32+j cos((16 + j + 32)(2k + 1)π/64)z32(s−2i−2))+k .
i=0 k=0
31 X
X 7
(−1)i D64i+j cos((16 + 64i + j)(2k + 1)π/64)z32(s−2i−1)+k
k=0 i=0
31 X
X 7
+ (−1)i D64i+32+j cos((16 + 64i + j + 32)(2k + 1)π/64)z32(s−2i−2)+k .
k=0 i=0
i
Now, if we define {gr }511
r=0 by g64i+s = (−1) C64i+s , 0 ≤ i < 8, 0 ≤ s < 64, and
g as the filter with coefficients {cos((r + 16)(2k + 1)π/64)gr }511
(k)
r=0 , the above
can be simplified as
X 7
31 X 31 X
X 7
(g (k) )64i+j z32(s−2i−1)+k + (g (k) )64i+j+32 z32(s−2i−2)+k
k=0 i=0 k=0 i=0
31 7 7
!
X X X
(k)
= (g )32(2i)+j z32(s−2i−1)+k + (g (k) )32(2i+1)+j z32(s−2i−2)+k
k=0 i=0 i=0
31 X
X 15
= (g (k) )32r+j z32(s−r−1)+k ,
k=0 r=0
where we observed that 2i and 2i + 1 together run through the values from 0 to
15 when i runs from 0 to 7. Since z has the same values as zk on the indices
32(s − r − 1) + k, this can be written as
31 X
X 15
= (g (k) )32r+j (zk )32(s−r−1)+k
k=0 r=0
31
X 31
X
(k)
= (g zk )32(s−1)+j+k = ((E−k g (k) )zk )32(s−1)+j .
k=0 k=0
P31 (k)
By substituting a general s and j we see that x = k=0 (E−k g )zk . We have
thus proved the following.
Theorem 6.25. Reverse filter bank transform for the MP3 standard.
i
Define {gr }511
r=0 by g64i+s = (−1) C64i+s , 0 ≤ i < 8, 0 ≤ s < 64, and g
(k)
511
as the filter with coefficients {cos((r + 16)(2k + 1)π/64)gr }r=0 . If we define
Gk = E−k g (k) , the procedure stated in the MP3 standard corresponds to applying
the corresponding reverse filter bank transform.
In other words, both procedures for encoding and decoding stated in the
MP3 standard both correspond to filter banks: A forward filter bank transform
for the encoding, and a reverse filter bank transform for the decoding. Moreover,
both filter banks can be constructed by cosine-modulating prototype filters, and
the coefficients of these prototype filters are stated in the MP3 standard (up to
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 249
multiplication with an alternating sign). Note, however, that the two prototype
filters may be different. When we compare the two tables for these coefficients in
the standard they do indeed seem to be different. At closer inspection, however,
one sees a connection: If you multiply the values in the D-table with 32, and
reverse them, you get the values in the C-table. This indicates that the analysis
and synthesis prototype filters are the same, up to multiplication with a scalar.
This connection will be explained in Section 8.3.
While the steps defined in the MP3 standard for decoding seem a bit more
complex than the steps for encoding, they are clearly also motivated by the
desire to reduce the number of multiplications. In both cases (encoding and
decoding), the window tables (C and D) are in direct connection with the filter
coefficients of the prototype filter: one simply adds a sign which alternates for
every 64 elements. The standard document does not mention this connection,
and it is perhaps not so simple to find this connection in the literature (but see
[35]).
The forward and reverse filter bank transforms are clearly very related. The
following result clarifies this.
Theorem 6.26. Connection between the forward and reverse filter bank trans-
forms in the MP3 standard.
Assume that a forward filter bank transform has filters on the form Hi =
Ei−31 h(i) for a prototype filter h. Then G = E481 H T is a reverse filter bank
transform with filters on the form Gk = E−k g (k) , where g is a prototype filter
where the elements equal the reverse of those in h. Vice versa, H = E481 GT .
Proof. From Theorem 6.23 we know that H T is a reverse filter bank transform
with filters
Now, we define the prototype filter g with elements gk = h−(k−512) . This has,
just as h, its support on [1, 511], and consists of the elements from h in reverse
order. If we define g (i) as the filter with coefficients cos((2i + 1)(k + 16)π/64))gk ,
we see that E481 H T is a reverse filter bank transform with filters E−i g (i) . Since
g (k) now has been defined as for the MP3 standard, and its elements are the
reverse of those in h, the result follows.
We will have use for this result in Section 8.3, when we find conditions on
the protototype filter in order for the reverse transform to invert the forward
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 250
transform. Preferably, the reverse filter bank transform inverts exactly the
forward filter bank transform. In Exercise 6.26 we construct examples which
show that this is not the case. In the same exercise we also find many examples
where the reverse transform does what we would expect. These examples will
also be explained in Section 8.3, where we also will see how one can get around
this so that we obtain a system with perfect reconstruction. It may seem strange
that the MP3 standard does not do this.
In the MP3 standard, the output from the forward filter bank transform is
processed further, before the result is compressed using a lossless compression
method.
6.4 Summary
We started this chapter by noting that, by reordering the target base of the
DWT, the change of coordinate matrix took a particular form. From this form
we understood that the DWT could be realized in terms of two filters H0 and
H1 , and that the IDWT could be realized in a similar way in terms of two filters
G0 and G1 . This gave rise to what we called the filter representation of wavelets.
The filter representation gives an entirely different view on wavelets: instead of
constructing function spaces with certain properties and deducing corresponding
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 251
filters from these, we can instead construct filters with certain properties (such
as alias cancellation and perfect reconstruction), and attempt to construct
corresponding mother wavelets, scaling functions, and function spaces. This
strategy, which replaces problems from function theory with discrete problems,
will be the subject of the next chapter. In practice this is what is done.
We stated what is required for filter bank matrices to invert each other: The
frequency responses of the lowpass filters needed to satisfy a certain equation,
and once this is satsified the highpass filters can easily be obtained in the same
way we previously obtained highpass filters from lowpass filters. We will return
to this equation in the next chapter.
A useful consequence of the filter representation was that we could reuse
existing implementations of filters to implement the DWT and the IDWT, and
reuse existing theory, such as symmetric extensions. For wavelets, symmetric
extensions are applied in a slightly different way, when compared to the develop-
ments which lead to the DCT. We looked at the frequency responses of the filters
for the wavelets we have encountered upto now. From these we saw that G0 , H0
were lowpass filters, and that G1 , H1 were highpass filters, and we argued why
this is typically the case for other wavelets as well. The filter reprersentation
was also easily generalized from 2 to M > 2 filters, and such transformations
had a similar interpretation in terms of splitting the input into a uniform set of
frequencies. Such transforms were generally called filter bank transforms, and
we saw that the processing performed by the MP3 standard could be interpreted
as a certain filter bank transform, called a cosine-modulated filter bank. This
is just one of many possible filter banks. In fact, the filter bank of the MP3
standard is largely outdated, since it is too simple, and as we will see it does not
even give perfect reconstruction (only alias cancellation and no phase distortion).
It is merely chosen here since it is the simplest to present theoretically, and
since it is perhaps the best known standard for compression of sound. Other
filters banks with better properties have been constructed, and they are used in
more recent standards. In many of these filter banks, the filters do not partition
frequencies uniformly, and have been adapted to the way the human auditory
system handles the different frequencies. Different contruction methods are used
to construct such filter banks. The motivation behind filter bank transforms is
that their output is more suitable for further processing, such as compression, or
playback in an audio system, and that they have efficient implementations.
We mentioned that the MP3 standard does not say how the prototype filters
were chosen. We will have more to say on what dictates their choice in Section 8.3.
There are several differences between the use of wavelet transformations
in wavelet theory, and the use of filter bank transforms in signal processing
theory One is that wavelet transforms are typically applied in stages, while filter
bank transforms often are not. Nevertheless, such use of filter banks also has
theoretical importance, and this gives rise to what is called tree-structured filter
banks [47]. Another difference lies in the use of the term perfect reconstruction
system. In wavelet theory this is a direct consequence of the wavelet construction,
since the DWT and the IDWT correspond to change of coordinates to and from
the same bases. The alternative QMF filter bank was used as an example
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 252
of a filter bank which stems from signal processing, and which also shows
up in wavelet transformation. In signal processing theory, one has a wider
perspective, since one can design many useful systems with fast implementations
when one replaces the perfect reconstruction requirement with a near perfect
reconstruction requirement. One instead requires that the reverse transform
gives alias cancellation. The classical QMF filter banks were an example of this.
The original definition of classical QMF filter banks are from [9], and differ only
in a sign from how they are defined here.
All filters we encounter in wavelets and filter banks in this book are FIR.
This is just done to limit the exposition. Much useful theory has been developed
using IIR-filters.
Constructing interesting
wavelets
253
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 254
2N
X −1
φ(t) = (G0 )n,0 φ1,n (t) (7.1)
n=0
2N
X −1
ψ(t) = (G1 )n,1 φ1,n (t) (7.2)
n=0
!
Z ∞ Z ∞ √
1 −iωt 1 X
φ̂(ω) = √ φ(t)e dt = √ (G0 )n,0 2φ(2t − n) e−iωt dt
2π −∞ 2π −∞ n
XZ ∞
1 −iω(t+n)/2
=√ √ (G0 )n,0 φ(t)e dt
2 2π n −∞
! Z ∞
1 X 1 λG0 (ω/2)
=√ (G0 )n,0 e−iωn/2 √ φ(t)e−i(ω/2)t) dt = √ φ̂(ω/2).
2 n
2π −∞ 2
(7.3)
Clearly this expression can be continued recursively. We can thus state the
following result.
Theorem 7.1. gN .
Define
N
Y λG0 (ω/2s )
gN (ω) = √ χ[0,2π] (2−N ω). (7.4)
s=1
2
λG0 (2πν/2s+1 )
ln √
2
.
λG0 (2πν/2s )
ln √
2
P √ √ √
Since n (G0 )n = 2, we see that λG0 (0) = 2. Since limν→0 λG0 (ν) = 2,
both the numerator and the denominator above tends
to 0 (to one inside the
λG (ν/2)
ln 0√
2
logarithms), so that we can use L’hospital’s rule on λ to obtain
G0 (ν)
ln √
2
−inν/2
P
λG0 (ν) n (G0 )n (−in)e /2 1
P −inν
→ <1
λG0 (ν/2) n (G )
0 n (−in)e 2
as ν → 0. It follows that the product converges for any ν. Clearly the conver-
gence is absolute and uniform on compact sets, so that the limit is infinitely
differentiable.
! Z ∞
1 X
−iωn/2 1
ψ̂(ω) = √ (G1 )n−1,0 e √ φ(t)e−i(ω/2)t) dt
2 n
2π −∞
!
1 X λG1 (ω/2)
=√ (G1 )n,0 e−iω(n+1)/2 φ̂(ω/2) = e−iω/2 √ φ̂(ω/2).
2 n
2
X
Akf k2 ≤ |hf, un i|2 ≤ Bkf k2 .
n
If A = B, the frame is said to be tight.
Note that, for a frame of H, any f ∈ H is uniquely characterized by the
inner products hf, un i. Indeed, if both a, b ∈ H have the same inner products,
then a − b ∈ H have inner products 0, which implies that a = b from the left
inequality.
For every frame one can find a dual frame {u˜n }n which satisfies
1 X 1
kf k2 ≤ |hf, u˜n i|2 ≤ kf k2 ,
B n
A
and
X X
f= hf, un iu˜n = hf, u˜n iun . (7.5)
n n
L L̃
1 + e−iω 1 + e−iω
λG0 (ω) λH0 (ω)
√ = F(ω) √ = F̃(ω), (7.6)
2 2 2 2
where F and F̃ are trigonometric polynomials of finite degree. Assume also that,
for some k, k̃ > 0,
1/k
Bk = max F(ω) · · · F(2k−1 ω) < 2L−1/2 (7.7)
ω
1/k̃
B˜k = max F̃(ω) · · · F̃(2k̃−1 ω) < 2L̃−1/2 (7.8)
ω
If also
then
The proof for Proposition 7.5 is long, technical, and split in many stages.
The entire proof can be found in [7], and we will not go through all of it, only
address some simple parts of it in the following subsections. After that we will
see how we can find G0 , H0 so that equations (7.6), (7.7), (7.8) are fulfilled.
Before we continue on this path, several comments are in order.
1. The paper [7] much more general conditions for when filters give rise to a
Riesz basis as stated here. The conditions (7.7), (7.8) are simply chosen because
they apply or the filters we consider.
2. From Equation (7.6) it follows that the flatness in the frequency responses
close to π explains how good the bases are for approximations, since the number
of vanishing moments is infered from the multiplictity of the zero at π for the
frequency response.
3. From the result we obtain an MRA (with scaling function φ), and a dual
MRA (with scaling function φ̃), as well as mother wavelets (ψ and ψ̃), and we
can define the resolution spaces Vm and the detail spaces Wm as before, as well
as the “dual resolution spaces” Ṽm , (the spaces spanned by φ̃m = {φ̃m,n }n ) and
“dual detail spaces” W̃m (the spaces spanned by ψ̃m = {ψ̃m,n }n ). In general
Vm is different from V˜m (except when φ = φ̃), and Wm is in general different
from the orthogonal complement of Vm−1 in Vm (except when φ = φ̃, when all
bases are orthonormal), although constructed so that Vm = Vm−1 ⊕ Wm−1 . Our
construction thus involves two MRA’s
4. The DWT and IDWT are defined as before, so that the same change
of coordinates can be applied, as dictated by the filter coefficients. As will
be seen below, while proving Proposition 7.5 it also follows that the bases
φ0 ⊕ ψ0 ⊕ ψ1 · · · ψm−1 and φ̃0 ⊕ ψ̃0 ⊕ ψ̃1 · · · ψ̃m−1 are biorthogonal (in addition
to that φm and φ̃m are biorthogonal, as stated). For f ∈ Vm this means that
X X X
f (t) = hf (t), φ̃m,n iφm,n = hf (t), φ̃0,n iφ0,n + hf (t), ψ̃m0 ,n iψm0 ,n ,
n n m0 <m,n
since this relationship is fulfilled for any linear combination of the {φm,n }n , or
for any of the {φ0,n , ψm0 ,n }m0 <m,n , due to biorthogonality. Similarly, for f˜ ∈ Ṽm
X X X
f˜(t) = hf˜(t), φm,n iφ̃m,n = hf˜(t), φ0,n iφ̃0,n + hf˜(t), ψm0 ,n iψ̃m0 ,n .
n n m0 <m,n
It follows that for f ∈ Vm and for f˜ ∈ Ṽm the DWT and the IDWT and their
duals can be expressed in terms of inner products as follows.
• The input to the DWT is cm,n = hf, φ̃m,n i. The output of the DWT is
c0,n = hf, φ̃0,n i and wm0 ,n = hf, ψ̃m0 ,n i
• The input to the dual DWT is c̃m,n = hf˜, φm,n i. The output of the dual
DWT is c̃0,n = hf˜, φ0,n i and w̃m0 ,n = hf˜, ψm0 ,n i.
• in the DWT matrix, column k has entries hφ1,k , φ̃0,l i, and hφ1,k , ψ̃0,l i (with
a similar expression for the dual DWT).
• in the IDWT matrix, column 2k has entries hφ0,k , φ̃1,l i, and column 2k + 1
has entries hψ0,k , φ̃1,l i (with a similar expression for the dual IDWT).
Equation (7.9) comes from eliminating the φm,n by letting m → ∞.
5. When φ = φ̃ (orthonormal MRA’s), the approximations (finite sums)
above coincide with projections onto the spaces Vm , Ṽm , Wm , W̃m . When φ 6= φ̃,
however, there are no reasons to believe that these approximations equal the best
approximations to f from Vm . In this case we have no procedure for computing
best approximations. When f is not in Vm , Ṽm we can, however, consider the
approximations
X X
hf (t), φ̃m,n iφm,n (t) ∈ Vm and hf (t), φm,n iφ̃m,n (t) ∈ Ṽm
n n
(when the MRA is orthonormal, this coincides Pwith the best approximation).
Now, we can choose m so large that f (t) = n cn φm,n (t) + (t), with (t) a
small function. The first approximation can now be written
XX X X
h cn0 φm,n0 (t) + (t), φ̃m,n iφm,n (t) = cn φm,n (t) + h(t), φ̃m,n iφm,n (t)
n n0 n n
X
= f (t) + h(t), φ̃m,n iφm,n (t) − (t).
n
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 259
P
Clearly, the difference n h(t), φ̃m,n iφm,n (t) − (t) from f is small. It may.
however, be hard to compute the cn above, so that instead, as in Theorem 5.33,
−m
one uses R N 2 f (n/2m )φm,n (t) as an approximation to f (i.e. use sample
φm,0 (t)dt
0
values as cn ) also in this more general setting.
6. Previously we were taught to think in a periodic or folded way, so that we
−1
could restrict to an interval [0, N ], and to bases of finite dimensions ({φ0,n }N
n=0 ).
But the results above are only stated for wavelet bases of infinite dimension. Let
us therefore say something on how the results carry over to our finite dimensional
setting. If f ∈ L2 (R) we can define the function
X X X
f per (t) = f (t + kN ) f f old (t) = f (t + 2kN ) + f (2kN − t).
k k k
f per and f f old are seen to be periodic with periods N and 2N . It is easy to see
that the restriction of f per to [0, N ] is in L2 ([0, N ]), and that the restriction of
f f old to [0, 2N ] is in L2 ([0, 2N ]). In [6] it is shown that the result above extends
f old
to a similar result for the periodized/folded basis (i.e. ψm,n ), so that we obtain
dual Riesz bases for L2 ([0, N ]) and L2 ([0, 2N ]) instead of L2 (R). The result on
the vanishing moments does not extend, however. One can, however, alter some
of the basis functions so that one achieves this. This simply changes some of the
columns in the DWT/IDWT matrices. Note that our extension strategy is not
optimal. The extension is usually not differentiable at the boundary, so that the
corresponding wavelet coefficients may be large, even though the wavelet has
many vanishing moments. The only way to get around this would be to find an
extension strategy which gave a more regular extension. However, natural images
may not have high regularity, which would make such an extension strategy
useless.
X √ X √
uN +1 (t) = (G0 )n,0 2uN (2t − n) vN +1 (t) = (H0 )0,n 2vN (2t − n).
n n
Now, note that g0 (ω) = h0 (ω) = χ[0,1] (ω). Since hu0 , v0 i = hg0 , h0 i we get that
Z −∞ Z −∞ Z 2π
u0 (t)v0 (t − k)dt = g0 (ν)h0 (ν)e2πikν dν = e−2πikν dν = δk,0 .
∞ ∞ 0
Now assume that we have proved that huN (t), vN (t − k)i = δk,0 . We then get
that
X
huN +1 (t), vN +1 (t − k)i = 2 (G0 )n1 ,0 (H0 )0,n2 huN (2t − n1 ), vN (2(t − k) − n2 )i
n1 ,n2
X
=2 (G0 )n1 ,0 (H0 )0,n2 huN (t), vN (t + n1 − n2 − 2k)i
n1 ,n2
X X
= (G0 )n1 ,0 (H0 )0,n2 = (H0 )0,n−2k (G0 )n,0
n1 ,n2 |n1 −n2 =2k n
X X
= (H0 )2k,n (G0 )n,0 = H2k,n Gn,0 = (HG)2k,0 = I2k,0 = δk,0
n n
where = L − 1/2 − log Bk / log 2 > 0 due to Assumption (7.7). In the paper
it is proved that this condition implies that the bases constitute dual frames.
The biorthogonality is used to show that they also are dual Riesz bases (i.e. that
they also are linearly independent).
X
hψ0,k , ψ̃0,l i = (G1 )n1 ,1 (H1 )1,n2 hφ1,n1 +2k (t)φ̃1,n2 +2l (t)i
n1 ,n2
X X X
= (G1 )n,1 (H1 )1,n+2(k−l) = (H1 )1+2(l−k),n (G1 )n,1 = H1+2(l−k),n Gn,1
n n n
= (HG)1+2(l−k),1 = δk,0 .
Similarly,
X X
hψ0,k φ̃0,l i = (G1 )n1 ,1 (H0 )0,n2 hφ1,n1 +2k (t)φ̃1,n2 +2l (t)i = (G1 )n,1 (H0 )0,n+2(k−l)
n1 ,n2 n
X X
= (H0 )2(l−k),n (G1 )n,1 = H2(l−k),n Gn,1 = (HG)2(l−k),1 = 0
n n
X X
hφ0,k ψ̃0,l i = (G0 )n1 ,0 (H1 )1,n2 hφ1,n1 +2k (t)φ̃1,n2 +2l (t)i = (G0 )n,0 (H1 )1,n+2(k−l)
n1 ,n2 n
X X
= (H1 )1+2(l−k),n (G0 )n,0 = H1+2(l−k),n Gn,0 = (HG)1+2(l−k),0 = 0.
n n
Regularity and vanishing moments. Now assume also that Bk < 2L−1−m ,
so that log Bk < L − 1 − m. We have that = L − 1/2 − log Bk / log 2 > L − 1/2 −
L + 1 + m = m + 1/2, so that |φ̂(ω)| < C(1 + |ω|)−1/2− = C(1 + |ω|)−m−1−δ
for some δ > 0. This implies that φ̂(ω)(1 + |ω|)m < C(1 + |ω|)−1−δ ∈ L1 . An
important property of the Fourier transform is that φ̂(ω)(1 + |ω|)m ∈ L1 if and
only if φ is m times differentiable. This property implies that φ, and thus ψ is
m times differentiable. Similarly, φ̃, ψ̃ are m̃ times differentiable.
In [7] it is also proved that if
then ψ̃ has m + 1 vanisning moments. In our case we have that ψ and ψ̃ have
compact support, so that these conditions are satisfied. It follows that ψ̃ has
m + 1 vanisning moments.
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 262
In the next section we will construct a wide range of forward and reverse
filter bank transforms which invert each other, and which give rise to wavelets.
In [7] one checks that many of these wavelets satisfy (7.7) and (7.8) (implying
that they give rise to dual Riesz bases for L2 (R)), or the more general (7.10)
(implying a certain regularity and a certain number of vanishing moments).
Requirements on the filters lengths in order to obtain a given number of vanishing
moments are also stated.
m increases. This means that, with relatively few ψm,n , we can create good
approximations of f .
In this section we will address a property which the mother wavelet must
fulfill in order to be useful in this respect. To motivate this property, let us first
use decompose f ∈ Vm as
r
N
X −1 m−1
X 2X N −1
f= hf, φ̃0,n iφ0,n + hf, ψ̃r,n iψr,n . (7.12)
n=0 r=0 n=0
r
N
X −1 m−1
X 2X N −1
f= hPs + Qs , φ̃0,n iφ0,n + hPs + Qs , ψ̃r,n iψr,n
n=0 r=0 n=0
r r
N
X −1 m−1
X 2X N −1 m−1
X 2X N −1
= hPs + Qs , φ̃0,n iφ0,n + hPs , ψ̃r,n iψr,n + hQs , ψ̃r,n iψr,n
n=0 r=0 n=0 r=0 n=0
r
N
X −1 m−1
X 2X N −1
= hf, φ̃0,n iφ0,n + hQs , ψ̃r,n iψr,n .
n=0 r=0 n=0
Here the first sum lies in V0 . We see that the wavelet coefficients from Wr are
hQs , ψ̃r,n i, which are very small since Qs is small. This means that the detail in
the different spaces Wr is very small, which is exactly what we aimed for. Let
us summarize this as follows:
Theorem 7.7. Vanishing moments.
If a function f ∈ Vm is r times differentiable, and ψ̃ has r vanishing mo-
ments, then f can be approximated well from V0 . Moreover, the quality of this
approximation improves when r increases.
Having many vanishing moments is thus very good for compression, since
the corresponding wavelet basis is very efficient for compression. In particular,
if f is a polynomial of degree less than or equal to k − 1 and ψ̃ has k vanishing
moments, then the detail coefficients wm,n are exactly 0. Since (φ, ψ) and (φ̃,
ψ̃) both are wavelet bases, it is equally important for both to have vanishing
moments. We will in the following concentrate on the number of vanishing
moments of ψ.
RN
The Haar wavelet has one vanishing moment, since ψ̃ = ψ and 0 ψ(t)dt = 0
as we noted in Observation 5.14. It isR Nan exercise to see that the Haar wavelet
has only one vanishing moment, i.e. 0 tψ(t)dt 6= 0.
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 264
section to construct interesting wavelets. There we will also cover how we can
construct the simplest possible such filters.
There are some details which have been left out in this section: We have not
addressed why the wavelet bases we have constructed are linearly independent,
and why they span L2 (R). Dual Riesz bases. These details are quite technical,
and we refer to [7] for them. Let us also express what we have found in terms of
analog filters.
Observation 7.9. Analog filters.
Let
X X X
f (t) = cm,n φm,n = c0,n φ0,n + wm0 ,n ψm0 ,n ∈ Vm .
n n m0 <m,n
cm,n and wm,n can be computed by sampling the output of an analog filter. To
be more precise,
Z N Z N
cm,n = hf, φ̃m,n i = f (t)φ̃m,n (t)dt = (−φ̃m,0 (−t))f (2−m n − t)dt
0 0
Z N Z N
wm,n = hf, ψ̃m,n i = f (t)ψ̃m,n (t)dt = (−ψ̃m,0 (−t))f (2−m n − t)dt.
0 0
In other words, cm,n can be obtained by sampling s−φ̃m,0 (−t) (f (t)) at the points
2−m n, wm,n by sampling s−ψ̃m,0 (−t) (f (t)) at 2−m n, where the analog filters
s−φ̃m,0 (−t) , s−ψ̃m,0 (−t) were defined in Theorem 1.25, i.e.
Z N
s−φ̃m,0 (−t) (f (t)) = (−φ̃m,0 (−s))f (t − s)ds (7.16)
0
Z N
s−ψ̃m,0 (−t) (f (t)) = (−ψ̃m,0 (−s))f (t − s)ds. (7.17)
0
A similar statement can be made for f˜ ∈ Ṽm . Here the convolution kernels
of the filters were as before, with the exception that φ, ψ were replaced by φ̃, ψ̃.
Note also that, if the functions φ̃, ψ̃ are symmetric, we can increase the precision
in the DWT with the method of symmetric extension also in this more general
setting.
Then N1 and N2 are even, and there exist a polynomial Q which satisfies
N1 /2
1 1
λH0 (ω) = (1 + cos ω) Q1 (1 − cos ω) (7.19)
2 2
N2 /2
1 1
λG0 (ω) = (1 + cos ω) Q2 (1 − cos ω) , (7.20)
2 2
where Q = Q1 Q2 .
Proof. Since the filters are symmetric, λH0 (ω) = λH0 (−ω) and λG0 (ω) =
λG0 (−ω). Since einω + e−inω = 2 cos(nω), and since cos(nω) is the real part of
(cos ω + i sin ω)n , which is a polynomial in cosk ω sinl ω with l even, and since
sin2 ω = 1 − cos2 ω, λH0 and λG0 can both be written on the form P (cos ω), with
P a real polynomial.
Note that a zero at π in λH0 , λG0 corresponds to a factor of the form 1 + e−iω ,
so that we can write
N1
1 + e−iω
λH0 (ω) = f (eiω ) = e−iN1 ω/2 cosN1 (ω/2)f (eiω ),
2
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 267
ω 1 ω 1
cos2 = (1 + cos ω) sin2 = (1 − cos ω),
2 2 2 2
we see that λH0 and λG0 satisfy equations (7.19) and (7.20). With Q = Q1 Q2 ,
Equation (6.25) can now be rewritten as
and uniqueness in Bezouts theorem gives that q1 (u) = q2 (1 − u), and q2 (u) =
q1 (1 − u). Equation (7.21) can thus be stated as
uN q2 (1 − u) + (1 − u)N q2 (u) = 1,
and comparing with Equation (7.18) (set N = (N1 + N2 )/2) we see that Q(u) =
2q2 (u). uN q1 (u) + (1 − u)N q2 (u) = 1 now gives
where we have used the first N terms in the Taylor series expansion of (1 − u)−N
around 0. Since q2 is a polynomial of degree N − 1, we must have that
N −1
X N +k−1 k
Q(u) = 2q2 (u) = 2 u . (7.22)
k
k=0
PN −1 N +k−1
Define Q(N ) (u) = 2 k=0 k uk . The first Q(N ) are
(1) 1
Q (1 − cos ω) = 2
2
1
Q(2)
(1 − cos ω) = −e−iω + 4 − eiω
2
1 3 9 19 9 iω 3 2iω
Q(3)
(1 − cos ω) = e−2iω − e−iω + − e + e
2 4 2 2 2 4
1 5 131 131 iω 5
Q(4) (1 − cos ω) = − e−3iω + 5e−2iω − e−iω + 26 − e + 5e2iω − e3iω ,
2 8 8 8 8
Thus in order to construct wavelets where λH0 , λG0 have as many zeros at π as
possible, and where there are as few filter coefficients as possible, we need to
compute the polynomials above, factorize them into polynomials Q1 and Q2 ,
and distribute these among λH0 and λG0 . Since we need real factorizations, we
must in any case pair complex roots. If we do this we obtain the factorizations
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 269
1
Q(1) (1 − cos ω) = 2
2
1 1
Q(2) (1 − cos ω) = (eiω − 3.7321)(e−iω − 3.7321)
2 3.7321
1 3 1
Q(3) (1 − cos ω) = (e2iω − 5.4255eiω + 9.4438)
2 4 9.4438
× (e−2iω − 5.4255e−iω + 9.4438)
1 5 1 1
Q(4) (1 − cos ω) = (eiω − 3.0407)(e2iω − 4.0623eiω + 7.1495)
2 8 3.0407 7.1495
× (e−iω − 3.0407)(e−2iω − 4.0623e−iω + 7.1495), (7.23)
N
1 + e−iω
λG0 (ω) = f (e−iω ), (7.25)
2
We avoided stating λH0 (ω) in this result, since the relation H0 = (G0 )T gives
that λH0 (ω) = λG0 (ω). In particular, λH0 (ω) also has a zero of multiplicity N
at π. That G0 is causal is included to simplify the expression further.
Proof. The proof is very similar to the proof of Theorem 7.10. N vanishing
moments and that G0 is causal means that we can write
N
1 + e−iω
λG0 (ω) = f (e−iω ) = (cos(ω/2))N e−iN ω/2 f (e−iω ),
2
Now, the function f (eiω )f (e−iω ) is symmetric around 0, so that it can be written
on the form P (cos ω) with P a polynomial, so that
b1 , . . . ., bm , 1/b1 , . . . , 1/bm ,
and the complex roots are
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 271
1
Q (1 − cos ω)
2
= K(e−iω − b1 ) . . . (e−iω − bm )
× (e−iω − a1 )(e−iω − a1 )(e−iω − a2 )(e−iω − a2 ) · · · (e−iω − an )(e−iω − an )
× (eiω − b1 ) . . . (eiω − bm )
× (eiω − a1 )(eiω − a1 )(eiω − a2 )(eiω − a2 ) · · · (eiω − an )(eiω − an )
√
f (eiω ) = K(eiω − b1 ) . . . (eiω − bm )
× (eiω − a1 )(eiω − a1 )(eiω − a2 )(eiω − a2 ) · · · (eiω − an )(eiω − an )
1
− cos ω) = f (eiω )f (e−iω ). This con-
in order to obtain a factorization Q 2 (1
cludes the proof.
In the previous proof we note that the polynomial f is not unique - we could
pair the roots in many different ways. The new algorithm is thus as follows:
roots.
• Split the roots into the two classes
{b1 , . . . ., bm , a1 , . . . , an , a1 , . . . , an }
and
Clearly the filters obtained with this strategy are not symmetric since f is not
symmetric. In Section 7.6 we will take a closer look at wavelets constructed in
this way.
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 272
bN = bN −2 − aN bN −1
= bN −2 − aN (bN −3 − aN −1 bN −2 )
= (1 + aN aN −1 )bN −2 − aN bN −3 .
p1 (q1 − r1 ) + p2 (q2 − r2 ) = 0.
Since p1 and p2 have no zeros in common this means that every zero of p2 is a
zero of q1 − r1 , with at least the same multiplicity. If q1 6= r1 , this means that
deg(q1 − r1 ) ≥ deg(p2 ), which is impossible since deg(q1 ) < deg(p2 ), deg(r1 ) <
deg(p2 ). Hence q1 = r1 . Similarly q2 = r2 , establishing uniqueness.
N1 /2
1 1
λH0 (ω) = (1 + cos ω) Q(N ) (1 − cos ω)
2 2
N2 /2
1
λG0 (ω) = (1 + cos ω) , (7.29)
2
where N = (N1 + N2 )/2. Since Q(N ) has degree N − 1, λH0 has degree N1 + N1 +
N2 − 2 = 2N1 + N2 − 2, and λG0 has degree N2 . These are both even numbers,
so that the filters have odd length. The names of these filters are indexed by
the filter lengths, and are called Spline wavelets, since, as we now now will show,
the scaling function for this design strategy is the B-spline of order N2 : we have
that
1
λG0 (ω) = (1 + cos ω)N2 /2 = cos(ω/2)N2 .
2N2 /2
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 274
k k
Y λG0 (2πf /2i ) Y cosN2 (πf /2i )
λs (f ) = λs (f /2k ) = λs (f /2k )
i=1
2 i=1
2
k N2 N2
sin(2πf /2i )
k
Y
k sin(πf )
= λs (f /2 ) = λs (f /2 ) ,
i=1
2 sin(πf /2i ) 2k sin πf /2k
Z 1/2 1/2
1
= e−2πif t dt = e−2πif t
−1/2 −2πif −1/2
1 1 sin(πf )
= (e−πif − eπif ) = 2i sin(−πf ) = .
−2πif −2πif πf
N2
sin(πf )
Due to this πf is the frequency response of ∗N
k=1 χ[−1/2,1/2) (t). By the
2
uniqueness of the frequency response we have that φ(t) = φ̂(0) ∗N k=1 χ[−1/2,1/2) (t).
2
In Exercise 7.5 you will be asked to show that this scaling function gives rise to
the multiresolution analysis of functions which are piecewise polynomials which
are differentiable at the borders, also called splines. This explains why this type
of wavelet is called a spline wavelet. To be more precise, the resolution spaces
are as follows.
1 1 1 1
λG0 (ω) = (1 + cos ω) = eiω + + e−iω
2 4 2 4
1 1 1
λH0 (ω) = (1 + cos ω)Q (1)
(1 − cos ω) = (2 + eiω + e−iω )(4 − eiω − e−iω )
2 2 4
1 1 3 1 1
= − e2iω + eiω + + e−iω − e−2iω .
4 2 2 2 4
The filters G0 , H0 are thus
1 1 1 1 1 3 1 1
G0 = , , H0 = − , , , ,−
4 2 4 4 2 2 2 4
The length of the filters are 3 and 5 in this case, so that this wavelet is called
the Spline 5/3 wavelet. Up to a constant, the filters are seen to be the same as
those of the alternative piecewise linear wavelet, see Example 6.3. Now, how do
we find the filters (G1 , H1 )? Previously we saw how to find the constant α in
Theorem 6.16 when we knew one of the two pairs (G0 , G1 ), (H0 , H1 ). This was
the last part of information we needed in order to construct the other two filters.
Here we know (G0 , H0 ) instead. In this case it is even easier to find (G1 , H1 )
since we can set α = 1. This means that (G1 , H1 ) can be obtained simply by
adding alternating signs to (G0 , H0 ), i.e. they are the corresponding high-pass
filters. We thus can set
1 1 3 1 1 1 1 1
G1 = − ,− , ,− ,− H1 = − , ,− .
4 2 2 2 4 4 2 4
We have now found all the filters. It is clear that the forward and reverse filter
bank transforms here differ only by multiplication with a constant from those of
the the alternative piecewise linear wavelet, so that this gives the same scaling
function and mother wavelet as that wavelet.
The coefficients for the Spline wavelets are always dyadic fractions, and are
therefore suitable for lossless compression, as they can be computed using low
precision arithmetic and bitshift operations. The particular Spline wavelet from
Example 7.4.1 is used for lossless compression in the JPEG2000 standard.
1 5 1 1
Q(3)
(1 − cos ω) = (eiω − 3.0407)(e−iω − 3.0407)
2 8 3.0407 7.1495
× (e2iω − 4.0623eiω + 7.1495)(e−2iω − 4.0623e−iω + 7.1495)
5 1 1
= (−3.0407eiω + 10.2456 − 3.0407e−iω )
8 3.0407 7.1495
× (7.1495e2iω − 33.1053eiω + 68.6168 − 33.1053e−iω + 7.1495e−2iω ).
2
1
λG0 (ω) = (1 + cos ω) Q1 (ω)
2
= −0.0645e3iω − 0.0407e2iω + 0.4181eiω + 0.7885
+ 0.4181e−iω − 0.0407e−2iω − 0.0645e−3iω
2
1
λH0 (ω) = (1 + cos ω) 40Q2 (ω)
2
= 0.0378e4iω − 0.0238e3iω − 0.1106e2iω + 0.3774eiω + 0.8527
+ 0.3774e−iω − 0.1106e−2iω − 0.0238e−3iω + 0.0378e−4iω .
1.4 1.4
1.2 1.2
1.0 1.0
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0.00 1 2 3 4 5 6 0.00 1 2 3 4 5 6
Figure 7.1: The frequency responses λH0 (ω) (left) and λG0 (ω) (right) for the
CDF 9/7 wavelet.
It is seen that both filters are low-pass filters also here, and that the are
closer to an ideal bandpass filter. Here, the frequency response acts even more
like the constant zero function close to π, proving that our construction has
worked. We also get
The length of the filters are 9 and 7 in this case, so that this wavelet is called
the CDF 9/7 wavelet. This wavelet is for instance used for lossy compression with
JPEG2000 since it gives a good tradeoff between complexity and compression.
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 278
In Example 6.3 we saw that we had analytical expressions for the scaling
functions and the mother wavelet, but that we could not obtain this for the
dual functions. For the CDF 9/7 wavelet it turns out that none of the four
functions have analytical expressions. Let us therefore use the cascade algorithm
to plot these functions. Note first that since G0 has 7 filter coefficients, and
G1 has 9 filter coefficients, it follows from Theorem 6.9 that supp(φ) = [−3, 3],
supp(ψ) = [−3, 4], supp(φ̃) = [−4, 4], and supp(ψ̃) = [−3, 4]. The scaling
functions and mother wavelets over these supports are shown in Figure 7.2.
Again they have irregular shapes, but now at least the functions and dual
functions more resemble each other.
1.4 2.0
1.2 φ(t) ψ(t)
1.5
1.0
0.8 1.0
0.6 0.5
0.4 0.0
0.2
0.0 0.5
0.2 4 3 2 1 0 1 2 3 4 1.0 4 3 2 1 0 1 2 3 4
1.4 2.0
1.2 φ̃(t) 1.5
ψ̃(t)
1.0
1.0
0.8
0.6 0.5
0.4 0.0
0.2
0.5
0.0
0.2 1.0
0.4 4 3 2 1 0 1 2 3 4 1.5 4 3 2 1 0 1 2 3 4
Figure 7.2: Scaling functions and mother wavelets for the CDF 9/7 wavelet.
In the above example there was a unique way of factoring Q into a product
of real polynomials. For higher degree polynomials there is no unique way to
form to distribute the factors, and we will not go into what strategy can be used
for this. In general, the steps we must go through are as follows:
1 −iω √
(e + 1) 2
2 r
1 −iω 1
(e + 1) 2
(e−iω − 3.7321)
4 3.7321
r
1 −iω 3 1
(e + 1)3 (e−2iω − 5.4255e−iω + 9.4438)
8 4 9.4438
r
1 −iω 5 1 1
(e + 1)4 (e−3iω − 7.1029e−2iω + 19.5014−iω − 21.7391),
16 8 3.0407 7.1495
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 280
√ √
G0 = (H0 )T =( 2/2, 2/2)
G0 = (H0 )T =(−0.4830, −0.8365, −0.2241, 0.1294)
G0 = (H0 )T =(0.3327, 0.8069, 0.4599, −0.1350, −0.0854, 0.0352)
G0 = (H0 )T =(−0.2304, −0.7148, −0.6309, 0.0280, 0.1870, −0.0308, −0.0329, 0.0106)
√ √
G1 = (H1 )T =( 2/2, − 2/2)
G1 = (H1 )T =(0.1294, 0.2241, −0.8365, 0.4830)
G1 = (H1 )T =(0.0352, 0.0854, −0.1350, −0.4599, 0.8069, −0.3327)
G1 = (H1 )T =(0.0106, 0.0329, −0.0308, −0.1870, 0.0280, 0.6309, −0.7148, 0.2304).
1.4
1.2
1.0
0.8
0.6
0.4
0.2
0.0
0 1 2 3 4 5 6
Figure 7.3: The magnitudes |λG0 (ω)| = |λH0 (ω)| for the first orthonormal
wavelets.
Clearly these filters have low-pass characteristic. We also see that the high-
pass characteristics resemble the low-pass characteristics. We also see that the
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 281
frequency response gets flatter near the high and low frequencies, as N increases.
One can verify that this is the case also when N is increased further. The shapes
for higher N are very similar to the frequency responses of those filters used in
the MP3 standard (see Figure 3.10). One difference is that the support of the
latter is concentrated on a smaller set of frequencies.
The way we have defined the filters, one can show in the same way as in
the proof of Theorem 6.9 that, when all filters have 2N coefficients, φ = φ̃ has
support [−N + 1, N ], ψ = ψ̃ has support [−N + 1/2, N − 1/2] (i.e. the support
of ψ is symmetric about the origin). In particular we have that
The scaling functions and mother wavelets are shown in Figure 7.4. All functions
have been plotted over [−4, 4], so that all these support sizes can be verified.
Also here we have used the cascade algorithm to approximate the functions.
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 282
1.4 2.0
1.2 φ(t) ψ(t)
1.5
1.0
1.0
0.8
0.6 0.5
0.4 0.0
0.2
0.5
0.0
0.2 1.0
0.4 4 3 2 1 0 1 2 3 4 1.5 4 3 2 1 0 1 2 3 4
1.4 2.0
1.2 φ(t) ψ(t)
1.5
1.0
1.0
0.8
0.6 0.5
0.4 0.0
0.2
0.5
0.0
0.2 1.0
0.4 4 3 2 1 0 1 2 3 4 1.5 4 3 2 1 0 1 2 3 4
1.2 1.5
1.0 φ(t) ψ(t)
1.0
0.8
0.6 0.5
0.4
0.2 0.0
0.0
0.5
0.2
0.4 4 3 2 1 0 1 2 3 4 1.0 4 3 2 1 0 1 2 3 4
Figure 7.4: The scaling functions and mother wavelets for orthonormal wavelets
with N vanishing moments, for different values of N .
7.7 Summary
We started the section by showing how filters from filter bank matrices can give
rise to scaling functions and mother wavelets. We saw that we obtained dual
function pairs in this way, which satisfied a mutual property called biorthogonal-
ity. We then saw how differentiable scaling functions or mother wavelets with
vanishing moments could be constructed, and we saw how we could construct
the simplest such. These could be found in terms of the frequency responses
CHAPTER 7. CONSTRUCTING INTERESTING WAVELETS 283
The polyphase
representation and wavelets
284
CHAPTER 8. THE POLYPHASE REPRESENTATION AND WAVELETS285
The factorization into sparse matrices will be called the lifting factorization, and
it will be clear from this factorization how the wavelet kernels and their duals can
be implemented. We will also see how we can use the polyphase representation
to prove the remaining parts of Theorem 6.16.
Secondly, in Section 8.3 we will use the polyphase representation to analyze
how the forward and reverse filter bank transforms from the MP3 standard can
be chosen in order for us to have perfect or near perfect reconstruction. Actually,
we will obtain a factorization of the polyphase representation into block matrices
also here, and the conditions we need to put on the prototype filters will be clear
from this.
2 0 0 3 0 1
S (0,0) = 0 2 0 S (0,1) = 1 3 0
0 0 2 0 1 3
4 6 0 5 0 0
S (1,0) = 0 4 6 S (1,1) = 0 5 0
6 0 4 0 0 5
G(1,1) −G(0,1) (G(0,0) G(1,1) − G(1,0) G(0,1) )H (0,0) (G(0,0) G(1,1) − G(1,0) G(0,1) )H (0,1)
=
−G(1,0) G(0,0) (G(0,0) G(1,1) − G(1,0) G(0,1) )H (1,0) (G(0,0) G(1,1) − G(1,0) G(0,1) )H (1,1)
H (1,1) −H (0,1) (H (0,0) H (1,1) − H (1,0) H (0,1) )G(0,0) (H (0,0) H (1,1) − H (1,0) H (0,1) )G(0,1)
=
−H (1,0) H (0,0) (H (0,0) H (1,1) − H (1,0) H (0,1) )G(1,0) (H (0,0) H (1,1) − H (1,0) H (0,1) )G(1,1)
Now since G(0,0) G(1,1) − G(1,0) G(0,1) and H (0,0) H (1,1) − H (1,0) H (0,1) also are
circulant Toeplitz matrices, the expressions above give that
The following are the most useful properties of elementary lifting matrices:
Lemma 8.6. Lifting lemma.
The following hold:
T T
ST
I S I 0 I 0 I
= , and = ,
0 I ST I S I 0 I
I S1 I S2 I S1 + S2 I 0 I 0 I 0
= , and = ,
0 I 0 I 0 I S1 I S2 I S1 + S2 I
−1 −1
I S I −S I 0 I 0
= , and =
0 I 0 I S I −S I
These statements follow directly from Theorem 8.3. Due to Property 2, one
can assume that odd and even types of lifting matrices appear in alternating
order, since matrices of the same type can be grouped together. The following
result states why elementary lifting matrices can be used to factorize general
MRA-matrices:
Note that (Λi )−1 can be computed with the help of Property 3 of Lemma 8.6.
Proof. The proof will use the conceptof the length of a filter, as defined in
(0,0)
S S (0,1)
Definition 3.3. Let S = (1,0) be an arbitrary invertible matrix. We
S S (1,1)
will incrementally find an elementary lifting matrix Λi with filter Si in the lower
left or upper right corner so that Λi S has filters of lower length in the first
column. Assume first that l(S (0,0) ) ≥ l(S (1,0) ), where l(S) is the length of a
filter as given by Definition 3.3. If Λi is of even type, then the first column in
Λi S is
(0,0) (0,0)
+ Si S (1,0)
I Si S S
= . (8.6)
0 I S (1,0) S (1,0)
CHAPTER 8. THE POLYPHASE REPRESENTATION AND WAVELETS289
Si can now be chosen so that l(S (0,0) + Si S (1,0) ) < l(S (1,0) ). To see how,
recall that we in Section 3.1 stated that multiplying filters corresponds to
multiplying polynomials. Si can thus be found from polynomial division with
remainder: when we divide S (0,0) by S (1,0) , we actually find polynomials Si
and P with l(P ) < l(S (1,0) ) so that S (0,0) = Si S (1,0) + P , so that the length of
P = S (0,0) − Si S (1,0) is less than l(S (1,0) ). The same can be said if Λi is of odd
type, in which case the first and second components are simply swapped. This
procedure can be continued until we arrive at a product
Λn · · · Λ1 S
where either the first or the second component in the first column is 0. If the
first component in the first column is 0, the identity
I 0 I I 0 X Y X +Z
=
−I I 0 I Y Z 0 −X
explains that we can bring the matrix to a form where the second element in
the first column is zero instead, with the help of the additional lifting matrices
I I I 0
Λn+1 = and Λn+2 = ,
0 I −I I
so that we always can assume that the second element in the first column is 0,
i.e.
P Q
Λn · · · Λ1 S = ,
0 R
for some matrices P, Q, R. From the proof of Theorem 6.16 we will see that
in order for S to be invertible, we must have that S (0,0) S (1,1) − S (0,1) S (1,0) =
−α−1 Ed for some nonzero scalar α and integer d. Since
P Q
0 R
is also invertible, we must thus have that P R must be on the form αEn . When
the filters have a finite number of filter coefficients, the only possibility for this
to happen is when P = α0 Ep and R = α1 Eq for some p, q, α0 , α1 . Using this,
and also isolating S on one side, we obtain that
−1 −1 α0 Ep Q
S = (Λ1 ) · · · (Λn ) , (8.7)
0 α1 Eq
Noting that
1
α0 Ep Q
=
1 α1 E−q Q α0 Ep 0
,
0 α1 Eq 0 1 0 α1 Eq
we can rewrite Equation (8.7) as
CHAPTER 8. THE POLYPHASE REPRESENTATION AND WAVELETS290
1
S = (Λ1 ) −1
· · · (Λn ) −1 1 α1 E−q Q α0 Ep 0
,
0 1 0 α1 Eq
which is a lifting factorization of the form we wanted to arrive at. The last matrix
in the lifting factorization is not really a lifting matrix, but it too can easily be
inverted, so that we arrive at Equation (8.5). This completes the proof.
Factorizations on the form given by Equation (8.4) will be called lifting
factorizations. Assume that we have applied Theorem 8.7 in order to get a
factorization of the polyphase representation of the DWT kernel of the form
α 0
Λn · · · Λ2 Λ1 H = . (8.8)
0 β
Theorem 8.6 then immediately gives us the following factorizations.
−1 −1 −1 α 0
H = (Λ1 ) (Λ2 ) · · · (Λn ) (8.9)
0 β
1/α 0
G= Λn · · · Λ2 Λ1 (8.10)
0 1/β
α 0
HT = ((Λn )−1 )T ((Λn−1 )−1 )T · · · ((Λ1 )−1 )T (8.11)
0 β
T T T T 1/α 0
G = (Λ1 ) (Λ2 ) · · · (Λn ) . (8.12)
0 1/β
Since H T and GT are the kernel transformations of the dual IDWT and the
dual DWT, respectively, these formulas give us recipes for computing the DWT,
IDWT, dual IDWT, and the dual DWT, respectively. All in all, everything can
be computed by combining elementary lifting steps.
In practice, one starts with a given wavelet with certain proved properties
such as the ones from Chapter 7, and applies an algorithm to obtain a lifting
factorization of the polyphase representation of the kernels. The algorihtm can
easily be written down from the proof of Theorem 8.7. The lifting factorization
is far from unique, and the algorithm only gives one of them.
It is desirable for an implementation to obtain a lifting factorization where the
lifting steps are as simple as possible. Let us restrict to the case of wavelets with
symmetric filters, since the wavelets used in most applications are symmetric.
In particular this means that S (0,0) is a symmetric matrix, and that S (1,0) is
symmetric about −1/2 (see Exercise 8.8).
Assume that we in the proof of Theorem 8.7 add an elementary lifting of
even type. At this step we then compute S (0,0) + Si S (1,0) in the first entry of
the first column. Since S (0,0) is now assumed symmetric, Si S (1,0) must also be
symmetric in order for the length to be reduced. And since the filter coefficients
of S (1,0) are assumed symmetric about −1/2, Si must be chosen with symmetry
around 1/2.
CHAPTER 8. THE POLYPHASE REPRESENTATION AND WAVELETS291
For most of our wavelets we will consider in the following examples it will
turn out the filters in the first column differ in the number of filter coefficients
by 1 at all steps. When this is the case, we can choose a filter of length 2 to
reduce the length by 2, so that the Si in an even lifting step can be chosen on
the form Si = λi {1, 1}. Similarly, for an odd lifting step, Si can be chosen on
the form Si = λi {1, 1}. Let us summarize this as follows:
• the ψm−k coordinates are found at indices 2k−1 + r2k , i.e. the last k bits
are 1 followed by k − 1 zeros.
• the φ0 coordinates are found at indices r2m , i.e. the last m bits are 0.
If we place the last k bits of the ψm−k -coordinates in front in reverse order, and
the the last m bits of the φ0 -coordinates in front, the coordinates have the same
order as in the (φm−1 , ψm−1 )-basis. This is also called a partial bit-reverse, and
is related to the bit-reversal performed in the FFT algorithm.
Clearly, these lifting steps are also MRA-matrices with symmetric filters, so
that our procedure factorizes an MRA-matrix with symmetric filters into simpler
MRA-matrices which also have symmetric filters.
b) In the proof of the last part of Theorem 6.16, we defered the last part, namely
that equations (8.2)-(8.3) follow from
b)
I S
G
0 I
is an MRA matrix with filters G0 , G̃1 , where
A −B
H=G=
B A
CHAPTER 8. THE POLYPHASE REPRESENTATION AND WAVELETS294
T
AT BT −B T AT BT
A
H= G= = .
−B A B AT −B A
T T
BT BT AT BT
A A
H= G= = .
B −A B −AT B −A
It is straightforward to check that also these satisfy the alias cancellation con-
dition, and that the perfect reconstruction condition also here takes the form
|λH0 (ω)|2 + |λH0 (ω + π)|2 = 2.
P HP T = PDm ←φm PCm ←φm Pφm ←Dm = P(φ1 ,ψ1 )←φm Pφm ←Dm = P(φ1 ,ψ1 )←Dm .
Taking inverses here we obtain that P GP T = PDm ←(φ1 ,ψ1 ) . We therefore have
the following result:
√ 1
I − 14 {1, 1}
I 4 {1, 1} I 0 1 I 0
2 and √ ,
0 I − 12 {1, 1} I 1
2 2 {1, 1} I 0 I
(8.15)
respectively.
1 1 3 1 1 1 1 1
H0 = − , , , ,− H1 = − , ,− .
4 2 2 2 4 4 2 4
from which we see that the polyphase components of H are
1 3 1
{− 4 , 2 , − 4 } 12 {1, 1}
(0,0)
H (0,1)
H
=
H (1,0) H (1,1) − 41 {1, 1} 1
2I
We see here that the upper filter has most filter coefficients in the first column,
so that we must start with an elementary lifting of even type. We need to find a
filter S1 so that S1 {−1/4, −1/4} + {−1/4, 3/2, −1/4} has fewer filter coefficients
than {−1/4, 3/2, −1/4}. It is clear that we can choose S1 = {−1, −1}, and that
1 3 1 1
{− 4 , 2 , − 4 } 2 {1, 1}
I {−1, −1} 2I 0
Λ1 H = =
0 I − 14 {1, 1} 1
2I
− 41 {1, 1} 1
2I
Now we need to apply an elementary lifting of odd type, and we need to find a
filter S2 so that S2 I − 14 {1, 1} = 0. Clearly we can choose S2 = {1/8, 1/8}, and
we get
I 0 2I 0 2I 0
Λ2 Λ1 H = 1 = .
8 {1, 1} I − 14 {1, 1} 1
2I 0 1
2I
Multiplying with inverses of elementary lifting steps, we now obtain that the
polyphase representations of the kernels for the Spline 5/3 wavelet are
I {1, 1} I 0 2I 0
H=
0 I − 18 {1, 1} I 0 1
2I
and
1
2I 0 I 0 I {−1, −1}
G= 1 ,
0 2I 8 {1, 1} I 0 I
respectively. Two lifting steps are thus required. We also see that the lifting
steps involve only dyadic fractions, just as the filter coefficients did. This means
that the lifting factorization also can be used for lossless operations.
use these wavelets in implementations later they will use precomputed values of
these lifting steps, and you can take these implementations for granted too. If
we run the algorithm for computing the lifting factorization we obtain that the
polyphase representations of the kernels H and G for the CDF 9/7 wavelet are
I 0.5861{1, 1} I 0 I −0.0700{1, 1}
0 I 0.6681{1, 1} I 0 I
I 0 −1.1496 0
× and
−1.2002{1, 1} I 0 −0.8699
−0.8699 0 I 0 I 0.0700{1, 1}
0 −1.1496 1.2002{1, 1} I 0 I
I 0 I −0.5861{1, 1}
× ,
−0.6681{1, 1} I 0 I
In this case it is seen that all filters have equally many filter coefficients with
positive and negative indices, so that P1 holds also here.
Now let us turn to the first lifting step. We will choose it so that the number
of filter coefficients in the first column is reduced with 1, and so that H (0,0) has
an odd number of coefficients. If L is even, we saw that H (0,0) and H (1,0) had
an even number of coefficients, so that the first lifting step must be even. To
preserve P1, we must cancel t−L , so that the first lifting step is
$$\Lambda_1 = \begin{pmatrix} I & -t_{-L}/s_{-L+1} \\ 0 & I \end{pmatrix}.$$
If L is odd, we saw that H (0,0) and H (1,0) had an odd number of coefficients, so
that the first lifting step must be odd. To preserve P1, we must cancel sL , so
that the first lifting step is
$$\Lambda_1 = \begin{pmatrix} I & 0 \\ -s_L/t_{L-1} & I \end{pmatrix}.$$
Now that we have a difference of one filter coefficient in the first column, we
will reduce the entry with the most filter coefficients by two with each lifting step,
until we have $H^{(0,0)} = \{K\}$, $H^{(1,0)} = 0$ in the first column.
Assume first that H (0,0) has the most filter coefficients. We then need to
apply an even lifting step. Before an even step, the first column has the form
$$\begin{pmatrix} \{t_{-k},\ldots,t_{-1},t_0,t_1,\ldots,t_k\} \\ \{s_{-k},\ldots,s_{-1},s_0,s_1,\ldots,s_{k-1}\} \end{pmatrix}.$$
We can then choose $\Lambda_i = \begin{pmatrix} I & \{-t_{-k}/s_{-k},\,-t_k/s_{k-1}\} \\ 0 & I \end{pmatrix}$ as a lifting step.
Assume then that H (1,0) has the most filter coefficients. We then need to
apply an odd lifting step. Before an odd step, the first column has the form
$$\begin{pmatrix} \{t_{-k},\ldots,t_{-1},t_0,t_1,\ldots,t_k\} \\ \{s_{-k-1},\ldots,s_{-1},s_0,s_1,\ldots,s_k\} \end{pmatrix}.$$
We can then choose $\Lambda_i = \begin{pmatrix} I & 0 \\ \{-s_{-k-1}/t_{-k},\,-s_k/t_k\} & I \end{pmatrix}$ as a lifting step.
If L is even we end up with a matrix of the form $\begin{pmatrix} \alpha & \{0,K\} \\ 0 & \beta \end{pmatrix}$, and we can
choose the final lifting step as $\Lambda_n = \begin{pmatrix} I & \{0,-K/\beta\} \\ 0 & I \end{pmatrix}$.
If L is odd we end up with a matrix of the form
$$\begin{pmatrix} \alpha & K \\ 0 & \beta \end{pmatrix},$$
and we can choose the final lifting step as $\Lambda_n = \begin{pmatrix} I & -K/\beta \\ 0 & I \end{pmatrix}$. Again using
equations (8.9)-(8.10), this gives us the lifting factorizations.
In summary we see that all even and odd lifting steps take the form
$$\begin{pmatrix} I & \{\lambda_1,\lambda_2\} \\ 0 & I \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} I & 0 \\ \{\lambda_1,\lambda_2\} & I \end{pmatrix}.$$
We see that symmetric lifting steps correspond to the special case when $\lambda_1 = \lambda_2$. The even and odd lifting matrices now used are
$$\begin{pmatrix}
1 & \lambda_1 & 0 & 0 & \cdots & 0 & 0 & \lambda_2 \\
0 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & \lambda_2 & 1 & \lambda_1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & \lambda_2 & 1 & \lambda_1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\lambda_2 & 1 & \lambda_1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 0 \\
\lambda_1 & 0 & 0 & 0 & \cdots & 0 & \lambda_2 & 1
\end{pmatrix},
\tag{8.16}$$
respectively. We note that when we reduce elements to the left and right in
the upper and lower part of the first column, the same type of reductions must
occur in the second column, since the determinant $H^{(0,0)}H^{(1,1)} - H^{(0,1)}H^{(1,0)}$
is a constant after any number of lifting steps.
This example explains the procedure for finding the lifting factorization
into steps of the form given in Equation (8.16). You will be spared the details
of writing an implementation which applies this procedure. In order to use
orthonormal wavelets in implementations, we have implemented a function
liftingfactortho, which takes N as input, and computes the steps in a lifting
factorization so that (8.8) holds. These are written to file, and read from file
when needed (you need not call liftingfactortho yourself, this is handled
behind the scenes). In the exercises, you will be asked to implement both these
non-symmetric elementary lifting steps, as well as the full kernel transformations
for orthonormal wavelets.
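As a sketch of what such an implementation could look like (again with periodic extension, and not the code base's own lifting_even and lifting_odd), the two matrices in (8.16) can be applied to a vector x of even length as follows:

import numpy as np

def lifting_even_nonsymm(lambda1, lambda2, x):
    # The first (even) matrix in (8.16): x[2k] += lambda2*x[2k-1] + lambda1*x[2k+1],
    # with the wrap-around in the first row handled periodically.
    odd = x[1::2].copy()
    x[0::2] += lambda2 * np.roll(odd, 1) + lambda1 * odd

def lifting_odd_nonsymm(lambda1, lambda2, x):
    # The second (odd) matrix in (8.16): x[2k+1] += lambda2*x[2k] + lambda1*x[2k+2],
    # with the wrap-around in the last row handled periodically.
    even = x[0::2].copy()
    x[1::2] += lambda2 * even + lambda1 * np.roll(even, -1)

The symmetric steps used for the Spline 5/3 and CDF 9/7 wavelets are recovered as the special case lambda1 = lambda2.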
a) Write functions

dwt_kernel_53(x, bd_mode)
idwt_kernel_53(x, bd_mode)
which implement the DWT and IDWT kernel transformations for the Spline 5/3
wavelet. Use the lifting factorization obtained in Example 8.2.2.
b) Write functions
dwt_kernel_97(x, bd_mode)
idwt_kernel_97(x, bd_mode)
which implement the DWT and IDWT kernel transformations for the CDF 9/7
wavelet. Use the lifting factorization obtained in Example 8.2.3.
c) In Chapter 5, we listened to the low-resolution approximations and detail
components in sound for three different wavelets. Repeat these experiments
with the Spline 5/3 and the CDF 9/7 wavelet, using the new kernels we have
implemented in this exercise.
d) Plot all scaling functions and mother wavelets for the Spline 5/3 and the CDF
9/7 wavelets, using the cascade algorithm and the kernels you have implemented.
a) Write functions
which apply the elementary lifting matrices (8.16) to x. Assume that N is even.
b) Write functions
which apply the DWT and IDWT kernel transformations for orthonormal
wavelets to x. You should call the functions lifting_even and lifting_odd.
You can assume that you can access the lifting steps so that the lifting factor-
ization (8.8) holds, through the object filters by writing filters.lambdas,
filters.alpha, and filters.beta. filters.lambdas is an n × 2-matrix so
that the filter coefficients $\{\lambda_1, \lambda_2\}$ in the i'th lifting step are found in
row i. Recall that the last lifting step was even.
Due to the filters object, the functions dwt_kernel_ortho and idwt_kernel_ortho
do not abide by the signature we have required for kernel functions up to now.
The code base creates such functions based on the functions above in the following
way:
filters = ...
dwt_kernel = lambda x, bd_mode: dwt_kernel_ortho(x, filters, bd_mode)
$$P_{\mathcal{D}_1\leftarrow(\phi_1,\hat\psi_1)} = P_{\mathcal{D}_1\leftarrow(\phi_1,\psi_1)}P_{(\phi_1,\psi_1)\leftarrow(\phi_1,\hat\psi_1)}.$$
By inversion, find also a lifting factorization of H.
$$H_0 = \frac{1}{128}\{-5, 20, -1, -96, 70, 280, 70, -96, -1, 20, -5\}$$
$$H_1 = \frac{1}{16}\{1, -4, 6, -4, 1\}$$
$$G_0 = \frac{1}{16}\{1, 4, 6, 4, 1\}$$
$$G_1 = \frac{1}{128}\{5, 20, 1, -96, -70, 280, -70, -96, 1, 20, 5\}.$$
a) Show that
$$G = \begin{pmatrix} I & -\frac{1}{128}\{5,-29,-29,5\} \\ 0 & I \end{pmatrix}\begin{pmatrix} I & 0 \\ -\{1,1\} & I \end{pmatrix}\begin{pmatrix} I & -\tfrac14\{1,1\} \\ 0 & I \end{pmatrix}\begin{pmatrix} \tfrac14 & 0 \\ 0 & 4 \end{pmatrix}.$$
From this we can easily derive the lifting factorization of G.
b) Listen to the low-resolution approximations and detail components in sound
for this wavelet.
c) Plot all scaling functions and mother wavelets for this wavelet, using the
cascade algorithm.
$$= \sum_{m=0}^{63}\sum_{r=0}^{7}(-1)^r\cos\left(2\pi(n+1/2)(m-16)/64\right)h_{m+64r}\,x_{32s-(m+64r)-1}$$
$$= \sum_{m=0}^{63}\sum_{r=0}^{7}\cos\left(2\pi(n+1/2)(m-16)/64\right)(-1)^r h_{m+32\cdot 2r}\,x_{32(s-2r)-m-1}.$$
$$\sum_{m=0}^{63}\sum_{r=0}^{15}\cos\left(2\pi(n+1/2)(m-16)/64\right)V^{(m)}_r\,x_{32(s-r)-m-1}$$
$$= \sum_{m=0}^{63}\sum_{r=0}^{15}\cos\left(2\pi(n+1/2)(m-16)/64\right)V^{(m)}_r\,x^{(32-m-1)}_{s-1-r}$$
$$= \sum_{m=0}^{63}\cos\left(2\pi(n+1/2)(m-16)/64\right)\left(V^{(m)}x^{(32-m-1)}\right)_{s-1},$$
$$z^{(n)} = \sum_{m=0}^{63}\cos\left(2\pi(n+1/2)(m-16)/64\right)IV^{(m)}x^{(32-m-1)}.$$
$$z = \begin{pmatrix} \cos(2\pi(0+1/2)\cdot(-16)/64)I & \cdots & \cos(2\pi(0+1/2)\cdot 47/64)I \\ \vdots & \ddots & \vdots \\ \cos(2\pi(31+1/2)\cdot(-16)/64)I & \cdots & \cos(2\pi(31+1/2)\cdot 47/64)I \end{pmatrix}
\times\begin{pmatrix} V^{(0)} & 0 & \cdots & 0 & 0 \\ 0 & V^{(1)} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & V^{(62)} & 0 \\ 0 & 0 & \cdots & 0 & V^{(63)} \end{pmatrix}\begin{pmatrix} x^{(31)} \\ x^{(30)} \\ \vdots \\ x^{(-32)} \end{pmatrix}$$
If we place the 15 first columns in the cosine matrix last using Property (6.35)
(we must then also place the 15 first rows last in the second matrix), we obtain
$$z = \begin{pmatrix} \cos(2\pi(0+1/2)\cdot 0/64)I & \cdots & \cos(2\pi(0+1/2)\cdot 63/64)I \\ \vdots & \ddots & \vdots \\ \cos(2\pi(31+1/2)\cdot 0/64)I & \cdots & \cos(2\pi(31+1/2)\cdot 63/64)I \end{pmatrix}
\times\begin{pmatrix}
0 & \cdots & V^{(16)} & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & V^{(63)} \\
-V^{(0)} & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & -V^{(15)} & \cdots & 0
\end{pmatrix}\begin{pmatrix} x^{(31)} \\ x^{(30)} \\ \vdots \\ x^{(-32)} \end{pmatrix}$$
Using Equation (6.36) to combine column k and 64 − k in the cosine matrix (as
well as row k and 64 − k in the second matrix), we can write this as
$$\begin{pmatrix} \cos(2\pi(0+1/2)\cdot 0/64)I & \cdots & \cos(2\pi(0+1/2)\cdot 31/64)I \\ \vdots & \ddots & \vdots \\ \cos(2\pi(31+1/2)\cdot 0/64)I & \cdots & \cos(2\pi(31+1/2)\cdot 31/64)I \end{pmatrix}\begin{pmatrix} A_0 & B_0 \end{pmatrix}\begin{pmatrix} x^{(31)} \\ x^{(30)} \\ \vdots \\ x^{(-32)} \end{pmatrix}$$
where
$$4(D_{32})^T\begin{pmatrix} A & B \end{pmatrix}\begin{pmatrix} x^{(31)} \\ x^{(30)} \\ \vdots \\ x^{(-32)} \end{pmatrix},$$
where $A$ and $B$ are the matrices $A_0$, $B_0$ with the first row multiplied by $\sqrt{2}$
(i.e. replace $V^{(16)}$ with $\sqrt{2}V^{(16)}$ in the matrix $A_0$). Using that $x^{(-i)} = E_1x^{(32-i)}$ for
$1 \le i \le 32$, we can write this as
$$4(D_{32})^T\begin{pmatrix} A & B \end{pmatrix}\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(0)} \\ E_1x^{(31)} \\ \vdots \\ E_1x^{(0)} \end{pmatrix} = 4(D_{32})^T\left( A\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(0)} \end{pmatrix} + B\begin{pmatrix} E_1x^{(31)} \\ \vdots \\ E_1x^{(0)} \end{pmatrix}\right),$$
which equals
$$4(D_{32})^T\begin{pmatrix}
0 & 0 & \cdots & 0 & \sqrt{2}V^{(16)} & 0 & \cdots & 0 \\
0 & 0 & \cdots & V^{(15)} & 0 & V^{(17)} & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & V^{(1)} & \cdots & 0 & 0 & 0 & \cdots & V^{(31)} \\
V^{(0)}+E_1V^{(32)} & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
0 & E_1V^{(33)} & \cdots & 0 & 0 & 0 & \cdots & -E_1V^{(63)} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & E_1V^{(47)} & 0 & -E_1V^{(49)} & \cdots & 0
\end{pmatrix}\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(0)} \end{pmatrix}$$
$$= 4(D_{32})^T\begin{pmatrix}
0 & \cdots & 0 & \sqrt{2}V^{(16)} & 0 & \cdots & 0 & 0 \\
0 & \cdots & V^{(17)} & 0 & V^{(15)} & \cdots & 0 & 0 \\
\vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
V^{(31)} & \cdots & 0 & 0 & 0 & \cdots & V^{(1)} & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & V^{(0)}+E_1V^{(32)} \\
-E_1V^{(63)} & \cdots & 0 & 0 & 0 & \cdots & E_1V^{(33)} & 0 \\
\vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & \cdots & -E_1V^{(49)} & 0 & E_1V^{(47)} & \cdots & 0 & 0
\end{pmatrix}\begin{pmatrix} x^{(0)} \\ \vdots \\ x^{(31)} \end{pmatrix},$$
so that the polyphase form of the forward filter bank transform can be factored as
$$4(D_{32})^T\begin{pmatrix}
0 & \cdots & 0 & \sqrt{2}V^{(16)} & 0 & \cdots & 0 & 0 \\
0 & \cdots & V^{(17)} & 0 & V^{(15)} & \cdots & 0 & 0 \\
\vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
V^{(31)} & \cdots & 0 & 0 & 0 & \cdots & V^{(1)} & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & V^{(0)}+E_1V^{(32)} \\
-E_1V^{(63)} & \cdots & 0 & 0 & 0 & \cdots & E_1V^{(33)} & 0 \\
\vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & \cdots & -E_1V^{(49)} & 0 & E_1V^{(47)} & \cdots & 0 & 0
\end{pmatrix}. \tag{8.20}$$
Due to Theorem 6.26, it is also very simple to write down the polyphase
factorization of the reverse filter bank transform as well. Since E481 GT is a
forward filter bank transform where the prototype filter has been reversed,
E481 GT can be factored as above, with V (m) replaced by W (m) , with W (m)
being the filters derived from the synthesis prototype filter in reverse order. This
means that the polyphase form of G can be factored as
Now, if we define U (m) as the filters derived from the synthesis prototype filter
itself, we have that
for 1 ≤ i ≤ 15. These make up submatrices of the matrices in equations (8.20)
and (8.22). Clearly, only the product of these matrices influences the result. Since
This result is the key ingredient we need in order to construct forward and
reverse systems which together give perfect reconstruction. In Exercise 8.15 we
go through how we can use lifting in order to express a wide range of possible
(U, V ) matrix pairs which satisfy Equation (8.25). This turns the problem of
constructing cosine-modulated filter banks which are useful for audio coding
into an optimization problem: the optimization variables are values λi which
characterize lifting steps, and the objective function is the deviation of the
corresponding prototype filter from an ideal bandpass filter. This optimization
problem has been subject to a lot of research, and we will not go into details on
this.
$$\begin{pmatrix} V^{(32-i)} & V^{(i)} \\ -V^{(64-i)} & V^{(32+i)} \end{pmatrix}\begin{pmatrix} -U^{(32+i)} & U^{(i)} \\ -U^{(64-i)} & -U^{(32-i)} \end{pmatrix} = 32\begin{pmatrix} V^{(32-i)} & V^{(i)} \\ -V^{(64-i)} & V^{(32+i)} \end{pmatrix}\begin{pmatrix} -V^{(32+i)} & V^{(i)} \\ -V^{(64-i)} & -V^{(32-i)} \end{pmatrix}.$$
Substituting for V (32+i) and V (64−i) after what we found by inspection now
gives
$$(\sqrt{2}V^{(16)})(-\sqrt{2}U^{(48)}) = -64V^{(16)}V^{(48)} = 64E_{14}V^{(16)}(V^{(16)})^T = 32E_{14}\big(V^{(16)}(V^{(16)})^T + V^{(16)}(V^{(16)})^T\big) \tag{8.29}$$
and
We see that the filters from equations (8.28)-(8.30) are similar, and that we thus
can combine them into
$$GH = 16\cdot 32\,cE_{33+448} = 512cE_{481}.$$
This explains the observation from the MP3 standard that GH seems to be close
to E481 . Since all the filters V (i) (V (i) )T + V (32−i) (V (32−i) )T are symmetric, GH
is also a symmetric filter due to Theorem 8.3, so that its frequency response is
real and we have no phase distortion. We can thus summarize our findings
as follows.
Observation 8.14. MP3 standard.
The prototype filters from the MP3 standard do not give perfect reconstruction.
They are found by choosing 17 filters $\{V^{(k)}\}_{k=0}^{16}$ so that the filters from
Equation (8.31) are equal, and so that their combination into a prototype filter
using equations (8.19) and (8.26) is as close to an ideal bandpass filter as possible.
When we have equality the alias cancellation condition is satisfied, and we also
have no phase distortion. When the common value is close to $\frac{1}{512}I$, GH is close
to $E_{481}$, so that we have near-perfect reconstruction.
This states clearly the optimization problem which the values stated in the
MP3 standard solve.
$$z_{32(s-1)+n} = \sum_{k=0}^{511}\cos((n+1/2)(k+1/2-16)\pi/32)h_k\,x_{32s-k-1}, \tag{8.32}$$
i.e. 1/2 is added inside the cosine. We now have the properties
$$\cos\left(2\pi(n+1/2)(k+64r+1/2)/(2N)\right) = (-1)^r\cos\left(2\pi(n+1/2)(k+1/2)/(2N)\right) \tag{8.33}$$
$$\cos\left(2\pi(n+1/2)(2N-k-1+1/2)/(2N)\right) = -\cos\left(2\pi(n+1/2)(k+1/2)/(2N)\right). \tag{8.34}$$
$$z^{(n)} = \sum_{m=0}^{63}\cos\left(2\pi(n+1/2)(m+1/2-16)/64\right)IV^{(m)}x^{(32-m-1)},$$
where the filters $V^{(m)}$ are defined as before. As before we place the first 15
columns of the cosine matrix last, but now use Property (8.34) to combine
columns k and 64 − k − 1 of the cosine matrix, so that we can write this as
$$\begin{pmatrix} \cos(2\pi(0+1/2)(0+1/2)/64)I & \cdots & \cos(2\pi(0+1/2)(31+1/2)/64)I \\ \vdots & \ddots & \vdots \\ \cos(2\pi(31+1/2)(0+1/2)/64)I & \cdots & \cos(2\pi(31+1/2)(31+1/2)/64)I \end{pmatrix}\begin{pmatrix} A & B \end{pmatrix}\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(-32)} \end{pmatrix}$$
where
$$A = \begin{pmatrix}
0 & \cdots & 0 & V^{(15)} & V^{(16)} & \cdots & \cdots & 0 \\
\vdots & & \vdots & \vdots & \vdots & & & \vdots \\
0 & V^{(1)} & \cdots & 0 & 0 & \cdots & V^{(30)} & 0 \\
V^{(0)} & 0 & \cdots & 0 & 0 & \cdots & \cdots & V^{(31)} \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0
\end{pmatrix}$$
$$B = \begin{pmatrix}
0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots \\
V^{(32)} & 0 & \cdots & 0 & 0 & \cdots & 0 & -V^{(63)} \\
0 & V^{(33)} & \cdots & 0 & 0 & \cdots & -V^{(62)} & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & V^{(47)} & -V^{(48)} & \cdots & \cdots & 0
\end{pmatrix}.$$
Since the cosine matrix can be written as $\sqrt{\tfrac{M}{2}}D_M^{(iv)}$, the above can be written as
$$4D_M^{(iv)}\begin{pmatrix} A & B \end{pmatrix}\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(-32)} \end{pmatrix}.$$
As before we can rewrite this as
$$4D_M^{(iv)}\begin{pmatrix} A & B \end{pmatrix}\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(0)} \\ E_1x^{(31)} \\ \vdots \\ E_1x^{(0)} \end{pmatrix} = 4D_M^{(iv)}\left( A\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(0)} \end{pmatrix} + B\begin{pmatrix} E_1x^{(31)} \\ \vdots \\ E_1x^{(0)} \end{pmatrix}\right),$$
$$VW = \begin{pmatrix} V^{(1)} & V^{(2)} \\ -V^{(3)} & V^{(4)} \end{pmatrix}\begin{pmatrix} W^{(1)} & -W^{(3)} \\ W^{(2)} & W^{(4)} \end{pmatrix}^T = \begin{pmatrix} V^{(1)} & V^{(2)} \\ -V^{(3)} & V^{(4)} \end{pmatrix}\begin{pmatrix} (W^{(1)})^T & (W^{(2)})^T \\ -(W^{(3)})^T & (W^{(4)})^T \end{pmatrix}$$
$$= \begin{pmatrix} V^{(1)} & V^{(2)} \\ -V^{(3)} & V^{(4)} \end{pmatrix}\begin{pmatrix} E_{15}(W^{(1)})^T & E_{15}(W^{(2)})^T \\ -E_{15}(W^{(3)})^T & E_{15}(W^{(4)})^T \end{pmatrix}\begin{pmatrix} E_{15} & 0 \\ 0 & E_{15} \end{pmatrix} = I.$$
Now, the matrices $U^{(i)} = E_{15}(W^{(i)})^T$ are of the form stated in Equation (8.19),
and we have that
$$\begin{pmatrix} V^{(1)} & V^{(2)} \\ -V^{(3)} & V^{(4)} \end{pmatrix}\begin{pmatrix} U^{(1)} & U^{(2)} \\ -U^{(3)} & U^{(4)} \end{pmatrix} = \begin{pmatrix} E_{-15} & 0 \\ 0 & E_{-15} \end{pmatrix}$$
We can now conclude from Theorem 8.13 that if we define the synthesis prototype
filter as therein, and set c = 1, d = −15, we have that GH = 16E481−32·15 =
16E1 .
8.4 Summary
We defined the polyphase representation of a matrix, and proved some useful
properties. For filter bank transforms, the polyphase representation was a block
matrix where the blocks are filters, and these blocks/filters were called polyphase
components. In particular, the filter bank transforms of wavelets were 2 × 2-block
matrices of filters. We saw that, for wavelets, the polyphase representation could
be realized through a rearrangement of the wavelet bases, and thus paralleled
the development in Chapter 6 for expressing the DWT in terms of filters, where
we instead rearranged the target base of the DWT.
We showed with two examples that factoring the polyphase representation
into simpler matrices (also referred to as a polyphase factorization) could be
a useful technique. First, for wavelets (M = 2), we established the lifting
factorization. This is useful not only since it factorizes the DWT and the IDWT
into simpler operations, but also since it reduces the number of arithmetic
operations in these. The lifting factorization is therefore also used in practical
implementations, and we applied it to some of the wavelets we constructed in
Chapter 7. The JPEG2000 standard document [21] explains a procedure for
implementing some of these wavelet transforms using lifting, and the values of
the lifting steps used in the standard thus also appear here.
The polyphase representation was also useful for proving the characterization
of wavelets we encountered in Chapter 7, which we used to find expressions for
many useful wavelets.
The polyphase representation was also useful to explain how the prototype
filters of the MP3 standard should be chosen, in order for the reverse filter bank
transform to invert the forward filter bank transform. Again this was attacked
by factoring the polyphase representation of the forward and reverse filter bank
transforms. The parts of the factorization which represented the prototype
filters were represented by a sparse matrix, and it was clear from this matrix
what properties we needed to put on the prototype filter, in order to have alias
cancellation, and no phase distortion. In fact, we proved that the MP3 standard
could not possibly give perfect reconstruction, but it was very clear from our
construction how the filter bank could be modified in order for the overall system
to provide perfect reconstruction.
The lifting scheme as introduced here was first proposed by Sweldens [45].
How to use lifting for in-place calculation for the DWT was also suggested by
Sweldens [44].
This development concludes the one-dimensional aspect of wavelets in this
book. In the following we will extend our theory to also apply for images. Images
will be presented in Chapter 9. After that we will define the tensor product
concept, which will be the key ingredient to apply wavelets to two-dimensional
objects such as images.
Chapter 9
Digital images
Light.
Fact 9.1. Light.
Light is electromagnetic radiation with wavelengths in the range 400–700 nm
(1 nm is $10^{-9}$ m): violet has wavelength 400 nm and red has wavelength 700
nm. White light contains roughly equal amounts of all wavelengths.
Digital output media. Our focus will be on objects that emit light, for
example a computer display. A computer monitor consists of a matrix of small
dots which emit light. In most technologies, each dot is really three smaller dots,
and each of these smaller dots emits red, green, or blue light. If the amounts of
red, green and blue are varied, our brain merges the light from the three small
light sources and perceives light of different colors. In this way the color at each
set of three dots can be controlled, and a color image can be built from the total
number of dots.
It is important to realise that it is possible to generate most, but not all,
colors by mixing red, green and blue. In addition, different computer monitors
use slightly different red, green and blue colors, and unless this is taken into
consideration, colors will look different on different monitors. This also means
that some colors that can be displayed on one monitor may not be displayable
on a different monitor.
Printers use the same principle of building an image from small dots. On
most printers however, the small dots do not consist of smaller dots of different
colors. Instead, as many as 7–8 different inks (or similar substances) are mixed
to produce the right color. This makes it possible to produce a wide range of colors, but
not all, and the problem of matching a color from another device like a monitor
is at least as difficult as matching different colors across different monitors.
Video projectors build an image that is projected onto a wall. The final
image is therefore a reflected image and it is important that the surface is white
so that it reflects all colors equally.
The quality of a device is closely linked to the density of the dots.
Fact 9.2. Resolution.
The resolution of a medium is the number of dots per inch (dpi). The number
of dots per inch for monitors is usually in the range 70–120, while for printers it is
in the range 150–4800 dpi. The horizontal and vertical densities may be different.
On a monitor the dots are usually referred to as pixels (picture elements).
Digital input media. The two most common ways to acquire digital images
are with a digital camera or a scanner. A scanner essentially takes a photo of a
document in the form of a matrix of (possibly colored) dots. As for printers, an
important measure of quality is the number of dots per inch.
Fact 9.3. Scanners.
The resolution of a scanner usually varies in the range 75 dpi to 9600 dpi,
and the color is represented with up to 48 bits per dot.
For digital cameras it does not make sense to measure the resolution in dots
per inch, as this depends on how the image is printed (its size). Instead the
resolution is measured in the number of dots recorded.
Fact 9.4. Pixels.
The number of pixels recorded by a digital camera usually varies in the range
320 × 240 to 6000 × 4000 with 24 bits of color information per pixel. The total
number of pixels varies in the range 76 800 to 24 000 000 (0.077 megapixels to
24 megapixels).
For scanners and cameras it is easy to think that the more dots (pixels), the
better the quality. Although there is some truth to this, there are many other
factors that influence the quality. The main problem is that the measured color
information is very easily polluted by noise. And of course high resolution also
means that the resulting files become very big; an uncompressed 6000 × 4000
image produces a 72 MB file. The advantage of high resolution is that you can
magnify the image considerably and still maintain reasonable quality.
that denote the amount of red, green and blue at the point (i, j).
Note that, when referring to the coordinates (i, j) in an image, i will refer to
the row index and j to the column index, in the same way as for matrices. In particular,
the top row in the image has coordinates $\{(0,j)\}_{j=0}^{N-1}$, while the left column in
the image has coordinates $\{(i,0)\}_{i=0}^{M-1}$. With this notation, the dimension of the
image is M × N . The value pi,j gives the color information at the point (i, j).
It is important to remember that there are many formats for this. The simplest
case is plain black and white images in which case pi,j is either 0 or 1. For
grey-level images the intensities are usually integers in the range 0–255. However,
we will assume that the intensities vary in the interval [0, 1], as this sometimes
simplifies the form of some mathematical functions. For color images there are
many different formats, but we will just consider the rgb-format mentioned in
the fact box. Usually the three components are given as integers in the range
0–255, but as for grey-level images, we will assume that they are real numbers
in the interval [0, 1] (the conversion between the two ranges is straightforward,
see Example 9.3 below).
In Figure 9.1 we have shown the test image we will work with, called the
Lena image. It is named after the girl in the image. This image is also used as a
test image in many textbooks on image processing.
In Figure 9.2 we have shown the corresponding black and white, and grey-level
versions of the test image.
Fact 9.6. Intensity.
In these notes the intensity values pi,j are assumed to be real numbers in the
interval [0, 1]. For color images, each of the red, green, and blue intensity values
are assumed to be real numbers in [0, 1].
Figure 9.2: Black and white (left), and grey-level (right) versions of the image
in Figure 9.1.
Figure 9.3: 18 × 18 pixels excerpt of the color image in Figure 9.1. The grid
indicates the borders between the pixels.
If we magnify the part of the color image in Figure 9.1 around one of the
eyes, we obtain the images in figures 9.3-9.4. As we can see, the pixels have
been magnified to big squares. This is a standard representation used by many
programs — the actual shape of the pixels will depend on the output medium.
Nevertheless, we will consider the pixels to be square, with integer coordinates
at their centers, as indicated by the grids in figures 9.3-9.4.
Fact 9.7. Shape of pixel.
The pixels of an image are assumed to be square with sides of length one,
with the pixel with value pi,j centered at the point (i, j).
X = double(imread(’filename.fmt’, ’fmt’))
the image with the given path and format is read, and stored in the matrix
which we call X. ’fmt’ can be ’jpg’,’tif’, ’gif’, ’png’, and so on. This parameter is
optional: If it is not present, the program will attempt to determine the format
from the first bytes in the file, and from the filename. After the call to imread,
we have a matrix where the entries represent the pixel values, and of integer
data type (more precisely, the data type uint8). To perform operations on the
image, we must first convert the entries to the data type double, as shown above.
Similarly, the function imwrite
can be used to write the image represented by a matrix to file. If we write
the image represented by the matrix X is written to the given path, in the given
format. Before the image is written to file, you see that we have converted the
matrix values back to the integer data type. In other words: imread and imwrite
both assume integer matrix entries, while operations on matrices assume double
matrix entries. If you want to print images you have created yourself, you can
use this function first to write the image to a file, and then send that file to
the printer using another program. Finally, we need an alternative to playing a
sound, namely displaying an image. The function imshow(uint8(X)) displays
the matrix X as an image in a separate window. Also here we needed to convert
the samples using the function uint8.
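For reference, here is a minimal sketch of the same read-process-write-display workflow in Python, using imageio and matplotlib (these particular libraries are an assumption here, not necessarily what the code base uses):

import numpy as np
import imageio
import matplotlib.pyplot as plt

X = imageio.imread('lena.png').astype(float)               # convert from uint8 to floating point
Y = 255.0 - X                                              # some operation on the pixel values
imageio.imwrite('lena_negative.png', Y.astype(np.uint8))   # back to integers before writing
plt.imshow(Y.astype(np.uint8))                             # display, after converting back to uint8
plt.show()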
The following examples go through some much used operations on images.
map the interval [pmin , pmax ] back to [0, 1]. Below we have shown a function
mapto01 which achieves this task.
def mapto01(X):
minval, maxval = X.min(), X.max()
X -= minval
X /= (maxval-minval)
Several examples of using this function will be shown below. A good question
here is why the functions min and max are called three times in succession. The
reason is that there is a third “dimension” in play, besides the spatial x- and
y-directions. This dimension describes the color components in each pixel, which
are usually the red-, green-, and blue color components.
$$P_r = (r_{i,j})_{i,j=1}^{m,n},\qquad P_g = (g_{i,j})_{i,j=1}^{m,n},\qquad P_b = (b_{i,j})_{i,j=1}^{m,n}.$$
As an example, let us first see how we can produce three separate images, showing
the R,G, and B color components, respectively. Let us take the image lena.png
used in Figure 9.1. When the image is read (first line below), the returned
object has three dimensions. The first two dimensions represent the spatial
directions (the row-index and column-index). The third dimension represents
the color component. One can therefore view images representing the different
color components with the help of the following code:
X1 = zeros_like(img)
X1[:, :, 0] = img[:, :, 0]
X2 = zeros_like(img)
X2[:, :, 1] = img[:, :, 1]
X3 = zeros_like(img)
X3[:, :, 2] = img[:, :, 2]
Figure 9.5: The red, green, and blue components of the color image in Figure 9.1.
$$q_{i,j} = \frac{\hat q_{i,j}}{\max_{k,l}\hat q_{k,l}}.$$
• Compute $\hat q_{i,j} = \sqrt{r_{i,j}^2 + g_{i,j}^2 + b_{i,j}^2}$ for all i and j, and then set
$$q_{i,j} = \frac{\hat q_{i,j}}{\max_{k,l}\hat q_{k,l}}.$$
In practice one of the last two methods is preferred, perhaps with a preference
for the last method, but the actual choice depends on the application. These
can be implemented as follows.
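A sketch of the last of these conversions with numpy (the other two are analogous; this helper is illustrative and not the book's own function):

import numpy as np

def to_grey_norm(img):
    # q_{i,j} = sqrt(r^2 + g^2 + b^2), normalised by its maximum value.
    q = np.sqrt(np.sum(img.astype(float)**2, axis=2))
    return q / q.max()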
The results of applying these three operations can be seen in Figure 9.6.
Figure 9.6: Alternative ways to convert the color image in Figure 9.1 to a grey
level image.
Figure 9.7: The negative versions of the corresponding images in Figure 9.6.
$$f_n(x) = \frac{\arctan(n(x-1/2))}{2\arctan(n/2)} + \frac12 \tag{9.1}$$
$$g_\epsilon(x) = \frac{\ln(x+\epsilon)-\ln\epsilon}{\ln(1+\epsilon)-\ln\epsilon}. \tag{9.2}$$
The first type of function has quite a large derivative near x = 0.5 and will
therefore increase the contrast in images with a concentration of intensities
around 0.5. The second type of function has a large derivative near
x = 0 and will therefore increase the contrast in images with a large proportion
of small intensity values, i.e., very dark images. Figure 9.8 shows some examples
of these functions. The three functions in the left plot in Figure 9.8 are $f_4$, $f_{10}$,
and $f_{100}$; the ones shown in the right plot are $g_{0.1}$, $g_{0.01}$, and $g_{0.001}$.
Figure 9.8: Some functions that can be used to improve the contrast of an
image.
Figure 9.9: The result after applying f10 and g0.01 to the test image.
In Figure 9.9 f10 and g0.01 have been applied to the image in the right part
of Figure 9.6. Since the image was quite well balanced, f10 made the dark areas
too dark and the bright areas too bright. g0.01 on the other hand has made the
image as a whole too bright.
Increasing the contrast is easy to implement. The following function uses the
contrast adjusting function from Equation (9.2), with $\epsilon$ as parameter.
def contrastadjust(X,epsilon):
"""
Assumes that the values are in [0,255]
"""
X /= 255.
X += epsilon
log(X, X)
X -= log(epsilon)
X /= (log(1+epsilon)-log(epsilon))
X *= 255
def contrastadjust0(X,n):
"""
Assumes that the values are in [0,255]
"""
X /= 255.
X -= 1/2.
X *= n
arctan(X, X)
X /= (2*arctan(n/2.))
X += 1/2.0
X *= 255 # Maps the values back to [0,255]
This has been used to generate the right image in Figure 9.9.
a) Show that the functions $f_n(x) = x^n$ map the interval $[0,1]$ to $[0,1]$ for all n, and that $f_n'(1)\to\infty$ as $n\to\infty$.
b) The color image secret.jpg, shown in Figure 9.10, contains some information
that is nearly invisible to the naked eye on most computer monitors. Use
the functions $f_n(x)$ to reveal the secret message.
Hint. You will first need to convert the image to a greyscale image. You can
then use the function contrastadjust as a starting point for your own program.
$$A = \begin{pmatrix}
\ddots & \vdots & \vdots & \vdots & \\
\cdots & a_{-1,-1} & a_{-1,0} & a_{-1,1} & \cdots \\
\cdots & a_{0,-1} & a_{0,0} & a_{0,1} & \cdots \\
\cdots & a_{1,-1} & a_{1,0} & a_{1,1} & \cdots \\
& \vdots & \vdots & \vdots & \ddots
\end{pmatrix}$$
if we have that
$$(SX)_{i,j} = \sum_{k_1,k_2}a_{k_1,k_2}X_{i-k_1,j-k_2}. \tag{9.3}$$
In the molecule, indices are allowed to be both positive and negative, we underline
the element with index (0, 0) (the center of the molecule), and assume that ai,j
with indices falling outside those listed in the molecule are zero (as for compact
filter notation).
In Equation (9.3), it is possible for the indices i − k1 and j − k2 to fall
outside the legal range for X. We will solve this case in the same way as we
did for filters, namely that we assume that X is extended (either periodically
or symmetrically) in both directions. The interpretation of a computational
molecule is that we place the center of the molecule on a pixel, multiply the
pixel and its neighbors by the corresponding weights ai,j in reverse order, and
finally sum up in order to produce the resulting value. This type of operation
will turn out to be particularly useful for images. The following result expresses
how computational molecules and filters are related. It states that, if we apply
one filter to all the columns, and then another filter to all the rows, the end
result can be expressed with the help of a computational molecule.
Theorem 9.9. Filtering and computational molecules.
Let S1 and S2 be filters with compact filter notation t1 and t2 , respectively,
and consider the operation S where S1 is first applied to the columns in the
image, and then S2 is applied to the rows in the image. Then S is an operation
which can be expressed in terms of the computational molecule ai,j = (t1 )i (t2 )j .
Proof. Let Xi,j be the pixels in the image. When we apply S1 to the columns
of X we get the image Y defined by
$$Y_{i,j} = \sum_{k_1}(t_1)_{k_1}X_{i-k_1,j}.$$
Applying $S_2$ to the rows of Y then gives the image Z defined by
$$Z_{i,j} = \sum_{k_2}(t_2)_{k_2}Y_{i,j-k_2} = \sum_{k_2}(t_2)_{k_2}\sum_{k_1}(t_1)_{k_1}X_{i-k_1,j-k_2} = \sum_{k_1}\sum_{k_2}(t_1)_{k_1}(t_2)_{k_2}X_{i-k_1,j-k_2}.$$
def S1_func(x):
    filterS(S1, x, True)   # S1 holds the filter applied to the columns

def S2_func(x):
    filterS(S2, x, True)   # S2 holds the filter applied to the rows

tensor_impl(X, S1_func, S2_func)
We have here used the function filterS to implement the filtering, so that we
assume that the image is periodically or symmetrically extended. The above
code uses symmetric extension, and can thus be used for symmetric filters. If
the filter is non-symmetric, we should use a periodic extension instead, for which
the last parameter to filterS should be changed.
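Equation (9.3) can also be applied directly from the molecule, without splitting the operation into two one-dimensional filterings. A sketch with periodic extension (a hypothetical helper, not part of the code base):

import numpy as np

def apply_molecule(A, center, X):
    # A: the computational molecule, center: (row, column) of the element with
    # index (0, 0), X: the image.  Periodic extension, as in Equation (9.3).
    Y = np.zeros(X.shape, dtype=float)
    ci, cj = center
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            k1, k2 = i - ci, j - cj
            # a_{k1,k2} * X_{i-k1, j-k2}, realized by cyclically shifting X
            Y += A[i, j] * np.roll(np.roll(X, k1, axis=0), k2, axis=1)
    return Y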
is a basis for LM,N (R), the set of M × N -matrices. This basis is often referred
to as the standard basis for LM,N (R).
The standard basis thus consists of rank 1-matrices. An image can simply be
thought of as a matrix in LM,N (R), and a computational molecule is simply a
special type of linear transformation from LM,N (R) to itself. Let us also define
the tensor product of matrices.
Definition 9.13. Tensor product of matrices.
If S1 : RM → RM and S2 : RN → RN are matrices, we define the linear
mapping S1 ⊗ S2 : LM,N (R) → LM,N (R) by linear extension of (S1 ⊗ S2 )(ei ⊗
ej ) = (S1 ei ) ⊗ (S2 ej ). The linear mapping S1 ⊗ S2 is called the tensor product
of the matrices S1 and S2 .
A couple of remarks are in order. First, from linear algebra we know that,
when S is a linear mapping from V and $S(v_i)$ is known for a basis $\{v_i\}_i$ of V, S is
uniquely determined. In particular, since the {ei ⊗ej }i,j form a basis, there exists
a unique linear transformation S1 ⊗S2 so that (S1 ⊗S2 )(ei ⊗ej ) = (S1 ei )⊗(S2 ej ).
This unique linear transformation is what we call the linear extension from
the values in the given basis. Clearly, by linearity, also (S1 ⊗ S2 )(x ⊗ y) =
(S1 x) ⊗ (S2 y), since
$$(S_1\otimes S_2)(x\otimes y) = (S_1\otimes S_2)\Big(\Big(\sum_i x_ie_i\Big)\otimes\Big(\sum_j y_je_j\Big)\Big) = (S_1\otimes S_2)\Big(\sum_{i,j}x_iy_j(e_i\otimes e_j)\Big)$$
$$= \sum_{i,j}x_iy_j(S_1\otimes S_2)(e_i\otimes e_j) = \sum_{i,j}x_iy_j(S_1e_i)\otimes(S_2e_j)$$
$$= \sum_{i,j}x_iy_jS_1e_i(S_2e_j)^T = S_1\Big(\sum_i x_ie_i\Big)\Big(S_2\Big(\sum_j y_je_j\Big)\Big)^T$$
$$= S_1x(S_2y)^T = (S_1x)\otimes(S_2y).$$
Here we used the result from Exercise 9.17. We can now prove the following.
Theorem 9.14. Compact filter notation and computational molecules.
If S1 : RM → RM and S2 : RN → RN are matrices of linear transformations,
then (S1 ⊗ S2 )X = S1 X(S2 )T for any X ∈ LM,N (R). In particular S1 ⊗ S2 is
the operation which applies S1 to the columns of X, and S2 to the resulting rows.
In other words, if S1 , S2 have compact filter notations t1 and t2 , respectively,
then S1 ⊗ S2 has computational molecule t1 ⊗ t2 .
We have not formally defined the tensor product of compact filter notations.
This is a straightforward extension of the usual tensor product of vectors, where
we additionally mark the element at index (0, 0).
Proof. We have that
This means that (S1 ⊗ S2 )X = S1 X(S2 )T for any X ∈ LM,N (R) also, since
equality holds on the basis vectors ei ⊗ ej . Since the matrix A with entries
ai,j = (t1 )i (t2 )j also can be written as t1 ⊗ t2 , the result follows.
We have thus shown that we alternatively can write S1 ⊗S2 for the operations
we have considered. This notation also makes it easy to combine several two-
dimensional filtering operations:
Corollary 9.15. Composing tensor products.
We have that (S1 ⊗ T1 )(S2 ⊗ T2 ) = (S1 S2 ) ⊗ (T1 T2 ).
Proof. By Theorem 9.14 we have that
$$(S_1\otimes T_1)(S_2\otimes T_2)X = S_1(S_2XT_2^T)T_1^T = (S_1S_2)X(T_1T_2)^T = ((S_1S_2)\otimes(T_1T_2))X.$$
If the pixels in the image are pi,j , this means that we compute the new pixels by
$$\hat p_{i,j} = \frac{1}{16}\big(4p_{i,j} + 2(p_{i,j-1}+p_{i-1,j}+p_{i+1,j}+p_{i,j+1}) + p_{i-1,j-1}+p_{i+1,j-1}+p_{i-1,j+1}+p_{i+1,j+1}\big).$$
If we instead use the filter $S = \frac{1}{64}\{1, 6, 15, 20, 15, 6, 1\}$ (row 6 from Pascal's
triangle), we get the computational molecule
$$\frac{1}{4096}\begin{pmatrix}
1 & 6 & 15 & 20 & 15 & 6 & 1 \\
6 & 36 & 90 & 120 & 90 & 36 & 6 \\
15 & 90 & 225 & 300 & 225 & 90 & 15 \\
20 & 120 & 300 & 400 & 300 & 120 & 20 \\
15 & 90 & 225 & 300 & 225 & 90 & 15 \\
6 & 36 & 90 & 120 & 90 & 36 & 6 \\
1 & 6 & 15 & 20 & 15 & 6 & 1
\end{pmatrix}. \tag{9.7}$$
We anticipate that both molecules give a smoothing effect, but that the second
molecule provides more smoothing. The result of applying the two molecules
in (9.6) and (9.7) to our greyscale-image is shown in the two right images in
Figure 9.11. With the help of the function tensor_impl, smoothing with the
first molecule (9.6) above can be obtained by writing
def S(x):
    filterS(array([1., 2., 1.])/4., x, True)

tensor_impl(X, S, S)
To make the smoothing effect visible, we have zoomed in on the face in the
image. The smoothing effect is clearly best visible in the second image.
Figure 9.11: The two right images show the effect of smoothing the left image.
Smoothing effects are perhaps more visible if we use a simple image, as the
one in the left part of Figure 9.12.
Again we have used the filter S = 14 {1, 2, 1}. Here we also have shown what
happens if we only smooth the image in one of the directions. In the right
Figure 9.12: The results of smoothing the simple image to the left in three
different ways.
image we have smoothed in both directions. We clearly see the combination of the two
one-dimensional smoothing operations there.
lie outside the legal range: many of the intensities are in fact negative. More
specifically, the intensities turn out to vary in the interval [−0.424, 0.418]. Let
us therefore normalise and map all intensities to [0, 1]. This gives the second
image in Figure 9.13. The predominant color of this image is an average grey,
i.e. an intensity of about 0.5. To get more detail in the image we therefore try
to increase the contrast by applying the function f50 in equation (9.1) to each
intensity value. The result is shown in the third image in Figure 9.13. This does
indeed show more detail.
Figure 9.13: Experimenting with the partial derivative in the x-direction for
the image in 9.6. The left image has artefacts, since the pixel values are outside
the legal range. We therefore normalize the intensities to lie in [0, 255] (middle),
before we increase the contrast (right).
all columns of the image (alternatively, apply the tensor product −S ⊗ I to the
image), where S is the filter which we used for edge detection in the x-direction.
Note that the positive direction of this axis in an image is opposite to the
direction of the y-axis we use when plotting functions. We can express this in
terms of the computational molecule
$$\frac12\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}.$$
Let us compare the partial derivatives in both directions. The result is shown in
Figure 9.14.
The intensities have been normalised and the contrast enhanced by the
function f50 from Equation (9.1).
When the two first derivatives have been computed it is a simple matter to
compute the gradient vector and its length. Note that, as for the first order
derivatives, it is possible for the length of the gradient to be outside the legal
range of values. The computed gradient values, the gradient mapped to the legal
range, and the gradient with contrast adjusted, are shown in Figure 9.15.
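A sketch of this computation with numpy and periodic extension (only an illustration of the molecules above, not the book's code; the signs follow the molecules as stated):

import numpy as np

def gradient_sketch(X):
    # Partial derivative in the x-direction: (X[i, j+1] - X[i, j-1]) / 2
    dPdx = (np.roll(X, -1, axis=1) - np.roll(X, 1, axis=1)) / 2.0
    # Partial derivative in the y-direction, following the molecule above:
    # (X[i+1, j] - X[i-1, j]) / 2
    dPdy = (np.roll(X, -1, axis=0) - np.roll(X, 1, axis=0)) / 2.0
    # The length of the gradient, which is independent of the sign conventions
    grad_len = np.sqrt(dPdx**2 + dPdy**2)
    return dPdx, dPdy, grad_len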
Figure 9.15: The computed gradient (left). In the middle the intensities have
been normalised to [0, 255], and to the right the contrast has been increased.
The image of the gradient looks quite different from the images of the two
partial derivatives. The reason is that the numbers that represent the length of
the gradient are (square roots of) sums of squares of numbers. This means that
the parts of the image that have virtually constant intensity (partial derivatives
close to 0) are colored black. In the images of the partial derivatives these
values ended up in the middle of the range of intensity values, with a final color
of grey, since there were both positive and negative values. To enhance the
contrast for this image we should thus do something different from what was
done in the other images, since we now have a large number of intensities near
0. The solution was to apply a function like the ones shown in the right plot in
Figure 9.8. Here we have used the function g0.01 .
Figure 9.14 shows the two first-order partial derivatives and the gradient. If
we compare the two partial derivatives we see that the x-derivative seems to
emphasise vertical edges while the y-derivative seems to emphasise horizontal
edges. This is precisely what we must expect. The x-derivative is large when
the difference between neighbouring pixels in the x-direction is large, which is
the case across a vertical edge. The y-derivative enhances horizontal edges for a
similar reason.
The gradient contains information about both derivatives and therefore
emphasises edges in all directions. It also gives a simpler image since the sign of
the derivatives has been removed.
$$\frac{\partial^2P}{\partial x^2}:\quad \begin{pmatrix} 0 & 0 & 0 \\ 1 & -2 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \tag{9.10}$$
$$\frac{\partial^2P}{\partial y\partial x}:\quad \frac14\begin{pmatrix} -1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & -1 \end{pmatrix}, \tag{9.11}$$
$$\frac{\partial^2P}{\partial y^2}:\quad \begin{pmatrix} 0 & 1 & 0 \\ 0 & -2 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \tag{9.12}$$
Figure 9.16: The second-order partial derivatives in the xx-, xy-, and yy-
directions, respectively. In all images, the computed numbers have been nor-
malised and the contrast enhanced.
The computed derivatives were first normalised and then the contrast en-
hanced with the function f100 in each image, see equation (9.1).
As for the first derivatives, the xx-derivative seems to emphasise vertical
edges and the yy-derivative horizontal edges. However, we also see that the
second derivatives are more sensitive to noise in the image (the areas of grey are
less uniform). The mixed derivative behaves a bit differently from the other two,
and not surprisingly it seems to pick up both horizontal and vertical edges.
This procedure can be generalized to higher order derivatives also. To apply
$$\frac{\partial^{k+l}P}{\partial x^k\partial y^l}$$
to an image we can compute $S_l\otimes S_k$, where $S_r$ corresponds to any point
method for computing the r'th order derivative. We can also compute $(S^l)\otimes(S^k)$,
where we iterate the filter $S = \frac12\{1, 0, -1\}$ for the first derivative, but this gives
longer filters.
Figure 9.17: Different tensor products applied to the simple chess pattern image
shown in the upper left.
of pixel values (since they may be negative). The figures have taken this into
account by mapping the values back to a legal range of values, as we did in
Chapter 9. Finally, we also see additional edges at the first and last rows/edges
in the images. The reason is that the filter S is defined by assuming that the
pixels repeat periodically (i.e. it is a circulant Toeplitz matrix). Due to this, we
have additional edges at the first/last rows/edges. This effect can also be seen in
Chapter 9, although there we did not assume that the pixels repeat periodically.
Defining a two-dimensional filter by filtering columns and then rows is not
the only way we can define a two-dimensional filter. Another possible way is
to let the M N × M N -matrix itself be a filter. Unfortunately, this is a bad way
to define filtering of an image, since there are some undesirable effects near the
boundaries between rows: in the vector we form, the last element of one row
is followed by the first element of the next row. These boundary effects are
unfortunate when a filter is applied.
for n in range(N):
X[0, n] = 0.25*X[N-1, n] + 0.5*X[0, n] + 0.25*X[1, n]
X[1:(N-1), n] = 0.25*X[0:(N-2), n] + 0.5*X[1:(N-1), n] \
+ 0.25*X[2:N, n]
X[N-1, n] = 0.25*X[N-2, n] + 0.5*X[N-1, n] + 0.25*X[0, n]
for m in range(M):
X[m, 0] = 0.25*X[m, M-1] + 0.5*X[m, 0] + 0.25*X[m, 1]
X[m, 1:(M-1)] = 0.25*X[m, 0:(M-2)] + 0.5*X[m, 1:(M-1)] \
+ 0.25*X[m, 2:M]
X[m, M-1] = 0.25*X[m, M-2] + 0.5*X[m, M-1] + 0.25*X[m, 0]
Which tensor product is applied to the image? Comment what the code does, in
particular the first and third line in the inner for-loop. What effect does the
code have on the image?
$$x\otimes y \in \mathbb{R}^{M,N} \;\to\; x\otimes_k y \in \mathbb{R}^{MN}$$
thus stacks the rows of the input matrix into one large row vector, and transposes
the result.
b) Show that (A ⊗k B)(x ⊗k y) = (Ax) ⊗k (By). We can thus use any of
the defined tensor products ⊗, ⊗k to produce the same result, i.e. we have the
commutative diagram shown in Figure 9.18, where the vertical arrows represent
stacking the rows in the matrix, and transposing, and the horizontal arrows
represent the two tensor product linear transformations we have defined. In
particular, we can compute the tensor product in terms of vectors, or in terms
of matrices, and it is clear that the Kronecker tensor product gives the matrix
of tensor product operations.
$$\begin{array}{ccc}
x\otimes y & \xrightarrow{\;A\otimes B\;} & (Ax)\otimes(By) \\
\downarrow & & \downarrow \\
x\otimes_k y & \xrightarrow{\;A\otimes_k B\;} & (Ax)\otimes_k(By),
\end{array}$$
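These relations are easy to check numerically; a small sketch using numpy, where np.kron and np.outer realize ⊗_k and ⊗ for vectors:

import numpy as np

A = np.array([[1., 2.], [3., 4.]]); B = np.array([[0., 1.], [1., 1.]])
x = np.array([1., -1.]);            y = np.array([2., 5.])

# (A ⊗_k B)(x ⊗_k y) = (Ax) ⊗_k (By)
print(np.allclose(np.kron(A, B).dot(np.kron(x, y)), np.kron(A.dot(x), B.dot(y))))
# The same identity in matrix form: (A ⊗ B)(x ⊗ y) = A (x y^T) B^T = (Ax)(By)^T
print(np.allclose(A.dot(np.outer(x, y)).dot(B.T), np.outer(A.dot(x), B.dot(y))))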
Show that
$$\langle x_1\otimes y_1,\, x_2\otimes y_2\rangle = \langle x_1\otimes_k y_1,\, x_2\otimes_k y_2\rangle.$$
where we have used the bi-linearity of the tensor product mapping (x, y) → x⊗y
(Exercise 9.17). This means that
$$0 = \sum_{(i,j)=(0,0)}^{(M-1,N-1)}\alpha_{i,j}(v_i\otimes w_j) = \sum_{i=0}^{M-1}v_i\otimes h_i = \sum_{i=0}^{M-1}v_ih_i^T.$$
Column k in this matrix equation says that $0 = \sum_{i=0}^{M-1}h_{i,k}v_i$, where $h_{i,k}$ are the
components of $h_i$. By linear independence of the $v_i$ we must have that $h_{0,k} =
h_{1,k} = \cdots = h_{M-1,k} = 0$. Since this applies for all k, we must have that all
$h_i = 0$. This means that $\sum_{j=0}^{N-1}\alpha_{i,j}w_j = 0$ for all i, from which it follows by
linear independence of the wj that αi,j = 0 for all j, and for all i. This means
that B1 ⊗ B2 is a basis.
In particular, as we have already seen, the standard basis for LM,N (R) can be
written EM,N = EM ⊗ EN . This is the basis for a useful convention: For a tensor
product the bases are most naturally indexed in two dimensions, rather than
the usual sequential indexing. This difference translates also to the meaning
of coordinate vectors, which now are more naturally thought of as coordinate
matrices:
Definition 9.17. Coordinate matrix.
Let $B = \{b_i\}_{i=0}^{M-1}$, $C = \{c_j\}_{j=0}^{N-1}$ be bases for $\mathbb{R}^M$ and $\mathbb{R}^N$, and let $A\in L_{M,N}(\mathbb{R})$.
By the coordinate matrix of A in $B\otimes C$ we mean the $M\times N$-matrix
X (with components $X_{k,l}$) such that $A = \sum_{k,l}X_{k,l}(b_k\otimes c_l)$.
We will have use for the following theorem, which shows how change of
coordinates in RM and RN translate to a change of coordinates in the tensor
product:
$$Y = S_1X(S_2)^T. \tag{9.13}$$
Proof. Denote the change of coordinates from B1 ⊗ B2 to C1 ⊗ C2 by S. Since
any change of coordinates is linear, it is enough to show that S(ei ⊗ ej ) =
S1 (ei ⊗ ej )(S2 )T for any i, j. We can write
$$b_i^1\otimes b_j^2 = \Big(\sum_k (S_1)_{k,i}c_k^1\Big)\otimes\Big(\sum_l (S_2)_{l,j}c_l^2\Big) = \sum_{k,l}(S_1)_{k,i}(S_2)_{l,j}(c_k^1\otimes c_l^2)$$
$$= \sum_{k,l}(S_1)_{k,i}((S_2)^T)_{j,l}(c_k^1\otimes c_l^2) = \sum_{k,l}(S_1e_i(e_j)^T(S_2)^T)_{k,l}(c_k^1\otimes c_l^2)$$
$$= \sum_{k,l}(S_1(e_i\otimes e_j)(S_2)^T)_{k,l}(c_k^1\otimes c_l^2)$$
This shows that the coordinate matrix of b1i ⊗ b2j in C1 ⊗ C2 is S1 (ei ⊗ ej )(S2 )T .
Since the coordinate matrix of b1i ⊗ b2j in B1 ⊗ B2 is ei ⊗ ej , this shows that
S(ei ⊗ ej ) = S1 (ei ⊗ ej )(S2 )T . The result follows.
In both cases of filtering and change of coordinates in tensor products, we
see that we need to compute the mapping X → S1 X(S2 )T . As we have seen,
this amounts to a row/column-wise operation, which we restate as follows:
Observation 9.19. Change of coordinates in tensor products.
The change of coordinates from B1 ⊗ B2 to C1 ⊗ C2 can be implemented as
follows:
• For every column in the coordinate matrix in B1 ⊗ B2 , perform a change
of coordinates from B1 to C1 .
• For every row in the resulting matrix, perform a change of coordinates
from B2 to C2 .
The operation $X \to (S_1)X(S_2)^T$, which we now have encountered in two different
ways, is one particular type of linear transformation from $\mathbb{R}^{N^2}$ to itself (see
Exercise 9.23 for how the matrix of this linear transformation can be constructed).
While a general such linear transformation requires N 4 multiplications (i.e. when
we perform a full matrix multiplication), X → (S1 )X(S2 )T can be implemented
generally with only 2N 3 multiplications (since multiplication of two N × N -
matrices require N 3 multiplications in general). The operation X → (S1 )X(S2 )T
is thus computationally simpler than linear transformations in general. In
practice the operations S1 and S2 are also computationally simpler, since they
can be filters, FFT’s, or wavelet transformations, so that the complexity in
X → (S1 )X(S2 )T can be even lower.
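A sketch of this row/column-wise implementation where S1 and S2 are plain matrices (the code base's tensor_impl instead takes general functions, but the pattern is the same):

import numpy as np

def tensor_apply(S1, S2, X):
    # Compute (S1 ⊗ S2)X = S1 X S2^T by a change of coordinates on every
    # column (with S1), followed by one on every row (with S2).
    Y = np.array(X, dtype=float)
    for j in range(Y.shape[1]):
        Y[:, j] = S1.dot(Y[:, j])
    for i in range(Y.shape[0]):
        Y[i, :] = S2.dot(Y[i, :])
    return Y

For matrices this equals S1 @ X @ S2.T, but the formulation above only uses the one-dimensional operations, so that S1 and S2 can just as well be filters, FFTs or wavelet transformations.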
In the following examples, we will interpret the pixel values in an image as
coordinates in the standard basis, and perform a change of coordinates.
Figure 9.19: The effect on an image when it is transformed with the DFT, and
the DFT-coefficients below a certain threshold are zeroed out. The threshold
has been increased from left to right, from 100, to 200, and 400. The percentage
of pixel values that were zeroed out are 76.6, 89.3, and 95.3, respectively.
Figure 9.20: The effect on an image when it is transformed with the DCT, and
the DCT-coefficients below a certain threshold are zeroed out. The threshold
has been increased from left to right, from 30, to 50, and 100. The percentage of
pixel values that were zeroed out are 93.2, 95.8, and 97.7, respectively.
Figure 9.21: The effect on an image when it is transformed with the DCT, and
the DCT-coefficients below a certain threshold are zeroed out. The image has
not been split into blocks here, and the same thresholds as in Figure 9.20 were
used. The percentage of pixel values that were zeroed out are 93.2, 96.6, and
98.8, respectively.
IDFTImpl8, DCTImpl8, and IDCTImpl8 which apply the DFT, IDFT, DCT, and
IDCT, to consecutive segments of length 8.
threshold = 30
[M, N] = shape(X)[0:2]
for n in range(N):
FFTImpl(X[:, n], FFTKernelStandard)
for m in range(M):
FFTImpl(X[m, :], FFTKernelStandard)
X = X*(abs(X) >= threshold)
for n in range(N):
FFTImpl(X[:, n], FFTKernelStandard, 0)
for m in range(M):
FFTImpl(X[m, :], FFTKernelStandard, 0)
Comment what the code does. Comment in particular on the meaning of the
parameter threshold, and what effect this has on the image.
9.5 Summary
We started by discussing the basic question what an image is, and took a closer
look at digital images. We then went through several operations which give
meaning for digital images. Many of these operations could be described in
terms of a row/column-wise application of filters, and more generally in term
of what we called computational molecules. We defined the tensor product,
and saw how our operations could be expressed within this framework. The
tensor product framework could also be used to state change of coordinates for
images, so that we could consider changes of coordinates such as the DFT and
the DCT also for images. The algorithm for computing filtering operations or
changes of coordinates for images turned out to be similar, in the sense that the
one-dimensional counterparts were simply applied to the rows and the columns
in the image.
In introductory image processing textbooks, many other image processing
methods are presented. We have limited ourselves to the techniques presented here, since
our interest in images is mainly for transformation operations which are useful
for compression. An excellent textbook on image processing which uses Matlab is
[18]. This contains important topics such as image restoration and reconstruction,
geometric transformations, morphology, and object recognition. None of these
are considered in this book.
In much literature, one only mentions that filtering can be extended to images
by performing one-dimensional filtering for the rows, followed by one-dimensional
filtering for the columns, without properly explaining why this is the natural
thing to do. The tensor product may be the most natural concept to explain this,
and a concept which is firmly established in mathematical literature. Tensor
products are usually not part of beginning courses in linear algebra. We have
limited the focus here to an introduction to tensor products, and the theory
needed to explain filtering an image, and computing the two-dimensional wavelet
transform. Some linear algebra books (such as [30]) present tensor products in
exercise form only, and often only mentions the Kronecker tensor product, as we
defined it.
Many international standards exist for compression of images, and we will
take a closer look at two of them in this book. The JPEG standard, perhaps the
most popular format for images on the Internet, applies a change of coordinates
with a two-dimensional DCT, as described in this chapter. The compression
level in JPEG images is selected by the user and may result in conspicuous
artefacts if set too high. JPEG is especially prone to artefacts in areas where
the intensity changes quickly from pixel to pixel. JPEG is usually lossy, but may
also be lossless. The standard defines both the algorithms for
encoding and decoding and the storage format. The extension of a JPEG-file is
.jpg or .jpeg. JPEG is short for Joint Photographic Experts Group, and was
approved as an international standard in 1994. A more detailed description of
the standard can be found in [36].
The second standard we will consider is JPEG2000. It was developed to
address some of the shortcomings of JPEG, and is based on wavelets. The
standard document for this [21] does not focus on explaining the theory behind
the standard. As the MP3 standard document, it rather states step-by-step
procedures for implementing the standard.
The theory we present related to these image standards concentrate on
transforming the image (either with a DWT or a DCT) to obtain something
which is more suitable for (lossless or lossy) compression. However, many other
steps are also needed in order to obtain a full image compression system. One of
these is quantization. In the simplest form of quantization, every resulting sample
from the transformation is rounded to a fixed number of bits. Quantization can
also be done in more advanced ways than this: We have already mentioned that
the MP3 standard may use a different number of bits for values in the different
subbands, depending on the importance of the samples for human perception.
The JPEG2000 standard quantizes in such a way that there is a bigger interval
around 0 which is quantized to 0, i.e. the rounding error is allowed to be bigger
in an interval around 0. Standards which are lossless do not apply quantization,
since this always leads to loss.
• The case when the Si are smoothing filters gives rise to smoothing opera-
tions on images.
• A simple highpass filter, corresponding to taking the derivative, gives rise
to edge-detection operations on images.
Previously we have used the theory of wavelets to analyze sound. We would also
like to use wavelets in a similar way to analyze images. Since the tensor product
concept constructs two dimensional objects (matrices) from one-dimensional
objects (vectors), we are led to believe that tensor products can also be used to
apply wavelets to images. In this chapter we will see that this can indeed be
done. The vector spaces $V_m$ we encountered for wavelets were function spaces,
however. What we therefore need first is to establish a general definition of
tensor products of function spaces. This will be done in the first section of this
chapter. In the second section we will then specialize the function spaces to the
spaces Vm we use for wavelets, and interpret the tensor product of these and the
wavelet transform applied to images more carefully. Finally we will look at some
examples of this theory applied to some example images.
The examples in this chapter can be run from the notebook applinalgnbchap10.ipynb.
$$\langle f_1\otimes f_2,\, g_1\otimes g_2\rangle = \int_0^N\!\!\int_0^M f_1(t_1)f_2(t_2)g_1(t_1)g_2(t_2)\,dt_1\,dt_2$$
$$= \int_0^M f_1(t_1)g_1(t_1)\,dt_1\int_0^N f_2(t_2)g_2(t_2)\,dt_2 = \langle f_1,g_1\rangle\langle f_2,g_2\rangle. \tag{10.2}$$
This means that for tensor products, a double integral can be computed as the
product of two one-dimensional integrals. This formula also ensures that inner
products of tensor products of functions obey the same rule as we found for
tensor products of vectors in Exercise 9.23.
The tensor product space defined in Definition 10.1 is useful for approximation
of functions of two variables if each of the two spaces of univariate functions
have good approximation properties.
Idea 10.2. Using tensor products for approximation.
If the spaces U1 and U2 can be used to approximate functions in one variable,
then U1 ⊗ U2 can be used to approximate functions in two variables.
We will not state this precisely, but just consider some important examples.
The tensor product space $U_1\otimes U_1$ now consists of all functions of the form
$$\sum_{k,l=-N}^{N}\alpha_{k,l}e^{2\pi ikt_1/T}e^{2\pi ilt_2/T}.$$
One can show that this space has approximation properties similar to VN,T for
functions in two variables. This is the basis for the theory of Fourier series in
two variables.
In the following we think of U1 ⊗ U2 as a space which can be used for
approximating a general class of functions. By associating a function with the
vector of coordinates relative to some basis, and a matrix with a function in two
variables, we have the following parallel to Theorem 9.16:
Theorem 10.3. Bases for tensor products of function spaces.
If $\{f_i\}_{i=0}^{M-1}$ is a basis for $U_1$ and $\{g_j\}_{j=0}^{N-1}$ is a basis for $U_2$, then
$\{f_i\otimes g_j\}_{(i,j)=(0,0)}^{(M-1,N-1)}$ is a basis for $U_1\otimes U_2$. Moreover, if the bases for $U_1$ and $U_2$ are
orthogonal/orthonormal, then the basis for $U_1\otimes U_2$ is orthogonal/orthonormal.
Proof. The proof is similar to that of Theorem 9.16: if
$$\sum_{(i,j)=(0,0)}^{(M-1,N-1)}\alpha_{i,j}(f_i\otimes g_j) = 0,$$
we define $h_i(t_2) = \sum_{j=0}^{N-1}\alpha_{i,j}g_j(t_2)$. It follows as before that $\sum_{i=0}^{M-1}h_i(t_2)f_i = 0$
for any $t_2$, so that $h_i(t_2) = 0$ for any $t_2$ due to linear independence of the $f_i$. But
then $\alpha_{i,j} = 0$ also, due to linear independence of the $g_j$. The statement about
orthogonality follows from Equation (10.2).
We can now define the tensor product of two bases of functions as before,
and coordinate matrices as before:

Definition 10.4. Coordinate matrix.
If B = {f_i}_{i=0}^{M−1} and C = {g_j}_{j=0}^{N−1}, we define B ⊗ C as the basis
{f_i ⊗ g_j}_{(i,j)=(0,0)}^{(M−1,N−1)} for U_1 ⊗ U_2. We say that X is the coordinate matrix of f if
f(t_1, t_2) = ∑_{i,j} X_{i,j} (f_i ⊗ g_j)(t_1, t_2), where X_{i,j} are the elements of X.
Theorem 9.18 can also be proved in the same way in the context of function
spaces. We state this as follows:
Theorem 10.5. Change of coordinates in tensor products of function spaces.
Assume that U1 and U2 are function spaces, and that
Y = S_1 X (S_2)^T.    (10.3)
∑_{n_1,n_2=0}^{2^{m−1}N−1} ( c_{m−1,n_1,n_2} (φ_{m−1,n_1} ⊗ φ_{m−1,n_2}) + w^{(0,1)}_{m−1,n_1,n_2} (φ_{m−1,n_1} ⊗ ψ_{m−1,n_2}) +
                             w^{(1,0)}_{m−1,n_1,n_2} (ψ_{m−1,n_1} ⊗ φ_{m−1,n_2}) + w^{(1,1)}_{m−1,n_1,n_2} (ψ_{m−1,n_1} ⊗ ψ_{m−1,n_2}) )    (10.5)
• The c_{m−1}-values, i.e. the coordinates for V_{m−1} ⊗ V_{m−1}. This is the upper
left corner in Equation (10.6).
• The w^{(0,1)}_{m−1}-values, i.e. the coordinates for W^{(0,1)}_{m−1}. This is the upper right
corner in Equation (10.6).
• The w^{(1,0)}_{m−1}-values, i.e. the coordinates for W^{(1,0)}_{m−1}. This is the lower left
corner in Equation (10.6).
• The w^{(1,1)}_{m−1}-values, i.e. the coordinates for W^{(1,1)}_{m−1}. This is the lower right
corner in Equation (10.6).
The w^{(i,j)}_{m−1}-values are, as in the one-dimensional situation, often referred to as
wavelet coefficients. Let us consider the Haar wavelet as an example.
⟨φ_{1,k_1} ⊗ φ_{1,k_2}, φ_{0,n_1} ⊗ φ_{0,n_2}⟩ = ⟨φ_{1,k_1}, φ_{0,n_1}⟩⟨φ_{1,k_2}, φ_{0,n_2}⟩ = (√2/2)(√2/2) = 1/2

when the supports intersect, we obtain

proj_{V_0 ⊗ V_0}(φ_{1,k_1} ⊗ φ_{1,k_2}) =
  (1/2)(φ_{0,k_1/2} ⊗ φ_{0,k_2/2})               when k_1, k_2 are even
  (1/2)(φ_{0,k_1/2} ⊗ φ_{0,(k_2−1)/2})           when k_1 is even, k_2 is odd
  (1/2)(φ_{0,(k_1−1)/2} ⊗ φ_{0,k_2/2})           when k_1 is odd, k_2 is even
  (1/2)(φ_{0,(k_1−1)/2} ⊗ φ_{0,(k_2−1)/2})       when k_1, k_2 are odd
So, in this case there were 4 different formulas, since there were 4 different
combinations of even/odd. Let us also compute the projection onto the orthogonal
complement of V0 ⊗V0 in V1 ⊗V1 , and let us express this in terms of the φ0,n , ψ0,n ,
like we did in the one-variable case. Also here there are 4 different formulas.
When k1 , k2 are both even we obtain
10.2.1 Interpretation
An immediate corollary of Theorem 10.5 is the following:
Corollary 10.8. Implementing tensor product.
Let

Y = A_m X A_m^T      (10.8)
X = B_m Y B_m^T      (10.9)

Figure 10.3: Illustration of the different coordinates in a two level DWT2 before
the first stage is performed (left), after the first stage (middle), and after the
second stage (right).
to the same lowpass filter to the rows, and the right half after the DWT has
been subject to the same highpass filter to the rows.
These observations split the resulting matrix after DWT2 into four blocks,
with each block corresponding to a combination of lowpass and highpass filters.
The following names are thus given to these blocks:
The two letters indicate the type of filters which have been applied (L=lowpass,
H=highpass). The first letter indicates the type of filter which is applied to the
columns, the second indicates which is applied to the rows. The order is therefore
important. The name subband comes from the interpretation of these filters as
being selective on a certain frequency band. In conclusion, a block in the matrix
after the DWT2 corresponds to applying a combination of lowpass/highpass filters
to the rows and the columns of the image. Due to this, and since lowpass filters
extract slow variations and highpass filters abrupt changes, the following holds:
Observation 10.9. Visual interpretation of the DWT2.
After the DWT2 has been applied to an image, we expect to see the following:
• In the upper left corner, slow variations in both the vertical and horizontal
directions are captured, i.e. this is a low-resolution version of the image.
• In the upper right corner, slow variations in the vertical direction are
captured, together with abrupt changes in the horizontal direction.
• In the lower left corner, slow variations in the horizontal direction are
captured, together with abrupt changes in the vertical direction.
• In the lower right corner, abrupt changes in both directions are captured.
the detail components from the W^{(i,j)}_k-spaces to zero, or the low-resolution
approximation from V0 ⊗ V0 to zero, depending on whether we want to inspect
the detail components or the low-resolution approximation. Finally we apply
the IDWT2 to end up with coordinates in φm ⊗ φm again, and display the new
image with pixel values equal to these coordinates.
DWT2Impl(X, 1, 'Haar')                          # one-level DWT2 with the Haar wavelet
X = X[0:(shape(X)[0]//2), 0:(shape(X)[1]//2)]   # keep the low-resolution (upper left) corner
mapto01(X); X *= 255                            # map the values back to [0, 255]
Note that here it is necessary to map the result back to [0, 255].
In Figure 10.7 the results are shown up to 4 resolutions. In Figure 10.8 we
have also shown the entire result after a 1- and 2-stage DWT2 on the image.
The first two thumbnail images can be seen as the upper left corners of the
first two images. The other corners represent detail.
Figure 10.6: The chess pattern example image after application of the DWT2.
The Haar wavelet was used.
Figure 10.7: The corresponding thumbnail images for the Image of Lena,
obtained with a DWT of 1, 2, 3, and 4 levels.
Figure 10.8: The corresponding image resulting from a wavelet transform with
the Haar-wavelet for m = 1 and m = 2.
color indicates values which are close to 0. In other words, most of the coefficients
are close to 0.
Figure 10.9: Low resolution approximations of the Lena image, for the Haar
wavelet.
Figure 10.10: Detail of the Lena image, for the Haar wavelet.
which perform the m-level DWT2 and the IDWT2, respectively, on an image.
The arguments are the same as those in DWTImpl_internal and IDWTImpl_internal,
with the input vector x replaced with a two-dimensional object/image. The
Figure 10.11: Low resolution approximations of the Lena image, for the CDF
9/7 wavelet.
functions should at each stage apply the kernel function f to the appropriate
rows and columns. If the image has several color components, the functions
should be applied to each color component (there are three color components in
the test image 'lena.png').
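A minimal sketch of how the core of such a function might look is shown below; the helper name apply_kernel_rows_and_columns is ours (not part of the book's module), and f is assumed to transform a one-dimensional vector in place:

def apply_kernel_rows_and_columns(X, f):
    # Hypothetical helper: apply the one-dimensional, in-place kernel f
    # first to every column and then to every row of the 2D array X.
    M, N = X.shape
    for n in range(N):
        f(X[:, n])    # the columns
    for m in range(M):
        f(X[m, :])    # the rows

For a color image one would call this once for each color component, i.e. on X[:, :, k] for k = 0, 1, 2.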
for n in range(N):
    c = (X[0:M:2, n] + X[1:M:2, n])/sqrt(2)
    w = (X[0:M:2, n] - X[1:M:2, n])/sqrt(2)
Figure 10.12: Detail of the Lena image, for the CDF 9/7 wavelet.
a) Comment what the code does, and explain what you will see if you display X
as an image after the code has run.
b) The code above has an inverse transformation, which reproduces the original
image from the transformed values which we obtained. Assume that you zero
out the values in the lower left and the upper right corner of the matrix X after
the code above has run, and that you then reproduce the image by applying this
inverse transformation. What changes can you then expect in the image?
Figure 10.13: Image of Lena, with various bands of detail at the first level
zeroed out. From left to right, the detail at W_1^{(1,1)}, W_1^{(1,0)}, W_1^{(0,1)}, as illustrated
in Figure 10.4. The Spline 5/3 wavelet was used.

Figure 10.14: Image of Lena, with various bands of detail at the second level
zeroed out. From left to right, the detail at W_2^{(1,1)}, W_2^{(1,0)}, W_2^{(0,1)}, as illustrated
in Figure 10.5. The Spline 5/3 wavelet was used.
M, N = shape(X)
for n in range(N):
    c = X[0:M:2, n] + X[1:M:2, n]       # sums of pairs of entries in column n
    w = X[0:M:2, n] - X[1:M:2, n]       # differences of pairs of entries
    X[:, n] = concatenate([c, w])       # first all sums, then all differences
for m in range(M):
Figure 10.15: Image of Lena, with detail including level 3 and 4 zeroed out.
The Spline 5/3 wavelet was used.
Comment the code. Describe what will be shown in the upper left corner of
X after the code has run. Do the same for the lower left corner of the matrix.
What is the connection with the images (G0 ⊗ G0 )X, (G0 ⊗ G1 )X, (G1 ⊗ G0 )X,
and (G1 ⊗ G1 )X?
Figure 10.16: The corresponding detail for the image of Lena. The Spline 5/3
wavelet was used.
Figure 10.17: A simple image before and after one level of the DWT2. The
Haar wavelet was used.
Fingerprint images are a very specific type of images, as seen in Figure 10.18.
They differ from natural images by having a large number of abrupt changes.
One may ask whether other wavelets than the ones we have used up to now are
more suitable for compressing such images. After all, the technique of vanishing
moments we have used for constructing wavelets is most suitable when the
images display some regularity (as many natural images do). Extensive tests
were undertaken to compare different wavelets, and the CDF 9/7 wavelet used
by JPEG2000 turned out to perform very well, also for fingerprint images. One
advantage with the choice of this wavelet for the FBI standard is that one then
can exploit existing wavelet transformations from the JPEG2000 standard.
Besides the choice of wavelet, one can also ask other questions in the quest to
compress fingerprint images: What number of levels is optimal in the application
of the DWT2? And, while the levels in a DWT2 (see Figure 10.3) have an
interpretation as change of coordinates, one can apply a DWT2 to the other
subbands as well. This cannot be interpreted as a change of coordinates, but
if we assume that these subbands have the same characteristics as the original
image, the DWT2 will also help us with compression when applied to them.
Let us illustrate how the FBI standard applies the DWT2 to the different
subbands. We will split this process into five stages. The subband structures
and the resulting images after stage 1-4 are illustrated in Figure 10.19 and in
Figure 10.20, respectively.
Figure 10.19: Subband structure after the different stages of the wavelet
applications in the FBI fingerprint compression scheme.
1. First apply the first stage in a DWT2. This gives the upper left corners in
the two figures.
2. Then apply a DWT2 to all four resulting subbands. This is different from
the DWT2, which only continues on the upper left corner. This gives the
upper right corners in the two figures.
3. Then apply a DWT2 in three of the four resulting subbands. This gives
the lower left corners.
4. In all remaining subbands, the DWT2 is again applied. This gives the
lower right corners.
Now for the last stage. A DWT2 is again applied, but this time only to the upper
left corner. The subbands are illustrated in Figure 10.21, and in Figure 10.22
the resulting image is shown.
When establishing the standard for compression of fingerprint images, the
FBI chose this subband decomposition. In Figure 10.23 we also show the
corresponding low resolution approximation and detail.
As can be seen from the subband decomposition, the low-resolution approxi-
mation is simply the approximation after a five stage DWT2.
Figure 10.22: The resulting image obtained with the subband decomposition
employed by the FBI.
The original JPEG2000 standard did not give the possibility for this type
of subband decomposition. This has been added to a later extension of the
standard, which makes the two standards more compatible. In the FBI's system,
there are also other important parts besides the actual compression strategy,
such as fingerprint pattern matching: In order to match a fingerprint quickly
with the records in the database, several characteristics of the fingerprints are
stored, such as the number of lines in the fingerprint, and points where the lines
split or join. When the database is indexed with this information, one may not
Figure 10.23: The low-resolution approximation and the detail obtained by the
FBI standard for compression of fingerprint images, when applied to our sample
fingerprint image.
10.5 Summary
We extended the tensor product construction to functions by defining the tensor
product of functions as a function in two variables. We explained with some
examples that this made the tensor product formalism useful for approximation
of functions in several variables. We extended the wavelet transform to the tensor
product setting, so that it too could be applied to images. We also performed
several experiments on our test image, such as creating low-resolution images and
neglecting wavelet coefficients. We also used different wavelets, such as the Haar
wavelet, the Spline 5/3 wavelet, and the CDF 9/7 wavelet. The experiments
confirmed what we previously have proved, that wavelets with many vanishing
moments are better suited for compression purposes.
The specification of the JPEG2000 standard can be found in [21]. In [46],
most details of this theory are covered, in particular details on how the wavelet
coefficients are coded (which is not covered here).
One particular application of wavelets in image processing is the compression
of fingerprint images. The standard which describes how this should be performed
can be found in [15]. In [4], the theory is described. The book [16] uses the
application to compression of fingerprint images as an example of the usefulness
of recent developments in wavelet theory.
Due to its character this chapter is a “proof-free zone”, but in the remaining
text we usually give full proofs of the main results.
P (x) (x ∈ H)
means that the statement P (x) is true for all x ∈ H.
received the Nobel prize in economics 1 (in 1990) for his contributions in this
area:
minimize   α ∑_{i,j≤n} c_{ij} x_i x_j − ∑_{j=1}^n µ_j x_j
subject to
           ∑_{j=1}^n x_j = 1
           x_j ≥ 0  (j ≤ n).
The model may be understood as follows. The decision variables are x1 , x2 ,
. . . , xn where xi is the fraction of a total investment that is made in (say) stock
i. Thus one has available a set of stocks in different companies (Statoil, IBM,
Apple etc.) or bonds. The fractions xi must be nonnegative (so we consider no
short sale) and add up to 1. The function f to be minimized is
f(x) = α ∑_{i,j≤n} c_{ij} x_i x_j − ∑_{j=1}^n µ_j x_j.
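To make the model concrete, here is a small numerical sketch using scipy.optimize.minimize; the covariances c, the expected returns µ and the value of α below are made up purely for illustration:

from numpy import *
from scipy.optimize import minimize

c = array([[0.04, 0.01, 0.00],
           [0.01, 0.09, 0.02],
           [0.00, 0.02, 0.16]])    # made-up covariances c_ij
mu = array([0.05, 0.07, 0.10])     # made-up expected returns mu_j
alpha = 2.0

f = lambda x: alpha*dot(x, dot(c, x)) - dot(mu, x)
res = minimize(f, ones(3)/3,                      # start with equal fractions
               bounds=[(0, None)]*3,              # x_j >= 0
               constraints=[{'type': 'eq',
                             'fun': lambda x: sum(x) - 1}])
print(res.x)                                      # the optimal fractions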
y = Fα (x)
for some function F_α : R^m → R. Here α = (α_1, α_2, ..., α_n) ∈ R^n is a parameter
vector (so we may have several parameters). Perhaps there are natural constraints
on the parameter, say α ∈ A for a given set A in R^n.
For instance, consider

y = α_1 cos x_1 + x_2^{α_2}
y = Fα (x) + error
since it is usually a simplification of the system one considers. In statistics
one specifies this error term as a random variable with some (partially) known
distribution. Sometimes one calls y the dependent variable and x the explaining
variable. The goal is to understand how y depends on x.
To proceed, assume we are given a number of observations of the phenomenon
given by points

(x^i, y^i)   (i = 1, 2, ..., m),

meaning that one has observed y^i corresponding to x = x^i. We have m such
observations. Usually (but not always) we have m ≥ n. The model fit problem
is to adjust the parameter α so that the model fits the given data as well as
possible. This leads to the optimization problem

minimize ∑_{i=1}^m (y^i − F_α(x^i))²  subject to  α ∈ A.
The optimization variable is the parameter α. Here the model error is
quadratic (corresponding to the Euclidean norm), but other norms are also used.
This optimization problem above is a constrained nonlinear optimization
problem. When the function Fα depends linearly on α, which often is the case in
practice, the problem becomes the classical least squares approximation problem
which is treated in basic linear algebra courses. The solution is then characterized
by a certain linear system of equations, the so-called normal equations.
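For instance, if F_α(x) = α_1 + α_2 x (a straight line), the problem is an ordinary linear least squares problem and can be solved directly; the observations below are made up for illustration only:

from numpy import array, ones_like, column_stack
from numpy.linalg import lstsq

x = array([0.0, 1.0, 2.0, 3.0])            # made-up observations x^i
y = array([0.9, 2.1, 2.9, 4.2])            # made-up observations y^i
A = column_stack([ones_like(x), x])        # design matrix for F_alpha(x) = alpha_1 + alpha_2*x
alpha, *_ = lstsq(A, y, rcond=None)        # the least squares solution
print(alpha)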
max p(x; α)
α
where x is fixed and the optimization variable is α. We may here add a constraint
on α, say α ∈ C for some set C, which may incorporate possible knowledge of
α and assure that p(x; α) is positive for α ∈ C. Often it is easier to solve the
equivalent optimization problem of maximizing the logarithm of the likelihood
function
max ln p(x; α)
α
x = Aα + w

where A is a given m × n matrix, α ∈ R^n is an unknown parameter, w ∈ R^m is a
random variable (the “noise”), and x ∈ R^m is the observed quantity. We assume
that the components of w, i.e., w_1, w_2, ..., w_m, are independent and identically
distributed with common density function p on R. This leads to the likelihood
function

p(x; α) = ∏_{i=1}^m p(x_i − a_i α)
where ai is the i’th row in A. Taking the logarithm we obtain the maximum
likelihood problem
max_α ∑_{i=1}^m ln p(x_i − a_i α).
f(α) = −ln g(α) = −ln( ∏_{i=1}^n p(x_i; α) ) = −∑_{i=1}^n ln((1 + αx_i)/2).    (11.2)
We compute

f′(α) = −∑_{i=1}^n (x_i/2)/((1 + αx_i)/2) = −∑_{i=1}^n x_i/(1 + αx_i)

f″(α) = ∑_{i=1}^n x_i²/(1 + αx_i)²
We see that f″(α) ≥ 0, so that f is convex. As explained, this will make the
problem easier to solve using numerical methods. If we try to solve f′(α) = 0
we will run into problems, however. We see that f′(α) → 0 when α → ±∞, and
since x_i/(1 + αx_i) = 1/(1/x_i + α), we must have that f′(α) → ∞ when
α → −1/x_i from below, and f′(α) → −∞ when α → −1/x_i from above. It is
therefore clear that f has exactly one minimum in every interval of the form
[−1/x_i, −1/x_{i+1}] when we list the x_i in increasing order. It is not certain that
there is a minimum within [−1, 1] at all. If all measurements have the same sign
we are guaranteed to find no such point. In this case the minimum must be one
of the end points of the interval. We will later look into numerical methods for
finding this minimum.
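As a sketch of such a numerical method, one can apply Newton's method to f′(α) = 0 using the expressions for f′ and f″ above; the measurements x_i below are made up for illustration:

from numpy import array

x = array([0.5, -0.3, 0.8, 0.1])          # made-up measurements x_i
alpha = 0.0                               # start value
for _ in range(20):                       # Newton iteration for f'(alpha) = 0
    d1 = -sum(x/(1 + alpha*x))            # f'(alpha)
    d2 = sum(x**2/(1 + alpha*x)**2)       # f''(alpha)
    alpha -= d1/d2
print(alpha)                              # estimate for the minimum of f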
xt+1 = ht (xt ) (t = 0, 1, . . .)
xt+1 = ht (xt , ut ) (t = 0, 1, . . . , T − 1)
where xt is the state of the system at time t and the new variable ut is the
control at time t. We assume xt ∈ Rn and ut ∈ Rm for each t (but these
things also work if these vectors lie in spaces of different dimensions). Thus,
when we choose the controls u0 , u1 , . . . , uT −1 and x0 is known, the sequence
{xt } of states is uniquely determined. Next, assume there are given functions
ft : Rn × Rm → R that we call cost functions. We think of ft (xt , ut ) as the
“cost” at time t when the system is in state xt and we choose control ut . The
optimal control problem is

minimize   f_T(x_T) + ∑_{t=0}^{T−1} f_t(x_t, u_t)
subject to                                                 (11.3)
           x_{t+1} = h_t(x_t, u_t)   (t = 0, 1, ..., T − 1)
where the control is the sequence (u_0, u_1, ..., u_{T−1}) to be determined. This
problem arises in many applications, in engineering, finance, economics etc. We
now rewrite this problem. First, let u = (u_0, u_1, ..., u_{T−1}) ∈ R^N where N = Tm.
Since, as we noted, x_t is uniquely determined by u, there is a function v_t such
that x_t = v_t(u) (t = 1, 2, ..., T); x_0 is given. Therefore the total cost may be
written
f_T(x_T) + ∑_{t=0}^{T−1} f_t(x_t, u_t) = f_T(v_T(u)) + ∑_{t=0}^{T−1} f_t(v_t(u), u_t) := f(u)
which is a function of u. Thus, we see that the optimal control problem may be
transformed to the unconstrained optimization problem
min f (u)
u∈RN
Sometimes there may be constraints on the control variables, for instance that
they each lie in some interval, and then the transformation above results in a
constrained optimization problem.
1. A is positive semi-definite
2. all eigenvalues of A are nonnegative
3. A = W T W for some matrix W .
Similarly, a real symmetric matrix is positive definite if xT Ax > 0 for all nonzero
x ∈ Rn . The following statements are equivalent.
1. A is positive definite
∂²f(x)/(∂x_i ∂x_j).
If these second order partial derivatives are continuous, then we may switch the
order in the derivations, and ∇2 f (x) is a symmetric matrix.
For vector-valued functions we also need the derivative. Consider the vector-
valued function F given by
F1 (x)
F2 (x)
F (x) =
..
.
Fn (x)
so Fi : R → R is the ith component function of F . F 0
n
denotes the Jacobi
matrix5 , or simply the derivative, of F
∂F1 (x) ∂F1 (x)
· · · ∂F∂x1 (x)
∂x1 ∂x2 n
∂F2 (x) ∂F2 (x) ∂F1 (x)
· · ·
∂x ∂x ∂x
F 0 (x) =
1 2 n
..
.
∂Fn (x) ∂Fn (x)
∂x1 ∂x2 · · · ∂F∂x
n (x)
n
The ith row of this matrix is therefore the gradient of Fi , now viewed as a row
vector.
Next we recall Taylor’s theorems from multivariate calculus6 :
Theorem 11.2. First order Taylor theorem.
3 This is somewhat different from [26], since the gradient there is always considered as a
row vector.
4 See Section 5.9 in [26].
5 See Section 2.6 in [26].
6 This theorem is also the mean value theorem of functions in several variables, see Section
5.5 in [26].
f (x + h) = f (x) + ∇f (x + th)T h.
The next one is known as Taylor’s formula, or the second order Taylor’s
theorem7 :
Theorem 11.3. Second order Taylor theorem.
Let f : Rn → R be a function having second order partial derivatives that
are continuous in some ball B(x; r). Then, for each h ∈ Rn with khk < r there
is some t ∈ (0, 1) such that
f(x + h) = f(x) + ∇f(x)^T h + (1/2) h^T ∇²f(x + th) h.
This may be shown by considering the one-variable function g(t) = f (x + th)
and applying the chain rule and Taylor’s formula in one variable.
There is another version of the second order Taylor theorem in which the
Hessian is evaluated in x and, as a result, we get an error term. This theorem
shows how f may be approximated by a quadratic polynomial in n variables8 :
As we shall see, one can get a lot of optimization out of these approximations!
We also need a Taylor theorem for vector-valued functions, which follows by
applying Taylor's theorem above to each component function:
Theorem 11.5. First order Taylor theorem for vector-valued functions.
Let F : Rn → Rm be a vector-valued function which is continuously differen-
tiable in a neighborhood N of x. Then
‖x_{k+1} − x^*‖ ≤ γ‖x_k − x^*‖²   (k = 0, 1, ...)
for some γ < 1.
x1 + x2 = 3, x1 ≥ 0, x2 ≥ 0
Figure 12.1: Examples of some convex sets: a square, the ellipse x²/4 + y² ≤ 1,
and the area x⁴ + y⁴ ≤ 1.
is a linear system in the variables x1 , x2 . The solution set is the set of points
(x1 , 3 − x1 ) where 0 ≤ x1 ≤ 3. The set of solutions of a linear system is called a
polyhedron. These sets often occur in optimization. Thus, a polyhedron has the
form
P = {x ∈ Rn : Ax ≤ b}
where A ∈ Rm,n and b ∈ Rm (m is arbitrary, but finite) and ≤ means compo-
nentwise inequality. There are simple techniques for rewriting any linear system
in the form Ax ≤ b.
Proposition 12.1. Polyhedra are convex.
Every polyhedron is a convex set.
(This inequality holds for all x, y and λ as specified). Due to the convexity of
C, the point (1 − λ)x + λy lies in C, so the inequality is well-defined. The
geometrical interpretation in one dimension is that, for any x, y, the graph of
f on [x, y] lies below the secant through (x, f (x)) and (y, f (y)). For z ∈ (x, y),
since f (z) lies below that secant, the secant through (x, f (x)) and (z, f (z)) has
a smaller slope than the secant through (x, f (x)) and (y, f (y)). Since the slope
of the secant through (x, f (x)) and (y, f (y)) is (f (y) − f (x))/(y − x), it follows
that the slope function
f (y) − f (x)
gx (y) =
y−x
is increasing for any x. This characterizes all convex functions in one dimension
in terms of slope functions.
A function g is called concave if −g is convex.
For every linear function we have that f ((1−λ)x+λy) = (1−λ)f (x)+λf (y),
so that every linear function is convex. Some other examples of convex functions
in n variables are
The following result is an exercise to prove, and it gives a method for proving
convexity of a function.
f( ∑_{j=1}^r λ_j x_j ) ≤ ∑_{j=1}^r λ_j f(x_j).    (12.2)
A point of the form ∑_{j=1}^r λ_j x_j, where the λ_j's are nonnegative and sum to
1, is called a convex combination of the points x_1, x_2, ..., x_r. One can show that
a set is convex if and only if it contains all convex combinations of its points.
Finally, one connection between convex sets and convex functions is the
following fact whose proof is an exercise.
Proposition 12.5. sub-level sets of convex functions are convex.
Let C ⊆ Rn be a convex set and consider a convex function f : C → R. Let
α ∈ R. Then the “sub-level” set
{x ∈ C : f (x) ≤ α}
is a convex set.
z = ((r − ∑_{i=1}^n |z_i|)/r) · 0 + (|z_1|/r) · sign(z_1) r e_1 + ··· + (|z_n|/r) · sign(z_n) r e_n.
This shows that any point in B1 (0, r) can be written as a convex combination
of the points 0, {±rei }ni=1 . Labeling these as y1 , y2 ,...,y2n+1 and using the
convexity of f we obtain
f(z) = f( ∑_{i=1}^{2n+1} λ_i y_i ) ≤ ∑_{i=1}^{2n+1} λ_i f(y_i) ≤ max_i f(y_i),
which proves that f has a maximum on B1 (0, r), and this maximum is achieved
in one of the yi . Since
f(0) = f( (1/2) r e_i + (1/2)(−r e_i) ) ≤ (1/2) f(r e_i) + (1/2) f(−r e_i),
this maximum must be achieved in a point of the form ±rei . The result
follows.
Theorem 12.7. Convex functions are continuous on open sets.
Let f : C → R be a convex function defined on an open set C ⊆ Rn . Then f
is continuous on C.
Proof. Let x be in C, and let us show that f is continuous at x. Since C is open
we can find an r so that B(x, r) ⊂ C. We claim first that we can assume that f
is bounded from above on B(x, r). To prove this, note first that ‖y‖ ≤ ‖y‖₁ for all y,
so that B₁(x, r) ⊂ B(x, r). On the other hand B(x, r/n) ⊂ B₁(x, r). Using
Lemma 12.6 we see that f is also bounded from above on a set of the form
B(x, s) (choose s = r/n for instance).
Assume now that f(y) ≤ M on B(x, r), and let z ∈ B(x, r). Define the
function g(t) = f(x + t(z − x)/‖z − x‖) for t ∈ (−r, r). Note that g(‖z − x‖) = f(z).
H(t) = x + t(z − x)/‖z − x‖ takes its values in B(x, r), and since f is convex
and H is affine, g(t) = f(H(t)) is convex, and then g has an increasing slope
function s ↦ (g(s) − g(t))/(s − t). In particular, with s = −r, ‖z − x‖, r and
t = 0 we obtain

|f(z) − f(x)| ≤ ((|M| + |f(x)|)/r) ‖z − x‖,

and the continuity of f follows.
Figure 12.2: The function f(x, y) = x²/4 + y² and some of its level curves.
f(x) = (1/2) x^T A x − b^T x = (1/2) ∑_{i,j} a_{ij} x_i x_j − ∑_{j=1}^n b_j x_j.
(If A = 0, then the function is linear, and it may be strange to call it quadratic.
But we still do this, for simplicity.) Then (Exercise 11.9) the Hessian matrix
of f is A, i.e., ∇²f(x) = A for each x ∈ R^n. Therefore, by Theorem 12.8, f is a
convex function provided A is positive semidefinite.
We remark that sometimes it may be easy to check that a symmetric matrix
A is positive semidefinite. A (real) symmetric n × n matrix A is called diagonally
dominant if |a_{ii}| ≥ ∑_{j≠i} |a_{ij}| for i = 1, ..., n. These matrices arise in many
applications, e.g. splines and differential equations. It can be shown that every
symmetric diagonally dominant matrix is positive semidefinite. For a simple
proof of this fact using convexity, see [10]. Thus, we get a simple criterion
for convexity of a function: check if the Hessian matrix ∇2 f (x) is diagonally
dominant for each x. Be careful here: this matrix may be positive semidefinite
without being diagonally dominant!
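A small check of this criterion is easy to write; a sketch (the function name is ours):

from numpy import abs, diag, all

def is_diagonally_dominant(A):
    # True if |a_ii| >= sum over j != i of |a_ij| for every row i.
    B = abs(A)
    return all(2*diag(B) >= B.sum(axis=1))

One could then test whether the Hessian matrix ∇²f(x) is diagonally dominant at a number of points x.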
We now look at differentiability properties of convex functions.
Theorem 12.9. Convexity, partial derivatives and differentiability.
Let f be a real-valued convex function defined on an open convex set C ⊆ Rn .
Assume that all the partial derivatives ∂f (x)/∂x1 , . . . , ∂f (x)/∂xn exist at a
point x ∈ C. Then f is differentiable at x.
1. f is convex.
2. f (x) ≥ f (x0 ) + ∇f (x0 )T (x − x0 ) for all x, x0 ∈ C.
3. (∇f (x) − ∇f (x0 ))T (x − x0 ) ≥ 0 for all x, x0 ∈ C.
Taking the limit as t → 0 shows that (ii) holds. (iii) follows from (ii) by adding
the two equations
and reorganizing the terms (actually this holds for any n). (iii) says that the
derivative is increasing. Given x1 < x2 < x3 , the mean value theorem says that
there exist x1 ≤ c ≤ x2 , x2 ≤ d ≤ x3 , so that
c) Explain why it follows from b) that f (x)g(x) is convex, under the same
conditions on f and g.
max{cT x : Hx ≤ h, x ≥ 0}
for suitable matrix H and vector h.
Nonlinear equations
x_1² − x_1 x_2^{−3} + cos x_1 = 1
5x_1⁴ + 2x_1³ − tan(x_1 x_2⁸) = 3
Clearly, such equations can be very hard to solve. The general problem is to
solve the equation
F (x) = 0 (13.1)
for a given function F : Rn → Rn . If F (x) = 0 we call x a root of F
(or of the equation). The example above is equivalent to finding roots in
F (x) = (F1 (x), F2 (x)) where
Often the problem F (x) = 0 has the following form, or may be rewritten to
it:
K(x) = x. (13.2)
for some function K : Rn → Rn . This corresponds to the special choice
F (x) = K(x) − x. A point x ∈ Rn such that x = K(x) is called a fixed point of
the function K. In finding such a fixed point it is tempting to use the following
iterative method: choose a starting point x0 and repeat the following iteration
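that is, one repeats x_{k+1} = K(x_k) and hopes that the sequence converges. A sketch of this in Python (the function name and the stopping rule are our own choices):

from numpy import array, cos
from numpy.linalg import norm

def fixed_point_iteration(K, x0, tol=1e-10, maxit=100):
    # Repeat x_{k+1} = K(x_k) until two successive iterates are close.
    x = x0
    for _ in range(maxit):
        x_new = K(x)
        if norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(fixed_point_iteration(cos, array([1.0])))   # approx. 0.739, the fixed point of cos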
‖x_k − x^*‖ ≤ c^k ‖x_0 − x^*‖.
Proof. First, note that if both x and y are fixed points of K, then
so
‖x_m − x_0‖ = ‖∑_{k=0}^{m−1}(x_{k+1} − x_k)‖ ≤ ∑_{k=0}^{m−1} ‖x_{k+1} − x_k‖
            ≤ ( ∑_{k=0}^{m−1} c^k )‖x_1 − x_0‖ ≤ (1/(1 − c))‖x_1 − x_0‖
From this we derive that {xk } is a Cauchy sequence; as we have
and 0 < c < 1. Any Cauchy sequence in Rn has a limit point, so xm → x∗ for
some x∗ ∈ Rn . We now prove that the limit point x∗ is a (actually, the) fixed
point:
x_0, x_1, x_2, ...

in R^n which, hopefully, converges to a root x^* of F, so F(x^*) = 0. The idea is
to linearize F at the current iterate x_k and choose the next iterate x_{k+1} as a
zero of this linearized function. The first order Taylor approximation of F at x_k
is
which leads to Newton's method. One here assumes that the derivative F′ is
known analytically. Note that we do not (and hardly ever do!) compute the
inverse of the matrix F′. In the main step, which is to compute p, one needs
to solve an n × n linear system of equations where the coefficient matrix is the
Jacobi matrix of F , evaluated at xk . In MAT1110 [26] we implemented the
following code for Newton’s method for nonlinear equations:
function x=newtonmult(x0,F,J)
    % Performs Newton's method in many variables
    % x0: column vector which contains the start point
    % F: computes the values of F
    % J: computes the Jacobi matrix
    epsilon=0.0000001; N=30; n=0;
    x=x0;
    while norm(F(x)) > epsilon && n<=N
        x=x-J(x)\F(x);
        fval = F(x);
        %fprintf('itnr=%2d x=[%13.10f,%13.10f] F(x)=[%13.10f,%13.10f]\n',...
        %        n,x(1),x(2),fval(1),fval(2))
        n = n + 1;
    end
This code also terminates after a given number of iterations, and when a given
accuracy is obtained. Note that this function should work for any function F ,
since it is a parameter to the function.
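Since this book otherwise uses Python, a direct translation of this function might look as follows (a sketch; the diagnostic printout is omitted):

from numpy.linalg import norm, solve

def newtonmult(x0, F, J, epsilon=1e-7, N=30):
    # Newton's method for F(x) = 0 in several variables.
    # x0: start point, F: the function, J: its Jacobi matrix.
    x, n = x0, 0
    while norm(F(x)) > epsilon and n <= N:
        x = x - solve(J(x), F(x))    # corresponds to x - J(x)\F(x)
        n += 1
    return x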
The convergence of Newton’s method may be analyzed using fixed point
theory since one may view Newton’s method as a fixed point iteration. Observe
that the Newton iteration (13.5) may be written
xk+1 = G(xk )
where G is the function
where K and L are some constants. Here ‖F′(x_0)‖₂ denotes the spectral norm
of the square matrix F′(x_0). For a square matrix A this is defined by
‖A‖₂ = max_{x≠0} ‖Ax‖/‖x‖.
It is a fact that ‖A‖₂ is equal to the largest singular value of A, and that it
measures how much the operator F′(x_0) may increase the size of vectors. The
following convergence result for Newton’s method is known as Kantorovich’
theorem.
Theorem 13.3. Kantorovich’ theorem.
Let F : U → Rn be a differentiable function satisfying (13.6). Assume that
B̄(x0 ; 1/(KL)) ⊆ U and that
F (x∗ ) = 0.
A proof of this theorem is quite long (but not very difficult to understand)
[26].
One disadvantage with Newton’s method is that one needs to know the
Jacobi matrix F 0 explicitly. For complicated functions, or functions being the
Bk p = −F (xk )
The method we define will make the following assumption:
Definition 13.4. Broyden’s method.
Assume that we have chosen the next iterate xk+1 . Broyden’s method updates
Bk to Bk+1 in such a way that
B_{k+1} = B_k ( I − s_k s_k^T/(s_k^T s_k) ) + y_k s_k^T/(s_k^T s_k) = B_k + (y_k − B_k s_k) s_k^T/(s_k^T s_k).    (13.8)
Note that the matrix in Equation (13.8) is a rank one update of Bk , so that
it can be computed efficiently. In an algorithm for Broyden’s method Bk+1 is
computed from Equation (13.8), then xk+2 is computed by following the search
direction p obtained by solving Bk+1 p = −F (xk+1 ), and so on. Finally sk+1
and yk+1 are updated. An algorithm also computes an α through what we call a
line search, to attempt to find the optimal distance to follow the search direction.
We do not here specify how this line search can be performed. Also, we do
not specify how the initial values can be chosen. For B0 , any approximation of
the Jacobian of F at x0 can be used, using a numerical differentiation method
of your own choosing. One can show that Broyden’s method, under certain
assumptions, also converges superlinearly, see [32].
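A sketch of the update step in Python might look as follows; the function name broyden_step is ours, and a full implementation would also perform the line search and solve B_{k+1} p = −F(x_{k+1}) for the next search direction:

from numpy import outer, dot

def broyden_step(B, x, x_new, F):
    # Rank one update of the Jacobian approximation, Equation (13.8).
    s = x_new - x
    y = F(x_new) - F(x)
    return B + outer(y - dot(B, s), s)/dot(s, s)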
function x=broyden(x0,F)
for any matrix A. Show that ‖AB‖_F ≤ ‖A‖_F ‖B‖₂ whenever the matrix product
AB is well-defined.
Unconstrained optimization
∇f(x^*) = 0.    (14.1)

If, moreover, f has continuous second order partial derivatives, then ∇²f(x^*) is
positive semidefinite.

Proof. Assume that x^* is a local minimum of f and that ∇f(x^*) ≠ 0. Let
h = −α∇f(x^*) where α > 0. Then ∇f(x^*)^T h = −α‖∇f(x^*)‖² < 0 and
by continuity of the partial derivatives of f, ∇f(x)^T h < 0 for all x in some
neighborhood of x^*. From Theorem 11.2 (first order Taylor) we obtain
f(x^* + h) = f(x^*) + ∇f(x^*)^T h + (1/2) h^T ∇²f(x^* + th) h
           = f(x^*) + (1/2) h^T ∇²f(x^* + th) h                    (14.3)

If ∇²f(x^*) is not positive semidefinite, there is an h such that h^T ∇²f(x^*) h < 0
and, by continuity of the second order partial derivatives, h^T ∇²f(x) h < 0 for
all x in some neighborhood of x^*. But then (14.3) gives f(x^* + h) − f(x^*) < 0;
a contradiction. This proves that ∇²f(x^*) is positive semidefinite.
The two necessary optimality conditions in Theorem 14.1 are called the first-
order and the second-order conditions, respectively. The first-order condition
says that the gradient must be zero at x∗ , and such a point if often called a
stationary point. The second-order condition may be interpreted by f being
"convex locally" at x∗ , although this is not a precise term. A stationary point
which is neither a local minimum or a local maximum is called a saddle point.
So, every neighborhood of a saddle point contains points with larger and points
with smaller f -value.
Theorem 14.1 gives a connection to nonlinear equations. In order to find a
stationary point we may solve ∇f(x) = 0, which is an n × n (usually nonlinear)
system of equations. (The system is linear whenever f is a quadratic function.)
One may solve this equation, for instance, by Newton’s method and thereby
get a candidate for a local minimum. Sometimes this approach works well,
in particular if f has a unique local minimum and we have an initial point
"sufficiently close". However, there are other better methods which we discuss
later.
It is important to point out that any algorithm for finding a minimum of f
has to be able to find a stationary point. Therefore algorithms in this area are
typically iterative and move to gradually better points where the norm of the
gradient becomes smaller, and eventually almost equal to zero.
f(x) = (1/2) x^T A x − b^T x

where A is symmetric. The Hessian matrix of f is then (constant equal to) A, and
we assume this matrix is positive semidefinite. Then ∇f(x) = Ax − b, so the first-order necessary
optimality condition is
Ax = b
which is a linear system of equations. If f is strictly convex, which happens when
A is positive definite, then A is invertible and the unique solution is x∗ = A−1 b.
Thus, there is only one candidate for a local (and global) minimum, namely
x∗ = A−1 b. Actually, this is indeed a unique global minimum, but to verify
this we need a suitable argument. One way is to use convexity (with results
presented later) or an alternative is to use sufficient optimality conditions which
we discuss next. The linear system Ax = b, when A is positive definite, may be
solved by several methods. A popular, and very fast, method is the conjugate
gradient method. This method, and related methods, are discussed in detail in
the course INF-MAT4360 Numerical linear algebra [28].
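For a small example this candidate can be computed directly; the matrix A (positive definite) and the vector b below are made up for illustration:

from numpy import array
from numpy.linalg import solve

A = array([[2.0, 1.0],
           [1.0, 3.0]])
b = array([1.0, 2.0])
x_star = solve(A, b)    # the unique stationary point x* = A^{-1} b
print(x_star)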
In order to present a sufficient optimality condition we need a result from
linear algebra. Recall from linear algebra that a symmetric positive definite
matrix has only real eigenvalues and all these are positive.
Lemma 14.2. Smallest eigenvalue.
Let A be an n × n symmetric positive definite matrix, and let λn > 0 denote
its smallest eigenvalue. Then
h^T A h ≥ λ_n ‖h‖²   (h ∈ R^n).
A = P D P^T

where D is the diagonal matrix with the eigenvalues λ_1, ..., λ_n on the diagonal.
Let h ∈ R^n and define y = P^T h. Then ‖y‖ = ‖h‖ and
h^T A h = h^T P D P^T h = y^T D y = ∑_{i=1}^n λ_i y_i² ≥ λ_n ∑_{i=1}^n y_i² = λ_n ‖y‖² = λ_n ‖h‖².
Proof. From Theorem 11.4 (second order Taylor) and Lemma 14.2 we get
where λ_n > 0 is the smallest eigenvalue of ∇²f(x^*). Dividing here by ‖h‖² gives

(f(x^* + h) − f(x^*))/‖h‖² = (1/2)λ_n + ε(h)

Since lim_{h→0} ε(h) = 0, there is an r such that for ‖h‖ < r, |ε(h)| < λ_n/4. This
implies that
∇f (x∗ ) = 0.
Proof. Let x1 be a local minimum. If x1 is not a global minimum, there is an
x2 6= x1 with f (x2 ) < f (x1 ). Then for 0 < λ < 1
14.2 Methods
Algorithms for unconstrained optimization are iterative methods that generate
a sequence of points with gradually smaller values on the function f which is
to be minimized. There are two main types of algorithms in unconstrained
optimization:
• Line search methods: Here one first chooses a search direction dk from
the current point xk , using information about the function f . Then one
chooses a step length αk so that the new point xk+1 = xk + αk dk has a
small, perhaps smallest possible, value on the half-line {xk + αdk : α ≥ 0}.
αk describes how far one should go along the search direction. The problem
of choosing αk is a one-dimensional optimization problem. Sometimes we
can find αk exactly, and in such cases we refer to the method as exact line
search. In cases where αk can not be found analytically, algorithms can be
used to approximate how we can get close to the minimum on the half-line.
Such a method is also referred to as inexact line search.
These types are typically both based on quadratic approximation of f , but they
differ in the order in which one chooses search direction and step size. In the
following we only discuss the first type, the line search methods.
A very natural choice for search direction at a point xk is the negative
gradient, dk = −∇f (xk ). Recall that the direction of maximum increase of a
(differentiable) function f at a point x is ∇f (x), and the direction of maximum
decrease is −∇f (x). To verify this, Taylor’s theorem gives
f(x + h) = f(x) + ∇f(x)^T h + (1/2) h^T ∇²f(x + th) h.

So, for small h, the first order term dominates and we would like to make this
term small. By the Cauchy-Schwarz inequality¹.
xk+1 = xk + αk dk (14.4)
¹ The Cauchy-Schwarz inequality says: |u · v| ≤ ‖u‖ ‖v‖ for u, v ∈ R^n.
xk+1 = xk − αk ∇f (xk ).
In each step it moves in the direction of the negative gradient. Sometimes
this gives slow convergence, so other methods have been developed where other
choices of direction dk are made.
reduction factor β satisfying 0 < β < 1, and 0 < σ < 1 (typically this is chosen
very small, e.g. σ = 10⁻³). We define the integer
What this condition assures is that ‖d_k‖ is not too small or large compared
to ‖∇f(x_k)‖ and that the angle between the vectors d_k and ∇f(x_k) is not too
close to 90°. The proof of the following theorem may be found in [2].
Theorem 14.7. Backtracking line search and gradient related.
Let {x_k}_{k=0}^∞ be generated by the gradient method (14.4), where {d_k}_{k=0}^∞ is
gradient related to {x_k}_{k=0}^∞ and the step size α_k is chosen using backtracking
line search. Then every limit point of {x_k}_{k=0}^∞ is a stationary point.
We remark that in Theorem 14.7 the same conclusion holds if we use exact
minimization as step size rule, i.e., f (xk + αdk ) is minimized exactly with respect
to α.
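As an illustration, a sketch of the steepest descent method with a simple backtracking line search is given below; the stopping rule and the exact form of the sufficient decrease condition may differ slightly from the ones used in the text:

from numpy import dot
from numpy.linalg import norm

def steepest_descent(f, gradf, x0, beta=0.5, sigma=1e-3, tol=1e-6, maxit=1000):
    # Move in the direction of the negative gradient; the step size is
    # reduced by the factor beta until a sufficient decrease is obtained.
    x = x0
    for _ in range(maxit):
        d = -gradf(x)
        if norm(d) < tol:
            break
        alpha = 1.0
        while f(x + alpha*d) > f(x) + sigma*alpha*dot(gradf(x), d):
            alpha *= beta
        x = x + alpha*d
    return x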
A very important property of a numerical algorithm is its convergence
speed. Let us consider the steepest descent method first. It turns out that the
convergence speed for this algorithm is very well explained by its performance on
minimizing a quadratic function, so therefore the following result is important.
f (xk+1 ) ≤ mA f (xk )
where mA = ((λ1 − λn )/(λ1 + λn ))2 .
The proof may be found in [2]. Thus, if the largest eigenvalue is much
larger than the smallest one, m_A will be nearly 1 and one typically has slow
convergence. In this case we have m_A ≈ cond(A) where cond(A) = λ_1/λ_n is
the condition number of the matrix A. So the rule is: if the condition number
of A is small we get fast convergence, but if cond(A) is large, there will be
slow convergence. A similar behavior holds for most functions f because locally
near a minimum point the function is very close to its second order Taylor
approximation in x∗ which is a quadratic function with A = ∇2 f (x∗ ).
Thus, Theorem 14.8 says that the sequence obtained in the steepest descent
method converges linearly to a stationary point (at least for quadratic functions).
We now turn to Newton’s method.
Recall that the pure Newton step minimizes the second order Taylor ap-
proximation of f at the current iterate xk . Thus, if the function we minimize
is quadratic, we are done in one step. Similarly, if the function can be well
approximated by a quadratic function, then one would expect fast convergence.
We shall give a result on the convergence of Newton’s method (see [3] for
further details). When A is symmetric, we let λ_min(A) denote the smallest
eigenvalue of A.
For the convergence result we need a lemma on strictly convex functions.
Assume that x0 is a starting point for Newton’s method and let S = {x ∈ Rn :
f (x) ≤ f (x0 )}. We shall assume that f is continuous and convex, and this
implies that S is a closed convex set. We also assume that f has a minimum
point x∗ which then must be a global minimum. Moreover the minimum point
will be unique due to a strict convexity assumption on f . Let f ∗ = f (x∗ ) be the
optimal value.
The following lemma says that for a convex function as just described, a
point is nearly a minimum point (in terms of the f -value) whenever the gradient
is small in that point.
Proof. From Theorem 11.3, the second order Taylor’ theorem, we have for each
x, y ∈ S
(f (x0 ) − f ∗ )/γ
which is a finite number. For some k we must thus have by Lemma 14.11 that
‖∇f(x_k)‖ < η, and we can then use (14.11) and Lemma 14.12 to obtain

‖∇f(x_{k+1})‖ ≤ (2m²/L)( (L/(2m²))‖∇f(x_k)‖ )² = (L/(2m²))‖∇f(x_k)‖² ≤ (L/(2m²))η² = ((L/(2m²))η)·η ≤ (1/2)η ≤ η.
Therefore, as soon as (14.11) occurs in the iterative process, in all the remaining
iterations (14.11) will occur. Actually, let us show that as soon as (14.11) “kicks
in”, quadratic convergence starts:
Define µ_l = (L/(2m²))‖∇f(x_l)‖ for each l ≥ k. Then 0 ≤ µ_k < 1/2 as η ≤ m²/L.
f(x_l) − f^* ≤ (1/(2m))‖∇f(x_l)‖² = (1/(2m))(4m⁴/L²)( (L/(2m²))‖∇f(x_l)‖ )²
            = (2m³/L²) µ_l² ≤ (2m³/L²) (1/2)^{2^{l−k+1}},
for l ≥ k. This inequality shows that f (xl ) → f ∗ , and since the minimum
point is unique due to convexity, we must have xl → x∗ . It follows that the
convergence is quadratic.
From the proof it is also possible to say something about how many iterations
are needed to reach a certain accuracy. In fact, if ε > 0, a bound on the
number of iterations until f(x_k) ≤ f^* + ε is

(f(x_0) − f^*)/γ + log₂ log₂((2m³/L²)/ε).

Here γ is the parameter introduced in the proof above. The second term in
this expression (the logarithmic term) grows very slowly as ε is decreased, and
it may roughly be replaced by the constant 6. So, whenever the second stage
(14.11) occurs, the convergence is extremely fast: it takes about 6 more Newton
iterations. Note that quadratic convergence means, roughly, that the number of
correct digits in the answer doubles for every iteration.
‖d_k‖² = (∇f(x_k))^T (∇²f(x_k))^{−2} ∇f(x_k) ≤ (1/m)(∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k),

since the largest eigenvalue of (∇²f(x_k))^{−1} is less than 1/m.
Since there also is an upper bound M on the highest eigenvalue of ∇2 f (x), the
second order Taylor approximation gives
f(x_k + α_k d_k) = f(x_k) + α_k ∇f(x_k)^T d_k + (1/2) α_k² d_k^T ∇²f(z) d_k
                 ≤ f(x_k) + α_k ∇f(x_k)^T d_k + (M ‖d_k‖²/2) α_k²
                 ≤ f(x_k) − α_k (∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k) + (M/(2m)) (∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k) α_k²

If we try the value α̂_k = m/M we get

f(x_k + α̂_k d_k) ≤ f(x_k) − (1/2) α̂_k (∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k),

which can be written as
f(x_k) − f(x_k + α̂_k d_k) ≥ (1/2) α̂_k (∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k)
                          ≥ σ α̂_k (∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k)
                          = −σ α̂_k (∇f(x_k))^T d_k,

which shows that α̂_k = m/M satisfies the stopping criterion of backtracking line
search. Since we may not have exactly m/M = βⁿ s for some n, we may still
conclude that backtracking line search stops at α_k ≥ βm/M, so that
*The proof for Lemma 14.12. We will first show that backtracking line
search chooses unit steps provided that η ≤ 3(1 − 2σ)m²/L. By condition (ii),
Now we define the function g(t) = f(x_k + t d_k). The chain rule gives that
In particular, note that g″(0) = (∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k). The inequality
above can therefore be written as

m‖d_k‖² = m(∇f(x_k))^T (∇²f(x_k))^{−2} ∇f(x_k) ≤ (∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k).

f(x_k + d_k) ≤ f(x_k) − (1/2)(∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k)
              + (L/(6m^{3/2}))((∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k))^{3/2}.

Assume now that also ‖∇f(x_k)‖ ≤ 3(1 − 2σ)m²/L. Since the biggest eigenvalue
of (∇²f(x_k))^{−1} is less than 1/m, we have that

(∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k) ≤ (1/m)(3(1 − 2σ)m²/L)² = (3(1 − 2σ)m^{3/2}/L)².

This implies that

f(x_k + d_k) ≤ f(x_k) − ((1/2) − (L/(6m^{3/2}))((∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k))^{1/2})(∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k)
             ≤ f(x_k) − σ(∇f(x_k))^T (∇²f(x_k))^{−1} ∇f(x_k) = f(x_k) + σ(∇f(x_k))^T d_k,
Hint. Start by writing out one step with Newton’s method when the search
direction happens to be equal to an eigenvector of A, and establish a connection
with the steepest descent method.
x = (0.4992, −0.8661, 0.7916, 0.9107, 0.5357, 0.6574, 0.6353, 0.0342, 0.4988, −0.4607)
Use the start value α_0 = 0 for Newton's method. What estimate for the minimum
of f (and thereby α) did you obtain?
b) The ten measurements from a) were generated from a probability distribution
where α = 0.5. The answer you obtained was quite far from this. Let us therefore
take a look at how many measurements we should use in order to get quite
precise estimates for α. You can use the function
function ret=randmuon(alpha,m,n)
10 times, and plot the ten estimates you obtain. Repeat for n = 1000, and
for n = 100000 (in all cases you are supposed to plot 10 maximum likelihood
estimates). How many measurements do we need in order to obtain maximum
likelihood estimates which are reliable?
Note that it is possible for the maximum likelihood estimates you obtain here to
be outside the domain of definition [−1, 1]. You need not take this into account.
Chapter 15
Constrained optimization -
theory
minimize f (x)
subject to
(15.1)
hi (x) = 0 (i ≤ m)
gj (x) ≤ 0 (j ≤ r)
where f , h1 , h2 , . . . , hm and g1 , g2 , . . . , gr are continuously differentiable functions
from Rn into R. A point x satisfying all the m + r constraints will be called
feasible. Thus, we look for a feasible point with smallest f -value.
Our goal is to establish optimality conditions for this problem, starting with
the special case with only equality constraints. Then we discuss algorithms for
solving this problem. Our presentation is strongly influenced by [3] and [2].
minimize f (x)
subject to (15.2)
hi (x) = 0 (i ≤ m)
If f and each hi are twice continuously differentiable, then the following also
holds
h^T ( ∇²f(x^*) + ∑_{i=1}^m λ_i^* ∇²h_i(x^*) ) h ≥ 0 for all h ∈ T(x^*)    (15.4)

L(x, λ) = f(x) + ∑_{i=1}^m λ_i h_i(x) = f(x) + λ^T H(x)   (x ∈ R^n, λ ∈ R^m).

Then

∇_x L(x, λ) = ∇f(x) + ∑_i λ_i ∇h_i(x)
∇_λ L(x, λ) = H(x).
Therefore, the first order conditions in Theorem 15.1 may be rewritten as follows
∇x L(x∗ , λ∗ ) = 0, ∇λ L(x∗ , λ∗ ) = 0.
Here the second equation simply means that H(x) = 0. These two equations
say that (x∗ , λ∗ ) is a stationary point for the Lagrangian, and it is a system of
n + m (possibly nonlinear) equations in n + m variables.
Let us interpret Theorem 15.1. First of all, T (x∗ ) can be interpreted as a
linear subspace consisting of the “first order feasible directions” at x∗ , i.e. search
directions we can choose which do not violate the constraints (so that hi (x∗ +h) =
0 whenever hi (x∗ ) = 0, i ≤ m). To see this, note that ∇hi (x∗ ) · h is what is
called the directional derivative of hi in the direction h. This quantity measures
the change of hi in direction h, and if this is zero, hi remains zero when we move
in direction h, so that the constraints are kept. Actually, if each hi is linear, then
T (x∗ ) consists of those h such that x∗ + h is also feasible, i.e., hi (x∗ + h) = 0
for each i ≤ m. Thus, Equation (15.3) says that in a local minimum x∗ the
gradient ∇f (x∗ ) is orthogonal to the subspace T (x∗ ) of the first order feasible
variations. This is reasonable since otherwise there would be a feasible direction
in which f would decrease. In Figure 15.1 we have plotted a curve where two
constraints are fulfilled. In Figure 15.2 we have then shown an interpretation of
Theorem 15.1. Note that this necessary optimality condition corresponds to the
condition ∇f (x∗ ) = 0 in the unconstrained case. The second condition (15.4) is
a similar generalization of the second order condition in Theorem 14.1 (saying
that ∇2 f (x∗ ) is positive semidefinite).
Figure 15.1: The two surfaces h_1(x) = b_1 and h_2(x) = b_2 intersect each other in
a curve. Along this curve the constraints are fulfilled.

Figure 15.2: ∇f(x^*) as a linear combination of ∇h_1(x^*) and ∇h_2(x^*).
of the problem min{F^k(x) : x ∈ B̄(x^*; ε)}; the existence here follows from the
extreme value theorem (F^k is continuous and the ball is compact). For every k
For suitably large k the matrix H′(x^k)H′(x^k)^T is invertible (as the rows of
H′(x^k) are linearly independent due to rank(H′(x^*)) = m and a continuity
argument). Multiply equation (15.5) by (H′(x^k)H′(x^k)^T)^{−1} H′(x^k) on the left
to obtain

kH(x^k) = −(H′(x^k)H′(x^k)^T)^{−1} H′(x^k)(∇f(x^k) + α(x^k − x^*)).

Letting k → ∞ we see that the sequence {kH(x^k)} is convergent and its limit
point λ^* is given by

0 = ∇f(x^*) + H′(x^*)^T λ^*
This proves the first part of the theorem; we omit proving the second part which
may be found in [2].
The first order necessary condition (15.3) along with the constraints H(x) =
0 is a system of n + m equations in the n + m variables x1 , x2 , . . . , xn and
λ1 , λ2 , . . . , λm . One may use e.g. Newton’s method for solving these equations
and find a candidate for an optimal solution. But usually there are better
numerical methods for solving the optimization problem (15.1), as we shall see soon.
Necessary optimality conditions are used for finding a candidate for an optimal
solution. In order to verify optimality we need sufficient optimality
conditions.
Theorem 15.2. Lagrange, sufficient condition.
Assume that f and H are twice continuously differentiable functions. More-
over, let x∗ be a point satisfying the first order necessary optimality condition
(15.3) and the following condition
and this problem must have the same local minima as the problem of minimizing
f (x) subject to H(x) = 0. The objective function in (15.8) contains the penalty
term (c/2)kH(x)k2 which may be interpreted as a penalty (increased function
value) for violating the constraint H(x) = 0. In connection with the proof of
Theorem 15.2 based on the augmented Lagrangian one also obtains the following
interesting and useful fact:
if x∗ and λ∗ satisfy the sufficient conditions in Theorem 15.2 then there
exists a positive c̄ such that for all c ≥ c̄ the point x∗ is also a local minimum of
the augmented Lagrangian Lc (·, λ∗ ).
Thus, the original constrained problem has been converted to an uncon-
strained one involving the augmented Lagrangian. And, as we know, uncon-
strained problems are easier to solve (solve the equations saying that the gradient
is equal to zero).
minimize f (x)
subject to
(15.9)
hi (x) = 0 (i ≤ m)
gj (x) ≤ 0 (j ≤ r)
We assume, as usual, that all these functions are continuously differentiable
real-valued functions defined on Rn . In short form we write the constraints
as H(x) = 0 and G(x) ≤ 0 where we let H = (h1 , h2 , . . . , hm ) and G =
(g1 , g2 , . . . , gr ).
A main difficulty in problems with inequality constraints is to determine which
of the inequalities are active in an optimal solution. If we knew the active
inequalities, we would essentially have a problem with only equality constraints,
H(x) = 0 plus the active inequalities treated as equalities, i.e., a problem of the
form discussed in the previous section. For very small problems (solvable by hand-calculation) a
direct method is to consider all possible choices of active inequalities and solve
the corresponding equality-constrained problem by looking at the Lagrangian
function.
Interestingly, one may also transform the problem (15.9) into the following
equality-constrained problem
minimize f (x)
subject to
(15.10)
hi (x) = 0 (i ≤ m)
gj (x) + zj2 = 0 (j ≤ r).
We have introduced extra variables zj , one for each inequality. The square of
these variables represent slack in each of the original inequalities. Note that
there is no sign constraint on zj . Clearly, the problems (15.9) and (15.10) are
equivalent. This transformation can also be useful computationally. Moreover,
it is useful theoretically as one may apply the optimality conditions from the
previous section to problem (15.10) to derive the theorem below (see [2]).
L(x, λ, µ) = f(x) + ∑_{i=1}^m λ_i h_i(x) + ∑_{j=1}^r µ_j g_j(x) = f(x) + λ^T H(x) + µ^T G(x).    (15.11)
The gradient of L with respect to x is given by
∇_x L(x, λ, µ) = ∇f(x) + ∑_{i=1}^m λ_i ∇h_i(x) + ∑_{j=1}^r µ_j ∇g_j(x).
∇_x L(x^*, λ^*, µ^*) = 0
µ_j^* ≥ 0   (j ≤ r)                    (15.12)
µ_j^* = 0   (j ∉ A(x^*)).
If f , g and h are twice continuously differentiable, then the following also holds
minimize f (x)
subject to
(15.14)
hi (x) = 0 (i ≤ m)
gj (x) = 0 (j ∈ A(x∗ ))
which is obtained by removing all inactive constraints in x∗ . Then x∗ must
be a local minimum in (15.14); otherwise there would be a point x0 in the
neighborhood of x∗ which is feasible in (15.14) and satisfying f (x0 ) < f (x∗ ).
By choosing x0 sufficiently near x∗ we would get gj (x0 ) < 0 for all j 6∈ A(x∗ ),
contradicting that x∗ is a local minimum in (15.9). Therefore we may apply
Theorem 15.1 to problem (15.14) and by regularity of x∗ there must be unique
Lagrange multiplier vectors λ∗ = (λ∗1 , λ∗2 , . . . , λ∗m ) and µ∗j (j ∈ A(x∗ )) such that
∇f(x^*) + ∑_{i=1}^m λ_i^* ∇h_i(x^*) + ∑_{j∈A(x^*)} µ_j^* ∇g_j(x^*) = 0
As an example, consider the problem of minimizing f(x1, x2) = x1 subject to the constraints

g_1(x1, x2) = −x2 ≤ 0,
g_2(x1, x2) = (x1 − 1)² + x2² − 1 ≤ 0.
If we compute the gradients we see that the KKT conditions take the form
(1, 0)^T + µ_1 (0, −1)^T + µ_2 (2(x1 − 1), 2x2)^T = 0,
where the last two terms on the left-hand side are only included if the corresponding inequalities are active. It is clear that we find no solutions if no inequalities are active. If only the first inequality is active we find no solution either. If only the second inequality is active we get the equations
From the last equation we see that either x2 = 0 or µ2 = 0. But here x2 > 0 since only the second inequality is active, so that µ2 = 0. This is, however, in conflict with the second equation. Finally, let us consider the case where both inequalities are active. This occurs only in the points (0, 0) and (2, 0). These two
points give the gradients ∇g2 = (∓2, 0), so that the gradient equation can be
written as
(1, 0)^T + µ_1 (0, −1)^T + µ_2 (∓2, 0)^T = 0.
These give µ_1 = 0 and µ_2 = ±1/2. Since we require µ_2 ≥ 0, the only candidate
we obtain is (0, 0).
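A quick numerical check of this candidate can be done as follows. This is a small Python sketch written for the example above, where the objective is taken to be f(x1, x2) = x1 (consistent with the gradient (1, 0) used in the KKT equation):

import numpy as np

g1 = lambda x: -x[1]                               # -x2 <= 0
g2 = lambda x: (x[0] - 1)**2 + x[1]**2 - 1         # (x1-1)^2 + x2^2 - 1 <= 0

df  = lambda x: np.array([1.0, 0.0])               # gradient of f(x) = x1
dg1 = lambda x: np.array([0.0, -1.0])
dg2 = lambda x: np.array([2*(x[0] - 1), 2*x[1]])

x = np.array([0.0, 0.0])
mu1, mu2 = 0.0, 0.5
print(g1(x), g2(x))                                # both 0: the constraints are active
print(df(x) + mu1*dg1(x) + mu2*dg2(x))             # [0. 0.]: the KKT gradient equation holds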
Finally we should comment on any points which are not regular. If only the first inequality is active, ∇g_1 = (0, −1) can never be zero. If only the second inequality is active, ∇g_2 cannot be zero either, since that would require x1 = 1, x2 = 0, and then the second inequality would not be active. If both inequalities are active,
we saw that (0, 0) and (2, 0) are the only possible points. This gave the gradients
(0, −1) and (∓2, 0), which clearly are linearly independent. We therefore have
that all points are regular.
We remark that the assumption that x∗ is a regular point may be too
restrictive in some situations, for instance there may be more than n active
inequalities in x∗ . There exist several other weaker assumptions that assure the
existence of Lagrangian multipliers (and similar necessary conditions).
In the proof of Theorem 15.3 we did not prove the nonnegativity of µ. Showing this is actually quite hard, but let us comment on the main lines of the argument. We first need the concept of a tangent vector.
A vector d ∈ R^n is called a linearized feasible direction at a feasible point x if

d · ∇h_i(x) = 0 (i ≤ m),
d · ∇g_j(x) ≤ 0 (j ∈ A(x))

(since H′(x) is the matrix with rows ∇h_i(x), the first condition is the same as H′(x)d = 0; similarly, when all constraints are active, the second condition is the same as G′(x)d ≤ 0). We denote by LFC(x) the set of all linearized feasible directions at x.
So, if we move from x along a linearized feasible direction with a suitably small step, then the new point is feasible as far as the linearized constraints at x (the first-order Taylor approximations of each h_i, and of each g_j that is active at x, i.e., each inequality constraint holding with equality) are concerned. With this notation we have the following lemma. The proof may be
found in [32] and it involves the implicit function theorem from multivariate
calculus [26].
Lemma 15.6. Tangent cone and feasible directions.
Let x∗ ∈ C. Then TC (x∗ ) ⊆ LFC (x∗ ). If x∗ is a regular point, then
TC (x∗ ) = LFC (x∗ ).
Putting these things together, when x∗ is regular, ∇f (x∗ )T d ≥ 0 for all
d ∈ LFC (x∗ ). Now we need a lemma called Farkas’ lemma.
Lemma 15.7. Farkas' lemma.
Let B and C be matrices with n rows, let g ∈ R^n, and let K be the cone K = {By + Cw : y ≥ 0}. Then exactly one of the following two alternatives holds:
1. g ∈ K;
2. there exists a d ∈ R^n such that g^T d < 0, B^T d ≥ 0, and C^T d = 0.
Figure 15.3: The different possibilities (one, two, and three active constraints) for ∇f in a minimum of f, under the constraints x ≥ 0.
Consider next the quadratic optimization problem

minimize (1/2) x^T D x − q^T x
subject to Ax = b
where D is positive semidefinite and A ∈ Rm×n , b ∈ Rm . This is a special
case of (15.16) where f (x) = (1/2) xT Dx − q T x. Then ∇f (x) = Dx − q (see
Exercise 11.9 in Chapter 11). Thus, the KKT conditions are: there is some
λ ∈ Rm such that Dx − q + AT λ = 0. In addition, the vector x is feasible so
we have Ax = b. Thus, solving the quadratic optimization problem amounts to
solving the linear system of equations
Dx + AT λ = q, Ax = b
which may be written as
\begin{pmatrix} D & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x \\ λ \end{pmatrix} = \begin{pmatrix} q \\ b \end{pmatrix}.    (15.15)
Under the additional assumption that D is positive definite and A has full row
rank, one can show that the coefficient matrix in (15.15) is invertible so this
system has a unique solution x, λ. Thus, for this problem, we may write down an
explicit solution (in terms of the inverse of the block matrix). Numerically, one
finds x (and the Lagrangian multiplier λ) by solving the linear system (15.15)
by e.g. Gaussian elimination or some faster (direct or iterative) method.
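As a small numerical illustration (the matrices below are an arbitrary example, not taken from the text), the system (15.15) can be assembled and solved directly with NumPy:

import numpy as np

D = np.array([[2.0, 0.0], [0.0, 4.0]])   # positive definite
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])               # full row rank
b = np.array([1.0])

m, n = A.shape
KKT = np.block([[D, A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([q, b])
sol = np.linalg.solve(KKT, rhs)
x, lam = sol[:n], sol[n:]
print(x, lam)   # x solves min (1/2) x^T D x - q^T x subject to Ax = b

For this instance one finds x = (2/3, 1/3) and λ = −1/3, which can be verified directly from the KKT conditions.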
Consider now the same quadratic problem with nonnegativity constraints added:

minimize (1/2) x^T D x − q^T x
subject to Ax = b, x ≥ 0.
Here D, A and b are as above. Then ∇f (x) = Dx − q and ∇gk (x) = −ek .
Thus, the KKT conditions for this problem are: there are λ ∈ Rm and µ ∈ Rn
such that Dx − q + AT λ − µ = 0, µ ≥ 0 and µk = 0 if xk > 0 (k ≤ n). We
eliminate µ from the first equation and obtain the equivalent condition: there is
a λ ∈ Rm such that Dx + AT λ ≥ q and (Dx + AT λ − q)k · xk = 0 (k ≤ n). In
addition, we have Ax = b, x ≥ 0. This problem may be solved numerically, for
instance, by a so-called active set method, see [27].
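Small instances can also be handed to a general-purpose solver. Below is a sketch using scipy.optimize.minimize with the SLSQP method; the data are again an arbitrary example chosen here, and an active set method as in [27] would be preferred for larger problems.

import numpy as np
from scipy.optimize import minimize

D = np.array([[2.0, 0.0], [0.0, 4.0]])
q = np.array([3.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

f   = lambda x: 0.5 * x @ D @ x - q @ x
jac = lambda x: D @ x - q

res = minimize(f, x0=np.array([0.5, 0.5]), jac=jac, method='SLSQP',
               bounds=[(0, None), (0, None)],
               constraints=[{'type': 'eq', 'fun': lambda x: A @ x - b}])
print(res.x)   # for these data the minimizer is (1, 0)

At the solution x = (1, 0) the conditions above hold with λ = 1 and µ = (0, 2), so in particular the complementarity condition (Dx + A^T λ − q)_k · x_k = 0 is satisfied.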
Proof. 1.) The proof of property 1 is exactly as the proof of the first part of
Theorem 14.4, except that we work with local and global minimum of f over C.
2.) Assume the set C* of minimum points is nonempty and let α = min_{x∈C} f(x). Then C* = {x ∈ C : f(x) ≤ α} is a convex set, see Proposition 12.5. Moreover, this set is closed as f is continuous.
3.) This follows directly from Theorem 12.10.
Next, we consider a quite general convex optimization problem which is of
the form (15.9):
minimize f(x)
subject to Ax = b,    (15.16)
           g_j(x) ≤ 0 (j ≤ r)
where all the functions f and gj are differentiable convex functions, and A ∈
Rm×n and b ∈ Rm . Let C denote the feasible set of problem (15.16). Then C is a
convex set, see Proposition 12.5. A special case of (15.16) is linear optimization.
An important concept in convex optimization is duality. To briefly explain this, introduce again the Lagrangian function L : R^n × R^m × R^r_+ → R given by

L(x, λ, ν) = f(x) + λ^T(Ax − b) + ν^T G(x),

and define the dual objective function g by

g(λ, ν) = inf_{x ∈ R^n} L(x, λ, ν).

Note that this infimum may sometimes be equal to −∞ (meaning that the function x → L(x, λ, ν) is unbounded below). The function g is the pointwise
infimum of a family of affine functions in (λ, ν), one function for each x, and this
implies that g is a concave function. We are interested in g due to the following
fact, which is easy to prove. It is usually referred to as weak duality.
Lemma 15.9. Weak duality.
Let x be feasible in problem (15.16) and let λ ∈ Rm , ν ∈ Rr where ν ≥ 0.
Then
g(λ, ν) ≤ f (x).
Proof. For λ ∈ Rm , ν ∈ Rr with ν ≥ 0 and x feasible in problem (15.16) we
have
g(λ, ν) ≤ L(x, λ, ν)
= f (x) + λT (Ax − b) + ν T G(x)
≤ f (x)
as Ax = b, ν ≥ 0 and G(x) ≤ 0.
The dual problem is

maximize g(λ, ν)
subject to ν ≥ 0.    (15.17)
Actually, in this dual problem, we may further restrict the attention to those
(λ, ν) for which g(λ, ν) is finite. g(λ, ν) is also called the dual objective function.
The original problem (15.16) will be called the primal problem. It follows
from Lemma 15.9 that
g∗ ≤ f ∗
where f ∗ denotes the optimal value in the primal problem and g ∗ the optimal
value in the dual problem. If g ∗ < f ∗ , we say that there is a duality gap. Note
that the derivation above, and weak duality, holds for arbitrary functions f and
gj (j ≤ r). The concavity of g also holds generally.
The dual problem is useful when the dual objective function g may be
computed efficiently, either analytically or numerically. Duality provides a
powerful method for proving that a solution is optimal or, possibly, near-optimal.
If we have a feasible x in (15.16) and we have found a dual solution (λ, ν) with ν ≥ 0 such that

f(x) = g(λ, ν) + ε

for some ε (which then has to be nonnegative), then we can conclude that x is “nearly optimal”: it is not possible to improve f by more than ε. Such a point x is sometimes called ε-optimal, where the case ε = 0 means optimal.
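For instance, if a feasible x gives f(x) = 5.03 while some (λ, ν) with ν ≥ 0 gives g(λ, ν) = 5.00, then weak duality gives f* ≥ 5.00, so f(x) − f* ≤ 0.03 and x is 0.03-optimal.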
So, how good is this duality approach? For convex problems it is often perfect
as the next theorem says. We omit most of the proof (see [19, 2, 49]). For non-convex problems one should expect a duality gap. Recall that G′(x) denotes the Jacobian matrix of G = (g_1, g_2, . . . , g_r) at x.
Theorem 15.10. Convex optimization.
Consider the convex optimization problem (15.16) and assume this problem has a feasible point x0 satisfying g_j(x0) < 0 for each j ≤ r. Then there is no duality gap, i.e., f* = g*, and the optimal value is attained in the dual problem. Moreover, a feasible point x is a minimum in (15.16) if and only if there are λ ∈ R^m and ν ∈ R^r with ν ≥ 0 such that

∇f(x) + A^T λ + G′(x)^T ν = 0

and

ν_j g_j(x) = 0 (j ≤ r).
Proof. We only prove the second part (see the references above). So assume that
f ∗ = g ∗ and the infimum and supremum are attained in the primal and dual
problems, respectively. Let x be a feasible point in the primal problem. Then x
is a minimum in the primal problem if and only if there are λ ∈ Rm and ν ∈ Rr
such that all the inequalities in the proof of Lemma 15.9 hold with equality.
This means that g(λ, ν) = L(x, λ, ν) and ν^T G(x) = 0. But L(x, λ, ν) is convex in x, so it is minimized by x if and only if its gradient is the zero vector, i.e., ∇f(x) + A^T λ + G′(x)^T ν = 0. This leads to the desired characterization.
The assumption stated in the theorem, that gj (x0 ) < 0 for each j, is called
the weak Slater condition.
In Example 15.7 the problem is to minimize f(x) = x² + 1 subject to (x − 3)² − 1 ≤ 0, so that the feasible set is [2, 4]. Minimizing the Lagrangian L(x, ν) = x² + 1 + ν((x − 3)² − 1) over x gives x = 3ν/(1 + ν), so the dual objective function becomes

g(ν) = L(3ν/(1 + ν), ν) = (3ν/(1 + ν))² + 1 + ν((3ν/(1 + ν) − 3)² − 1).

This simplifies to g(ν) = 9ν/(1 + ν) + 1 − ν, so that g′(ν) = 9/(1 + ν)² − 1 = 0 for ν ≥ 0 if and only if ν = 2, which gives g(2) = 5. This equals the primal optimal value f* = f(2) = 5, so there is no duality gap, in accordance with Theorem 15.10.
Figure 15.4: The objective function (together with the inequality constraint) and the dual objective function of Example 15.7.
min{ (x1 − 3/2)² + x2² : x1 + x2 ≤ 1, x1 − x2 ≤ 1, −x1 + x2 ≤ 1, −x1 − x2 ≤ 1 }.

a) Draw the region which we minimize over, and find the minimum of f(x) = (x1 − 3/2)² + x2² by a direct geometric argument.
b) Write down the KKT conditions for this problem. From a), decide which two of the constraints, say g_1 and g_2, are active at the minimum, and verify that you can find µ_1 ≥ 0, µ_2 ≥ 0 so that ∇f + µ_1∇g_1 + µ_2∇g_2 = 0 (as the KKT conditions guarantee in a minimum). You are not meant to go through all possibilities for active inequalities here, only the ones that a) shows must be fulfilled.
Chapter 16

Constrained optimization - methods
In this final chapter we present numerical methods for solving nonlinear optimization problems. This is a huge area, so we can only give a small taste of it here! The algorithms we present are well-established methods that are known to work well.
We first consider the problem with linear equality constraints only:

minimize f(x)
subject to Ax = b.    (16.1)
Newton’s method may be applied to this problem. The method is very similar
to the unconstrained case, but with two modifications. First, the initial point x0
must be chosen so that it is feasible, i.e., Ax0 = b. Next, the search direction d
must be such that the new iterate is feasible as well. This means that Ad = 0,
so the search direction lies in the nullspace of A.
The second order Taylor approximation of f at an iterate x_k is

f(x_k) + ∇f(x_k)^T h + (1/2) h^T ∇²f(x_k) h,

and the Newton step is obtained by minimizing this approximation over h subject to

Ah = 0.    (16.2)

Since the gradient of the Taylor approximation w.r.t. h is ∇f(x_k) + ∇²f(x_k)h, setting the gradient of the Lagrangian w.r.t. h equal to zero gives
\begin{pmatrix} ∇²f(x_k) & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} h \\ λ \end{pmatrix} = \begin{pmatrix} -∇f(x_k) \\ 0 \end{pmatrix}.
The Newton step is only defined when the coefficient matrix in this KKT system is invertible. In that case the system has a unique solution (h, λ), and we define d_Nt = h and call this the Newton step. Newton's method for solving problem (16.1) can now be obtained by extending the previous code.
The stopping criterion is based on the quantity η := d_Nt^T ∇²f(x) d_Nt, the square of the so-called Newton decrement: η/2 is the decrease of the quadratic model obtained by taking the full Newton step, so a small η indicates that little further progress can be made. The iteration below therefore stops once η²/2 < ε.
This leads to an algorithm for Newton's method for linear equality constrained optimization which is very similar to the function newtonbacktrack from Exercise 14.12. We do not state a formal convergence theorem for this
method, but it behaves very much like Newton’s method for unconstrained opti-
mization. Actually, it can be seen that the method just described corresponds to
eliminating variables based on the equations Ax = b and using the unconstrained
Newton method for the resulting (smaller) problem. So as soon as the solution
is “sufficiently near” an optimal solution, the convergence rate is quadratic, so
extremely few iterations are needed in this final stage.
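For readers who prefer Python, here is a minimal sketch of the method just described; the function names, the example problem and the stopping rule based on h^T ∇²f(x) h are choices made for this sketch and are not taken from the text.

import numpy as np

def newton_eq(f, grad, hess, A, b, x0, tol=1e-8, maxit=50):
    """Newton's method for minimizing f subject to Ax = b.

    x0 must be feasible (A @ x0 = b).  Each step solves the KKT system
    [[H, A^T], [A, 0]] [h, lam] = [-grad, 0], so that A @ h = 0 and the
    iterates stay feasible.  A simple Armijo backtracking search is used.
    """
    x = x0.astype(float)
    m = A.shape[0]
    for _ in range(maxit):
        H, g = hess(x), grad(x)
        KKT = np.block([[H, A.T], [A, np.zeros((m, m))]])
        rhs = np.concatenate([-g, np.zeros(m)])
        h = np.linalg.solve(KKT, rhs)[:len(x)]
        if (h @ H @ h) / 2 < tol:            # Newton decrement based stop
            break
        t, beta, sigma = 1.0, 0.5, 1e-3
        while f(x + t*h) > f(x) + sigma*t*(g @ h):
            t *= beta
        x = x + t*h
    return x

# example: minimize x1^2 + 2*x2^2 + x3^2 subject to x1 + x2 + x3 = 1
f    = lambda x: x[0]**2 + 2*x[1]**2 + x[2]**2
grad = lambda x: np.array([2*x[0], 4*x[1], 2*x[2]])
hess = lambda x: np.diag([2.0, 4.0, 2.0])
A, b = np.array([[1.0, 1.0, 1.0]]), np.array([1.0])
print(newton_eq(f, grad, hess, A, b, x0=np.array([1.0, 0.0, 0.0])))   # [0.4, 0.2, 0.4]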
We now consider the problem with inequality constraints as well,

minimize f(x)
subject to Ax = b,    (16.4)
           g_j(x) ≤ 0 (j ≤ r),

where A is an m × n matrix and b ∈ R^m. The feasible set here is F = {x ∈ R^n : Ax = b, g_j(x) ≤ 0 (j ≤ r)}. We assume that f and g_1, . . . , g_r are convex, and that the weak Slater condition of Theorem 15.10 holds, so that the KKT conditions characterize optimality: for a feasible x there should exist λ ∈ R^m and ν ∈ R^r with

ν ≥ 0, ∇f(x) + A^T λ + G′(x)^T ν = 0,    (16.5)
ν_j g_j(x) = 0 (j ≤ r).
So, x is a minimum in (16.4) if and only if there are λ ∈ Rm and ν ∈ Rr such
that (16.5) holds.
Let us state an algorithm for Newton's method for linear equality constrained optimization with inequality constraints. Before we do this there is one final problem we need to address: the α we get from backtracking line search may be such that x + αd_Nt does not satisfy the inequality constraints (in the exercises you will be asked to verify that this is the case for a certain function). The problem comes from the fact that the iterates x_k + β^m s d_k from Armijo's rule do not necessarily satisfy the inequality constraints. However, we can choose m large enough so that all succeeding iterates satisfy these constraints. We can modify the function newtonbacktrack from Exercise 14.12 to a function newtonbacktrackg1g2LEC in an obvious way so that, in addition to applying Armijo's rule, we also choose a step size so small that the inequality constraints are satisfied:
function [x,numit]=newtonbacktrackg1g2LEC(f,df,d2f,A,b,x0,g1,g2)
  epsilon=10^(-3);
  x=x0;
  maxit=100;
  for numit=1:maxit
    % Solve the KKT system for the Newton step d
    matr=[d2f(x) A'; A zeros(size(A,1))];
    vect=[-df(x); zeros(size(A,1),1)];
    solvedvals=matr\vect;
    d=solvedvals(1:size(A,2));
    eta=d'*d2f(x)*d;
    if eta^2/2<epsilon
      break;
    end
    % Armijo's rule, with the additional requirement that the
    % two inequality constraints remain satisfied
    beta=0.2; s=0.5; sigma=10^(-3);
    m=0;
    while (f(x)-f(x+beta^m*s*d) < -sigma*beta^m*s*(df(x))'*d) ...
          || (g1(x+beta^m*s*d)>0) || (g2(x+beta^m*s*d)>0)
      m=m+1;
    end
    alpha = beta^m*s;
    x=x+alpha*d;
  end
Here g1 and g2 are function handles which represent the inequality constraints.
The new function works only in the case when there are exactly two inequality
constraints.
The interior-point barrier method is based on an approximation of problem (16.4) by the barrier problem

minimize f(x) + µφ(x)
subject to Ax = b,    (16.6)

where

φ(x) = − Σ_{j=1}^{r} ln(−g_j(x))

and µ > 0 is a parameter (in R). The function φ is called the (logarithmic) barrier function and its domain is the relative interior of the feasible set,

F° = {x ∈ R^n : Ax = b, g_j(x) < 0 (j ≤ r)}.

Its gradient and Hessian are
∇φ(x) = Σ_{j=1}^{r} (1/(−g_j(x))) ∇g_j(x),    (16.7)

∇²φ(x) = Σ_{j=1}^{r} (1/g_j(x)²) ∇g_j(x)∇g_j(x)^T + Σ_{j=1}^{r} (1/(−g_j(x))) ∇²g_j(x).    (16.8)
For any h ∈ R^n this gives

h^T ∇²φ(x) h = Σ_{j=1}^{r} ( (1/g_j(x)²) h^T ∇g_j(x)∇g_j(x)^T h + (1/(−g_j(x))) h^T ∇²g_j(x) h )
             = Σ_{j=1}^{r} ( (1/g_j(x)²) ‖∇g_j(x)^T h‖² + (1/(−g_j(x))) h^T ∇²g_j(x) h ) ≥ 0,
since 1/(−g_j(x)) > 0 and h^T ∇²g_j(x) h ≥ 0 (since all g_j are convex, ∇²g_j(x) is positive semidefinite). Hence φ is convex on F°. Moreover, if {x_k} is a sequence in F° such that g_j(x_k) → 0 for some j ≤ r, then φ(x_k) → ∞. This is the barrier property.
The idea here is that for points x near the boundary of F the value of φ(x)
is very large. So, an iterative method which moves around in the interior F ◦ of
F will typically avoid points near the boundary as the logarithmic penalty term
makes the function value f (x) + µφ(x) very large.
The interior point method consists in solving the barrier problem, using
Newton’s method, for a sequence {µk } of (positive) barrier parameters; these
are called the outer iterations. The solution xk found for µ = µk is used as the
starting point in Newton’s method in the next outer iteration where µ = µk+1 .
The sequence {µ_k} is chosen such that µ_k → 0. When µ is very small, the barrier function approximates the "ideal" penalty function η(x), which is zero in F and +∞ when one of the inequalities g_j(x) ≤ 0 is violated.
A natural question is why one bothers to solve the barrier problems for more
than one single µ, typically a very small value. The reason is that it would be
hard to find a good starting point for Newton’s method in that case; the Hessian
matrix of µφ is typically ill-conditioned for small µ.
Assume now that the barrier problem has a unique optimal solution x(µ);
this is true under reasonable assumptions that we shall return to. The point x(µ)
is called a central point. Assume also that Newton’s method may be applied to
solve the barrier problem. The set of points x(µ) for µ > 0 is called the central
path; it is a path (or curve) as we know it from multivariate calculus. In order
to investigate the central path we prefer to work with the problem equivalent to (16.6) obtained by multiplying the objective function by 1/µ, i.e., minimizing (1/µ)f(x) + φ(x) subject to Ax = b. The central point x(µ) is then characterized by feasibility,

Ax(µ) = b,
g_j(x(µ)) < 0 (j ≤ r),

and the existence of λ ∈ R^m (the Lagrange multiplier vector) such that the gradient of the Lagrangian of this problem vanishes, i.e.,
(1/µ)∇f(x(µ)) + Σ_{j=1}^{r} (1/(−g_j(x(µ)))) ∇g_j(x(µ)) + A^T λ = 0.    (16.10)
A fundamental question is: how far from being optimal is the central point
x(µ)? We now show that duality provides a very elegant way of answering this
question.
Theorem 16.1.
The central point x(µ) satisfies

f* ≤ f(x(µ)) ≤ f* + rµ.
Proof. Define ν(µ) = (ν_1(µ), . . . , ν_r(µ)) ∈ R^r and λ(µ) ∈ R^m as Lagrange parameters for the original problem by

ν_j(µ) = −µ/g_j(x(µ)) (j ≤ r),  λ(µ) = µλ,    (16.11)

where λ is the multiplier vector from (16.10). We claim that (λ(µ), ν(µ)) is feasible in the dual problem to (16.4). We thus need to show that ν(µ) is nonnegative. This is immediate: since g_j(x(µ)) < 0 and µ > 0, we get ν_j(µ) = −µ/g_j(x(µ)) > 0 for each j. We now also want to show that x(µ) satisfies

g(λ(µ), ν(µ)) = L(x(µ), λ(µ), ν(µ)),
where g is the dual objective function. To see this, note first that the Lagrangian function L(x, λ, ν) = f(x) + λ^T(Ax − b) + ν^T G(x) is convex in x for given λ and ν ≥ 0. Thus, x minimizes this function if and only if ∇_x L = 0. Now,

∇_x L(x(µ), λ(µ), ν(µ)) = 0

by (16.10) and the definition of the dual variables (16.11). This shows that g(λ(µ), ν(µ)) = L(x(µ), λ(µ), ν(µ)).
By weak duality (Lemma 15.9) we now obtain

f* ≥ g(λ(µ), ν(µ))
   = L(x(µ), λ(µ), ν(µ))
   = f(x(µ)) + λ(µ)^T(Ax(µ) − b) + Σ_{j=1}^{r} ν_j(µ) g_j(x(µ))
   = f(x(µ)) − rµ,

since Ax(µ) = b and ν_j(µ)g_j(x(µ)) = −µ for each j. Since x(µ) is feasible we also have f* ≤ f(x(µ)), which proves the result.
This theorem is very useful and shows why letting µ → 0 (more accurately
µ → 0+ ) is a good idea.
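For instance, with r = 2 inequality constraints, choosing µ = 5·10⁻⁴ guarantees f(x(µ)) − f* ≤ rµ = 10⁻³. This is also why the implementations below stop the outer iterations once rµ falls below a prescribed tolerance ε.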
Corollary 16.2. Convergence of the central path.
The central path has the following property
lim_{µ→0} f(x(µ)) = f*.
Proof. This follows from Theorem 16.1 by letting µ → 0. The second part follows
from
by the first part and the continuity of f ; moreover x∗ must be a feasible point
by elementary topology.
function xopt=IPBopt(f,g1,g2,df,dg1,dg2,d2f,d2g1,d2g2,A,b,x0)
  xopt=x0;
  mu=1;
  alpha=0.1;
  r=2;
  epsilon=10^(-3);
  numitouter=0;
  while (r*mu>epsilon)
    % One outer iteration: minimize f + mu*phi with Newton's method,
    % using (16.7) and (16.8) for the gradient and Hessian of the barrier
    [xopt,numit]=newtonbacktrackg1g2LEC(...
        @(x)(f(x)-mu*log(-g1(x))-mu*log(-g2(x))),...
        @(x)(df(x) - mu*dg1(x)/g1(x) - mu*dg2(x)/g2(x)),...
        @(x)(d2f(x) + mu*dg1(x)*dg1(x)'/(g1(x)^2) ...
             + mu*dg2(x)*dg2(x)'/(g2(x)^2) - mu*d2g1(x)/g1(x)...
             - mu*d2g2(x)/g2(x) ),A,b,xopt,g1,g2);
    mu=alpha*mu;
    numitouter=numitouter+1;
    fprintf('Iteration %i:',numitouter);
    fprintf('(%f,%f)\n',xopt,f(xopt));
  end
Note that we here have inserted the expressions from Equation (16.7) and Equation (16.8) for the gradient and the Hessian matrix of the barrier function. The input consists of f, g1, g2, their gradients and their Hessian matrices, the matrix A, the vector b, and an initial feasible point x0. The function calls newtonbacktrackg1g2LEC, and returns the optimal solution x*. It also gives some information on the values of f during the iterations. The iterations used in Newton's method are called the inner iterations. There are different implementation details here that we do not discuss very much. A typical value of α is 0.1. The choice of the initial µ0 can be difficult: if it is chosen too large, one may need many outer iterations. Another issue is how accurately one solves (16.6); it may be sufficient to find a near-optimal solution here, as this saves inner iterations. For this reason the method is also called a path-following method; it stays in a neighborhood of the central path.
Finally, it should be mentioned that there exists a variant of the interior-point
barrier method which permits an infeasible starting point. For more details on
this and various implementation issues one may consult [3] or [32].
Figure 16.1: The function from Example 16.1 and its barrier functions with µ = 0.2, µ = 0.5, and µ = 1.
For the function f considered here, some of the iterates from Armijo's rule do not satisfy the constraints.
It is straightforward to implement a function newtonbacktrackg1g2 which implements Newton's method for two inequality constraints and no equality constraints, similarly to how we implemented the function newtonbacktrackg1g2LEC. This leads to the following algorithm for the interior-point barrier method for the case of no equality constraints, but two inequality constraints:
function xopt=IPBopt2(f,g1,g2,df,dg1,dg2,d2f,d2g1,d2g2,x0)
  xopt=x0;
  mu=1; alpha=0.1; r=2; epsilon=10^(-3);
  numitouter=0;
  while (r*mu>epsilon)
    [xopt,numit]=newtonbacktrackg1g2(...
        @(x)(f(x)-mu*log(-g1(x))-mu*log(-g2(x))),...
        @(x)(df(x) - mu*dg1(x)/g1(x) - mu*dg2(x)/g2(x)),...
        @(x)(d2f(x) + mu*dg1(x)*dg1(x)'/(g1(x)^2) ...
             + mu*dg2(x)*dg2(x)'/(g2(x)^2) ...
             - mu*d2g1(x)/g1(x) - mu*d2g2(x)/g2(x) ),xopt,g1,g2);
    mu=alpha*mu;
    numitouter=numitouter+1;
    fprintf('Iteration %i:',numitouter);
    fprintf('(%f,%f)\n',xopt,f(xopt));
  end
Note that this function also prints a summary for each of the outer iterations,
so that we can see the progress in the barrier method. We can now find the
minimum of f with the following code, where we have substituted functions for
f , gi , their gradients and Hessians.
IPBopt2(@(x)(x.^2+1),@(x)(2-x),@(x)(x-4),...
@(x)(2*x),@(x)(-1),@(x)(1),...
@(x)(2),@(x)(0),@(x)(0),3)
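Since the two constraints restrict x to the interval [2, 4], and x² + 1 is increasing there, the minimum is attained at x = 2 with value 5; the outer iterations printed by IPBopt2 should therefore approach this point.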
for a ν1 ≥ 0, where the last term is included only if x1 + x2 = 2 (i.e. when the
constraint is active). If the constraint is not active we see that x1 = x2 = 0,
which does not satisfy the inequality constraint. If the constraint is active we
Appendix A

Basic linear algebra
This book assumes that the student has taken a beginning course in linear algebra
at university level. In this appendix we summarize the most important concepts
one needs to know from linear algebra. Note that what is listed here should not
be considered as a substitute for such a course: It is important for the student
to go through a full course in linear algebra, in order to get good intuition for
these concepts through extensive exercises. Such exercises are omitted here.
A.1 Matrices
An m × n-matrix is simply a set of mn numbers, stored in m rows and n columns.
We write a_{kn} for the entry in row k and column n of the matrix A. The zero matrix, denoted 0, is the matrix with all entries equal to zero. A square matrix (i.e. one where m = n) is said to be diagonal if a_{kn} = 0 whenever k ≠ n. The identity matrix,
denoted I, or In to make the dimension of the matrix clear, is the diagonal
matrix where the entries on the diagonal are 1, the rest zeroes. If A is a matrix
we will denote the transpose of A by AT . If A is invertible we denote its inverse
by A−1 . We say that a matrix A is orthogonal if AT A = AAT = I. A matrix is
called sparse if most of the entries in the matrix are zero.
An inner product ⟨·, ·⟩ satisfies the following properties:

• ⟨u, v⟩ = ⟨v, u⟩
• ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩
• ⟨cu, v⟩ = c⟨u, v⟩ for any scalar c
• ⟨u, u⟩ ≥ 0, and ⟨u, u⟩ = 0 if and only if u = 0

for any u, v, w in R^n. For functions we have seen examples which are variants of
the following form:
⟨f, g⟩ = ∫ f(t)g(t) dt.    (A.2)
Any set of mutually orthogonal (nonzero) elements is also linearly independent. A basis where all basis vectors are mutually orthogonal is called an orthogonal basis. If additionally the vectors all have length 1, we say that the basis is orthonormal. If x is in a vector space with an orthogonal basis B = {v_k}_{k=0}^{n−1}, we can express x as

x = Σ_{k=0}^{n−1} (⟨x, v_k⟩ / ⟨v_k, v_k⟩) v_k.
If B and C are two different bases for the same vector space, we can write down the two coordinate vectors [x]_B and [x]_C. A useful operation is to transform the coordinates in B to those in C, i.e. apply the transformation which sends [x]_B to [x]_C. This is a linear transformation, and we will denote the n × n-matrix of this linear transformation by P_{C←B}, and call this the change of coordinate matrix from B to C. In other words, the change of coordinate matrix is defined by requiring that

[x]_C = P_{C←B} [x]_B.
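As a small illustration (the bases below are chosen arbitrarily), if the vectors of B and C are placed as the columns of matrices M_B and M_C, then x = M_B[x]_B = M_C[x]_C, so that P_{C←B} = M_C^{-1} M_B. In Python:

import numpy as np

M_B = np.array([[1.0, 1.0], [0.0, 1.0]])   # basis B as columns
M_C = np.array([[2.0, 0.0], [0.0, 1.0]])   # basis C as columns

P_CB = np.linalg.solve(M_C, M_B)           # change of coordinate matrix from B to C

x_B = np.array([3.0, -1.0])                # coordinates of some x relative to B
x   = M_B @ x_B                            # the vector x itself (standard coordinates)
x_C = P_CB @ x_B
print(np.allclose(M_C @ x_C, x))           # True: [x]_C represents the same vector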
For non-symmetric matrices, these results do not hold in general. But for filters, clearly the second and third properties always hold, regardless of whether the filter is symmetric or not.
A.6 Diagonalization
One can show that, for a symmetric matrix A, we can write A = PDP^T, where D is a diagonal matrix with the eigenvalues of A on the diagonal, and P is a matrix whose columns are eigenvectors of A, the eigenvalue in column k of D corresponding to the eigenvector in column k of P.
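A quick numerical illustration in Python (the matrix is an arbitrary symmetric example):

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # a symmetric matrix
d, P = np.linalg.eigh(A)                 # eigenvalues and orthonormal eigenvectors (as columns)
D = np.diag(d)
print(np.allclose(A, P @ D @ P.T))       # True: A = P D P^T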
Appendix B

Signal processing and linear algebra: a translation guide
B.2 Functions
What in signal processing are referred to as continuous-time signals are here referred to as functions. Usually we refer to a function by the letter f, according to
B.3 Vectors
Discrete-time signals, as they are used in signal processing, are here mostly referred to as vectors. To as great an extent as possible, we have attempted to keep vectors finite-dimensional. Vectors are written in boldface (i.e. x), but their elements are not in boldface, and carry subscripts (i.e. x_n). Superscripts are also used to distinguish between vectors with the same base name (i.e. x^(1), x^(2) etc.), so that this does not interfere with the vector indices. In signal processing literature the corresponding notation would be x for the signal, and x[n] for its elements, and signals with equal base names could be named like x1[n], x2[n].
We have sometimes denoted the Fourier transform of x by x̂, according to the mathematical tradition. More often we have distinguished between a vector and its Discrete Fourier Transform by using x for the first and y for the latter. This also makes us distinguish between the input and output of a filter, where we instead use z for the latter. Much signal processing literature writes (capital) X for the DFT of the vector x.
Signal processing literature rarely expresses linear operations in terms of matrices and vectors, such as for the DFT and DCT, or as matrix factorizations. Instead one typically writes down each equation, one equation for each row in y = Ax, i.e. not recognizing the matrix/vector multiplication. We have stuck to the name filtering operations, but made it clear that this is nothing but a linear transformation with a Toeplitz matrix as its matrix. In particular, we alternately use the terms
with a Toeplitz matrix as its matrix. In particular, we alternately use the terms
filtering and multiplication with a Toeplitz matrix. The characterization of filters
as circulant Toeplitz matrices is usually not done in signal processing literature
(but see [16]). In this text we allow for matrices also to be of infinite dimensions,
expanding on the common use in linear algebra. When matrices of infinite dimensions are used, they are assumed infinite in both directions. Matrices are scaled if necessary
to make them unitary, in particular the DCT and the DFT. This scaling is
usually not done in signal processing literature.
We have also discussed how to represent a filter in terms of a finite matrix, i.e. the restriction of a filter to a finite signal; this is usually omitted in signal processing literature.
One of the most important statements in signal processing is that convolution in time is equivalent to multiplication in frequency. We have presented a compelling interpretation of this in linear algebra terms: since the frequency response is simply the set of eigenvalues of the filter, and convolution of filters is simply multiplication of the corresponding (circulant Toeplitz) matrices, multiplication in frequency simply means multiplying two diagonal matrices to obtain the frequency response of the product. Moreover, the Fourier basis vectors can be interpreted as eigenvectors.
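This statement is easy to verify numerically. The short Python sketch below (with an arbitrarily chosen filter) computes a circular convolution directly and checks that the DFT of the result equals the pointwise product of the DFTs:

import numpy as np

N = 8
x = np.random.randn(N)
s = np.array([1.0, 2.0, 1.0, 0, 0, 0, 0, 0]) / 4.0   # first column of a circulant (filter) matrix

# circular convolution of s and x, computed directly from the definition
y = np.array([sum(s[k] * x[(n - k) % N] for k in range(N)) for n in range(N)])

# convolution in time corresponds to multiplication in frequency
print(np.allclose(np.fft.fft(y), np.fft.fft(s) * np.fft.fft(x)))   # True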
B.6 Convolution

While we have defined the concept of convolution, readers familiar with signal processing may have noticed that this concept has not been used much. The reason is that we have wanted to present convolution as a matrix multiplication (to adapt to mathematical tradition), and that we have often used the concept of filtering instead. In signal processing literature one defines convolution in terms of vectors of infinite length. We have avoided this, since in practice vectors always need to be truncated to finite length. Due to this, we have also analyzed how a finite vector may be turned into a periodic vector (periodic or symmetric extension), and how this affects our analysis. We have also concentrated on FIR filters, which lets us avoid convergence issues. Note that we do not present matrix multiplication as a method of implementing filtering, due to the special structure of this operation. We do not suggest other methods of implementation than applying the convolution formula in a brute-force way, or factoring the filter into simpler components.
Nomenclature

symbol  definition
fs Sampling frequency
Ts Sampling period
T Period of a function
ν Frequency
fN N th order Fourier series of f
VN,T N th order Fourier space
DN,T Order N real Fourier basis for VN,T
FN,T Order N complex Fourier basis for VN,T
f˘ Symmetric extension of the function f
λs (ν) Frequency response of a filter
N Number of points in a DFT/DCT
FN = {φ0 , φ1 , · · · , φN −1 } Fourier basis for RN
F_N  N × N Fourier matrix
x̂ DFT of the vector x
Ā  Conjugate of a matrix
A^H  Conjugate transpose of a matrix
x(e) Vector of even samples
x(o) Vector of odd samples
O(N ) Order of an algorithm
l(S) Length of a filter
x∗y Convolution of vectors
λS,n Vector frequency response of a digital filter
Ed Filter which delays with d samples
ω Angular frequency
λS (ω) Continuous frequency response of a digital filter
x̆ Symmetric extension of a vector
Sr Symmetric restriction of S
Sf Matrix with the columns reversed
DN = {d0 , d1 , · · · , dN −1 } N -point DCT basis for RN
DCTN N × N -DCT matrix
φ Scaling function
Vm Resolution space
φ_m  Basis for V_m
cm,n Coordinates in φm
Wm Detail space
U ⊕ V  Direct sum of vector spaces
ψm Basis for Wm
wm,n Coordinates in ψm
Cm Reordering of (φm−1 , ψm−1 )
φ̃ Dual scaling function
ψ̃ Dual mother wavelet
Ṽm Dual resolution space
W̃m Dual detail space
Dm Reordering of φm
EN = {e0 , e1 , · · · , eN −1 } Standard basis for RN
⊗ Tensor product
W_m^(0,1)  Resolution-m complementary wavelet space, LH
W_m^(1,0)  Resolution-m complementary wavelet space, HL
W_m^(1,1)  Resolution-m complementary wavelet space, HH
AT Transpose of a matrix
A−1 Inverse of a matrix
hu, vi Inner product
[x]B Coordinate vector of x relative to the basis B
PC←B Change of coordinate matrix from B to C
Bibliography
[48] M. Vetterli and H. J. Nussbaumer. Simple FFT and DCT algorithms with
reduced number of operations. Signal Processing, 6:267–278, 1984.
[49] R. Webster. Convexity. Oxford University Press, Oxford, 1994.
Index

AD conversion, 1
affine function, 398
algebra, 96
Alias cancellation, 233
Alias cancellation condition, 236
Aliasing, 233
analysis, 14
  equations, 14
Analysis filter components of a forward filter bank transform, 241
Angular frequency, 104
Arithmetic operation count
  DCT, 155
  DFT direct implementation, 55
  FFT, 77
  revised DCT, 158
  revised FFT, 158
  symmetric filters, 149
  with tensor products, 347
audioread, 6
audiowrite, 6
Backtracking line search, 420
Bandpass filter, 112
barrier method, 453
barrier problem, 454
Basis
  C, 175
  D, 294
  φm, 166
  ψm, 170
  DCT, 141
  for VN,T, 14, 22
  Fourier, 50
basis, 463
Biorthogonal
  bases, 256
Biorthogonality, 256
bit rate, 1
Bit-reversal
  DWT, 291
  FFT, 73
block diagonal matrices, 175
block matrix, 71
Blocks, 352
Cascade algorithm, 208
Causal filter, 269
central path, 456
central point, 456
chain rule, 393
Change of coordinate matrix, 464
Change of coordinates, 464
  in tensor product, 345
Channel, 241
Compact support, 37
Complex Fourier coefficients, 23
Computational molecule, 329
  Partial derivative in x-direction, 335
  Partial derivative in y-direction, 336
  Second order derivatives, 338
  smoothing, 334
concave, 398
condition number, 422
Conjugate transpose, 52
continuous
  sound, 1
Continuous-time Fourier transform, 47
Toeplitz matrix, 88
  circulant, 88
Transpose DWT, 226
Transpose IDWT, 226
triangle wave, 8
Unitary matrix, 52
upsampling, 215
Vector space
  of symmetric vectors, 133
Wavelets
  Alternative piecewise linear, 198
  CDF 9/7, 276
  Orthonormal, 279
  Piecewise linear, 191
  Spline, 273
  Spline 5/3, 275
weak duality, 445
weak Slater condition, 447
window, 108
  Hamming, 108
  Hanning, 111
  in the MP3 standard, 108
  rectangular, 108