
Linear algebra, signal processing, and wavelets. A unified approach.
Python version

Øyvind Ryan

Feb 12, 2017


Contents

1 Sound and Fourier series 1


1.1 Sound and digital sound: Loudness and frequency . . . . . . . . 3
1.1.1 The frequency of a sound . . . . . . . . . . . . . . . . . . 5
1.1.2 Working with digital sound on a computer . . . . . . . . . 6
1.2 Fourier series: Basic concepts . . . . . . . . . . . . . . . . . . . . 10
1.2.1 Fourier series for symmetric and antisymmetric functions 19
1.3 Complex Fourier series . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4 Some properties of Fourier series . . . . . . . . . . . . . . . . . . 28
1.4.1 Rate of convergence for Fourier series . . . . . . . . . . . 31
1.4.2 Differentiating Fourier series . . . . . . . . . . . . . . . . 32
1.5 Operations on sound: filters . . . . . . . . . . . . . . . . . . . . . 36
1.6 Convergence of Fourier series* . . . . . . . . . . . . . . . . . . . . 38
1.6.1 Interpretation in terms of filters . . . . . . . . . . . . . . . 42
1.7 The MP3 standard . . . . . . . . . . . . . . . . . . . . . . . . . . 45
1.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

2 Digital sound and Discrete Fourier analysis 49


2.1 Discrete Fourier analysis and the discrete Fourier transform . . . 49
2.1.1 Properties of the DFT . . . . . . . . . . . . . . . . . . . . 55
2.2 Connection between the DFT and Fourier series. Sampling and
the sampling theorem . . . . . . . . . . . . . . . . . . . . . . . . 59
2.2.1 Change in frequency representation when windowing a signal 65
2.3 The Fast Fourier Transform (FFT) . . . . . . . . . . . . . . . . . 69
2.3.1 Reduction in the number of arithmetic operations . . . . 74
2.3.2 The FFT when N is not a power of 2 . . . . . . . . . . . 77
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

3 Operations on digital sound: digital filters 87


3.1 Matrix representations of filters . . . . . . . . . . . . . . . . . . . 87
3.1.1 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.2 Formal definition of filters and the vector frequency response . . 95
3.2.1 Time delay . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.2.2 Using digital filters to approximate filters . . . . . . . . . 100
3.3 The continuous frequency response and properties . . . . . . . . 103


3.3.1 Windowing operations . . . . . . . . . . . . . . . . . . . . 107


3.4 Some examples of filters . . . . . . . . . . . . . . . . . . . . . . . 111
3.5 More general filters . . . . . . . . . . . . . . . . . . . . . . . . . . 125
3.6 Implementation of filters . . . . . . . . . . . . . . . . . . . . . . . 128
3.6.1 Implementation of filters using the DFT . . . . . . . . . . 128
3.6.2 Factoring a filter . . . . . . . . . . . . . . . . . . . . . . . 128
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

4 Symmetric filters and the DCT 132


4.1 Symmetric vectors and the DCT . . . . . . . . . . . . . . . . . . 133
4.2 Improvements using the DCT for interpolation . . . . . . . . . . 148
4.2.1 Implementations of symmetric filters . . . . . . . . . . . . 149
4.3 Efficient implementations of the DCT . . . . . . . . . . . . . . . 151
4.3.1 Efficient implementations of the IDCT . . . . . . . . . . . 154
4.3.2 Reduction in the number of arithmetic operations . . . . 155
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

5 Motivation for wavelets and some simple examples 161


5.1 Why wavelets? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.2 A wavelet based on piecewise constant functions . . . . . . . . . 163
5.2.1 Function approximation property . . . . . . . . . . . . . . 167
5.2.2 Detail spaces and wavelets . . . . . . . . . . . . . . . . . . 168
5.3 Implementation of the DWT and examples . . . . . . . . . . . . 178
5.4 A wavelet based on piecewise linear functions . . . . . . . . . . . 187
5.4.1 Detail spaces and wavelets . . . . . . . . . . . . . . . . . . 190
5.5 Alternative wavelet based on piecewise linear functions . . . . . . 197
5.6 Multiresolution analysis: A generalization . . . . . . . . . . . . . 205
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

6 The filter representation of wavelets 212


6.1 The filters of a wavelet transformation . . . . . . . . . . . . . . . 213
6.1.1 The dual filter bank transform and the dual parameter . 220
6.1.2 The support of the scaling function and the mother wavelet . . 222
6.1.3 Symmetric extensions and the bd_mode parameter. . . . . 224
6.2 Properties of the filter bank transforms of a wavelet . . . . . . . 233
6.3 A generalization of the filter representation, and its use in audio
coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
6.3.1 Forward filter bank transform in the MP3 standard . . . 243
6.3.2 Reverse filter bank transform in the MP3 standard . . . . 246
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

7 Constructing interesting wavelets 253


7.1 From filters to scaling functions and mother wavelets . . . . . . . 253
7.2 Vanishing moments . . . . . . . . . . . . . . . . . . . . . . . . . . 262
7.3 Characterization of wavelets w.r.t. number of vanishing moments 265
7.3.1 Symmetric filters . . . . . . . . . . . . . . . . . . . . . . . 266

7.3.2 Orthonormal wavelets . . . . . . . . . . . . . . . . . . . . 269


7.3.3 The proof of Bezout's theorem . . . . . . . . . . . . . . . . 272
7.4 A design strategy suitable for lossless compression . . . . . . . . 273
7.4.1 The Spline 5/3 wavelet . . . . . . . . . . . . . . . . . . . 274
7.5 A design strategy suitable for lossy compression . . . . . . . . . . 276
7.6 Orthonormal wavelets . . . . . . . . . . . . . . . . . . . . . . . . 279
7.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282

8 The polyphase representation and wavelets 284


8.1 The polyphase representation and the lifting factorization . . . . 285
8.1.1 Reduction in the number of arithmetic operations . . . . 291
8.2 Examples of lifting factorizations . . . . . . . . . . . . . . . . . . 294
8.2.1 The piecewise linear wavelet . . . . . . . . . . . . . . . . . 295
8.2.2 The Spline 5/3 wavelet . . . . . . . . . . . . . . . . . . . 295
8.2.3 The CDF 9/7 wavelet . . . . . . . . . . . . . . . . . . . . 296
8.2.4 Orthonormal wavelets . . . . . . . . . . . . . . . . . . . . 297
8.3 Cosine-modulated filter banks and the MP3 standard . . . . . . . 302
8.3.1 Polyphase representations of the filter bank transforms . . 303
8.3.2 The prototype filters . . . . . . . . . . . . . . . . . . . . . 308
8.3.3 Perfect reconstruction . . . . . . . . . . . . . . . . . . . . 310
8.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

9 Digital images 316


9.1 What is an image? . . . . . . . . . . . . . . . . . . . . . . . . . . 317
9.2 Some simple operations on images with Python . . . . . . . . . . 321
9.3 Filter-based operations on images . . . . . . . . . . . . . . . . . . 329
9.3.1 Tensor product notation for operations on images . . . . . 331
9.4 Change of coordinates in tensor products . . . . . . . . . . . . . 345
9.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350

10 Using tensor products to apply wavelets to images 354


10.1 Tensor product of function spaces . . . . . . . . . . . . . . . . . . 354
10.1.1 Tensor products of polynomials . . . . . . . . . . . . . . . 355
10.1.2 Tensor products of Fourier spaces . . . . . . . . . . . . . . 355
10.2 Tensor product of function spaces in a wavelet setting . . . . . . 357
10.2.1 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . 361
10.3 Experiments with images using wavelets . . . . . . . . . . . . . . 364
10.4 An application to the FBI standard for compression of fingerprint
images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
10.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380

11 The basics and applications 382


11.1 The basic concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 383
11.2 Some applications . . . . . . . . . . . . . . . . . . . . . . . . . . 384
11.3 Multivariate calculus and linear algebra . . . . . . . . . . . . . . 390

12 A crash course in convexity 396


12.1 Convex sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
12.2 Convex functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
12.3 Properties of convex functions . . . . . . . . . . . . . . . . . . . . 399

13 Nonlinear equations 407


13.1 Equations and fixed points . . . . . . . . . . . . . . . . . . . . . . 407
13.2 Newton’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . 409

14 Unconstrained optimization 415


14.1 Optimality conditions . . . . . . . . . . . . . . . . . . . . . . . . 415
14.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419

15 Constrained optimization - theory 432


15.1 Equality constraints and the Lagrangian . . . . . . . . . . . . . . 432
15.2 Inequality constraints and KKT . . . . . . . . . . . . . . . . . . . 437
15.3 Convex optimization . . . . . . . . . . . . . . . . . . . . . . . . . 444
15.3.1 A useful theorem on convex optimization . . . . . . . . . 447

16 Constrained optimization - methods 452


16.1 Equality constraints . . . . . . . . . . . . . . . . . . . . . . . . . 452
16.2 Inequality constraints . . . . . . . . . . . . . . . . . . . . . . . . 453

A Basic Linear Algebra 463


A.1 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
A.2 Vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
A.3 Inner products and orthogonality . . . . . . . . . . . . . . . . . . 464
A.4 Coordinates and change of coordinates . . . . . . . . . . . . . . . 464
A.5 Eigenvectors and eigenvalues . . . . . . . . . . . . . . . . . . . . 465
A.6 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465

B Signal processing and linear algebra: a translation guide 466


B.1 Complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . 466
B.2 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
B.3 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
B.4 Inner products and orthogonality . . . . . . . . . . . . . . . . . . 467
B.5 Matrices and filters . . . . . . . . . . . . . . . . . . . . . . . . . . 467
B.6 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
B.7 Polyphase factorizations and lifting . . . . . . . . . . . . . . . . . 468
B.8 Transforms in general . . . . . . . . . . . . . . . . . . . . . . . . 469
B.9 Perfect reconstruction systems . . . . . . . . . . . . . . . . . . . 469
B.10 Z-transform and frequency response . . . . . . . . . . . . . . . . 470

Nomenclature 471

Bibliography 473

Index 477
List of Examples and Exercises

Example 1.1: Listen to different channels . . . . . . . . . . . . . . . . . . 7


Example 1.2: Playing the sound backwards . . . . . . . . . . . . . . . . . 7
Example 1.3: Playing pure tones. . . . . . . . . . . . . . . . . . . . . . . . 7
Example 1.4: The square wave . . . . . . . . . . . . . . . . . . . . . . . . 8
Example 1.5: The triangle wave . . . . . . . . . . . . . . . . . . . . . . . . 9
Exercise 1.6: The Krakatoa explosion . . . . . . . . . . . . . . . . . . . . 9
Exercise 1.7: Sum of two pure tones . . . . . . . . . . . . . . . . . . . . . 9
Exercise 1.8: Sum of two pure tones . . . . . . . . . . . . . . . . . . . . . 9
Exercise 1.9: Playing with different sample rates . . . . . . . . . . . . . . 9
Exercise 1.10: Play sound with added noise . . . . . . . . . . . . . . . . . 10
Exercise 1.11: Playing the triangle wave . . . . . . . . . . . . . . . . . . . 10
Example 1.12: Fourier coefficients of the square wave . . . . . . . . . . . . 14
Example 1.13: Fourier coefficients of the triangle wave . . . . . . . . . . . 17
Example 1.14: Fourier coefficients of a simple function . . . . . . . . . . . 19
Exercise 1.15: Shifting the Fourier basis vectors . . . . . . . . . . . . . . . 20
Exercise 1.16: Playing the Fourier series of the triangle wave . . . . . . . 20
Exercise 1.17: Riemann-integrable functions which are not square-integrable 20
Exercise 1.18: When are Fourier spaces included in each other? . . . . . . 20
Exercise 1.19: antisymmetric functions are sine-series . . . . . . . . . . . 20
Exercise 1.20: More connections between symmetric-/antisymmetric func-
tions and sine-/cosine series . . . . . . . . . . . . . . . . . 21
Exercise 1.21: Fourier series for low-degree polynomials . . . . . . . . . . 21
Exercise 1.22: Fourier series for polynomials . . . . . . . . . . . . . . . . . 21
Exercise 1.23: Fourier series of a given polynomial . . . . . . . . . . . . . 21
Example 1.24: Complex Fourier coefficients of a simple function . . . . . 25
Example 1.25: Complex Fourier coefficients of composite function . . . . 25
Example 1.26: Complex Fourier coefficients of f(t) = cos^3(2πt/T) . . . . 27
Exercise 1.27: Orthonormality of Complex Fourier basis . . . . . . . . . . 27
Exercise 1.28: Complex Fourier series of f(t) = sin^2(2πt/T) . . . . . . . . 27
Exercise 1.29: Complex Fourier series of polynomials . . . . . . . . . . . . 27
Exercise 1.30: Complex Fourier series and Pascal's triangle . . . . . . . . 27
Exercise 1.31: Complex Fourier coefficients of the square wave . . . . . . 28


Exercise 1.32: Complex Fourier coefficients of the triangle wave . . . . . . 28


Exercise 1.33: Complex Fourier coefficients of low-degree polynomials . . 28
Exercise 1.34: Complex Fourier coefficients for symmetric and antisym-
metric functions . . . . . . . . . . . . . . . . . . . . . . . . 28
Example 1.35: Periodic extension . . . . . . . . . . . . . . . . . . . . . . . 35
Exercise 1.36: Fourier series of a delayed square wave . . . . . . . . . . . 35
Exercise 1.37: Find function from its Fourier series . . . . . . . . . . . . . 36
Exercise 1.38: Relation between complex Fourier coefficients of f and
cosine-coefficients of f˘ . . . . . . . . . . . . . . . . . . . . . 36
Exercise 1.39: Filters preserve sine- and cosine-series . . . . . . . . . . . . 38
Exercise 1.40: Approximation in norm with continuous functions . . . . . 44
Exercise 1.41: The Dirichlet kernel . . . . . . . . . . . . . . . . . . . . . . 44
Exercise 1.42: The Fejer summability kernel . . . . . . . . . . . . . . . . . 44
Example 2.1: DFT of a cosine . . . . . . . . . . . . . . . . . . . . . . . . . 53
Example 2.2: DFT on a square wave . . . . . . . . . . . . . . . . . . . . . 53
Example 2.3: Computing the DFT by hand . . . . . . . . . . . . . . . . . 54
Example 2.4: Direct implementation of the DFT . . . . . . . . . . . . . . 55
Example 2.5: Computing the DFT when multiplying with a complex
exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Exercise 2.6: Computing the DFT by hand . . . . . . . . . . . . . . . . . 57
Exercise 2.7: Exact form of low-order DFT matrix . . . . . . . . . . . . . 57
Exercise 2.8: DFT of a delayed vector . . . . . . . . . . . . . . . . . . . . 57
Exercise 2.9: Using symmetry property . . . . . . . . . . . . . . . . . . . 57
Exercise 2.10: DFT of cos^2(2πk/N) . . . . . . . . . . . . . . . . . . . . . 57
Exercise 2.11: DFT of ck x . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Exercise 2.12: Rewrite a complex DFT as real DFT’s . . . . . . . . . . . 58
Exercise 2.13: DFT implementation . . . . . . . . . . . . . . . . . . . . . 58
Exercise 2.14: Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Exercise 2.15: DFT on complex and real data . . . . . . . . . . . . . . . . 59
Example 2.16: Using the DFT to adjust frequencies in sound . . . . . . . 65
Example 2.17: Compression by zeroing out small DFT coefficients . . . . 67
Example 2.18: Compression by quantizing DFT coefficients . . . . . . . . 68
Exercise 2.19: Comment code . . . . . . . . . . . . . . . . . . . . . . . . . 69
Exercise 2.20: Which frequency is changed? . . . . . . . . . . . . . . . . . 69
Exercise 2.21: Implement interpolant . . . . . . . . . . . . . . . . . . . . . 69
Exercise 2.22: Extra results for the FFT when N = N1 N2 . . . . . . . . . 79
Exercise 2.23: Extend implementation . . . . . . . . . . . . . . . . . . . . 79
Exercise 2.24: Compare execution time . . . . . . . . . . . . . . . . . . . 80
Exercise 2.25: Combine two FFT’s . . . . . . . . . . . . . . . . . . . . . . 80
Exercise 2.26: FFT operation count . . . . . . . . . . . . . . . . . . . . . 80
Exercise 2.27: FFT algorithm adapted to real data . . . . . . . . . . . . . 81
Exercise 2.28: Non-recursive FFT algorithm . . . . . . . . . . . . . . . . . 81
Exercise 2.29: The Split-radix FFT algorithm . . . . . . . . . . . . . . . . 82
Exercise 2.30: Bit-reversal . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Example 3.1: Finding the matrix elements from the filter coefficients . . . 90
Example 3.2: Finding the filter coefficients from the matrix . . . . . . . . 91

Example 3.3: Writing down compact filter notation . . . . . . . . . . . . . 91


Exercise 3.4: Finding the filter coefficients and the matrix . . . . . . . . . 94
Exercise 3.5: Finding the filter coefficients from the matrix . . . . . . . . 94
Exercise 3.6: Convolution and polynomials . . . . . . . . . . . . . . . . . 94
Exercise 3.7: Implementation of convolution . . . . . . . . . . . . . . . . . 94
Exercise 3.8: Filters with a different number of coefficients with positive
and negative indices . . . . . . . . . . . . . . . . . . . . . . 95
Exercise 3.9: Implementing filtering with convolution . . . . . . . . . . . . 95
Example 3.10: Frequency response of a simple filter . . . . . . . . . . . . 98
Example 3.11: Matrix form . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Example 3.12: Computing the output of a filter . . . . . . . . . . . . . . . 99
Exercise 3.13: Time reversal is not a filter . . . . . . . . . . . . . . . . . . 102
Exercise 3.14: When is a filter symmetric? . . . . . . . . . . . . . . . . . . 102
Exercise 3.15: Eigenvectors and eigenvalues . . . . . . . . . . . . . . . . . 102
Exercise 3.16: Composing filters . . . . . . . . . . . . . . . . . . . . . . . 103
Exercise 3.17: Keeping every second component . . . . . . . . . . . . . . . 103
Example 3.18: Plotting a simple frequency response . . . . . . . . . . . . 106
Example 3.19: Computing a composite filter . . . . . . . . . . . . . . . . 107
Exercise 3.20: Plotting a simple frequency response . . . . . . . . . . . . . 109
Exercise 3.21: Low-pass and high-pass filters . . . . . . . . . . . . . . . . 109
Exercise 3.22: Circulant matrices . . . . . . . . . . . . . . . . . . . . . . . 110
Exercise 3.23: Composite filters . . . . . . . . . . . . . . . . . . . . . . . . 110
Exercise 3.24: Maximum and minimum . . . . . . . . . . . . . . . . . . . 110
Exercise 3.25: Plotting a simple frequency response . . . . . . . . . . . . . 110
Exercise 3.26: Continuous- and vector frequency responses . . . . . . . . . 110
Exercise 3.27: Starting with circulant matrices . . . . . . . . . . . . . . . 111
Exercise 3.28: When the filter coefficients are powers . . . . . . . . . . . . 111
Exercise 3.29: The Hanning window . . . . . . . . . . . . . . . . . . . . . 111
Example 3.30: Adding echo . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Example 3.31: Reducing the treble with moving average filters . . . . . . 114
Example 3.32: Ideal low-pass filters . . . . . . . . . . . . . . . . . . . . . 115
Example 3.33: Dropping filter coefficients . . . . . . . . . . . . . . . . . . 116
Example 3.34: Filters and the MP3 standard . . . . . . . . . . . . . . . . 117
Example 3.35: Reducing the treble using Pascal's triangle . . . . . . . . . 119
Example 3.36: Reducing the bass using Pascal's triangle . . . . . . . . . . 121
Exercise 3.37: Composing time delay filters . . . . . . . . . . . . . . . . . 122
Exercise 3.38: Adding echo filters . . . . . . . . . . . . . . . . . . . . . . . 122
Exercise 3.39: Reducing bass and treble . . . . . . . . . . . . . . . . . . . 123
Exercise 3.40: Constructing a high-pass filter . . . . . . . . . . . . . . . . 123
Exercise 3.41: Combining low-pass and high-pass filters . . . . . . . . . . 123
Exercise 3.42: Composing filters . . . . . . . . . . . . . . . . . . . . . . . 123
Exercise 3.43: Composing filters . . . . . . . . . . . . . . . . . . . . . . . 123
Exercise 3.44: Filters in the MP3 standard . . . . . . . . . . . . . . . . . 124
Exercise 3.45: Explain code . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Example 3.46: Moving average filter . . . . . . . . . . . . . . . . . . . . . 126
Exercise 3.47: A concrete IIR filter . . . . . . . . . . . . . . . . . . . . . . 127

Exercise 3.48: Implementing the factorization . . . . . . . . . . . . . . . . 129


Exercise 3.49: Factoring concrete filter . . . . . . . . . . . . . . . . . . . . 129
Example 4.1: Computing lower order DCTs . . . . . . . . . . . . . . . . . 144
Exercise 4.2: Computing eigenvalues . . . . . . . . . . . . . . . . . . . . . 145
Exercise 4.3: Writing down lower order Sr . . . . . . . . . . . . . . . . . . 146
Exercise 4.4: Writing down lower order DCTs . . . . . . . . . . . . . . . . 146
Exercise 4.5: DCT-IV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Exercise 4.6: MDCT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Exercise 4.7: Component expressions for a symmetric filter . . . . . . . . 151
Exercise 4.8: Trick for reducing the number of multiplications with the DCT . 156
Exercise 4.9: An efficient joint implementation of the DCT and the FFT . 157
Exercise 4.10: Implementation of the IFFT/IDCT . . . . . . . . . . . . . 158
Exercise 5.1: The vector of samples is the coordinate vector . . . . . . . . 176
Exercise 5.2: Proposition 5.12 . . . . . . . . . . . . . . . . . . . . . . . . . 176
Exercise 5.3: Computing projections 1 . . . . . . . . . . . . . . . . . . . . 176
Exercise 5.4: Computing projections 2 . . . . . . . . . . . . . . . . . . . . 176
Exercise 5.5: Computing projections 3 . . . . . . . . . . . . . . . . . . . . 177
Exercise 5.6: Finding the least squares error . . . . . . . . . . . . . . . . . 177
Exercise 5.7: Projecting on W0 . . . . . . . . . . . . . . . . . . . . . . . . 177
Exercise 5.8: When N is odd . . . . . . . . . . . . . . . . . . . . . . . . . 177
Example 5.9: Computing the DWT by hand . . . . . . . . . . . . . . . . 180
Example 5.10: DWT on sound . . . . . . . . . . . . . . . . . . . . . . . . 181
Example 5.11: DWT on the samples of a mathematical function . . . . . 183
Example 5.12: Computing the wavelet coefficients . . . . . . . . . . . . . 184
Exercise 5.13: Implement IDWT for The Haar wavelet . . . . . . . . . . . 185
Exercise 5.14: Computing projections 4 . . . . . . . . . . . . . . . . . . . 185
Exercise 5.15: Scaling a function . . . . . . . . . . . . . . . . . . . . . . . 185
Exercise 5.16: Direct sums . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Exercise 5.17: Eigenvectors of direct sums . . . . . . . . . . . . . . . . . . 185
Exercise 5.18: Invertibility of direct sums . . . . . . . . . . . . . . . . . . 186
Exercise 5.19: Multiplying direct sums . . . . . . . . . . . . . . . . . . . . 186
Exercise 5.20: Finding N . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Exercise 5.21: Different DWTs for similar vectors . . . . . . . . . . . . . . 186
Exercise 5.22: Construct a sound . . . . . . . . . . . . . . . . . . . . . . . 186
Exercise 5.23: Exact computation of wavelet coefficients 1 . . . . . . . . . 187
Exercise 5.24: Exact computation of wavelet coefficients 2 . . . . . . . . . 187
Exercise 5.25: Computing the DWT of a simple vector . . . . . . . . . . . 187
Exercise 5.26: The Haar wavelet when N is odd . . . . . . . . . . . . . . 187
Exercise 5.27: in-place DWT . . . . . . . . . . . . . . . . . . . . . . . . . 187
Example 5.28: DWT on sound . . . . . . . . . . . . . . . . . . . . . . . . 194
Example 5.29: DWT on the samples of a mathematical function . . . . . 195
Exercise 5.30: The vector of samples is the coordinate vector 2 . . . . . . 195
Exercise 5.31: Computing projections . . . . . . . . . . . . . . . . . . . . 196
Exercise 5.32: Non-orthogonality for the piecewise linear wavelet . . . . . 196
Exercise 5.33: Implement elementary lifting steps of odd type . . . . . . . 197
Exercise 5.34: Wavelets based on polynomials . . . . . . . . . . . . . . . . 197

Example 5.35: DWT on sound . . . . . . . . . . . . . . . . . . . . . . . . 200


Example 5.36: DWT on the samples of a mathematical function . . . . . 201
Exercise 5.37: Implement elementary lifting steps of even type . . . . . . 201
Exercise 5.38: Two vanishing moments . . . . . . . . . . . . . . . . . . . . 202
Exercise 5.39: Implement finding ψ with vanishing moments . . . . . . . . 203
Exercise 5.40: ψ for the Haar wavelet with two vanishing moments . . . . 204
Exercise 5.41: More vanishing moments for the Haar wavelet . . . . . . . 204
Exercise 5.42: Listening experiments . . . . . . . . . . . . . . . . . . . . . 205
Example 5.43: Implementing the cascade algorithm . . . . . . . . . . . . . 210
Example 6.1: The Haar wavelet . . . . . . . . . . . . . . . . . . . . . . . . 217
Example 6.2: Wavelet for piecewise linear functions . . . . . . . . . . . . 218
Example 6.3: The alternative piecewise linear wavelet . . . . . . . . . . . 219
Example 6.4: Plotting the frequency responses . . . . . . . . . . . . . . . 222
Exercise 6.5: Implement the dual filter bank transforms . . . . . . . . . . 226
Exercise 6.6: Transpose of the DWT and IDWT . . . . . . . . . . . . . . 226
Exercise 6.7: Reduced matrices for elementary lifting . . . . . . . . . . . . 227
Exercise 6.8: Prove expression for Sr . . . . . . . . . . . . . . . . . . . . . 227
Exercise 6.9: Orthonormal basis for the symmetric extensions . . . . . . . 228
Exercise 6.10: Diagonalizing Sr . . . . . . . . . . . . . . . . . . . . . . . . 228
Exercise 6.11: Compute filters and frequency responses 1 . . . . . . . . . 230
Exercise 6.12: Symmetry of MRA matrices vs. symmetry of filters 1 . . . 230
Exercise 6.13: Symmetry of MRA matrices vs. symmetry of filters 2 . . . 230
Exercise 6.14: Finding H0 , H1 from H . . . . . . . . . . . . . . . . . . . . 230
Exercise 6.15: Finding G0 ,G1 from G . . . . . . . . . . . . . . . . . . . . 231
Exercise 6.16: Finding H from H0 , H1 . . . . . . . . . . . . . . . . . . . . 231
Exercise 6.17: Finding G from G0 , G1 . . . . . . . . . . . . . . . . . . . . 231
Exercise 6.18: Computing by hand . . . . . . . . . . . . . . . . . . . . . . 231
Exercise 6.19: Comment code . . . . . . . . . . . . . . . . . . . . . . . . . 231
Exercise 6.20: Computing filters and frequency responses . . . . . . . . . 232
Exercise 6.21: Computing filters and frequency responses 2 . . . . . . . . 232
Exercise 6.22: Implementing with symmetric extension . . . . . . . . . . . 232
Exercise 6.23: Finding FIR filters . . . . . . . . . . . . . . . . . . . . . . . 240
Exercise 6.24: The Haar wavelet as an alternative QMF filter bank . . . . 241
Exercise 6.25: Plotting frequency responses . . . . . . . . . . . . . . . . . 250
Exercise 6.26: Implementing forward and reverse filter bank transforms . 250
Exercise 7.1: Implementation of the cascade algorithm . . . . . . . . . . . 262
Exercise 7.2: Using the cascade algorithm . . . . . . . . . . . . . . . . . . 262
Exercise 7.3: Compute filters . . . . . . . . . . . . . . . . . . . . . . . . . 273
Exercise 7.4: Viewing the frequency response . . . . . . . . . . . . . . . . 275
Exercise 7.5: Wavelets based on higher degree polynomials . . . . . . . . 276
Example 7.6: The CDF 9/7 wavelet . . . . . . . . . . . . . . . . . . . . . 276
Exercise 8.1: The frequency responses of the polyphase components . . . 292
Exercise 8.2: Finding new filters . . . . . . . . . . . . . . . . . . . . . . . 292
Exercise 8.3: Relating to the polyphase components . . . . . . . . . . . . 293
Exercise 8.4: QMF filter banks . . . . . . . . . . . . . . . . . . . . . . . . 293
Exercise 8.5: Alternative QMF filter banks . . . . . . . . . . . . . . . . . 294

Exercise 8.6: Alternative QMF filter banks with additional sign . . . . . . 294
Example 8.7: Lifting factorization of the alternative piecewise linear wavelet . 295
Exercise 8.8: Polyphase components for symmetric filters . . . . . . . . . 300
Exercise 8.9: Implementing kernels transformations using lifting . . . . . 300
Exercise 8.10: Lifting orthonormal wavelets . . . . . . . . . . . . . . . . . 300
Exercise 8.11: 4 vanishing moments . . . . . . . . . . . . . . . . . . . . . 301
Exercise 8.12: Wavelet based on piecewise quadratic scaling function . . . 302
Exercise 8.13: Run forward and reverse transform . . . . . . . . . . . . . 313
Exercise 8.14: Verify statement of filters . . . . . . . . . . . . . . . . . . . 313
Exercise 8.15: Lifting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
Example 9.1: Normalising the intensities . . . . . . . . . . . . . . . . . . . 322
Example 9.2: Extracting the different colors . . . . . . . . . . . . . . . . . 323
Example 9.3: Converting from color to grey-level . . . . . . . . . . . . . . 324
Example 9.4: Computing the negative image . . . . . . . . . . . . . . . . 326
Example 9.5: Increasing the contrast . . . . . . . . . . . . . . . . . . . . . 326
Exercise 9.6: Generate black and white images . . . . . . . . . . . . . . . 328
Exercise 9.7: Adjust contrast in images . . . . . . . . . . . . . . . . . . . 328
Exercise 9.8: Adjust contrast with another function . . . . . . . . . . . . 328
Example 9.9: Smoothing an image . . . . . . . . . . . . . . . . . . . . . . 333
Example 9.10: Edge detection . . . . . . . . . . . . . . . . . . . . . . . . . 335
Example 9.11: Second-order derivatives . . . . . . . . . . . . . . . . . . . 338
Example 9.12: Chess pattern image . . . . . . . . . . . . . . . . . . . . . 340
Exercise 9.13: Implement a tensor product . . . . . . . . . . . . . . . . . 341
Exercise 9.14: Generate images . . . . . . . . . . . . . . . . . . . . . . . . 341
Exercise 9.15: Interpret tensor products . . . . . . . . . . . . . . . . . . . 341
Exercise 9.16: Computational molecule of moving average filter . . . . . . 342
Exercise 9.17: Bilinearity of the tensor product . . . . . . . . . . . . . . . 342
Exercise 9.18: Attempt to write as tensor product . . . . . . . . . . . . . 342
Exercise 9.19: Computational molecules . . . . . . . . . . . . . . . . . . . 342
Exercise 9.20: Computational molecules 2 . . . . . . . . . . . . . . . . . . 342
Exercise 9.21: Comment on code . . . . . . . . . . . . . . . . . . . . . . . 343
Exercise 9.22: Eigenvectors of tensor products . . . . . . . . . . . . . . . 343
Exercise 9.23: The Kronecker product . . . . . . . . . . . . . . . . . . . . 343
Example 9.24: Change of coordinates with the DFT . . . . . . . . . . . . 347
Example 9.25: Change of coordinates with the DCT . . . . . . . . . . . . 348
Exercise 9.26: Implement DFT and DCT on blocks . . . . . . . . . . . . . 348
Exercise 9.27: Implement two-dimensional FFT and DCT . . . . . . . . . 349
Exercise 9.28: Zeroing out DCT coefficients . . . . . . . . . . . . . . . . . 350
Exercise 9.29: Comment code . . . . . . . . . . . . . . . . . . . . . . . . . 350
Example 10.1: Piecewise constant functions . . . . . . . . . . . . . . . . . 359
Example 10.2: Piecewise linear functions . . . . . . . . . . . . . . . . . . . 361
Example 10.3: Applying the Haar wavelet to a very simple example image 365
Example 10.4: Creating thumbnail images . . . . . . . . . . . . . . . . . . 365
Example 10.5: Detail and low-resolution approximations for different wavelets . 366
Example 10.6: The Spline 5/3 wavelet and removing bands in the detail
spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367

Exercise 10.7: Implement two-dimensional DWT . . . . . . . . . . . . . . 369


Exercise 10.8: Comment code . . . . . . . . . . . . . . . . . . . . . . . . . 370
Exercise 10.9: Comment code . . . . . . . . . . . . . . . . . . . . . . . . . 372
Exercise 10.10: Experiments on a test image . . . . . . . . . . . . . . . . 373
Exercise 10.11: Implement the fingerprint compression scheme . . . . . . 380
Exercise 11.1: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
Exercise 11.2: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
Exercise 11.3: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
Exercise 11.4: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
Exercise 11.5: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
Exercise 11.6: Level sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
Exercise 11.7: Sub-level sets . . . . . . . . . . . . . . . . . . . . . . . . . . 394
Exercise 11.8: Portfolio optimization . . . . . . . . . . . . . . . . . . . . . 394
Exercise 11.9: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
Exercise 11.10: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
Exercise 11.11: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
Exercise 11.12: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
Exercise 12.1: The intersection of convex sets is convex. . . . . . . . . . . 403
Exercise 12.2: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
Exercise 12.3: The convexity of a product of functions. . . . . . . . . . . . 404
Exercise 12.4: The convexity of the composition of functions. . . . . . . . 404
Exercise 12.5: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
Exercise 12.6: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
Exercise 12.7: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
Exercise 12.8: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
Exercise 12.9: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
Exercise 12.10: The set of convex combinations is convex . . . . . . . . . 405
Exercise 12.11: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
Exercise 12.12: A convex function defined on a closed real interval attains
its maximum in one of the end points. . . . . . . . . . . . . 405
Exercise 12.13: The maximum of convex functions is convex. . . . . . . . 406
Exercise 12.14: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
Exercise 12.15: The distance to a convex set is a convex function. . . . . . 406
Exercise 13.1: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
Exercise 13.2: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Exercise 13.3: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Exercise 13.4: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Exercise 13.5: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Exercise 13.6: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Exercise 13.7: Broyden’s method . . . . . . . . . . . . . . . . . . . . . . . 413
Exercise 13.8: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
Exercise 13.9: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
Exercise 13.10: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
Exercise 14.1: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
Exercise 14.2: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
Exercise 14.3: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428

Exercise 14.4: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428


Exercise 14.5: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
Exercise 14.6: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Exercise 14.7: When steepest descent finds the minimum in one step . . . 429
Exercise 14.8: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Exercise 14.9: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Exercise 14.10: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Exercise 14.11: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
Exercise 14.12: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
Exercise 14.13: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
Example 15.1: A simple optimization problem . . . . . . . . . . . . . . . 439
Example 15.2: a one-variable problem . . . . . . . . . . . . . . . . . . . . 442
Example 15.3: a multi-variable problem . . . . . . . . . . . . . . . . . . . 442
Example 15.4: Quadratic optimization problem with linear equality con-
straints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
Example 15.5: Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
Example 15.6: Linear optimization . . . . . . . . . . . . . . . . . . . . . . 444
Example 15.7: Comparing the primal and the dual problem . . . . . . . . 447
Exercise 15.8: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
Exercise 15.9: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
Exercise 15.10: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
Exercise 15.11: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
Exercise 15.12: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
Exercise 15.13: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
Exercise 15.14: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
Exercise 15.15: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
Exercise 15.16: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
Exercise 15.17: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
Exercise 15.18: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
Exercise 15.19: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
Exercise 15.20: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
Exercise 15.21: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
Exercise 15.22: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
Exercise 15.23: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
Exercise 15.24: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
Exercise 15.25: Find min . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
Example 16.1: Numeric test of the internal-point barrier method . . . . . 459
Example 16.2: Analytic test of the internal-point barrier method . . . . . 460
Exercise 16.3: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
Exercise 16.4: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
Exercise 16.5: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
Exercise 16.6: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
Exercise 16.7: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
Exercise 16.8: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
Exercise 16.9: Solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
Chapter 1

Sound and Fourier series

A major part of the information we receive and perceive every day is in the
form of audio. Most sounds are transferred directly from the source to our ears,
like when we have a face to face conversation with someone or listen to the
sounds in a forest or a street. However, a considerable part of the sounds are
generated by loudspeakers in various kinds of audio machines like cell phones,
digital audio players, home cinemas, radios, television sets and so on. The sounds
produced by these machines are either generated from information stored inside,
or electromagnetic waves are picked up by an antenna, processed, and then
converted to sound.
What we perceive as sound corresponds to the physical phenomenon of slight
variations in air pressure near our ears. Air pressure is measured by the SI-unit
Pa (Pascal) which is equivalent to N/m^2 (force / area). In other words, 1 Pa
corresponds to the force exerted on an area of 1 m^2 by the air column above this
area. Larger variations mean louder sounds, while faster variations correspond
to sounds with a higher pitch.
Observation 1.1. Continuous Sound.
A sound can be represented as a function, corresponding to air pressure
measured over time. When a function represents a sound, it is often referred to
as a continuous sound.

Continuous sounds are defined for all time instances. On computers and
various kinds of media players, however, the sound is digital, i.e. it is represented
by a large number of function values, stored in a suitable number format. Such
digital sound is easier to manipulate and process on a computer.

Observation 1.2. Digital sound.

A digital sound is a sequence x = {x_i}_{i=0}^{N−1} that corresponds to measurements
of a continuous sound f, recorded at a fixed rate of fs (the sampling frequency
or sample rate) measurements per second, i.e.,

x_k = f(k/fs),  for k = 0, 1, . . . , N − 1.

Note that the indexing convention for digital sound is not standard in
mathematics, where vector indices start at 1. The components in digital sound
are often referred to as samples, the time between successive samples is called
the sampling period, denoted Ts, and measuring the sound is also referred to as
sampling the sound.
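As a small illustration of the sampling relation above, the samples of a one-second
sound can be computed directly from the formula. This is only a sketch; the rate
and the continuous sound chosen here are example values, not taken from the text.

from numpy import arange, sin, pi

# Illustrative sketch: sample the continuous sound f(t) = sin(2*pi*440*t)
# at a rate of fs samples per second for one second
fs = 8000                        # sampling frequency (samples per second)
Ts = 1.0/fs                      # sampling period
N = fs                           # one second of sound gives N = fs samples
k = arange(N)
x = sin(2*pi*440*k/float(fs))    # x_k = f(k/fs), k = 0, 1, ..., N-1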
The quality of digital sound is often measured by the bit rate (number of
bits per second), i.e. the product of the sampling rate and the number of bits
(binary digits) used to store each sample. Both the sample rate and the number
format influence the quality of the resulting sound. These are encapsulated in
digital sound formats. A couple of them are described below.

Telephony. For telephony it is common to sample the sound 8000 times per
second and represent each sample value as a 13-bit integer. These integers are
then converted to a kind of 8-bit floating-point format with a 4-bit significand.
Telephony therefore generates a bit rate of 64 000 bits per second, i.e. 64 kb/s.

The CD-format. In the classical CD-format the audio signal is sampled 44 100
times per second and the samples stored as 16-bit integers. The value 44 100 for
the sampling rate is not coincidental, and we will return to this shortly.
16-bit integers work well for music with a reasonably uniform dynamic range, but
are problematic when the range varies. Suppose for example that a piece of music
has a very loud passage. In this passage the samples will typically make use of
almost the full range of integer values, from −2^15 to 2^15 − 1. When the music
enters a more quiet passage the sample values will necessarily become much
smaller and perhaps only vary in the range −1000 to 1000, say. Since 2^10 = 1024
this means that in the quiet passage the music would only be represented with
10-bit samples. This problem can be avoided by using a floating-point format
instead, but very few audio formats appear to do this.
The bit rate for CD-quality stereo sound is 44100 × 2 × 16 bits/s = 1411.2
kb/s. This quality measure is particularly popular for lossy audio formats where
the uncompressed audio usually is the same (CD-quality). However, it should
be remembered that even two audio files in the same file format and with the
same bit rate may be of very different quality because the encoding programs
may be of different quality.
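These bit rates follow directly from multiplying the numbers above; as a quick
check (not code from the book):

# Bit rate = samples per second * channels * bits per sample
telephony = 8000*1*8       # 64 000 bits/s, i.e. 64 kb/s
cd_stereo = 44100*2*16     # 1 411 200 bits/s, i.e. 1411.2 kb/s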
Below we will read files in the wav-format. This format was developed by
Microsoft and IBM, and is one of the most common file formats for CD-quality
audio. It uses a 32-bit integer to specify the file size at the beginning of the file,
which means that a WAV-file cannot be larger than 4 GB.

Newer formats. Newer formats with higher quality are available. Music is
distributed in various formats on DVDs (DVD-video, DVD-audio, Super Audio
CD) with sampling rates up to 192 000 and up to 24 bits per sample. These
formats also support surround sound (up to seven channels in contrast to the
two stereo channels on a CD). In the following we will assume all sound to be
digital. Later we will return to how we reconstruct audible sound from digital
sound.
In the following we will briefly discuss the basic properties of sound: loudness
(the size of the variations), and frequency (the number of variations per second).
We will then address to what extent sounds can be decomposed as a sum of
different frequencies, and look at important operations on sound, called filters,
which preserve frequencies. We will also see how we can experiment with digital
sound.
The functionality for accessing sound in this chapter is collected in a module
called sound.

1.1 Sound and digital sound: Loudness and frequency
An example of a simple sound is shown in the left plot in Figure 1.1 where the
oscillations in air pressure are plotted against time. The initial air pressure
has the value 101 325 Pa, which is the normal air pressure at sea level. Then
the pressure varies more and more until it oscillates regularly between 101 323
Pa and 101 327 Pa. In the area where the air pressure is constant, no sound
will be heard, but as the variations increase in size, the sound becomes louder
and louder until about time t = 0.03 where the size of the oscillations becomes
constant.


Figure 1.1: An audio signal shown in terms of air pressure (left), and in terms
of the difference from the ambient air pressure (right).

When discussing sound, one is usually only interested in the variations in


air pressure, so the ambient air pressure (101 325 Pa) is subtracted from the
measurement. Everyday sounds typically correspond to variations in air pressure
of about 0.00002–2 Pa (0.00002 Pa corresponds to a just audible sound), while a
jet engine may cause variations as large as 200 Pa. Short exposure to variations
of about 20 Pa may in fact lead to hearing damage. The volcanic eruption at
Krakatoa, Indonesia, in 1883, produced a sound wave with variations as large as
almost 100 000 Pa, and the explosion could be heard 5000 km away.

The right plot in Figure 1.1 shows another sound which displays a slow, cos-
like, variation in air pressure, with some smaller and faster variations imposed on
this. This combination of several kinds of systematic oscillations in air pressure
is typical for general sounds. The size of the oscillations is directly related to the
loudness of the sound. The range of the oscillations is so big that it is common
to measure the loudness of a sound on a logarithmic scale:
Fact 1.3. Sound pressure and decibels.
It is common to relate a given sound pressure to the smallest sound pressure
that can be perceived, as a level on a decibel scale,
L_p = 10 log_10(p^2 / p_ref^2) = 20 log_10(p / p_ref).

Here p is the measured sound pressure while p_ref is the sound pressure of a just
perceivable sound, usually considered to be 0.00002 Pa.
The square of the sound pressure appears in the definition of L_p since this
represents the power of the sound which is relevant for what we perceive as
loudness.
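As an illustration, the sound pressures mentioned earlier in this section translate
into the following levels (a small sketch; the helper function decibel is not part
of the book's sound module):

from math import log10

def decibel(p, p_ref=0.00002):
    # Sound pressure level L_p = 20*log_10(p/p_ref) in dB
    return 20*log10(p/p_ref)

print(decibel(0.00002))   # a just audible sound: 0 dB
print(decibel(2))         # a loud everyday sound: about 100 dB
print(decibel(200))       # a jet engine: about 140 dB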

Figure 1.2: Variations in air pressure during parts of a song. The first 0.5
seconds, the first 0.02 seconds, and the first 0.002 seconds.

The sounds in Figure 1.1 are synthetic in that they were constructed from
mathematical formulas. The sounds in Figure 1.2 on the other hand show the
variation in air pressure for a song, where there is no mathematical formula
involved. In the first half second there are so many oscillations that it is
impossible to see the details, but if we zoom in on the first 0.002 seconds we
can see that there is a continuous function behind all the ink. In reality the
air pressure varies more than this, even over this short time period, but the
measuring equipment may not be able to pick up those variations, and it is also
doubtful whether we would be able to perceive such rapid variations.

1.1.1 The frequency of a sound


The other important characteristic in sound is frequency, i.e. the speed of the
variations. To make this concept more precise, let us start with a couple of
definitions.
Definition 1.4. Periodic functions.
A real function f is said to be periodic with period T if

f (t + T ) = f (t)

for all real numbers t.

Note that all the values of a periodic function f with period T are known if
f (t) is known for all t in the interval [0, T ). The following will be our prototype
for periodic functions:
Observation 1.5. Frequency.
If ν is a real number, the function f (t) = sin(2πνt) is periodic with period
T = 1/ν. When t varies in the interval [0, 1], this function covers a total of
ν periods. This is expressed by saying that f has frequency ν. Frequency is
measured in Hz (Hertz) which is the same as s−1 (the time t is measured in
seconds). The function sin(2πνt) is also called a pure tone.
Clearly sin(2πνt) and cos(2πνt) have the same frequency, and they are simply
shifted versions of one another (since cos(2πνt) = sin(2πνt + π/2)). Both, as well
as linear combinations of them, are called pure tones with frequency ν. Due to
this, the complex functions e^{±2πiνt} = cos(2πνt) ± i sin(2πνt) will also be called
pure tones. They will also turn out to be useful in the following.
If we are to perceive variations in air pressure as sound, they must fall within
a certain range. It turns out that, for a human with good hearing to perceive a
sound, the number of variations per second must be in the range 20–20 000.
There is a simple way to change the period of a periodic function, namely by
multiplying the argument by a constant. Figure 1.3 illustrates this. The function
in the upper left is the plain sin t which covers one period when t varies in the
interval [0, 2π]. By multiplying the argument by 2π, the period is squeezed into
the interval [0, 1] so the function sin(2πt) has frequency ν = 1. Then, by also
multiplying the argument by 2, we push two whole periods into the interval [0, 1],
so the function sin(2π2t) has frequency ν = 2. In the lower right the argument
has been multiplied by 5 — hence the frequency is 5 and there are five whole
periods in the interval [0, 1].

Figure 1.3: Versions of sin with different frequencies: sin(t), sin(2πt), sin(2π2t),
and sin(2π5t).
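The last three panels of Figure 1.3 can be reproduced with a few lines of code (a
sketch, assuming matplotlib is available):

from numpy import linspace, sin, pi
import matplotlib.pyplot as plt

t = linspace(0, 1, 1000)
for nu in [1, 2, 5]:             # the frequencies used in Figure 1.3
    plt.figure()
    plt.plot(t, sin(2*pi*nu*t))
    plt.title('sin(2*pi*%d*t)' % nu)
plt.show()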

1.1.2 Working with digital sound on a computer


Before we can do anything at all with digital sound, we need to know how we
can read and write such data from and to files, and also how to play the data on
the computer. These commands are as follows.

x, fs = audioread(filename)    # Read from file
play(x, fs)                    # Play the entire sound
audiowrite(filename, x, fs)    # Write to file

These functions can be found in the module sound. Note that the method play
does not block - if we play several sounds in succession they will be played
simultaneously. To avoid this, we can block the program ourselves using the
raw_input function, in order to wait for input from the terminal.
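For instance, to play two sounds x and y one after the other (y standing in for
some other sound array here), we can wait for terminal input in between:

play(x, fs)
raw_input("Press enter to play the next sound")   # blocks until enter is pressed
play(y, fs)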
play basically sends the array of sound samples x and sample rate fs to the
sound card, which uses some method for reconstructing the sound to an analog
sound signal. This analog signal is then sent to the loudspeakers and we hear
the sound.
The sound samples can have different data types. We will always assume that
they are of type double. The computer requires that they have values between
−1 and 1 (0 corresponding to no variation in air pressure from ambience, and
−1 and 1 the largest variations in air pressure). If they are not, the behaviour
when the sound is played is undefined.
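If in doubt, the samples can be scaled so that the largest absolute value is 1
before playing (a small sketch; x is assumed to be a numpy array as returned by
audioread):

x = x / abs(x).max()   # scale so that all samples lie in [-1, 1]
play(x, fs)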
You can also create the vector x you play on your own, without reading it
from file. Below we do this for pure tones.

Example 1.1: Listen to different channels


Our audio sample file actually has two sound channels. In such cases x is actually
a matrix with two columns, and each column represents a sound channel. To
listen to each channel we can run the following code.

play(x[:, 0], fs)

play(x[:, 1], fs)

You may not hear a difference between the two channels. There may still be
differences, but they may only be noticeable when the channels are sent to
different loudspeakers.
We will later apply different operations to sound. It is possible to apply these
operations to the sound channels simultaneously, and we will mostly do this.
Sounds we generate on our own, such as pure tones, will mostly be generated in
one channel.

Example 1.2: Playing the sound backwards


At times a popular game has been to play music backwards to try and find secret
messages. In the old days of analog music on vinyl this was not so easy, but
with digital sound it is quite simple; we just need to reverse the samples. To do
this we just loop through the array and put the last samples first.
Let x = (x_i)_{i=0}^{N−1} be the samples of a digital sound. Then the samples
y = (y_i)_{i=0}^{N−1} of the reverse sound are given by

y_i = x_{N−i−1},  for i = 0, 1, . . . , N − 1.
When we reverse the sound samples, we have to reverse the elements in both
sound channels. For our audio sample file this can be performed as follows.

z = x[::(-1), :]
play(z, fs)

Performing this on our sample file generates the sound you can find in the
file castanetsreverse.wav.

Example 1.3: Playing pure tones.


To create the samples of a pure tone we can write

t = linspace(0, antsec, fs*antsec)
x = sin(2*pi*f*t)

Here f is the frequency, antsec the length in seconds, and fs the sampling rate.
A pure tone with frequency 440 Hz can be found in the file puretone440.wav,
and a pure tone with frequency 1500 Hz can be found in the file puretone1500.wav.
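Putting the pieces together, a complete version of this example could look as
follows (the concrete values are assumptions chosen to match the 440 Hz tone
mentioned above; play is assumed imported from the book's sound module):

from numpy import linspace, sin, pi
from sound import play

fs = 44100                         # sample rate (an assumed, common value)
f = 440                            # frequency in Hz
antsec = 3                         # duration in seconds
t = linspace(0, antsec, fs*antsec)
x = sin(2*pi*f*t)
play(x, fs)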

Example 1.4: The square wave


There are many other ways in which a function can oscillate regularly. The
square wave is one such, but we will see later that it can not be written as a
simple, trigonometric function. Given a period T we define the square wave to
be 1 on the first half of each period, and −1 on the second half:
f_s(t) =  1,   if 0 ≤ t < T/2;        (1.1)
         −1,   if T/2 ≤ t < T.
In the left part of Figure 1.4 we have plotted the square wave with the same
period we used for the pure tone.

Figure 1.4: The first five periods of the square wave and the triangle wave.

Let us also listen to the square wave. We can first create the samples
for one period.

antsec = 3
samplesperperiod = fs/f # The number of samples for one period
oneperiod = hstack([ones((samplesperperiod/2), dtype=float), \
                    -ones((samplesperperiod/2), dtype=float)])

Then we repeat one period to obtain a sound with the desired length, and play
it as follows.

x = tile(oneperiod, antsec*f)    # Repeat one period
play(x, fs)

You can listen to this square wave in the file square440.wav. We hear a sound
which seems to have the same "base frequency" as sin(2π440t), but the square
wave is less pleasant to listen to: There seems to be some "sharp corners" in the
sound, translating into a rather shrieking, piercing sound. We will later explain
this by the fact that the square wave can be viewed as a sum of many frequencies,
and that many frequencies pollute the sound so that it is not pleasant to listen
to.

Example 1.5: The triangle wave


Given a period T we define the triangle wave to increase linearly from −1 to 1
on the first half of each period, and decrease linearly from 1 to −1 on the second
half of each period. This means that we can define it as the function
$$f_t(t) = \begin{cases} 4t/T - 1, & \text{if } 0 \le t < T/2; \\ 3 - 4t/T, & \text{if } T/2 \le t < T. \end{cases} \qquad (1.2)$$
In the right part of Figure 1.4 we have plotted the triangle wave with the same
period we used for the pure tone. In Exercise 1.11 you will be asked to reproduce
this plot, as well as construct and play the corresponding sound, which also can
be found in the file triangle440.wav. Again you will note that the triangle wave
has the same "base frequency" as sin(2π440t), and is less pleasant to listen to
than this pure tone. However, one can argue that it is somewhat more pleasant
to listen to than a square wave. This will also be explained in terms of pollution
with other frequencies later.
In the next sections we will address why many sounds may be approximated
well by adding many pure sounds together. In particular, this will apply for the
square wave and the triangle wave above, and we will also have something to say
about why they sound so different.

Exercise 1.6: The Krakatoa explosion


Compute the loudness of the Krakatoa explosion on the decibel scale, assuming
that the variation in air pressure peaked at 100 000 Pa.

Exercise 1.7: Sum of two pure tones


Consider a sum of two pure tones, f (t) = A1 sin(2πν1 t) + A2 sin(2πν2 t). For
which values of A1 , A2 , ν1 , ν2 is f periodic? What is the period of f when it is
periodic?

Exercise 1.8: Sum of two pure tones


Find two constants a and b so that the function f(t) = a sin(2π440t) + b sin(2π4400t)
resembles the right plot of Figure 1.1 as closely as possible. Generate the samples
of this sound, and listen to it.

Exercise 1.9: Playing with different sample rates


If we provide another sample rate fs to the play function, the sound card will
assume a different time distance between neighboring samples. Play and listen
to the audio sample file again, but with three different sample rates: 2*fs, fs,
and fs/2, where fs is the sample rate returned by audioread.

Exercise 1.10: Play sound with added noise


To remove noise from recorded sound can be very challenging, but adding noise
is simple. There are many kinds of noise, but one kind is easily obtained by
adding random numbers to the samples of a sound. For this we can use the
function random.random as follows.

z = x + c*(2*random.random(shape(x))-1)

This adds noise to all channels. The function for returning random numbers
returns numbers between 0 and 1, and above we have adjusted these so that
they are between −1 and 1 instead, as for other sound which can be played by
the computer. c is a constant (usually smaller than 1) that dampens the noise.
Write code which adds noise to the audio sample file, and listen to the result
for damping constants c=0.4 and c=0.1. Remember to scale the sound values
after you have added noise, since they may fall outside [−1, 1].

Exercise 1.11: Playing the triangle wave


Repeat what you did in Example 1.4, but now for the triangle wave of Example 1.5.
Start by generating the samples for one period of the triangle wave, then plot
five periods, before you generate the sound over a period of three seconds, and
play it. Verify that you generate the same sound as in Example 1.5.

1.2 Fourier series: Basic concepts


We will now discuss the idea of decomposing a sound into a linear combination
of pure sounds. A coefficient in such a decomposition then gives the content
at a given frequency. Such a decomposition will pave the way for constructing
useful operations on sound, such as amplifying or annihilating certain frequencies:
Certain frequencies may not be important for our perception of the sound, so
that annihilating or rounding these may not affect how we perceive them. For
simplicity we will first restrict to functions which are periodic with period T , so
that they are uniquely defined by their values on [0, T ]. Our analysis can be
carried out for square-integrable functions:
Definition 1.6. Continuous and square-integrable functions.
The set of continuous, real functions defined on an interval [0, T ] is denoted
C[0, T ].
A real function $f$ defined on $[0, T]$ is said to be square integrable if $f^2$ is Riemann-integrable, i.e., if the Riemann integral of $f^2$ on $[0, T]$ exists,
$$\int_0^T f(t)^2\, dt < \infty.$$

The set of all square integrable functions on [0, T ] is denoted L2 [0, T ].



The sets of continuous and square-integrable functions can be equipped with


an inner-product, a generalization of the so-called dot-product for vectors.
Theorem 1.7. Inner product spaces.
Both $L^2[0, T]$ and $C[0, T]$ are vector spaces. Moreover, if the two functions $f$ and $g$ lie in $L^2[0, T]$ (or in $C[0, T]$), then the product $fg$ is Riemann-integrable (or in $C[0, T]$). Moreover, both spaces are inner product spaces$^1$, with inner product$^2$ defined by
$$\langle f, g\rangle = \frac{1}{T}\int_0^T f(t)g(t)\, dt, \qquad (1.3)$$
and associated norm
$$\|f\| = \sqrt{\frac{1}{T}\int_0^T f(t)^2\, dt}. \qquad (1.4)$$

Proof. Since
$$|f + g|^2 \le (2\max(|f|, |g|))^2 \le 4(|f|^2 + |g|^2),$$
$f + g$ is square integrable whenever $f$ and $g$ are. It follows that $L^2[0, T]$ is a vector space. The properties of an inner product space follow directly from the properties of Riemann-integrable functions. Also, since $|fg| \le |f|^2 + |g|^2$, it follows that $\langle f, g\rangle < \infty$ whenever $f$ and $g$ are square integrable. It follows immediately that $fg$ is Riemann-integrable whenever $f$ and $g$ are square integrable.
The mysterious factor 1/T is included so that the constant function f (t) = 1
has norm 1, i.e., its role is as a normalizing factor.
Definition 1.6 and Theorem 1.7 state how general we will allow our sounds
to be. Theorem 1.7 also explains how we may determine approximations: Recall
from linear algebra that the projection of a function f onto a subspace W
with respect to an inner product h·, ·i is the function g ∈ W which minimizes
kf − gk, also called the error (or least square error) in the approximation 3 .
This projection is therefore also called a best approximation of f from W and is
characterized by the fact that the function f − g, also called the error function,
should be orthogonal to the subspace W , i.e.

hf − g, hi = 0, for all h ∈ W .
If we have an orthogonal basis $\phi = \{\phi_i\}_{i=1}^m$ for $W$, the orthogonal decomposition theorem states that the best approximation from $W$ is
$$g = \sum_{i=1}^{m} \frac{\langle f, \phi_i\rangle}{\langle \phi_i, \phi_i\rangle}\phi_i. \qquad (1.5)$$

1 See Section 6.1 in [25] for a review of inner products and orthogonality.
2 See Section 6.7 in [25] for a review of function spaces as inner product spaces.
3 See Section 6.3 in [25] for a review of projections and least squares approximations.

What we would like is a sequence of spaces

V1 ⊂ V2 ⊂ · · · ⊂ Vn ⊂ · · ·
of increasing dimensions so that most sounds can be approximated arbitrarily
well by choosing n large enough, and use the orthogonal decomposition theorem
to compute the approximations. It turns out that pure tones can be used for
this purpose:
Definition 1.8. Fourier series.
Let VN,T be the subspace of C[0, T ] spanned by the set of functions given by

$$D_{N,T} = \{1, \cos(2\pi t/T), \cos(2\pi 2t/T), \dots, \cos(2\pi Nt/T), \sin(2\pi t/T), \sin(2\pi 2t/T), \dots, \sin(2\pi Nt/T)\}. \qquad (1.6)$$

The space VN,T is called the N ’th order Fourier space. The N th-order Fourier
series approximation of f , denoted fN , is defined as the best approximation of f
from VN,T with respect to the inner product defined by (1.3).
We see that pure tones at frequencies 1/T , 2/T ,..., N/T are a basis for VN,T .
A best approximation at these frequencies, as described above will be called a
Fourier series. They are similar to Taylor series, where instead polynomials are
used in the approximation, but we will see that there is a major difference in
how the two approximations are computed. The theory of approximation of
functions with Fourier series is referred to as Fourier analysis, and is a central
tool in practical fields like image- and signal processing, but is also an important
field of research within pure mathematics. The approximation fN ∈ VN,T can
serve as a compressed version of f if many of the coefficients can be set to 0
without the error becoming too big.
Note that all the functions in the set DN,T are periodic with period T , but
most have an even shorter period (cos(2πnt/T ) also has period T /n). In general,
the term fundamental frequency is used to denote the lowest frequency of a given
periodic function.
The next theorem explains that the DN,T actually forms a basis for the
Fourier spaces, and also how to obtain the coefficients in this basis.
Theorem 1.9. Fourier coefficients.
The set DN,T is an orthogonal basis for VN,T . In particular, the dimension
of VN,T is 2N + 1, and if f is a function in L2 [0, T ], we denote by a0 , . . . , aN
and b1 , . . . , bN the coordinates of fN in the basis DN,T , i.e.
$$f_N(t) = a_0 + \sum_{n=1}^{N} (a_n\cos(2\pi nt/T) + b_n\sin(2\pi nt/T)). \qquad (1.7)$$
The $a_0, \dots, a_N$ and $b_1, \dots, b_N$ are called the (real) Fourier coefficients of $f$, and they are given by
$$a_0 = \langle f, 1\rangle = \frac{1}{T}\int_0^T f(t)\, dt, \qquad (1.8)$$
$$a_n = 2\langle f, \cos(2\pi nt/T)\rangle = \frac{2}{T}\int_0^T f(t)\cos(2\pi nt/T)\, dt \quad \text{for } n \ge 1, \qquad (1.9)$$
$$b_n = 2\langle f, \sin(2\pi nt/T)\rangle = \frac{2}{T}\int_0^T f(t)\sin(2\pi nt/T)\, dt \quad \text{for } n \ge 1. \qquad (1.10)$$

Proof. Assume first that $m \ne n$. We compute the inner product
$$\begin{aligned}
\langle \cos(2\pi mt/T), \cos(2\pi nt/T)\rangle &= \frac{1}{T}\int_0^T \cos(2\pi mt/T)\cos(2\pi nt/T)\, dt\\
&= \frac{1}{2T}\int_0^T (\cos(2\pi mt/T + 2\pi nt/T) + \cos(2\pi mt/T - 2\pi nt/T))\, dt\\
&= \frac{1}{2T}\left[\frac{T}{2\pi(m+n)}\sin(2\pi(m+n)t/T) + \frac{T}{2\pi(m-n)}\sin(2\pi(m-n)t/T)\right]_0^T\\
&= 0.
\end{aligned}$$
Here we have added the two identities $\cos(x \pm y) = \cos x\cos y \mp \sin x\sin y$ together to obtain an expression for $\cos(2\pi mt/T)\cos(2\pi nt/T)$ in terms of $\cos(2\pi mt/T + 2\pi nt/T)$ and $\cos(2\pi mt/T - 2\pi nt/T)$. By testing all other combinations of $\sin$ and $\cos$ also, we obtain the orthogonality of all functions in $D_{N,T}$. We also obtain that
$$\langle \cos(2\pi mt/T), \cos(2\pi mt/T)\rangle = \frac12, \quad \langle \sin(2\pi mt/T), \sin(2\pi mt/T)\rangle = \frac12, \quad \langle 1, 1\rangle = 1.$$
From the orthogonal decomposition theorem (1.5) it follows from this that
$$\begin{aligned}
f_N(t) &= \frac{\langle f, 1\rangle}{\langle 1, 1\rangle}1 + \sum_{n=1}^{N}\frac{\langle f, \cos(2\pi nt/T)\rangle}{\langle \cos(2\pi nt/T), \cos(2\pi nt/T)\rangle}\cos(2\pi nt/T) + \sum_{n=1}^{N}\frac{\langle f, \sin(2\pi nt/T)\rangle}{\langle \sin(2\pi nt/T), \sin(2\pi nt/T)\rangle}\sin(2\pi nt/T)\\
&= \frac{1}{T}\int_0^T f(t)\, dt + \sum_{n=1}^{N}\left(\frac{2}{T}\int_0^T f(t)\cos(2\pi nt/T)\, dt\right)\cos(2\pi nt/T) + \sum_{n=1}^{N}\left(\frac{2}{T}\int_0^T f(t)\sin(2\pi nt/T)\, dt\right)\sin(2\pi nt/T).
\end{aligned}$$

Equations (1.8)-(1.10) now follow by comparison with Equation (1.7).


From the orthogonality and the inner products of the Fourier basis functions
it immediately follows that
$$\|f_N\|^2 = a_0^2 + \frac12\sum_{n=1}^{N}(a_n^2 + b_n^2).$$
Since f is a function in time, and the an , bn represent contributions from different
frequencies, the Fourier series can be thought of as a change of coordinates,
from what often is called the time domain, to the frequency domain (or Fourier
domain). We will call the basis DN,T the N ’th order Fourier basis for VN,T . We
note that DN,T is not an orthonormal basis; it is only orthogonal.
In the signal processing literature, Equation (1.7) is known as the synthesis
equation, since the original function f is synthesized as a sum of the basis
functions. Equations (1.8)-(1.10) are also called analysis equations.
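The analysis equations can also be evaluated numerically when no closed form is available. The following sketch (not from the book) approximates $a_0$, $a_n$ and $b_n$ with simple Riemann sums over one period, here for the square wave, so that the result can be compared with the exact coefficients computed below.

from numpy import linspace, sin, cos, pi, where

T = 1/440.
M = 10000
t = linspace(0, T, M, endpoint=False)
dt = T/M
f = where(t < T/2, 1.0, -1.0)        # samples of the square wave over one period

def real_fourier_coefficients(f, n):
    # Riemann-sum approximations of the analysis equations (1.8)-(1.10)
    a0 = sum(f)*dt/T
    an = 2*sum(f*cos(2*pi*n*t/T))*dt/T
    bn = 2*sum(f*sin(2*pi*n*t/T))*dt/T
    return a0, an, bn

print(real_fourier_coefficients(f, 1))   # should be close to (0, 0, 4/pi)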
An illustration of convergence of Fourier series is shown in Figure 1.5, where the cubic polynomial $f(x) = -\frac{1}{3}x^3 + \frac{1}{2}x^2 - \frac{3}{16}x + 1$ is approximated by a 9th order Fourier series. The trigonometric approximation is periodic with period 1
so the approximation becomes poor at the ends of the interval since the cubic
polynomial is not periodic. The approximation is plotted on a larger interval in
the right plot in Figure 1.5, where its periodicity is clearly visible.
Let us compute the Fourier series of some interesting functions.

Example 1.12: Fourier coefficients of the square wave


Let us compute the Fourier coefficients of the square wave, as defined by Equation
(1.1) in Example 1.4. If we first use Equation (1.8) we obtain

Figure 1.5: The cubic polynomial $f(x) = -\frac{1}{3}x^3 + \frac{1}{2}x^2 - \frac{3}{16}x + 1$ on the interval $[0, 1]$, together with its Fourier series approximation from $V_{9,1}$. The function and its Fourier series are shown left. The Fourier series on a larger interval is shown right.

$$a_0 = \frac{1}{T}\int_0^T f_s(t)\, dt = \frac{1}{T}\int_0^{T/2} dt - \frac{1}{T}\int_{T/2}^{T} dt = 0.$$
Using Equation (1.9) we get
$$\begin{aligned}
a_n &= \frac{2}{T}\int_0^T f_s(t)\cos(2\pi nt/T)\, dt\\
&= \frac{2}{T}\int_0^{T/2}\cos(2\pi nt/T)\, dt - \frac{2}{T}\int_{T/2}^{T}\cos(2\pi nt/T)\, dt\\
&= \frac{2}{T}\left[\frac{T}{2\pi n}\sin(2\pi nt/T)\right]_0^{T/2} - \frac{2}{T}\left[\frac{T}{2\pi n}\sin(2\pi nt/T)\right]_{T/2}^{T}\\
&= \frac{2}{T}\frac{T}{2\pi n}((\sin(n\pi) - \sin 0) - (\sin(2n\pi) - \sin(n\pi))) = 0.
\end{aligned}$$
Finally, using Equation (1.10) we obtain
$$\begin{aligned}
b_n &= \frac{2}{T}\int_0^T f_s(t)\sin(2\pi nt/T)\, dt\\
&= \frac{2}{T}\int_0^{T/2}\sin(2\pi nt/T)\, dt - \frac{2}{T}\int_{T/2}^{T}\sin(2\pi nt/T)\, dt\\
&= \frac{2}{T}\left[-\frac{T}{2\pi n}\cos(2\pi nt/T)\right]_0^{T/2} + \frac{2}{T}\left[\frac{T}{2\pi n}\cos(2\pi nt/T)\right]_{T/2}^{T}\\
&= \frac{2}{T}\frac{T}{2\pi n}((-\cos(n\pi) + \cos 0) + (\cos(2n\pi) - \cos(n\pi)))\\
&= \frac{2(1 - \cos(n\pi))}{n\pi} = \begin{cases} 0, & \text{if $n$ is even};\\ 4/(n\pi), & \text{if $n$ is odd}.\end{cases}
\end{aligned}$$
In other words, only the $b_n$-coefficients with $n$ odd in the Fourier series are nonzero. This means that the Fourier series of the square wave is
$$\frac{4}{\pi}\sin(2\pi t/T) + \frac{4}{3\pi}\sin(2\pi 3t/T) + \frac{4}{5\pi}\sin(2\pi 5t/T) + \frac{4}{7\pi}\sin(2\pi 7t/T) + \cdots. \qquad (1.11)$$
With N = 20, there are 10 trigonometric terms in this sum. The corresponding
Fourier series can be plotted over one period with the following code.

N = 20
T = 1/440.
t = linspace(0, T, 100)
x = zeros(len(t))
for k in range(1, N + 1, 2):
    x += (4/(k*pi))*sin(2*pi*k*t/T)
plt.figure()
plt.plot(t, x, 'k-')

The left plot in Figure 1.6 shows the resulting plot. In the right plot the values of
the first 100 Fourier coefficients bn are shown, to see that they actually converge
to zero. This is clearly necessary in order for the Fourier series to converge.
Even though f oscillates regularly between −1 and 1 with period T , the
discontinuities mean that it is far from the simple sin(2πt/T ) which corresponds
to a pure tone of frequency 1/T . Clearly b1 sin(2πt/T ) is the dominant term in
the Fourier series. This is not surprising since the square wave has the same
period as this term, but the additional terms in the Fourier series pollute the
pure sound. As we include more and more of these, we gradually approach the
square wave.
There is a connection between how fast the Fourier coefficients go to zero, and
how we perceive the sound. A pure sine sound has only one nonzero coefficient,
while the square wave Fourier coefficients decrease as 1/n, making the sound

Figure 1.6: The Fourier series of the square wave with $N = 20$, and the values for the first 100 Fourier coefficients $b_n$.

less pleasant. This explains what we heard when we listened to the sound in
Example 1.4. Also, it explains why we heard the same pitch as the pure tone,
since the first frequency in the Fourier series has the same frequency as the pure
tone we listened to, and since this had the highest value.
Let us listen to the Fourier series approximations of the square wave. For
N = 1 the sound can be found in the file square440s1.wav. This sounds exactly
like the pure sound with frequency 440Hz, as noted above. For N = 5 the sound
can be found in the file square440s5.wav, and for N = 9 it can be found in the
file square440s9.wav. The latter sounds are more like the square wave itself. As
we increase N we can hear how the introduction of more frequencies gradually
pollutes the sound more.

Example 1.13: Fourier coefficients of the triangle wave


Let us also compute the Fourier coefficients of the triangle wave, as defined by
Equation (1.2) in Example 1.5. We now have
$$a_0 = \frac{1}{T}\int_0^{T/2}\frac{4}{T}\left(t - \frac{T}{4}\right) dt + \frac{1}{T}\int_{T/2}^{T}\frac{4}{T}\left(\frac{3T}{4} - t\right) dt.$$
Instead of computing this directly, it is quicker to see geometrically that the graph of $f_t$ has as much area above as below the $x$-axis, so that this integral must be zero. Similarly, since $f_t$ is symmetric about the midpoint $T/2$, and $\sin(2\pi nt/T)$ is antisymmetric about $T/2$, we have that $f_t(t)\sin(2\pi nt/T)$ also is antisymmetric about $T/2$, so that
$$\int_0^{T/2} f_t(t)\sin(2\pi nt/T)\, dt = -\int_{T/2}^{T} f_t(t)\sin(2\pi nt/T)\, dt.$$
This means that, for $n \ge 1$,
$$b_n = \frac{2}{T}\int_0^{T/2} f_t(t)\sin(2\pi nt/T)\, dt + \frac{2}{T}\int_{T/2}^{T} f_t(t)\sin(2\pi nt/T)\, dt = 0.$$
For the final coefficients, since both $f$ and $\cos(2\pi nt/T)$ are symmetric about $T/2$, we get for $n \ge 1$,
$$\begin{aligned}
a_n &= \frac{2}{T}\int_0^{T/2} f_t(t)\cos(2\pi nt/T)\, dt + \frac{2}{T}\int_{T/2}^{T} f_t(t)\cos(2\pi nt/T)\, dt\\
&= \frac{4}{T}\int_0^{T/2} f_t(t)\cos(2\pi nt/T)\, dt = \frac{4}{T}\int_0^{T/2}\frac{4}{T}\left(t - \frac{T}{4}\right)\cos(2\pi nt/T)\, dt\\
&= \frac{16}{T^2}\int_0^{T/2} t\cos(2\pi nt/T)\, dt - \frac{4}{T}\int_0^{T/2}\cos(2\pi nt/T)\, dt\\
&= \frac{4}{n^2\pi^2}(\cos(n\pi) - 1) = \begin{cases} 0, & \text{if $n$ is even};\\ -8/(n^2\pi^2), & \text{if $n$ is odd},\end{cases}
\end{aligned}$$
where we have dropped the final tedious calculations (use integration by parts). From this it is clear that the Fourier series of the triangle wave is
$$-\frac{8}{\pi^2}\cos(2\pi t/T) - \frac{8}{3^2\pi^2}\cos(2\pi 3t/T) - \frac{8}{5^2\pi^2}\cos(2\pi 5t/T) - \frac{8}{7^2\pi^2}\cos(2\pi 7t/T) + \cdots. \qquad (1.12)$$
In Figure 1.7 we have repeated the plots used for the square wave, for the triangle
wave. The figure indicates that the Fourier series coefficients decay faster.

Figure 1.7: The Fourier series of the triangle wave with $N = 20$, and the values for the first 100 Fourier coefficients $a_n$.

Let us also listen to different Fourier series approximations of the triangle


wave. For N = 1 it can be found in the file triangle440s1.wav. Again, this sounds
exactly like the pure sound with frequency 440Hz. For N = 5 the Fourier series
approximation can be found in the file triangle440s5.wav, and for N = 9 it can
be found in the file triangle440s9.wav. Again the latter sounds are more like the
triangle wave itself, and as we increase N we can hear that more frequencies
pollutes the sound. However, since the triangle wave Fourier coefficients decrease
as 1/n2 rather than 1/n, the sound is somewhat less unpleasant. The faster
convergence can also be heard.

Example 1.14: Fourier coefficients of a simple function


There is an important lesson to be learned from the previous examples: Even if
the signal is nice and periodic, it may not have a nice representation in terms
of trigonometric functions. Thus, trigonometric functions may not be the best
bases to use for expressing other functions. Unfortunately, many more such cases
can be found, as we will now explain. Let us consider a periodic function which is 1 on $[0, T_0]$, but 0 on $[T_0, T]$. This is a signal with short duration when $T_0$ is small compared to $T$. We compute that $a_0 = T_0/T$, and
$$a_n = \frac{2}{T}\int_0^{T_0}\cos(2\pi nt/T)\, dt = \frac{1}{\pi n}\left[\sin(2\pi nt/T)\right]_0^{T_0} = \frac{\sin(2\pi nT_0/T)}{\pi n}$$
for $n \ge 1$. Similar computations hold for $b_n$. We see that $|a_n|$ is of the order $1/(\pi n)$, and that infinitely many $n$ contribute. This function may be thought of as a simple building block, corresponding to a small time segment. However,
we see that it is not a simple building block in terms of trigonometric functions.
This time segment building block may be useful for restricting a function to
smaller time segments, and later on we will see that it still can be useful.

1.2.1 Fourier series for symmetric and antisymmetric functions
In Example 1.12 we saw that the Fourier coefficients bn vanished, resulting in a
sine-series for the Fourier series of the square wave. Similarly, in Example 1.13
we saw that an vanished, resulting in a cosine-series for the triangle wave. This
is not a coincidence, and is captured by the following result.
Theorem 1.10. Symmetry and antisymmetry.
If f is antisymmetric about 0 (that is, if f (−t) = −f (t) for all t), then an = 0,
so the Fourier series is actually a sine-series. If f is symmetric about 0 (which
means that f (−t) = f (t) for all t), then bn = 0, so the Fourier series is actually
a cosine-series.

The point is that the square wave is antisymmetric about 0, and the triangle
wave is symmetric about 0.
Proof. Note first that we can write

$$a_n = \frac{2}{T}\int_{-T/2}^{T/2} f(t)\cos(2\pi nt/T)\, dt, \qquad b_n = \frac{2}{T}\int_{-T/2}^{T/2} f(t)\sin(2\pi nt/T)\, dt,$$

i.e. we can change the integration bounds from [0, T ] to [−T /2, T /2]. This
follows from the fact that all f (t), cos(2πnt/T ) and sin(2πnt/T ) are periodic
with period T .
Suppose first that f is symmetric. We obtain

$$\begin{aligned}
b_n &= \frac{2}{T}\int_{-T/2}^{T/2} f(t)\sin(2\pi nt/T)\, dt\\
&= \frac{2}{T}\int_{-T/2}^{0} f(t)\sin(2\pi nt/T)\, dt + \frac{2}{T}\int_0^{T/2} f(t)\sin(2\pi nt/T)\, dt\\
&= \frac{2}{T}\int_{-T/2}^{0} f(t)\sin(2\pi nt/T)\, dt - \frac{2}{T}\int_0^{-T/2} f(-t)\sin(-2\pi nt/T)\, dt\\
&= \frac{2}{T}\int_{-T/2}^{0} f(t)\sin(2\pi nt/T)\, dt - \frac{2}{T}\int_{-T/2}^{0} f(t)\sin(2\pi nt/T)\, dt = 0,
\end{aligned}$$

where we have made the substitution u = −t, and used that sin is antisymmetric.
The case when f is antisymmetric can be proved in the same way, and is left as
an exercise.

Exercise 1.15: Shifting the Fourier basis vectors


Show that sin(2πnt/T + a) ∈ VN,T when |n| ≤ N , regardless of the value of a.

Exercise 1.16: Playing the Fourier series of the triangle wave
a) Plot the Fourier series of the triangle wave.
b) Write code so that you can listen to the Fourier series of the triangle wave.
How high must you choose N for the Fourier series to be indistinguishable from the square/triangle waves themselves?

Exercise 1.17: Riemann-integrable functions which are not square-integrable

Find a function $f$ which is Riemann-integrable on $[0, T]$, and so that $\int_0^T f(t)^2\, dt$ is infinite.

Exercise 1.18: When are Fourier spaces included in each other?

Given the two Fourier spaces $V_{N_1,T_1}$, $V_{N_2,T_2}$. Find necessary and sufficient conditions in order for $V_{N_1,T_1} \subset V_{N_2,T_2}$.

Exercise 1.19: antisymmetric functions are sine-series


Prove the second part of Theorem 1.10, i.e. show that if f is antisymmetric about
0 (i.e. f (−t) = −f (t) for all t), then an = 0, i.e. the Fourier series is actually a
sine-series.

Exercise 1.20: More connections between symmetric/antisymmetric functions and sine/cosine series

Show that

a) Any cosine series $a_0 + \sum_{n=1}^{N} a_n\cos(2\pi nt/T)$ is a symmetric function.

b) Any sine series $\sum_{n=1}^{N} b_n\sin(2\pi nt/T)$ is an antisymmetric function.

c) Any periodic function can be written as a sum of a symmetric and an antisymmetric function by writing $f(t) = \frac{f(t) + f(-t)}{2} + \frac{f(t) - f(-t)}{2}$.

d) If $f_N(t) = a_0 + \sum_{n=1}^{N}(a_n\cos(2\pi nt/T) + b_n\sin(2\pi nt/T))$, then
$$\frac{f_N(t) + f_N(-t)}{2} = a_0 + \sum_{n=1}^{N} a_n\cos(2\pi nt/T), \qquad \frac{f_N(t) - f_N(-t)}{2} = \sum_{n=1}^{N} b_n\sin(2\pi nt/T).$$

Exercise 1.21: Fourier series for low-degree polynomials


Find the Fourier series coefficients of the periodic functions with period $T$ defined by $f(t) = t$, $f(t) = t^2$, and $f(t) = t^3$, respectively, on $[0, T]$.

Exercise 1.22: Fourier series for polynomials


Write down difference equations for finding the Fourier coefficients of $f(t) = t^{k+1}$ from those of $f(t) = t^k$, and write a program which uses this recursion. Use the program to verify what you computed in Exercise 1.21.

Exercise 1.23: Fourier series of a given polynomial


Use the previous exercise to find the Fourier series for $f(x) = -\frac{1}{3}x^3 + \frac{1}{2}x^2 - \frac{3}{16}x + 1$ on the interval $[0, 1]$. Plot the 9th order Fourier series for this function. You should obtain the plots from Figure 1.5.

1.3 Complex Fourier series


In Section 1.2 we saw how a function can be expanded in a series of sines and
cosines. These functions are related to the complex exponential function via
Euler's formula
$$e^{ix} = \cos x + i\sin x,$$
where $i$ is the imaginary unit with the property that $i^2 = -1$. Because the
algebraic properties of the exponential function are much simpler than those
of cos and sin, it is often an advantage to work with complex numbers, even

though the given setting is real numbers. This is definitely the case in Fourier
analysis. More precisely, we will make the substitutions
$$\cos(2\pi nt/T) = \frac12\left(e^{2\pi int/T} + e^{-2\pi int/T}\right) \qquad (1.13)$$
$$\sin(2\pi nt/T) = \frac{1}{2i}\left(e^{2\pi int/T} - e^{-2\pi int/T}\right) \qquad (1.14)$$
in Definition 1.8. From these identities it is clear that the set of complex exponential functions $e^{2\pi int/T}$ also is a basis of periodic functions (with the same period) for $V_{N,T}$. We may therefore reformulate Definition 1.8 as follows:
Definition 1.11. Complex Fourier basis.
We define the set of functions

$$F_{N,T} = \{e^{-2\pi iNt/T}, e^{-2\pi i(N-1)t/T}, \dots, e^{-2\pi it/T}, \qquad (1.15)$$
$$\qquad\quad 1, e^{2\pi it/T}, \dots, e^{2\pi i(N-1)t/T}, e^{2\pi iNt/T}\}, \qquad (1.16)$$

and call this the order N complex Fourier basis for VN,T .
The function e2πint/T is also called a pure tone with frequency n/T , just
as sines and cosines are. We would like to show that these functions also are
orthogonal. To show this, we need to say more on the inner product we have
defined by Equation (1.3). A weakness with this definition is that we have
assumed real functions f and g, so that this can not be used for the complex
exponential functions e2πint/T . For general complex functions we will extend
the definition of the inner product as follows:
$$\langle f, g\rangle = \frac{1}{T}\int_0^T f\bar{g}\, dt. \qquad (1.17)$$
The associated norm now becomes
$$\|f\| = \sqrt{\frac{1}{T}\int_0^T |f(t)|^2\, dt}. \qquad (1.18)$$

The motivation behind Equation (1.17), where we have conjugated the second
function, lies in the definition of an inner product for vector spaces over complex
numbers. From before we are used to vector spaces over real numbers, but vector
spaces over complex numbers are defined through the same set of axioms as
for real vector spaces, only replacing real numbers with complex numbers. For
complex vector spaces, the axioms defining an inner product are the same as for
real vector spaces, except that the axiom
$$\langle f, g\rangle = \langle g, f\rangle \qquad (1.19)$$
is replaced with the axiom
$$\langle f, g\rangle = \overline{\langle g, f\rangle}, \qquad (1.20)$$
i.e. a conjugation occurs when we switch the order of the functions. This new axiom can be used to prove the property $\langle f, cg\rangle = \bar{c}\langle f, g\rangle$, which is a somewhat different property from what we know for real inner product spaces. This follows by writing
$$\langle f, cg\rangle = \overline{\langle cg, f\rangle} = \overline{c\langle g, f\rangle} = \bar{c}\,\overline{\langle g, f\rangle} = \bar{c}\langle f, g\rangle.$$


Clearly the inner product given by (1.17) satisfies Axiom (1.20). With this definition it is quite easy to see that the functions $e^{2\pi int/T}$ are orthonormal. Using the orthogonal decomposition theorem we can therefore write
$$\begin{aligned}
f_N(t) &= \sum_{n=-N}^{N}\frac{\langle f, e^{2\pi int/T}\rangle}{\langle e^{2\pi int/T}, e^{2\pi int/T}\rangle}e^{2\pi int/T} = \sum_{n=-N}^{N}\langle f, e^{2\pi int/T}\rangle e^{2\pi int/T}\\
&= \sum_{n=-N}^{N}\left(\frac{1}{T}\int_0^T f(t)e^{-2\pi int/T}\, dt\right)e^{2\pi int/T}.
\end{aligned}$$

We summarize this in the following theorem, which is a version of Theorem 1.9


which uses the complex Fourier basis:
Theorem 1.12. Complex Fourier coefficients.
We denote by $y_{-N}, \dots, y_0, \dots, y_N$ the coordinates of $f_N$ in the basis $F_{N,T}$, i.e.
$$f_N(t) = \sum_{n=-N}^{N} y_n e^{2\pi int/T}. \qquad (1.21)$$
The $y_n$ are called the complex Fourier coefficients of $f$, and they are given by
$$y_n = \langle f, e^{2\pi int/T}\rangle = \frac{1}{T}\int_0^T f(t)e^{-2\pi int/T}\, dt. \qquad (1.22)$$

Let us consider two immediate and important consequences of the orthonormal


basis we have established. The first one follows directly from the orthonormality.
Theorem 1.13. Parseval’s theorem.
We have that
$$\|f_N\|^2 = \sum_{n=-N}^{N}|y_n|^2.$$

Theorem 1.14. Bessel’s inequality.


For any f ∈ L2 [0, T ] we have that kf k2 ≥ kfN k2 . In particular, the sequence
PN
kfN k2 = n=−N |yn |2 is convergent, so that yn → 0.
Versions of these two results also hold for the real Fourier coefficients.

Proof. Since $f_N(t)$ is the projection of $f$ onto $V_{N,T}$ we have that
$$\|f\|^2 = \|f - f_N\|^2 + \|f_N\|^2 \ge \|f_N\|^2.$$
In particular the Fourier coefficients go to zero. The result does not say that $\sum_{n=-N}^{N}|y_n|^2 \to \|f\|^2$, which would imply that $\|f - f_N\| \to 0$. This is more difficult to analyze, and we will only prove a particular case of it in Section 1.6.
If we reorder the real and complex Fourier bases so that the two functions $\{\cos(2\pi nt/T), \sin(2\pi nt/T)\}$ and $\{e^{2\pi int/T}, e^{-2\pi int/T}\}$ have the same index in the bases, equations (1.13)-(1.14) give us that the change of coordinates matrix$^4$ from $D_{N,T}$ to $F_{N,T}$, denoted $P_{F_{N,T}\leftarrow D_{N,T}}$, is represented by repeating the matrix
$$\frac12\begin{pmatrix} 1 & 1/i\\ 1 & -1/i\end{pmatrix}$$
along the diagonal (with an additional 1 for the constant function 1). In other words, since $a_n, b_n$ are coefficients relative to the real basis and $y_n, y_{-n}$ the corresponding coefficients relative to the complex basis, we have for $n > 0$,
$$\begin{pmatrix} y_n\\ y_{-n}\end{pmatrix} = \frac12\begin{pmatrix} 1 & 1/i\\ 1 & -1/i\end{pmatrix}\begin{pmatrix} a_n\\ b_n\end{pmatrix}.$$
This can be summarized by the following theorem:
Theorem 1.15. Change of coordinates between real and complex Fourier bases.
The complex Fourier coefficients yn and the real Fourier coefficients an , bn of
a function f are related by

$$y_0 = a_0, \qquad y_n = \frac12(a_n - ib_n), \qquad y_{-n} = \frac12(a_n + ib_n),$$
for $n = 1, \dots, N$.
Combining with Theorem 1.10, Theorem 1.15 can help us state properties of
complex Fourier coefficients for symmetric- and antisymmetric functions. We
look into this in Exercise 1.34.
Due to the somewhat nicer formulas for the complex Fourier coefficients when
compared to the real Fourier coefficients, we will write most Fourier series in
complex form in the following.
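Theorem 1.15 is easy to check numerically. The sketch below is not from the book: it computes complex coefficients of a test signal with known real coefficients, approximating the integral (1.22) by a Riemann sum; the test signal and the names are just chosen for illustration.

from numpy import linspace, exp, cos, sin, pi

T = 1.0
M = 10000
t = linspace(0, T, M, endpoint=False)
dt = T/M
f = 2 + cos(2*pi*t/T) + 3*sin(2*pi*2*t/T)   # a_0 = 2, a_1 = 1, b_2 = 3

def complex_fourier_coefficient(f, n):
    # Riemann-sum approximation of equation (1.22)
    return sum(f*exp(-2j*pi*n*t/T))*dt/T

for n in [0, 1, -1, 2, -2]:
    print(n, complex_fourier_coefficient(f, n))
# Theorem 1.15 predicts y_0 = 2, y_1 = y_{-1} = 1/2, y_2 = -1.5i, y_{-2} = 1.5i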
Let us consider some examples where we compute complex Fourier series.
4 See Section 4.7 in [25], to review the mathematics behind change of coordinates.

Example 1.24: Complex Fourier coefficients of a simple function
Let us consider the pure sound f (t) = e2πit/T2 with period T2 , but let us consider
it only on the interval [0, T ] instead, where T < T2 . Note that this f is not
periodic, since we only consider the part [0, T ] of the period [0, T2 ]. The Fourier
coefficients are

$$y_n = \frac{1}{T}\int_0^T e^{2\pi it/T_2}e^{-2\pi int/T}\, dt = \frac{1}{2\pi iT(1/T_2 - n/T)}\left[e^{2\pi it(1/T_2 - n/T)}\right]_0^T = \frac{1}{2\pi i(T/T_2 - n)}\left(e^{2\pi iT/T_2} - 1\right).$$
Here it is only the term $1/(T/T_2 - n)$ which depends on $n$, so that $y_n$ can only be large when $n$ is close to $T/T_2$. In Figure 1.8 we have plotted $|y_n|$ for two different
combinations of T, T2 .
Figure 1.8: Plot of $|y_n|$ when $f(t) = e^{2\pi it/T_2}$, and $T_2 > T$. Left: $T/T_2 = 0.5$. Right: $T/T_2 = 0.9$.

In both examples it is seen that many Fourier coefficients contribute, but


this is more visible when T /T2 = 0.5. When T /T2 = 0.9, most contribution is
seen to be in the y1 -coefficient. This sounds reasonable, since f then is closest
to the pure tone f (t) = e2πit/T of frequency 1/T (which in turn has y1 = 1 and
all other yn = 0).
Apart from computing complex Fourier series, there is an important lesson to
be learned from this example: In order for a periodic function to be approximated
by other periodic functions, their period must somehow match.
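A plot like the ones in Figure 1.8 can be produced directly from the expression for $y_n$ derived above. A minimal sketch, not from the book, for the case $T/T_2 = 0.5$:

from numpy import arange, exp, pi, abs
import matplotlib.pyplot as plt

T, T2 = 1.0, 2.0                # example values giving T/T2 = 0.5
n = arange(0, 21)
yn = (exp(2j*pi*T/T2) - 1)/(2j*pi*(T/T2 - n))
plt.plot(n, abs(yn), 'ko')      # plot |y_n| against n
plt.xlabel('n')
plt.ylabel('|y_n|')
plt.show()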

Example 1.25: Complex Fourier coefficients of a composite function

What often is the case is that a sound changes in content over time. Assume that it is equal to a pure tone of frequency $n_1/T$ on $[0, T/2)$, and equal to a pure tone of frequency $n_2/T$ on $[T/2, T)$, i.e.
$$f(t) = \begin{cases} e^{2\pi in_1t/T} & \text{on } [0, T/2)\\ e^{2\pi in_2t/T} & \text{on } [T/2, T)\end{cases}.$$
When $n \ne n_1, n_2$ we have that
$$\begin{aligned}
y_n &= \frac{1}{T}\left(\int_0^{T/2} e^{2\pi in_1t/T}e^{-2\pi int/T}\, dt + \int_{T/2}^{T} e^{2\pi in_2t/T}e^{-2\pi int/T}\, dt\right)\\
&= \frac{1}{T}\left(\left[\frac{T}{2\pi i(n_1-n)}e^{2\pi i(n_1-n)t/T}\right]_0^{T/2} + \left[\frac{T}{2\pi i(n_2-n)}e^{2\pi i(n_2-n)t/T}\right]_{T/2}^{T}\right)\\
&= \frac{e^{\pi i(n_1-n)} - 1}{2\pi i(n_1-n)} + \frac{1 - e^{\pi i(n_2-n)}}{2\pi i(n_2-n)}.
\end{aligned}$$
Let us restrict to the case when $n_1$ and $n_2$ are both even. We see that
$$y_n = \begin{cases} \frac12 + \frac{1}{\pi i(n_2-n_1)}, & n = n_1, n_2\\ 0, & n \text{ even}, n \ne n_1, n_2\\ \frac{n_1-n_2}{\pi i(n_1-n)(n_2-n)}, & n \text{ odd}\end{cases}$$
Here we have computed the cases n = n1 and n = n2 as above. In Figure 1.9 we
have plotted |yn | for two different combinations of n1 , n2 .
Figure 1.9: Plot of $|y_n|$ when we have two different pure tones at the different parts of a period. Left: $n_1 = 10$, $n_2 = 12$. Right: $n_1 = 2$, $n_2 = 20$.

We see that, when n1 , n2 are close, the Fourier coefficients are close to those
of a pure tone with n ≈ n1 , n2 , but that also other frequencies contribute. When
n1 , n2 are further apart, we see that the Fourier coefficients are like the sum of
the two base frequencies, but that other frequencies contribute also here.
There is an important lesson to be learned from this as well: We should
be aware of changes in a sound over time, and it may not be smart to use
a frequency representation over a large interval when we know that there are
simpler frequency representations on the smaller intervals. The following example
shows that, in some cases it is not necessary to compute the Fourier integrals at
all, in order to compute the Fourier series.

Example 1.26: Complex Fourier coefficients of f (t) = cos3 (2πt/T )


Let us compute the complex Fourier series of the function $f(t) = \cos^3(2\pi t/T)$, where $T$ is the period of $f$. We can write
$$\begin{aligned}
\cos^3(2\pi t/T) &= \left(\frac12\left(e^{2\pi it/T} + e^{-2\pi it/T}\right)\right)^3\\
&= \frac18\left(e^{2\pi i3t/T} + 3e^{2\pi it/T} + 3e^{-2\pi it/T} + e^{-2\pi i3t/T}\right)\\
&= \frac18 e^{2\pi i3t/T} + \frac38 e^{2\pi it/T} + \frac38 e^{-2\pi it/T} + \frac18 e^{-2\pi i3t/T}.
\end{aligned}$$
From this we see that the complex Fourier series is given by $y_1 = y_{-1} = \frac38$, and that $y_3 = y_{-3} = \frac18$. In other words, it was not necessary to compute the Fourier integrals in this case, and we see that the function lies in $V_{3,T}$, i.e. there are finitely many terms in the Fourier series. In general, if the function is some trigonometric function, we can often use trigonometric identities to find an expression for the Fourier series.

Exercise 1.27: Orthonormality of Complex Fourier basis


Show that the complex functions e2πint/T are orthonormal.

Exercise 1.28: Complex Fourier series of f (t) = sin2 (2πt/T )


Compute the complex Fourier series of the function f (t) = sin2 (2πt/T ).

Exercise 1.29: Complex Fourier series of polynomials


Repeat Exercise 1.21, computing the complex Fourier series instead of the real
Fourier series.

Exercise 1.30: Complex Fourier series and Pascals triangle


In this exercise we will find a connection with certain Fourier series and the rows
in Pascal’s triangle.
a) Show that both cosn (t) and sinn (t) are in VN,2π for 1 ≤ n ≤ N .
b) Write down the N'th order complex Fourier series for $f_1(t) = \cos t$, $f_2(t) = \cos^2 t$, and $f_3(t) = \cos^3 t$.
c) In b) you should be able to see a connection between the Fourier coefficients
and the three first rows in Pascal’s triangle. Formulate and prove a general
relationship between row n in Pascal’s triangle and the Fourier coefficients of
fn (t) = cosn t.

Exercise 1.31: Complex Fourier coefficients of the square wave
Compute the complex Fourier coefficients of the square wave using Equation
(1.22), i.e. repeat the calculations from Example 1.12 for the complex case. Use
Theorem 1.15 to verify your result.

Exercise 1.32: Complex Fourier coefficients of the triangle wave
Repeat Exercise 1.31 for the triangle wave.

Exercise 1.33: Complex Fourier coefficients of low-degree polynomials
Use Equation (1.22) to compute the complex Fourier coefficients of the periodic
functions with period T defined by, respectively, f (t) = t, f (t) = t2 , and f (t) = t3 ,
on [0, T ]. Use Theorem 1.15 to verify your calculations from Exercise 1.21.

Exercise 1.34: Complex Fourier coefficients for symmetric and antisymmetric functions
In this exercise we will prove a version of Theorem 1.10 for complex Fourier
coefficients.
a) If f is symmetric about 0, show that yn is real, and that y−n = yn .
b) If f is antisymmetric about 0, show that the yn are purely imaginary, y0 = 0,
and that y−n = −yn .
c) Show that $\sum_{n=-N}^{N} y_n e^{2\pi int/T}$ is symmetric when $y_{-n} = y_n$ for all $n$, and rewrite it as a cosine-series.

d) Show that $\sum_{n=-N}^{N} y_n e^{2\pi int/T}$ is antisymmetric when $y_0 = 0$ and $y_{-n} = -y_n$ for all $n$, and rewrite it as a sine-series.

1.4 Some properties of Fourier series


We continue by establishing some important properties of Fourier series, in
particular the Fourier coefficients for some important functions. In these lists,
we will use the notation f → yn to indicate that yn is the n’th (complex) Fourier
coefficient of f (t).
Theorem 1.16. Fourier series pairs.
The functions $1$, $e^{2\pi int/T}$, and $\chi_{-a,a}$ have the Fourier coefficients
$$\begin{aligned}
1 &\to e_0 = (1, 0, 0, 0, \dots),\\
e^{2\pi int/T} &\to e_n = (0, 0, \dots, 1, 0, 0, \dots),\\
\chi_{-a,a} &\to \frac{\sin(2\pi na/T)}{\pi n}.
\end{aligned}$$
The 1 in $e_n$ is at position $n$ and the function $\chi_{-a,a}$ is the characteristic function of the interval $[-a, a]$, defined by
$$\chi_{-a,a}(t) = \begin{cases} 1, & \text{if } t \in [-a, a];\\ 0, & \text{otherwise}.\end{cases}$$
The first two pairs are easily verified, so the proofs are omitted. The case for
χ−a,a is very similar to the square wave, but easier to prove, and therefore also
omitted.
Theorem 1.17. Fourier series properties.
The mapping $f \to y_n$ is linear: if $f \to x_n$, $g \to y_n$, then
$$af + bg \to ax_n + by_n$$
for all $n$. Moreover, if $f$ is real and periodic with period $T$, the following properties hold:

1. $y_n = \overline{y_{-n}}$ for all $n$.


2. If f (t) = f (−t) (i.e. f is symmetric), then all yn are real, so that bn are
zero and the Fourier series is a cosine series.
3. If f (t) = −f (−t) (i.e. f is antisymmetric), then all yn are purely imaginary,
so that the an are zero and the Fourier series is a sine series.
4. If g(t) = f (t − d) (i.e. g is the function f delayed by d) and f → yn , then
g → e−2πind/T yn .
5. If g(t) = e2πidt/T f (t) with d an integer, and f → yn , then g → yn−d .
6. Let d be a number. If f → yn , then f (d + t) = f (d − t) for all t if and only
if the argument of yn is −2πnd/T for all n.

Proof. The proof of linearity is left to the reader. Property 1 follows immediately
by writing

$$y_n = \frac{1}{T}\int_0^T f(t)e^{-2\pi int/T}\, dt = \overline{\frac{1}{T}\int_0^T f(t)e^{2\pi int/T}\, dt} = \overline{\frac{1}{T}\int_0^T f(t)e^{-2\pi i(-n)t/T}\, dt} = \overline{y_{-n}}.$$

Also, if $g(t) = f(-t)$, we have that
$$\frac{1}{T}\int_0^T g(t)e^{-2\pi int/T}\, dt = \frac{1}{T}\int_0^T f(-t)e^{-2\pi int/T}\, dt = -\frac{1}{T}\int_0^{-T} f(t)e^{2\pi int/T}\, dt = \frac{1}{T}\int_0^T f(t)e^{2\pi int/T}\, dt = \overline{y_n}.$$

The first part of property 2 follows from this. The second part follows directly
by noting that

$$y_n e^{2\pi int/T} + y_{-n}e^{-2\pi int/T} = y_n\left(e^{2\pi int/T} + e^{-2\pi int/T}\right) = 2y_n\cos(2\pi nt/T),$$

or by invoking Theorem 1.10. Property 3 is proved in a similar way. To prove


property 4, we observe that the Fourier coefficients of g(t) = f (t − d) are

$$\frac{1}{T}\int_0^T g(t)e^{-2\pi int/T}\, dt = \frac{1}{T}\int_0^T f(t-d)e^{-2\pi int/T}\, dt = \frac{1}{T}\int_0^T f(t)e^{-2\pi in(t+d)/T}\, dt = e^{-2\pi ind/T}\frac{1}{T}\int_0^T f(t)e^{-2\pi int/T}\, dt = e^{-2\pi ind/T}y_n.$$

For property 5 we observe that the Fourier coefficients of g(t) = e2πidt/T f (t) are

$$\frac{1}{T}\int_0^T g(t)e^{-2\pi int/T}\, dt = \frac{1}{T}\int_0^T e^{2\pi idt/T}f(t)e^{-2\pi int/T}\, dt = \frac{1}{T}\int_0^T f(t)e^{-2\pi i(n-d)t/T}\, dt = y_{n-d}.$$

If f (d + t) = f (d − t) for all t, we define the function g(t) = f (t + d) which is


symmetric about 0, so that it has real Fourier coefficients. But then the Fourier
coefficients of f (t) = g(t − d) are e−2πind/T times the (real) Fourier coefficients
of g by property 4. It follows that yn , the Fourier coefficients of f , has argument
−2πnd/T . The proof in the other direction follows by noting that any function
where the Fourier coefficients are real must be symmetric about 0, once the
Fourier series is known to converge. This proves property 6.
Let us analyze these properties, to see that they match the notion we already
have for frequencies and sound. We will say that two sounds “essentially are
the same” if the absolute values of each Fourier coefficient are equal. Note that
this does not mean that the sounds sound the same, it merely says that the
contributions at different frequencies are comparable.

The first property says that the positive and negative frequencies in a (real)
sound essentially are the same. The second says that, when we play a sound
backwards, the frequency content is essentially the same. This is certainly the
case for all pure sounds. The third property says that, if we delay a sound, the
frequency content also is essentially the same. This also matches our intuition
on sound, since we think of the frequency representation as something which
is time-independent. The fourth property says that, if we multiply a sound
with a pure tone, the frequency representation is shifted (delayed), according
to the value of the frequency. This is something we see in early models for the
transmission of audio, where an audio signal is transmitted after having been
multiplied with what is called a ‘carrier wave‘. You can think of the carrier signal
as a pure tone. The result is a signal where the frequencies have been shifted
with the frequency of the carrier wave. The point of shifting the frequency of
the transmitted signal is to make it use a frequency range in which one knows
that other signals do not interfere. The last property looks a bit mysterious. We
will not have use for this property before the next chapter.
From Theorem 1.17 we also see that there exist several cases of duality
between a function and its Fourier series:

• Delaying a function corresponds to multiplying the Fourier coefficients


with a complex exponential. Vice versa, multiplying a function with a
complex exponential corresponds to delaying the Fourier coefficients.
• Symmetry/antisymmetry for a function corresponds to the Fourier coef-
ficients being real/purely imaginary. Vice versa, a function which is real
has Fourier coefficients which are conjugate symmetric.

Actually, one can show that these dualities are even stronger if we had considered
Fourier series of complex functions instead of real functions. We will not go into
this.
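As an illustration, property 4 is simple to test numerically: a delay leaves $|y_n|$ unchanged and changes the argument by $-2\pi nd/T$ (modulo $2\pi$). The following sketch is not from the book; the test signal and the delay are arbitrary choices, and the coefficients are again approximated with Riemann sums for (1.22).

from numpy import linspace, exp, cos, pi, angle

T = 1.0
M = 10000
t = linspace(0, T, M, endpoint=False)
dt = T/M
d = 0.05                          # the delay

f = cos(2*pi*3*t/T)**2            # some T-periodic test signal
g = cos(2*pi*3*(t - d)/T)**2      # the same signal delayed by d

def coeff(x, n):
    # Riemann-sum approximation of equation (1.22)
    return sum(x*exp(-2j*pi*n*t/T))*dt/T

n = 6
yf, yg = coeff(f, n), coeff(g, n)
print(abs(yf), abs(yg))                     # the magnitudes agree
print(angle(yg) - angle(yf), -2*pi*n*d/T)   # the phases differ by -2*pi*n*d/T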

1.4.1 Rate of convergence for Fourier series


We have earlier mentioned criteria which guarantee that the Fourier series
converges. Another important topic is the rate of convergence, given that it
actually converges. If the series converges quickly, we may only need a few terms
in the Fourier series to obtain a reasonable approximation. We have already seen
examples which illustrate different convergence rates: The square wave seemed
to have very slow convergence rate near the discontinuities, while the triangle
wave did not seem to have the same problem.
Before discussing results concerning convergence rates we consider a simple
lemma which will turn out to be useful.

Lemma 1.18. The order of computing Fourier series and differentiation does
not matter.
Assume that $f$ is differentiable. Then $(f_N)'(t) = (f')_N(t)$. In other words, the derivative of the Fourier series equals the Fourier series of the derivative.

Proof. We first compute
$$\begin{aligned}
\langle f, e^{2\pi int/T}\rangle &= \frac{1}{T}\int_0^T f(t)e^{-2\pi int/T}\, dt\\
&= \frac{1}{T}\left(\left[-\frac{T}{2\pi in}f(t)e^{-2\pi int/T}\right]_0^T + \frac{T}{2\pi in}\int_0^T f'(t)e^{-2\pi int/T}\, dt\right)\\
&= \frac{T}{2\pi in}\frac{1}{T}\int_0^T f'(t)e^{-2\pi int/T}\, dt = \frac{T}{2\pi in}\langle f', e^{2\pi int/T}\rangle,
\end{aligned}$$
where we used integration by parts, and that $-\frac{T}{2\pi in}f(t)e^{-2\pi int/T}$ is periodic with period $T$. It follows that $\langle f, e^{2\pi int/T}\rangle = \frac{T}{2\pi in}\langle f', e^{2\pi int/T}\rangle$. From this we get that
$$(f_N)'(t) = \left(\sum_{n=-N}^{N}\langle f, e^{2\pi int/T}\rangle e^{2\pi int/T}\right)' = \sum_{n=-N}^{N}\frac{2\pi in}{T}\langle f, e^{2\pi int/T}\rangle e^{2\pi int/T} = \sum_{n=-N}^{N}\langle f', e^{2\pi int/T}\rangle e^{2\pi int/T} = (f')_N(t),$$

where we substituted the connection between the inner products we just found.

1.4.2 Differentiating Fourier series


The connection between the Fourier series of the function and its derivative
can be used to simplify the computation of Fourier series for new functions.
Let us see how we can use this to compute the Fourier series of the triangle
wave, which was quite a tedious job in Example 1.13. However, the relationship
$f_t'(t) = \frac{4}{T}f_s(t)$ is straightforward to see from the plots of the square wave $f_s$ and the triangle wave $f_t$. From this relationship and from Equation (1.11) for the Fourier series of the square wave it follows that
$$((f_t)')_N(t) = \frac{4}{T}\left(\frac{4}{\pi}\sin(2\pi t/T) + \frac{4}{3\pi}\sin(2\pi 3t/T) + \frac{4}{5\pi}\sin(2\pi 5t/T) + \cdots\right).$$
If we integrate this we obtain
$$(f_t)_N(t) = -\frac{8}{\pi^2}\left(\cos(2\pi t/T) + \frac{1}{3^2}\cos(2\pi 3t/T) + \frac{1}{5^2}\cos(2\pi 5t/T) + \cdots\right) + C.$$

What remains is to find the integration constant $C$. This is most easily found by setting $t = T/4$, since then all cosine terms are 0. Clearly $C = 0$, and we

arrive at the same expression as in Equation (1.12) for the Fourier series of the
triangle wave. This approach clearly had less computations involved. There
is a minor point here which we have not addressed: the triangle wave is not
differentiable at two points, as required by Lemma 1.18. It is, however, not too
difficult to see that this result still holds in cases where we have a finite number
of nondifferentiable points only.
We get the following corollary to Lemma 1.18:
Corollary 1.19. Connection between the Fourier coefficients of $f(t)$ and $f'(t)$.
If the complex Fourier coefficients of $f$ are $y_n$ and $f$ is differentiable, then the Fourier coefficients of $f'(t)$ are $\frac{2\pi in}{T}y_n$.

If we turn this around, we note that the Fourier coefficients of $f(t)$ are $T/(2\pi in)$ times those of $f'(t)$. If $f$ is $s$ times differentiable, we can repeat this argument to show that the Fourier coefficients of $f(t)$ are $(T/(2\pi in))^s$ times those of $f^{(s)}(t)$. In other words, the Fourier coefficients of a function which is many times differentiable decay to zero very fast.
Observation 1.20. Convergence speed of differentiable functions.
The Fourier series converges quickly when the function is many times differ-
entiable.
An illustration is found in examples 1.12 and 1.13, where we saw that the
Fourier series coefficients for the triangle wave converged more quickly to zero
than those of the square wave. This is explained by the fact that the square
wave is discontinuous, while the triangle wave is continuous with a discontinuous
first derivative. Also, the functions considered in examples 1.24 and 1.25 are not
continuous, which partially explain why we there saw contributions from many
frequencies.
The requirement of continuity in order to obtain quickly converging Fourier
series may seem like a small problem. However, often the function is not defined
on the whole real line: it is often only defined on the interval [0, T ). If we
extend this to a periodic function on the whole real line, by repeating one
period as shown in the left plot in Figure 1.10, there is no reason why the
new function should be continuous at the boundaries 0, T, 2T etc., even though
the function we started with may be continuous on [0, T ). This would require
that $f(0) = \lim_{t\to T} f(t)$. If this does not hold, the function may not be well approximated with trigonometric functions, due to a slowly converging Fourier series.
We can therefore ask ourselves the following question:
Idea 1.21. Continuous Extension.
Assume that f is continuous on [0, T ). Can we construct another periodic
function which agrees with f on [0, T ], and which is both continuous and periodic
(maybe with period different from T )?
If this is possible the Fourier series of the new function could produce better
approximations for f . It turns out that the following extension strategy does
the job:

Figure 1.10: Two different extensions of f to a periodic function on the whole real line. Periodic extension (left) and symmetric extension (right).

Definition 1.22. Symmetric extension of a function.


Let f be a function defined on [0, T ]. By the symmetric extension of f ,
denoted f˘, we mean the function defined on [0, 2T ] by
$$\breve{f}(t) = \begin{cases} f(t), & \text{if } 0 \le t \le T;\\ f(2T - t), & \text{if } T < t \le 2T.\end{cases}$$

Clearly the following holds:


Theorem 1.23. Continuous Extension.
If f is continuous on [0, T ], then f˘ is continuous on [0, 2T ], and f˘(0) = f˘(2T ).
If we extend f˘ to a periodic function on the whole real line (which we also
will denote by f˘), this function is continuous, agrees with f on [0, T ), and is a
symmetric function.
This also means that the Fourier series of f˘ is a cosine series, so that it is
determined by the cosine-coefficients an . The symmetric extension of f is shown
in the right plot in Figure 1.10. f˘ is symmetric since, for 0 ≤ t ≤ T ,

f˘(−t) = f˘(2T − t) = f (2T − (2T − t)) = f (t) = f˘(t).


In summary, we now have two possibilities for approximating a function f defined
only on [0, T ), where the latter addresses a shortcoming of the first:

• By the Fourier series of f


• By the Fourier series of f˘ restricted to [0, T ) (which actually is a cosine-
series)
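On samples, the symmetric extension is simply the original samples followed by the same samples in reverse order. A minimal sketch (not from the book; the sampled function is an arbitrary example, and the mirroring is exact only up to a one-sample offset):

from numpy import linspace, hstack
import matplotlib.pyplot as plt

T = 1.0
t = linspace(0, T, 100, endpoint=False)
x = 2*t/T - 1                      # samples of a function on [0, T)

x_ext = hstack([x, x[::-1]])       # samples of the symmetric extension on [0, 2T)
t_ext = linspace(0, 2*T, 200, endpoint=False)

plt.plot(t_ext, x_ext, 'k-')
plt.show()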

Example 1.35: Periodic extension


Let f be the function with period T defined by f (t) = 2t/T − 1 for 0 ≤ t < T .
In each period the function increases linearly from −1 to 1. Because f is
discontinuous at the boundaries, we would expect the Fourier series to converge
slowly. The Fourier series is a sine-series since f is antisymmetric, and we can
compute bn as

$$\begin{aligned}
b_n &= \frac{2}{T}\int_0^T\frac{2}{T}\left(t - \frac{T}{2}\right)\sin(2\pi nt/T)\, dt = \frac{4}{T^2}\int_0^T\left(t - \frac{T}{2}\right)\sin(2\pi nt/T)\, dt\\
&= \frac{4}{T^2}\int_0^T t\sin(2\pi nt/T)\, dt - \frac{2}{T}\int_0^T\sin(2\pi nt/T)\, dt = -\frac{2}{\pi n},
\end{aligned}$$
so that
$$f_N(t) = -\sum_{n=1}^{N}\frac{2}{\pi n}\sin(2\pi nt/T),$$

which indeed converges slowly to 0. Let us now instead consider the symmetric
extension of f . Clearly this is the triangle wave with period 2T , and the Fourier
series of this was
$$(\breve{f})_N(t) = \sum_{n\le N,\ n\ \text{odd}} -\frac{8}{n^2\pi^2}\cos(2\pi nt/(2T)).$$

The second series clearly converges faster than the first, since its Fourier coefficients are $a_n = -8/(n^2\pi^2)$ (with $n$ odd), while the Fourier coefficients in the first series are $b_n = -2/(n\pi)$.
If we use T = 1/440, the symmetric extension has period 1/220, which gives
a triangle wave where the first term in the Fourier series has frequency 220Hz.
Listening to this we should hear something resembling a 220Hz pure tone, since
the first term in the Fourier series is the most dominating in the triangle wave.
Listening to the periodic extension we should hear a different sound. The first
term in the Fourier series has frequency 440Hz, but this drowns somewhat in the
contribution of the other terms in the Fourier series, due to the slow convergence
of the Fourier series, just as for the square wave.
Let us plot the Fourier series with N = 7 terms for f . These are shown in
Figure 1.11.
It is clear from the plot that the Fourier series for f itself is not a very good
approximation, while we cannot differentiate between the Fourier series and the
function itself for the symmetric extension.

Exercise 1.36: Fourier series of a delayed square wave


Define the function f with period T on [−T /2, T /2) by

Figure 1.11: The Fourier series with N = 7 terms of the periodic (left) and symmetric (right) extensions of the function in Example 1.35.

$$f(t) = \begin{cases} 1, & \text{if } -T/4 \le t < T/4;\\ -1, & \text{if } T/4 \le |t| < T/2.\end{cases}$$
f is just the square wave, delayed by $d = -T/4$. Compute the Fourier coefficients of f directly, and use Property 4 in Theorem 1.17 to verify your result.

Exercise 1.37: Find function from its Fourier series


Find a function f which has the complex Fourier series
$$\sum_{n\ \text{odd}} \frac{4}{\pi(n+4)} e^{2\pi int/T}.$$

Hint. Attempt to use one of the properties in Theorem 1.17 on the Fourier
series of the square wave.

Exercise 1.38: Relation between complex Fourier coefficients of f and cosine-coefficients of f̆

Show that the complex Fourier coefficients $y_n$ of $f$, and the cosine-coefficients $a_n$ of $\breve{f}$ are related by $a_{2n} = y_n + y_{-n}$. This result is not enough to obtain the entire Fourier series of $\breve{f}$, but at least it gives us half of it.

1.5 Operations on sound: filters


It is easy to see how we can use Fourier coefficients to analyse or improve sound:
Noise in a sound often corresponds to the presence of some high frequencies with
large coefficients, and by removing these, we remove the noise. For example, we
could set all the coefficients except the first one to zero. This would change, as

we have seen, the unpleasant square wave to the pure tone sin(2π440t). Doing
so is an example of an important operation on sound called a filter:
Definition 1.24. Analog filters.
An operation on sound is called a filter if it preserves the different frequencies in the sound. In other words, $s$ is a filter if, for any sound of the form $f = \sum_\nu c(\nu)e^{2\pi i\nu t}$, the output $s(f)$ is a sound which can be written on the form
$$s(f) = s\left(\sum_\nu c(\nu)e^{2\pi i\nu t}\right) = \sum_\nu c(\nu)\lambda_s(\nu)e^{2\pi i\nu t}.$$
$\lambda_s(\nu)$ is a function describing how $s$ treats the different frequencies, and is also called the frequency response of $s$.
By definition any pure tone is an eigenvector of $s$, with the frequency response providing the eigenvalue. The notion of a filter makes sense for both periodic and non-periodic input functions. The problem is, however, that a function may be an infinite sum of frequencies, for which a sum of the form $\sum_\nu c(\nu)e^{2\pi i\nu t}$ may not converge.
This general definition of filters may not be useful in practice, but if we restrict to Fourier spaces we restrict ourselves to finite sums. We then clearly have that $s(f) \in V_{N,T}$ whenever $f \in V_{N,T}$, so that the computation can be performed in finite dimensions. Let us now see how we can construct useful such filters.
Theorem 1.25. Convolution kernels.
Assume that $g$ is a bounded Riemann-integrable function with compact support (i.e. that there exists an interval $[a, b]$ so that $g = 0$ outside $[a, b]$). The operation
$$f(t) \to h(t) = \int_{-\infty}^{\infty} g(u)f(t-u)\, du \qquad (1.23)$$
is a filter. Also, the frequency response of the filter is $\lambda_s(\nu) = \int_{-\infty}^{\infty} g(s)e^{-2\pi i\nu s}\, ds$. The function $g$ is also called the kernel of $s$.
Note that the requirement that $g$ is bounded with compact support is just made for convenience, to ensure that the integral exists. Many weaker conditions can be put on $g$ to ensure that the integral exists. In case of compact support there exist constants $a$ and $b$ so that the filter takes the form $f(t) \to h(t) = \int_a^b g(s)f(t-s)\, ds$.
Proof. We compute
$$s(e^{2\pi i\nu t}) = \int_{-\infty}^{\infty} g(s)e^{2\pi i\nu(t-s)}\, ds = \left(\int_{-\infty}^{\infty} g(s)e^{-2\pi i\nu s}\, ds\right)e^{2\pi i\nu t} = \lambda_s(\nu)e^{2\pi i\nu t},$$
which shows that $s$ is a filter with the stated frequency response.



The function g is arbitrary, so that this strategy leads to a wide class of


analog filters. We may ask the question of whether the general analog filter
always has this form. We will not go further into this, although one can find
partially affirmative answers to this question.
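To get a concrete feel for Theorem 1.25, the integral (1.23) can be approximated on samples of a sound. The sketch below is not from the book: it uses a moving-average kernel g which is constant on a short interval and zero outside it, so the Riemann sum of (1.23) becomes a discrete convolution, and the result is a smoothed (low-pass filtered) version of the input.

from numpy import linspace, sin, pi, ones, convolve
from numpy.random import random

fs = 44100
t = linspace(0, 0.01, int(0.01*fs), endpoint=False)
x = sin(2*pi*440*t) + 0.3*(2*random(len(t)) - 1)   # a noisy pure tone

L = 21
g = ones(L)/L        # discrete version of g(u) du for a moving-average kernel

h = convolve(x, g, mode='same')   # Riemann-sum approximation of (1.23)

The discrete point of view taken here, where a sound is filtered by convolving its samples with a short sequence of numbers, is developed properly in Chapter 3.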

Exercise 1.39: Filters preserve sine- and cosine-series


An analog filter where λs (ν) = λs (−ν) is also called a symmetric filter.
a) Prove that, if the input to a symmetric filter is a Fourier series which is a
cosine series/sine-series, then the output also is a cosine/sine series.
b) Show that $s(f) = \int_{-a}^{a} g(s)f(t-s)\, ds$ is a symmetric filter whenever $g$ is symmetric around 0 and supported on $[-a, a]$.
We saw that the symmetric extension of a function took the form of a cosine-
series, and that this converged faster to the symmetric extension than the Fourier
series did to the function. If a filter preserves cosine-series it will also preserve
symmetric extensions, and therefore also map fast-converging Fourier series to
fast-converging Fourier series.

1.6 Convergence of Fourier series*


A major topic in harmonic analysis is finding conditions on f which secure
convergence of its Fourier series. This turns out to be very difficult in general,
and depends highly on the mode of convergence. We will cover some important
result on this here in this section, which is a bit more technical than the remainder
of the book. We will consider both
• pointwise convergence, i.e. that $f_N(t) \to f(t)$, and

• convergence in $\|\cdot\|$, i.e. that $\|f_N - f\| \to 0$.
The latter unfortunately does not imply the first, which is harder to prove.
Although a general theorem about the pointwise convergence of the Fourier
series for square-integrable functions exists, this result is way too hard to prove
here. Instead we will restrict ourselves to the following class of functions, for
which it is possible to state a proof for both modes of convergence. This class
also contains most functions we encounter in the book, such as the square wave
and the triangle wave:
Definition 1.26. Piecewise continuous functions.
A T -periodic function is said to be piecewise continuous if there exists a finite
set of points

$$0 \le a_0 < a_1 < \cdots < a_{n-1} < a_n < T$$


so that
1. f is continuous on each interval between adjacent such points,
CHAPTER 1. SOUND AND FOURIER SERIES 39

2. the one-sided limits f (a+


i ) := limt→a+ f (t) and f (a−
i ) := limt→a− f (t)
i i
exist, and
f (a+ )+f (a− )
3. f (ai ) = i
2
i
(i.e. the value at a "jump" is the average of the
one-sided limits).

For piecewise continuous functions, convergence in $\|\cdot\|$ for the Fourier series
will follow from the following theorem.
Theorem 1.27. Approximating piecewise continuous functions.
Let f be piecewise continuous. We can find a sequence SN ∈ VN,T , N ≥ 1 so
that SN (t) → f (t) for all t. Also, SN → f uniformly as N → ∞ on any interval
[a, b] where f is continuous.
The functions $S_N$ are found in a constructive way in the proof of this theorem,
but note that these are not the same as the Fourier series approximations $f_N$! Therefore, the
theorem says nothing about the convergence of the Fourier series itself.
Proof. In the proof we will use the concept of a summability kernel, which is a
sequence of functions $k_N$ defined on [0, T] so that

• $\frac{1}{T}\int_0^T k_N(t)\,dt = 1$,

• there exists a constant C so that $\frac{1}{T}\int_0^T |k_N(t)|\,dt \le C$ for all N,

• for all $0 < \delta < T/2$, $\lim_{N\to\infty} \frac{1}{T}\int_\delta^{T-\delta} |k_N(t)|\,dt = 0$.

Note that if $k_N$ is a trigonometric polynomial, then $\frac{1}{T}\int_0^T k_N(\tau)f(t-\tau)\,d\tau$ is a
trigonometric polynomial of the same degree (make the substitution $u = t - \tau$
to verify this). In Exercise 1.42 you are aided through the construction of one
important such summability kernel, denoted $F_N$ and called the Fejer kernel.
The Fejer kernel has the following additional properties:

• $F_N$ is a trigonometric polynomial of degree N,

• $0 \le F_N(t) \le \frac{T^2}{4(N+1)t^2}$.

We now set $S_N(t) = \frac{1}{T}\int_0^T F_N(u)f(t-u)\,du$. Through the change of variables
$u \to -u$ it easily follows that $S_N(t) = \frac{1}{T}\int_0^T F_N(u)f(t+u)\,du$ as well. We can
thus write

$$S_N(t) = \frac{1}{2T}\int_0^T (f(t+u) + f(t-u))F_N(u)\,du.$$

Since also $f(t) = \frac{1}{T}\int_0^T f(t)F_N(u)\,du$, it follows that

$$S_N(t) - f(t) = \frac{1}{2T}\int_0^T (f(t+u) + f(t-u) - 2f(t))F_N(u)\,du. \qquad (1.24)$$

We have written $S_N(t) - f(t)$ on this form since now the integrand is continuous
at $u = 0$ as a function of u: This is obvious if f is continuous at t. If on the other
hand t is one of the discontinuities, we have that $f(t+u) + f(t-u) \to 2f(t)$ as
$u \to 0$. Given $\epsilon > 0$, we can therefore find a $\delta > 0$ so that

$$|f(t+u) + f(t-u) - 2f(t)| < \epsilon$$

whenever $|u| < \delta$. Now, split the integral (1.24) in three: $\int_0^T = \int_{-T/2}^{-\delta} + \int_{-\delta}^{\delta} + \int_{\delta}^{T/2}$.
For the second of these we have

$$\frac{1}{2T}\int_{-\delta}^{\delta} |(f(t+u) + f(t-u) - 2f(t))F_N(u)|\,du
\le \frac{\epsilon}{2T}\int_{-\delta}^{\delta} F_N(u)\,du \le \frac{\epsilon}{2T}\int_{-T/2}^{T/2} F_N(u)\,du = \frac{\epsilon}{2}.$$

For the third of these we have

$$\frac{1}{2T}\int_{\delta}^{T/2} |(f(t+u) + f(t-u) - 2f(t))F_N(u)|\,du
\le \frac{1}{2T}\int_{\delta}^{T/2} 4\|f\|_\infty \frac{T^2}{4(N+1)u^2}\,du \le \frac{\|f\|_\infty T^2}{4(N+1)\delta^2},$$

where $\|f\|_\infty = \max_{x\in[0,T]} |f(x)|$. A similar calculation can be done for the first
integral. Clearly then we can choose N so big that the sum of the first and third
integrals is less than $\epsilon/2$, and we then get that $|S_N(t) - f(t)| < \epsilon$. This shows
that $S_N(t) \to f(t)$ as $N \to \infty$ for any t. For the final statement, if [a, b] is an
interval where f is continuous, choose the $\delta$ above so small that $[a-\delta, b+\delta]$ still
contains no discontinuities. Since continuous functions are uniformly continuous
on compact intervals, it is not too hard to see that the convergence of $S_N$ to f
on [a, b] is uniform. This completes the proof.
Since $S_N(t) = \frac{1}{T}\int_0^T f(t-u)F_N(u)\,du \in V_{N,T}$, and $f_N$ is a best approximation
from $V_{N,T}$, we have that $\|f_N - f\| \le \|S_N - f\|$. If f is continuous, the result says
that $\|f - S_N\|_\infty \to 0$, which implies that $\|f - S_N\| \to 0$, so that $\|f - f_N\| \to 0$.
Therefore, for f continuous, both $\|f - S_N\|_\infty \to 0$ and $\|f - f_N\| \to 0$ hold, so
that we have established both modes of convergence. If f has a discontinuity, it is
obvious that $\|f - S_N\|_\infty \to 0$ can not hold, since $S_N$ is continuous. $\|f - f_N\| \to 0$
holds, however, even with discontinuities. The reason is that any function with
only a finite number of discontinuities can be approximated arbitrarily well with
continuous functions w.r.t. $\|\cdot\|$. The proof of this is left as an exercise.
Both the square wave and the triangle wave are piecewise continuous (at least if
we redefine the value of the square wave at the discontinuity). Therefore both
their Fourier series converge to f in $\|\cdot\|$. Since the triangle wave is continuous,
$S_N$ also converges uniformly to it.

The result above states that SN converges pointwise to f - it does not say
that fN converges pointwise to f . This suggests that SN may be better suited
to approximate f . In Figure 1.12 we have plotted SN (t) and fN (t) for the
square wave. Clearly the approximations are very different. The pointwise
convergence of fN (t) is more difficult to analyze, so we will make some additional
assumptions.


Figure 1.12: fN (t) and SN (t) for N = 20.

Theorem 1.28. Pointwise convergence of Fourier series.

Assume that f is piecewise continuous and that the one-sided limits

$$D_+f(t) = \lim_{h\to 0^+} \frac{f(t+h) - f(t^+)}{h} \qquad D_-f(t) = \lim_{h\to 0^-} \frac{f(t+h) - f(t^-)}{h}$$

exist. Then $\lim_{N\to\infty} f_N(t) = f(t)$.

Proof. In Exercise 1.41 we construct another kernel $D_N(t)$, called the Dirichlet
kernel. This satisfies only two of the properties of a summability kernel, but this
will turn out to be enough for our purposes due to the additional assumption on
the one-sided limits for the derivative. A formula similar to (1.24) can easily be
proved using the same substitution $u \to -u$:

$$f_N(t) - f(t) = \frac{1}{2T}\int_0^T (f(t+u) + f(t-u) - 2f(t))D_N(u)\,du. \qquad (1.25)$$

Substituting the expression for the Dirichlet kernel obtained in Exercise 1.41, the
integrand can be written as

$$(f(t+u) - f(t^+) + f(t-u) - f(t^-))D_N(u)
= \left(\frac{f(t+u) - f(t^+)}{\sin(\pi u/T)} + \frac{f(t-u) - f(t^-)}{\sin(\pi u/T)}\right)\sin(\pi(2N+1)u/T)
= h(u)\sin(\pi(2N+1)u/T).$$

We have that

$$\frac{f(t+u) - f(t^+)}{\sin(\pi u/T)} = \frac{f(t+u) - f(t^+)}{\pi u/T}\cdot\frac{\pi u/T}{\sin(\pi u/T)}
\to \begin{cases} \frac{T}{\pi}D_+f(t) & \text{when } u \to 0^+,\\[4pt] \frac{T}{\pi}D_-f(t) & \text{when } u \to 0^-,\end{cases}$$

and similarly for $\frac{f(t-u) - f(t^-)}{\sin(\pi u/T)}$. It follows that the function h defined above is a
piecewise continuous function in u. The proof will be done if we can show that
$\int_0^T h(u)\sin(\pi(2N+1)u/T)\,du \to 0$ as $N \to \infty$ for any piecewise continuous h.
Since

$$\sin(\pi(2N+1)u/T) = \sin(2\pi Nu/T)\cos(\pi u/T) + \cos(2\pi Nu/T)\sin(\pi u/T),$$

and since $h(u)\cos(\pi u/T)$ and $h(u)\sin(\pi u/T)$ also are piecewise continuous, it is
enough to show that $\int h(u)\sin(2\pi Nu/T)\,du \to 0$ and $\int h(u)\cos(2\pi Nu/T)\,du \to 0$.
These are simply the order N Fourier coefficients of h. Since h is in particular
square integrable, it follows from Bessel's inequality (Theorem 1.14) that the
Fourier coefficients of h go to zero, and the proof is done.
The requirement on the one-sided limits of the derivative above can be
replaced by less strict conditions. This gives rise to what is known as Dini's test.
One can also replace it with the less strict requirement that f has a finite number
of local minima and maxima. This is referred to as Dirichlet's theorem, after
Dirichlet, who proved it in 1829. There also exist much more general conditions
that secure pointwise convergence of the Fourier series. The most general results
require deep mathematical theory to prove.
Both the square wave and the triangle wave have one-sided limits for the
derivative. Therefore both their Fourier series converge to f pointwise.

1.6.1 Interpretation in terms of filters


It is instructive to interpret Theorems 1.27 and 1.28 in terms of filters. There
are filters at play here, and their kernels are the Fejer kernel and the Dirichlet
kernel. The kernels are shown in Figure 1.13.

For the Fejer kernel, we saw that $S_N(t) = \frac{1}{T}\int_0^T F_N(u)f(t-u)\,du$. So, if
$s_N$ is the filter with kernel $F_N$, then $S_N = s_N(f)$. It is shown in Exercise 1.42
that $S_N(t) = \frac{1}{N+1}\sum_{n=0}^{N} f_n(t)$, also called the Cesaro mean of the Fourier series.
Since the N'th order Fourier series of $e^{2\pi int/T}$ is 0 if $|n| > N$, and $e^{2\pi int/T}$
if not, it follows that $s_N(e^{2\pi int/T}) = \left(1 - \frac{|n|}{N+1}\right)e^{2\pi int/T}$. In other words, the
frequency response for filtering with the Fejer kernel $F_N$ is given by the mapping
$n/T \to 1 - \frac{|n|}{N+1}$. On $[-N/T, N/T]$, this first increases linearly to 1, then
decreases linearly back towards 0. Outside $[-N/T, N/T]$ we get zero.


Figure 1.13: The Fejer and Dirichlet kernels for N = 20.

For the Dirichlet kernel we saw that $f_N(t) = \frac{1}{T}\int_0^T D_N(u)f(t-u)\,du$. From
this it follows in the same way that the frequency response corresponding to
filtering with the Dirichlet kernel is given by the mapping $n/T \to 1$, i.e. it is one
on $[-N/T, N/T]$ and 0 elsewhere.

Figure 1.14: The frequency responses for the filters with Fejer and Dirichlet
kernels, N = 20.

The two frequency responses are shown in Figure 1.14. Both filters above are
what is called lowpass filters: They annihilate high frequencies. More precisely,
if $\nu > |N/T|$, then the frequency response at $\nu$ is zero. The lowest frequency
$\nu = 0$ is treated in the same way by the two filters, but the higher frequencies are
treated differently: The Dirichlet kernel keeps them, while the Fejer kernel attenuates them,
i.e. does not include all the frequency content at the higher frequencies. That
filtering with the Fejer kernel gave something ($S_N(t)$) with better convergence
properties can be interpreted as follows: We should be careful when we include
the contribution from the higher frequencies, as this may affect the convergence.
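The different treatment of the higher frequencies is easy to see numerically. The following minimal sketch (assumptions: the square wave is taken to be 1 on [0, T/2) and −1 on [T/2, T), and its complex Fourier coefficients are computed by a simple Riemann sum rather than exactly) builds both the partial sum $f_N$ and the Cesaro mean $S_N$ from the same coefficients, with the Fejer weights $1 - |n|/(N+1)$ applied in the latter.

import numpy as np

T, N = 1.0, 20
t = np.linspace(0, T, 1000, endpoint=False)
f = np.where(t < T/2, 1.0, -1.0)              # the square wave on one period

def fourier_coeff(n):
    # z_n = (1/T) int_0^T f(t) e^{-2 pi i n t/T} dt, approximated by a Riemann sum
    return np.mean(f * np.exp(-2j*np.pi*n*t/T))

ns = np.arange(-N, N + 1)
z = np.array([fourier_coeff(n) for n in ns])
basis = np.exp(2j*np.pi*np.outer(ns, t)/T)    # rows are e^{2 pi i n t/T}

fN = np.real(z @ basis)                                   # order N partial sum
SN = np.real(((1 - np.abs(ns)/(N + 1)) * z) @ basis)      # Cesaro mean with Fejer weights
# fN overshoots near the jumps (the Gibbs phenomenon), while SN stays within [-1, 1].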

Exercise 1.40: Approximation in norm with continuous functions

Show that if f is a function with only a finite number of discontinuities, then for
each $\epsilon > 0$ there exists a continuous function g so that $\|f - g\| < \epsilon$.

Exercise 1.41: The Dirichlet kernel

The Dirichlet kernel is defined as

$$D_N(t) = \sum_{n=-N}^{N} e^{2\pi int/T} = 1 + 2\sum_{n=1}^{N}\cos(2\pi nt/T).$$

$D_N$ is clearly trigonometric, and of degree N.

a) Show that $D_N(t) = \frac{\sin(\pi(2N+1)t/T)}{\sin(\pi t/T)}$.
b) Show that $f_N(t) = \frac{1}{T}\int_0^T f(t-u)D_N(u)\,du$. Proving that $\lim_{N\to\infty} f_N(t) = f(t)$
is thus equivalent to proving $\lim_{N\to\infty} \frac{1}{T}\int_0^T f(t-u)D_N(u)\,du = f(t)$.
c) Prove that $D_N(t)$ satisfies only two of the properties of a summability kernel.
d) Write a function which takes N and T as arguments, and plots $D_N(t)$ over
$[-T/2, T/2]$.

Exercise 1.42: The Fejer summability kernel

The Fejer kernel is defined as

$$F_N(t) = \sum_{n=-N}^{N}\left(1 - \frac{|n|}{N+1}\right)e^{2\pi int/T}.$$

$F_N$ is clearly trigonometric, and of degree N.

a) Show that $F_N(t) = \frac{1}{N+1}\left(\frac{\sin(\pi(N+1)t/T)}{\sin(\pi t/T)}\right)^2$, and conclude from this that
$0 \le F_N(t) \le \frac{T^2}{4(N+1)t^2}$.

Hint. Use that $\frac{2}{\pi}|u| \le |\sin u|$ when $u \in [-\pi/2, \pi/2]$.
b) Show that $F_N(t)$ satisfies the three properties of a summability kernel.
c) Show that $\frac{1}{T}\int_0^T f(t-u)F_N(u)\,du = \frac{1}{N+1}\sum_{n=0}^{N} f_n$.

Hint. Show that $F_N(t) = \frac{1}{N+1}\sum_{n=0}^{N} D_n(t)$, and use Exercise 1.41 b).
d) Write a function which takes N and T as arguments, and plots $F_N(t)$ over
$[-T/2, T/2]$.

1.7 The MP3 standard


Digital audio first became commonly available when the CD was introduced in
the early 1980s. As the storage capacity and processing speeds of computers
increased, it became possible to transfer audio files to computers and both play
and manipulate the data, in ways such as in the previous section. However,
audio was represented by a large amount of data and an obvious challenge was
how to reduce the storage requirements. Lossless coding techniques like Huffman
and Lempel-Ziv coding were known and with these kinds of techniques the file
size could be reduced to about half of that required by the CD format. However,
by allowing the data to be altered a little bit it turned out that it was possible
to reduce the file size down to about ten percent of the CD format, without
much loss in quality. The MP3 audio format takes advantage of this.
MP3, or more precisely MPEG-1 Audio Layer 3, is part of an audio-visual
standard called MPEG. MPEG has evolved over the years, from MPEG-1 to
MPEG-2, and then to MPEG-4. The data on a DVD disc can be stored with
either MPEG-1 or MPEG-2, while the data on a bluray-disc can be stored
with either MPEG-2 or MPEG-4. MP3 was developed by Philips, CCETT
(Centre commun d’etudes de television et telecommunications), IRT (Institut fur
Rundfunktechnik) and Fraunhofer Society, and became an international standard
in 1991. Virtually all audio software and music players support this format.
MP3 is just a sound format. It leaves a substantial amount of freedom in the
encoder, so that different encoders can exploit properties of sound in various
ways, altering the sound by removing inaudible components.
As a consequence there are many different MP3 encoders available, of varying
quality. In particular, an encoder which works well for higher bit rates (high
quality sound) may not work so well for lower bit rates.
With MP3, the sound is split into frequency bands, each band corresponding
to a particular frequency range. In the simplest model, 32 frequency bands are
used. A frequency analysis of the sound, based on what is called a psycho-acoustic
model, is the basis for further transformation of these bands. The psycho-acoustic
model computes the significance of each band for the human perception of the
sound. When we hear a sound, there is a mechanical stimulation of the ear
drum, and the amount of stimulus is directly related to the size of the sample
values of the digital sound. The movement of the ear drum is then converted to
electric impulses that travel to the brain where they are perceived as sound. The
perception process uses a transformation of the sound so that a steady oscillation
in air pressure is perceived as a sound with a fixed frequency. In this process
certain kinds of perturbations of the sound are hardly noticed by the brain, and
this is exploited in lossy audio compression.
More precisely, when the psycho-acoustic model is applied to the frequency
content resulting from our frequency analysis, scale factors and masking thresholds
are assigned for each band. The computed masking thresholds have to do with a
phenomenon called masking. A simple example of this is that a loud sound will
make a simultaneous low sound inaudible. For compression this means that if
certain frequencies of a signal are very prominent, most of the other frequencies

can be removed, even when they are quite large. If a sound is below the
masking threshold, it is simply omitted by the encoder, since the model says
that it should be inaudible.
Masking effects are just one example of what is called psycho-acoustic effects,
and all such effects can be taken into account in a psycho-acoustic model. Another
obvious such effect regards computing the scale factors: the human auditory
system can only perceive frequencies in the range 20 Hz - 20 000 Hz. An obvious
way to do compression is therefore to remove frequencies outside this range,
although there are indications that these frequencies may influence the listening
experience inaudibly. The computed scaling factors tell the encoder about the
precision to be used for each frequency band: If the model decides that one band
is very important for our perception of the sound, it assigns a big scale factor to
it, so that more effort is put into encoding it by the encoder (i.e. it uses more
bits to encode this band).
Using appropriate scale factors and masking thresholds provides compression,
since the bits used to encode the sound are spent on the parts that are important for our
perception. Developing a useful psycho-acoustic model requires detailed knowledge of
human perception of sound. Different MP3 encoders use different such models,
so they may produce very different results, worse or better.
The information remaining after frequency analysis and using a psycho-
acoustic model is coded efficiently with (a variant of) Huffman coding. MP3
supports bit rates from 32 to 320 kb/s and the sampling rates 32, 44.1, and 48
kHz. The format also supports variable bit rates (the bit rate varies in different
parts of the file). An MP3 encoder also stores metadata about the sound, such
as the title of the audio piece, album and artist name and other relevant data.
MP3 too has evolved in the same way as MPEG, from MP1 to MP2, and to
MP3, each one more sophisticated than the previous, providing better compression.
MP3 is not the latest development of audio coding in the MPEG family: AAC
(Advanced Audio Coding) is presented as the successor of MP3 by its principal
developer, Fraunhofer Society, and can achieve better quality than MP3 at the
same bit rate, particularly for bit rates below 192 kb/s. AAC became well
known in April 2003 when Apple introduced this format (at 128 kb/s) as the
standard format for their iTunes Music Store and iPod music players. AAC is
also supported by many other music players, including the most popular mobile
phones.
The technologies behind AAC and MP3 are very similar. AAC supports
more sample rates (from 8 kHz to 96 kHz) and up to 48 channels. AAC uses the
same transformation as MP3, but AAC processes 1 024 samples at a time. AAC
also uses much more sophisticated processing of frequencies above 16 kHz and
has a number of other enhancements over MP3. AAC, as MP3, uses Huffman
coding for efficient coding of the transformed values. Tests seem quite conclusive
that AAC is better than MP3 for low bit rates (typically below 192 kb/s), but
for higher rates it is not so easy to differentiate between the two formats. As
for MP3 (and the other formats mentioned here), the quality of an AAC file
depends crucially on the quality of the encoding program.

There are a number of variants of AAC, in particular AAC Low Delay


(AAC-LD). This format was designed for use in two-way communication over a
network,
for example the internet. For this kind of application, the encoding (and
decoding) must be fast to avoid delays (a delay of at most 20 ms can be tolerated).

1.8 Summary
We defined digital sound, and demonstrated how we could perform simple
operations on digital sound such as adding noise, playing at different rates, etc.
Digital sound could be obtained by sampling continuous sounds.
We discussed the basic question of what sound is, and concluded that
sound could be modeled as a sum of frequency components. If the function
was periodic we could define its Fourier series, which can be thought of as an
approximation scheme for periodic functions using finite-dimensional spaces of
trigonometric functions. We established the basic properties of Fourier series,
and some duality relationships between the function and its Fourier series. We
have also computed the Fourier series of the square wave and the triangle wave,
and we saw that we could speed up the convergence of the Fourier series by
instead considering the symmetric extension of the function.
We also discussed the MP3 standard for compression of sound, and its relation
to a psychoacoustic model which describes how the human auditory system
perceives sound. There exist a wide variety of documents on this standard. In
[33], an overview is given, which, although written in a signal processing friendly
language and representing most relevant theory such as for the psychoacoustic
model, does not dig into all the details.
We also defined analog filters, which are operations on continuous sound,
without any assumption on periodicity. In signal processing
literature one defines the Continuous-time Fourier transform, or CTFT. We will
not use this concept in this book. We have instead disguised this concept as the
frequency response of an analog filter. To be more precise: in the literature, the
CTFT of g is nothing but the frequency response of an analog filter with g as
convolution kernel.

What you should have learned in this chapter.


• Computer operations for reading, writing, and listening to sound.
• Construct sounds such as pure tones and the square wave, from mathe-
matical formulas.

• The inner product which we use for function spaces.


• Definition of the Fourier spaces, and the orthogonality of the Fourier basis.
• Fourier series approximations as best approximations.

• Formulas for the Fourier coefficients.


• Using the computer to plot Fourier series, and comparing a sound with its
Fourier series.
• For symmetric/antisymmetric functions, Fourier series are actually co-
sine/sine series.
• The complex Fourier basis and its orthonormality.
• Simple Fourier series pairs.
• Certain properties of Fourier series, for instance how delay of a function or
multiplication with a complex exponential affect the Fourier coefficients.
• The convergence rate of a Fourier series depends on the regularity of the
function. How this motivates the symmetric extension of a function.
Chapter 2

Digital sound and Discrete


Fourier analysis

In Chapter 1 we saw how a periodic function can be decomposed into a lin-


ear combination of sines and cosines, or equivalently, a linear combination of
complex exponential functions. This kind of decomposition is, however, not
very convenient from a computational point of view. The coefficients are given
by integrals that in most cases cannot be evaluated exactly, so some kind of
numerical integration technique needs to be applied. In this chapter we will
decompose vectors in terms of linear combinations of complex exponentials. As
before it turns out that this is simplest when we assume that the values in
the vector repeat periodically. Then a vector of finite dimension can be used
to represent all sound values, and a transformation to the frequency domain,
where operations which change the sound can easily be made, simply amounts
to multiplying the vector by a matrix. This transformation is called the Discrete
Fourier transform, and we will see how we can implement this efficiently. It
turns out that these algorithms can also be used for computing approximations
to the Fourier series, and for sampling a sound in order to create a vector of
sound data.

2.1 Discrete Fourier analysis and the discrete


Fourier transform
In this section we will parallel the developments we did for Fourier series,
assuming instead that vectors (rather than functions) are involved. As with
Fourier series we will assume that the vector is periodic. This means that we
can represent it with the values from only the first period. In the following we
will only work with these values, but we will remind ourselves from time to time
that the values actually come from a periodic vector. As for functions, we will
denote the periodic vector as the periodic extension of the finite vector. To


illustrate this, we have in Figure 2.1 shown a vector x and its periodic extension
x.
Figure 2.1: A vector and its periodic extension.

At the outset our vectors will have real components, but since we use complex
exponentials we must be able to work with complex vectors also. We therefore
first need to define the standard inner product and norm for complex vectors.
Definition 2.1. Euclidean inner product.
For complex vectors of length N the Euclidean inner product is given by

$$\langle x, y\rangle = \sum_{k=0}^{N-1} x_k\overline{y_k}. \qquad (2.1)$$

The associated norm is

$$\|x\| = \sqrt{\sum_{k=0}^{N-1} |x_k|^2}. \qquad (2.2)$$

In the previous chapter we saw that, using a Fourier series, a function with
period T could be approximated by linear combinations of the functions (the
pure tones) $\{e^{2\pi int/T}\}_{n=0}^{N}$. This can be generalized to vectors (digital sounds),
but then the pure tones must of course also be vectors.

Definition 2.2. Discrete Fourier analysis.
In Discrete Fourier analysis, a vector $x = (x_0, \ldots, x_{N-1})$ is represented as a
linear combination of the N vectors

$$\phi_n = \frac{1}{\sqrt{N}}\left(1, e^{2\pi in/N}, e^{2\pi i2n/N}, \ldots, e^{2\pi ikn/N}, \ldots, e^{2\pi in(N-1)/N}\right).$$

These vectors are called the normalised complex exponentials, or the pure
digital tones of order N. n is also called frequency index. The whole collection
$\mathcal{F}_N = \{\phi_n\}_{n=0}^{N-1}$ is called the N-point Fourier basis.
Note that pure digital tones can be considered as samples of a pure tone,
taken uniformly over one period: If $f(t) = e^{2\pi int/T}/\sqrt{N}$ is the pure tone with
frequency n/T, then the samples $f(kT/N) = e^{2\pi in(kT/N)/T}/\sqrt{N} = e^{2\pi ink/N}/\sqrt{N}$,
for $k = 0, \ldots, N-1$, are precisely the components of $\phi_n$.
When mapping a pure tone to a digital pure tone, the index n corresponds to
frequency $\nu = n/T$, and N is the number of samples taken over one period. Since
$Tf_s = N$, where $f_s$ is the sampling frequency, we have the following connection
between frequency and frequency index:

$$\nu = \frac{nf_s}{N} \quad\text{and}\quad n = \frac{\nu N}{f_s} \qquad (2.3)$$
The following lemma shows that the vectors in the Fourier basis are orthonormal,
so they do indeed form a basis.

Lemma 2.3. Complex exponentials are an orthonormal basis.
The normalized complex exponentials $\{\phi_n\}_{n=0}^{N-1}$ of order N form an orthonormal
basis in $\mathbb{R}^N$.

Proof. Let $n_1$ and $n_2$ be two distinct integers in the range $[0, N-1]$. The inner
product of $\phi_{n_1}$ and $\phi_{n_2}$ is then given by

$$\langle\phi_{n_1}, \phi_{n_2}\rangle = \frac{1}{N}\langle e^{2\pi in_1k/N}, e^{2\pi in_2k/N}\rangle
= \frac{1}{N}\sum_{k=0}^{N-1} e^{2\pi in_1k/N}e^{-2\pi in_2k/N}
= \frac{1}{N}\sum_{k=0}^{N-1} e^{2\pi i(n_1-n_2)k/N}
= \frac{1}{N}\,\frac{1 - e^{2\pi i(n_1-n_2)}}{1 - e^{2\pi i(n_1-n_2)/N}}
= 0.$$

In particular, this orthogonality means that the complex exponentials form
a basis. Clearly also $\langle\phi_n, \phi_n\rangle = 1$, so that the N-point Fourier basis is in fact
an orthonormal basis.
Note that the normalizing factor $\frac{1}{\sqrt{N}}$ was not present for pure tones in the
previous chapter. Also, the normalizing factor $\frac{1}{T}$ from the last chapter is not part
of the definition of the inner product in this chapter. These are small differences
which have to do with slightly different notation for functions and vectors, and
which will not cause confusion in what follows.
The focus in Discrete Fourier analysis is to change coordinates from the
standard basis to the Fourier basis, performing some operations on this “Fourier
representation”, and then change coordinates back to the standard basis. Such
operations are of crucial importance, and in this section we study some of their
basic properties. We start with the following definition.

Definition 2.4. Discrete Fourier Transform.

We will denote the change of coordinates matrix from the standard basis of
$\mathbb{R}^N$ to the Fourier basis $\mathcal{F}_N$ by $F_N$. We will also call this the (N-point) Fourier
matrix.
The matrix $\sqrt{N}F_N$ is also called the (N-point) discrete Fourier transform,
or DFT. If x is a vector in $\mathbb{R}^N$, then $y = \text{DFT}x$ are called the DFT coefficients
of x. (The DFT coefficients are thus the coordinates in $\mathcal{F}_N$, scaled with $\sqrt{N}$.)
DFTx is sometimes written as $\hat{x}$.
Note that we define the Fourier matrix and the DFT as two different matrices,
the one being a scaled version of the other. The reason for this is that there are
different traditions in different fields. In pure mathematics, the Fourier matrix
is mostly used since it is, as we will see, a unitary matrix. In signal processing,
the scaled version provided by the DFT is mostly used. We will normally write
x for the given vector in $\mathbb{R}^N$, and y for its DFT. In applied fields, the Fourier
basis vectors are also called synthesis vectors, since they can be used to
"synthesize" the vector x, with weights provided by the coordinates in the Fourier
basis. To be more precise, we have that the change of coordinates performed by
the Fourier matrix can be written as

$$x = y_0\phi_0 + y_1\phi_1 + \cdots + y_{N-1}\phi_{N-1} = \begin{pmatrix}\phi_0 & \phi_1 & \cdots & \phi_{N-1}\end{pmatrix}y = F_N^{-1}y, \qquad (2.4)$$

where we have used the inverse of the defining relation $y = F_Nx$, and that the
$\phi_n$ are the columns in $F_N^{-1}$ (this follows from the fact that $F_N^{-1}$ is the change of
coordinates matrix from the Fourier basis to the standard basis, and the Fourier
basis vectors are clearly the columns in this matrix). Equation (2.4) is also called
the synthesis equation.
Let us find an expression for the matrix $F_N$. From Lemma 2.3 we know that
the columns of $F_N^{-1}$ are orthonormal. If the matrix was real, it would have been
called orthogonal, and the inverse matrix could have been obtained by transposing.
$F_N^{-1}$ is complex, however, and it is easy to see that the conjugation present in
the definition of the inner product (2.1) implies that the inverse of $F_N$ can be
obtained if we also conjugate, in addition to transpose, i.e. $(F_N)^{-1} = \overline{(F_N)^T}$.
We call $\overline{A^T}$ the conjugate transpose of A, and denote this by $A^H$. We thus
have that $(F_N)^{-1} = (F_N)^H$. Matrices which satisfy $A^{-1} = A^H$ are called unitary.
For complex matrices, this is the parallel to orthogonal matrices.

Theorem 2.5. Fourier matrix is unitary.
The Fourier matrix $F_N$ is the unitary $N \times N$-matrix with entries given by

$$(F_N)_{nk} = \frac{1}{\sqrt{N}}e^{-2\pi ink/N},$$

for $0 \le n, k \le N - 1$.
Since the Fourier matrix is easily inverted, the DFT is also easily inverted.
Note that, since $(F_N)^T = F_N$, we have that $(F_N)^{-1} = \overline{F_N}$. Let us make the
following definition.

Definition 2.6. IDFT.
The matrix $\overline{F_N}/\sqrt{N}$ is the inverse of the matrix $\text{DFT} = \sqrt{N}F_N$. We call
this inverse matrix the inverse discrete Fourier transform, or IDFT.

We can thus also view the IDFT as a change of coordinates (this time from
the Fourier basis to the standard basis), with a scaling of the coordinates by
$1/\sqrt{N}$ at the end. The IDFT is often called the reverse DFT. Similarly, the
DFT is often called the forward DFT.
That $y = \text{DFT}x$ and $x = \text{IDFT}y$ can also be expressed in component form
as

$$y_n = \sum_{k=0}^{N-1} x_ke^{-2\pi ink/N} \qquad\qquad x_k = \frac{1}{N}\sum_{n=0}^{N-1} y_ne^{2\pi ink/N} \qquad (2.5)$$

In applied fields such as signal processing, it is more common to state the DFT
and IDFT in these component forms, rather than in the matrix forms $y = \text{DFT}x$
and $x = \text{IDFT}y$.
Let us now see how these formulas work out in practice by considering some
examples.

Example 2.1: DFT of a cosine


Let x be the vector of length N defined by xk = cos(2π5k/N ), and y the vector
of length N defined by yk = sin(2π7k/N ). Let us see how we can compute
FN (2x + 3y). By the definition of the Fourier matrix as a change of coordinates,
FN (φn ) = en . We therefore get

FN (2x + 3y) = FN (2 cos(2π5 · /N ) + 3 sin(2π7 · /N ))


1 1
= FN (2 (e2πi5·/N + e−2πi5·/N ) + 3 (e2πi7·/N − e−2πi7·/N ))
2 2i
√ √ 3i √
= FN ( N φ5 + N φN −5 − N (φ7 − φN −7 ))
2
√ 3i 3i
= N (FN (φ5 ) + FN (φN −5 ) − FN φ7 + FN φN −7 )
2 2
√ √ 3i √ 3i √
= N e5 + N eN −5 − N e7 + N eN −7 .
2 2

Example 2.2: DFT on a square wave


Let us attempt to apply the DFT to a signal x which is 1 on indices close to 0,
and 0 elsewhere. Assume that

x−L = . . . = x−1 = x0 = x1 = . . . = xL = 1,
while all other values are 0. This is similar to a square wave, with some
modifications: First of all we assume symmetry around 0, while the square wave
CHAPTER 2. DIGITAL SOUND AND DISCRETE FOURIER ANALYSIS54

of Example 1.4 assumes antisymmetry around 0. Secondly the values of the


square wave are now 0 and 1, contrary to −1 and 1 before. Finally, we have a
different proportion of where the two values are assumed. Nevertheless, we will
also refer to the current digital sound as a square wave.
Since indices with the DFT are between 0 and N − 1, and since x is assumed to
have period N, the indices [−L, L] where our signal is 1 translate to the indices
[0, L] and [N − L, N − 1] (i.e., it is 1 on the first and last parts of the vector).
Elsewhere our signal is zero. Since $\sum_{k=N-L}^{N-1} e^{-2\pi ink/N} = \sum_{k=-L}^{-1} e^{-2\pi ink/N}$
(since $e^{-2\pi ink/N}$ is periodic with period N), the DFT of x is

$$\begin{aligned}
y_n &= \sum_{k=0}^{L} e^{-2\pi ink/N} + \sum_{k=N-L}^{N-1} e^{-2\pi ink/N} = \sum_{k=0}^{L} e^{-2\pi ink/N} + \sum_{k=-L}^{-1} e^{-2\pi ink/N}\\
&= \sum_{k=-L}^{L} e^{-2\pi ink/N} = e^{2\pi inL/N}\,\frac{1 - e^{-2\pi in(2L+1)/N}}{1 - e^{-2\pi in/N}}\\
&= e^{2\pi inL/N}e^{-\pi in(2L+1)/N}e^{\pi in/N}\,\frac{e^{\pi in(2L+1)/N} - e^{-\pi in(2L+1)/N}}{e^{\pi in/N} - e^{-\pi in/N}}\\
&= \frac{\sin(\pi n(2L+1)/N)}{\sin(\pi n/N)}.
\end{aligned}$$
This computation does in fact also give us the IDFT of the same vector, since
the IDFT just requires a change of sign in all the exponents, in addition to the
1/N normalizing factor. From this example we see that, in order to represent
x in terms of frequency components, all components are actually needed. The
situation would have been easier if only a few frequencies were needed.
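The formula above is easy to verify numerically. The following is a small sketch (the values of N and L are arbitrary choices) which builds the vector and compares its DFT with the closed-form expression.

import numpy as np

N, L = 32, 3
x = np.zeros(N)
x[:L+1] = 1          # indices 0, ..., L
x[N-L:] = 1          # indices N-L, ..., N-1
y = np.fft.fft(x)

n = np.arange(1, N)  # skip n = 0, where the formula is read as the limit 2L+1
formula = np.sin(np.pi*n*(2*L+1)/N) / np.sin(np.pi*n/N)
print(np.allclose(y[1:].real, formula), np.allclose(y.imag, 0, atol=1e-10))
# Both checks should print True: the DFT is real (x is symmetric) and matches the formula.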

Example 2.3: Computing the DFT by hand


In most cases it is difficult to compute a DFT by hand, due to the entries
$e^{-2\pi ink/N}$ in the matrices, which typically can not be represented exactly. The
DFT is therefore usually calculated on a computer only. However, in the case
N = 4 the calculations are quite simple. In this case the Fourier matrix takes
the form

$$\text{DFT}_4 = \begin{pmatrix} 1 & 1 & 1 & 1\\ 1 & -i & -1 & i\\ 1 & -1 & 1 & -1\\ 1 & i & -1 & -i\end{pmatrix}.$$

We now can compute the DFT of a vector like $(1, 2, 3, 4)^T$ simply as

$$\text{DFT}_4\begin{pmatrix}1\\2\\3\\4\end{pmatrix} = \begin{pmatrix}1+2+3+4\\ 1-2i-3+4i\\ 1-2+3-4\\ 1+2i-3-4i\end{pmatrix} = \begin{pmatrix}10\\ -2+2i\\ -2\\ -2-2i\end{pmatrix}.$$

In general, computing the DFT implies using floating point multiplication. For
N = 4, however, we see that there is no need for floating point multiplication at
all, since DFT4 has unit entries which are either real or purely imaginary.
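The computation can also be checked with the built-in FFT (a quick check, assuming numpy's fft follows the same unscaled DFT convention as DFT_4 above):

import numpy as np
print(np.fft.fft([1, 2, 3, 4]))
# Expected output: [10.+0.j, -2.+2.j, -2.+0.j, -2.-2.j]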

Example 2.4: Direct implementation of the DFT


The DFT can be implemented very simply and directly by the code

from numpy import *

def DFTImpl(x):
    """Compute the DFT of the vector x directly from the definition."""
    y = zeros_like(x).astype(complex)
    N = len(x)
    for n in xrange(N):
        # Row n of the (unscaled) Fourier matrix, i.e. the entries e^{-2 pi i nk/N}
        D = exp(-2*pi*n*1j*arange(float(N))/N)
        # DFT coefficient y_n is the inner product of this row with x
        y[n] = dot(D, x)
    return y

In Exercise 2.13 we will extend this to a general implementation we will use


later. Note that we do not allocate the entire matrix FN in this code, as this
quickly leads to out of memory situations, even for N of moderate size. Instead
we construct one row of FN at a time, and use this to compute one entry
in the output. The method dot can be used here, since each entry in matrix
multiplication can be viewed as an inner product. It is likely that the dot
function is more efficient than using a for-loop, since Python may have an
optimized way for computing this. Note that dot in Python does not conjugate
any of the components, contrary to what we do in our definition of a complex
inner product. This can be rewritten to a direct implementation of the IDFT
also. We will look at this in the exercises, where we also make the method more
general, so that the DFT can be applied to a series of vectors at a time (it can
then be applied to all the channels in a sound in one call). Multiplying a full
N × N matrix by a vector requires roughly N 2 arithmetic operations. The DFT
algorithm above will therefore take a long time when N becomes moderately
large. It turns out that a much more efficient algorithm exists for computing the
DFT, which we will study at the end of this chapter. Python also has a built-in
implementation of the DFT which uses such an efficient algorithm.

2.1.1 Properties of the DFT


The DFT has properties which are very similar to those of Fourier series, as they
were listed in Theorem 1.17. The following theorem sums this up:
Theorem 2.7. Properties of the DFT.
Let x be a real vector of length N. The DFT has the following properties:

1. $(\hat{x})_{N-n} = \overline{(\hat{x})_n}$ for $0 \le n \le N-1$.

2. If $x_k = x_{N-k}$ for all k (so x is symmetric), then $\hat{x}$ is a real vector.

3. If $x_k = -x_{N-k}$ for all k (so x is antisymmetric), then $\hat{x}$ is a purely
imaginary vector.

4. If d is an integer and z is the vector with components $z_k = x_{k-d}$ (the
vector x with its elements delayed by d), then $(\hat{z})_n = e^{-2\pi idn/N}(\hat{x})_n$.

5. If d is an integer and z is the vector with components $z_k = e^{2\pi idk/N}x_k$,
then $(\hat{z})_n = (\hat{x})_{n-d}$.

Proof. The methods used in the proof are very similar to those used in the proof
of Theorem 1.17. From the definition of the DFT we have

$$(\hat{x})_{N-n} = \sum_{k=0}^{N-1} e^{-2\pi ik(N-n)/N}x_k = \sum_{k=0}^{N-1} e^{2\pi ikn/N}x_k = \overline{\sum_{k=0}^{N-1} e^{-2\pi ikn/N}x_k} = \overline{(\hat{x})_n}$$

which proves property 1.

To prove property 2, we write

$$(\hat{z})_n = \sum_{k=0}^{N-1} z_ke^{-2\pi ikn/N} = \sum_{k=0}^{N-1} x_{N-k}e^{-2\pi ikn/N} = \sum_{u=1}^{N} x_ue^{-2\pi i(N-u)n/N}
= \sum_{u=0}^{N-1} x_ue^{2\pi iun/N} = \overline{\sum_{u=0}^{N-1} x_ue^{-2\pi iun/N}} = \overline{(\hat{x})_n}.$$

If x is symmetric it follows that z = x, so that $(\hat{x})_n = \overline{(\hat{x})_n}$. Therefore $\hat{x}$ must
be real. The case of antisymmetry in property 3 follows similarly.

To prove property 4 we observe that

$$(\hat{z})_n = \sum_{k=0}^{N-1} x_{k-d}e^{-2\pi ikn/N} = \sum_{k=0}^{N-1} x_ke^{-2\pi i(k+d)n/N}
= e^{-2\pi idn/N}\sum_{k=0}^{N-1} x_ke^{-2\pi ikn/N} = e^{-2\pi idn/N}(\hat{x})_n.$$

For the proof of property 5 we note that the DFT of z is

$$(\hat{z})_n = \sum_{k=0}^{N-1} e^{2\pi idk/N}x_ke^{-2\pi ikn/N} = \sum_{k=0}^{N-1} x_ke^{-2\pi i(n-d)k/N} = (\hat{x})_{n-d}.$$

This completes the proof.


These properties have similar interpretations as the ones listed in Theo-
rem 1.17 for Fourier series. Property 1 says that we need to store only about one
half of the DFT coefficients, since the remaining coefficients can be obtained by
conjugation. In particular, when N is even, we only need to store y0 , y1 , . . . , yN/2 .
This also means that, if we plot the (absolute value) of the DFT of a real vector,
we will see a symmetry around the index n = N/2. The theorem generalizes the
properties from Theorem 1.17, except for the last property where the signal had
a point of symmetry. We will delay the generalization of this property to later.

Example 2.5: Computing the DFT when multiplying with a complex exponential

To see how we can use the fifth property of Theorem 2.7, consider a vector
$x = (x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7)$ with length N = 8, and assume that x is
so that $F_8(x) = (1, 2, 3, 4, 5, 6, 7, 8)$. Consider the vector z with components
$z_k = e^{2\pi i2k/8}x_k$. Let us compute $F_8(z)$. Since multiplication of x with $e^{2\pi ikd/N}$
delays the output $y = F_N(x)$ with d elements, setting d = 2, $F_8(z)$ can be
obtained by delaying $F_8(x)$ by two elements, so that $F_8(z) = (7, 8, 1, 2, 3, 4, 5, 6)$.
It is straightforward to compute this directly also:

$$(F_Nz)_n = \sum_{k=0}^{N-1} z_ke^{-2\pi ikn/N} = \sum_{k=0}^{N-1} e^{2\pi i2k/N}x_ke^{-2\pi ikn/N}
= \sum_{k=0}^{N-1} x_ke^{-2\pi ik(n-2)/N} = (F_N(x))_{n-2}.$$
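The example can also be checked numerically. In the following sketch the hypothetical vector x is realized by taking the inverse DFT of (1, 2, ..., 8), scaled to match the convention $F_N = \text{DFT}/\sqrt{N}$:

import numpy as np

N = 8
FNx = np.arange(1, 9, dtype=complex)        # the assumed F_8(x)
x = np.fft.ifft(FNx * np.sqrt(N))           # so that fft(x)/sqrt(N) = F_8(x)
z = np.exp(2j*np.pi*2*np.arange(N)/N) * x   # z_k = e^{2 pi i 2k/8} x_k
F8z = np.fft.fft(z) / np.sqrt(N)
print(np.round(F8z.real, 6))                # approximately (7, 8, 1, 2, 3, 4, 5, 6)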

Exercise 2.6: Computing the DFT by hand


Compute F4 x when x = (2, 3, 4, 5).

Exercise 2.7: Exact form of low-order DFT matrix


As in Example 2.3, state the exact cartesian form of the Fourier matrix for the
cases N = 6, N = 8, and N = 12.

Exercise 2.8: DFT of a delayed vector


We have a real vector x with length N , and define the vector z by delaying
all elements in x with 5 cyclically, i.e. z5 = x0 , z6 = x1 ,. . . ,zN −1 = xN −6 ,
and z0 = xN −5 ,. . . ,z4 = xN −1 . For a given n, if |(FN x)n | = 2, what is then
|(FN z)n |? Justify the answer.

Exercise 2.9: Using symmetry property


Given a real vector x of length 8 where (F8 (x))2 = 2 − i, what is (F8 (x))6 ?

Exercise 2.10: DFT of cos2 (2πk/N )


Let x be the vector of length N where xk = cos2 (2πk/N ). What is then FN x?

Exercise 2.11: DFT of ck x


Let x be the vector with entries $x_k = c^k$. Show that the DFT of x is given by
the vector with components

$$y_n = \frac{1 - c^N}{1 - ce^{-2\pi in/N}}$$

for $n = 0, \ldots, N-1$.

Exercise 2.12: Rewrite a complex DFT as real DFT’s


If x is complex, write the DFT in terms of the DFT on real sequences.

Hint. Split into real and imaginary parts, and use linearity of the DFT.

Exercise 2.13: DFT implementation


Extend the code for the function DFTImpl in Example 2.4 so that

• The function also takes a second parameter called forward. If this is true
the DFT is applied. If it is false, the IDFT is applied. If this parameter is
not present, then the forward transform should be assumed.
• If the input x is two-dimensional (i.e. a matrix), the DFT/IDFT should be
applied to each column of x. This ensures that, in the case of sound, the
FFT is applied to each channel in the sound when the entire sound is used
as input, as we are used to when applying different operations to sound.

Also, write documentation for the code.

Exercise 2.14: Symmetry


Assume that N is even.
a) Show that, if xk+N/2 = xk for all 0 ≤ k < N/2, then yn = 0 when n is odd.
b) Show that, if xk+N/2 = −xk for all 0 ≤ k < N/2, then yn = 0 when n is
even.
c) Show also the converse statements in a) and b).
d) Also show the following:

• xn = 0 for all odd n if and only if yk+N/2 = yk for all 0 ≤ k < N/2.

• xn = 0 for all even n if and only if yk+N/2 = −yk for all 0 ≤ k < N/2.

Exercise 2.15: DFT on complex and real data


Let $x_1$, $x_2$ be real vectors, and set $x = x_1 + ix_2$. Use Theorem 2.7 to show that

$$(F_N(x_1))_k = \frac{1}{2}\left((F_N(x))_k + \overline{(F_N(x))_{N-k}}\right)$$

$$(F_N(x_2))_k = \frac{1}{2i}\left((F_N(x))_k - \overline{(F_N(x))_{N-k}}\right)$$

This shows that we can compute two DFT's on real data from one DFT on
complex data, and 2N extra additions.

2.2 Connection between the DFT and Fourier


series. Sampling and the sampling theorem
So far we have focused on the DFT as a tool to rewrite a vector in terms of the
Fourier basis vectors. In practice, the given vector x will often be sampled from
some real data given by a function f (t). We may then compare the frequency
content of x and f , and ask how they are related: What is the relationship
between the Fourier coefficients of f and the DFT-coefficients of x?
In order to study this, assume for simplicity that f ∈ VM,T for some M . This
means that f equals its Fourier approximation fM ,

$$f(t) = f_M(t) = \sum_{n=-M}^{M} z_ne^{2\pi int/T}, \quad\text{where}\quad z_n = \frac{1}{T}\int_0^T f(t)e^{-2\pi int/T}\,dt. \qquad (2.6)$$

We here have changed our notation for the Fourier coefficients from yn to zn , in
order not to confuse them with the DFT coefficients. We recall that in order to
represent the frequency n/T fully, we need the corresponding exponentials with
both positive and negative arguments, i.e., both e2πint/T and e−2πint/T .
Fact 2.8. frequency vs. Fourier coefficients.
Suppose f is given by its Fourier series (2.6). Then the total frequency
content for the frequency n/T is given by the two coefficients zn and z−n .

We have the following connection between the Fourier coefficients of f and


the DFT of the samples of f .
Proposition 2.9. Relation between Fourier coefficients and DFT coefficients.
Let $N > 2M$, $f \in V_{M,T}$, and let $x = \{f(kT/N)\}_{k=0}^{N-1}$ be N uniform samples
from f over [0, T]. The Fourier coefficients $z_n$ of f can be computed from

$$(z_0, z_1, \ldots, z_M, \underbrace{0, \ldots, 0}_{N-(2M+1)}, z_{-M}, z_{-M+1}, \ldots, z_{-1}) = \frac{1}{N}\text{DFT}_Nx. \qquad (2.7)$$

In particular, the total contribution in f from frequency n/T , for 0 ≤ n ≤ M , is


given by yn and yN −n , where y is the DFT of x.
Proof. Let x and y be as defined, so that

$$x_k = \frac{1}{N}\sum_{n=0}^{N-1} y_ne^{2\pi ink/N}. \qquad (2.8)$$

Inserting the sample points $t = kT/N$ into the Fourier series, we must have that

$$\begin{aligned}
x_k = f(kT/N) &= \sum_{n=-M}^{M} z_ne^{2\pi ink/N} = \sum_{n=-M}^{-1} z_ne^{2\pi ink/N} + \sum_{n=0}^{M} z_ne^{2\pi ink/N}\\
&= \sum_{n=N-M}^{N-1} z_{n-N}e^{2\pi i(n-N)k/N} + \sum_{n=0}^{M} z_ne^{2\pi ink/N}\\
&= \sum_{n=0}^{M} z_ne^{2\pi ink/N} + \sum_{n=N-M}^{N-1} z_{n-N}e^{2\pi ink/N}.
\end{aligned}$$

This states that $x = N\,\text{IDFT}_N(z_0, z_1, \ldots, z_M, \underbrace{0, \ldots, 0}_{N-(2M+1)}, z_{-M}, z_{-M+1}, \ldots, z_{-1})$.
Equation (2.7) follows by applying the DFT to both sides. We also see that
$z_n = y_n/N$ and $z_{-n} = y_{N-n}/N$, when y is the DFT of x. It now
also follows immediately that the frequency content in f for the frequency n/T
is given by $y_n$ and $y_{N-n}$. This completes the proof.
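A quick numerical check of (2.7) can be done as follows (a sketch; the test function and the values of T, M and N are arbitrary choices). We take f in $V_{M,T}$ with known complex Fourier coefficients and compare them with $\text{DFT}_N(x)/N$.

import numpy as np

T, M, N = 1.0, 3, 16              # N > 2M
# f(t) = e^{2 pi i t/T} + 0.5 e^{-2 pi i 2t/T}, so z_1 = 1 and z_{-2} = 0.5
f = lambda t: np.exp(2j*np.pi*t/T) + 0.5*np.exp(-2j*np.pi*2*t/T)
x = f(np.arange(N)*T/N)
z = np.fft.fft(x)/N
print(np.round(z, 6))
# Expected: z[1] = 1 (the coefficient z_1), z[N-2] = 0.5 (the coefficient z_{-2}),
# and all other entries essentially zero.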

In Proposition 2.9 we take N samples over [0, T ], i.e. we sample at rate


fs = N/T samples per second. When |n| ≤ M , a pure sound with frequency
ν = n/T is then seen to correspond to the DFT indices n and N − n. Since
T = N/fs , ν = n/T can also be written as ν = nfs /N . Moreover, the highest
frequencies in Proposition 2.9 are those close to ν = M/T , which correspond to
DFT indices close to N − M and M , which are the nonzero frequencies closest
to N/2. DFT index N/2 corresponds to the frequency N/(2T ) = fs /2, which
corresponds to the highest frequency we can reconstruct from samples for any
M . Similarly, the lowest frequencies are those close to ν = 0, which correspond
to DFT indices close to 0 and N . Let us summarize this as follows.
Observation 2.10. Connection between DFT index and frequency.
Assume that x are N samples of a sound taken at sampling rate fs samples
per second, and let y be the DFT of x. Then the DFT indices n and N − n
give the frequency contribution at frequency ν = nfs /N . Moreover, the low
frequencies in x correspond to the yn with n near 0 and N , while the high
frequencies in x correspond to the yn with n near N/2.
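The following small sketch illustrates Observation 2.10 numerically (the sampling rate, the number of samples and the tone frequency are arbitrary choices made for illustration).

import numpy as np

fs, N = 8000.0, 512                # sampling rate (Hz) and number of samples
nu = 625.0                         # a pure tone at 625 Hz = 40*fs/N
x = np.sin(2*np.pi*nu*np.arange(N)/fs)
y = np.fft.fft(x)
peaks = np.sort(np.argsort(np.abs(y))[-2:])   # the two dominant DFT indices
print(peaks, peaks[0]*fs/N)        # indices 40 and N - 40 = 472; frequency 40*fs/N = 625 Hz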

The theorem says that any f ∈ VM,T can be reconstructed from its samples
(since we can write down its Fourier series), as long as N > 2M . That f ∈ VM,T

Figure 2.2: An example on how the samples are picked from an underlying
continuous time function (left), and the samples on their own (right).

is important. From Figure 2.2 it is clear that information is lost in the right plot
when we discard everything but the sample values from the left plot.
Here the function is f (t) = sin(2π8t) ∈ V8,1 , so that we need to choose N
so that N > 2M = 16 samples. Here N = 23 samples were taken, so that
reconstruction from the samples is possible. That the condition N > 2M also is
necessary can easily be observed in Figure 2.3.
Figure 2.3: Sampling sin(2πt) with two points (left), and sampling sin(2π4t)
with eight points (right).

Right we have plotted sin(2π4t) ∈ V4,1 , with N = 8 sample points taken


uniformly from [0, 1]. Here M = 4, so that we require 2M + 1 = 9 sample points,
according to Proposition 2.9. Clearly there is an infinite number of possible
functions in VM,T passing through the sample points (which are all zero): Any
f(t) = c sin(2π4t) will do. Left we consider one period of sin(2πt). Since this is
in $V_{M,T} = V_{1,1}$, reconstruction should be possible if we have $N \ge 2M + 1 = 3$
samples. The two sample points seen in the left plot are thus not enough to secure reconstruction.
The special case N = 2M + 1 is interesting. No zeros are then inserted in
the vector in Equation (2.7). Since the DFT is one-to-one, this means that there
is a one-to-one correspondence between sample values and functions in VM,T
(i.e. Fourier series), i.e. we can always find a unique interpolant in VM,T from
N = 2M + 1 samples. In Exercise 2.21 you will be asked to write code where you
start with a given function f, take N = 2M + 1 samples, and plot the interpolant

from VM,T against f . Increasing M should give an interpolant which is a better


approximation to f , and if f itself resides in some VM,T for some M , we should
obtain equality when we choose M big enough. We have in elementary calculus
courses seen how to determine a polynomial of degree N − 1 that interpolates a
set of N data points, and such polynomials are called interpolating polynomials.
In mathematics many other classes than polynomials exist which are also useful
for interpolation, and the Fourier basis is just one example.
Besides reconstructing a function from its samples, Proposition 2.9 also
enables us to approximate functions in a simple way. To elaborate on this, recall
that the Fourier series approximation fM is a best approximation to f from
VM,T . We usually can’t compute fM exactly, however, since this requires us to
compute the Fourier integrals. We could instead form the samples x of f , and
apply Proposition 2.9. If M is high, fM is a good approximation to f , so that
the samples of fM are a good approximation to x. By continuity of the DFT, it
follows that y = DFTN x is a good approximation to the DFT of the samples of
fM , so that
N
X −1
f˜(t) = yn e2πint/T (2.9)
n=0
is a good approximation to fM , and therefore also to f . We have illustrated this
in Figure 2.4.

Figure 2.4: How we can interpolate f from VM,T with help of the DFT. The
left vertical arrow represents sampling. The right vertical arrow represents
interpolation, i.e. computing Equation (2.9).

The new function f˜ has the same values as f in the sample points. This is
usually not the case for fM , so that f˜ and fM are different approximations to f .
Let us summarize as follows.
Idea 2.11. f˜ as approximation to f .
The function f˜ resulting from sampling, taking the DFT, and interpolation, as
shown in Figure 2.4, also gives an approximation to f . f˜ is a worse approximation
in the mean square sense (since fM is the best such), but it is much more useful
since it avoids evaluation of the Fourier integrals, depends only on the samples,
and is easily computed.
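A minimal sketch of this procedure is given below (the test function is an arbitrary combination of pure tones, and we use the normalization in (2.9) so that f̃ agrees with f at the sample points).

import numpy as np

T, N = 1.0, 64
f = lambda t: np.sin(2*np.pi*4*t) + 0.5*np.cos(2*np.pi*10*t)
k = np.arange(N)
x = f(k*T/N)                          # the samples of f
y = np.fft.fft(x)                     # y = DFT_N(x)

def f_tilde(t):
    n = np.arange(N)
    return np.real(np.sum(y*np.exp(2j*np.pi*n*t/T))/N)

print(f_tilde(3*T/N), f(3*T/N))       # equal at a sample point
print(f_tilde(0.51*T/N), f(0.51*T/N)) # generally different between sample points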
The condition N > 2M in Proposition 2.9 can also be written as N/T >
2M/T . The left side is now the sampling rate fs , while the right side is the

double of the highest frequency in f . The result can therefore also be restated
as follows
Proposition 2.12. Reconstruction from samples.
Any $f \in V_{M,T}$ can be reconstructed uniquely from a uniform set of samples
$\{f(kT/N)\}_{k=0}^{N-1}$, as long as $f_s > 2|\nu|$, where $\nu$ denotes the highest frequency in
f.
We also refer to fs = 2|ν| as the critical sampling rate, since it is the
minimum sampling rate we need in order to reconstruct f from its samples. If
fs is substantially larger than 2|ν| we say that f is oversampled, since we have
taken more samples than we really need. Similarly we say that f is undersampled
if fs is smaller than 2|ν|, since we have not taken enough samples in order to
reconstruct f. Clearly Proposition 2.9 gives one formula for the reconstruction.
In the literature another formula can be found, which we now will deduce. This
alternative version of Proposition 2.9 is also called the sampling theorem. We start
by substituting $N = T/T_s$ (i.e. $T = NT_s$, with $T_s$ being the sampling period) in
the Fourier series for f:

$$f(kT_s) = \sum_{n=-M}^{M} z_ne^{2\pi ink/N} \qquad -M \le k \le M.$$

Equation (2.7) said that the Fourier coefficients could be found from the samples
from

$$(z_0, z_1, \ldots, z_M, \underbrace{0, \ldots, 0}_{N-(2M+1)}, z_{-M}, z_{-M+1}, \ldots, z_{-1}) = \frac{1}{N}\text{DFT}_Nx.$$

By delaying the n index with −M, this can also be written as

$$z_n = \frac{1}{N}\sum_{k=0}^{N-1} f(kT_s)e^{-2\pi ink/N} = \frac{1}{N}\sum_{k=-M}^{M} f(kT_s)e^{-2\pi ink/N}, \qquad -M \le n \le M.$$

Inserting this in the reconstruction formula we get

$$\begin{aligned}
f(t) &= \frac{1}{N}\sum_{n=-M}^{M}\sum_{k=-M}^{M} f(kT_s)e^{-2\pi ink/N}e^{2\pi int/T}\\
&= \sum_{k=-M}^{M}\frac{1}{N}\left(\sum_{n=-M}^{M} e^{2\pi in(t/T - k/N)}\right)f(kT_s)\\
&= \sum_{k=-M}^{M}\frac{1}{N}e^{-2\pi iM(t/T-k/N)}\,\frac{1 - e^{2\pi i(2M+1)(t/T-k/N)}}{1 - e^{2\pi i(t/T-k/N)}}\,f(kT_s)\\
&= \sum_{k=-M}^{M}\frac{1}{N}\,\frac{\sin(\pi(t - kT_s)/T_s)}{\sin(\pi(t - kT_s)/T)}\,f(kT_s)
\end{aligned}$$

Let us summarize our findings as follows:

Theorem 2.13. Sampling theorem and the ideal interpolation formula for periodic
functions.
Let f be a periodic function with period T, and assume that f has no
frequencies higher than ν Hz. Then f can be reconstructed exactly from its
samples $f(-MT_s), \ldots, f(MT_s)$ (where $T_s$ is the sampling period, $N = \frac{T}{T_s}$ is the
number of samples per period, and $N = 2M + 1$) when the sampling rate $f_s = \frac{1}{T_s}$
is bigger than 2ν. Moreover, the reconstruction can be performed through the
formula

$$f(t) = \sum_{k=-M}^{M} f(kT_s)\,\frac{1}{N}\,\frac{\sin(\pi(t-kT_s)/T_s)}{\sin(\pi(t-kT_s)/T)}. \qquad (2.10)$$

Formula (2.10) is also called the ideal interpolation formula for periodic
functions. Such formulas, where one reconstructs a function based on a weighted
sum of the sample values, are more generally called interpolation formulas. The
function $\frac{1}{N}\frac{\sin(\pi(t-kT_s)/T_s)}{\sin(\pi(t-kT_s)/T)}$ is also called an interpolation kernel. Note that f
itself may not be equal to a finite Fourier series, and reconstruction is in general
not possible then. The ideal interpolation formula can in such cases still be used,
but the result we obtain may be different from f(t).
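The following is a minimal numerical sketch of formula (2.10) (the period, M, and the test tone are arbitrary choices, and the tone lies in $V_{M,T}$ so that the reconstruction is exact).

import numpy as np

T, M = 1.0, 8
N = 2*M + 1                          # number of samples per period
Ts = T/N                             # sampling period
f = lambda t: np.sin(2*np.pi*3*t)    # frequency 3/T, which is at most M/T
k = np.arange(-M, M + 1)
samples = f(k*Ts)

def reconstruct(t):
    # f(t) = sum_k f(kTs) * (1/N) * sin(pi(t - kTs)/Ts) / sin(pi(t - kTs)/T)
    num = np.sin(np.pi*(t - k*Ts)/Ts)
    den = np.sin(np.pi*(t - k*Ts)/T)
    return np.sum(samples*num/(N*den))

t0 = 0.123                           # not a sample point, to avoid division by zero
print(reconstruct(t0), f(t0))        # the two values agree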
In fact, the following more general result holds, which we will not prove. The
result is also valid for functions which are not periodic, and is frequently stated
in the literature:
Theorem 2.14. Sampling theorem and the ideal interpolation formula, general
version.
Assume that f has no frequencies higher than ν Hz. Then f can be reconstructed
exactly from its samples $\ldots, f(-2T_s), f(-T_s), f(0), f(T_s), f(2T_s), \ldots$
when the sampling rate is bigger than 2ν. Moreover, the reconstruction can be
performed through the formula

$$f(t) = \sum_{k=-\infty}^{\infty} f(kT_s)\,\frac{\sin(\pi(t-kT_s)/T_s)}{\pi(t-kT_s)/T_s}. \qquad (2.11)$$

When f is periodic, it is possible to deduce this partly from the interpolation


formula for periodic functions. An ingredient in this is that x ≈ sin x for small
x, so that there certainly is a connection between the terms in the two sums.
When f is not periodic we require more tools from Fourier analysis, however.
The DFT coefficients represent the contribution in a sound at given fre-
quencies. Due to this the DFT is extremely useful for performing operations
on sound, and also for compression. For instance we can listen to either the
lower or higher frequencies after performing a simple adjustment of the DFT
coefficients. Observation 2.10 says that the 2L + 1 lowest frequencies correspond
to the DFT-indices [0, L] ∪ [N − L, N − 1], while the 2L + 1 highest frequencies
correspond to DFT-indices [N/2 − L, N/2 + L] (assuming that N is even). If we
perform a DFT, eliminate these low or high frequencies, and perform an inverse
DFT, we recover the sound signal where these frequencies have been eliminated.
The function forw_comp_rev_DFT() in the module forw_comp_rev can perform
these tasks for our audio sample file, as well as some other tasks that can
be useful for compression. This function accepts named parameters L and lower,
where the lowest frequencies are kept if lower==1, and the highest frequencies
are kept if lower==0.

Example 2.16: Using the DFT to adjust frequencies in


sound
Let us test the function forw_comp_rev_DFT to listen to the lower frequencies
in the audio sample file. For L = 13000, the resulting sound can be found in the
file castanetslowerfreq7.wav. For L = 5000, the resulting sound can be found in
the file castanetslowerfreq3.wav. With L = 13000 you can hear the disturbance
in the sound, but we have not lost that much even if about 90% of the DFT
coefficients are dropped. The quality is much poorer when L = 5000 (here we
keep less than 5% of the DFT coefficients). However we can still recognize the
song, and this suggests that most of the frequency information is contained in
the lower frequencies.
Let us then listen to higher frequencies instead. For L = 140000, the resulting
sound can be found in this file. For L = 100000, the resulting sound can be found
in the file castanetshigherfreq3.wav. Both sounds are quite unrecognizable.
We find that we need very high values of L to hear anything, suggesting
again that most information is contained in the lowest frequencies.

2.2.1 Change in frequency representation when windowing a signal
Note that there may be a problem in the previous example: when we restrict to
the values in a given block, we actually look at a different signal. The new signal

repeats the values in the block in periods, while the old signal consists of one
much bigger block. What are the differences in the frequency representations of
the two signals?
Assume that the entire sound has length M. The frequency representation
of this is computed as an M-point DFT (the signal is actually repeated with
period M), and we write the sound samples as a sum of frequencies: $x_k = \frac{1}{M}\sum_{n=0}^{M-1} y_ne^{2\pi ikn/M}$.
Let us consider the effect of restricting to a block for each
of the contributing pure tones $e^{2\pi ikn_0/M}$, $0 \le n_0 \le M-1$. When we restrict
this to a block of size N, we get the signal $\left(e^{2\pi ikn_0/M}\right)_{k=0}^{N-1}$. Depending on $n_0$,
this may not be a Fourier basis vector! Its N-point DFT gives us its frequency
representation, and the absolute value of this is

$$\begin{aligned}
|y_n| &= \left|\sum_{k=0}^{N-1} e^{2\pi ikn_0/M}e^{-2\pi ikn/N}\right| = \left|\sum_{k=0}^{N-1} e^{2\pi ik(n_0/M - n/N)}\right|\\
&= \left|\frac{1 - e^{2\pi iN(n_0/M - n/N)}}{1 - e^{2\pi i(n_0/M - n/N)}}\right| = \left|\frac{\sin(\pi N(n_0/M - n/N))}{\sin(\pi(n_0/M - n/N))}\right|. \qquad (2.12)
\end{aligned}$$

If $n_0 = kM/N$, this gives $y_k = N$, and $y_n = 0$ when $n \ne k$. Thus, splitting
the signal into blocks gives another pure tone when $n_0$ is a multiple of M/N.
When $n_0$ is different from this the situation is different. Let us set M = 1000,
$n_0 = 1$, and experiment with different values of N. Figure 2.5 shows the $y_n$
When n0 is different from this the situation is different. Let us set M = 1000,
n0 = 1, and experiment with different values of N . Figure 2.5 shows the yn
values for different values of N . We see that the frequency representation is now
very different, and that many frequencies contribute.
Figure 2.5: The frequency representation obtained when restricting to a block
of size N of the signal, for N = 64 (left), and N = 256 (right)
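The values plotted in Figure 2.5 can be reproduced by computing the DFT of the restricted pure tone directly, as in the following small sketch (the variable names and the normalization by N are our own choices):

from numpy import arange, exp, pi, abs, fft
import matplotlib.pyplot as plt

M, n0 = 1000, 1
for N in [64, 256]:
    block = exp(2j*pi*n0*arange(N)/M)   # the pure tone restricted to a block of size N
    y = abs(fft.fft(block))             # its N-point frequency representation
    plt.plot(y/N)                       # normalized so that a pure tone would give a single 1
plt.show()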

The explanation is that the pure tone is not a pure tone when N = 64 and
N = 256, since at this scale such frequencies are too high to be represented
exactly. The closest pure tone in frequency is n = 0, and we see that this
has the biggest contribution, but other frequencies also contribute. The other
frequencies contribute much more when N = 256, as can be seen from the peak
in the closest frequency n = 0. In conclusion, when we split into blocks, the
frequency representation may change in an undesirable way. This is a common
problem in signal processing: in practice one needs to restrict to smaller
segments of samples, but this restriction may have undesired effects.
Another problem when we restrict to a shorter periodic signal is that we
may obtain discontinuities at the boundaries between the new periods, even if
there were no discontinuities in the original signal. And, as we know from the
square wave, discontinuities introduce undesired frequencies. We have already
mentioned that symmetric extensions may be used to remedy this.
The MP3 standard also applies a DFT to the sound data. In its simplest form
it applies a 512 point DFT. There are some differences to how this is done when
compared to Example 2.16, however. In our example we split the sound into
disjoint blocks, and applied a DFT to each of them. The MP3 standard actually
splits the sound into blocks which overlap, as this creates a more continuous
frequency representation. Another difference is that the MP3 standard applies a
window to the sound samples, and the effect of this is that the new signal has a
frequency representation which is closer to the original one, when compared to
the signal obtained by using the block values unchanged as above. We will go
into details on this in the next chapter.

Example 2.17: Compression by zeroing out small DFT coefficients
We can achieve compression of a sound by setting the small DFT coefficients
to zero. The idea is that frequencies with small DFT values contribute little
to our perception of the sound, so that they can be discarded. As a result we
obtain a sound with fewer frequency components,
which is thus more suitable for compression. To test this in practice, we first
need to set a threshold, which decides which frequencies to keep. This can then
be sent to the function forw_comp_rev_DFT by means of the named parameter
threshold. The function will now also write to the display the percentage of the
DFT coefficients which were zeroed out. If you run this function with threshold
equal to 20, the resulting sound can be found in the file castanetsthreshold002.wav,
and the function says that about 68% of the DFT coefficients were set to zero.
You can clearly hear the disturbance in the sound, but we have not lost that
much. If we instead try threshold equal to 70, the resulting sound can be
found in the file castanetsthreshold01.wav, and the function says that about
94% of the DFT coefficients were set to zero. The quality is much poorer now,
even if we still can recognize the song. This suggests that most of the frequency
information is contained in frequencies with the highest values.
In Figure 2.6 we have illustrated this principle for compression for 512 sound
samples from a song. The samples of the sound and (the absolute value of) its
DFT are shown at the top. At the bottom all values of the DFT with absolute
value smaller than 0.02 are set to zero (52 values then remain), the sound
is reconstructed with the IDFT, and the result is shown. The start and end signals
look similar, even though the last signal can be represented with less than 10%
of the values from the first.

Figure 2.6: Experimenting with the DFT on a small part of a song.

Note that using a threshold in this way is too simple in practice: the
threshold for discarding coefficients should in general depend on the frequency,
since the human auditory system is more sensitive to certain frequencies.
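A minimal sketch of this kind of thresholding, written with the built-in FFT functions rather than the actual forw_comp_rev_DFT implementation, could look as follows:

from numpy import fft, abs

def threshold_dft(x, threshold):
    # Zero out all DFT coefficients with absolute value below the threshold
    y = fft.fft(x)
    dropped = abs(y) < threshold
    y[dropped] = 0
    print('%.1f%% of the DFT coefficients were zeroed out' % (100.0*dropped.mean()))
    return fft.ifft(y).real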

Example 2.18: Compression by quantizing DFT coefficients


The previous example is a rather simple procedure to obtain compression. The
disadvantage is that it only affects frequencies with low contribution. A more
neutral way to obtain compression is to let each DFT index occupy a certain
number of bits. This is also called quantization, and provides us with compression
if the number of bits is less than what actually is used to represent the sound.
This is closer to what modern audio standards do. forw_comp_rev_DFT accepts
a named parameter n. The effect of this is that a DFT coefficient with bit
representation

$$\ldots d_2 d_1 d_0 . d_{-1} d_{-2} d_{-3} \ldots$$

is truncated so that the bits $d_{n-1}, d_{n-2}, d_{n-3}, \ldots$ are discarded. In other words,
high values of n mean more rounding. If you run forw_comp_rev_DFT with n
equal to 3, the resulting sound can be found in the file castantesquantizedn3.wav,
with n = 5 the resulting sound can be found in the file castantesquantizedn5.wav,
and with n = 7 the resulting sound can be found in the file castantesquantizedn7.wav.
You can hear that the sound degrades further when n is increased.

In practice this quantization procedure is also too simple, since the human
auditory system is more sensitive to certain frequency information, so one should
allocate a higher number of bits for such frequencies. Modern audio
standards take this into account, but we will not go into details on this.
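A minimal sketch of such a quantization of the DFT coefficients (again not the actual forw_comp_rev_DFT code, and assuming that truncation simply rounds the real and imaginary parts down to multiples of 2^n) could be:

from numpy import fft, floor

def quantize_dft(x, n):
    # Discard the bits below bit number n of each DFT coefficient, i.e.
    # round the real and imaginary parts down to multiples of 2**n
    y = fft.fft(x)
    y = (2**n)*(floor(y.real/2**n) + 1j*floor(y.imag/2**n))
    return fft.ifft(y).real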

Exercise 2.19: Comment code


Explain what the code below does, line by line:

x = x[0:2**17]
y = fft.fft(x, axis=0)
y[(2**17//4):(3*2**17//4)] = 0
newx = abs(fft.ifft(y))
newx /= abs(newx).max()
play(newx, fs)

Comment in particular why we adjust the sound samples by dividing with the
maximum value of the sound samples. What changes in the sound do you expect
to hear?

Exercise 2.20: Which frequency is changed?


In the code from the previous exercise it turns out that fs = 44100Hz, and that
the number of sound samples is N = 292570. Which frequencies in the sound
file will be changed on the line where we zero out some of the DFT coefficients?

Exercise 2.21: Implement interpolant


Implement code where you do the following:

• at the top you define the function f (x) = cos6 (x), and M = 3,
• compute the unique interpolant from VM,T (i.e. by taking N = 2M + 1
samples over one period), as guaranteed by Proposition 2.9,
• plot the interpolant against f over one period.

Finally run the code also for M = 4, M = 5, and M = 6. Explain why the plots
coincide for M = 6, but not for M < 6. Does increasing M above M = 6 have
any effect on the plots?

2.3 The Fast Fourier Transform (FFT)


The main application of the DFT is as a tool to compute frequency information
in large datasets. Since this is so useful in many areas, it is of vital importance
that the DFT can be computed with efficient algorithms. The straightforward
implementation of the DFT with matrix multiplication we looked at is not
efficient for large data sets. However, it turns out that the DFT matrix may be

factored in a way that leads to much more efficient algorithms, and this is the
topic of the present section. We will discuss the most widely used implementation
of the DFT, usually referred to as the Fast Fourier Transform (FFT). The FFT
has been stated as one of the ten most important inventions of the 20’th century,
and its invention made the DFT computationally feasible in many fields. The
FFT is for instance used extensively in real-time processing, such as processing and
compression of sound, images, and video. The MP3 standard uses the FFT
to find frequency components in sound, and matches this information with a
psychoacoustic model, in order to find the best way to compress the data.
FFT-based functionality is collected in a module called fft.
Let us start with the most basic FFT algorithm, which applies for a general
complex input vector x, with length N being an even number.

Theorem 2.15. FFT algorithm when N is even.


Let y = DFTN x be the N -point DFT of x, with N an even number, and let
DN/2 be the (N/2) × (N/2)-diagonal matrix with entries (DN/2 )n,n = e−2πin/N
for 0 ≤ n < N/2. Then we have that

$$(y_0, y_1, \ldots, y_{N/2-1}) = \mathrm{DFT}_{N/2}x^{(e)} + D_{N/2}\mathrm{DFT}_{N/2}x^{(o)} \qquad (2.13)$$
$$(y_{N/2}, y_{N/2+1}, \ldots, y_{N-1}) = \mathrm{DFT}_{N/2}x^{(e)} - D_{N/2}\mathrm{DFT}_{N/2}x^{(o)} \qquad (2.14)$$

where $x^{(e)}, x^{(o)} \in \mathbb{R}^{N/2}$ consist of the even- and odd-indexed entries of x,
respectively, i.e.

$$x^{(e)} = (x_0, x_2, \ldots, x_{N-2}), \qquad x^{(o)} = (x_1, x_3, \ldots, x_{N-1}).$$

Put differently, the formulas (2.13)-(2.14) reduce the computation of an
N-point DFT to two N/2-point DFT's. It turns out that this is the basic fact
which speeds up computations considerably. It is important to note that the
same term $D_{N/2}\mathrm{DFT}_{N/2}x^{(o)}$ appears in both formulas above. It should thus
be computed only once, and then inserted in both equations. Let us first check
that these formulas are correct.
Proof. Suppose first that 0 ≤ n ≤ N/2 − 1. We start by splitting the sum in the
expression for the DFT into even and odd indices,

$$\begin{aligned}
y_n &= \sum_{k=0}^{N-1} x_k e^{-2\pi i nk/N} = \sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi i n2k/N} + \sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi i n(2k+1)/N} \\
&= \sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi i nk/(N/2)} + e^{-2\pi i n/N} \sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi i nk/(N/2)} \\
&= \left(\mathrm{DFT}_{N/2} x^{(e)}\right)_n + e^{-2\pi i n/N} \left(\mathrm{DFT}_{N/2} x^{(o)}\right)_n,
\end{aligned}$$

where we have substituted x(e) and x(o) as in the text of the theorem, and
recognized the N/2-point DFT in two places. Assembling this for 0 ≤ n <
N/2 we obtain Equation (2.13). For the second half of the DFT coefficients,
i.e. {yN/2+n }0≤n≤N/2−1 , we similarly have

$$\begin{aligned}
y_{N/2+n} &= \sum_{k=0}^{N-1} x_k e^{-2\pi i (N/2+n)k/N} = \sum_{k=0}^{N-1} x_k e^{-\pi i k} e^{-2\pi i nk/N} \\
&= \sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi i n2k/N} - \sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi i n(2k+1)/N} \\
&= \sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi i nk/(N/2)} - e^{-2\pi i n/N} \sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi i nk/(N/2)} \\
&= \left(\mathrm{DFT}_{N/2} x^{(e)}\right)_n - e^{-2\pi i n/N} \left(\mathrm{DFT}_{N/2} x^{(o)}\right)_n.
\end{aligned}$$

Equation (2.14) now follows similarly.
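Equations (2.13)-(2.14) are also easy to verify numerically with the built-in FFT. A small sanity check (our own snippet) for a random vector could be:

from numpy import fft, random, exp, pi, arange, concatenate, allclose

N = 8
x = random.random(N) + 1j*random.random(N)
ye = fft.fft(x[0::2])                      # DFT_{N/2} of the even-indexed entries
yo = fft.fft(x[1::2])                      # DFT_{N/2} of the odd-indexed entries
d = exp(-2j*pi*arange(N//2)/N)             # the diagonal of D_{N/2}
y = concatenate([ye + d*yo, ye - d*yo])    # equations (2.13) and (2.14)
print(allclose(y, fft.fft(x)))             # True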


Note that an algorithm for the IDFT can be deduced in exactly the same
way. All we need to change is the sign in the exponents of the Fourier matrix. In
addition we need to divide by N at the end. If we do this we get the following
result, which we call the IFFT algorithm. Recall that we use the notation $\overline{A}$ for
the matrix where all the elements of A have been conjugated.
Theorem 2.16. IFFT algorithm when N is even.
Let N be an even number and let $\tilde{x} = \overline{\mathrm{DFT}_N}\, y$. Then we have that

$$(\tilde{x}_0, \tilde{x}_1, \ldots, \tilde{x}_{N/2-1}) = \overline{\mathrm{DFT}_{N/2}}\, y^{(e)} + \overline{D_{N/2}}\, \overline{\mathrm{DFT}_{N/2}}\, y^{(o)} \qquad (2.15)$$
$$(\tilde{x}_{N/2}, \tilde{x}_{N/2+1}, \ldots, \tilde{x}_{N-1}) = \overline{\mathrm{DFT}_{N/2}}\, y^{(e)} - \overline{D_{N/2}}\, \overline{\mathrm{DFT}_{N/2}}\, y^{(o)} \qquad (2.16)$$

where $y^{(e)}, y^{(o)} \in \mathbb{R}^{N/2}$ are the vectors

$$y^{(e)} = (y_0, y_2, \ldots, y_{N-2}), \qquad y^{(o)} = (y_1, y_3, \ldots, y_{N-1}).$$

Moreover, $x = \mathrm{IDFT}_N y$ can be computed as $x = \tilde{x}/N = \overline{\mathrm{DFT}_N}\, y/N$.


It turns out that these theorems can be interpreted as matrix factorizations.
For this we need to define the concept of a block matrix.
Definition 2.17. Block matrix.
Let m0 , . . . , mr−1 and n0 , . . . , ns−1 be integers, and let A(i,j) be an mi × nj -
matrix for i = 0, . . . , r − 1 and j = 0, . . . , s − 1. The notation

$$A = \begin{pmatrix}
A^{(0,0)} & A^{(0,1)} & \cdots & A^{(0,s-1)} \\
A^{(1,0)} & A^{(1,1)} & \cdots & A^{(1,s-1)} \\
\vdots & \vdots & \ddots & \vdots \\
A^{(r-1,0)} & A^{(r-1,1)} & \cdots & A^{(r-1,s-1)}
\end{pmatrix}$$
denotes the (m0 + m1 + . . . + mr−1 ) × (n0 + n1 + . . . + ns−1 )-matrix where the
matrix entries occur as in the A(i,j) matrices, in the way they are ordered. When
A is written in this way it is referred to as a block matrix.
Clearly, using equations (2.13)-(2.14), the DFT matrix can be factorized
using block matrix notation as

$$(y_0, y_1, \ldots, y_{N/2-1}) = \begin{pmatrix} \mathrm{DFT}_{N/2} & D_{N/2}\mathrm{DFT}_{N/2} \end{pmatrix} \begin{pmatrix} x^{(e)} \\ x^{(o)} \end{pmatrix}$$
$$(y_{N/2}, y_{N/2+1}, \ldots, y_{N-1}) = \begin{pmatrix} \mathrm{DFT}_{N/2} & -D_{N/2}\mathrm{DFT}_{N/2} \end{pmatrix} \begin{pmatrix} x^{(e)} \\ x^{(o)} \end{pmatrix}.$$

Combining these, noting that

$$\begin{pmatrix} \mathrm{DFT}_{N/2} & D_{N/2}\mathrm{DFT}_{N/2} \\ \mathrm{DFT}_{N/2} & -D_{N/2}\mathrm{DFT}_{N/2} \end{pmatrix} = \begin{pmatrix} I & D_{N/2} \\ I & -D_{N/2} \end{pmatrix} \begin{pmatrix} \mathrm{DFT}_{N/2} & 0 \\ 0 & \mathrm{DFT}_{N/2} \end{pmatrix},$$
we obtain the following factorisations:
Theorem 2.18. DFT and IDFT matrix factorizations.
We have that

$$\mathrm{DFT}_N x = \begin{pmatrix} I & D_{N/2} \\ I & -D_{N/2} \end{pmatrix} \begin{pmatrix} \mathrm{DFT}_{N/2} & 0 \\ 0 & \mathrm{DFT}_{N/2} \end{pmatrix} \begin{pmatrix} x^{(e)} \\ x^{(o)} \end{pmatrix}$$
$$\mathrm{IDFT}_N y = \frac{1}{N}\begin{pmatrix} I & \overline{D_{N/2}} \\ I & -\overline{D_{N/2}} \end{pmatrix} \begin{pmatrix} \overline{\mathrm{DFT}_{N/2}} & 0 \\ 0 & \overline{\mathrm{DFT}_{N/2}} \end{pmatrix} \begin{pmatrix} y^{(e)} \\ y^{(o)} \end{pmatrix} \qquad (2.17)$$
We will shortly see why these factorizations reduce the number of arithmetic
operations we need to do, but first let us consider how to implement them. First
of all, note that we can apply the FFT factorizations again to $F_{N/2}$ to obtain

$$\mathrm{DFT}_N x = \begin{pmatrix} I & D_{N/2} \\ I & -D_{N/2} \end{pmatrix}
\begin{pmatrix} I & D_{N/4} & 0 & 0 \\ I & -D_{N/4} & 0 & 0 \\ 0 & 0 & I & D_{N/4} \\ 0 & 0 & I & -D_{N/4} \end{pmatrix} \times
\begin{pmatrix} \mathrm{DFT}_{N/4} & 0 & 0 & 0 \\ 0 & \mathrm{DFT}_{N/4} & 0 & 0 \\ 0 & 0 & \mathrm{DFT}_{N/4} & 0 \\ 0 & 0 & 0 & \mathrm{DFT}_{N/4} \end{pmatrix}
\begin{pmatrix} x^{(ee)} \\ x^{(eo)} \\ x^{(oe)} \\ x^{(oo)} \end{pmatrix}$$

where the vectors $x^{(e)}$ and $x^{(o)}$ have been further split into even- and odd-indexed
entries. Clearly, if this factorization is repeated, we obtain a factorization

$$\mathrm{DFT}_N = \prod_{k=1}^{\log_2 N} \begin{pmatrix}
I & D_{N/2^k} & 0 & 0 & \cdots & 0 & 0 \\
I & -D_{N/2^k} & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & I & D_{N/2^k} & \cdots & 0 & 0 \\
0 & 0 & I & -D_{N/2^k} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & I & D_{N/2^k} \\
0 & 0 & 0 & 0 & \cdots & I & -D_{N/2^k}
\end{pmatrix} P. \qquad (2.18)$$

The factorization has been repeated until we have a final diagonal matrix with
$\mathrm{DFT}_1$ on the diagonal, but clearly $\mathrm{DFT}_1 = 1$, so we do not need any DFT-
matrices in the final factor. Note that all matrices in this factorization are
sparse. A factorization into a product of sparse matrices is the key to many
efficient algorithms in linear algebra, such as the computation of eigenvalues and
eigenvectors. When we later compute the number of arithmetic operations in
this factorization, we will see that this is the case also here.
In Equation (2.18), P is a permutation matrix which secures that the even-
indexed entries come first. Since the even-indexed entries have 0 as the last
bit, this is the same as letting the last bit become the first bit. Since we here
recursively place even-indexed entries first, it is not too difficult to see that P
permutes the elements of x by performing a bit-reversal of the indices, i.e.

P (ei ) = ej i = d1 d2 . . . dn j = dn dn−1 . . . d1 ,

where we have used the bit representations of i and j. Since P 2 = I, a bit-reversal


can be computed very efficiently, and performed in-place, i.e. so that the result
ends up in the same vector x, so that we do not need to allocate any memory in
this operation. We will use an existing function called bitreverse to perform
in-place bit-reversal. In Exercise 2.30 we will go through this implementation.
Matrix multiplication is usually not done in-place, i.e. when we compute
y = Ax, different memory is allocated for x and y. For certain simple matrices,
however, matrix multiplication can also be done in-place, so that the output can
be written into the same memory (x) used by the input. It turns out that the
matrices in factorization (2.18) are of this kind, so that the entire FFT can be
computed in-place. We will have more to say on this in the exercises.
In a practical algorithm, it is smart to perform the bit-reversal first, since
the matrices in the factorization (2.18) are block diagonal, so that the different
blocks in each matrix can be applied in parallel to P x (the bit-reversed version
of x). We can thus exploit the parallel processing capabilities of the computer.
It turns out that this bit-reversal is useful for other similar factorizations of the
DFT as well. We will also look at other such factorizations, and we will therefore
split the computation of the DFT as follows: First a general function is applied,

which is responsible for the bit-reversal of the input vector x. Then the matrices
in the factorization (2.18) are applied in a “kernel FFT function” (and we will
have many such kernels), which assumes that the input has been bit-reversed. A
simple implementation of the general function can be as follows.
def FFTImpl(x, FFTKernel):
    bitreverse(x)
    FFTKernel(x)

A simple implementation of the kernel FFT function, based on the first FFT
algorithm we stated, can be as follows.
def FFTKernelStandard(x):
    N = len(x)
    if N > 1:
        xe, xo = x[0:(N//2)], x[(N//2):]
        FFTKernelStandard(xe)
        FFTKernelStandard(xo)
        D = exp(-2*pi*1j*arange(float(N//2))/N)
        xo *= D
        x[:] = concatenate([xe + xo, xe - xo])

In Exercise 2.23 we will extend these to the general implementations we will


use later. We can now run the FFT by combining the general function and the
kernel as follows:

FFTImpl(x, FFTKernelStandard)

Note that FFTKernelStandard is recursive; it calls itself. If this is your first


encounter with a recursive program, it is worth running through the code
manually for a given value of N , such as N = 4.
Immediately we see from factorization (2.18) two possible implementations
for a kernel. First, as we did, we can apply the FFT recursively. A second way
is to, instead of using recursive function calls, use a for-loop where we at each
stage in the loop compute the product with one matrix in factorization (2.18),
from right to left. Inside this loop there must be another for-loop, where the
different blocks in this matrix are applied. We will establish this non-recursive
implementation in Exercise 2.28, and see that this leads to a more efficient
algorithm.
Python has built-in functions for computing the DFT and the IDFT using
the FFT algorithm. These reside in the fft module within numpy. The functions are called
fft and ifft. These functions make no assumption about the length of the
vector, i.e. it may not be of even length. The implementation may however check
if the length of the vector is 2r , and in those cases variants of the algorithm
discussed here can be used. In general, fast algorithms exist when the vector
length N can be factored as a product of small integers.
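As a quick illustration of these built-in functions, one can check that fft followed by ifft recovers the input:

from numpy import fft, random, allclose

x = random.random(1000)     # the length does not need to be a power of 2
y = fft.fft(x)              # the DFT of x
x2 = fft.ifft(y)            # the IDFT, which includes the division by N
print(allclose(x, x2))      # True up to rounding errors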

2.3.1 Reduction in the number of arithmetic operations


Now we will explain why the FFT and IFFT factorizations reduce the number of
arithmetic operations when compared to direct DFT and IDFT implementations.

We will assume that x ∈ RN with N a power of 2, so that the FFT algorithm


can be used recursively, all the way down to vectors of length 1. In many settings
this power-of-2 assumption can be made. As an example, in compression of
sound, one restricts processing to a certain block of the sound data, since the
entire sound is too big to be processed in one piece. One then has a freedom to
how big these blocks are made, and for optimal speed one often uses blocks of
length 2r with r some integer in the range 5–10. At the end of this section we
will explain how the more general FFT can be computed when N is not a power
of 2.
We first need some terminology for how we count the number of operations
of a given type in an algorithm. In particular we are interested in the limiting
behaviour when N becomes large, which is the motivation for the following
definition.
Definition 2.19. Order of an algorithm.
Let RN be the number of operations of a given type (such as multiplication
or addition) in an algorithm, where N describes the dimension of the data (such
as the size of the matrix or length of the vector), and let f be a positive function.
The algorithm is said to be of order f (N ), also written O(f (N )), if the number
of operations grows as f(N) for large N, or more precisely, if

$$\lim_{N\to\infty} \frac{R_N}{f(N)} = 1.$$

In some situations we may count the number of operations exactly, but we


will also see that it may be easier to obtain the order of the algorithm, since the
number of operations may have a simpler expression in the limit. Let us see how
we can use this terminology to describe the complexity of the FFT algorithm.
Let MN and AN denote the number of real multiplications and real additions,
respectively, required by the FFT algorithm. Once the FFT’s of order N/2 have
been computed (MN/2 real multiplications and AN/2 real additions are needed
for each), it is clear from equations (2.13)-(2.14) that an additional N complex
additions, and an additional N/2 complex multiplications, are required. Since
one complex multiplication requires 4 real multiplications and 2 real additions,
and one complex addition requires two real additions, we see that we require
an additional 2N real multiplications, and 2N + N = 3N real additions. This
means that we have the difference equations

$$M_N = 2M_{N/2} + 2N, \qquad A_N = 2A_{N/2} + 3N. \qquad (2.19)$$

Note that e−2πi/N may be computed once and for all and outside the algorithm,
and this is the reason why we have not counted these operations.
The following example shows how the difference equations (2.19) can be solved.
It is not too difficult to argue that $M_N = O(2N\log_2 N)$ and $A_N = O(3N\log_2 N)$,
by noting that there are $\log_2 N$ levels in the FFT, with 2N real multiplications
and 3N real additions at each level. But for N = 2 and N = 4 we may actually

avoid some multiplications, so we should solve these equations by stating initial


conditions carefully, in order to obtain exact operation counts. In practice, and
as we will see later, one often has more involved equations than (2.19), for which
the solution can not be seen directly, so that one needs to apply systematic
mathematical methods instead. Below is shown an example of this.

Solving for the number of operations. To apply standard solution methods
for difference equations to equations (2.19), we first need to write them in a
standard form. Assuming that N is a power of 2, we set $N = 2^r$
and $x_r = M_{2^r}$, or $x_r = A_{2^r}$. The difference equations can then be rewritten as
$x_r = 2x_{r-1} + 2\cdot 2^r$ for multiplications, and $x_r = 2x_{r-1} + 3\cdot 2^r$ for additions, and
again be rewritten in the standard forms

$$x_{r+1} - 2x_r = 4\cdot 2^r \qquad x_{r+1} - 2x_r = 6\cdot 2^r.$$

The homogeneous equation $x_{r+1} - 2x_r = 0$ has the general solution $x_r^h = C2^r$.
Since the base in the power on the right hand side equals the root in the
homogeneous equation, we should in each case guess for a particular solution on
the form $(x_p)_r = Ar2^r$. If we do this we find that the first equation has particular
solution $(x_p)_r = 2r2^r$, while the second has particular solution $(x_p)_r = 3r2^r$.
The general solutions are thus on the form $x_r = 2r2^r + C2^r$ for multiplications,
and $x_r = 3r2^r + C2^r$ for additions.

Now let us state initial conditions for the number of additions and multiplications.
Example 2.3 showed that floating point multiplication can be avoided
completely for N = 4. We can therefore use $M_4 = x_2 = 0$ as an initial value.
This gives $x_r = 2r2^r - 4\cdot 2^r$, so that $M_N = 2N\log_2 N - 4N$.

For additions we can use $A_2 = x_1 = 4$ as initial value (since $\mathrm{DFT}_2(x_1, x_2) =
(x_1 + x_2, x_1 - x_2)$), which gives $x_r = 3r2^r - 2^r$, so that $A_N = 3N\log_2 N - N$. Our
FFT algorithm thus requires slightly more additions than multiplications. FFT
algorithms are often characterized by their operation count, i.e. the total number
of real additions and real multiplications, $R_N = M_N + A_N$. We see that
$R_N = 5N\log_2 N - 5N$. The order of the operation count of our algorithm can
thus be written as $O(5N\log_2 N)$, since $\lim_{N\to\infty}\frac{5N\log_2 N - 5N}{5N\log_2 N} = 1$.
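As a small check of these closed-form expressions, one can iterate the difference equations (2.19) directly and compare (a sketch of our own, using that $\log_2 N = r$ when $N = 2^r$):

# Iterate the difference equations (2.19) with M_4 = 0 and A_2 = 4 as initial values,
# and compare with the closed forms M_N = 2N*r - 4N and A_N = 3N*r - N, where N = 2**r.
M, A = {4: 0}, {2: 4}
for r in range(2, 11):
    N = 2**r
    if N//2 in M:
        M[N] = 2*M[N//2] + 2*N
    A[N] = 2*A[N//2] + 3*N
    print(N, M.get(N) == 2*N*r - 4*N, A[N] == 3*N*r - N)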
In practice one can reduce the number of multiplications further, since
$e^{-2\pi i n/N}$ takes the simple values $1, -1, -i, i$ for some $n$. One can also use that
$e^{-2\pi i n/N}$ can take the simple values $\pm 1/\sqrt{2} \pm 1/\sqrt{2}\, i = \frac{1}{\sqrt{2}}(\pm 1 \pm i)$, which also
saves some floating point multiplications, since we can factor out $1/\sqrt{2}$.
These observations do not give big reductions in the arithmetic complexity,
however, and one can show that the operation count is still $O(5N\log_2 N)$ after
using these observations.
It is straightforward to show that the IFFT implementation requires the
same operation count as the FFT algorithm.
In contrast, the direct implementation of the DFT requires N 2 complex
multiplications and N (N − 1) complex additions. This results in 4N 2 real
multiplications and 2N 2 + 2N (N − 1) = 4N 2 − 2N real additions. The total

operation count is thus 8N 2 − 2N . In other words, the FFT and IFFT signifi-
cantly reduce the number of arithmetic operations. In Exercise 2.29 we present
another algorithm, called the Split-radix algorithm, which reduces the number of
operations even further. We will see, however, that the reduction obtained with the
split-radix algorithm is about 20%. Let us summarize our findings as follows.

Theorem 2.20. Number of operations in the FFT and IFFT algorithms.


The N -point FFT and IFFT algorithms we have gone through both require
O(2N log2 N ) real multiplications and O(3N log2 N ) real additions. In compar-
ison, the number of real multiplications and real additions required by direct
implementations of the N -point DFT and IDFT are O(8N 2 ).

Often we apply the DFT for real data, so we would like to have FFT-
algorithms tailored to this, with reduced complexity (since real data has half
the dimension of general complex data). By some it has been argued that one
can find improved FFT algorithms when one assumes that the data is real. In
Exercise 2.27 we address this issue, and conclude that there is little to gain from
assuming real input: The general algorithm for complex input can be tailored
for real input so that it uses half the number of operations, which harmonizes
with the fact that real data has half the dimension of complex data.
Another reason why the FFT is efficient is that, since the FFT splits the
calculation of the DFT into computing two DFT’s of half the size, the FFT
is well suited for parallel computing: the two smaller FFT’s can be performed
independently of one another, for instance in two different computing cores
on the same computer. Besides reducing the number of arithmetic operations,
FFT implementation can also apply several programming tricks to speed up
computation, see for instance http://cnx.org/content/m12021/latest/ for an
overview.

2.3.2 The FFT when N is not a power of 2


Applying an FFT to a vector of length $2^n$ is by far the most common thing to
do. It turns out, however, that the idea behind the algorithm easily carries over
to the case when N is any composite number, i.e. when $N = N_1 N_2$. This makes
the FFT useful also in settings where we have a dictated number of elements in
x, which is not an even number. The approach we will present in this section
will help us as long as N is not a prime number. The case when N is a prime
number needs other techniques.

So, assume that $N = N_1 N_2$. Any time index can be written uniquely on
the form $N_1 k + p$, with $0 \leq k < N_2$ and $0 \leq p < N_1$. We will make the following
definition.
Definition 2.21. Polyphase components of a vector.
Let x ∈ RN1 N2 . We denote by x(p) the vector in RN2 with entries (x(p) )k =
xN1 k+p . x(p) is also called the p’th polyphase component of x.

The previous vectors x(e) and x(o) can be seen as special cases of polyphase
components. Polyphase components will also be useful later. Using the polyphase
notation, we can write

$$\mathrm{DFT}_N x = \sum_{k=0}^{N-1} x_k e^{-2\pi i nk/N} = \sum_{p=0}^{N_1-1} \sum_{k=0}^{N_2-1} (x^{(p)})_k e^{-2\pi i n(N_1 k + p)/N}
= \sum_{p=0}^{N_1-1} e^{-2\pi i np/N} \sum_{k=0}^{N_2-1} (x^{(p)})_k e^{-2\pi i nk/N_2}$$

Similarly, any frequency index n can be written uniquely on the form $N_2 q + n$,
with $0 \leq q < N_1$ and $0 \leq n < N_2$, so that the DFT can also be written as

$$\sum_{p=0}^{N_1-1} e^{-2\pi i (N_2 q + n)p/N} \sum_{k=0}^{N_2-1} (x^{(p)})_k e^{-2\pi i (N_2 q + n)k/N_2}
= \sum_{p=0}^{N_1-1} e^{-2\pi i qp/N_1} e^{-2\pi i np/N} \sum_{k=0}^{N_2-1} (x^{(p)})_k e^{-2\pi i nk/N_2}.$$

Now, if X is the $N_2 \times N_1$-matrix where the p’th column is $x^{(p)}$, we recognize
the inner sum $\sum_{k=0}^{N_2-1} (x^{(p)})_k e^{-2\pi i nk/N_2}$ as matrix multiplication with $\mathrm{DFT}_{N_2}$
and X, so that this can be written as $(\mathrm{DFT}_{N_2} X)_{n,p}$. The entire sum can thus
be written as

$$\sum_{p=0}^{N_1-1} e^{-2\pi i qp/N_1} e^{-2\pi i np/N} (\mathrm{DFT}_{N_2} X)_{n,p}.$$

Now, define Y as the matrix where X is multiplied component-wise with the
matrix with (n, p)-component $e^{-2\pi i np/N}$. The entire sum can then be written as

$$\sum_{p=0}^{N_1-1} e^{-2\pi i qp/N_1} Y_{n,p} = (Y F_{N_1})_{n,q}.$$
This means that the sum can be written as component (n, q) in the matrix
Y FN1 . Clearly Y FN1 is the matrix where the DFT is applied to all rows of Y .
We have thus shown that component N2 q + n of FN x equals (Y FN1 )n,q . This
means that FN x can be obtained by stacking the columns of Y FN1 on top of
one-another. We can thus summarize our procedure as follows, which gives a
recipe for splitting an FFT into smaller FFT’s when N is not a prime number.
Theorem 2.22. FFT algorithm when N is composite.
When N = N1 N2 , the FFT of a vector x can be computed as follows

• Form the N2 × N1 -matrix X, where the p’th column is x(p) .



• Perform the DFT on all the columns in X, i.e. compute FN2 X.


• Multiply element (n, p) in the resulting matrix with e−2πinp/N (these are
called twiddle factors), to obtain matrix Y .
• Perform the DFT on all the rows in the resulting matrix, i.e. compute
Y F N1 .
• Form the vector where the columns of the resulting matrix are stacked on
top of one-another.

From the algorithm one easily deduces how the IDFT can be computed also:
All steps are invertible, and can be performed by IFFT or multiplication. We
thus only need to perform the inverse steps in reverse order.
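The recipe of Theorem 2.22 is straightforward to try out, using the built-in FFT for the smaller DFT's. The following sketch (the function name and the test are our own) follows the five steps above:

from numpy import fft, reshape, exp, pi, outer, arange, random, allclose

def fft_composite(x, N1, N2):
    # Sketch of the recipe in Theorem 2.22; assumes len(x) == N1*N2
    N = N1*N2
    X = reshape(x, (N2, N1))                            # column p of X is the polyphase component x^(p)
    Z = fft.fft(X, axis=0)                              # DFT of all the columns: F_{N2} X
    Y = Z*exp(-2j*pi*outer(arange(N2), arange(N1))/N)   # multiply entry (n, p) with the twiddle factor
    Z2 = fft.fft(Y, axis=1)                             # DFT of all the rows: Y F_{N1}
    return Z2.flatten(order='F')                        # stack the columns on top of one another

x = random.random(15)                                   # N = 15 = 3*5
print(allclose(fft_composite(x, 3, 5), fft.fft(x)))     # True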

Exercise 2.22: Extra results for the FFT when N = N1 N2


When N is composite, there are a couple of results we can state regarding
polyphase components.

a) Assume that $N = N_1 N_2$, and that $x \in \mathbb{R}^N$ satisfies $x_{k+rN_1} = x_k$ for all k, r,
i.e. x has period $N_1$. Show that $y_n = 0$ (with $y = \mathrm{DFT}_N x$) for all n which are not a multiple of
$N_2$.

b) Assume that $N = N_1 N_2$, and that $x^{(p)} = 0$ for $p \neq 0$. Show that the
polyphase components $y^{(p)}$ of $y = \mathrm{DFT}_N x$ are constant vectors for all p.
But what about the case when N is a prime number? Rader’s algorithm
[38] handles this case by expressing a DFT with N a prime number in terms of
DFT’s of length N − 1 (which is not a prime number). Our previous scenario can
then be followed, but stops quickly again if N − 1 has prime factors of high order.
Since there are some computational penalties in applying Rader’s algorithm, it
may be inefficient in some cases. Winograd’s FFT algorithm [50] extends Rader’s
algorithm to work for the case when $N = p^r$. This algorithm tends to reduce
the number of multiplications, at the price of an increased number of additions.
It is difficult to program, and is rarely used in practice.

Exercise 2.23: Extend implementation


Recall that, in Exercise 2.13, we extended the direct DFT implementation so
that it accepted a second parameter telling us if the forward or reverse transform
should be applied. Extend the general function and the standard kernel in the
same way. Again, the forward transform should be used if the forward parameter
is not present. Assume also that the kernel accepts only one-dimensional data,
and that the general function applies the kernel to each column in the input if
the input is two-dimensional (so that the FFT can be applied to all channels
in a sound with only one call). The signatures for our methods should thus be
changed as follows:

def FFTImpl(x, FFTKernel, forward = True):


def FFTKernelStandard(x, forward):

It should be straightforward to make the modifications for the reverse transform


by consulting the second part of Theorem 2.18. For simplicity, let FFTImpl take
care of the additional division with N we need to do in case of the IDFT. In the
following we will assume these signatures for the FFT implementation and the
corresponding kernels.

Exercise 2.24: Compare execution time


In this exercise we will compare execution times for the different methods for
computing the DFT.
a) Write code which compares the execution times for an N -point DFT for the
following three cases: Direct implementation of the DFT (as in Example 2.4),
the FFT implementation used in this chapter, and the built-in fft-function.
Your code should use the sample audio file castanets.wav, apply the different
DFT implementations to the first N = 2r samples of the file for r = 3 to r = 15,
store the execution times in a vector, and plot these. You can use the function
time() in the time module to measure the execution time.
b) A problem for large N is that there is such a big difference in the execution
times between the two implementations. We can address this by using a loglog-
plot instead. Plot N against execution times using the function loglog. How
should the fact that the number of arithmetic operations are 8N 2 and 5N log2 N
be reflected in the plot?
c) It seems that the built-in FFT is much faster than our own FFT implemen-
tation, even though they may use similar algorithms. Try to explain what can
be the cause of this.

Exercise 2.25: Combine two FFT’s


Let x1 = (1, 3, 5, 7) and x2 = (2, 4, 6, 8). Compute DFT4 x1 and DFT4 x2 . Ex-
plain how you can compute DFT8 (1, 2, 3, 4, 5, 6, 7, 8) based on these computations
(you don’t need to perform the actual computation). What are the benefits of
this approach?

Exercise 2.26: FFT operation count


When we wrote down the difference equation for the number of multiplications in
the FFT algorithm, you could argue that some multiplications were not counted.
Which multiplications in the FFT algorithm were not counted when writing down
this difference equation? Do you have a suggestion to why these multiplications
were not counted?

Exercise 2.27: FFT algorithm adapted to real data


In this exercise we will look at how we can adapt an FFT algorithm to real input.
Since $y_{N-n} = \overline{y_n}$ for real input, there is no additional complexity in computing
the second half of the DFT coefficients, once the first half has been computed.
We will now rewrite Equation (2.13) for indices n and N/2 − n as

$$\begin{aligned}
y_n &= (\mathrm{DFT}_{N/2}x^{(e)})_n + e^{-2\pi i n/N}(\mathrm{DFT}_{N/2}x^{(o)})_n \\
y_{N/2-n} &= (\mathrm{DFT}_{N/2}x^{(e)})_{N/2-n} + e^{-2\pi i (N/2-n)/N}(\mathrm{DFT}_{N/2}x^{(o)})_{N/2-n} \\
&= \overline{(\mathrm{DFT}_{N/2}x^{(e)})_n} - e^{2\pi i n/N}\overline{(\mathrm{DFT}_{N/2}x^{(o)})_n} \\
&= \overline{(\mathrm{DFT}_{N/2}x^{(e)})_n - e^{-2\pi i n/N}(\mathrm{DFT}_{N/2}x^{(o)})_n}.
\end{aligned}$$

We see here that, if we already have computed DFTN/2 x(e) and DFTN/2 x(o) ,
we need one additional complex multiplication for each yn with 0 ≤ n < N/4
(since e−2πin/N and (DFTN/2 x(o) )n are complex). No further multiplications
are needed in order to compute yN/2−n , since we simply conjugate terms before
adding them. Again yN/2 must be handled explicitly with this approach. For
this we can use the formula

yN/2 = (DFTN/2 x(e) )0 − (DN/2 DFTN/2 x(o) )0


instead.
a) Conclude from this that an FFT algorithm adapted to real data at each step
requires N/4 complex multiplications and N/2 complex additions. Conclude from this
as before that an algorithm based on real data requires $M_N = O(N\log_2 N)$ real
multiplications and $A_N = O\left(\frac{3}{2}N\log_2 N\right)$ real additions (i.e. half the operation
count for complex input).
b) Find an IFFT algorithm adapted to vectors with conjugate symmetry, which
has the same operation count as this FFT algorithm adapted to real data.

Hint. Consider the vectors $z, w \in \mathbb{R}^{N/2}$ with entries $z_n = y_n + \overline{y_{N/2-n}}$ and
$w_n = e^{2\pi i n/N}(y_n - \overline{y_{N/2-n}})$. From the equations above, how can these
be used in an IFFT?

Exercise 2.28: Non-recursive FFT algorithm


Use the factorization in (2.18) to write a kernel function FFTKernelNonrec
for a non-recursive FFT implementation. In your code, perform the matrix
multiplications in Equation (2.18) from right to left in an (outer) for-loop. For
each matrix loop through the different blocks on the diagonal in an (inner)
for-loop. Make sure you have the right number of blocks on the diagonal, each
block being on the form

$$\begin{pmatrix} I & D_{N/2^k} \\ I & -D_{N/2^k} \end{pmatrix}.$$
It may be a good idea to start by implementing multiplication with such a simple
matrix first as these are the building blocks in the algorithm (also attempt to do
this so that everything is computed in-place). Also compare the execution times
with our original FFT algorithm, as we did in Exercise 2.24, and try to explain
what you see in this comparison.

Exercise 2.29: The Split-radix FFT algorithm


In this exercise we will develop a variant of the FFT algorithm called the split-
radix FFT algorithm, which until recently held the record for the lowest operation
count for any FFT algorithm.
We start by splitting the rightmost $\mathrm{DFT}_{N/2}$ in Equation (2.17) by using this
equation again, to obtain

$$\mathrm{DFT}_N x = \begin{pmatrix}
\mathrm{DFT}_{N/2} & D_{N/2}\begin{pmatrix} \mathrm{DFT}_{N/4} & D_{N/4}\mathrm{DFT}_{N/4} \\ \mathrm{DFT}_{N/4} & -D_{N/4}\mathrm{DFT}_{N/4} \end{pmatrix} \\
\mathrm{DFT}_{N/2} & -D_{N/2}\begin{pmatrix} \mathrm{DFT}_{N/4} & D_{N/4}\mathrm{DFT}_{N/4} \\ \mathrm{DFT}_{N/4} & -D_{N/4}\mathrm{DFT}_{N/4} \end{pmatrix}
\end{pmatrix} \begin{pmatrix} x^{(e)} \\ x^{(oe)} \\ x^{(oo)} \end{pmatrix}. \qquad (2.20)$$
The term radix describes how an FFT is split into FFT’s of smaller sizes, i.e. how
the sum in an FFT is split into smaller sums. The FFT algorithm we started
this section with is called a radix 2 algorithm, since it splits an FFT of length
N into FFT’s of length N/2. If an algorithm instead splits into FFT’s of length
N/4, it is called a radix 4 FFT algorithm. The algorithm we go through here is
called the split radix algorithm, since it uses FFT’s of both length N/2 and N/4.
a) Let $G_{N/4}$ be the $(N/4)\times(N/4)$ diagonal matrix with $e^{-2\pi i n/N}$ on the diagonal.
Show that $D_{N/2} = \begin{pmatrix} G_{N/4} & 0 \\ 0 & -iG_{N/4} \end{pmatrix}$.

b) Let $H_{N/4}$ be the $(N/4)\times(N/4)$ diagonal matrix $G_{N/4}D_{N/4}$. Verify the
following rewriting of Equation (2.20):
$$\begin{aligned}
\mathrm{DFT}_N x &= \begin{pmatrix}
\mathrm{DFT}_{N/2} & \begin{pmatrix} G_{N/4}\mathrm{DFT}_{N/4} & H_{N/4}\mathrm{DFT}_{N/4} \\ -iG_{N/4}\mathrm{DFT}_{N/4} & iH_{N/4}\mathrm{DFT}_{N/4} \end{pmatrix} \\
\mathrm{DFT}_{N/2} & \begin{pmatrix} -G_{N/4}\mathrm{DFT}_{N/4} & -H_{N/4}\mathrm{DFT}_{N/4} \\ iG_{N/4}\mathrm{DFT}_{N/4} & -iH_{N/4}\mathrm{DFT}_{N/4} \end{pmatrix}
\end{pmatrix} \begin{pmatrix} x^{(e)} \\ x^{(oe)} \\ x^{(oo)} \end{pmatrix} \\
&= \begin{pmatrix}
I & 0 & G_{N/4} & H_{N/4} \\
0 & I & -iG_{N/4} & iH_{N/4} \\
I & 0 & -G_{N/4} & -H_{N/4} \\
0 & I & iG_{N/4} & -iH_{N/4}
\end{pmatrix}
\begin{pmatrix} \mathrm{DFT}_{N/2} & 0 & 0 \\ 0 & \mathrm{DFT}_{N/4} & 0 \\ 0 & 0 & \mathrm{DFT}_{N/4} \end{pmatrix}
\begin{pmatrix} x^{(e)} \\ x^{(oe)} \\ x^{(oo)} \end{pmatrix} \\
&= \begin{pmatrix}
I & \begin{pmatrix} G_{N/4} & H_{N/4} \\ -iG_{N/4} & iH_{N/4} \end{pmatrix} \\
I & -\begin{pmatrix} G_{N/4} & H_{N/4} \\ -iG_{N/4} & iH_{N/4} \end{pmatrix}
\end{pmatrix}
\begin{pmatrix} \mathrm{DFT}_{N/2}x^{(e)} \\ \mathrm{DFT}_{N/4}x^{(oe)} \\ \mathrm{DFT}_{N/4}x^{(oo)} \end{pmatrix} \\
&= \begin{pmatrix}
\mathrm{DFT}_{N/2}x^{(e)} + \begin{pmatrix} G_{N/4}\mathrm{DFT}_{N/4}x^{(oe)} + H_{N/4}\mathrm{DFT}_{N/4}x^{(oo)} \\ -i\left(G_{N/4}\mathrm{DFT}_{N/4}x^{(oe)} - H_{N/4}\mathrm{DFT}_{N/4}x^{(oo)}\right) \end{pmatrix} \\
\mathrm{DFT}_{N/2}x^{(e)} - \begin{pmatrix} G_{N/4}\mathrm{DFT}_{N/4}x^{(oe)} + H_{N/4}\mathrm{DFT}_{N/4}x^{(oo)} \\ -i\left(G_{N/4}\mathrm{DFT}_{N/4}x^{(oe)} - H_{N/4}\mathrm{DFT}_{N/4}x^{(oo)}\right) \end{pmatrix}
\end{pmatrix}
\end{aligned}$$

c) Explain from the above expression why, once the three FFT’s above have
been computed, the rest can be computed with N/2 complex multiplications,
and 2 × N/4 + N = 3N/2 complex additions. This is equivalent to 2N real
multiplications and N + 3N = 4N real additions.

Hint. It is important that GN/4 DFTN/4 x(oe) and HN/4 DFTN/4 x(oo) are com-
puted first, and the sum and difference of these two afterwards.
d) Due to what we just showed, our new algorithm leads to real multiplication
and addition counts which satisfy

$$M_N = M_{N/2} + 2M_{N/4} + 2N \qquad A_N = A_{N/2} + 2A_{N/4} + 4N.$$

Find the general solutions to these difference equations and conclude from these
that $M_N = O\left(\frac{4}{3}N\log_2 N\right)$, and $A_N = O\left(\frac{8}{3}N\log_2 N\right)$. The operation count is
thus $O(4N\log_2 N)$, which is a reduction of $N\log_2 N$ from the FFT algorithm.
e) Write an FFT kernel function FFTKernelSplitradix for the split-radix
algorithm (again this should handle both the forward and reverse transforms).
Are there more or less recursive function calls in this function than in the
original FFT algorithm? Also compare the execution times with our original
FFT algorithm, as we did in Exercise 2.24. Try to explain what you see in this
comparison.
By carefully examining the algorithm we have developed, one can reduce
the operation count to 4N log2 N − 6N + 8. This does not reduce the order of
the algorithm, but for small N (which often is the case in applications) this

reduces the number of operations considerably, since 6N is large compared to


4N log2 N for small N . In addition to having a lower number of operations
than the FFT algorithm of Theorem 2.15, a bigger percentage of the operations
are additions for our new algorithm: there are now twice as many additions
as multiplications. Since multiplications may be more time-consuming than
additions (depending on how the CPU computes floating-point arithmetic), this
can be a big advantage.

Exercise 2.30: Bit-reversal


In this exercise we will make some considerations which will help us explain the
code for bit-reversal. This is perhaps not a mathematically challenging exercise,
but nevertheless a good exercise in how to think when developing an efficient
algorithm. We will use the notation i for an index, and j for its bit-reverse. If
we bit-reverse k bits, we will write N = 2k for the number of possible indices.
a) Consider the following code

j = 0
for i in range(N-1):
    print(j)
    m = N//2
    while (m >= 1 and j >= m):
        j -= m
        m //= 2
    j += m

Explain that the code prints all numbers in [0, N − 1] in bit-reversed order (i.e. j).
Verify this by running the program, and writing down the bits for all numbers
for, say N = 16. In particular explain the decrements and increments made to
the variable j. The code above thus produces pairs of numbers (i, j), where j is
the bit-reverse of i. As can be seen, bitreverse applies similar code, and then
swaps the values xi and xj in x, as it should.
Since bit-reverse is its own inverse (i.e. P 2 = I), it can be performed by
swapping elements i and j. One way to secure that bit-reverse is done only once,
is to perform it only when j > i. You see that bitreverse includes this check.
b) Explain that N − j − 1 is the bit-reverse of N − i − 1. Due to this, when
i, j < N/2, we have that N − i − 1, N − j − 1 ≥ N/2, and that bitreverse can
swap them. Moreover, all swaps where i, j ≥ N/2 can be performed immediately
when pairs where i, j < N/2 are encountered. Explain also that j < N/2
if and only if i is even. In the code you can see that the swaps (i, j) and
(N − i − 1, N − j − 1) are performed together when i is even, due to this.
c) Assume that i < N/2 is odd. Explain that j ≥ N/2, so that j > i. This says
that when i < N/2 is odd, we can always swap i and j (this is the last swap
performed in the code). All swaps where 0 ≤ i < N/2 and N/2 ≤ j < N can be
performed in this way.

In bitreverse, you can see that the bit-reversals of 2r and 2r + 1 are handled
together (i.e. i is increased by 2 in the for-loop). The effect of this is that the
number of if-tests can be reduced, due to the observations from b) and c).

2.4 Summary
We considered the analog of Fourier series for digital sound, which is called
the Discrete Fourier Transform, and looked at its properties and its relation to
Fourier series. We also saw that the sampling theorem guaranteed that there is
no loss in considering the samples of a function, as long as the sampling rate is
high enough compared to the highest frequency in the sound.
We obtained an implementation of the DFT, called the FFT, which is
more efficient in terms of the number of arithmetic operations than a direct
implementation of the DFT. The FFT has been cited as one of the ten most
important algorithms of the 20’th century [5]. The original paper [8] by Cooley
and Tukey dates back to 1965, and handles the case when N is composite. In the
literature, one has been interested in the FFT algorithms where the number of
(real) additions and multiplications (combined) is as low as possible. This number
is also called the flop count. The presentation in this book thus differs from
the literature in that we mostly count only the number of multiplications. The
split-radix algorithm [51, 14], which we reviewed in Exercise 2.29, held the
record for the lowest flop count until quite recently. In [22], Frigo and Johnson
showed that the operation count can be reduced to $O(34N\log_2(N)/9)$, which
clearly is less than the $O(4N\log_2 N)$ we obtained for the split-radix algorithm.
It may seem strange that the total number of additions and multiplications
are considered: Aren’t multiplications more time-consuming than additions?
When you consider how this is done mechanically, this is certainly the case:
In fact, floating point multiplication can be considered as a combination of
many floating point additions. Due to this, one can find many places in the
literature where expressions are rewritten so that the multiplication count is
reduced, at the cost of a higher addition count. Winograd’s algorithm [50] is
an example of this, where the number of additions is much higher than the
number of multiplications. However, most modern CPU’s have more complex
hardware dedicated to computing multiplications, which can result in that one
floating point multiplication can be performed in one cycle, just as one addition
can. Another thing is that modern CPU’s typically can perform many additions
and multiplications in parallel, and the higher complexity in the multiplication
hardware may result in that the CPU can run less multiplications in parallel,
compared to additions. In other words, if we run a test program on a computer, it
may be difficult to detect any differences in performance between addition and
multiplication, even though complex big-scale computing should in theory show
some differences. There are also other important aspects of the FFT, besides
the flop count. Another is memory use. It is possible to implement the FFT so
that the output is computed into the same memory as the input, so that the

FFT algorithm does not require extra memory besides the input buffer. Clearly,
one should bit-reverse the input buffer in order to achieve this.
We have now defined two types of transforms to the frequency domain: Fourier
series for continuous, periodic functions, and the DFT, for periodic vectors. In
the literature there are two other transforms also: The Continuous time Fourier
transform (CTFT) we have already mentioned at the end of Chapter 1. We also
have the Discrete time Fourier transform (DTFT) for vectors which are not
periodic [37]. In this book we will deliberately avoid the DTFT as well, since it
assumes that the signal to transform is of infinite duration, while we in practice
analyze signals with a limited time scope.
The sampling theorem is also one of the most important results of the last
century. It was discovered by Harry Nyquist and Claude Shannon [42], but also
by others independently. One can show that the sampling theorem holds also
for functions which are not periodic, as long as we have the same bound on the
highest frequency. This is more common in the literature. In fact, the proof seen
here where we restrict to periodic functions is not common. The advantage of
the proof seen here is that we remain in a finite dimensional setting, and that
we only need the DFT. More generally, proofs of the sampling theorem in the
literature use the DTFT and the CTFT.

What you should have learned in this chapter.

• The definition of the Fourier basis and its orthonormality.


• The definition of the Discrete Fourier Transform as a change of coordinates
to the Fourier basis, its inverse, and its unitarity.
• How to apply the DFT to a sum of sinusoids.

• Properties of the DFT, such as conjugate symmetry when the vector is


real, how it treats delayed vectors, or vectors multiplied with a complex
exponential.
• Translation between DFT index and frequency. In particular DFT indices
for high and low frequencies.

• How one can use the DFT to adjust frequencies in sound.


• How the FFT algorithm works by splitting into two FFT’s of half the
length.
• Simple FFT implementation.

• Reduction in the number of operations with the FFT.


Chapter 3

Operations on digital sound: digital filters

In Section 1.5 we defined filters as operations on continuous sound which preserved


different frequencies. Such operations are important since they can change the
frequency content in many ways. They are difficult to use computationally,
however, since they are defined for all instances in time. As when we defined the
DFT to make Fourier series computable, we would like to define digital filters,
in order to make filters computable. It will turn out that such digital filters can
be computed by the following procedure:
$$z_n = \frac{1}{4}(x_{n-1} + 2x_n + x_{n+1}), \quad \text{for } n = 0, 1, \ldots, N-1. \qquad (3.1)$$
Here x denotes the input vector, and z the output vector. In other words, the
output of a digital filter is constructed by combining several input elements
linearly. The concrete filter defined by Equation (3.1) is called a smoothing
filter, as we will demonstrate that it smooths the variations in the sound. We
will start this chapter by looking at matrix representations for operations as
given by Equation (3.1). Then we will formally define digital filters in terms of
preservation of frequencies as we did for the filters in Chapter 1, and show that
this is equivalent to operations on the form (3.1).

3.1 Matrix representations of filters


Let us consider Equation (3.1) in some more detail to get more intuition about
filters. As before we assume that the input vector is periodic with period N ,
so that xn+N = xn . Our first observation is that the output vector z is also
periodic with period N since

$$z_{n+N} = \frac{1}{4}(x_{n+N-1} + 2x_{n+N} + x_{n+N+1}) = \frac{1}{4}(x_{n-1} + 2x_n + x_{n+1}) = z_n.$$


The filter is also clearly a linear transformation and may therefore be represented
by an N × N matrix S that maps the vector x = (x0 , x1 , . . . , xN −1 ) to the vector
z = (z0 , z1 , . . . , zN −1 ), i.e., we have z = Sx. To find S, for 1 ≤ n ≤ N − 2 it is
clear from Equation (3.1) that row n has the value 1/4 in column n − 1, the
value 1/2 in column n, and the value 1/4 in column n + 1. For row 0 we must be
a bit more careful, since the index −1 is outside the legal range of the indices.
This is where the periodicity helps us out so that

$$z_0 = \frac{1}{4}(x_{-1} + 2x_0 + x_1) = \frac{1}{4}(x_{N-1} + 2x_0 + x_1) = \frac{1}{4}(2x_0 + x_1 + x_{N-1}).$$
From this we see that row 0 has the value 1/4 in columns 1 and N − 1, and the
value 1/2 in column 0. In exactly the same way we can show that row N − 1
has the entry 1/4 in columns 0 and N − 2, and the entry 1/2 in column N − 1.
In summary, the matrix of the smoothing filter is given by

$$S = \frac{1}{4}\begin{pmatrix}
2 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 & 1 \\
1 & 2 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 2 & 1 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 2 & 1 \\
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 2
\end{pmatrix} \qquad (3.2)$$
A matrix on this form is called a Toeplitz matrix. The general definition is as
follows and may seem complicated, but is in fact quite straightforward:
Definition 3.1. Toeplitz matrices.
An N × N -matrix S is called a Toeplitz matrix if its elements are constant
along each diagonal. More formally, Sk,l = Sk+s,l+s for all nonnegative integers
k, l, and s such that both k + s and l + s lie in the interval [0, N − 1]. A Toeplitz
matrix is said to be circulant if in addition

S(k+s) mod N,(l+s) mod N = Sk,l

for all integers k, l in the interval [0, N − 1], and all s (Here mod denotes the
remainder modulo N ).

Toeplitz matrices are very popular in the literature and have many applica-
tions. A Toeplitz matrix is constant along each diagonal, while the additional
property of being circulant means that each row and column of the matrix
’wraps over’ at the edges. Clearly the matrix given by Equation (3.2) satisfies
Definition 3.1 and is a circulant Toeplitz matrix. A Toeplitz matrix is uniquely
identified by the values on its nonzero diagonals, and a circulant Toeplitz matrix
is uniquely identified by the values on the main diagonal, and on the diagonals
above (or under) it. Toeplitz matrices show up here in the context of filters, but
they will also show up later in the context of wavelets.
Equation (3.1) leads us to the more general expression
$$z_n = \sum_k t_k x_{n-k}. \qquad (3.3)$$

If t has infinitely many nonzero entries, the sum is an infinite one, and may
diverge. We will, however, mostly assume that t has a finite number of nonzero
entries. This general expression opens up for defining many types of operations.
The values tk will be called filter coefficients. The range of k is not specified,
but is typically an interval around 0, since zn usually is calculated by combining
xk ’s with indices close to n. Both positive and negative indices are allowed.
As an example, for formula (3.1) k ranges over −1, 0, and 1, and we have that
t−1 = t1 = 1/4, and t0 = 1/2. Since Equation (3.3) needs to be computed for
each n, if only $t_0, \ldots, t_{k_{max}}$ are nonzero, we need to go through the following
for-loop to compute $z_{k_{max}}, \ldots, z_{N-1}$:

z = zeros_like(x)
for n in range(kmax,N):
    for k in range(kmax + 1):
        z[n] += t[k]*x[n - k]

It is clearly possible to vectorize the inner loop here, since it takes the form of a
dot product. Another possible way to vectorize is to first change the order of
summation, and then vectorize as follows

z = zeros_like(x)
for k in range(kmax + 1):
    z[kmax:N] += t[k]*x[(kmax-k):(N-k)]

Depending on how vectorization is supported, this code will in general execute


faster, and is preferable. The drawback, however, is that a filter often is applied in
real time, with the output computed only when enough input is available, with
the input becoming available continuously. This second approach then clearly
fails, since it computes nothing before all input is available. In the exercise we
will compare the computation times for the two approaches above, and compare
them with a built-in function which computes the same.
Note that above we did not consider the first entries in z, since this is where
the circulation occurs. Taking this into account, the first filter we considered in
this chapter can be implemented in vectorized form simply as

z[0] = x[1]/4. + x[0]/2. + x[N-1]/4.


z[1:(N-1)] = x[2:N]/4. + x[1:(N-1)]/2. + x[0:(N-2)]/4.
z[N-1] = x[0]/4. + x[N-1]/2. + x[N-2]/4.

In the following we will avoid such implementations, since for-loops can be very
slow. We will see that an efficient built-in function exists for computing this,
and use this instead.
By following the same argument as above, the following is clear:
Proposition 3.2. Filters as matrices.

Any operation defined by Equation (3.3) is a linear transformation which


transforms a vector of period N to another of period N . It may therefore be
represented by an N × N matrix S that maps the vector x = (x0 , x1 , . . . , xN −1 )
to the vector z = (z0 , z1 , . . . , zN −1 ), i.e., we have z = Sx. Moreover, the matrix
S is a circulant Toeplitz matrix, and the first column s of this matrix is given by
$$s_k = \begin{cases} t_k, & \text{if } 0 \leq k < N/2; \\ t_{k-N}, & \text{if } N/2 \leq k \leq N-1. \end{cases} \qquad (3.4)$$
In other words, the first column of S can be obtained by placing the coefficients
in (3.3) with positive indices at the beginning of s, and the coefficients with
negative indices at the end of s.
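As an illustration of Proposition 3.2 and Equation (3.4), the following small sketch (with a helper name of our own) builds the circulant Toeplitz matrix from a finite set of filter coefficients:

from numpy import zeros, roll, column_stack

def filter_matrix(kmin, t, N):
    # Sketch: build the N x N circulant Toeplitz matrix for the filter with
    # coefficients t = [t_{kmin}, t_{kmin+1}, ...], following Equation (3.4)
    s = zeros(N)
    for i in range(len(t)):
        s[(kmin + i) % N] = t[i]          # t_k goes to entry k mod N of the first column
    return column_stack([roll(s, l) for l in range(N)])   # column l is s cyclically shifted by l

print(filter_matrix(-1, [1/4., 1/2., 1/4.], 6))   # the smoothing filter with N = 6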
This proposition will be useful for us, since it explains how to pass from the
form (3.3), which is most common in practice, to the matrix form S. Since the
filter coefficients tk uniquely define any N × N -circulant Toeplitz matrix, we will
establish the following shorthand notation for the filter matrix for a given set of
filter coefficients. We will use this notation only when we have a finite set of
nonzero filter coefficients (note however that many interesting filters in signal
processing have infinitely many nonzero filter coefficients, see Section 3.5). Note
also that we always choose N so large that the placement of the filter coefficients
in the first column, as dictated by Proposition 3.2, never collide (as happens
when N is smaller than the number of filter coefficients).
Definition 3.3. Compact notation for filters.
Let $k_{min}$, $k_{max}$ be the smallest and biggest index of a filter coefficient in
Equation (3.3) so that $t_k \neq 0$ (if no such values exist, let $k_{min} = k_{max} = 0$), i.e.

$$z_n = \sum_{k=k_{min}}^{k_{max}} t_k x_{n-k}. \qquad (3.5)$$

We will use the following compact notation for S:

$$S = \{t_{k_{min}}, \ldots, t_{-1}, \underline{t_0}, t_1, \ldots, t_{k_{max}}\}.$$


In other words, the entry with index 0 has been underlined, and only the nonzero
$t_k$’s are listed. $k_{min}$ and $k_{max}$ are also called the start and end indices of S. By
the length of S, denoted l(S), we mean the number $k_{max} - k_{min}$.
One seldom writes out the matrix of a filter, but rather uses this compact
notation.

Example 3.1: Finding the matrix elements from the filter coefficients
Let us apply Proposition 3.2 to the operation defined by formula (3.1):

• for k = 0 Equation (3.4) gives s0 = t0 = 1/2.



• For k = 1 Equation (3.4) gives s1 = t1 = 1/4.


• For k = N − 1 Equation (3.4) gives sN −1 = t−1 = 1/4.

For all k different from 0, 1, and N − 1, we have that sk = 0. Clearly this gives
the matrix in Equation (3.2).

Example 3.2: Finding the filter coefficients from the matrix


Proposition 3.2 is also useful when we have a circulant Toeplitz matrix S, and
we want to find filter coefficients tk so that z = Sx can be written on the form
(3.3). Consider the matrix
 
2 1 0 3
3 2 1 0
S= 0 3 2 1 .

1 0 3 2
This is a circulant Toeplitz matrix with N = 4, and we see that s0 = 2, s1 = 3,
s2 = 0, and s3 = 1. The first equation in (3.4) gives that t0 = s0 = 2, and
t1 = s1 = 3. The second equation in (3.4) gives that t−2 = s2 = 0, and
t−1 = s3 = 1. By including only the tk which are nonzero, the operation can be
written as

$$z_n = t_{-1} x_{n-(-1)} + t_0 x_n + t_1 x_{n-1} = x_{n+1} + 2x_n + 3x_{n-1}.$$

Example 3.3: Writing down compact filter notation


Using the compact notation for a filter, we would write S = {1/4, 1/2, 1/4} for
the filter given by formula (3.1). For the filter

$$z_n = x_{n+1} + 2x_n + 3x_{n-1}$$

from Example 3.2, we would write S = {1, 2, 3}.

3.1.1 Convolution
Applying a filter to a vector x is also called taking the convolution of the two
vectors t and x. Convolution is usually defined without the assumption that
the input vector is periodic, and without any assumption on the vector lengths
(i.e. they may be sequences of infinite length). The case where both vectors t
and x have a finite number of nonzero elements deserves extra attention. Assume
that t_0, . . . , t_{M−1} and x_0, . . . , x_{N−1} are the only nonzero elements in t and x
(i.e. we can view them as vectors in R^M and R^N, respectively). It is clear from
the expression $z_n = \sum_k t_k x_{n-k}$ that only z_0, . . . , z_{M+N−2} can be nonzero. This
motivates the following definition.

Definition 3.4. Convolution of vectors.


By the convolution of two vectors t ∈ R^M and x ∈ R^N we mean the vector
t ∗ x ∈ R^{M+N−1} defined by

$$(t * x)_n = \sum_k t_k x_{n-k}, \qquad (3.6)$$

where we only sum over k so that 0 ≤ k < M, 0 ≤ n − k < N.


Note that convolution in the literature usually assumes infinite vectors.
Python has the built-in function convolve for computing t ∗ x. As we shall see
in the exercises this function is highly optimized, and is therefore much used
in practice. Since convolution is not exactly the same as our definition of a
filter (since we assume that a vector is repeated periodically), it would be a
good idea to express our definition of filters in terms of convolution. This can
be achieved with the next proposition, which is formulated for the case with
equally many filter coefficients with negative and positive indices. The result is
thus directly applicable for symmetric filters, which is the type of filters we will
mostly concentrate on. It is a simple exercise to generalize the result to other
filters, however.
Proposition 3.5. Using convolution to compute filters.
Assume that S is a filter on the form

S = {t−L , . . . , t0 , . . . , tL }.
If x ∈ RN , then Sx can be computed as follows:

• Form the vector x̃ = (x_{N−L}, · · · , x_{N−1}, x_0, · · · , x_{N−1}, x_0, · · · , x_{L−1}) ∈ R^{N+2L}.

• Use the convolve function to compute z̃ = t ∗ x̃ ∈ R^{M+N+2L−1}.

• We have that Sx = (z̃_{2L}, . . . , z̃_{M+N−2}).

We will consider an implementation of this result using the convolve function


in the exercises.
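As a preview of that exercise, a minimal sketch of such an implementation could look as follows (the function name filterS is the one used in Exercise 3.9; input checking is omitted, and the filter is assumed to have an odd number 2L + 1 of coefficients):

import numpy as np

def filterS(t, x):
    # t = (t_{-L}, ..., t_0, ..., t_L), assumed to have odd length 2L + 1
    L = (len(t) - 1)//2
    N = len(x)
    # wrap x periodically with L samples at each end, as in Proposition 3.5
    xtilde = np.concatenate([x[(N - L):], x, x[:L]])
    ztilde = np.convolve(t, xtilde)
    # the entries with indices 2L, ..., M + N - 2 equal Sx
    return ztilde[(2*L):(2*L + N)]
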
Proof. When x ∈ RN , the operation x → t ∗ x can be represented by an
(M + N − 1) × N matrix. It is easy to see that this matrix has element (i + s, i)
equal to t_s, for 0 ≤ i < N, 0 ≤ s < M. In the left part of Figure 3.1 such a
matrix is shown for M = 5. The (constant) nonzero diagonals are shown as
diagonal lines.
Now, form the vector x̃ ∈ RN +2L as in the text of the theorem. Convolving
(t−L , . . . , tL ) with vectors in RN +2L can similarly be represented by an (M + N +
2L − 1) × (N + 2L)-matrix. The rows from 2L up to and including M + N − 2 in
this matrix (we have marked these with horizontal lines above) make up a new
matrix S̃, shown in the right part of Figure 3.1 (S̃ is an N × (N + 2L) matrix).

Figure 3.1: Matrix for the operation x → t ∗ x (left), as well as this matrix
with the first and last 2L rows dropped (right).

We need to show that Sx = S̃ x̃. We have that S̃ x̃ equals the matrix shown in
the left part of Figure 3.2 multiplied with (xN −L , . . . , xN −1 , x0 , . . . , xN −1 , x0 , . . . , xL−1 )
(we inserted extra vertical lines in the matrix where circulation occurs), which
equals the matrix shown in the right part of Figure 3.2 multiplied with (x0 , . . . , xN −1 ).
We see that this is Sx, and the proof is complete.


Figure 3.2: The matrix we multiply with


(xN −L , . . . , xN −1 , x0 , . . . , xN −1 , x0 , . . . , xL−1 ) (left), and the matrix we
multiply with (x0 , . . . , xN −1 ) (right).

There is also a very nice connection between convolution and polynomials:


Proposition 3.6. Convolution and polynomials.
Assume that p(x) = a_N x^N + a_{N−1} x^{N−1} + · · · + a_1 x + a_0 and q(x) = b_M x^M +
b_{M−1} x^{M−1} + · · · + b_1 x + b_0 are polynomials of degree N and M respectively.
Then the coefficients of the polynomial pq can be obtained by computing
convolve(a,b).
We can thus interpret a filter as a polynomial. In this setting, clearly the
length l(S) of the filter can be interpreted as the degree of the polynomial. If
t ∈ RM and x ∈ RN , then they can be associated with polynomials of degree
M − 1 and N − 1, respectively. Also, their convolution, which is in RM +N −1 , can
be associated with a polynomial of degree M + N − 2, which is the sum of the

degrees of the individual polynomials. Of course we can make the same addition
of degrees when we multiply polynomials. Clearly the polynomial associated
with t is the frequency response, when we insert x = e−iω . Also, applying two
filters in succession is equivalent to applying the convolution of the filters, so
that two filtering operations can be combined to one.
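As a small illustration of Proposition 3.6 (the two polynomials are chosen here only as an example), the coefficients of the product (1 + 2x + x^2)(1 + x) can be found as follows:

import numpy as np

# polynomial coefficients, listed with the lowest degree first
a = [1, 2, 1]              # 1 + 2x + x^2
b = [1, 1]                 # 1 + x
print(np.convolve(a, b))   # [1 3 3 1], i.e. 1 + 3x + 3x^2 + x^3
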
Since the number of nonzero filter coefficients is typically much less than N
(the period of the input vector), the matrix S has many entries which are zero.
Multiplication with such matrices requires fewer additions and multiplications
than for other matrices: If S has k nonzero filter coefficients, S has Nk nonzero
entries, so that kN multiplications and (k − 1)N additions are needed to compute
Sx. This is much less than the N^2 multiplications and (N − 1)N additions
needed in the general case. Perhaps more important is that we need not form
the entire matrix: we can perform the matrix multiplication directly in a loop.
For large N we risk running into out-of-memory situations if we had to form the
entire matrix.

Exercise 3.4: Finding the filter coefficients and the matrix

Assume that the filter S is defined by the formula

$$z_n = \frac{1}{4}x_{n+1} + \frac{1}{4}x_n + \frac{1}{4}x_{n-1} + \frac{1}{4}x_{n-2}.$$

Write down the filter coefficients t_k, and the matrix for S when N = 8.

Exercise 3.5: Finding the filter coefficients from the matrix


Given the circulant Toeplitz matrix

$$S = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 1 & 2 \\ 2 & 0 & 0 & 1 \end{pmatrix},$$

write down the filter coefficients t_k.

Exercise 3.6: Convolution and polynomials


Compute the convolution of {1, 2, 1} with itself. Interpret the result in terms of
two polynomials.

Exercise 3.7: Implementation of convolution


Implement code which computes t ∗ x in the two ways described after Equation
(3.3), i.e. as a double for loop, and as a simple for loop in k, with n vectorized.
As your t, take k randomly generated numbers. Compare execution times for
these two methods and the convolve function, for different values of k. Present
the result as a plot where k runs along the x-axis, and execution times run along
the y-axis. Your result will depend on how Python performs vectorization.

Exercise 3.8: Filters with a different number of coefficients with positive and negative indices

Assume that S = {t_{−E}, . . . , t_0, . . . , t_F}. Formulate a generalization of Proposition 3.5
for such filters, i.e. to filters where there may be a different number of
filter coefficients with positive and negative indices. You should only need to
make some small changes to the proof of Proposition 3.5 to achieve this.

Exercise 3.9: Implementing filtering with convolution


Implement a function filterS which uses Proposition 3.5 and the convolve
function to compute Sx when S = {t_{−L}, . . . , t_0, . . . , t_L}. The function should take
the vectors (t_{−L}, . . . , t_0, . . . , t_L) and x as input.

3.2 Formal definition of filters and the vector frequency response
Let us now define digital filters formally, and establish their relationship to
Toeplitz matrices. We have seen that a sound can be decomposed into different
frequency components, and we would like to define filters as operations which
adjust these frequency components in a predictable way. One such example is
provided in Example 2.16, where we simply set some of the frequency components
to 0. The natural starting point is to require for a filter that the output of a
pure tone is a pure tone with the same frequency.
Definition 3.7. Digital filters and vector frequency response.
A linear transformation S : R^N → R^N is said to be a digital filter, or simply
a filter, if, for any integer n in the range 0 ≤ n ≤ N − 1 there exists a value λS,n
so that

S (φn ) = λS,n φn , (3.7)


i.e., the N Fourier vectors are the eigenvectors of S. The vector of (eigen)values
$\lambda_S = (\lambda_{S,n})_{n=0}^{N-1}$ is often referred to as the (vector) frequency response of S.

Since the Fourier basis vectors are orthogonal vectors, S is clearly orthogonally
diagonalizable. Since also the Fourier basis vectors are the columns in (FN )H ,
we have that

S = FNH DFN (3.8)


whenever S is a digital filter, where D has the frequency response (i.e. the
eigenvalues) on the diagonal 1 . We could also use DFTN to diagonalize filters,
but it is customary to use an orthogonal matrix (i.e. FN ) when the matrix is
1 Recall that the orthogonal diagonalization of S takes the form S = P D P^T, where P
contains as columns an orthonormal set of eigenvectors, and D is diagonal with the eigenvalues
listed on the diagonal (see Section 7.1 in [25]).

orthogonally diagonalizable. In particular, if S1 and S2 are digital filters, we can


write S1 = FNH D1 FN and S2 = FNH D2 FN , so that

S1 S2 = FNH D1 FN FNH D2 FN = FNH D1 D2 FN .


Since D1 D2 = D2 D1 for any diagonal matrices, we get the following corollary:
Corollary 3.8. The product of two filters is a filter.
The product of two digital filters is again a digital filter. Moreover, all digital
filters commute, i.e. if S1 and S2 are digital filters, S1 S2 = S2 S1 .

Clearly also S1 + S2 is a filter when S1 and S2 are. The set of all filters is thus
a vector space, which also is closed under multiplication. Such a space is called
an algebra. Since all filters commute, this algebra is also called a commutative
algebra.

3.2.1 Time delay


Time delay with d elements is defined by E_d x = z, with z_k = x_{k−d}. From
this it follows that the vector with components e^{2πikn/N} is sent to the vector
with components e^{2πi(k−d)n/N} = e^{−2πidn/N} e^{2πikn/N}. This means that φ_n is an
eigenvector, so that time delay with d elements is a digital filter. Since e^{−2πidn/N}
is the corresponding eigenvalue for φ_n, this also gives us the frequency response.
Since all |λ_{S,n}| = 1, time delay does not change the amplitude of frequencies in
sounds.
The next result states three equivalent characterizations of a digital filter.
The first one is simply the definition in terms of having the Fourier basis as
eigenvectors. The second is that the matrix is circulant Toeplitz, i.e. that
the operations we started this chapter with actually are filters. The third
characterization is in terms of a new concept which we now define.
Definition 3.9. Time-invariance.
Assume that S is a linear transformation from RN to RN . Let x be input to
S, and y = Sx the corresponding output. Let also z, w be delays of x, y with
d elements (i.e. z = Ed x, w = Ed y). S is said to be time-invariant if, for any
d and x, Sz = w (i.e. S sends the delayed input vector to the delayed output
vector, equivalently SEd = Ed S).
Clearly time delay is time-invariant, since Ed1 Ed2 = Ed2 Ed1 = Ed1 +d2 for
any d1 and d2 .

Theorem 3.10. Characterizations of digital filters.


The following are equivalent characterizations of a digital filter:

• S = (FN )H DFN for a diagonal matrix D, i.e. the Fourier basis is a basis
of eigenvectors for S.
• S is a circulant Toeplitz matrix.

• S is linear and time-invariant.

Proof. If S is a filter, then SEd = Ed S for all d since all filters commute, so that
S is time-invariant. This proves 1. → 3..
Assume that S is time-invariant. Note that Ed e0 = ed , and since SEd e0 =
Ed Se0 we have that Sed = Ed s, where s is the first column of S. This also says
that column d of S can be obtained by delaying the first column of S with d
elements. But then S is a circulant Toeplitz matrix. This proves 3. → 2.
Finally, any circulant Toeplitz matrix can be written on the form $\sum_{d=0}^{N-1} s_d E_d$
(by splitting the matrix into a sum of its diagonals). Since all Ed are filters, it is
clear that any circulant Toeplitz matrix is a filter. This proves 2. → 1..
Due to this result, filters are also called LTI filters, LTI standing for Linear,
Time-Invariant. Also, operations defined by (3.3) are digital filters, when re-
stricted to vectors with period N. The following result enables us to compute
the eigenvalues/frequency response easily through the DFT, so that we do not
need to form the characteristic polynomial and find its roots:
Theorem 3.11. Connection between frequency response and the matrix.
Any digital filter is uniquely characterized by the values in the first column
of its matrix. Moreover, if s is the first column in S, the frequency response of
S is given by

λS = DFTN s. (3.9)
Conversely, if we know the frequency response λS , the first column s of S is
given by

s = IDFTN λS . (3.10)
Proof. If we replace S by (FN )H DFN we find that

   
1 1
√ √ 0 √ 0
DFTN s = N FN s = N FN S  .  = N FN FNH DFN  . 
   
 ..   .. 
0 0
 
1  
1
√ 0
 .. 
= N DFN  .  = D  .  = λS ,
 
 .. 
1
0

where we have used that the first column in F_N has all entries equal to 1/√N,
and that the diagonal matrix D has all the eigenvalues of S on its diagonal,
so that the last expression is the vector of eigenvalues λS . This proves (3.9).
Equation (3.10) follows directly by applying the inverse DFT to (3.9).

The first column s, which thus characterizes the filter, is also called the
impulse response. This name stems from the fact that we can write s = Se0 ,
i.e. the vector s is the output (often called response) to the vector e0 (often
called an impulse). Equation (3.9) states that the frequency response can be
written as
$$\lambda_{S,n} = \sum_{k=0}^{N-1} s_k e^{-2\pi i nk/N}, \quad \text{for } n = 0, 1, \ldots, N-1, \qquad (3.11)$$

where sk are the components of s.


The identity matrix is a digital filter since I = (FN )H IFN . Since e0 is the
first column, it has impulse response s = e0 . Its frequency response has 1 in all
components and therefore preserves all frequencies, as expected.
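Both this and the earlier statement about time delay are easy to check numerically from Equation (3.9) (a small sketch; the values N = 8 and d = 3 are arbitrary, and numpy's fft computes exactly the unnormalized DFT used here):

import numpy as np

N, d = 8, 3
e0 = np.zeros(N); e0[0] = 1    # impulse response of the identity filter
ed = np.zeros(N); ed[d] = 1    # impulse response of time delay with d elements
print(np.fft.fft(e0))          # all entries 1: every frequency is preserved
print(np.abs(np.fft.fft(ed)))  # all absolute values 1: a delay only changes phases
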
In signal processing, the frequency content of a vector (i.e., its DFT) is also
referred to as its spectrum. This may be somewhat confusing from a linear
algebra perspective, because in this context the term spectrum is used to denote
the eigenvalues of a matrix. But because of Theorem 3.11 this is not so confusing
after all if we interpret the spectrum of a vector (in signal processing terms) as
the spectrum of the corresponding digital filter (in linear algebra terms).

Example 3.10: Frequency response of a simple filter


When only few of the coefficients sk are nonzero, it is possible to obtain nice
expressions for the frequency response. To see this, let us compute the frequency
response of the filter defined from formula (3.1). We saw that the first column
of the corresponding Toeplitz matrix satisfied s0 = 1/2, and sN −1 = s1 = 1/4.
The frequency response is thus

$$\lambda_{S,n} = \frac{1}{2}e^{0} + \frac{1}{4}e^{-2\pi i n/N} + \frac{1}{4}e^{-2\pi i n(N-1)/N} = \frac{1}{2}e^{0} + \frac{1}{4}e^{-2\pi i n/N} + \frac{1}{4}e^{2\pi i n/N} = \frac{1}{2} + \frac{1}{2}\cos(2\pi n/N).$$

Example 3.11: Matrix form


We have seen that the DFT can be used to spare us the tedious calculation
of eigenvectors and eigenvalues we are used to, at least for circulant Toeplitz
matrices. Let us compare the two approaches for a simple matrix.

$$S = \begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix}.$$
It is straightforward to compute the eigenvalues and eigenvectors of this matrix
the way you learned in your first course in linear algebra. However, this matrix
is also a circulant Toeplitz matrix, so that we can use the results in this section
to compute the eigenvalues and eigenvectors. Since here N = 2, we have that
e^{2πink/N} = e^{πink} = (−1)^{nk}. This means that the Fourier basis vectors are
(1, 1)/√2 and (1, −1)/√2, which also are the eigenvectors of S. The eigenvalues
are the frequency response of S, which can be obtained as

$$\sqrt{N} F_N s = \sqrt{2}\cdot\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 4 \\ 1 \end{pmatrix} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}.$$
The eigenvalues are thus 3 and 5. You could have obtained the same result with
your computer. Note that the computer may not return the eigenvectors exactly
as the Fourier basis vectors, since the eigenvectors are not unique (the multiple of
an eigenvector is also an eigenvector). The computer may for instance switch the
signs of the eigenvectors. We have no control over what the computer chooses
to do, since some underlying numerical algorithm for computing eigenvectors is
used, which we can’t influence.
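A possible way to carry out both computations numerically is sketched below (the eigenvalues may be listed in a different order by the two approaches):

import numpy as np

S = np.array([[4., 1.], [1., 4.]])
print(np.linalg.eig(S)[0])   # the eigenvalues, in the order chosen by numpy
print(np.fft.fft(S[:, 0]))   # the DFT of the first column gives the same values
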

Example 3.12: Computing the output of a filter


Certain vectors are easy to express in terms of the Fourier basis. This enables
us to compute the output of such vectors from a digital filter easily.
Let us consider the filter S defined by z_n = (1/6)(x_{n+2} + 4x_{n+1} + 6x_n + 4x_{n−1} +
x_{n−2}), and see how we can compute Sx when

x = (cos(2π5 · 0/N ), cos(2π5 · 1/N ), . . . , cos(2π5 · (N − 1)/N )) ,

where N is the length of the vector. We note first that

$$\sqrt{N}\,\phi_5 = \left(e^{2\pi i 5\cdot 0/N}, e^{2\pi i 5\cdot 1/N}, \ldots, e^{2\pi i 5\cdot (N-1)/N}\right)$$
$$\sqrt{N}\,\phi_{N-5} = \left(e^{-2\pi i 5\cdot 0/N}, e^{-2\pi i 5\cdot 1/N}, \ldots, e^{-2\pi i 5\cdot (N-1)/N}\right).$$

Since e^{2πi5k/N} + e^{−2πi5k/N} = 2 cos(2π5k/N), we get by adding the two vectors
that x = (√N/2)(φ_5 + φ_{N−5}). Since the φ_n are eigenvectors, we have expressed x
as a sum of eigenvectors. The corresponding eigenvalues are given by the vector
frequency response, so let us compute this. If N = 8, computing Sx means to
multiply with the 8 × 8 circulant Toeplitz matrix

$$\frac{1}{6}\begin{pmatrix}
6 & 4 & 1 & 0 & 0 & 0 & 1 & 4 \\
4 & 6 & 4 & 1 & 0 & 0 & 0 & 1 \\
1 & 4 & 6 & 4 & 1 & 0 & 0 & 0 \\
0 & 1 & 4 & 6 & 4 & 1 & 0 & 0 \\
0 & 0 & 1 & 4 & 6 & 4 & 1 & 0 \\
0 & 0 & 0 & 1 & 4 & 6 & 4 & 1 \\
1 & 0 & 0 & 0 & 1 & 4 & 6 & 4 \\
4 & 1 & 0 & 0 & 0 & 1 & 4 & 6
\end{pmatrix}.$$
We now see that

$$\begin{aligned}
\lambda_{S,n} &= \frac{1}{6}\left(6 + 4e^{-2\pi i n/N} + e^{-2\pi i 2n/N} + e^{-2\pi i (N-2)n/N} + 4e^{-2\pi i (N-1)n/N}\right) \\
&= \frac{1}{6}\left(6 + 4e^{2\pi i n/N} + 4e^{-2\pi i n/N} + e^{2\pi i 2n/N} + e^{-2\pi i 2n/N}\right) \\
&= 1 + \frac{4}{3}\cos(2\pi n/N) + \frac{1}{3}\cos(4\pi n/N).
\end{aligned}$$
The two values of this we need are

$$\begin{aligned}
\lambda_{S,5} &= 1 + \frac{4}{3}\cos(2\pi\cdot 5/N) + \frac{1}{3}\cos(4\pi\cdot 5/N) \\
\lambda_{S,N-5} &= 1 + \frac{4}{3}\cos(2\pi (N-5)/N) + \frac{1}{3}\cos(4\pi (N-5)/N) \\
&= 1 + \frac{4}{3}\cos(2\pi\cdot 5/N) + \frac{1}{3}\cos(4\pi\cdot 5/N).
\end{aligned}$$
Since these are equal, x is a sum of eigenvectors with equal eigenvalues. This
means that x itself also is an eigenvector, with the same eigenvalue, so that

$$Sx = \left(1 + \frac{4}{3}\cos(2\pi\cdot 5/N) + \frac{1}{3}\cos(4\pi\cdot 5/N)\right)x.$$
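This can also be verified numerically; a small sketch with N = 8, where np.roll is used to handle the circular indexing:

import numpy as np

N = 8
t = np.array([1., 4., 6., 4., 1.])/6.    # t_{-2}, t_{-1}, t_0, t_1, t_2
x = np.cos(2*np.pi*5*np.arange(N)/float(N))
z = np.zeros(N)
for k in range(-2, 3):
    z += t[k + 2]*np.roll(x, k)          # circular version of z_n = sum_k t_k x_{n-k}
lambd = 1 + 4*np.cos(2*np.pi*5/N)/3. + np.cos(4*np.pi*5/N)/3.
print(np.max(np.abs(z - lambd*x)))       # is zero up to rounding errors
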

3.2.2 Using digital filters to approximate filters


The formal definition of digital filters resembles that of filters from Chapter 1,
the difference being that the Fourier basis now is discrete. Let us try to connect
the two. In doing so, let us distinguish them by calling the ones from Chapter 1 simply
filters (without digital in front). We have the following result.
Theorem 3.12. Connection with the frequency response.
Let s be a filter with frequency response λs (f ), and assume that f ∈ VM,T
(so that also s(f ) ∈ VM,T ). Let

x = (f (0 · T /N ), f (1 · T /N ), . . . , f ((N − 1)T /N ))
z = (s(f )(0 · T /N ), s(f )(1 · T /N ), . . . , s(f )((N − 1)T /N ))

be vectors of N = 2M + 1 uniform samples from f and s(f ). Then the operation


S : x → z (i.e. the operation which sends the samples of the input to the samples
of the output) is well-defined on RN , and is an N × N -digital filter with vector
frequency response λS,n = λs (n/T ).
Proof. With N = 2M + 1 we know that f ∈ VM,T is uniquely determined from
x. This means that s(f ) also is uniquely determined from x, so that z also is
uniquely determined from x. The operation S : x → z is therefore well-defined
on RN .

Clearly also s(e^{2πint/T}) = λ_s(n/T)e^{2πint/T}. Since the samples of e^{2πint/T}
form the vector with components e^{2πikn/N}, and the samples of λ_s(n/T)e^{2πint/T}
are λ_s(n/T)e^{2πikn/N}, the vector with components e^{2πikn/N} is an eigenvector of S
with eigenvalue λ_s(n/T). Clearly then
S is a digital filter with frequency response λ_{S,n} = λ_s(n/T).
It is interesting that the vector frequency response above is obtained by
sampling the frequency response. In this way we also see that it is easy to realize
any digital filter as the restriction of a filter: a filter s where the frequency
response has the values λ_{S,n} at the points n/T will do. In the theorem it is
essential that f ∈ VM,T . There are many functions with the same samples, but
where the samples of the output from the filter are different. When we restrict
to VM,T , however, the output samples are always determined from the input
samples.
Theorem 3.12 explains how digital filters can occur in practice. In the real
world, a signal is modeled as a continuous function f (t), and an operation on
signals as a filter s. We can’t compute the entire output s(f ) of the filter, but it
is possible to apply the digital filter from Theorem 3.12 to the samples x of f .
In general f (t) may not lie in VM,T , but we can denote by f˜ the unique function
in VM,T with the same samples as f (as in Section 2.2). By definition, Sx are
the samples of s(f˜) ∈ VM,T . s(f˜) can finally be found from these samples by
using the procedure from Figure 2.4. This procedure is illustrated in Figure 3.3.


Figure 3.3: The connections between filters and digital filters, sampling and
interpolation, provided by Theorem 3.12. The left vertical arrow represents
sampling, the right vertical arrow represents interpolation.

Clearly, s(f˜) is an approximation to s(f ), since f˜ is an approximation to f ,


and since s is continuous. Let us summarize this as follows:
Idea 3.13. Approximating a filter.
A filter s can be approximated through sampling, a digital filter, the DFT,
and interpolation, as illustrated in Figure 3.3. S is the digital filter with
frequency response λS,n = λs (n/T ). When f ∈ VM,T , this approximation equals
s(f ). When we increase the number of sample points/the size of the filter, the
approximation becomes better. If there is a bound on the highest frequency in f ,
there exists an N so that when sampling of that size, the approximation equals
s(f ).
Let us comment on why the last statements here are true. That the approx-
imation equals s(f ) when f ∈ VM,T is obvious, since both f and s(f ) ∈ VM,T

are determined from their samples then. If there is a bound on the highest
frequency in f , then f lies in VM,T for large enough M , so that we recover s(f )
as our approximation using N = 2M + 1. Finally, what happens when there is
no bound on the highest frequency? We know that s(fN ) = (s(f ))N . Since fN
is a good approximation to f , the samples x of f are close to the samples of fN .
By continuity of the digital filter, z = Sx will also be close to the samples of
(s(f ))N = s(fN ), so that (also by continuity) interpolating with z gives a good
approximation to (s(f))_N, which is again a good approximation to s(f). From
this it follows that the digital filter gives a better approximation when N is high.

Exercise 3.13: Time reversal is not a filter


In Example 1.2 we looked at time reversal as an operation on digital sound.
In RN this can be defined as the linear mapping which sends the vector ek to
eN −1−k for all 0 ≤ k ≤ N − 1.
a) Write down the matrix for the time reversal linear mapping, and explain
from this why time reversal is not a digital filter.
b) Prove directly that time reversal is not a time-invariant operation.

Exercise 3.14: When is a filter symmetric?


Let S be a digital filter. Show that S is symmetric if and only if the frequency
response satisfies λS,n = λS,N −n for all n.

Exercise 3.15: Eigenvectors and eigenvalues


Consider the matrix

$$S = \begin{pmatrix} 4 & 1 & 3 & 1 \\ 1 & 4 & 1 & 3 \\ 3 & 1 & 4 & 1 \\ 1 & 3 & 1 & 4 \end{pmatrix}.$$
a) Compute the eigenvalues and eigenvectors of S using the results of this
section. You should only need to perform one DFT in order to achieve this.
b) Verify the result from a) by computing the eigenvectors and eigenvalues the
way you were taught in your first course in linear algebra. This should be a much
more tedious task.
c) Use a computer to compute the eigenvectors and eigenvalues of S also. For
some reason some of the eigenvectors seem to be different from the Fourier basis
vectors, which you would expect from the theory in this section. Try to find an
explanation for this.

Exercise 3.16: Composing filters


Assume that S1 and S2 are two circulant Toeplitz matrices.
a) How can you express the eigenvalues of S1 + S2 in terms of the eigenvalues
of S1 and S2 ?
b) How can you express the eigenvalues of S1 S2 in terms of the eigenvalues of
S1 and S2 ?
c) If A and B are general matrices, can you find a formula which expresses the
eigenvalues of A + B and AB in terms of those of A and B? If not, can you find
a counterexample to what you found in a) and b)?

Exercise 3.17: Keeping every second component


Consider the linear mapping S which keeps every second component in RN ,
i.e. S(e2k ) = e2k , and S(e2k−1 ) = 0. Is S a digital filter?

3.3 The continuous frequency response and properties
If we make the substitution ω = 2πn/N in the formula for λS,n , we may interpret
the frequency response as the values on a continuous function on [0, 2π).
Theorem 3.14. Connection between vector- and continuous frequency response.
The function λS (ω) defined on [0, 2π) by
$$\lambda_S(\omega) = \sum_k t_k e^{-ik\omega}, \qquad (3.12)$$

where tk are the filter coefficients of S, satisfies

λS,n = λS (2πn/N ) for n = 0, 1, . . . , N − 1


for any N. In other words, regardless of N, the vector frequency response lies
on the curve λ_S. λ_S(ω) is called the continuous frequency response of S, and ω
is called angular frequency.
The difference between the vector- and continuous frequency response lies in
that one uses the filter coefficients tk , while the other uses the impulse response
sk . These contain the same values, but they are ordered differently. The result
shows that, at the points 2πn/N , they are equal.
Proof. For any N we have that

$$\begin{aligned}
\lambda_{S,n} &= \sum_{k=0}^{N-1} s_k e^{-2\pi i nk/N} = \sum_{0 \le k < N/2} s_k e^{-2\pi i nk/N} + \sum_{N/2 \le k \le N-1} s_k e^{-2\pi i nk/N} \\
&= \sum_{0 \le k < N/2} t_k e^{-2\pi i nk/N} + \sum_{N/2 \le k \le N-1} t_{k-N} e^{-2\pi i nk/N} \\
&= \sum_{0 \le k < N/2} t_k e^{-2\pi i nk/N} + \sum_{-N/2 \le k \le -1} t_k e^{-2\pi i n(k+N)/N} \\
&= \sum_{0 \le k < N/2} t_k e^{-2\pi i nk/N} + \sum_{-N/2 \le k \le -1} t_k e^{-2\pi i nk/N} \\
&= \sum_{-N/2 \le k < N/2} t_k e^{-2\pi i nk/N} = \lambda_S(2\pi n/N),
\end{aligned}$$

where we have used Equation (3.4).


If t is the set of filter coefficients of S, we can combine Theorem 3.14 with
Equation (3.9), and use the fact that time delay does not affect the absolute
value of the DFT (Theorem 2.7), in order to plot the frequency response of S as
follows:

omega = 2*pi*arange(0,N)/float(N)         # the angular frequencies 2*pi*n/N
s = concatenate([t, zeros(N - len(t))])   # the filter coefficients, zero-padded to length N
plot(omega, abs(fft.fft(s)))              # |DFT| is unaffected by the delay, so this is |lambda_S|

With this procedure we avoid computing the frequency response by hand. We
will have use for this in several places later.
Note that $\sum_{k=0}^{N-1} s_k e^{-ik\omega}$ typically will not converge when N → ∞ (although
it gives the right values at all points ω = 2πn/N for all N)! The filter coefficients
avoid this convergence problem, however, since we assume that only tk with |k|
small are nonzero. In other words, filter coefficients are used in the definition of
the continuous frequency response so that we can find a continuous curve where
we can find the vector frequency response values for all N .
The frequency response contains the important characteristics of a filter,
since it says how it behaves for the different frequencies. When analyzing a
filter, we therefore often plot the frequency response. Often we plot only the
absolute value (or the magnitude) of the frequency response, since this is what
explains how each frequency is amplified or attenuated. Since λS is clearly
periodic with period 2π, we may restrict angular frequency to the interval [0, 2π).
The conclusion in Observation 2.10 was that the low frequencies in a vector
correspond to DFT indices close to 0 and N − 1, and high frequencies correspond
to DFT indices close to N/2. This observation is easily translated to a statement
about angular frequencies:

Observation 3.15. Plotting the frequency response.



When plotting the frequency response on [0, 2π), angular frequencies near 0
and 2π correspond to low frequencies, angular frequencies near π correspond to
high frequencies.
λS may also be viewed as a function defined on the interval [−π, π). Plotting
on [−π, π] is often done in practice, since it makes clearer what corresponds to
lower frequencies, and what corresponds to higher frequencies:
Observation 3.16. Higher and lower frequencies.
When plotting the frequency response on [−π, π), angular frequencies near 0
correspond to low frequencies, angular frequencies near ±π correspond to high
frequencies.

The following holds:


Theorem 3.17. Connection between analog and digital filters.
Assume that s is an analog filter, and that we sample a periodic function at
rate fs over one period, and denote the corresponding digital filter by S. The
analog and digital frequency responses are related by λ_s(f) = λ_S(2πf/f_s).

To see this, note first that S has frequency response λ_{S,n} = λ_s(n/T) =
λ_s(f), where f = n/T. We then rewrite λ_{S,n} = λ_S(2πn/N) = λ_S(2πf T/N) =
λ_S(2πf/f_s).
Since the frequency response is essentially a DFT, it inherits several properties
from Theorem 2.7. We will mostly use the continuous frequency response to
express these properties.
Theorem 3.18. Properties of the frequency response.
We have that

• The continuous frequency response satisfies λ_S(−ω) = $\overline{\lambda_S(\omega)}$.


• If S is a digital filter, S^T is also a digital filter. Moreover, if the frequency
response of S is λ_S(ω), then the frequency response of S^T is $\overline{\lambda_S(\omega)}$.

• If S is symmetric, λS is real. Also, if S is antisymmetric (the element on


the opposite side of the diagonal is the same, but with opposite sign), λS
is purely imaginary.
• A digital filter S is invertible if and only if λ_{S,n} ≠ 0 for all n. In that
case S^{−1} is also a digital filter, and λ_{S^{−1},n} = 1/λ_{S,n}.

• If S1 and S2 are digital filters, then S1 S2 also is a digital filter, and


λS1 S2 (ω) = λS1 (ω)λS2 (ω).

Proof. Property 1. and 3. follow directly from Theorem 2.7. Transposing a


matrix corresponds to reversing the first column of the matrix and thus also
the filter coefficients. Due to this Property 2. also follows from Theorem 2.7. If
S = (FN )H DFN , and all λS,n 6= 0, we have that S −1 = (FN )H D−1 FN , where

D−1 is a diagonal matrix with the values 1/λS,n on the diagonal. Clearly then
S −1 is also a digital filter, and its frequency response is λS −1 ,n = 1/λS,n , which
proves 4. The last property follows in the same way as we showed that filters
commute:

S1 S2 = (FN )H D1 FN (FN )H D2 FN = (FN )H D1 D2 FN .

The frequency response of S1 S2 is thus obtained by multiplying the frequency


responses of S1 and S2 .
In particular the frequency response may not be real, although this was the
case in the first example of this section. Theorem 3.18 applies also for the vector
frequency response. Since the vector frequency response are the eigenvalues of
the filter, the last property above says that, for filters, multiplication of matrices
corresponds to multiplication of eigenvalues. Clearly this is an important property
which is shared with all other matrices which have the same eigenvectors.

Example 3.18: Plotting a simple frequency response


In Example 3.10 we computed the vector frequency response of the filter defined
in formula (3.1). The filter coefficients are here t−1 = 1/4, t0 = 1/2, and t1 = 1/4.
The continuous frequency response is thus
$$\lambda_S(\omega) = \frac{1}{4}e^{i\omega} + \frac{1}{2} + \frac{1}{4}e^{-i\omega} = \frac{1}{2} + \frac{1}{2}\cos\omega.$$
Clearly this matches the computation from Example 3.10. Figure 3.4 shows
plots of this frequency response, plotted on the intervals [0, 2π) and [−π, π).

Figure 3.4: |λS (ω)| of the moving average filter of Formula (3.1), plotted over
[0, 2π] and [−π, π].

Both the continuous frequency response and the vector frequency response
for N = 51 are shown. The right part shows clearly how the high frequencies
are softened by the filter.

Example 3.19: Computing a composite filter


Assume that the filters S1 and S2 have the frequency responses λS1 (ω) = cos(2ω),
λS2 (ω) = 1 + 3 cos ω. Let us see how we can use Theorem 3.18 to compute the
filter coefficients and the matrix of the filter S = S1 S2 . We first notice that,
since both frequency responses are real, all S1 , S2 , and S = S1 S2 are symmetric.
We rewrite the frequency responses as

$$\begin{aligned}
\lambda_{S_1}(\omega) &= \frac{1}{2}(e^{2i\omega} + e^{-2i\omega}) = \frac{1}{2}e^{2i\omega} + \frac{1}{2}e^{-2i\omega} \\
\lambda_{S_2}(\omega) &= 1 + \frac{3}{2}(e^{i\omega} + e^{-i\omega}) = \frac{3}{2}e^{i\omega} + 1 + \frac{3}{2}e^{-i\omega}.
\end{aligned}$$

We now get that

$$\begin{aligned}
\lambda_{S_1 S_2}(\omega) &= \lambda_{S_1}(\omega)\lambda_{S_2}(\omega) = \left(\frac{1}{2}e^{2i\omega} + \frac{1}{2}e^{-2i\omega}\right)\left(\frac{3}{2}e^{i\omega} + 1 + \frac{3}{2}e^{-i\omega}\right) \\
&= \frac{3}{4}e^{3i\omega} + \frac{1}{2}e^{2i\omega} + \frac{3}{4}e^{i\omega} + \frac{3}{4}e^{-i\omega} + \frac{1}{2}e^{-2i\omega} + \frac{3}{4}e^{-3i\omega}.
\end{aligned}$$
From this expression we see that the filter coefficients of S are t_{±1} = 3/4,
t_{±2} = 1/2, t_{±3} = 3/4. All other filter coefficients are 0. Using Proposition 3.2, we
get that s_1 = 3/4, s_2 = 1/2, and s_3 = 3/4, while s_{N−1} = 3/4, s_{N−2} = 1/2, and
s_{N−3} = 3/4 (all other s_k are 0). This gives us the matrix representation of S.
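Since applying two filters in succession is equivalent to applying the convolution of the filters, the same coefficients can also be found numerically (a small check):

import numpy as np

t1 = [1/2., 0, 0, 0, 1/2.]   # coefficients t_{-2}, ..., t_2 of S1
t2 = [3/2., 1, 3/2.]         # coefficients t_{-1}, t_0, t_1 of S2
print(np.convolve(t1, t2))   # [0.75 0.5 0.75 0. 0.75 0.5 0.75], i.e. t_{-3}, ..., t_3 of S1*S2
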

3.3.1 Windowing operations


In this section we will take a look at a very important, and perhaps surprising,
application of the continuous frequency response. Let us return to the computa-
tions from Example 2.16. There we saw that, when we restricted to a block of
the signal, this affected the frequency representation. If we substitute with the
angular frequencies ω = 2πn/N and ω0 = 2πn0 /M in Equation (2.12), we get

$$y_n = \frac{1}{N}\sum_{k=0}^{N-1} e^{ik\omega_0}e^{-ik\omega} = \frac{1}{N}\sum_{k=0}^{N-1} e^{-ik(\omega-\omega_0)}$$

(here yn were the DFT components of the sound after we had restricted to a
block). This expression states that, when we restrict to a block of length N in
the signal by discarding the other samples, a pure tone of angular frequency
ω0 suddenly gets a frequency contribution at angular frequency ω also, and the
contribution is given by this formula. The expression is seen to be the same as
the frequency response of the filter (1/N){1, 1, . . . , 1} (where 1 is repeated N times),
evaluated at ω − ω0 . This filter is nothing but a (delayed) moving average filter.
The frequency response of a moving average filter thus governs how the different
frequencies pollute when we limit ourselves to a block of the signal. Since this

frequency response has its peak at 0, angular frequencies ω close to ω0 have


biggest values, so that the pollution is mostly from frequencies close to ω0 . But
unfortunately, other frequencies also pollute.
One can also ask the question if there are better ways to restrict to a block
of size N of the signal. We formulate the following idea.
Idea 3.19. Windows.
Let x = (x_0, . . . , x_{M−1}) be a sound of length M. We would like to find values
w = {w0 , . . . , wN −1 } so that the new sound (w0 x0 , . . . , wN −1 xN −1 ) of length
N < M has a frequency representation similar to that of x. w is called a window
of length N , and the new sound is called the windowed signal.
Above we encountered the window w = {1, 1, . . . , 1}. This is called the
rectangular window. To see how we can find a good window, note first that the
DFT values in the windowed signal of length N is

$$y_n = \frac{1}{N}\sum_{k=0}^{N-1} w_k e^{ik\omega_0}e^{-ik\omega} = \frac{1}{N}\sum_{k=0}^{N-1} w_k e^{-ik(\omega-\omega_0)}.$$

This is the frequency response of (1/N)w. In order to limit the pollution from
other frequencies, we thus need to construct a window with a frequency response
with smaller values than that of the rectangular window away from 0. Let us
summarize our findings as follows:
Observation 3.20. Constructing a window.
Assume that we would like to construct a window of length N . It is desirable
that the frequency response of the window has small values away from zero.
We will not go into techniques for how such frequency responses can be
constructed, but only consider one example different from the rectangular window.
We define the Hamming window by

wn = 2(0.54 − 0.46 cos(2πn/(N − 1))). (3.13)


The frequency responses of the rectangular window and the Hamming window
are compared in Figure 3.5 for N = 32.
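A sketch of how such a comparison can be computed and plotted (here over [0, 2π); the zero-padding to 1024 points is only there to get a smooth curve):

import numpy as np
import matplotlib.pyplot as plt

N, Npad = 32, 1024
n = np.arange(N)
rect = np.ones(N)
hamming = 2*(0.54 - 0.46*np.cos(2*np.pi*n/(N - 1)))
omega = 2*np.pi*np.arange(Npad)/float(Npad)
for w in (rect, hamming):
    plt.plot(omega, np.abs(np.fft.fft(w, Npad))/N)   # zero-padded DFT of the window
plt.show()
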
We see that the Hamming window has much smaller values away from 0,
so that it is better suited as a window. However, the width of the “main lobe”
(i.e. the main structure at the center), seems to be bigger. The window coefficients
themselves are shown in Figure 3.6. It is seen that the Hamming window attenuates
the signal more and more as we get close to the boundaries of the block.
Many other windows are used in the literature. The concrete window from
Exercise 3.29 is for instance used in the MP3 standard. It is applied to the
sound, and after this an FFT is applied to the windowed sound in order to make
a frequency analysis of that part of the sound. The effect of the window is that
there is smaller loss in the frequency representation of the sound when we restrict
to a block of sound samples. This is a very important part of the psychoacoustic
model used in the MP3 encoder, since it has to make compression decisions
based on the frequency information in the sound.

Figure 3.5: The frequency responses of the rectangular and Hamming windows,
which we considered for restricting to a block of the signal.

Figure 3.6: The coefficients of the rectangular and Hamming windows, which
we considered for restricting to a block of the signal.

Exercise 3.20: Plotting a simple frequency response


Let again S be the filter defined by the equation

$$z_n = \frac{1}{4}x_{n+1} + \frac{1}{4}x_n + \frac{1}{4}x_{n-1} + \frac{1}{4}x_{n-2},$$

as in Exercise 3.4. Compute and plot (the magnitude of) λ_S(ω).

Exercise 3.21: Low-pass and high-pass filters


A filter S is defined by the equation

$$z_n = \frac{1}{3}(x_n + 3x_{n-1} + 3x_{n-2} + x_{n-3}).$$
a) Compute and plot the (magnitude of the continuous) frequency response of
the filter, i.e. |λS (ω)|. Is the filter a low-pass filter or a high-pass filter?
b) Find an expression for the vector frequency response λS,2 . What is Sx when
x is the vector of length N with components e2πi2k/N ?

Exercise 3.22: Circulant matrices


A filter S_1 is defined by the equation

$$z_n = \frac{1}{16}(x_{n+2} + 4x_{n+1} + 6x_n + 4x_{n-1} + x_{n-2}).$$
a) Write down an 8 × 8 circulant Toeplitz matrix which corresponds to applying
S1 on a periodic signal with period N = 8.
b) Compute and plot (the continuous) frequency response of the filter. Is the
filter a low-pass filter or a high-pass filter?
c) Another filter S2 has (continuous) frequency response λS2 (ω) = (eiω + 2 +
e−iω )/4. Write down the filter coefficients for the filter S1 S2 .

Exercise 3.23: Composite filters


Assume that the filters S1 and S2 have the frequency responses λS1 (ω) =
2 + 4 cos(ω), λS2 (ω) = 3 sin(2ω).
a) Compute and plot the frequency response of the filter S1 S2 .
b) Write down the filter coefficients tk and the impulse response s for the filter
S1 S2 .

Exercise 3.24: Maximum and minimum


Compute and plot the continuous frequency response of the filter S = {1/4, 1/2, 1/4}.
Where does the frequency response achieve its maximum and minimum value,
and what are these values?

Exercise 3.25: Plotting a simple frequency response


Plot the continuous frequency response of the filter T = {1/4, −1/2, 1/4}. Where
does the frequency response achieve its maximum and minimum value, and what
are these values? Can you write down a connection between this frequency
response and that from Exercise 3.24?

Exercise 3.26: Continuous- and vector frequency responses


Define the filter S by S = {1, 2, 3, 4, 5, 6}. Write down the matrix for S when
N = 8. Plot (the magnitude of) λS (ω), and indicate the values λS,n for N = 8
in this plot.

Exercise 3.27: Starting with circulant matrices


Given the circulant Toeplitz matrix

$$S = \frac{1}{5}\begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & 1 & 1 & \cdots & 0 \\
0 & 1 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & \cdots & 1 \\
1 & 1 & 0 & \cdots & 1 \\
1 & 1 & 1 & \cdots & 1
\end{pmatrix}.$$

Write down the compact notation for this filter. Compute and plot (the magnitude of) λ_S(ω).

Exercise 3.28: When the filter coefficients are powers


Assume that S = {1, c, c2 , . . . , ck }. Compute and plot λS (ω) when k = 4 and
k = 8. How does the choice of k influence the frequency response? How does
the choice of c influence the frequency response?

Exercise 3.29: The Hanning window


The Hanning window is defined by wn = 1−cos(2πn/(N −1)). Compute and plot
the window coefficients and the continuous frequency response of this window for
N = 32, and compare with the window coefficients and the frequency responses
for the rectangular- and the Hamming window.

3.4 Some examples of filters


We have now established the basic theory of filters, and it is time to study some
specific examples. Some of the filters have the desirable property that they favor
certain frequencies, while annihilating others. Such filters have their own names.
Definition 3.21. Low-pass and high-pass filters.
A filter S is called

• a low-pass filter if λS (ω) is large when ω is close to 0, and λS (ω) ≈ 0


when ω is close to π (i.e. S keeps low frequencies and annihilates high
frequencies),
• a high-pass filter if λS (ω) is large when ω is close to π, and λS (ω) ≈ 0
when ω is close to 0 (i.e. S keeps high frequencies and annihilates low
frequencies),
• a bandpass filter if λS (ω) is large within some interval [a, b] ⊂ [0, 2π], and
λS (ω) ≈ 0 outside this interval.

This definition should be considered rather vague when it comes to what we


mean by “ω close to 0, π”, and “λS (ω) is large”: in practice, when we talk about
low-pass and high-pass filters, it may be that the frequency responses are still
quite far from what is commonly referred to as ideal low-pass or high-pass filters,
where the frequency response only assumes the values 0 and 1 near 0 and π.
One common application of low-pass filters is to reduce treble in sound, which
is a common option in audio systems. The treble in a sound is generated by
the fast oscillations (high frequencies) in the signal. Another option in audio
systems is to reduce the bass. This corresponds to reducing the low frequencies
in the sound, and high-pass filters are suitable for this purpose. It turns out
that there is a simple way to jump between low-pass and high-pass filters:

Observation 3.22. Passing between low-pass- and high-pass filters.


Assume that S2 is obtained by adding an alternating sign to the filter
coefficients of S1 . If S1 is a low-pass filter, then S2 is a high-pass filter, and vice
versa.
To see why this is the case, let S1 be a filter with filter coefficients tk , and let
us consider the filter S2 with filter coefficient (−1)k tk . The frequency response
of S2 is

$$\begin{aligned}
\lambda_{S_2}(\omega) &= \sum_k (-1)^k t_k e^{-i\omega k} = \sum_k (e^{-i\pi})^k t_k e^{-i\omega k} \\
&= \sum_k e^{-i\pi k} t_k e^{-i\omega k} = \sum_k t_k e^{-i(\omega+\pi)k} = \lambda_{S_1}(\omega + \pi),
\end{aligned}$$

where we have used that e^{−iπ} = −1 (note that this is nothing but Property 4. in
Theorem 2.7, with d = N/2). Now, for a low-pass filter S1 , λS1 (ω) has large
values when ω is close to 0 (the low frequencies), and values near 0 when ω is
close to π (the high frequencies). For a high-pass filter S2 , λS2 (ω) has values
near 0 when ω is close to 0 (the low frequencies), and large values when ω is
close to π (the high frequencies). Therefore, the relation λS2 (ω) = λS1 (ω + π)
says that S1 is low-pass when S2 is high-pass, and vice versa.

Example 3.30: Adding echo


An echo is a copy of the sound that is delayed and softer than the original sound.
If x is the sound, the sound z with samples given by

N,nchannels = shape(x)
z = zeros((N, nchannels))
z[0:d] = x[0:d] # No echo at the beginning of the signal
z[d:N] = x[d:N] + c*x[0:(N-d)]
z /= abs(z).max()

will include an echo of the original sound. This is an example of a filtering


operation where each output element is constructed from two input elements.

d is an integer which represents the delay in samples. If you need a delay in


t seconds, simply multiply this with the sample rate to obtain the delay d in
samples (d needs to be rounded to the nearest integer). c is a constant called
the damping factor. Since an echo is usually weaker than the original sound, we
usually have that c < 1. If we choose c > 1, the echo will dominate in the sound.
The sample file with echo added with d = 10000 and c = 0.5 can be found in
the file castanetsecho.wav.
Using our compact filter notation, the filter which adds echo can be written
as

S = {1, 0, . . . , 0, c},
where the damping factor c appears after the delay d. The frequency response of
this is λS (ω) = 1 + ce−idω , which is not real, so that the filter is not symmetric.
In Figure 3.7 we have plotted the magnitude of this frequency response with
c = 0.1 and d = 10.


Figure 3.7: The frequency response of a filter which adds an echo with damping
factor c = 0.1 and delay d = 10.

We see that the response varies between 0.9 and 1.1. The deviation from 1 is
controlled by the damping factor c, and the oscillation is controlled by the delay
d.
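A plot like the one in Figure 3.7 can be produced directly from the impulse response, using the fft-based recipe from Section 3.3 (a sketch):

import numpy as np
import matplotlib.pyplot as plt

N, d, c = 1024, 10, 0.1
s = np.zeros(N); s[0] = 1; s[d] = c   # impulse response of S = {1, 0, ..., 0, c}
omega = 2*np.pi*np.arange(N)/float(N)
plt.plot(omega, np.abs(np.fft.fft(s)))
plt.show()
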

Example 3.31: Reducing the treble with moving average filters

Let us now take a look at filters which adjust the treble in sound. The fact that
the filters are useful for these purposes will be clear when we plot the frequency
response. A general way of reducing variations in a sequence of numbers is to
replace one number by the average of itself and its neighbors, and this is easily
done with a digital sound signal. If z = (z_i)_{i=0}^{N−1} is the sound signal produced by
taking the average of three successive samples, we have that

$$z_n = \frac{1}{3}(x_{n+1} + x_n + x_{n-1}),$$

i.e. S = {1/3, 1/3, 1/3} in compact filter notation. This filter is also called a moving
average filter (with three elements). If we set N = 4, the corresponding circulant
Toeplitz matrix for the filter is

$$S = \frac{1}{3}\begin{pmatrix} 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{pmatrix}.$$
The frequency response is

λS (ω) = (eiω + 1 + e−iω )/3 = (1 + 2 cos(ω))/3.


More generally we can construct the moving average filter of 2L + 1 elements,
which is S = {1, · · · , 1, · · · , 1}/(2L + 1), where there is symmetry around 0.
Clearly then the first column of S is

$$s = (\underbrace{1, \ldots, 1}_{L+1 \text{ times}}, 0, \ldots, 0, \underbrace{1, \ldots, 1}_{L \text{ times}})/(2L+1).$$

In Example 2.2 we computed that the DFT of the vector

$$x = (\underbrace{1, \ldots, 1}_{L+1 \text{ times}}, 0, \ldots, 0, \underbrace{1, \ldots, 1}_{L \text{ times}})$$

had components

$$y_n = \frac{\sin(\pi n(2L+1)/N)}{\sin(\pi n/N)}.$$
Since s = x/(2L + 1) and λ_S = DFT_N s, the frequency response of S is

$$\lambda_{S,n} = \frac{1}{2L+1}\frac{\sin(\pi n(2L+1)/N)}{\sin(\pi n/N)},$$

so that

$$\lambda_S(\omega) = \frac{1}{2L+1}\frac{\sin((2L+1)\omega/2)}{\sin(\omega/2)}.$$
We clearly have

$$\left|\frac{1}{2L+1}\frac{\sin((2L+1)\omega/2)}{\sin(\omega/2)}\right| \le 1,$$
and this frequency response approaches 1 as ω → 0. The frequency response
thus peaks at 0, and this peak gets narrower and narrower as L increases, i.e. as

we use more and more samples in the averaging process. This filter thus “keeps”
only the lowest frequencies. When it comes to the highest frequencies it is seen
that the frequency response is small for ω ≈ π. In fact it is straightforward
to see that |λS (π)| = 1/(2L + 1). In Figure 3.8 we have plotted the frequency
response for moving average filters with L = 1, L = 5, and L = 20.
Figure 3.8: The frequency response of moving average filters with L = 1, L = 5,
and L = 20.

Unfortunately, the frequency response is far from a filter which keeps some
frequencies unaltered, while annihilating others: Although the filter distinguishes
between high and low frequencies, it slightly changes the small frequencies.
Moreover, the higher frequencies are not annihilated, even when we increase L
to high values.
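The curves in Figure 3.8 can be generated directly from the expression for λ_S(ω) (a sketch; the endpoints ω = 0 and ω = 2π are avoided since the formula there is a 0/0 expression with limit 1):

import numpy as np
import matplotlib.pyplot as plt

omega = np.linspace(0.01, 2*np.pi - 0.01, 1000)
for L in (1, 5, 20):
    resp = np.sin((2*L + 1)*omega/2)/((2*L + 1)*np.sin(omega/2))
    plt.plot(omega, resp, label='L = %d' % L)
plt.legend()
plt.show()
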

Example 3.32: Ideal low-pass filters


By definition, the ideal low-pass filter keeps frequencies near 0 unchanged, and
completely removes frequencies near π. We now have the theory in place in order
to find the filter coefficients for such a filter: In Example 2.16 we implemented the
ideal low-pass filter with the help of the DFT. Mathematically you can see that
this code is equivalent to computing (FN )H DFN where D is the diagonal matrix
with the entries 0, . . . , L and N − L, . . . , N − 1 being 1, the rest being 0. Clearly
this is a digital filter, with frequency response as stated. If the filter should keep
the angular frequencies |ω| ≤ ωc only, where ωc describes the highest frequency
we should keep, we should choose L so that ωc = 2πL/N . Again, in Example 2.2

we computed the DFT of this vector, and it followed from Theorem 2.7 that
the IDFT of this vector equals its DFT. This means that we can find the filter
coefficients by using Equation (3.10), i.e. we take an IDFT. We then get the
filter coefficients

$$\frac{1}{N}\frac{\sin(\pi k(2L+1)/N)}{\sin(\pi k/N)}.$$
This means that the filter coefficients lie as N points uniformly spaced on the
curve $\frac{1}{N}\frac{\sin(\pi t(2L+1)/2)}{\sin(\pi t/2)}$ between 0 and 1. This curve has been encountered many
other places in these notes. The filter which keeps only the frequency ω_c = 0 has
all filter coefficients equal to 1/N (set L = 0), and when we include all frequencies (set
2L + 1 = N) we get the filter where s_0 = 1 and all other filter coefficients are 0. When
we are between these two cases, we get a filter where s0 is the biggest coefficient,
while the others decrease towards 0 along the curve we have computed. The
bigger L and N are, the quicker they decrease to zero. All filter coefficients are
usually nonzero for this filter, since this curve is zero only at certain points. This
is unfortunate, since it means that the filter is time-consuming to compute.
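A small sketch of this computation (here with N = 128 and L = 32), which also checks the result against the closed-form expression above:

import numpy as np

N, L = 128, 32
lambdaS = np.zeros(N)
lambdaS[0:(L + 1)] = 1       # keep the frequencies 0, ..., L
lambdaS[(N - L):N] = 1       # and N - L, ..., N - 1
s = np.real(np.fft.ifft(lambdaS))   # the filter coefficients, i.e. the first column of S
k = np.arange(1, N)
print(np.max(np.abs(s[1:] - np.sin(np.pi*k*(2*L + 1)/N)/(N*np.sin(np.pi*k/N)))))  # close to 0
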
The two previous examples show an important duality between vectors which
are 1 on some elements and 0 on others (also called window vectors), and the
vector $\frac{1}{N}\frac{\sin(\pi k(2L+1)/N)}{\sin(\pi k/N)}$ (also called a sinc): filters of the one type correspond to
frequency responses of the other type, and vice versa. The examples also show
that, in some cases only the filter coefficients are known, while in other cases
only the frequency response is known. In any case we can deduce the one from
the other, and both cases are important.
Filters are much more efficient when there are few nonzero filter coefficients.
In this respect the second example displays a problem: in order to create filters
with particularly nice properties (such as being an ideal low-pass filter), one may
need to sacrifice computational complexity by increasing the number of nonzero
filter coefficients. The trade-off between computational complexity and desirable
filter properties is a very important issue in filter design theory.

Example 3.33: Dropping filter coefficients


In order to decrease the computational complexity for the ideal low-pass filter in
Example 3.32, one can for instance include only the first filter coefficients, i.e.

$$\left\{\frac{1}{N}\frac{\sin(\pi k(2L+1)/N)}{\sin(\pi k/N)}\right\}_{k=-N_0}^{N_0}.$$

Hopefully this gives us a filter where the frequency response is not that different
from the ideal low-pass filter. Let us set N = 128, L = 32, so that the filter
removes all frequencies ω > π/2. In Figure 3.9 we show the corresponding
frequency responses. N0 has been chosen so that the given percentage of all
coefficients are included.
This shows that we should be careful when we omit filter coefficients: if we
drop too many, the frequency response is far away from that of an ideal bandpass

Figure 3.9: The frequency response which results by including the first 1/32,
the first 1/16, the first 1/4, and and all of the filter coefficients for the ideal
low-pass filter.

filter. In particular, we see that the new frequency response oscillates wildly
near the discontinuity of the ideal low-pass filter. Such oscillations are called
Gibbs oscillations.

Example 3.34: Filters and the MP3 standard


We mentioned previously that the MP3 standard splits the sound into frequency
bands. This splitting is actually performed by particular filters, which we will
consider now. In the example above, we saw that when we dropped the last
filter coefficients in the ideal low-pass filter, there were some undesired effects
in the frequency response of the resulting filter. Are there other and better
approximations to the ideal low-pass filter which uses the same number of filter
coefficients? This question is important, since the ear is sensitive to certain
frequencies, and we would like to extract these frequencies for special processing,
using as low computational complexity as possible. In the MP3-standard, such
filters have been constructed. These filters are more advanced than the ones we
have seen up to now. They have as many as 512 filter coefficients! We will not
go into the details on how these filters are constructed, but only show how their
frequency responses look.
In the left plot in Figure 3.10, the “prototype filter” used in the MP3 standard
is shown. We see that this is very close to an ideal low-pass filter. Moreover, many


Figure 3.10: Frequency responses of some filters used in the MP3 standard.
The prototype filter is shown left. The other frequency responses at right are
simply shifted copies of this.

of the undesirable effect from the previous example have been eliminated: The
oscillations near the discontinuities are much smaller, and the values are lower
away from 0. Using Property 4 in Theorem 2.7, it is straightforward to construct
filters with similar frequency responses, but centered around different frequencies:
We simply need to multiply the filter coefficients with a complex exponential, in
order to obtain a filter where the frequency response has been shifted to the left
or right. In the MP3 standard, this observation is used to construct 32 filters,
each having a frequency response which is a shifted copy of that of the prototype
filter, so that all filters together cover the entire frequency range. 5 of these
frequency responses are shown in the right plot in Figure 3.10. To understand the
effects of the different filters, let us apply them to our sample sound. If you apply
all filters in the MP3 standard in successive order with the most low-pass filters
first, the result can be found in the file mp3bands.wav. You should interpret the
result as low frequencies first, followed by the high frequencies. π corresponds
to the frequency 22.05 kHz (i.e. the highest representable frequency equals half
the sampling rate of 44.1 kHz). The different filters are each concentrated on 1/32 of
these frequencies, so that the angular frequencies you hear are [π/64, 3π/64],
[3π/64, 5π/64], [5π/64, 7π/64], and so on, in that order.
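As a small sketch of this modulation trick (the function names are chosen here only for illustration; the actual prototype coefficients are not reproduced), shifting a frequency response amounts to multiplying the coefficients with a complex exponential:

import numpy as np

# Sketch: if a filter has coefficients t_k at indices k, then the coefficients
# t_k * e^{2 pi i k n / N} give a filter whose frequency response is the old
# one shifted by 2*pi*n/N (Property 4 of Theorem 2.7).
def shift_coefficients(t, k, n, N):
    return t * np.exp(2j * np.pi * k * n / N)

def freq_resp(t, k, omega):
    # lambda(omega) = sum_k t_k e^{-i k omega}
    return np.exp(-1j * np.outer(omega, k)).dot(t)

# freq_resp(shift_coefficients(t, k, n, N), k, omega) then equals
# freq_resp(t, k, omega - 2*pi*n/N).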
In Section 3.3.1 we mentioned that the psychoacoustic model of the MP3
standard applies a window to the sound data, followed by an FFT of that
data. This is actually performed in parallel on the same sound data. Applying
two different operations in parallel to the sound data may seem strange. In the
MP3 standard [20] (p. 109) this is explained by "the lack of spectral selectivity
obtained at low frequencies" by the filters above. In other words, the FFT can
give more precise frequency information than the filters can. This more precise
information is then used to compute psychoacoustic information such as masking
thresholds, and this information is applied to the output of the filters.

Example 3.35: Reducing the treble using Pascal's triangle


When reducing the treble it is reasonable to let the middle sample xi count more
than the neighbors in the average, so an alternative is to compute the average
by instead writing

zn = (xn−1 + 2xn + xn+1 )/4


The coefficients 1, 2, 1 here have been taken from row 2 in Pascal’s triangle. It
turns out that this is a good choice of coefficients. Also if we take averages of
more numbers it will turn out that higher rows of Pascal's triangle are good
choices. Let us take a look at why this is the case. Let S be the moving average
filter of two elements, i.e.

$$(Sx)_n = \frac{1}{2}(x_{n-1} + x_n).$$
In Example 3.31 we had an odd number of filter coefficients. Here we have only
two. We see that the frequency response in this case is
$$\lambda_S(\omega) = \frac{1}{2}(1 + e^{-i\omega}) = e^{-i\omega/2}\cos(\omega/2).$$
The frequency response is complex now, since the filter is not symmetric in this
case. Let us now apply this filter k times, and denote by S^k the resulting filter.
Theorem 3.18 gives us that the frequency response of S^k is

$$\lambda_{S^k}(\omega) = \frac{1}{2^k}(1 + e^{-i\omega})^k = e^{-ik\omega/2}\cos^k(\omega/2),$$
which is a polynomial in e^{-iω} with the coefficients taken from Pascal's triangle
(remember that the values in Pascal's triangle are the coefficients of x in the
expression (1 + x)^k, i.e. the binomial coefficients $\binom{k}{r}$ for 0 ≤ r ≤ k). At least,
this partially explains how filters with coefficients taken from Pascal's triangle
appear. The reason why these are more desirable than moving average filters,
and are used much for smoothing abrupt changes in images and in sound, is the
following: Since (1 + e^{-iω})^k is a factor in λ_{S^k}(ω), it has a zero of multiplicity k
at ω = π. In other words, when k is large, λ_{S^k} has a zero of high multiplicity at
ω = π, and this implies that the frequency response is very flat for ω ≈ π when
k increases, i.e. the filter is good at removing the highest frequencies. This can
be seen in Figure 3.11, where we have plotted the magnitude of the frequency
response when k = 5, and when k = 30. Clearly the latter frequency response is
much flatter for ω ≈ π. On the other hand, it is easy to show that the moving
average filters of Example 3.31 had a zero of multiplicity one at ω = π, regardless
of L. Clearly, the corresponding frequency responses, shown in Figure 3.8, were
not as flat for ω ≈ π, when compared to the ones in Figure 3.11.
While using S^k gives a desirable behaviour for ω ≈ π, we see that the
behaviour is not so desirable for small frequencies ω ≈ 0: Only frequencies very
close to 0 are kept unaltered. It should be possible to produce better low-pass

Figure 3.11: The frequency response of filters corresponding to iterating the
moving average filter {1/2, 1/2} k = 5 and k = 30 times (i.e. using row k in
Pascal’s triangle).

filters than this also, and the frequency responses we plotted for the filters used
in the MP3 standard give an indication of this.
Let us now see how to implement the filters S^k. Since convolution corresponds
to multiplication of polynomials, we can obtain their filter coefficients with the
following code

t = [1.]
for kval in range(k):
    t = convolve(t, [1/2., 1/2.])  # repeated convolution with [1/2, 1/2] gives row k of Pascal's triangle divided by 2^k

Note that S^k has k + 1 filter coefficients, and that S^k corresponds to the filter
coefficients of a symmetric filter when k is even. Having computed t, we can
simply compute the convolution of the input x and t. In using convolve we disregard
the circularity of S, and we introduce a time delay. These issues will, however,
not be audible when we listen to the output. An example of the result of
smoothing is shown in Figure 3.12.
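A sketch of how the pieces fit together (the function name reduce_treble is chosen here for illustration; numpy's convolve is used as above):

import numpy as np

def reduce_treble(x, k):
    # Build row k of Pascal's triangle divided by 2^k, and convolve with the sound.
    t = np.array([1.])
    for _ in range(k):
        t = np.convolve(t, [1/2., 1/2.])
    return np.convolve(t, x)    # circularity is disregarded, as noted above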
Figure 3.12: Reducing the treble. The original sound signal is shown left, the
result after filtering using row 4 in Pascal’s triangle is shown right.

The left plot shows the samples of the pure sound with frequency 440Hz
(with sampling frequency fs = 4400Hz). The right plot shows the result of

applying the averaging process by using row 4 of Pascal's triangle. We see
that the oscillations have been reduced. In Exercise 3.39 you will be asked to
implement reducing the treble in our sample audio file. If you do this you should
hear that the sound gets softer when you increase k: For k = 32 the sound can
be found in the file castanetstreble32.wav, for k = 256 it can be found in the file
castanetstreble256.wav.

Example 3.36: Reducing the bass using Pascal's triangle


Due to Observation 3.22 and Example 3.35, we can create bass-reducing filters by
adding an alternating sign to rows in Pascal's triangle. Consider the bass-reducing
filter deduced from the fourth row in Pascal's triangle:

$$z_n = \frac{1}{16}(x_{n-2} - 4x_{n-1} + 6x_n - 4x_{n+1} + x_{n+2}).$$
Let us apply this filter to the sound in Figure 3.12. The result is shown in
Figure 3.13.


Figure 3.13: The result of applying the bass-reducing filter deduced from row 4
in Pascal's triangle to the pure sound in the left plot of Figure 3.12.

We observe that the samples oscillate much more than the samples of the
original sound. In Exercise 3.39 you will be asked to implement reducing the
bass in our sample audio file. The new sound will be difficult to hear for large
k, and we will explain why later. For k = 1 the sound can be found in the file
castanetsbass1.wav, for k = 2 it can be found in the file castanetsbass2.wav.

Even if the sound is quite low, you can hear that more of the bass has disappeared
for k = 2.


Figure 3.14: The frequency response of the bass-reducing filter, which corresponds to row 5 of Pascal's triangle.

The frequency response we obtain from using row 5 of Pascal's triangle is
shown in Figure 3.14. It is just the frequency response of the corresponding
treble-reducing filter shifted by π. The alternating sign can also be achieved if
we write the frequency response \frac{1}{2^k}(1+e^{-i\omega})^k from Example 3.35 as \frac{1}{2^k}(1-e^{-i\omega})^k,
which corresponds to applying the filter (Sx)_n = \frac{1}{2}(-x_{n-1} + x_n) k times.
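A corresponding sketch for the bass-reducing filter (again the function name is chosen here for illustration): the only change is the alternating sign in the coefficients.

import numpy as np

def reduce_bass(x, k):
    # Row k of Pascal's triangle with alternating signs, divided by 2^k.
    t = np.array([1.])
    for _ in range(k):
        t = np.convolve(t, [1/2., -1/2.])
    return np.convolve(t, x)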

Exercise 3.37: Composing time delay filters


Let Ed1 and Ed2 be two time delay filters. Show that Ed1 Ed2 = Ed1 +d2 (i.e. that
the composition of two time delays again is a time delay) in two different ways:
a) Give a direct argument which uses no computations.
b) By using Property 3 in Theorem 2.7, i.e. by using a property for the Discrete
Fourier Transform.

Exercise 3.38: Adding echo filters


Consider the two filters S1 = {1, 0, . . . , 0, c} and S2 = {1, 0, . . . , 0, −c}. Both of
these can be interpreted as filters which add an echo. Show that (1/2)(S1 + S2) = I.
What is the interpretation of this relation in terms of echoes?

Exercise 3.39: Reducing bass and treble


Write code where you reduce the treble and the bass as described in Examples 3.35
and 3.36, generate the sounds you heard in these examples, and verify that they
are the same. In your code, it will be necessary to scale the values after reducing
the bass, but not the values after reducing the treble. Explain why this is the case.
How high must k be in order for you to hear a difference from the actual sound?
How high can you choose k and still recognize the sound at all? If you solved
Exercise 3.9, you can also use the function filterS to perform the filtering,
rather than using the convolve function (the latter disregards circularity).

Exercise 3.40: Constructing a high-pass filter


Consider again Example 3.32. Find an expression for a filter so that only
frequencies ω with |ω − π| < ωc are kept, i.e. the filter should only keep angular
frequencies close to π (i.e. here we construct a high-pass filter).

Exercise 3.41: Combining low-pass and high-pass filters


In this exercise we will investigate how we can combine low-pass and high-pass
filters to produce other filters.
a) Assume that S1 and S2 are low-pass filters. What kind of filter is S1 S2?
What if both S1 and S2 are high-pass filters?
b) Assume that one of S1, S2 is a high-pass filter, and that the other is a low-pass
filter. What kind of filter is S1 S2 in this case?

Exercise 3.42: Composing filters


A filter S1 has the frequency response (1/2)(1 + cos ω), and another filter S2 has the
frequency response (1/2)(1 + cos(2ω)).
a) Is S1 S2 a low-pass filter, or a high-pass filter?
b) What does the filter S1 S2 do with angular frequencies close to ω = π/2?
c) Find the filter coefficients of S1 S2 .

Hint. Use Theorem 3.18 to compute the frequency response of S1 S2 first.


d) Write down the matrix of the filter S1 S2 for N = 8.

Exercise 3.43: Composing filters


An operation describing some transfer of data in a system is defined as the
composition of the following three filters:

• First a time delay filter with delay d1 = 2, due to internal transfer of data
in the system,

• then the treble-reducing filter T = {1/4, 1/2, 1/4},


• finally a time delay filter with delay d2 = 4 due to internal transfer of the
filtered data.
We denote by T2 = Ed2 T Ed1 = E4 T E2 the operation which applies these filters
in succession.
a) Explain why T2 also is a digital filter. What is (the magnitude of) the
frequency response of Ed1 ? What is the connection between (the magnitude of)
the frequency response of T and T2 ?
b) Show that T2 = {0, 0, 0, 0, 0, 1/4, 1/2, 1/4}.

Hint. Use the expressions (E_{d1}x)_n = x_{n−d1}, (T x)_n = (1/4)x_{n+1} + (1/2)x_n + (1/4)x_{n−1},
(E_{d2}x)_n = x_{n−d2}, and compute first (E_{d1}x)_n, then (T E_{d1}x)_n, and finally
(T_2 x)_n = (E_{d2}T E_{d1}x)_n. From the last expression you should be able to read
out the filter coefficients.
c) Assume that N = 8. Write down the 8 × 8-circulant Toeplitz matrix for the
filter T2 .

Exercise 3.44: Filters in the MP3 standard


In Example 3.34, we mentioned that the filters used in the MP3-standard were
constructed from a low-pass prototype filter by multiplying the filter coefficients
with a complex exponential. Clearly this means that the new frequency response
is a shift of the old one. The disadvantage is, however, that the new filter
coefficients are complex. It is possible to address this problem as follows. Assume
that tk are the filter coefficients of a filter S1 , and that S2 is the filter with filter
coefficients cos(2πkn/N )tk , where n ∈ N. Show that
$$\lambda_{S_2}(\omega) = \frac{1}{2}\left(\lambda_{S_1}(\omega - 2\pi n/N) + \lambda_{S_1}(\omega + 2\pi n/N)\right).$$
In other words, when we multiply (modulate) the filter coefficients with a cosine,
the new frequency response can be obtained by shifting the old frequency response
with 2πn/N in both directions, and taking the average of the two.

Exercise 3.45: Explain code


a) Explain what the code below does, line by line.

x, fs = audioread('sounds/castanets.wav')
N, nchannels = shape(x)
z = zeros((N, nchannels))
for n in range(1,N-1):
    z[n] = 2*x[n+1] + 4*x[n] + 2*x[n-1]
z[0] = 2*x[1] + 4*x[0] + 2*x[N-1]
z[N-1] = 2*x[0] + 4*x[N-1] + 2*x[N-2]
z = z/abs(z).max()
play(z, fs)

Comment in particular on what happens in the three lines directly after the
for-loop, and why we do this. What kind of changes in the sound do you expect
to hear?
b) Write down the compact filter notation for the filter which is used in the
code, and write down a 5 × 5 circulant Toeplitz matrix which corresponds to
this filter. Plot the (continuous) frequency response. Is the filter a low-pass- or
a high-pass filter?
c) Another filter is given by the circulant Toeplitz matrix

$$\begin{pmatrix} 4 & -2 & 0 & 0 & -2 \\ -2 & 4 & -2 & 0 & 0 \\ 0 & -2 & 4 & -2 & 0 \\ 0 & 0 & -2 & 4 & -2 \\ -2 & 0 & 0 & -2 & 4 \end{pmatrix}.$$
Express a connection between the frequency responses of this filter and the filter
from b). Is the new filter a low-pass- or a high-pass filter?

3.5 More general filters


The starting point for defining filters at the beginning of this chapter was
equations of the form

$$z_n = \sum_k t_k x_{n-k}.$$

For most filters we have looked at, we had a limited number of nonzero tk , and this
enabled us to compute them on a computer using a finite number of additions and
multiplications. Filters which have a finite number of nonzero filter coefficients
are also called FIR-filters (FIR is short for Finite Impulse Response. Recall
that the impulse response of a filter can be found from the filter coefficients).
However, there exist many useful filters which are not FIR filters, i.e. where
the sum above is infinite. The ideal lowpass filter from Example 3.32 was one
example. It turns out that many such cases can be made computable if we
change our procedure slightly. The old procedure for computing a filter is to
compute z = Sx. Consider the following alternative:

Idea 3.23. More general filters (1).


Let x ∈ RN , and T an N × N filter. By solving the system T z = x for z we
get another filter, which we denote by S.
Of course T must then be the inverse of S (which also is a filter), but the point
is that the inverse of a filter may have a finite number of filter coefficients, even
if the filter itself does not. In such cases this new procedure is more attractive
than the old one, since the equation system can be solved with few arithmetic
operations when T has few filter coefficients.

It turns out that there also are highly computable filters where neither the
filter nor its inverse have a finite number of filter coefficients. Consider the
following idea:
Idea 3.24. More general filters (2).
Let x be the input to a filter, and let U and V be filters. By solving the
system U z = V x for z we get another filter, which we denote by S. The filter S
can be implemented in two steps: first we compute the right hand side y = V x,
and then we solve the equation U z = y.
If both U and V are invertible we have that the filter is S = U −1 V , and this
is invertible with inverse S −1 = V −1 U . The point is that, when U and V have
a finite number of filter coefficients, both S and its inverse will typically have
an infinite number of filter coefficients. The filters from this idea are thus more
general than the ones from the previous idea, and the new idea makes a wider
class of filters implementable using row reduction of sparse matrices. Computing
a filter by solving U z = V x may also give meaning when the matrices U and
V are singular: The matrix system can have a solution even if U is singular.
Therefore we should be careful in using the form S = U^{-1}V.
We have the following result concerning the frequency responses:
Theorem 3.25. Frequency response of IIR filters.
Assume that S is the filter defined from the equation U z = V x. Then we
have that λ_S(ω) = λ_V(ω)/λ_U(ω) whenever λ_U(ω) ≠ 0.

Proof. Set x = φ_n. We have that U z = λ_{U,n}λ_{S,n}φ_n, and V x = λ_{V,n}φ_n. If the
expressions are equal we must have that λ_{U,n}λ_{S,n} = λ_{V,n}, so that λ_{S,n} = λ_{V,n}/λ_{U,n}
for all n. By the definition of the continuous frequency response this means that
λ_S(ω) = λ_V(ω)/λ_U(ω) whenever λ_U(ω) ≠ 0.

The following example clarifies the points made above, and how one may
construct U and V from S. The example also shows that, in addition to making
some filters with infinitely many filter coefficients computable, the procedure
U z = V x for computing a filter can also reduce the complexity in some filters
where we already have a finite number of filter coefficients.

Example 3.46: Moving average filter


Consider again the moving average filter S from Example 3.31:

$$z_n = \frac{1}{2L+1}(x_{n+L} + \cdots + x_n + \cdots + x_{n-L}).$$
If we implemented this directly, 2L additions would be needed for each n, so
that we would need a total of 2N L additions. However, we can also write

$$\begin{aligned}
z_{n+1} &= \frac{1}{2L+1}(x_{n+1+L} + \cdots + x_{n+1} + \cdots + x_{n+1-L}) \\
&= \frac{1}{2L+1}(x_{n+L} + \cdots + x_n + \cdots + x_{n-L}) + \frac{1}{2L+1}(x_{n+1+L} - x_{n-L}) \\
&= z_n + \frac{1}{2L+1}(x_{n+1+L} - x_{n-L}).
\end{aligned}$$
This means that we can also compute the output from the formula

$$z_{n+1} - z_n = \frac{1}{2L+1}(x_{n+1+L} - x_{n-L}),$$

which can be written on the form U z = V x with U = {1, −1} and V = \frac{1}{2L+1}\{1, 0, \ldots, 0, -1\},
where the 1 is placed at index −L − 1 and the −1 is placed at index L. We now perform only 2N additions in computing the right hand
side, and solving the equation system requires only 2(N − 1) additions. The
total number of additions is thus 2N + 2(N − 1) = 4N − 2, which is much less
than the previous 2LN when L is large.
A perhaps easier way to find U and V is to consider the frequency response
of the moving average filter, which is

$$\frac{1}{2L+1}\left(e^{-Li\omega} + \cdots + e^{Li\omega}\right) = \frac{1}{2L+1}e^{-Li\omega}\,\frac{1 - e^{(2L+1)i\omega}}{1 - e^{i\omega}} = \frac{\frac{1}{2L+1}\left(e^{-Li\omega} - e^{(L+1)i\omega}\right)}{1 - e^{i\omega}},$$
where we have used the formula for the sum of a geometric series. From here
we easily see the frequency responses of U and V from the numerator and the
denominator.
Filters with an infinite number of filter coefficients are also called IIR filters
(IIR stands for Infinite Impulse Response). Thus, we have seen that some IIR
filters may still have efficient implementations.
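A sketch (with circular boundary handling and a function name chosen here for illustration) of how the recursive formula above turns the moving average into a computation whose cost is independent of L:

import numpy as np

def moving_average_recursive(x, L):
    # z_{n+1} = z_n + (x_{n+1+L} - x_{n-L}) / (2L+1); only the first output
    # needs a full sum over 2L+1 elements.
    N = len(x)
    z = np.zeros(N)
    z[0] = np.sum(x[np.arange(-L, L + 1) % N]) / float(2*L + 1)
    for n in range(N - 1):
        z[n + 1] = z[n] + (x[(n + 1 + L) % N] - x[(n - L) % N]) / float(2*L + 1)
    return z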

Exercise 3.47: A concrete IIR filter


A filter is defined by demanding that zn+2 − zn+1 + zn = xn+1 − xn .
a) Compute and plot the frequency response of the filter.
b) Use a computer to compute the output when the input vector is x =
(1, 2, . . . , 10). In order to do this you should write down two 10 × 10-circulant
Toeplitz matrices.

3.6 Implementation of filters


As we saw in Example 3.46, a filter with many filter coefficients could be
factored into the application of two simpler filters, and this could be used as
a basis for an efficient implementation. There are also several other possible
efficient implementations of filters. In this section we will consider two such
techniques. The first technique considers how we can use the DFT to speed up
the computation of filters. The second technique considers how we can factorize
a filter into a product of simpler filters.

3.6.1 Implementation of filters using the DFT


If there are k filter coefficients, a direct implementation of a filter would require
kN multiplications. Since filters are diagonalized by the DFT, one can also
compute the filter as the product S = FNH DFN . This would instead require
O (N log2 N ) complex multiplications when we use the FFT algorithm, which
may be a higher number of multiplications. We will however see that, by slightly
changing our algorithm, we may end up with a DFT-based implementation of
the filter which requires fewer multiplications.
The idea is to split the computation of the filter into smaller parts. Assume
that we compute M elements of the filter at a time. If the nonzero filter
coefficients of S are t_{−k_0}, \ldots, t_{k−k_0−1}, we have that

$$(Sx)_t = \sum_r t_r x_{t-r} = t_{-k_0}x_{t+k_0} + \cdots + t_{k-k_0-1}x_{t-(k-k_0-1)}.$$

From this it is clear that (Sx)_t only depends on x_{t−(k−k_0−1)}, \ldots, x_{t+k_0}. This
means that, if we restrict the computation of S to x_{t−(k−k_0−1)}, \ldots, x_{t+M−1+k_0},
the outputs (Sx)_t, \ldots, (Sx)_{t+M−1} will be the same as without this restriction. This
means that we can compute the output M elements at a time, at each step
multiplying with a circulant Toeplitz matrix of size (M + k − 1) × (M + k − 1). If
we choose M so that M + k − 1 = 2^r, we can use the FFT and IFFT algorithms
to compute S = F_N^H D F_N, and we require O(r2^r) multiplications for every block
of length M. The total number of multiplications is then

$$\frac{r2^r}{M}N = \frac{r2^r}{2^r - k + 1}N.$$

If k = 128, you can check on your calculator that the smallest value is for r = 10, with
value 11.4158 × N. Since the direct implementation gives kN multiplications,
this clearly gives a benefit for the new approach: it gives a 90% decrease in the
number of multiplications.
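This count is easy to check numerically (a small sketch, with the common factor N left out):

# Multiplications per output sample, r*2^r/(2^r - k + 1), for k = 128:
k = 128
for r in range(8, 14):
    M = 2**r - k + 1                     # block length so that M + k - 1 = 2^r
    print(r, r * 2**r / float(M))        # minimum is approx. 11.4158 at r = 10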

3.6.2 Factoring a filter


In practice, filters are often applied in hardware, and applied in real-time scenarios
where performance is a major issue. The most CPU-intensive tasks in such
applications often have few memory locations available. These tasks are thus not
compatible with filters with many filter coefficients, since for each output sample
we then need access to many input samples and filter coefficients. A strategy
which addresses this is to factorize the filter into the product of several smaller

filters, and then applying each filter in turn. Since the frequency response of
the product of filters equals the product of the frequency responses, we get the
following idea:
Idea 3.26. Factorizing a filter.
Let S be a filter with real coefficients. Assume that

$$\lambda_S(\omega) = Ke^{ik\omega}(e^{i\omega} - a_1)\cdots(e^{i\omega} - a_m)(e^{2i\omega} + b_1e^{i\omega} + c_1)\cdots(e^{2i\omega} + b_ne^{i\omega} + c_n). \qquad (3.14)$$

Then we can write S = KE_k A_1 \cdots A_m B_1 \cdots B_n, where A_i = {1, −a_i} and
B_i = {1, b_i, c_i}.
Note that in Equation (3.14) ai correspond to the real roots of the frequency
response, while bi , ci are obtained by pairing the complex conjugate roots. Clearly
the frequency responses of Ai , Bi equal the factors in the frequency response of
S, which in any case can be factored into the product of filters with 2 and 3
filter coefficients, followed by a time-delay.
Note that, even though this procedure factorizes a filter into smaller parts
(which is attractive for hardware implementations since smaller filters require
fewer locations in memory), the number of arithmetic operations is usually
not reduced. However, consider Example 3.35, where we factorized the treble-
reducing filters into a product of moving average filters of length 2 (all roots in
the previous idea are real, and equal). Each application of a moving average
filter of length 2 does not really require any multiplications, since multiplication
with 1/2 corresponds to a bitshift. Therefore, the factorization of Example 3.35
removes the need for doing any multiplications at all, while keeping the number
of additions the same. There are computational savings in this case, due to the
special filter structure here.
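As a sketch of Idea 3.26 (the example coefficients below are chosen here only for illustration), the roots of the frequency response polynomial can be found numerically and grouped into real roots and complex conjugate pairs:

import numpy as np

t = np.array([1., 3., 4., 2.])             # an example filter {1, 3, 4, 2}
roots = np.roots(t)                        # roots of t0 z^3 + t1 z^2 + t2 z + t3
real_roots = roots[np.abs(roots.imag) < 1e-10].real
complex_roots = roots[roots.imag > 1e-10]  # one root from each conjugate pair
factors = [np.array([1., -a]) for a in real_roots]                        # A_i = {1, -a_i}
factors += [np.array([1., -2*z.real, abs(z)**2]) for z in complex_roots]  # B_i = {1, b_i, c_i}
# Convolving all the factors recovers t up to the constant K = t[0].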

Exercise 3.48: Implementing the factorization


Write a function filterdftimpl, which takes the filter coefficients t and the
value k0 from this section, computes the optimal M , and implements the filter
as here.

Exercise 3.49: Factoring concrete filter


Factor the filter S = {1, 5, 10, 6} into a product of two filters, one with two filter
coefficients, and one with three filter coefficients.

3.7 Summary
We defined digital filters, which do the same job for digital sound as analog filters
do for (continuous) sound. Digital filters turned out to be linear transformations
diagonalized by the DFT. We proved several other equivalent characterizations

of digital filters as well, such as being time-invariant, and having a matrix


which is circulant and Toeplitz. Just as for continuous sound, digital filters are
characterized by their frequency response, which explains how the filter treats
the different frequencies. We also went through several important examples of
filters, some of which corresponded to meaningful operations on sound, such as
adjustment of bass and treble, and adding echo. We also explained that there
exist filters with useful implementations which have an infinite number of filter
coefficients, and we considered techniques for implementing filters efficiently.
Most of the topics covered here can also be found in [37]. We also took a look
at the role of filters in the MP3 standard for compression of sound.
In signal processing literature, the assumption that vectors are periodic is
often not present, and filters are thus not defined as finite-dimensional operations.
With matrix notation they would then be viewed as infinite matrices which
have the Toeplitz structure (i.e. constant values on the diagonals), but with no
circulation. The circulation in the matrices, as well as the restriction to finite
vectors, come from the assumption of a periodic vector. There are, however, also
some books which view filters as circulant Toeplitz matrices as we have done,
such as [16].

What you should have learned in this chapter.


• How to write down the circulant Toeplitz matrix from a digital filter
expression, and vice versa.
• How to find the first column of this matrix (s) from the filter coefficients
(t), and vice versa.
• The compact filter notation for filters with a finite number of filter coeffi-
cients.
• The definition of convolution, its connection with filters, and the convolve
function for computing convolution.
• Connection between applying a filter and multiplying polynomials.

• The formal definition of a digital filter in terms of having the Fourier


vectors as eigenvectors.
• The definition of the vector frequency response in terms of the correspond-
ing eigenvalues.
• The definition of time-invariance and the three equivalent characterizations
of a filter.
• For filters, eigenvalues can be computed by taking the DFT of the first
column s, and there is no need to compute eigenvectors explicitly.
• How to apply a digital filter to a sum of sines or cosines, by splitting these
into a sum of eigenvectors.

• The definition of the continuous frequency response in terms of the filter


coefficients t.
• Connection with the vector frequency response.
• Properties of the continuous frequency response, in particular that the
product of two frequency responses equals the frequency response of the
product.
• How to compute the frequency response of the product of two filters.
• How to find the filter coefficients when the continuous frequency response
is known.

• Simple examples of filters, such as time delay filters and filters which add
echo.
• Low-pass and high-pass filters and their frequency responses, and their
interpretation as treble- and bass-reducing filters. Moving average filters,
and filters arising from rows in Pascal’s triangle, as examples of such filters.

• How to pass between low-pass and high-pass filters by adding an alternating


sign to the filter coefficients.
Chapter 4

Symmetric filters and the DCT

In Chapter 1 we approximated a signal of finite duration with trigonometric


functions. Since these are all periodic, there are some undesirable effects near
the boundaries of the signal (at least when the values at the boundaries are
different), and this resulted in a slowly converging Fourier series. This was
addressed by instead considering the symmetric extension of the function, for
which we obtained a more precise Fourier representation, as fewer Fourier basis
vectors were needed in order to get a precise approximation.
This chapter is dedicated to addressing these thoughts for vectors. We will
start by defining symmetric extensions of vectors, similarly to how we defined
these for functions. Just as the Fourier series of a symmetric function was a
cosine series, we will see that the symmetric extension can be viewed as a cosine
vector. This gives rise to a different change of coordinates than the DFT, which
we will call the DCT, which enables us to express a symmetric vector as a sum
of cosine-vectors (instead of the non-symmetric complex exponentials). Since
a cosine also can be associated with a given frequency, the DCT is otherwise
similar to the DFT, in that it extracts the frequency information in the vector.
The advantage is that the DCT can give more precise frequency information
than the DFT, since it avoids the discontinuity problem of the Fourier series.
This makes the DCT very practical for applications, and we will explain some
of these applications. We will also show that the DCT has a very efficient
implementation, comparable with the FFT.
In this chapter we will also see that the DCT has a very similar role as the
DFT when it comes to filters: just as the DFT diagonalized filters, we will see
that symmetric filters can be diagonalized by the DCT, when we apply the filter
to the symmetric extension of the input. We will actually show that the filters
which preserve our symmetric extensions are exactly the symmetric filters.


4.1 Symmetric vectors and the DCT


As in Chapter 1, vectors can also be extended in a symmetric manner, besides
the simple periodic extension procedure from Figure 2.1. In Figure 4.1 we have
shown such an extension of a vector x. It has x as its first half, and a copy of x
in reverse order as its second half.
Figure 4.1: A vector and its symmetric extension.

We will call this the symmetric extension of x:


Definition 4.1. Symmetric extension of a vector.
By the symmetric extension of x ∈ RN , we mean the symmetric vector
x̆ ∈ R2N defined by

$$\breve{x}_k = \begin{cases} x_k & 0 \le k < N \\ x_{2N-1-k} & N \le k < 2N \end{cases} \qquad (4.1)$$
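A minimal sketch of this definition in code (assuming x is a numpy array):

import numpy as np

def symmetric_extension(x):
    # x followed by x in reverse order, as in Equation (4.1).
    return np.concatenate([x, x[::-1]])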

Clearly, the symmetric extension is symmetric around N − 1/2. This is not


the only way to construct a symmetric extension, as we will return to later. As
shown in Figure 4.1, but not included in Definition 4.1, we also repeat x̆ ∈ R2N
in order to obtain a periodic vector. Creating a symmetric extension is thus a
two-step process:

• First, “mirror” the vector to obtain a vector in R2N ,


• repeat this periodically to obtain a periodic vector.

The result from the first step lies in an N -dimensional subspace of all vectors in
R2N , which we will call the space of symmetric vectors. To account for the fact
that a periodic vector can have a different symmetry point than N − 1/2, let us
make the following general definition:
Definition 4.2. Symmetric vector.
We say that a periodic vector x is symmetric if there exists a number d so
that xd+k = xd−k for all k so that d + k and d − k are integers. d is called the
symmetry point of x.

Due to the inherent periodicity of x, it is clear that N must be an even


number for symmetric vectors to exist at all. d can take any value, and it may
not be an integer: It can also be an odd multiple of 1/2, because then both d + k
and d − k are integers when k also is an odd multiple of 1/2. The symmetry
point in symmetric extensions as defined in Definition 4.1 was d = N − 1/2.
This is very common in the literature, and this is why we concentrate on this in
this chapter. Later we will also consider symmetry around N − 1, as this also is
much used.
We would like to find a basis for the N -dimensional space of symmetric
vectors, and we would like this basis to be similar to the Fourier basis. Since the
Fourier basis corresponds to the standard basis in the frequency domain, we are
lead to studying the DFT of a symmetric vector. If the symmetry point is an
integer, it is straightforward to prove the following:
Theorem 4.3. Symmetric vectors with integer symmetry points.
Let d be an integer. The following are equivalent

• x is real and symmetric with d as symmetry point.


• (x̂)_n = z_n e^{−2πidn/N} where z_n are real numbers so that z_n = z_{N−n}.

Proof. Assume first that d = 0. It follows in this case from property 2a) of
Theorem 2.7 that (x̂)_n is a real vector. Combining this with property 1 of
Theorem 2.7 we see that x̂, just as x, also must be a real vector symmetric about
0. Since the DFT is one-to-one, it follows that x is real and symmetric about
0 if and only if x̂ is. From property 3 of Theorem 2.7 it follows that, when d is
an integer, x is real and symmetric about d if and only if (x̂)_n = z_n e^{−2πidn/N},
where z_n is real and symmetric about 0. This completes the proof.
Symmetric extensions were here defined by having the non-integer symmetry
point N − 1/2, however. For these we prove the following, which is slightly more
difficult.
Theorem 4.4. Symmetric vectors with non-integer symmetry points.
Let d be an odd multiple of 1/2. The following are equivalent

• x is real and symmetric with d as symmetry point.


• (x̂)_n = z_n e^{−2πidn/N} where z_n are real numbers so that z_{N−n} = −z_n.

Proof. When x is as stated we can write

$$\begin{aligned}
(\hat{x})_n &= \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} x_k e^{-2\pi i kn/N} \\
&= \frac{1}{\sqrt{N}}\left(\sum_{s\ge 0} x_{d+s}e^{-2\pi i(d+s)n/N} + \sum_{s\ge 0} x_{d-s}e^{-2\pi i(d-s)n/N}\right) \\
&= \frac{1}{\sqrt{N}}\sum_{s\ge 0} x_{d+s}\left(e^{-2\pi i(d+s)n/N} + e^{-2\pi i(d-s)n/N}\right) \\
&= \frac{1}{\sqrt{N}}e^{-2\pi i dn/N}\sum_{s\ge 0} x_{d+s}\left(e^{-2\pi i sn/N} + e^{2\pi i sn/N}\right) \\
&= \frac{1}{\sqrt{N}}e^{-2\pi i dn/N}\sum_{s\ge 0} 2x_{d+s}\cos(2\pi sn/N).
\end{aligned}$$

Here s runs through odd multiples of 1/2. Since $z_n = \frac{1}{\sqrt{N}}\sum_{s\ge 0} 2x_{d+s}\cos(2\pi sn/N)$
is a real number, we can write the result as z_n e^{−2πidn/N}. Substituting N − n
for n, we get

$$\begin{aligned}
(\hat{x})_{N-n} &= \frac{1}{\sqrt{N}}e^{-2\pi i d(N-n)/N}\sum_{s\ge 0} 2x_{d+s}\cos(2\pi s(N-n)/N) \\
&= \frac{1}{\sqrt{N}}e^{-2\pi i d(N-n)/N}\sum_{s\ge 0} 2x_{d+s}\cos(-2\pi sn/N + 2\pi s) \\
&= -\frac{1}{\sqrt{N}}e^{-2\pi i d(N-n)/N}\sum_{s\ge 0} 2x_{d+s}\cos(2\pi sn/N) = -z_n e^{-2\pi i d(N-n)/N}.
\end{aligned}$$

This shows that z_{N−n} = −z_n, and this completes one way of the proof. The
other way, we can write

$$x_k = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1}(\hat{x})_n e^{2\pi i kn/N}.$$

If (x̂)_n = z_n e^{−2πidn/N} and (x̂)_{N−n} = −z_n e^{−2πid(N−n)/N}, the sum of the n'th
term and the (N − n)'th term in the sum is

$$\begin{aligned}
&z_n e^{-2\pi i dn/N}e^{2\pi i kn/N} - z_n e^{-2\pi i d(N-n)/N}e^{2\pi i k(N-n)/N} \\
&= z_n\left(e^{2\pi i(k-d)n/N} - e^{-2\pi i d + 2\pi i dn/N - 2\pi i kn/N}\right) \\
&= z_n\left(e^{2\pi i(k-d)n/N} + e^{2\pi i(d-k)n/N}\right) = 2z_n\cos(2\pi(k-d)n/N).
\end{aligned}$$

This is real, so that all x_k are real. If we set k = d + s and k = d − s here we get

$$2z_n\cos(2\pi((d+s)-d)n/N) = 2z_n\cos(2\pi sn/N)$$
$$2z_n\cos(2\pi((d-s)-d)n/N) = 2z_n\cos(-2\pi sn/N) = 2z_n\cos(2\pi sn/N).$$

By adding terms together and comparing we must have that x_{d+s} = x_{d−s}, and
the proof is done.
Now, let us specialize to symmetric extensions as defined in Definition 4.1,
i.e. where d = N − 1/2. The following result gives us an orthonormal basis for
the symmetric extensions, which are very simple in the frequency domain:
Theorem 4.5. Orthonormal basis for symmetric vectors.
The set of all x symmetric around N − 1/2 is a vector space of dimension N ,
and we have that
$$\left\{e_0, \left\{\frac{1}{\sqrt{2}}\left(e^{\pi i n/(2N)}e_n + e^{-\pi i n/(2N)}e_{2N-n}\right)\right\}_{n=1}^{N-1}\right\}$$

is an orthonormal basis for x̂ where x is symmetric around N − 1/2.

Proof. For a vector x symmetric about d = N − 1/2 we know that

$$(\hat{x})_n = z_n e^{-2\pi i(N-1/2)n/(2N)},$$

and the only requirement on the vector z is the antisymmetry condition z_{2N−n} = −z_n.
The vectors z_i = (1/√2)(e_i − e_{2N−i}), 1 ≤ i ≤ N − 1, together with the vector
z_0 = e_0, are clearly orthonormal and satisfy the antisymmetry condition. From
these we obtain that

$$\left\{e_0, \left\{\frac{1}{\sqrt{2}}\left(e^{-2\pi i(N-1/2)n/(2N)}e_n - e^{-2\pi i(N-1/2)(2N-n)/(2N)}e_{2N-n}\right)\right\}_{n=1}^{N-1}\right\}$$

is an orthonormal basis for the x̂ with x symmetric. We can write

$$\begin{aligned}
&\frac{1}{\sqrt{2}}\left(e^{-2\pi i(N-1/2)n/(2N)}e_n - e^{-2\pi i(N-1/2)(2N-n)/(2N)}e_{2N-n}\right) \\
&= \frac{1}{\sqrt{2}}\left(e^{-\pi i n}e^{\pi i n/(2N)}e_n + e^{\pi i n}e^{-\pi i n/(2N)}e_{2N-n}\right) \\
&= \frac{1}{\sqrt{2}}e^{\pi i n}\left(e^{\pi i n/(2N)}e_n + e^{-\pi i n/(2N)}e_{2N-n}\right).
\end{aligned}$$

This also means that

$$\left\{e_0, \left\{\frac{1}{\sqrt{2}}\left(e^{\pi i n/(2N)}e_n + e^{-\pi i n/(2N)}e_{2N-n}\right)\right\}_{n=1}^{N-1}\right\}$$

is an orthonormal basis.

We immediately get the following result:


Theorem 4.6. Orthonormal basis for symmetric vectors.
We have that

$$\left\{\frac{1}{\sqrt{2N}}\cos\left(2\pi\frac{0}{2N}\left(k+\frac{1}{2}\right)\right), \left\{\frac{1}{\sqrt{N}}\cos\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right)\right\}_{n=1}^{N-1}\right\} \qquad (4.2)$$

is an orthonormal basis for the set of vectors symmetric around N − 1/2 in R2N .
Moreover, the n’th vector in this basis has frequency contribution only from the
indices n and 2N − n.

Proof. Since the IDFT is unitary, the IDFT applied to the vectors above gives
an orthonormal basis for the set of symmetric extensions. We get that

    
1 1 1 1 0 1
(F2N )H (e0 ) = √ ,√ ,..., √ =√ cos 2π k+ .
2N 2N 2N 2N 2N 2

We also get that

 
1 
(F2N )H √ eπin/(2N ) en + e−πin/(2N ) e2N −n
2
 
1 1 2πink/(2N ) 1 2πi(2N −n)k/(2N )
=√ eπin/(2N ) √ e + e−πin/(2N ) √ e
2 2N 2N
 
1 1 2πink/(2N ) 1 −2πink/(2N )
=√ eπin/(2N ) √ e + e−πin/(2N ) √ e
2 2N 2N
  
1   1 n 1
= √ e2πi(n/(2N ))(k+1/2) + e−2πi(n/(2N ))(k+1/2) = √ cos 2π k+ .
2 N N 2N 2

Since F2N is unitary, and thus preserves the scalar product, the given vectors
are orthonormal.
We need to address one final thing before we can define the DCT: The vector
x we start with is in RN , but the vectors above are in R2N . We would like
to have orthonormal vectors in RN , so that we can use them to decompose
x. It is possible to show with a direct argument that, when we restrict the
vectors above to the first N elements, they are still orthogonal. We will, however,
apply a more instructive argument to show this, which gives us some intuition
into the connection with symmetric filters. We start with the following result,
which shows that a filter preserves symmetric vectors if and only if the filter is
symmetric.
Theorem 4.7. Criteria for preserving symmetric vectors.
Let S be a filter. The following are equivalent

• S preserves symmetric vectors (i.e. Sx is a symmetric vector whenever x


is).
• The set of filter coefficients of S is a symmetric vector.

Also, when S preserves symmetric vectors, the following hold:

• The vector of filter coefficients has an integer symmetry point if and only
if the input and output have the same type (integer or non-integer) of
symmetry point.
• The input and output have the same symmetry point if and only if the
filter is symmetric.

Proof. Assume that the filter S maps a symmetric vector with symmetry at d_1
to another symmetric vector. Let x be the symmetric vector so that (x̂)_n =
e^{−2πid_1n/N} for n < N/2. Since the output is a symmetric vector, we must have
that

$$\lambda_{S,n} e^{-2\pi i d_1 n/N} = z_n e^{-2\pi i d_2 n/N}$$

for some d_2, z_n, and for n < N/2. But this means that λ_{S,n} = z_n e^{−2πi(d_2−d_1)n/N}.
Similar reasoning applies for n > N/2, so that λ_{S,n} clearly equals ŝ for some
symmetric vector s from Theorems 4.3 and 4.4. This vector equals (up to
multiplication with √N) the filter coefficients of S, which therefore is symmetric.
Moreover, it is clear that the filter coefficients have an integer symmetry point if
and only if the input and output vector either both have an integer symmetry
point, or both a non-integer symmetry point.
Since the filter coefficients of a filter which preserves symmetric vectors
also form a symmetric vector, this means that its frequency response takes the
form λ_{S,n} = z_n e^{−2πidn/N}, where z is a real vector. This means that the phase
(argument) of the frequency response is −2πdn/N or π − 2πdn/N, depending on
the sign of z_n. In other words, the phase is linear in n. Filters which preserve
symmetric vectors are therefore also called linear phase filters.
Note also that the case d = 0 or d = N − 1/2 corresponds to symmetric
filters. An example of linear phase filters which are not symmetric are smoothing
filters where the coefficients are taken from odd rows in Pascal’s triangle.
When S is symmetric, it preserves symmetric extensions, so that it makes
sense to restrict S to symmetric vectors. We therefore make the following
definition.

Definition 4.8. Symmetric restriction.


Assume that S : R2N → R2N is a symmetric filter. We define Sr : RN → RN
as the mapping which sends x ∈ RN to the first N components of the vector
S x̆. Sr is also called the symmetric restriction of S.

Sr is clearly linear, and the restriction of S to vectors symmetric about


N − 1/2 is characterized by Sr . We continue with the following result:
Theorem 4.9. Expression for Sr .
Assume that S : R^{2N} → R^{2N} is a symmetric filter, and that

$$S = \begin{pmatrix} S_1 & S_2 \\ S_3 & S_4 \end{pmatrix}.$$
Then Sr is symmetric, and Sr = S1 + (S2 )f , where (S2 )f is the matrix S2 with
the columns reversed.
Proof. With S as in the text of the theorem, we compute

$$\begin{aligned}
S_r x &= \begin{pmatrix} S_1 & S_2 \end{pmatrix}\begin{pmatrix} x_0 \\ \vdots \\ x_{N-1} \\ x_{N-1} \\ \vdots \\ x_0 \end{pmatrix}
= S_1\begin{pmatrix} x_0 \\ \vdots \\ x_{N-1} \end{pmatrix} + S_2\begin{pmatrix} x_{N-1} \\ \vdots \\ x_0 \end{pmatrix} \\
&= S_1\begin{pmatrix} x_0 \\ \vdots \\ x_{N-1} \end{pmatrix} + (S_2)^f\begin{pmatrix} x_0 \\ \vdots \\ x_{N-1} \end{pmatrix} = (S_1 + (S_2)^f)x,
\end{aligned}$$

so that S_r = S_1 + (S_2)^f. Since S is symmetric, S_1 is also symmetric. (S_2)^f is
also symmetric, since it is constant on anti-diagonals. It follows then that S_r is
also symmetric. This completes the proof.
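A small numerical check of this result (a sketch; the filter {1/4, 1/2, 1/4} and the use of scipy.linalg.circulant are choices made here for illustration):

import numpy as np
from scipy.linalg import circulant

N = 4
col = np.zeros(2 * N)
col[[0, 1, 2*N - 1]] = [1/2., 1/4., 1/4.]   # first column of the filter matrix S
S = circulant(col)
S1, S2 = S[:N, :N], S[:N, N:]
Sr = S1 + S2[:, ::-1]                       # S_r = S_1 + (S_2)^f
print(np.allclose(Sr, Sr.T))                # S_r is symmetric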
Note that Sr is not a digital filter, since its matrix is not circulant. In
particular, its eigenvectors are not pure tones. In the block matrix factorization
of S, S2 contains the circulant part of the matrix, and forming (S2 )f means that
the circulant parts switch corners. With the help of Theorem 4.9 we can finally
establish the orthogonality of the cosine-vectors in RN .
Corollary 4.10. Basis of eigenvectors for Sr .
Let S be a symmetric filter, and let Sr be the mapping defined in Theorem 4.9.
Define

$$d_{n,N} = \begin{cases} \sqrt{1/N} & n = 0 \\ \sqrt{2/N} & 1 \le n < N \end{cases}$$

and $d_n = d_{n,N}\cos\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right)$ for 0 ≤ n ≤ N − 1. Then {d_0, d_1, \ldots, d_{N−1}}
is an orthonormal basis of eigenvectors for S_r.

Proof. Let S be a symmetric filter of length 2N. We know then that λ_{S,n} = λ_{S,2N−n}, so that

$$\begin{aligned}
S\left(\cos\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right)\right)
&= S\left(\frac{1}{2}\left(e^{2\pi i(n/(2N))(k+1/2)} + e^{-2\pi i(n/(2N))(k+1/2)}\right)\right) \\
&= \frac{1}{2}\left(e^{\pi i n/(2N)}S\left(e^{2\pi i nk/(2N)}\right) + e^{-\pi i n/(2N)}S\left(e^{-2\pi i nk/(2N)}\right)\right) \\
&= \frac{1}{2}\left(e^{\pi i n/(2N)}\lambda_{S,n}e^{2\pi i nk/(2N)} + e^{-\pi i n/(2N)}\lambda_{S,2N-n}e^{-2\pi i nk/(2N)}\right) \\
&= \frac{1}{2}\left(\lambda_{S,n}e^{2\pi i(n/(2N))(k+1/2)} + \lambda_{S,2N-n}e^{-2\pi i(n/(2N))(k+1/2)}\right) \\
&= \frac{1}{2}\lambda_{S,n}\left(e^{2\pi i(n/(2N))(k+1/2)} + e^{-2\pi i(n/(2N))(k+1/2)}\right) \\
&= \lambda_{S,n}\cos\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right),
\end{aligned}$$

where we have used that e^{2πink/(2N)} is an eigenvector of S with eigenvalue
λ_{S,n}, and e^{−2πink/(2N)} = e^{2πi(2N−n)k/(2N)} is an eigenvector of S with eigenvalue
λ_{S,2N−n}. This shows that the vectors are eigenvectors for symmetric filters of
length 2N. It is also clear that the first half of the vectors must be eigenvectors
for S_r with the same eigenvalue, since when y = Sx = λ_{S,n}x, we also have that

$$(y_0, y_1, \ldots, y_{N-1}) = S_r(x_0, x_1, \ldots, x_{N-1}) = \lambda_{S,n}(x_0, x_1, \ldots, x_{N-1}).$$

To see why these vectors are orthogonal, choose at the outset a symmetric filter
where $\{\lambda_{S,n}\}_{n=0}^{N-1}$ are distinct. Then the cosine-vectors of length N are also
eigenvectors with distinct eigenvalues, and they must be orthogonal since S_r is
symmetric. Moreover, since

$$\begin{aligned}
\sum_{k=0}^{2N-1}\cos^2\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right)
&= \sum_{k=0}^{N-1}\cos^2\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right) + \sum_{k=N}^{2N-1}\cos^2\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right) \\
&= \sum_{k=0}^{N-1}\cos^2\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right) + \sum_{k=0}^{N-1}\cos^2\left(2\pi\frac{n}{2N}\left(k+N+\frac{1}{2}\right)\right) \\
&= \sum_{k=0}^{N-1}\cos^2\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right) + (-1)^{2n}\sum_{k=0}^{N-1}\cos^2\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right) \\
&= 2\sum_{k=0}^{N-1}\cos^2\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right),
\end{aligned}$$

where we used that cos(x + nπ) = (−1)^n cos x. This means that

$$\left\|\left\{\cos\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right)\right\}_{k=0}^{2N-1}\right\| = \sqrt{2}\,\left\|\left\{\cos\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right)\right\}_{k=0}^{N-1}\right\|.$$

Thus, in order to make the vectors orthonormal when we consider the first N
elements instead of all 2N elements, we need to multiply with √2. This gives
us the vectors d_n as defined in the text of the theorem. This completes the
proof.

We now clearly see the analogy between symmetric functions and vectors:
while the first can be written as a sum of cosine-functions, the second can be
written as a sum of cosine-vectors. The orthogonal basis we have found is given
its own name:

Definition 4.11. DCT basis.


We denote by DN the orthogonal basis {d0 , d1 , . . . , dN −1 }. We also call DN
the N -point DCT basis.
Using the DCT basis instead of the Fourier basis we can make the following
definitions, which parallel those for the DFT:

Definition 4.12. Discrete Cosine Transform.


The change of coordinates from the standard basis of RN to the DCT basis
DN is called the discrete cosine transform (or DCT). The N × N matrix DCTN
that represents this change of basis is called the (N -point) DCT matrix. If x is
a vector in RN , its coordinates y = (y0 , y1 , . . . , yN −1 ) relative to the DCT basis
are called the DCT coefficients of x (in other words, y = DCTN x).

Note that we can also write

$$\mathrm{DCT}_N = \sqrt{\frac{2}{N}}\begin{pmatrix} 1/\sqrt{2} & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}\left(\cos\left(2\pi\frac{n}{2N}(k+1/2)\right)\right). \qquad (4.3)$$

Since this matrix is orthogonal, it is immediate that

$$\left(\cos\left(2\pi\frac{n}{2N}(k+1/2)\right)\right)^{-1} = \frac{2}{N}\left(\cos\left(2\pi\frac{n+1/2}{2N}k\right)\right)\begin{pmatrix} 1/2 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \qquad (4.4)$$

$$\left(\cos\left(2\pi\frac{n+1/2}{2N}k\right)\right)^{-1} = \frac{2}{N}\begin{pmatrix} 1/2 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}\left(\cos\left(2\pi\frac{n}{2N}(k+1/2)\right)\right). \qquad (4.5)$$
In other words, not only can DCTN be directly expressed in terms of a cosine-
matrix, but our developments helped us to express the inverse of a cosine
matrix in terms of other cosine-matrices. In the literature different types of
cosine-matrices have been useful:
I Cosine-matrices with entries cos(2πnk/(2(N − 1))).
II Cosine-matrices with entries cos(2πn(k + 1/2)/(2N )).
III Cosine-matrices with entries cos(2π(n + 1/2)k/(2N )).
IV Cosine-matrices with entries cos(2π(n + 1/2)(k + 1/2)/(2N )).
We will call these type-I, type-II, type-III, and type-IV cosine-matrices, respec-
tively. What we did above handles the case of type-II cosine-matrices. It will
turn out that not all of these cosine-matrices are orthogonal, but that we in all
cases, as we did above for type-II cosine matrices, can express the inverse of a
cosine-matrix of one type in terms of a cosine-matrix of another type, and that
any cosine-matrix is easily expressed in terms of an orthogonal matrix. These
orthogonal matrices will be called DCT_N^{(I)}, DCT_N^{(II)}, DCT_N^{(III)}, and DCT_N^{(IV)},
respectively, and they are all called DCT-matrices. The DCT_N we constructed
above is thus DCT_N^{(II)}. The type-II DCT matrix is the most commonly used,
and the type is therefore often dropped when referring to these. We will consider
the other cases of cosine-matrices at different places in this book: In the next
chapter we will run into type-I cosine matrices, in connection with a different ex-
tension strategy used for wavelets. Type-IV cosine-matrices will be encountered
in exercises 4.5 and 4.6 at the end of this section.

As with the Fourier basis vectors, the DCT basis vectors are called synthesis
vectors, since we can write

x = y0 d0 + y1 d1 + · · · + yN −1 dN −1 (4.6)
in the same way as for the DFT. Following the same reasoning as for the DFT,
DCT_N^{-1} is the matrix where the d_n are columns. But since these vectors are real
and orthonormal, DCT_N must be the matrix where the d_n are rows. Moreover,
since Theorem 4.9 also states that the same vectors are eigenvectors for filters
which preserve symmetric extensions, we can state the following:
Theorem 4.13. The DCT is orthogonal.
DCT_N is the orthogonal matrix where the rows are d_n. Moreover, for any
digital filter S which preserves symmetric extensions, (DCT_N)^T diagonalizes S_r,
i.e. S_r = DCT_N^T D DCT_N where D is a diagonal matrix.

Let us also make the following definition:


Definition 4.14. IDCT.
We will call x = (DCT_N)^T y the inverse DCT (or IDCT) of y.
Matlab's functions for computing the DCT and IDCT are called dct and
idct, respectively. These are defined exactly as they are here, contrary to the
case for the FFT (where a different normalizing factor was used).
With these functions we can repeat examples 2.16-2.18, by simply replacing
the calls to DFTImpl with calls to the DCT counterparts. You may not hear
much improvement in these simple experiments, but in theory the DCT should
be able to approximate sound better.
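A sketch (not part of the book's code) of writing down DCT_N directly from Definition 4.12 and checking that it is orthogonal:

import numpy as np

def dct_matrix(N):
    n = np.arange(N).reshape(-1, 1)
    k = np.arange(N).reshape(1, -1)
    D = np.sqrt(2. / N) * np.cos(2 * np.pi * n * (k + 0.5) / (2 * N))
    D[0, :] /= np.sqrt(2)       # the first row uses d_{0,N} = sqrt(1/N)
    return D

D = dct_matrix(8)
print(np.allclose(np.dot(D, D.T), np.eye(8)))   # True: the DCT matrix is orthogonal
# scipy.fftpack.dct(x, norm='ortho') should give the same result as np.dot(D, x).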
Similarly to the DFT, one can think of the DCT as a least squares approx-
imation and the unique representation of a function having the same sample
values, but this time in terms of sinusoids instead of complex exponentials:
Theorem 4.15. Interpolation with the DCT basis.
Let f be a function defined on the interval [0, T ], and let x be the sampled
vector given by

xk = f ((2k + 1)T /(2N )) for k = 0, 1, . . . , N − 1.


There is exactly one linear combination g(t) of the form

$$\sum_{n=0}^{N-1} y_n d_{n,N}\cos(2\pi(n/2)t/T)$$

which satisfies the conditions

g((2k + 1)T /(2N )) = f ((2k + 1)T /(2N )), k = 0, 1, . . . , N − 1,

and its coefficients are determined by y = DCTN x.



Proof. This follows by inserting t = (2k + 1)T/(2N) in the equation

$$g(t) = \sum_{n=0}^{N-1} y_n d_{n,N}\cos(2\pi(n/2)t/T)$$

to arrive at the equations

$$f((2k+1)T/(2N)) = \sum_{n=0}^{N-1} y_n d_{n,N}\cos\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right), \qquad 0 \le k \le N-1.$$

This gives us an equation system for finding the y_n with the invertible DCT
matrix as coefficient matrix, and the result follows.
There is thus a slight difference from how we applied the DFT, due to the
subtle change in the sample points: from kT/N for the DFT to (2k + 1)T/(2N)
for the DCT. The sample points for the DCT are thus the midpoints on the
intervals in a uniform partition of [0, T ] into N intervals, while they for the DFT
are the start points on the intervals. Also, the frequencies are divided by 2. In
Figure 4.2 we have plotted the sinusoids of Theorem 4.15 for T = 1, as well as
the sample points used in that theorem.
The sample points in the upper left plot correspond to the first column in the
DCT matrix, the sample points in the upper right plot to the second column of
the DCT matrix, and so on (up to normalization with dn,N ). As n increases, the
functions oscillate more and more. As an example, y5 says how much content of
maximum oscillation there is. In other words, the DCT of an audio signal shows
the proportion of the different frequencies in the signal, and the two formulas
y = DCTN x and x = (DCTN )T y allow us to switch back and forth between
the time domain representation and the frequency domain representation of the
sound. In other words, once we have computed y = DCTN x, we can analyse
the frequency content of x. If we want to reduce the bass we can decrease the
y-values with small indices and if we want to increase the treble we can increase
the y-values with large indices.
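A sketch of the interpolation in Theorem 4.15 (the choices T = 1, N = 8 and the test function f are made here only for illustration, and scipy's orthonormal DCT is assumed to match DCT_N):

import numpy as np
from scipy.fftpack import dct

T, N = 1.0, 8
f = lambda t: np.sin(2 * np.pi * t) + 0.3 * np.cos(6 * np.pi * t)
k = np.arange(N)
x = f((2 * k + 1) * T / (2 * N))           # samples at the midpoints
y = dct(x, norm='ortho')                   # y = DCT_N x
d_nN = np.full(N, np.sqrt(2. / N)); d_nN[0] = np.sqrt(1. / N)
g = lambda t: sum(y[n] * d_nN[n] * np.cos(2 * np.pi * (n / 2.) * t / T) for n in range(N))
print(np.allclose(g((2 * k + 1) * T / (2 * N)), x))   # g interpolates f at the sample points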

Example 4.1: Computing lower order DCTs


As with Example 2.3, exact expressions for the DCT can be written down just
for a few specific cases. It turns out that the case N = 4 as considered in
Example 2.3 does not give the same type of nice, exact values, so let us instead
consider the case N = 2. We have that

$$\mathrm{DCT}_2 = \begin{pmatrix} \frac{1}{\sqrt{2}}\cos(0) & \frac{1}{\sqrt{2}}\cos(0) \\ \cos\left(\frac{\pi}{2}\left(0+\frac{1}{2}\right)\right) & \cos\left(\frac{\pi}{2}\left(1+\frac{1}{2}\right)\right) \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}$$

The DCT of the same vector as in Example 2.3 can now be computed as:
$$\mathrm{DCT}_2\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} \frac{3}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{pmatrix}.$$

Figure 4.2: The 6 different sinusoids used in the DCT for N = 6, i.e. cos(2π(n/2)t), 0 ≤ n < 6. The plots also show piecewise linear functions (in red) between the sample points (2k + 1)/(2N), 0 ≤ k < 6, since only the values at these points are used in Theorem 4.15.

Exercise 4.2: Computing eigenvalues


Consider the matrix

 
$$S = \frac{1}{3}\begin{pmatrix} 2 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{pmatrix}$$
a) Compute the eigenvalues and eigenvectors of S using the results of this
section. You should only need to perform one DFT or one DCT in order to
achieve this.
b) Use a computer to compute the eigenvectors and eigenvalues of S also. What
are the differences from what you found in a)?
c) Find a filter T so that S = Tr . What kind of filter is T ?

Exercise 4.3: Writing down lower order Sr


Consider the averaging filter S = {1/4, 1/2, 1/4}. Write down the matrix S_r for the
case when N = 4.

Exercise 4.4: Writing down lower order DCTs


As in Example 4.1, state the exact cartesian form of the DCT matrix for the
case N = 3.

Exercise 4.5: DCT-IV


Show that the vectors $\left\{\cos\left(2\pi\frac{n+1/2}{2N}\left(k+\frac{1}{2}\right)\right)\right\}_{n=0}^{N-1}$ in R^N are orthogonal, with
lengths $\sqrt{N/2}$. This means that the matrix with entries $\sqrt{\frac{2}{N}}\cos\left(2\pi\frac{n+1/2}{2N}\left(k+\frac{1}{2}\right)\right)$
is orthogonal. Since this matrix also is symmetric, it is its own inverse. This is
the DCT-IV, which we denote by DCT_N^{(IV)}. Although we will not consider this,
the DCT-IV also has an efficient implementation.

Hint. Compare with the orthogonal vectors dn , used in the DCT.

Exercise 4.6: MDCT


The MDCT is defined as the N ×(2N )-matrix M with elements Mn,k = cos(2π(n+
1/2)(k + 1/2 + N/2)/(2N )). This exercise will take you through the details of
the transformation which corresponds to multiplication with this matrix. The
MDCT is very useful, and is also used in the MP3 standard and in more recent
standards.

a) Show that

$$M = \sqrt{\frac{N}{2}}\,\mathrm{DCT}_N^{(IV)}\begin{pmatrix} 0 & A \\ B & 0 \end{pmatrix}$$

where A and B are the (N/2) × N-matrices

$$A = \begin{pmatrix}
\cdots & \cdots & 0 & -1 & -1 & 0 & \cdots & \cdots \\
\vdots & & & & & & & \vdots \\
0 & -1 & \cdots & \cdots & \cdots & \cdots & -1 & 0 \\
-1 & 0 & \cdots & \cdots & \cdots & \cdots & 0 & -1
\end{pmatrix} = \begin{pmatrix} -I_{N/2}^f & -I_{N/2} \end{pmatrix}$$

$$B = \begin{pmatrix}
1 & 0 & \cdots & \cdots & \cdots & \cdots & 0 & -1 \\
0 & 1 & \cdots & \cdots & \cdots & \cdots & -1 & 0 \\
\vdots & & & & & & & \vdots \\
\cdots & \cdots & 0 & 1 & -1 & 0 & \cdots & \cdots
\end{pmatrix} = \begin{pmatrix} I_{N/2} & -I_{N/2}^f \end{pmatrix}.$$

Due to this expression, any algorithm for the DCT-IV can be used to compute
the MDCT.
b) The MDCT is not invertible, since it is not a square matrix. We will show
here that it still can be used in connection with invertible transformations. We
first define the IMDCT as the matrix M T /N . Transposing the matrix expression
we obtained in a) gives

$$\frac{1}{\sqrt{2N}}\begin{pmatrix} 0 & B^T \\ A^T & 0 \end{pmatrix}\mathrm{DCT}_N^{(IV)}$$

for the IMDCT, which thus also has an efficient implementation. Show that if

$$x_0 = (x_0, \ldots, x_{N-1}), \quad x_1 = (x_N, \ldots, x_{2N-1}), \quad x_2 = (x_{2N}, \ldots, x_{3N-1}),$$

and

$$y_{0,1} = M\begin{pmatrix} x_0 \\ x_1 \end{pmatrix}, \qquad y_{1,2} = M\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

(i.e. we compute two MDCT's where half of the data overlap), then

$$x_1 = \{\mathrm{IMDCT}(y_{0,1})\}_{k=N}^{2N-1} + \{\mathrm{IMDCT}(y_{1,2})\}_{k=0}^{N-1}.$$

Even though the MDCT itself is not invertible, the input can still be recovered
from overlapping MDCT’s.

4.2 Improvements using the DCT for interpolation
Recall that, in Section 3.2.2, we explained how to approximate an analog filter
from the samples. It turns out that, when an analog filter is symmetric, we can
use symmetric extensions to create a better approximation from the samples.
Assume that s is an analog filter, and that we apply it to a general function
f . Denote as before the symmetric extension of f by f˘. We start with the
following observation, which follows from the continuity of s.
Observation 4.16. Using symmetric extensions for approximations.
Since (f̆)_N is a better approximation to f̆, compared to what f_N is to f,
s((f̆)_N) is a better approximation to s(f̆), compared to what s(f_N) is to s(f).

Since s(f˘) agrees with s(f ) except near the boundaries, we can thus conclude
that s((f˘)N ) is a better approximation to s(f ) than what s(fN ) is.
We have seen that the restriction of s to VM,T is equivalent to an N × N
digital filter S, where N = 2M + 1. Let x be the samples of f , x̆ the samples of
f˘. Turning around the fact that (f˘)N is a better approximation to f˘, compared
to what fN is to f , the following is clear.
Observation 4.17. Using symmetric extensions for approximations.
The samples x̆ are a better approximation to the samples of (f˘)N , than the
samples x are to the samples of fN .
Now, let z = Sx, and z̆ = S x̆. The following is also clear from the preceding
observation, due to continuity of the digital filter S.
Observation 4.18. Using symmetric extensions for approximations.
z̆ is a better approximation to S(samples of (f˘)N ) = samples of s((f˘)N ),
than z is to S(samples of fN ) = samples of s(fN ).

Since by Observation 4.16 s((f˘)N ) is a better approximation to the output


s(f ), we conclude that z̆ is a better approximation than z to the samples of the
output of the filter.
Observation 4.19. Using symmetric extensions for approximations.
S x̆ is a better approximation to the samples of s(f ) than Sx is (x are the
samples of f ).
Now, let us also bring in the assumption that s is symmetric. Then the
corresponding digital filter S is also symmetric, and we know then that we can
view its restriction to symmetric extensions in R2N in terms of the mapping
Sr : RN → RN . We can thus specialize Figure 3.3 to symmetric filters by adding
the step of creating the symmetric extension, and replacing S with Sr . We have
summarized these remarks in Figure 4.3. The DCT appears here, since we have
used Theorem 4.15 to interpolate with the DCT basis, instead of the Fourier
basis. Note that this also requires that the sampling is performed as required
Figure 4.3: The connections between the new mapping $S_r$, sampling, and
interpolation ($f$ is sampled to $(\breve x_0, \breve x_1, \ldots, \breve x_{N-1})$, $S_r$ produces
$(\breve z_0, \breve z_1, \ldots, \breve z_{N-1})$, and $\mathrm{DCT}_N$ produces $y$). The right vertical arrow represents interpolation with the DCT,
i.e. that we compute $\sum_{n=0}^{N-1} y_n d_{n,N}\cos(2\pi(n/2)t/T)$ for values of $t$.

in that theorem, i.e. the samples are the midpoints on all intervals. This new
sampling procedure is not indicated in Figure 4.3.
Figure 4.3 can be further simplified to that shown in Figure 4.4.

Figure 4.4: Simplification of Figure 4.3 ($x \xrightarrow{S_r} z \xrightarrow{\mathrm{DCT}_N} y$, followed by interpolation). The left vertical arrow represents
sampling as dictated by the DCT.

Note that the assumption that $s$ is symmetric only helped us to implement
the approximation more efficiently, since $S_r$ has $N$ points and $S$ has $2N$
points. This approximation can in any case be used, even if $s$ is not
symmetric, but the mapping does not then preserve symmetry.
As mentioned in Section 3.2, interpolation of a function from its samples can
be seen as a special case. This can thus be illustrated as in Figure 4.5.
Note that the approximation lies in V2M,2T (i.e. it is in a higher order Fourier
space), but the point is that the same number of samples is used.
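As a small illustration (a sketch, not the book's code), the following builds $\mathrm{DCT}_N$ directly from its definition, samples an arbitrary function at the midpoints, and checks that the cosine interpolant $\sum_n y_nd_{n,N}\cos(2\pi(n/2)t/T)$ from the caption of Figure 4.3 reproduces the samples; the helper name dct_matrix is made up for this sketch:

import numpy as np

def dct_matrix(N):
    # (DCT_N)_{n,k} = d_{n,N} cos(2*pi*(n/(2N))*(k+1/2)), d_{0,N}=sqrt(1/N), d_{n,N}=sqrt(2/N)
    n = np.arange(N).reshape(-1, 1)
    k = np.arange(N).reshape(1, -1)
    d = np.full(N, np.sqrt(2.0/N)); d[0] = np.sqrt(1.0/N)
    return d.reshape(-1, 1)*np.cos(2*np.pi*n/(2*N)*(k + 0.5))

N, T = 16, 1.0
f = lambda t: t*(1 - t)**2                 # an arbitrary continuous function on [0, T]
t_mid = (np.arange(N) + 0.5)*T/N           # sampling at the midpoints of the N subintervals
x = f(t_mid)
y = dct_matrix(N).dot(x)

d = np.full(N, np.sqrt(2.0/N)); d[0] = np.sqrt(1.0/N)
interp = lambda t: np.sum(y*d*np.cos(2*np.pi*(np.arange(N)/2.0)*np.atleast_1d(t)[:, None]/T), axis=1)
print(np.max(np.abs(interp(t_mid) - x)))   # ~1e-15: the interpolant reproduces the samples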

4.2.1 Implementations of symmetric filters


Symmetric filters are also important for applications since they can be imple-
mented efficiently. To see this, we can write
Figure 4.5: How we can approximate a function from its samples with the DCT
($x \xrightarrow{\mathrm{DCT}_N} y$, followed by interpolation).

$$
\begin{aligned}
(Sx)_n &= \sum_{k=0}^{N-1} s_k x_{(n-k) \bmod N}\\
&= s_0 x_n + \sum_{k=1}^{(N-1)/2} s_k x_{(n-k)\bmod N} + \sum_{k=(N+1)/2}^{N-1} s_k x_{(n-k)\bmod N}\\
&= s_0 x_n + \sum_{k=1}^{(N-1)/2} s_k x_{(n-k)\bmod N} + \sum_{k=1}^{(N-1)/2} s_k x_{(n-(N-k))\bmod N}\\
&= s_0 x_n + \sum_{k=1}^{(N-1)/2} s_k\left(x_{(n-k)\bmod N} + x_{(n+k)\bmod N}\right). \qquad (4.7)
\end{aligned}
$$

If we compare the first and last expressions here, we need the same number of
summations, but the number of multiplications needed in the latter expression
has been halved.
Observation 4.20. Reducing arithmetic operations for symmetric filters.
Assume that a symmetric filter has 2s + 1 filter coefficients. The filter applied
to a vector of length N can then be implemented using (s + 1)N multiplications
and 2sN additions. This gives a reduced number of arithmetic operations when
compared to a filter with the same number of coefficients which is not symmetric,
where a direct implementations requires (2s + 1)N multiplications and 2sN
additions.
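A minimal numpy sketch of Equation (4.7) (assuming a circular filter of length $N$, with np.roll providing the indices modulo $N$; the function name apply_symmetric_filter is ours, not the book's) could look as follows:

import numpy as np

def apply_symmetric_filter(t, x):
    # t = [s_0, s_1, ..., s_L]: the nonzero coefficients of a symmetric filter
    # with s_{-k} = s_k, applied circularly. One multiplication per pair, as in (4.7).
    x = np.asarray(x, dtype=float)
    z = t[0]*x
    for k in range(1, len(t)):
        # np.roll(x, k)[n] = x[(n-k) mod N], np.roll(x, -k)[n] = x[(n+k) mod N]
        z += t[k]*(np.roll(x, k) + np.roll(x, -k))
    return z

print(apply_symmetric_filter([0.5, 0.25], [1.0, 0, 0, 0, 0, 0]))  # -> [0.5, 0.25, 0, 0, 0, 0.25]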
Similarly to Section 3.6.2, a symmetric filter can be factored into a
product of symmetric filters. To see how, note first that a real polynomial is
symmetric if and only if $1/a$ is a root whenever $a$ is. If we pair together the
factors for the roots $a, 1/a$ when $a$ is real we get a component in the frequency
response of degree 2. If we pair the factors for the roots $a, 1/a, \bar a, 1/\bar a$ when $a$ is
complex, we get a component in the frequency response of degree 4. We thus
get the following idea:
Idea 4.21. Factorizing symmetric filters.
Let S be a symmetric filter with real coefficients. There exist constants K,
a1 , . . . , am , b1 , c1 , . . . , bn , cn so that
$$
\begin{aligned}
\lambda_S(\omega) = K&(a_1e^{i\omega} + 1 + a_1e^{-i\omega})\cdots(a_me^{i\omega} + 1 + a_me^{-i\omega})\\
&\times(b_1e^{2i\omega} + c_1e^{i\omega} + 1 + c_1e^{-i\omega} + b_1e^{-2i\omega})\cdots\\
&\times(b_ne^{2i\omega} + c_ne^{i\omega} + 1 + c_ne^{-i\omega} + b_ne^{-2i\omega}).
\end{aligned}
$$
We can write $S = KA_1\cdots A_mB_1\cdots B_n$, where $A_i = \{a_i, 1, a_i\}$ and $B_i = \{b_i, c_i, 1, c_i, b_i\}$.
In any case we see that the component filters have 3 and 5 filter coefficients.
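As a small numerical illustration of this idea (a sketch with made-up coefficients $K = 2$, $a_1 = 0.3$, $b_1 = 0.1$, $c_1 = -0.4$), convolving the component filters with np.convolve produces a longer filter which is again symmetric:

import numpy as np

A1 = np.array([0.3, 1.0, 0.3])               # a filter of the form {a_1, 1, a_1}
B1 = np.array([0.1, -0.4, 1.0, -0.4, 0.1])   # a filter of the form {b_1, c_1, 1, c_1, b_1}
S = 2.0*np.convolve(A1, B1)                  # the coefficients of K*A_1*B_1 with K = 2
print(S)
print(np.allclose(S, S[::-1]))               # True: the product is again symmetric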

Exercise 4.7: Component expressions for a symmetric filter


Assume that $S = \{t_{-L}, \ldots, t_0, \ldots, t_L\}$ is a symmetric filter. Use Equation (4.7)
to show that $z_n = (Sx)_n$ in this case can be split into the following different
formulas, depending on $n$:

a) $0 \le n < L$:
$$z_n = t_0x_n + \sum_{k=1}^{n}t_k(x_{n+k} + x_{n-k}) + \sum_{k=n+1}^{L}t_k(x_{n+k} + x_{n-k+N}). \qquad (4.8)$$

b) $L \le n < N-L$:
$$z_n = t_0x_n + \sum_{k=1}^{L}t_k(x_{n+k} + x_{n-k}). \qquad (4.9)$$

c) $N-L \le n < N$:
$$z_n = t_0x_n + \sum_{k=1}^{N-1-n}t_k(x_{n+k} + x_{n-k}) + \sum_{k=N-1-n+1}^{L}t_k(x_{n+k-N} + x_{n-k}). \qquad (4.10)$$

The convolve function may not pick up this reduction in the number of
multiplications, since it does not assume that the filter is symmetric. We will
still use the convolve function in implementations, however, due to its heavy
optimization.

4.3 Efficient implementations of the DCT


When we defined the DCT in the preceding section, we considered symmetric
vectors of twice the length, and viewed these in the frequency domain. In order to
have a fast algorithm for the DCT, comparable to the FFT algorithms
we developed in Section 2.3, we need to address the fact that vectors of twice
the length seem to be involved. The following theorem addresses this. This
result is much used in practical implementations of the DCT, and can also be used
for practical implementation of the DFT, as we will see in Exercise 4.9. Note
that the result, and the following results in this section, are stated in terms
of the cosine matrix $C_N$ (where the entries are $(C_N)_{n,k} = \cos\left(2\pi\frac{n}{2N}\left(k + \frac{1}{2}\right)\right)$),
rather than the $\mathrm{DCT}_N$ matrix (which uses the additional scaling factor $d_{n,N}$
for the rows). The reason is that $C_N$ appears to be most practical for stating
algorithms. When computing the DCT, we simply need to scale with the $d_{n,N}$
at the end, after using the statements below.
Theorem 4.22. DCT algorithm.
Let $y = C_Nx$. Then we have that
$$y_n = \cos\left(\pi\frac{n}{2N}\right)\Re\left((\mathrm{DFT}_Nx^{(1)})_n\right) + \sin\left(\pi\frac{n}{2N}\right)\Im\left((\mathrm{DFT}_Nx^{(1)})_n\right), \qquad (4.11)$$
where $x^{(1)} \in \mathbb{R}^N$ is defined by
$$(x^{(1)})_k = x_{2k} \quad \text{for } 0 \le k \le N/2-1,$$
$$(x^{(1)})_{N-k-1} = x_{2k+1} \quad \text{for } 0 \le k \le N/2-1.$$

Proof. Using the definition of $C_N$, and splitting the computation of $y = C_Nx$
into two sums corresponding to the even and odd indices, we get
$$y_n = \sum_{k=0}^{N-1}x_k\cos\left(2\pi\frac{n}{2N}\left(k+\frac{1}{2}\right)\right) = \sum_{k=0}^{N/2-1}x_{2k}\cos\left(2\pi\frac{n}{2N}\left(2k+\frac{1}{2}\right)\right) + \sum_{k=0}^{N/2-1}x_{2k+1}\cos\left(2\pi\frac{n}{2N}\left(2k+1+\frac{1}{2}\right)\right).$$
If we reverse the indices in the second sum, this sum becomes
$$\sum_{k=0}^{N/2-1}x_{N-2k-1}\cos\left(2\pi\frac{n}{2N}\left(N-2k-1+\frac{1}{2}\right)\right).$$
If we then also shift the indices with $N/2$ in this sum, we get
$$\sum_{k=N/2}^{N-1}x_{2N-2k-1}\cos\left(2\pi\frac{n}{2N}\left(2N-2k-1+\frac{1}{2}\right)\right) = \sum_{k=N/2}^{N-1}x_{2N-2k-1}\cos\left(2\pi\frac{n}{2N}\left(2k+\frac{1}{2}\right)\right),$$
where we used that $\cos$ is symmetric and periodic with period $2\pi$. We see that
we now have the same cos-terms in the two sums. If we thus define the vector
$x^{(1)}$ as in the text of the theorem, we see that we can write
$$\begin{aligned}
y_n &= \sum_{k=0}^{N-1}(x^{(1)})_k\cos\left(2\pi\frac{n}{2N}\left(2k+\frac{1}{2}\right)\right) = \Re\left(\sum_{k=0}^{N-1}(x^{(1)})_ke^{-2\pi in(2k+1/2)/(2N)}\right)\\
&= \Re\left(e^{-\pi in/(2N)}\sum_{k=0}^{N-1}(x^{(1)})_ke^{-2\pi ink/N}\right) = \Re\left(e^{-\pi in/(2N)}(\mathrm{DFT}_Nx^{(1)})_n\right)\\
&= \cos\left(\pi\frac{n}{2N}\right)\Re\left((\mathrm{DFT}_Nx^{(1)})_n\right) + \sin\left(\pi\frac{n}{2N}\right)\Im\left((\mathrm{DFT}_Nx^{(1)})_n\right),
\end{aligned}$$
where we have recognized the $N$-point DFT. This completes the proof.
With the result above we have avoided computing a DFT of double size. If we
in the proof above define the $N\times N$ diagonal matrix $Q_N$ by $Q_{n,n} = e^{-\pi in/(2N)}$,
the result can also be written on the more compact form
$$y = C_Nx = \Re\left(Q_N\mathrm{DFT}_Nx^{(1)}\right).$$
We will, however, not use this form, since there is complex arithmetic involved,
contrary to Equation (4.11). Code which uses Equation (4.11) to compute the
DCT, using the function FFTImpl from Section 2.3, can look as follows:
DCT, using the function FFTImpl from Section 2.3, can look as follows:
def DCTImpl(x):
    """
    Compute the DCT of the vector x
    x: a vector
    """
    # Assumes "from numpy import *", and FFTImpl/FFTKernelStandard from Section 2.3.
    N = len(x)
    if N > 1:
        x1 = concatenate([x[0::2], x[-1:0:-2]]).astype(complex)
        FFTImpl(x1, FFTKernelStandard)
        cosvec = cos(pi*arange(float(N))/(2*N))
        sinvec = sin(pi*arange(float(N))/(2*N))
        if ndim(x) == 1:
            x[:] = cosvec*real(x1) + sinvec*imag(x1)
        else:
            for s2 in range(shape(x)[1]):
                x[:, s2] = cosvec*real(x1[:, s2]) \
                         + sinvec*imag(x1[:, s2])
        x[0] *= sqrt(1/float(N))
        x[1:] *= sqrt(2/float(N))

In the code, the vector x(1) is created first by rearranging the components, and
it is sent as input to FFTImpl. After this we take real parts and imaginary parts,
and multiply with the cos- and sin-terms in Equation (4.11).
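A self-contained sketch of the same computation, using numpy's (unnormalized) FFT in place of FFTImpl and verifying the result against $\mathrm{DCT}_N$ built directly from its definition, could look as follows (the names dct_via_fft and DCTN are ours, not the book's):

import numpy as np

def dct_via_fft(x):
    N = len(x)
    x1 = np.concatenate([x[0::2], x[-1:0:-2]])           # the reordered vector x^(1)
    X1 = np.fft.fft(x1)
    n = np.arange(N)
    y = np.cos(np.pi*n/(2*N))*X1.real + np.sin(np.pi*n/(2*N))*X1.imag   # Equation (4.11): y = C_N x
    y[0] *= np.sqrt(1.0/N)                               # scale the rows with d_{n,N}
    y[1:] *= np.sqrt(2.0/N)                              # to obtain DCT_N x
    return y

N = 16
x = np.random.randn(N)
n, k = np.arange(N).reshape(-1, 1), np.arange(N).reshape(1, -1)
d = np.full(N, np.sqrt(2.0/N)); d[0] = np.sqrt(1.0/N)
DCTN = d.reshape(-1, 1)*np.cos(2*np.pi*n/(2*N)*(k + 0.5))   # DCT_N built from its definition
print(np.max(np.abs(dct_via_fft(x) - DCTN.dot(x))))         # ~1e-15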

4.3.1 Efficient implementations of the IDCT


As with the FFT, it is straightforward to modify the DCT implementation so
that it returns the IDCT. To see how we can do this, write from Theorem 4.22,
for n ≥ 1

$$
\begin{aligned}
y_n &= \cos\left(\pi\frac{n}{2N}\right)\Re\left((\mathrm{DFT}_Nx^{(1)})_n\right) + \sin\left(\pi\frac{n}{2N}\right)\Im\left((\mathrm{DFT}_Nx^{(1)})_n\right)\\
y_{N-n} &= \cos\left(\pi\frac{N-n}{2N}\right)\Re\left((\mathrm{DFT}_Nx^{(1)})_{N-n}\right) + \sin\left(\pi\frac{N-n}{2N}\right)\Im\left((\mathrm{DFT}_Nx^{(1)})_{N-n}\right)\\
&= \sin\left(\pi\frac{n}{2N}\right)\Re\left((\mathrm{DFT}_Nx^{(1)})_n\right) - \cos\left(\pi\frac{n}{2N}\right)\Im\left((\mathrm{DFT}_Nx^{(1)})_n\right), \qquad (4.12)
\end{aligned}
$$

where we have used the symmetry of $\mathrm{DFT}_N$ for real signals. These two equations
enable us to determine $\Re((\mathrm{DFT}_Nx^{(1)})_n)$ and $\Im((\mathrm{DFT}_Nx^{(1)})_n)$ from $y_n$ and
$y_{N-n}$. We get
$$
\begin{aligned}
\cos\left(\pi\frac{n}{2N}\right)y_n + \sin\left(\pi\frac{n}{2N}\right)y_{N-n} &= \Re((\mathrm{DFT}_Nx^{(1)})_n)\\
\sin\left(\pi\frac{n}{2N}\right)y_n - \cos\left(\pi\frac{n}{2N}\right)y_{N-n} &= \Im((\mathrm{DFT}_Nx^{(1)})_n).
\end{aligned}
$$
Adding we get
$$
\begin{aligned}
(\mathrm{DFT}_Nx^{(1)})_n &= \cos\left(\pi\frac{n}{2N}\right)y_n + \sin\left(\pi\frac{n}{2N}\right)y_{N-n} + i\left(\sin\left(\pi\frac{n}{2N}\right)y_n - \cos\left(\pi\frac{n}{2N}\right)y_{N-n}\right)\\
&= \left(\cos\left(\pi\frac{n}{2N}\right) + i\sin\left(\pi\frac{n}{2N}\right)\right)(y_n - iy_{N-n}) = e^{\pi in/(2N)}(y_n - iy_{N-n}).
\end{aligned}
$$
This means that $(\mathrm{DFT}_Nx^{(1)})_n = e^{\pi in/(2N)}(y_n - iy_{N-n}) = (y_n - iy_{N-n})/Q_{n,n}$
for $n \ge 1$. Since $\Im((\mathrm{DFT}_Nx^{(1)})_0) = 0$ we have that $(\mathrm{DFT}_Nx^{(1)})_0 = y_0 = y_0/Q_{0,0}$. This means that $x^{(1)}$ can be recovered by taking the IDFT of the
vector with component 0 being $y_0/Q_{0,0}$, and the remaining components being
$(y_n - iy_{N-n})/Q_{n,n}$:

Theorem 4.23. IDCT algorithm.
Let $x = (C_N)^{-1}y$, and let $z$ be the vector with component 0 being $y_0/Q_{0,0}$,
and the remaining components being $(y_n - iy_{N-n})/Q_{n,n}$. Then we have that
$$x^{(1)} = \mathrm{IDFT}_Nz,$$
where $x^{(1)}$ is defined as in Theorem 4.22.
The implementation of the IDCT can thus go as follows:
def IDCTImpl(y):
    """
    Compute the IDCT of the vector y
    y: a vector
    """
    # Assumes "from numpy import *", and FFTImpl/FFTKernelStandard from Section 2.3.
    N = len(y)
    if N > 1:
        y[0] /= sqrt(1/float(N))
        y[1:] /= sqrt(2/float(N))
        Q = exp(-pi*1j*arange(float(N))/(2*N))
        y1 = zeros_like(y).astype(complex)
        y1[0] = y[0]/Q[0]
        if ndim(y) == 1:
            y1[1:] = (y[1:] - 1j*y[-1:0:-1])/Q[1:]
        else:
            for s2 in range(shape(y)[1]):
                y1[1:, s2] = (y[1:, s2] - 1j*y[-1:0:-1, s2])/Q[1:]
        FFTImpl(y1, FFTKernelStandard, 0)
        y[0::2] = real(y1[0:(N//2)])
        y[1::2] = real(y1[-1:(N//2 - 1):-1])
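Again, a self-contained sketch of Theorem 4.23 using numpy's IFFT in place of FFTImpl (and checking that it inverts a direct computation of $\mathrm{DCT}_Nx$) could look as follows; the name idct_via_ifft is ours:

import numpy as np

def idct_via_ifft(y):
    N = len(y)
    y = np.array(y, dtype=float)
    y[0] /= np.sqrt(1.0/N)                      # undo the d_{n,N} scaling, so that y = C_N x
    y[1:] /= np.sqrt(2.0/N)
    Q = np.exp(-np.pi*1j*np.arange(N)/(2*N))
    z = np.zeros(N, dtype=complex)
    z[0] = y[0]/Q[0]
    z[1:] = (y[1:] - 1j*y[-1:0:-1])/Q[1:]       # the vector z from Theorem 4.23
    x1 = np.fft.ifft(z)                         # x^(1) = IDFT_N z
    x = np.zeros(N)
    x[0::2] = x1[:N//2].real                    # undo the reordering that defined x^(1)
    x[1::2] = x1[-1:N//2 - 1:-1].real
    return x

N = 16
x = np.random.randn(N)
n, k = np.arange(N).reshape(-1, 1), np.arange(N).reshape(1, -1)
d = np.full(N, np.sqrt(2.0/N)); d[0] = np.sqrt(1.0/N)
DCTN = d.reshape(-1, 1)*np.cos(2*np.pi*n/(2*N)*(k + 0.5))
print(np.max(np.abs(idct_via_ifft(DCTN.dot(x)) - x)))   # ~1e-15: the round trip recovers x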

4.3.2 Reduction in the number of arithmetic operations


Let us also state a result which confirms that the DCT and IDCT implementations
we have described give the same type of reductions in the number of multiplications
as the FFT and IFFT:
Theorem 4.24. Number of multiplications required by the DCT and IDCT
algorithms.
The DCT and the IDCT can be implemented so that they use any FFT and
IFFT algorithms. Their operation counts then have the same order as these. In
particular, when the standard FFT algorithms of Section 2.3 are used, their
operation counts are $O(5N\log_2N/2)$. In comparison, the operation count for a
direct implementation of the $N$-point DCT/IDCT is $2N^2$.
Note that we divide the previous operation counts by 2 since the DCT applies
an FFT to real input only, and the operation count for the FFT can be halved
when we adapt to real data, see Exercise 2.27.
Proof. By Theorem 2.20, the number of multiplications required by the standard
FFT algorithm from Section 2.3 adapted to real data is O(N log2 N ), while
the number of additions is O(3N log2 N/2). By Theorem 4.22, two additional
multiplications and one addition are required for each index (so that we have
2N extra real multiplications and N extra real additions in total), but this does
not affect the operation count, since O(N log2 N + 2N ) = O(N log2 N ). Since
the operation counts for the IFFT is the same as for the FFT, we only need
to count the additional multiplications needed in forming the vector z = (yn −
iyN −n )/Qn,n . Clearly, this also does not affect the order of the algorithm.
Since the DCT and IDCT can be implemented using the FFT and IFFT,
it has the same advantages as the FFT when it comes to parallel computing.
Much literature is devoted to reducing the number of multiplications in the
DFT and the DCT even further than what we have done (see [22] for one of the
most recent developments). Another note on computational complexity is in
order: we have not counted the operations sin and cos in the DCT. The reason
is that these values can be precomputed, since we take the sine and cosine of
a specific set of values for each DCT or DFT of a given size. This is contrary
to multiplication and addition, since these involve the input values, which
are only known at runtime. We have, however, not written down that we use
precomputed arrays for sine and cosine in our algorithms: this is an issue to
include in more optimized algorithms.

Exercise 4.8: Trick for reducing the number of multiplications with the DCT
In this exercise we will take a look at a small trick which reduces the number of
additional multiplications we need for DCT algorithm from Theorem 4.22. This
exercise does not reduce the order of the DCT algorithms, but we will see in
Exercise 4.9 how the result can be used to achieve this.
a) Assume that $x$ is a real signal. Recall Equation (4.12), which said that
$$
\begin{aligned}
y_n &= \cos\left(\pi\frac{n}{2N}\right)\Re((\mathrm{DFT}_Nx^{(1)})_n) + \sin\left(\pi\frac{n}{2N}\right)\Im((\mathrm{DFT}_Nx^{(1)})_n)\\
y_{N-n} &= \sin\left(\pi\frac{n}{2N}\right)\Re((\mathrm{DFT}_Nx^{(1)})_n) - \cos\left(\pi\frac{n}{2N}\right)\Im((\mathrm{DFT}_Nx^{(1)})_n)
\end{aligned}
$$
for the $n$'th and $N-n$'th coefficient of the DCT. This can also be rewritten as
$$
\begin{aligned}
y_n &= \left(\Re((\mathrm{DFT}_Nx^{(1)})_n) + \Im((\mathrm{DFT}_Nx^{(1)})_n)\right)\cos\left(\pi\frac{n}{2N}\right)\\
&\quad - \Im((\mathrm{DFT}_Nx^{(1)})_n)\left(\cos\left(\pi\frac{n}{2N}\right) - \sin\left(\pi\frac{n}{2N}\right)\right)\\
y_{N-n} &= -\left(\Re((\mathrm{DFT}_Nx^{(1)})_n) + \Im((\mathrm{DFT}_Nx^{(1)})_n)\right)\cos\left(\pi\frac{n}{2N}\right)\\
&\quad + \Re((\mathrm{DFT}_Nx^{(1)})_n)\left(\sin\left(\pi\frac{n}{2N}\right) + \cos\left(\pi\frac{n}{2N}\right)\right).
\end{aligned}
$$
Explain that the first two equations require 4 multiplications to compute $y_n$ and
$y_{N-n}$, and that the last two equations require 3 multiplications to compute $y_n$
and $y_{N-n}$.
b) Explain why the trick in a) reduces the number of additional multiplications
in a DCT, from 2N to 3N/2.
c) Explain why the trick in a) can be used to reduce the number of additional
multiplications in an IDCT with the same number.

Hint. Match the expression $e^{\pi in/(2N)}(y_n - iy_{N-n})$ you encountered in the
IDCT with the rewriting you did in a).

d) Show that the penalty of the trick we here have used to reduce the number
of multiplications, is an increase in the number of additional additions from N
to 3N/2. Why can this trick still be useful?
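A quick numerical check of the rewriting in a) (a sketch with arbitrary stand-in values for the real and imaginary parts) is:

import numpy as np

N, n = 32, 5
R, I = 0.7, -1.3                                   # stand-ins for the real and imaginary parts
c, s = np.cos(np.pi*n/(2*N)), np.sin(np.pi*n/(2*N))

yn_4mult,  yNn_4mult = c*R + s*I, s*R - c*I        # Equation (4.12): 4 multiplications
yn_3mult  = (R + I)*c - I*(c - s)                  # the rewriting: 3 multiplications
yNn_3mult = -(R + I)*c + R*(s + c)                 # ((c - s) and (s + c) can be precomputed)
print(np.isclose(yn_4mult, yn_3mult), np.isclose(yNn_4mult, yNn_3mult))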

Exercise 4.9: An efficient joint implementation of the DCT and the FFT
In this exercise we will explain another joint implementation of the DFT and
the DCT, which has the benefit of a low multiplication count, at the expense
of a higher addition count. It also has the benefit that it is specialized to real
vectors, with a very structured implementation (this is not always the case for
the quickest FFT implementations; not surprisingly, one often sacrifices clarity
of code when one pursues higher computational speed). a) of this exercise can be
skipped, as it is difficult and quite technical. For further details of the algorithm
the reader is referred to [48].
a) Let $y = \mathrm{DFT}_Nx$ be the $N$-point DFT of the real vector $x$. Show that
$$
\Re(y_n) = \begin{cases}
\Re((\mathrm{DFT}_{N/2}x^{(e)})_n) + (C_{N/4}z)_n & 0 \le n \le N/4-1\\
\Re((\mathrm{DFT}_{N/2}x^{(e)})_n) & n = N/4\\
\Re((\mathrm{DFT}_{N/2}x^{(e)})_n) - (C_{N/4}z)_{N/2-n} & N/4+1 \le n \le N/2-1
\end{cases} \qquad (4.13)
$$
$$
\Im(y_n) = \begin{cases}
\Im((\mathrm{DFT}_{N/2}x^{(e)})_n) & n = 0\\
\Im((\mathrm{DFT}_{N/2}x^{(e)})_n) + (C_{N/4}w)_{N/4-n} & 1 \le n \le N/4-1\\
\Im((\mathrm{DFT}_{N/2}x^{(e)})_n) + (C_{N/4}w)_{n-N/4} & N/4 \le n \le N/2-1
\end{cases} \qquad (4.14)
$$
where $x^{(e)}$ is as defined in Theorem 2.15, and where $z, w \in \mathbb{R}^{N/4}$ are defined by
$$
\begin{aligned}
z_k &= x_{2k+1} + x_{N-2k-1} & 0 \le k \le N/4-1,\\
w_k &= (-1)^k(x_{N-2k-1} - x_{2k+1}) & 0 \le k \le N/4-1.
\end{aligned}
$$
Explain from this how you can make an algorithm which reduces an FFT of
length $N$ to an FFT of length $N/2$ (on $x^{(e)}$), and two DCT's of length $N/4$ (on
$z$ and $w$). We will call this algorithm the revised FFT algorithm.
a) says nothing about the coefficients $y_n$ for $n > N/2$. These are obtained in
the same way as before through symmetry. a) also says nothing about $y_{N/2}$.
This can be obtained with the same formula as in Theorem 2.15.
Let us now compute the number of arithmetic operations our revised algorithm
needs. Denote by $M_N$ and $A_N$ the number of real multiplications and real
additions, respectively, needed by the revised $N$-point FFT algorithm.
b) Explain from the algorithm in a) that
b) Explain from the algorithm in a) that
$$M_N = 2(M_{N/4} + 3N/8) + M_{N/2} \qquad A_N = 2(A_{N/4} + 3N/8) + A_{N/2} + 3N/2 \qquad (4.15)$$

Hint. $3N/8$ should come from the extra additions/multiplications (see Exercise 4.8) you need to compute when you run the algorithm from Theorem 4.22
for $C_{N/4}$. Note also that the equations in a) require no extra multiplications,
but that there are six equations involved, each needing $N/4$ additions, so that
we need $6N/4 = 3N/2$ extra additions.
c) Explain why $x_r = M_{2^r}$ is the solution to the difference equation
$$x_{r+2} - x_{r+1} - 2x_r = 3\times 2^r,$$
and that $x_r = A_{2^r}$ is the solution to
$$x_{r+2} - x_{r+1} - 2x_r = 9\times 2^r,$$
and show that the general solutions to these are $x_r = \frac{1}{2}r2^r + C2^r + D(-1)^r$ for
multiplications, and $x_r = \frac{3}{2}r2^r + C2^r + D(-1)^r$ for additions.
d) Explain why, regardless of initial conditions to the difference equations,
$M_N = O\left(\frac{1}{2}N\log_2N\right)$ and $A_N = O\left(\frac{3}{2}N\log_2N\right)$ both for the revised FFT and
the revised DCT. The total number of operations is thus $O(2N\log_2N)$, i.e. half
the operation count of the split-radix algorithm. The orders of these algorithms
are thus the same, since we here have adapted to real data.
e) Explain that, if you had not employed the trick from Exercise 4.8, we would
instead have obtained $M_N = O\left(\frac{2}{3}N\log_2N\right)$ and $A_N = O\left(\frac{4}{3}N\log_2N\right)$, which
equal the orders for the number of multiplications/additions for the split-radix
algorithm. In particular, the order of the operation count remains the same,
but the trick from Exercise 4.8 turned a bigger percentage of the arithmetic
operations into additions.
The algorithm we here have developed thus is constructed from the beginning
to apply for real data only. Another advantage of the new algorithm is that it
can be used to compute both the DCT and the DFT.

Exercise 4.10: Implementation of the IFFT/IDCT


We did not write down corresponding algorithms for the revised IFFT and IDCT
algorithms. We will consider this in this exercise.
a) Using equations (4.13)-(4.14), show that
$$
\begin{aligned}
\Re(y_n) - \Re(y_{N/2-n}) &= 2(C_{N/4}z)_n\\
\Im(y_n) + \Im(y_{N/2-n}) &= 2(C_{N/4}w)_{N/4-n}
\end{aligned}
$$
for $1 \le n \le N/4-1$. Explain how one can compute $z$ and $w$ from this using
two IDCT's of length $N/4$.
b) Using equations (4.13)-(4.14), show that
$$
\begin{aligned}
\Re(y_n) + \Re(y_{N/2-n}) &= 2\Re((\mathrm{DFT}_{N/2}x^{(e)})_n)\\
\Im(y_n) - \Im(y_{N/2-n}) &= 2\Im((\mathrm{DFT}_{N/2}x^{(e)})_n),
\end{aligned}
$$
and explain how one can compute $x^{(e)}$ from this using an IFFT of length $N/2$.

4.4 Summary
We started this chapter by extending a previous result, namely that the Fourier
series of a symmetric function converges faster. To build on this
we first needed to define symmetric extensions of vectors and symmetric vectors,
before we classified symmetric extensions in the frequency domain. From this
we could find a nice, orthonormal basis for the symmetric extensions, which
led us to the definition of the DCT. We also saw a connection with symmetric
filters: these are exactly the filters which preserve symmetric extensions, and
we could characterize symmetric filters restricted to symmetric extensions as an
$N$-dimensional mapping. We also showed that it is smart to replace the DFT
with the DCT when we work with filters which are known to be symmetric.
Among other things, this led to a better way of approximating analog filters,
and better interpolation of functions.
We also showed how to obtain an efficient implementation of the DCT, which
could reuse the FFT implementation. The DCT has an important role in the
MP3 standard. As we have explained, the MP3 standard applies several filters
to the sound, in order to split it into bands concentrating on different frequency
ranges. Later we will look closer at how these filters can be implemented and
constructed. The implementation can use transforms similar to the MDCT, as
explained in Exercise 4.6. The MDCT is also used in the more advanced version
of the MP3 standard (layer III). Here it is applied to the filtered data to obtain
a higher spectral resolution of the sound. The MDCT is applied to groups of 576
(in special circumstances 192) samples. The MP3 standard document [20] does
not dig into the theory for this, only presenting what is needed in order to
make an implementation. It is somewhat difficult to read this document, since it
is written in quite a different language, familiar mainly to those working with
international standards.
The different types of cosine matrices can all be associated with some extension
strategy for the signal. [34] contains a review of these.
The DCT is particularly popular for processing sound data before they are
compressed with lossless techniques such as Huffman coding or arithmetic coding.
The reason is, as mentioned, that the DCT provides a better approximation
from a low-dimensional space than the DFT does, and that it has a very efficient
implementation. Libraries exist which go to great lengths to provide efficient
implementations of the FFT and the DCT. FFTW, short for Fastest Fourier
Transform in the West [17], is perhaps the best known of these.
Signal processing literature often does not motivate digital filters by explaining
where they come from, and where the input to the filters comes from. Using
analog filters to motivate this, and to argue for improvements in using the DCT
and symmetric extensions, is not that common. Much literature simply says that
the property of linear phase is good, without elaborating on this further.
Chapter 5

Motivation for wavelets and some simple examples

In the first part of the book our focus was to approximate functions or vectors
with trigonometric functions. We saw that the Discrete Fourier transform could
be used to obtain a representation of a vector in terms of such functions, and
that computations could be done efficiently with the FFT algorithm. This was
useful for analyzing, filtering, and compressing sound and other discrete data.
The approach with trigonometric functions has some limitations, however. One
of these is that, in a representation with trigonometric functions, the frequency
content is fixed over time. This is in contrast with most sound data, where
the characteristics are completely different in different parts. We have also
seen that, even if a sound has a simple representation in terms of trigonometric
functions on two different parts, the representation of the entire sound may not
be simple. In particular, if the function is nonzero only on a very small interval,
a representation of it in terms of trigonometric functions is not so simple.
In this chapter we are going to introduce the basic properties of an alternative
to Fourier analysis for representing functions. This alternative is called wavelets.
Similar to Fourier analysis, wavelets are also based on the idea of expressing a
function in some basis. But in contrast to Fourier analysis, where the basis is
fixed, wavelets provide a general framework with many different types of bases.
In this chapter we first give a motivation for wavelets, before we continue by
introducing some very simple wavelets. The first wavelet we look at can be
interpreted as an approximation scheme based on piecewise constant functions.
The next wavelet we look at is similar, but with piecewise linear functions used
instead. Following these examples we will establish a more general framework,
based on experiences from the simple wavelets. In the following chapters we will
interpret this framework in terms of filters, and use this connection to construct
even more interesting wavelets.
Core functions in this chapter are collected in a module called dwt.


5.1 Why wavelets?


The left image in Figure 5.1 shows a view of the entire Earth.

Figure 5.1: A view of Earth from space, together with versions of the image
where we have zoomed in.

The startup image in Google Earth™, a program for viewing satellite images,
maps and other geographic information, is very similar to this. In the middle
image we have zoomed in on the Gulf of Mexico, as marked with a rectangle in
the left image. Similarly, in the right image we have further zoomed in on Cuba
and a small portion of Florida, as marked with a rectangle in the middle image.
There is clearly an amazing amount of information available behind a program
like Google EarthTM , since we there can zoom further in, and obtain enough
detail to differentiate between buildings and even trees or cars all over the Earth.
So, when the Earth is spinning in the opening screen of Google EarthTM , all
the Earth’s buildings appear to be spinning with it! If this was the case the
Earth would not be spinning on the screen, since there would just be so much
information to process that a laptop would not be able to display a rotating
Earth.
There is a simple reason that the globe can be shown spinning in spite of
the huge amounts of information that need to be handled. We are going to see
later that a digital image is just a rectangular array of numbers that represent
the color at a dense set of points. As an example, the images in Figure 5.1 are
made up of a grid of 1064 × 1064 points, which gives a total of 1 132 096 points.
The color at a point is represented by three eight-bit integers, which means that
the image files contain a total of 3 396 288 bytes each. So regardless of how
close to the surface of the Earth our viewpoint is, the resulting image always
contains the same number of points. This means that when we are far away
from the Earth we can use a very coarse model of the geographic information
that is being displayed, but as we zoom in, we need to display more details and
therefore need a more accurate model.
Observation 5.1. Images model.
When discrete information is displayed in an image, there is no need to use a
mathematical model that contains more detail than what is visible in the image.

A consequence of Observation 5.1 is that for applications like Google EarthTM


we should use a mathematical model that makes it easy to switch between different
levels of detail, or different resolutions. Such models are called multiresolution
models, and wavelets are prominent examples of this kind of models. We will
see that multiresolution models also provide us with means of approximating
functions, just as Taylor series and Fourier series. Our new approximation scheme
differs from these in one important respect, however: When we approximate
with Taylor series and Fourier series, the error must be computed at the same
data points as well, so that the error contains just as much information as the
approximating function, and the function to be approximated. Multiresolution
models on the other hand will be defined in such a way that the error and the
“approximating function” each contain half of the information from the function
we approximate, i.e. their amount of data is reduced. This property makes
multiresolution models attractive for the problems at hand, when compared to
approaches such as Taylor series and Fourier series.
When we zoom in with Google EarthTM , it seems that this is done contin-
uously. The truth is probably that the program only has representations at
some given resolutions (since each representation requires memory), and that
one interpolates between these to give the impression of a continuous zoom. In
the coming chapters we will first look at how we can represent the information
at different resolutions, so that only new information at each level is included.
We will now turn to how wavelets are defined more formally, and construct
the simplest wavelet we have. Its construction goes in the following steps: First
we introduce what we call resolution spaces, and the corresponding scaling
function. Then we introduce the detail spaces, and the corresponding mother
wavelet. These two functions will give rise to certain bases for these spaces,
and we will define the Discrete Wavelet Transform as a change of coordinates
between these bases.

5.2 A wavelet based on piecewise constant functions
Our starting point will be the space of piecewise constant functions on an interval
[0, N ). This will be called a resolution space.
Definition 5.2. The resolution space V0 .
Let N be a natural number. The resolution space V0 is defined as the space
of functions defined on the interval [0, N ) that are constant on each subinterval
[n, n + 1) for n = 0, . . . , N − 1.
Note that this also corresponds to piecewise constant functions which are
periodic with period N . We will, just as we did in Fourier analysis, identify a
function defined on [0, N ) with its (period N ) periodic extension. An example
of a function in V0 for N = 10 is shown in Figure 5.2. It is easy to check that V0
is a linear space, and for computations it is useful to know the dimension of the
space and have a basis.
Figure 5.2: A piecewise constant function.

Lemma 5.3. The function $\phi$.
Define the function $\phi(t)$ by
$$\phi(t) = \begin{cases}1, & \text{if } 0 \le t < 1;\\ 0, & \text{otherwise;}\end{cases} \qquad (5.1)$$
and set $\phi_n(t) = \phi(t-n)$ for any integer $n$. The space $V_0$ has dimension $N$, and
the $N$ functions $\{\phi_n\}_{n=0}^{N-1}$ form an orthonormal basis for $V_0$ with respect to the
standard inner product
$$\langle f, g\rangle = \int_0^N f(t)g(t)\,dt. \qquad (5.2)$$
In particular, any $f \in V_0$ can be represented as
$$f(t) = \sum_{n=0}^{N-1} c_n\phi_n(t) \qquad (5.3)$$
for suitable coefficients $(c_n)_{n=0}^{N-1}$. The function $\phi_n$ is referred to as the characteristic function of the interval $[n, n+1)$.
Note the small difference between the inner product we define here from the
inner product we used for functions previously: Here there is no scaling 1/T
involved. Also, for wavelets we will only consider real functions, and the inner
product will therefore not be defined for complex functions. Two examples of
the basis functions defined in Lemma 5.3 are shown in Figure 5.3.
Proof. Two functions $\phi_{n_1}$ and $\phi_{n_2}$ with $n_1 \neq n_2$ clearly satisfy $\int\phi_{n_1}(t)\phi_{n_2}(t)\,dt = 0$
since $\phi_{n_1}(t)\phi_{n_2}(t) = 0$ for all values of $t$. It is also easy to check that $\|\phi_n\| = 1$
for all $n$. Finally, any function in $V_0$ can be written as a linear combination of the
functions $\phi_0, \phi_1, \ldots, \phi_{N-1}$, so the conclusion of the lemma follows.
Figure 5.3: The basis functions $\phi_2$ and $\phi_7$ from $\phi_0$.

Figure 5.4: Examples of functions from $V_0$. The square wave in $V_0$ (left), and
an approximation to $\cos t$ from $V_0$ (right).

In our discussion of Fourier analysis, the starting point was the function
sin(2πt) that has frequency 1. We can think of the space V0 as being analogous
to this function: The function $\sum_{n=0}^{N-1}(-1)^n\phi_n(t)$ is (part of the) square wave
that we discussed in Chapter 1, and which also oscillates regularly like the sine
function, see the left plot in Figure 5.4. The difference is that we have more
flexibility since we have a whole space at our disposal instead of just one function
— the right plot in Figure 5.4 shows another function in V0 .
In Fourier analysis we obtained a linear space of possible approximations by
including sines of frequency 1, 2, 3, . . . , up to some maximum. We use a similar
approach for constructing wavelets, but we double the frequency each time and
label the spaces as V0 , V1 , V2 , . . .

Definition 5.4. Refined resolution spaces.


The space Vm for the interval [0, N ) is the space of piecewise constant
functions defined on [0, N ) that are constant on each subinterval [n/2m , (n +
1)/2m ) for n = 0, 1, . . . , 2m N − 1.
Some examples of functions in the spaces $V_1$, $V_2$ and $V_3$ for the interval $[0, 10]$
are shown in Figure 5.5. As $m$ increases, we can represent smaller details. In
particular, the function in the rightmost plot is a piecewise constant function that
oscillates like $\sin(2\pi 2^2t)$ on the interval $[0, 10]$.
Figure 5.5: Piecewise constant approximations to $\cos t$ on the interval $[0, 10]$ in
the spaces $V_1$, $V_2$, and $V_3$. The lower right plot shows the square wave in $V_2$.

It is easy to find a basis for Vm , we just use the characteristic functions of


each subinterval.
Lemma 5.5. Basis for Vm .
Let [0, N ) be a given interval with N some positive integer. Then the
dimension of Vm is 2m N . The functions

φm,n (t) = 2m/2 φ(2m t − n), for n = 0, 1, . . . , 2m N − 1 (5.4)


form an orthonormal basis for Vm , which we will denote by φm . Any function
f ∈ Vm can thus be represented uniquely as
$$f(t) = \sum_{n=0}^{2^mN-1} c_{m,n}\phi_{m,n}(t).$$

Proof. The functions given by Equation (5.4) are nonzero on the subintervals
$[n/2^m, (n+1)/2^m)$ which we referred to in Definition 5.4, so that $\phi_{m,n_1}\phi_{m,n_2} = 0$
when $n_1 \neq n_2$, since these intervals are disjoint. The only mysterious thing may
be the normalisation factor $2^{m/2}$. This comes from the fact that
$$\int_0^N\phi(2^mt-n)^2\,dt = \int_{n/2^m}^{(n+1)/2^m}\phi(2^mt-n)^2\,dt = 2^{-m}\int_0^1\phi(u)^2\,du = 2^{-m}.$$
The normalisation thus ensures that $\|\phi_{m,n}\| = 1$ for all $m$.


In the following we will always denote the coordinates in the basis $\phi_m$ by
$c_{m,n}$. Note that our definition restricts the dimensions of the spaces we study
to be of the form $N2^m$. In Chapter 6 we will explain how this restriction can
be dropped, but until then the dimensions will be assumed to be of this form.
In the theory of wavelets, the function φ is also called a scaling function. The
origin behind this name is that the scaled (and translated) functions φm,n of φ
are used as basis functions for the refined resolution spaces. Later on we will see
that other scaling functions φ can be chosen, where the scaled versions φm,n will
be used to define similar resolution spaces, with slightly different properties.

5.2.1 Function approximation property


Each time $m$ is increased by 1, the dimension of $V_m$ doubles, and the subintervals
on which the functions in $V_m$ are constant are halved in size. It therefore seems
reasonable that, for most functions, we can find good approximations in $V_m$
provided $m$ is big enough.
Theorem 5.6. Resolution spaces and approximation.
Let $f$ be a given function that is continuous on the interval $[0, N]$. Given
$\epsilon > 0$, there exists an integer $m \ge 0$ and a function $g \in V_m$ such that
$$|f(t) - g(t)| \le \epsilon$$
for all $t$ in $[0, N]$.

Proof. Since $f$ is (uniformly) continuous on $[0, N]$, we can find an integer $m$ so
that $|f(t_1) - f(t_2)| \le \epsilon$ for any two numbers $t_1$ and $t_2$ in $[0, N]$ with $|t_1 - t_2| \le 2^{-m}$.
Define the approximation $g$ as the function in $V_m$ which on each subinterval
$[n2^{-m}, (n+1)2^{-m})$ takes the value $f(t_{m,n+1/2})$, where $t_{m,n+1/2}$ is the midpoint of the subinterval,
$$t_{m,n+1/2} = (n + 1/2)2^{-m}.$$
For $t$ in this subinterval we then obviously have $|f(t) - g(t)| \le \epsilon$, and since these
intervals cover $[0, N]$, the conclusion holds for all $t \in [0, N]$.
Theorem 5.6 does not tell us how to find the approximation g although the
proof makes use of an approximation that interpolates f at the midpoint of each
subinterval. Note that if we measure the error in the $L^2$-norm, we have
$$\|f-g\|^2 = \int_0^N|f(t)-g(t)|^2\,dt \le N\epsilon^2,$$
so $\|f-g\| \le \epsilon\sqrt{N}$. We therefore have the following corollary.
Corollary 5.7. Resolution spaces and approximation.
Let $f$ be a given continuous function on the interval $[0, N]$. Then
$$\lim_{m\to\infty}\|f - \mathrm{proj}_{V_m}(f)\| = 0.$$

Figure 5.6 illustrates how some of the approximations of the function f (x) =
x2 from the resolution spaces for the interval [0, 1] improve with increasing m.

Figure 5.6: Comparison of the function defined by $f(t) = t^2$ on $[0, 1]$ with the
projection onto $V_2$, $V_4$, and $V_6$, respectively.
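A small numpy sketch (not from the book) of this convergence, approximating $f(t) = t^2$ on $[0, 1]$ by its average value on each subinterval and estimating the $L^2$ error numerically, could look as follows:

import numpy as np

f = lambda t: t**2
for m in [2, 4, 6]:
    K = 2**m                                        # number of subintervals of [0, 1]
    t = np.linspace(0, 1, 1000*K, endpoint=False)
    averages = f(t).reshape(K, -1).mean(axis=1)     # value of the approximation on each subinterval
    g = np.repeat(averages, 1000)
    print(m, np.sqrt(np.mean((f(t) - g)**2)))       # approximate L2 error; decreases as m grows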

5.2.2 Detail spaces and wavelets


So far we have described a family of function spaces that allow us to determine
arbitrarily good approximations to a continuous function. The next step is to
introduce the so-called detail spaces and the wavelet functions. We start by
observing that since

[n, n + 1) = [2n/2, (2n + 1)/2) ∪ [(2n + 1)/2, (2n + 2)/2),


we have
$$\phi_{0,n} = \frac{1}{\sqrt 2}\phi_{1,2n} + \frac{1}{\sqrt 2}\phi_{1,2n+1}.$$
This provides a formal proof of the intuitive observation that $V_0 \subset V_1$, for if
$g \in V_0$, we can write
$$g(t) = \sum_{n=0}^{N-1} c_{0,n}\phi_{0,n}(t) = \sum_{n=0}^{N-1} c_{0,n}\left(\phi_{1,2n} + \phi_{1,2n+1}\right)/\sqrt 2,$$
and the right-hand side clearly lies in $V_1$. Since also
$$
\begin{aligned}
\phi_{m-1,n}(t) &= 2^{(m-1)/2}\phi(2^{m-1}t - n) = 2^{(m-1)/2}\phi_{0,n}(2^{m-1}t)\\
&= 2^{(m-1)/2}\frac{1}{\sqrt 2}\left(\phi_{1,2n}(2^{m-1}t) + \phi_{1,2n+1}(2^{m-1}t)\right)\\
&= 2^{(m-1)/2}\left(\phi(2^mt - 2n) + \phi(2^mt - (2n+1))\right) = \frac{1}{\sqrt 2}\left(\phi_{m,2n}(t) + \phi_{m,2n+1}(t)\right),
\end{aligned}
$$
we also have that
$$\phi_{m-1,n} = \frac{1}{\sqrt 2}\phi_{m,2n} + \frac{1}{\sqrt 2}\phi_{m,2n+1}, \qquad (5.5)$$
so that also Vk ⊂ Vk+1 for any integer k ≥ 0.
Lemma 5.8. Resolution spaces are nested.
The spaces V0 , V1 , . . . , Vm , . . . are nested,

V0 ⊂ V1 ⊂ V2 ⊂ · · · ⊂ Vm · · · .

This means that it is meaningful to project Vk+1 onto Vk . The next step is to
characterize the projection from V1 onto V0 , and onto the orthogonal complement
of V0 in V1 . Before we do this, let us make the following definitions.
Definition 5.9. Detail spaces.
The orthogonal complement of Vm−1 in Vm is denoted Wm−1 . All the spaces
Wk are also called detail spaces, or error spaces.
The name detail space is used since the projection from $V_m$ onto $V_{m-1}$ is
considered as a (low-resolution) approximation, and the error, which lies in
$W_{m-1}$, is the detail which is left out when we replace with this approximation.
We will also write gm = gm−1 + em−1 when we split gm ∈ Vm into a sum of a
low-resolution approximation and a detail component. In the context of our
Google EarthTM example, in Figure 5.1 you should interpret g0 as the left image,
the middle image as an excerpt of g1 , and e0 as the additional details which are
needed to reproduce the middle image from the left image.
Since V0 and W0 are mutually orthogonal spaces they are also linearly
independent spaces. When U and V are two such linearly independent spaces,
we will write U ⊕ V for the vector space consisting of all vectors of the form
u + v, with u ∈ U , v ∈ V . U ⊕ V is also called the direct sum of U and V . This
also makes sense if we have more than two vector spaces (such as U ⊕ V ⊕ W ),
and the direct sum clearly obeys the associative law $U\oplus(V\oplus W) = (U\oplus V)\oplus W$.
Using the direct sum notation, we can first write
$$V_m = V_{m-1}\oplus W_{m-1}. \qquad (5.6)$$
Since $V_m$ has dimension $2^mN$, it follows that $W_m$ also has dimension $2^mN$. We
can continue the direct sum decomposition by also writing $V_{m-1}$ as a direct sum,
then $V_{m-2}$ as a direct sum, and so on, and end up with
$$V_m = V_0\oplus W_0\oplus W_1\oplus\cdots\oplus W_{m-1}, \qquad (5.7)$$
where the spaces on the right hand side have dimension $N, N, 2N, \ldots, 2^{m-1}N$.
This decomposition will be important for our purposes. It says that the resolution
space $V_m$ can be written as the sum of a lower order resolution space $V_0$, and
$m$ detail spaces $W_0, \ldots, W_{m-1}$. We will later interpret this splitting into a
low-resolution component and $m$ detail components.
It turns out that the following function will play the same role for the detail
space Wk as the function φ plays for the resolution space Vk .
Definition 5.10. The function $\psi$.
We define
$$\psi(t) = \left(\phi_{1,0}(t) - \phi_{1,1}(t)\right)/\sqrt 2 = \phi(2t) - \phi(2t-1), \qquad (5.8)$$
and
$$\psi_{m,n}(t) = 2^{m/2}\psi(2^mt - n), \quad\text{for } n = 0, 1, \ldots, 2^mN-1. \qquad (5.9)$$

The functions φ and ψ are shown in Figure 5.7.
Figure 5.7: The functions $\phi$ and $\psi$ we used to analyse the space of piecewise
constant functions.

As in the proof for Equation (5.5), it follows that
$$\psi_{m-1,n} = \frac{1}{\sqrt 2}\phi_{m,2n} - \frac{1}{\sqrt 2}\phi_{m,2n+1}. \qquad (5.10)$$
Clearly $\psi$ is supported on $[0, 1)$, and $\|\psi\| = 1$. From this it follows as for $\phi_0$
that the $\{\psi_{0,n}\}_{n=0}^{N-1}$ are orthonormal. In the same way as for $\phi_m$, it follows
also that the $\{\psi_{m,n}\}_{n=0}^{2^mN-1}$ are orthonormal for any $m$. We will write $\psi_m$ for
the orthonormal basis $\{\psi_{m,n}\}_{n=0}^{2^mN-1}$, and we will always denote the coordinates
in the basis $\psi_m$ by $w_{m,n}$. The next result motivates the definition of $\psi$, and
states how we can project from $V_1$ onto $V_0$ and $W_0$, i.e. find the low-resolution
approximation and the detail component of $g_1 \in V_1$.
Lemma 5.11. Orthonormal bases.
For $0 \le n < N$ we have that
$$\mathrm{proj}_{V_0}(\phi_{1,n}) = \begin{cases}\phi_{0,n/2}/\sqrt 2, & \text{if } n \text{ is even;}\\ \phi_{0,(n-1)/2}/\sqrt 2, & \text{if } n \text{ is odd.}\end{cases} \qquad (5.11)$$
$$\mathrm{proj}_{W_0}(\phi_{1,n}) = \begin{cases}\psi_{0,n/2}/\sqrt 2, & \text{if } n \text{ is even;}\\ -\psi_{0,(n-1)/2}/\sqrt 2, & \text{if } n \text{ is odd.}\end{cases} \qquad (5.12)$$
In particular, $\psi_0$ is an orthonormal basis for $W_0$. More generally, if $g_1 = \sum_{n=0}^{2N-1} c_{1,n}\phi_{1,n} \in V_1$, then
$$\mathrm{proj}_{V_0}(g_1) = \sum_{n=0}^{N-1} c_{0,n}\phi_{0,n}, \quad\text{where } c_{0,n} = \frac{c_{1,2n} + c_{1,2n+1}}{\sqrt 2} \qquad (5.13)$$
$$\mathrm{proj}_{W_0}(g_1) = \sum_{n=0}^{N-1} w_{0,n}\psi_{0,n}, \quad\text{where } w_{0,n} = \frac{c_{1,2n} - c_{1,2n+1}}{\sqrt 2}. \qquad (5.14)$$

Proof. We first observe that $\phi_{1,n}(t) \neq 0$ if and only if $n/2 \le t < (n+1)/2$.
Suppose that $n$ is even. Then the intersection
$$\left[\frac{n}{2}, \frac{n+1}{2}\right) \cap [n_1, n_1+1) \qquad (5.15)$$
is nonempty only if $n_1 = \frac{n}{2}$. Using the orthogonal decomposition formula we get
$$\mathrm{proj}_{V_0}(\phi_{1,n}) = \sum_{k=0}^{N-1}\langle\phi_{1,n},\phi_{0,k}\rangle\phi_{0,k} = \langle\phi_{1,n},\phi_{0,n_1}\rangle\phi_{0,n_1} = \left(\int_{n/2}^{(n+1)/2}\sqrt 2\,dt\right)\phi_{0,n/2} = \frac{1}{\sqrt 2}\phi_{0,n/2}.$$
Using this we also get
$$
\begin{aligned}
\mathrm{proj}_{W_0}(\phi_{1,n}) &= \phi_{1,n} - \frac{1}{\sqrt 2}\phi_{0,n/2} = \phi_{1,n} - \frac{1}{\sqrt 2}\left(\frac{1}{\sqrt 2}\phi_{1,n} + \frac{1}{\sqrt 2}\phi_{1,n+1}\right)\\
&= \frac{1}{2}\phi_{1,n} - \frac{1}{2}\phi_{1,n+1} = \psi_{0,n/2}/\sqrt 2.
\end{aligned}
$$
This proves the expressions for both projections when $n$ is even. When $n$ is
odd, the intersection (5.15) is nonempty only if $n_1 = (n-1)/2$, which gives the
expressions for both projections when $n$ is odd in the same way. In particular
we get
$$
\begin{aligned}
\mathrm{proj}_{W_0}(\phi_{1,n}) &= \phi_{1,n} - \frac{\phi_{0,(n-1)/2}}{\sqrt 2} = \phi_{1,n} - \frac{1}{\sqrt 2}\left(\frac{1}{\sqrt 2}\phi_{1,n-1} + \frac{1}{\sqrt 2}\phi_{1,n}\right)\\
&= \frac{1}{2}\phi_{1,n} - \frac{1}{2}\phi_{1,n-1} = -\psi_{0,(n-1)/2}/\sqrt 2.
\end{aligned}
$$
$\psi_0$ must be an orthonormal basis for $W_0$ since $\psi_0$ is contained in $W_0$, and both
have dimension $N$.
We project the function $g_1$ in $V_1$ using the formulas in (5.11). We first split
the sum into even and odd values of $n$,
$$g_1 = \sum_{n=0}^{2N-1} c_{1,n}\phi_{1,n} = \sum_{n=0}^{N-1} c_{1,2n}\phi_{1,2n} + \sum_{n=0}^{N-1} c_{1,2n+1}\phi_{1,2n+1}. \qquad (5.16)$$
We can now apply the two formulas in (5.11),
$$
\begin{aligned}
\mathrm{proj}_{V_0}(g_1) &= \mathrm{proj}_{V_0}\left(\sum_{n=0}^{N-1} c_{1,2n}\phi_{1,2n} + \sum_{n=0}^{N-1} c_{1,2n+1}\phi_{1,2n+1}\right)\\
&= \sum_{n=0}^{N-1} c_{1,2n}\mathrm{proj}_{V_0}(\phi_{1,2n}) + \sum_{n=0}^{N-1} c_{1,2n+1}\mathrm{proj}_{V_0}(\phi_{1,2n+1})\\
&= \sum_{n=0}^{N-1} c_{1,2n}\phi_{0,n}/\sqrt 2 + \sum_{n=0}^{N-1} c_{1,2n+1}\phi_{0,n}/\sqrt 2\\
&= \sum_{n=0}^{N-1}\frac{c_{1,2n} + c_{1,2n+1}}{\sqrt 2}\phi_{0,n}
\end{aligned}
$$
which proves Equation (5.13). Equation (5.14) is proved similarly.


In Figure 5.8 we have used Lemma 5.11 to plot the projections of φ1,0 ∈ V1
onto V0 and W0 . It is an interesting exercise to see from the plots why exactly
these functions should be least-squares approximations of φ1,n . It is also an
interesting exercise to prove the following from Lemma 5.11:
Proposition 5.12. Projections.
Let f (t) ∈ V1 , and let fn,1 be the value f attains on [n, n + 1/2), and fn,2
the value f attains on [n + 1/2, n + 1). Then projV0 (f ) is the function in V0
which equals (fn,1 + fn,2 )/2 on the interval [n, n + 1). Moreover, projW0 (f ) is
the function in W0 which is (fn,1 − fn,2 )/2 on [n, n + 1/2), and −(fn,1 − fn,2 )/2
on [n + 1/2, n + 1).
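A tiny numpy sketch (not from the book) of Proposition 5.12, computing the two projections from the values a function in $V_1$ takes on the half-intervals:

import numpy as np

# f in V_1 on [0, 4), given by its values on the half-intervals [n, n+1/2) and [n+1/2, n+1)
f1 = np.array([3.0, 2.0, -1.0, 0.0])      # f_{n,1}: values on [n, n+1/2)
f2 = np.array([1.0, 2.0, 5.0, 4.0])       # f_{n,2}: values on [n+1/2, n+1)

proj_V0 = (f1 + f2)/2                     # value of proj_{V_0}(f) on [n, n+1)
proj_W0 = (f1 - f2)/2                     # value of proj_{W_0}(f) on [n, n+1/2); minus this on [n+1/2, n+1)
print(proj_V0, proj_W0)
print(np.allclose(proj_V0 + proj_W0, f1), np.allclose(proj_V0 - proj_W0, f2))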
Figure 5.8: The projection of $\phi_{1,0} \in V_1$ onto $V_0$ and $W_0$.

In other words, the projection on V0 is constructed by averaging on two


subintervals, while the projection on W0 is constructed by taking the difference
from the mean. This sounds like a reasonable candidate for the least-squares
approximations. In the exercise we generalize these observations.
In the same way as in Lemma 5.11, it is possible to show that
$$\mathrm{proj}_{W_{m-1}}(\phi_{m,n}) = \begin{cases}\psi_{m-1,n/2}/\sqrt 2, & \text{if } n \text{ is even;}\\ -\psi_{m-1,(n-1)/2}/\sqrt 2, & \text{if } n \text{ is odd.}\end{cases} \qquad (5.17)$$

From this it follows as before that ψm is an orthonormal basis for Wm . If {Bi }ni=1
are mutually independent bases, we will in the following write (B1 , B2 , . . . , Bn )
for the basis where the basis vectors from Bi are included before Bj when i < j.
With this notation, the decomposition in Equation (5.7) can be restated as
follows
Theorem 5.13. Bases for Vm .
φm and (φ0 , ψ0 , ψ1 , · · · , ψm−1 ) are both bases for Vm .
The function ψ thus has the property that its dilations and translations
together span the detail components. Later we will encounter other functions,
which also will be denoted by ψ, and have similar properties. In the theory of
wavelets, such ψ are called mother wavelets. There is one important property of
ψ, which we will return to:
Observation 5.14. Vanishing moment.
We have that $\int_0^N\psi(t)\,dt = 0$.
This can be seen directly from the plot in Figure 5.7, since the parts of
the graph above and below the $x$-axis cancel. In general we say that $\psi$ has $k$
vanishing moments if the integrals $\int t^l\psi(t)\,dt = 0$ for all $0 \le l \le k-1$. Due to
Observation 5.14, ψ has one vanishing moment. In Chapter 7 we will show that
mother wavelets with many vanishing moments are very desirable when it comes
to approximation of functions.
We now have all the tools needed to define the Discrete Wavelet Transform.

Definition 5.15. Discrete Wavelet Transform.


The DWT (Discrete Wavelet Transform) is defined as the change of coordi-
nates from φ1 to (φ0 , ψ0 ). More generally, the m-level DWT is defined as the
change of coordinates from φm to (φ0 , ψ0 , ψ1 , · · · , ψm−1 ). In an m-level DWT,
the change of coordinates from

(φm−k+1 , ψm−k+1 , ψm−k+2 , · · · , ψm−1 ) to (φm−k , ψm−k , ψm−k+1 , · · · , ψm−1 )


(5.18)
is also called the k’th stage. The (m-level) IDWT (Inverse Discrete Wavelet
Transform) is defined as the change of coordinates the opposite way.
The DWT corresponds to replacing as many φ-functions as we can with
ψ-functions, i.e. replacing the original function with a sum of as much detail at
different resolutions as possible. We now can state the following result.
Theorem 5.16. Expression for the DWT.
If $g_m = g_{m-1} + e_{m-1}$ with
$$g_m = \sum_{n=0}^{2^mN-1} c_{m,n}\phi_{m,n} \in V_m,$$
$$g_{m-1} = \sum_{n=0}^{2^{m-1}N-1} c_{m-1,n}\phi_{m-1,n} \in V_{m-1} \qquad e_{m-1} = \sum_{n=0}^{2^{m-1}N-1} w_{m-1,n}\psi_{m-1,n} \in W_{m-1},$$
then the change of coordinates from $\phi_m$ to $(\phi_{m-1}, \psi_{m-1})$ (i.e. the first stage in a
DWT) is given by
$$\begin{pmatrix}c_{m-1,n}\\ w_{m-1,n}\end{pmatrix} = \begin{pmatrix}1/\sqrt 2 & 1/\sqrt 2\\ 1/\sqrt 2 & -1/\sqrt 2\end{pmatrix}\begin{pmatrix}c_{m,2n}\\ c_{m,2n+1}\end{pmatrix} \qquad (5.19)$$
Conversely, the change of coordinates from $(\phi_{m-1}, \psi_{m-1})$ to $\phi_m$ (i.e. the last
stage in an IDWT) is given by
$$\begin{pmatrix}c_{m,2n}\\ c_{m,2n+1}\end{pmatrix} = \begin{pmatrix}1/\sqrt 2 & 1/\sqrt 2\\ 1/\sqrt 2 & -1/\sqrt 2\end{pmatrix}\begin{pmatrix}c_{m-1,n}\\ w_{m-1,n}\end{pmatrix} \qquad (5.20)$$

Proof. Equations (5.5) and (5.10) say that
$$\phi_{m-1,n} = \phi_{m,2n}/\sqrt 2 + \phi_{m,2n+1}/\sqrt 2 \qquad \psi_{m-1,n} = \phi_{m,2n}/\sqrt 2 - \phi_{m,2n+1}/\sqrt 2.$$
The change of coordinate matrix from the basis $\{\phi_{m-1,n}, \psi_{m-1,n}\}$ to $\{\phi_{m,2n}, \phi_{m,2n+1}\}$
is thus $\begin{pmatrix}1/\sqrt 2 & 1/\sqrt 2\\ 1/\sqrt 2 & -1/\sqrt 2\end{pmatrix}$. This proves Equation (5.20). Equation (5.19) follows
immediately since this matrix equals its inverse.
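The first stage of the DWT can be sketched directly from Equation (5.19) by applying the $2\times 2$ matrix to each pair of coordinates (a small numpy illustration, not the book's implementation, which comes in the next section):

import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]])/np.sqrt(2)     # the 2x2 matrix in (5.19)/(5.20)
c_m = np.array([2.0, 4.0, 6.0, 6.0, 3.0, 1.0])          # coordinates in phi_m

pairs = c_m.reshape(-1, 2).T                            # columns (c_{m,2n}, c_{m,2n+1})
cw = H.dot(pairs)                                       # first row: c_{m-1,n}, second row: w_{m-1,n}
print(cw[0], cw[1])
print(np.allclose(H.dot(cw).T.ravel(), c_m))            # True: the matrix is its own inverse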

Above we assumed that N is even. In Exercise 5.8 we will see how we can
handle the case when N is odd.
From Theorem 5.16, we see that, if we had defined

Cm = {φm−1,0 , ψm−1,0 , φm−1,1 , ψm−1,1 , · · · , φm−1,2m−1 N −1 , ψm−1,2m−1 N −1 }.


(5.21)
i.e. we have reordered the basis vectors in (φm−1 , ψm−1 ) (the subscript m is used
since Cm is a basis for Vm ), it is apparent from Equation (5.20) that G = Pφm ←Cm
is the matrix where
$$\begin{pmatrix}\frac{1}{\sqrt 2} & \frac{1}{\sqrt 2}\\ \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 2}\end{pmatrix}$$

is repeated along the main diagonal 2m−1 N times. Also, from Equation (5.19) it
is apparent that H = PCm ←φm is the same matrix. Such matrices are called block
diagonal matrices. This particular block diagonal matrix is clearly orthogonal.
Let us make the following definition.
Definition 5.17. DWT and IDWT kernel transformations.
The matrices H = PCm ←φm and G = Pφm ←Cm are called the DWT and
IDWT kernel transformations. The DWT and the IDWT can be expressed in
terms of these kernel transformations by

DWT = P(φm−1 ,ψm−1 )←Cm H and IDWT = GPCm ←(φm−1 ,ψm−1 ) ,

respectively, where

• P(φm−1 ,ψm−1 )←Cm is a permutation matrix which groups the even elements
first, then the odd elements,
• PCm ←(φm−1 ,ψm−1 ) is a permutation matrix which places the first half at
the even indices, the last half at the odd indices.

Clearly, the kernel transformations H and G also invert each other. The point
of using the kernel transformation is that they compute the output sequentially,
similarly to how a filter does. Clearly also the kernel transformations are very
similar to a filter, and we will return to this in the next chapter.
At each level in a DWT, Vk is split into one low-resolution component from
Vk−1 , and one detail component from Wk−1 . We have illustrated this in figure 5.9,
where the arrows represent changes of coordinates.
The detail component from Wk−1 is not subject to further transformation.
This is seen in the figure since ψk−1 is a leaf node, i.e. there are no arrows going
out from ψm−1 . In a similar illustration for the IDWT, the arrows would go the
opposite way.
The Discrete Wavelet Transform is the analogue in a wavelet setting to the
Discrete Fourier transform. When applying the DFT to a vector of length N ,
[Diagram: $\phi_m \rightarrow \phi_{m-1} \rightarrow \phi_{m-2} \rightarrow \cdots \rightarrow \phi_1 \rightarrow \phi_0$, with an arrow from each $\phi_k$ down to $\psi_{k-1}$.]

Figure 5.9: Illustration of a wavelet transform.

one starts by viewing this vector as coordinates relative to the standard basis.
When applying the DWT to a vector of length N , one instead views the vector
as coordinates relative to the basis φm . This makes sense in light of Exercise 5.1.

Exercise 5.1: The vector of samples is the coordinate vector


Show that the coordinate vector for $f \in V_0$ in the basis $\{\phi_{0,0}, \phi_{0,1}, \ldots, \phi_{0,N-1}\}$
is $(f(0), f(1), \ldots, f(N-1))$. This shows that, for $f \in V_m$, there is no loss of
information in working with the samples of $f$ rather than $f$ itself.

Exercise 5.2: Proposition 5.12


Prove Proposition 5.12.

Exercise 5.3: Computing projections 1


In this exercise we will consider the two projections from V1 onto V0 and W0 .
a) Consider the projection projV0 of V1 onto V0 . Use Lemma 5.11 to write down
the matrix for projV0 relative to the bases φ1 and φ0 .
b) Similarly, use Lemma 5.11 to write down the matrix for projW0 : V1 → W0
relative to the bases φ1 and ψ0 .

Exercise 5.4: Computing projections 2


Consider again the projection projV0 of V1 onto V0 .
a) Explain why projV0 (φ) = φ and projV0 (ψ) = 0.
b) Show that the matrix of projV0 relative to (φ0 , ψ0 ) is given by the diagonal
matrix where the first half of the entries on the diagonal are 1, the second half 0.
c) Show in a similar way that the projection of V1 onto W0 has a matrix relative
to (φ0 , ψ0 ) given by the diagonal matrix where the first half of the entries on
the diagonal are 0, the second half 1.

Exercise 5.5: Computing projections 3


Show that
$$\mathrm{proj}_{V_0}(f) = \sum_{n=0}^{N-1}\left(\int_n^{n+1}f(t)\,dt\right)\phi_{0,n}(t) \qquad (5.22)$$
for any $f$. Show also that the first part of Proposition 5.12 follows from this.

Exercise 5.6: Finding the least squares error


Show that
$$\left\|\sum_n\left(\int_n^{n+1}f(t)\,dt\right)\phi_{0,n}(t) - f\right\|^2 = \langle f, f\rangle - \sum_n\left(\int_n^{n+1}f(t)\,dt\right)^2.$$

This, together with the previous exercise, gives us an expression for the least-
squares error for f from V0 (at least after taking square roots). 2DO: Generalize
to m

Exercise 5.7: Projecting on W0


Show that
$$\mathrm{proj}_{W_0}(f) = \sum_{n=0}^{N-1}\left(\int_n^{n+1/2}f(t)\,dt - \int_{n+1/2}^{n+1}f(t)\,dt\right)\psi_{0,n}(t) \qquad (5.23)$$
for any $f$. Show also that the second part of Proposition 5.12 follows from this.

Exercise 5.8: When N is odd


When N is odd, the (first stage in a) DWT is defined as the change of coordinates
from (φ1,0 , φ1,1 , . . . , φ1,N −1 ) to

(φ0,0 , ψ0,0 , φ0,1 , ψ0,1 , . . . , φ0,(N −1)/2 , ψ(N −1)/2 , φ0,(N +1)/2 ).
Since all functions are assumed to have period $N$, we have that
$$\phi_{0,(N+1)/2} = \frac{1}{\sqrt 2}(\phi_{1,N-1} + \phi_{1,N}) = \frac{1}{\sqrt 2}(\phi_{1,0} + \phi_{1,N-1}).$$
From this relation one can find the last column in the change of coordinate
matrix from $\phi_0$ to $(\phi_1, \psi_1)$, i.e. the IDWT matrix. In particular, when $N$ is
odd, we see that the last column in the IDWT matrix circulates to the upper
right corner. In terms of coordinates, we thus have that
$$c_{1,0} = \frac{1}{\sqrt 2}(c_{0,0} + w_{0,0} + c_{0,(N+1)/2}) \qquad c_{1,N-1} = \frac{1}{\sqrt 2}c_{0,(N+1)/2}. \qquad (5.24)$$
a) If $N = 3$, the DWT matrix equals
$$\frac{1}{\sqrt 2}\begin{pmatrix}1 & 1 & 1\\ 1 & -1 & 0\\ 0 & 0 & 1\end{pmatrix},$$
and the inverse of this is
$$\frac{1}{\sqrt 2}\begin{pmatrix}1 & 1 & -1\\ 1 & -1 & -1\\ 0 & 0 & 2\end{pmatrix}.$$
Explain from this that, when $N$ is odd, the DWT matrix can be constructed by
adding a column of the form $\frac{1}{\sqrt 2}(-1, -1, 0, \ldots, 0, 2)$ to the DWT matrices we
had for $N$ even (in the last row zeros are also added). In terms of the coordinates,
we thus have the additional formulas
$$
\begin{aligned}
c_{0,0} &= \frac{1}{\sqrt 2}(c_{1,0} + c_{1,1} - c_{1,N-1})\\
w_{0,0} &= \frac{1}{\sqrt 2}(c_{1,0} - c_{1,1} - c_{1,N-1})\\
c_{0,(N+1)/2} &= \frac{1}{\sqrt 2}2c_{1,N-1}. \qquad (5.25)
\end{aligned}
$$
b) Explain that the DWT matrix is orthogonal if and only if N is even. Also
explain that it is only the last column which spoils the orthogonality.

5.3 Implementation of the DWT and examples


The DWT is straightforward to implement: Simply iterate Equation (5.19)
for m, m − 1, . . . , 1. For each iteration we will use a kernel function which
takes as input the coordinates (cm,0 , cm,1 , . . .), and returns the coordinates
(cm−1,0 , wm−1,0 , cm−1,1 , wm−1,1 , . . .), i.e. computes one stage of the DWT, but
with a different order of the coordinates than in the basis (φm , ψm ). This turns
out to be the natural ordering for computing the DWT in-place. As an example,
the kernel function for the Haar wavelet can be implemented as follows (for
simplicity this first version of the code assumes that N is even):

def dwt_kernel_haar(x, bd_mode):
    # Assumes "from numpy import *". Computes one stage of the Haar DWT in place,
    # applying Equation (5.19) to each pair (x[k], x[k+1]).
    x /= sqrt(2)
    for k in range(0, len(x) - 1, 2):
        a, b = x[k] + x[k+1], x[k] - x[k+1]
        x[k], x[k+1] = a, b

The code above accepts two-dimensional data, just as our function FFTImpl.
Thus, the function may be applied simultaneously to all channels in a sound.
The reason for using a general kernel function will be apparent later, when we

change to different types of wavelets. It is not meant that you call this kernel
function directly. Instead every time you apply the DWT call the function

DWTImpl(x, m, wave_name, bd_mode=’symm’, dual=False, transpose=False)

x is the input to the DWT, and m is the number of levels. The three last
parameters will be addressed later in the book (the bd_mode-parameter addresses
how the boundary should be handled). The function also sets meaningful default
values for the three last parameters, so that you mostly only need to provide the
three first parameters.
We will later construct other wavelets, and we will distinguish them by
using different names. This is the purpose of the wave_name parameter. This
parameter is sent to a function called find_kernel which looks up a kernel
function by name (find_kernel also uses the dual and transpose parameters
to take a decision on which kernel to choose). The Haar wavelet is identified
with the name "Haar". When this is input to DWTImpl, find_kernel returns
the dwt_kernel_haar kernel. The kernel is then used as input to the following
function:

def DWTImpl_internal(x, m, f, bd_mode):
    # Apply the kernel f to the remaining low-resolution coordinates at each level.
    for res in range(m):
        f(x[0::2**res], bd_mode)
    # Finally reorder into the order of the basis (phi_0, psi_0, psi_1, ..., psi_{m-1}).
    reorganize_coeffs_forward(x, m)

The code is applied to all columns if the data is two-dimensional, and we see
that the kernel function is invoked one time for each resolution. To reorder
coordinates in the same order as (φm , ψm ), note that the coordinates from φm
above end up at indices k2m , where m represents the current stage, and k runs
through the indices. The function reorganize_coeffs_forward uses this to
reorder the coordinates (you will be spared the details in this implementation).
Although the DWT as a change of coordinates requires this reorganization, it is
often not needed in practice. In Exercise 5.27 we go through some aspects of this implementation.
The implementation is not recursive, as the for-loop runs through the different
stages.
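To make the reordering concrete, here is a hypothetical sketch of what a function
like reorganize_coeffs_forward might do (this is not the code base implementation,
only an illustration of the index pattern just described; it assumes that numpy has
been imported with from numpy import *, as elsewhere in the code base):

def reorganize_coeffs_forward_sketch(x, m):
    # After the in-place stages, the phi_0-coordinates sit at indices k*2^m,
    # and the psi_{m-1-res}-coordinates at indices 2^res + k*2^(res+1).
    # Gather them into the order (phi_0, psi_0, psi_1, ..., psi_{m-1}).
    y = zeros_like(x)
    low = x[0::2**m]
    y[0:len(low)] = low
    offset = len(low)
    for res in range(m - 1, -1, -1):
        detail = x[2**res::2**(res + 1)]
        y[offset:(offset + len(detail))] = detail
        offset += len(detail)
    x[:] = y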
In this implementation, note that the first levels require the most operations,
since the latter levels leave an increasing part of the coordinates unchanged. Note
also that the change of coordinates matrix is a very sparse matrix: At each level
a coordinate can be computed from only two of the other coordinates, so that
this matrix has only two nonzero elements in each row/column. The algorithm
clearly shows that there is no need to perform a full matrix multiplication to
perform the change of coordinates.
There is a similar setup for the IDWT:

IDWTImpl(x, m, wave_name, bd_mode='symm', dual=False, transpose=False)

If the wave_name-parameter is set to "Haar", also this function will use the
find_kernel function to look up another kernel function, idwt_kernel_haar

(when N is even, this uses the exact same code as dwt_kernel_haar; for N odd,
see Exercises 5.8 and 5.26). This is then sent as input to

def IDWTImpl_internal(x, m, f, bd_mode):
    reorganize_coeffs_reverse(x, m)
    for res in range(m - 1, -1, -1):
        f(x[0::2**res], bd_mode)

Here the steps are simply performed in the reverse order, and by iterating
Equation (5.20).
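As a quick sanity check of the implementation (a sketch, assuming the code base
functions DWTImpl and IDWTImpl are available, and from numpy import *), a DWT
followed by the corresponding IDWT should reproduce the input up to round-off:

from numpy import *

x = cos(0.01*arange(1024.0))
y = x.copy()
DWTImpl(y, 3, 'Haar')
IDWTImpl(y, 3, 'Haar')
print(abs(y - x).max())   # should be very close to 0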
In the next sections we will consider other cases where the underlying function
φ may be something else, and not necessarily piecewise constant. It will turn
out that much of the analysis we have done makes sense for other functions φ as
well, giving rise to other structures which we also will refer to as wavelets. The
wavelet resulting from piecewise constant functions is thus simply one example
out of many, and it is commonly referred to as the Haar wavelet. Let us round
off this section with some important examples.

Example 5.9: Computing the DWT by hand


In some cases, the DWT can be computed by hand, keeping in mind its definition
as a change of coordinates. As an example, consider the simple vector x of
length 2^10 = 1024 defined by

x_n = 1 for n < 512,    x_n = 0 for n ≥ 512,

and let us compute the 10-level DWT of this vector by first visualizing the
function with these coordinates. Since m = 10 here, we should view x as
coordinates in the basis φ_10 of a function f(t) ∈ V_10. This is
f(t) = Σ_{n=0}^{511} φ_{10,n}, and since φ_{10,n} is supported on
[2^{−10}n, 2^{−10}(n + 1)), the support of f has width 512 × 2^{−10} = 1/2
(512 translates, each with width 2^{−10}). Moreover, since φ_{10,n} is
2^{10/2} = 2^5 = 32 on [2^{−10}n, 2^{−10}(n + 1)) and 0 elsewhere, it is clear that

f(t) = 32 for 0 ≤ t < 1/2,    f(t) = 0 for t ≥ 1/2.
This is by definition a function in V_1: f must in fact be a multiple of φ_{1,0},
since this also is supported on [0, 1/2). We can thus write f(t) = c·φ_{1,0}(t)
for some c. We can find c by setting t = 0. This gives that 32 = 2^{1/2}·c
(since f(0) = 32 and φ_{1,0}(0) = 2^{1/2}), so that c = 32/√2. This means that
f(t) = (32/√2)·φ_{1,0}(t), so that f is in V_1, with coordinates
(32/√2, 0, ..., 0) in φ_1.
When we run a 10-level DWT we make a change of coordinates from φ_10 to
(φ_0, ψ_0, ..., ψ_9). The first 9 levels give us the coordinates in
(φ_1, ψ_1, ψ_2, ..., ψ_9), and these are (32/√2, 0, ..., 0) from what we showed.
It thus only remains to perform the last level in the DWT, i.e. perform the
change of coordinates from φ_1 to (φ_0, ψ_0). Since φ_{1,0} = (1/√2)(φ_{0,0} + ψ_{0,0}),
we get

f(t) = (32/√2)·φ_{1,0}(t) = (32/√2)·(1/√2)(φ_{0,0} + ψ_{0,0}) = 16φ_{0,0} + 16ψ_{0,0}.
From this we see that the coordinate vector of f in (φ0 , ψ0 , · · · , ψ9 ), i.e. the
10-level DWT of x, is (16, 16, 0, 0, . . . , 0). Note that here V0 and W0 are both
1-dimensional, since V10 was assumed to be of dimension 210 (in particular,
N = 1).
It is straightforward to verify what we found using the algorithm above:

x = hstack([ones(512), zeros(512)])
DWTImpl(x, 10, 'Haar')
print x

The reason why the method from this example worked was that the vector we
started with had a simple representation in the wavelet basis, actually it equaled
the coordinates of a basis function in φ1 . Usually this is not the case, and our
only possibility then is to run the DWT on a computer.

Example 5.10: DWT on sound


Let us plot the samples of our audio sample file, and compare them with the
first order DWT. Both are shown in Figure 5.10.
Figure 5.10: The 2^17 first sound samples (left) and the DWT coefficients (right)
of the sound castanets.wav.

The first part of the DWT plot represents the low resolution part, the second
the detail.
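A plot like the one in Figure 5.10 can be produced along the following lines
(a sketch, assuming that the first 2^17 sound samples of one channel already are
stored in a one-dimensional array x, and that DWTImpl from the code base is
available):

import matplotlib.pyplot as plt

y = x.copy()
DWTImpl(y, 1, 'Haar')      # one-level DWT: low-resolution part first, then detail
plt.figure(); plt.plot(x)
plt.figure(); plt.plot(y)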
Since φ(2^m t − n) ∈ V_m oscillates more quickly than φ(t − n) ∈ V_0, one is led
to believe that coefficients from lower order resolution spaces correspond to lower
frequencies. The functions φm,n do not correspond to pure tones in the setting
of wavelets, however, but let us nevertheless listen to sound from the different
resolution spaces. The code base includes a function forw_comp_rev_DWT which
runs an m-level DWT on the first samples of the audio sample file, extracts the
detail or the low-resolution approximation, and runs an IDWT to reconstruct
the sound. Since the returned values may lie outside the legal range [−1, 1], the
values are normalized at the end.
CHAPTER 5. MOTIVATION FOR WAVELETS AND SOME SIMPLE EXAMPLES182

To listen to the low-resolution approximation, write

x, fs = forw_comp_rev_DWT(m, 'Haar')
play(x, fs)

It is instructive to run this code for different values of m. For m = 2 we clearly


hear a degradation in the sound. For m = 4 and above most of the sound is
unrecognizable, since too much of the detail is omitted. To be more precise,
when listening to the sound by throwing away the detail from W_0, W_1, ..., W_{m−1}, we
are left with a 2^{−m} share of the data.
Let us also consider the detail. For m = 1 this can be played as follows

x, fs = forw_comp_rev_DWT(1, 'Haar', 0)
play(x, fs)

We see that the detail is quite significant, so that the first order wavelet approxi-
mation does not give a very good approximation. For m = 2 the detail can be
played as follows

x, fs = forw_comp_rev_DWT(2, 'Haar', 0)
play(x, fs)

Figure 5.11: The detail in our audio sample file, for m = 1 (left) and m = 2
(right).

The errors are shown in Figure 5.11. The error is larger when two levels of
the DWT are performed, as one would suspect. It is also seen that the error
is larger in the part of the file where there are bigger variations. Since more
and more information is contained in the detail components as we increase m,
we here see the opposite effect: The sound gradually improves in quality as we
increase m.
The previous example illustrates that wavelets as well may be used to perform
operations on sound. As we will see later, however, our main application for
wavelets will be images, where they have found a more important role than
for sound. Images typically display variations which are less abrupt than the
ones found in sound. Just as the functions above had smaller errors in the

corresponding resolution spaces than the sound had, images are thus more suited
for use with wavelets. The main idea behind why wavelets are so useful
comes from the fact that the detail, i.e., wavelet coefficients corresponding to the
spaces Wk , are often very small. After a DWT one is therefore often left with a
couple of significant coefficients, while most of the coefficients are small. The
approximation from V0 can be viewed as a good approximation, even though
it contains much less information. This gives another reason why wavelets
are popular for images: Detailed images can be very large, but when they are
downloaded to a web browser, the browser can very early show a low-resolution of
the image, while waiting for the rest of the details in the image to be downloaded.
When we later look at how wavelets are applied to images, we will need to handle
one final hurdle, namely that images are two-dimensional.

Example 5.11: DWT on the samples of a mathematical


function
Above we plotted the DWT coefficients of a sound, as well as the detail/error.
We can also experiment with samples generated from a mathematical function.
Figure 5.12 plots the error for different functions, with N = 1024.
Figure 5.12: The error (i.e. the contribution from W_0 ⊕ W_1 ⊕ ··· ⊕ W_{m−1}) for
N = 1024 when f is a square wave, the linear function f(t) = 1 − 2|1/2 − t/N|,
and the trigonometric function f(t) = 1/2 + cos(2πt/N)/2, respectively. The
detail is indicated for m = 6 and m = 8.

In these cases, we see that we require large m before the detail/error becomes
significant. We see also that there is no error for the square wave. The reason
is that the square wave is a piecewise constant function, so that it can be
represented exactly by the φ-functions. For the other functions, however, this is
not the case, so we here get an error.
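Plots like these can be generated by keeping only the detail part after an m-level
DWT and transforming back (a sketch, assuming DWTImpl and IDWTImpl from the code
base; the first N/2^m entries of the transformed vector hold the low-resolution
coordinates):

from numpy import *
import matplotlib.pyplot as plt

N, m = 1024, 6
t = arange(N, dtype=float)
x = 1 - 2*abs(0.5 - t/N)           # the linear test function
y = x.copy()
DWTImpl(y, m, 'Haar')
y[0:(N//2**m)] = 0                 # zero out the low-resolution part ...
IDWTImpl(y, m, 'Haar')             # ... so that y holds the detail/error contribution
plt.plot(t, abs(y))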

Example 5.12: Computing the wavelet coefficients


For the functions we plotted in the previous example we used the functions
DWTImpl, IDWTImpl to plot the error, but it is also possible to compute the
wavelet coefficients wm,n exactly. You will be asked to do this in exercises 5.23
and 5.24. To exemplify the general procedure for this, consider the function
f (t) = 1 − t/N . This decreases linearly from 1 to 0 on [0, N ], so that it is not
piecewise constant, and does not lie in any of the spaces Vm . We can instead
consider projVm f ∈ Vm , and apply the DWT to this. Let us compute the ψm -
coordinates wm,n of projVm f in the orthonormal basis (φ0 , ψ0 , ψ1 , . . . , ψm−1 ).
The orthogonal decomposition theorem says that

w_{m,n} = ⟨f, ψ_{m,n}⟩ = ∫_0^N f(t)ψ_{m,n}(t) dt = ∫_0^N (1 − t/N)ψ_{m,n}(t) dt.

Using the definition of ψm,n we see that this can also be written as

∫_0^N 2^{m/2}(1 − t/N)ψ(2^m t − n) dt = 2^{m/2} ( ∫_0^N ψ(2^m t − n) dt − ∫_0^N (t/N)ψ(2^m t − n) dt ).
Using Observation 5.14 we get that ∫_0^N ψ(2^m t − n) dt = 0, so that the first term
above vanishes. Moreover, ψ(2^m t − n) is nonzero only on [2^{−m}n, 2^{−m}(n + 1)),
and is 1 on [2^{−m}n, 2^{−m}(n + 1/2)) and −1 on [2^{−m}(n + 1/2), 2^{−m}(n + 1)).
We therefore get

w_{m,n} = −2^{m/2} ( ∫_{2^{−m}n}^{2^{−m}(n+1/2)} (t/N) dt − ∫_{2^{−m}(n+1/2)}^{2^{−m}(n+1)} (t/N) dt )

        = −2^{m/2} ( [t²/(2N)]_{2^{−m}n}^{2^{−m}(n+1/2)} − [t²/(2N)]_{2^{−m}(n+1/2)}^{2^{−m}(n+1)} )

        = −2^{m/2} ( (2^{−2m}(n+1/2)² − 2^{−2m}n²)/(2N) − (2^{−2m}(n+1)² − 2^{−2m}(n+1/2)²)/(2N) )

        = −(2^{−3m/2}/(2N)) ( −n² + 2(n+1/2)² − (n+1)² ) = 1/(N·2^{2+3m/2}).

We see in particular that wm,n → 0 when m → ∞. Also, all coordinates were


equal, i.e. wm,0 = wm,1 = wm,2 = · · · . It is not too hard to convince oneself
that this equality has to do with the fact that f is linear. We see also that
there were a lot of computations even in this very simple example. For most
functions we therefore usually do not compute wm,n symbolically, but instead
run implementations like DWTImpl, IDWTImpl on a computer.
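The computation above can also be checked on a computer. A small sketch using
sympy (this is only a verification aid, not part of the code base; any concrete
values of m and n will do):

from sympy import symbols, Rational, integrate, simplify

t, N = symbols('t N', positive=True)
m, n = 3, 2
a, b, c = Rational(n, 2**m), Rational(2*n + 1, 2**(m + 1)), Rational(n + 1, 2**m)
# psi_{m,n} equals 2^(m/2) on [a, b) and -2^(m/2) on [b, c)
w = 2**Rational(m, 2)*(integrate(1 - t/N, (t, a, b)) - integrate(1 - t/N, (t, b, c)))
print(simplify(w - 1/(N*2**Rational(2 + 3*m, 2))))   # prints 0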

Exercise 5.13: Implement IDWT for The Haar wavelet


Write a function idwt_kernel_haar which uses the formulas (5.24) to implement
the IDWT, similarly to how the function dwt_kernel_haar implemented the
DWT using the formulas (5.25).

Exercise 5.14: Computing projections 4


Generalize Exercise 5.4 to the projections from Vm+1 onto Vm and Wm .

Exercise 5.15: Scaling a function


Show that f (t) ∈ Vm if and only if g(t) = f (2t) ∈ Vm+1 .

Exercise 5.16: Direct sums


Let C1 , C2 . . . , Cn be independent vector spaces, and let Ti : Ci → Ci be linear
transformations. The direct sum of T1 , T2 ,. . . ,Tn , written as T1 ⊕ T2 ⊕ . . . ⊕ Tn ,
denotes the linear transformation from C1 ⊕ C2 ⊕ · · · ⊕ Cn to itself defined by

T1 ⊕ T2 ⊕ . . . ⊕ Tn (c1 + c2 + · · · + cn ) = T1 (c1 ) + T2 (c2 ) + · · · + Tn (cn )

when c1 ∈ C1 , c2 ∈ C2 , . . . , cn ∈ Cn . Similarly, when A1 , A2 , . . . , An are square


matrices, A1 ⊕ A2 ⊕ · · · ⊕ An is defined as the block matrix where the blocks
along the diagonal are A1 , A2 , . . . , An , and where all other blocks are 0. Show
that, if Bi is a basis for Ci then

[T1 ⊕ T2 ⊕ . . . ⊕ Tn ](B1 ,B2 ,...,Bn ) = [T1 ]B1 ⊕ [T2 ]B2 ⊕ · · · ⊕ [Tn ]Bn ,

Here two new concepts are used: a direct sum of matrices, and a direct sum of
linear transformations.

Exercise 5.17: Eigenvectors of direct sums


Assume that T1 and T2 are matrices, and that the eigenvalues of T1 are equal
to those of T2 . What are the eigenvalues of T1 ⊕ T2 ? Can you express the
eigenvectors of T1 ⊕ T2 in terms of those of T1 and T2 ?

Exercise 5.18: Invertibility of direct sums


Assume that A and B are square matrices which are invertible. Show that A ⊕ B
is invertible, and that (A ⊕ B)−1 = A−1 ⊕ B −1 .

Exercise 5.19: Multiplying direct sums


Let A, B, C, D be square matrices of the same dimensions. Show that (A ⊕
B)(C ⊕ D) = (AC) ⊕ (BD).

Exercise 5.20: Finding N


Assume that you run an m-level DWT on a vector of length r. What value of
N does this correspond to? Note that an m-level DWT performs a change of
coordinates from φm to (φ0 , ψ0 , ψ1 , . . . , ψm−2 , ψm−1 ).

Exercise 5.21: Different DWTs for similar vectors


In Figure 5.13 we have plotted the DWT’s of two vectors x1 and x2 . In both
vectors we have 16 ones followed by 16 zeros, and this pattern repeats cyclically
so that the length of both vectors is 256. The only difference is that the second
vector is obtained by delaying the first vector with one element.

Figure 5.13: Two vectors x1 and x2 which seem equal, but where the DWT's are
very different.

You see that the two DWT’s are very different: For the first vector we see
that there is much detail present (the second part of the plot), while for the
second vector there is no detail present. Attempt to explain why this is the case.
Based on your answer, also attempt to explain what can happen if you change
the point of discontinuity for the piecewise constant function in the left part of
Figure 5.12 to something else.

Exercise 5.22: Construct a sound


Attempt to construct a (nonzero) sound where the low resolution approximations
equal the sound itself for m = 1, m = 2.

Exercise 5.23: Exact computation of wavelet coefficients 1


Compute the wavelet detail coefficients analytically for the functions in
Example 5.11, i.e. compute the quantities w_{m,n} = ∫_0^N f(t)ψ_{m,n}(t) dt
similarly to how this was done in Example 5.12.

Exercise 5.24: Exact computation of wavelet coefficients 2

Compute the wavelet detail coefficients analytically for the functions
f(t) = (t/N)^k, i.e. compute the quantities w_{m,n} = ∫_0^N (t/N)^k ψ_{m,n}(t) dt
similarly to how this was done in Example 5.12. How do these compare with the
coefficients from Exercise 5.23?

Exercise 5.25: Computing the DWT of a simple vector


Suppose that we have the vector x with length 2^10 = 1024, defined by x_n = 1
for n even, x_n = −1 for n odd. What will be the result if you run a 10-level
DWT on x? Use the function DWTImpl to verify what you have found.

Hint. We defined ψ by ψ(t) = (φ_{1,0}(t) − φ_{1,1}(t))/√2. From this connection it
follows that ψ_{9,n} = (φ_{10,2n} − φ_{10,2n+1})/√2, and thus
φ_{10,2n} − φ_{10,2n+1} = √2·ψ_{9,n}. Try to couple this identity with the
alternating sign you see in x.

Exercise 5.26: The Haar wavelet when N is odd


Use the results from Exercise 5.8 to rewrite the implementations dwt_kernel_haar
and idwt_kernel_haar so that they also work in the case when N is odd.

Exercise 5.27: in-place DWT


Show that the coordinates in φ_m after an in-place m-level DWT end up at
indices k·2^m, k = 0, 1, 2, .... Show similarly that the coordinates in ψ_m after an
in-place m-level DWT end up at indices 2^{m−1} + k·2^m, k = 0, 1, 2, .... Find these
indices in the code for the function reorganize_coefficients.

5.4 A wavelet based on piecewise linear func-


tions
Unfortunately, piecewise constant functions are too simple to provide good
approximations. In this section we are going to extend the construction of
wavelets to piecewise linear functions. The advantage is that piecewise linear
functions are better for approximating smooth functions and data than piecewise
constants, which should translate into smaller components (errors) in the detail
spaces in many practical situations. As an example, this would be useful if we
are interested in compression. In this new setting it turns out that we lose the

orthonormality we had for the Haar wavelet. On the other hand, we will see
that the new scaling functions and mother wavelets are symmetric functions.
We will later see that this implies that the corresponding DWT and IDWT have
simple implementations with higher precision. Our experience from deriving
Haar wavelets will guide us in the construction of piecewise linear wavelets. The
first task is to define the new resolution spaces.

Definition 5.18. Resolution spaces of piecewise linear functions.


The space V_m is the subspace of continuous functions on ℝ which are periodic
with period N, and linear on each subinterval of the form [n·2^{−m}, (n + 1)·2^{−m}).

Figure 5.14: A piecewise linear function and the two functions φ(t) and φ(t − 3).

Any f ∈ V_m is uniquely determined by its values in the points {2^{−m}n}_{n=0}^{2^m N−1}.
The linear mapping which sends f to these samples is thus an isomorphism from
V_m onto R^{N·2^m}, so that the dimension of V_m is N·2^m. The left plot in Figure 5.14
shows an example of a piecewise linear function in V_0 on the interval [0, 10]. We
note that a piecewise linear function in V_0 is completely determined by its values
at the integers, so the functions that are 1 at one integer and 0 at all others are
particularly simple and therefore interesting, see the right plot in Figure 5.14.
These simple functions are all translates of each other and can therefore be built
from one scaling function, as is required for a multiresolution analysis.
Lemma 5.19. The function φ.
Let the function φ be defined by

φ(t) = 1 − |t| for −1 ≤ t ≤ 1, and φ(t) = 0 otherwise,    (5.26)

and for any m ≥ 0 set

φ_{m,n}(t) = 2^{m/2} φ(2^m t − n) for n = 0, 1, ..., 2^m N − 1,

and φ_m = {φ_{m,n}}_{n=0}^{2^m N−1}. Then φ_m is a basis for V_m, and φ_{0,n}(t) is the
function in V_0 with smallest support that is nonzero at t = n.

Proof. It is clear that φ_{m,n} ∈ V_m, and

φ_{m,n′}(n·2^{−m}) = 2^{m/2} φ(2^m(2^{−m}n) − n′) = 2^{m/2} φ(n − n′).

Since φ is zero at all nonzero integers, and φ(0) = 1, we see that
φ_{m,n′}(n·2^{−m}) = 2^{m/2} when n′ = n, and 0 when n′ ≠ n. Let L_m : V_m → R^{N·2^m}
be the isomorphism mentioned above which sends f ∈ V_m to the samples in the
points {2^{−m}n}_{n=0}^{2^m N−1}. Our calculation shows that L_m(φ_{m,n}) = 2^{m/2} e_n.
Since L_m is an isomorphism it follows that φ_m = {φ_{m,n}}_{n=0}^{2^m N−1} is a basis for V_m.
Suppose that the function g ∈ V_0 has smaller support than φ_{0,n}, but is
nonzero at t = n. We must have that L_0(g) = c·e_n for some c, since g is zero on
the integers different from n. But then g is a multiple of φ_{0,n}, so that φ_{0,n} is the
function in V_0 with smallest support that is nonzero at t = n.
The function φ and its translates and dilates are often referred to as hat
functions for obvious reasons. Note that the new function φ is nonzero for small
negative x-values, contrary to the φ we defined for the piecewise constant functions.
If we plotted the function on [0, N), we would see the nonzero parts at the beginning
and end of this interval, due to the period N, but we will mostly plot on an interval
around zero, since such an interval captures the entire support of the function. Also
for the piecewise linear wavelet the coordinates in the basis φ_m are given by the
samples:
Lemma 5.20. Writing in terms of the samples.
A function f ∈ V_m may be written as

f(t) = Σ_{n=0}^{2^m N−1} f(n/2^m) 2^{−m/2} φ_{m,n}(t).    (5.27)

An essential property also here is that the spaces are nested.


Lemma 5.21. Resolution spaces are nested.
The piecewise linear resolution spaces are nested,
V0 ⊂ V1 ⊂ · · · ⊂ Vm ⊂ · · · .
Proof. We only need to prove that V0 ⊂ V1 since the other inclusions are similar.
But this is immediate since any function in V0 is continuous, and linear on any
subinterval in the form [n/2, (n + 1)/2).
In the piecewise constant case, we saw in Lemma 5.3 that the scaling functions
were automatically orthogonal since their supports did not overlap. This is not
the case in the linear case, but we could orthogonalise the basis φm with the
Gram-Schmidt process from linear algebra. The disadvantage is that we lose the
nice local behaviour of the scaling functions and end up with basis functions
that are nonzero over all of [0, N ]. And for most applications, orthogonality is
not essential; we just need a basis. The next step in the derivation of wavelets is
to find formulas that let us express a function given in the basis φ0 for V0 in
terms of the basis φ1 for V1 .

Lemma 5.22. The two-scale equation.
The functions φ_{0,n} satisfy the relation

φ_{0,n} = (1/√2) ( (1/2)φ_{1,2n−1} + φ_{1,2n} + (1/2)φ_{1,2n+1} ).    (5.28)

Figure 5.15: How φ(t) can be decomposed as a linear combination of φ_{1,−1},
φ_{1,0}, and φ_{1,1}.

Proof. Since φ_{0,n} is in V_0 ⊂ V_1 it may be expressed in the basis φ_1 with formula (5.27),

φ_{0,n}(t) = Σ_{k=0}^{2N−1} φ_{0,n}(k/2) 2^{−1/2} φ_{1,k}(t).

The relation (5.28) now follows since

φ_{0,n}((2n − 1)/2) = φ_{0,n}((2n + 1)/2) = 1/2,    φ_{0,n}(2n/2) = 1,

and φ_{0,n}(k/2) = 0 for all other values of k.


The relationship given by Equation (5.28) is shown in Figure 5.15.
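The relation is also easy to verify numerically (a small sketch using only numpy;
recall that φ_{1,k}(t) = √2·φ(2t − k), so that the factors √2 and 1/√2 cancel):

from numpy import *

def phi(t):
    # the hat function from (5.26)
    return (abs(t) <= 1)*(1.0 - abs(t))

t = linspace(-2, 2, 1001)
lhs = phi(t)                                           # phi_{0,0}(t)
rhs = 0.5*phi(2*t + 1) + phi(2*t) + 0.5*phi(2*t - 1)   # right hand side of (5.28) for n = 0
print(abs(lhs - rhs).max())                            # should be (essentially) 0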

5.4.1 Detail spaces and wavelets


The next step in our derivation of wavelets for piecewise linear functions is
the definition of the detail spaces. We need to determine a space W0 that is
linearly independent from V0 , and so that V1 = V0 ⊕ W0 . In the case of piecewise
constant functions we started with a function g1 in V1 , computed the least
squares approximation g0 in V0 , and then defined the error function e0 = g1 − g0 ,
with e0 ∈ W0 and W0 as the orthogonal complement of V0 in V1 .

It turns out that this strategy is less appealing in the case of piecewise linear
functions. The reason is that the functions φ0,n are not orthogonal anymore
(see Exercise 5.32). Due to this we have no simple, orthogonal basis for the
set of piecewise linear functions, so that the orthogonal decomposition theorem
fails to give us the projection onto V_0 in a simple way. There is therefore no reason
to use the orthogonal complement of V0 in V1 as our error space, since it is
hard to write a piecewise linear function as a sum of two other piecewise linear
functions which are orthogonal. Instead of using projections to find low-resolution
approximations, and orthogonal complements to find error functions, we will
attempt the following simple approximation method:
Definition 5.23. Alternative projection.
Let g_1 be a function in V_1 given by

g_1 = Σ_{n=0}^{2N−1} c_{1,n} φ_{1,n}.    (5.29)

The approximation g_0 = P(g_1) in V_0 is defined as the unique function in V_0
which has the same values as g_1 at the integers, i.e.

g_0(n) = g_1(n),  n = 0, 1, ..., N − 1.    (5.30)


It is easy to show that P(g_1) actually is different from the projection of g_1
onto V_0: If g_1 = φ_{1,1}, then g_1 is zero at the integers, and then clearly P(g_1) = 0.
But in Exercise 5.31 you will be asked to compute the projection onto V_0 using
different means than the orthogonal decomposition theorem, and the result will
be seen to be nonzero. It is also very easy to see that the coordinates of g_0 in φ_0
can be obtained by dropping every second coordinate of g_1 in φ_1. To be more
precise, the following holds:
Lemma 5.24. Expression for the alternative projection.
We have that

P(φ_{1,n}) = √2·φ_{0,n/2} if n is an even integer, and P(φ_{1,n}) = 0 otherwise.
Once this approximation method is determined, it is straightforward to
determine the detail space as the space of error functions.
Lemma 5.25. Resolution spaces.
Define

W_0 = {f ∈ V_1 | f(n) = 0 for n = 0, 1, ..., N − 1},

and

ψ(t) = (1/√2) φ_{1,1}(t),    ψ_{m,n}(t) = 2^{m/2} ψ(2^m t − n).    (5.31)

Suppose that g_1 ∈ V_1 and that g_0 = P(g_1). Then

• the error e0 = g1 − g0 lies in W0 ,


−1
• ψ0 = {ψ0,n }N
n=0 is a basis for W0 .

• V0 and W0 are linearly independent, and V1 = V0 ⊕ W0 .

Proof. Since g_0(n) = g_1(n) for all integers n, e_0(n) = (g_1 − g_0)(n) = 0, so that
e_0 ∈ W_0. This proves the first statement.
For the second statement, note first that

ψ_{0,n}(t) = ψ(t − n) = (1/√2) φ_{1,1}(t − n) = φ(2(t − n) − 1)
           = φ(2t − (2n + 1)) = (1/√2) φ_{1,2n+1}(t).    (5.32)
ψ0 is thus a linearly independent set of dimension N , since it corresponds to a
subset of φ1 . Since φ1,2n+1 is nonzero only on (n, n + 1), it follows that all of ψ0
lies in W0 . Clearly then ψ0 is also a basis for W0 , since W0 also has dimension
N (its image under L1 consists of points where every second component is zero).
Consider finally a linear combination from φ_0 and ψ_0 which gives zero:

Σ_{n=0}^{N−1} a_n φ_{0,n} + Σ_{n=0}^{N−1} b_n ψ_{0,n} = 0.

If we evaluate this at t = k, we see that ψ_{0,n}(k) = 0, that φ_{0,n}(k) = 0 when n ≠ k,
and that φ_{0,k}(k) = 1. When we evaluate at k we thus get a_k, which must be zero. If
we then evaluate at t = k + 1/2 we get in a similar way that all b_n = 0, and it
follows that V_0 and W_0 are linearly independent. That V_1 = V_0 ⊕ W_0 follows
from the fact that V_1 has dimension 2N, and V_0 and W_0 both have dimension N.
We can define Wm in a similar way for m > 0, and generalize the lemma
to Wm . We can thus state the following analog to Theorem 5.16 for writing
gm ∈ Vm as a sum of a low-resolution approximation gm−1 ∈ Vm−1 , and a
detail/error component em−1 ∈ Wm−1 .
Theorem 5.26. Decomposing Vm .
The space Vm can be decomposed as the direct sum Vm = Vm−1 ⊕ Wm−1
where

Wm−1 = {f ∈ Vm | f (n/2m−1 ) = 0, for n = 0, 1, . . . , 2m−1 N − 1}.


m
Wm has the base ψm = {ψm,n }2n=0N −1 , and Vm has the two bases

m m−1 m−1
φm = {φm,n }2n=0N −1 , and (φm−1 , ψm−1 ) = {φm−1,n }2n=0 N −1 2 N −1

, {ψm−1,n }n=0 .

With this result we can define the DWT and the IDWT with their stages
as before, but the matrices themselves are now different. For the IDWT
(i.e. P_{φ_1 ← (φ_0,ψ_0)}), the columns in the matrix can be found from equations (5.28)
and (5.32), i.e.

φ_{0,n} = (1/√2) ( (1/2)φ_{1,2n−1} + φ_{1,2n} + (1/2)φ_{1,2n+1} )
ψ_{0,n} = (1/√2) φ_{1,2n+1}.    (5.33)

This states that

G = P_{φ_m ← C_m} = (1/√2) ·
    [  1    0    0    0   ···   0    0    0 ]
    [ 1/2   1   1/2   0   ···   0    0    0 ]
    [  0    0    1    0   ···   0    0    0 ]
    [  ⋮    ⋮    ⋮    ⋮         ⋮    ⋮    ⋮ ]    (5.34)
    [  0    0    0    0   ···   0    1    0 ]
    [ 1/2   0    0    0   ···   0   1/2   1 ]

In general we will call a matrix on the form

    [ 1   0   0   0   ···   0   0   0 ]
    [ λ   1   λ   0   ···   0   0   0 ]
    [ 0   0   1   0   ···   0   0   0 ]
    [ ⋮   ⋮   ⋮   ⋮         ⋮   ⋮   ⋮ ]    (5.35)
    [ 0   0   0   0   ···   0   1   0 ]
    [ λ   0   0   0   ···   0   λ   1 ]

an elementary lifting matrix of odd type, and denote it by B_λ. This results from
the identity matrix by adding λ times the preceding and succeeding rows to the
odd-indexed rows. Since the even-indexed rows are left untouched, the inverse is
clearly obtained by subtracting λ times the preceding and succeeding rows from
the odd-indexed rows, i.e. (B_λ)^{−1} = B_{−λ}. This means that the matrix for the
DWT is also easily found, and

H = P_{C_m ← φ_m} = √2 · B_{−1/2}    (5.36)
G = P_{φ_m ← C_m} = (1/√2) · B_{1/2}.    (5.37)
In the exercises you will be asked to implement a function lifting_odd_symm
which computes Bλ . Using this the DWT kernel transformation for the piecewise
linear wavelet can be applied to a vector x as follows.

x *= sqrt(2)
lifting_odd_symm(-0.5, x, 'symm')

The IDWT kernel transformation is computed similarly. Functions dwt_kernel_pwl0,


idwt_kernel_pwl0 which perform these steps are included in the code base.
The 0 stands for 0 vanishing moments. We defined vanishing moments after
Observation 5.14, and we will have more to say about vanishing moments later.
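To make the snippet above self-contained, here is one possible sketch of
lifting_odd_symm (Exercise 5.33 asks you to write your own version; this sketch
assumes N even and simply implements the wrap-around present in B_λ, ignoring
the bd_mode parameter):

def lifting_odd_symm(lmbda, x, bd_mode):
    # interior odd-indexed entries: x[1], x[3], ..., x[N-3]
    x[1:-1:2] += lmbda*(x[0:-2:2] + x[2::2])
    # the last row of B_lambda wraps around to the first entry
    x[-1] += lmbda*(x[-2] + x[0])

Since the even-indexed entries are never changed, applying the function with λ and
then with −λ returns the original vector, in line with (B_λ)^{−1} = B_{−λ}.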

Example 5.28: DWT on sound


Using the new kernels, let us listen to the new low resolution approximations,
as well as plot and listen to the detail, as we did in Example 5.10. First
we listen to the low-resolution approximation.

x, fs = forw_comp_rev_DWT(m, 'pwl0')
play(x, fs)

There is a new and undesired effect when we increase m here: the castanet
sound becomes strange. The sounds from the castanets are perhaps the sounds
with the highest frequencies.
Now for the detail. For m = 1 this can be played as follows

x, fs = forw_comp_rev_DWT(1, 'pwl0', 0)
play(x, fs)

For m = 2 the detail can be played as follows

x, fs = forw_comp_rev_DWT(2, 'pwl0', 0)
play(x, fs)

Figure 5.16: The detail in our audio sample file for the piecewise linear wavelet,
for m = 1 (left) and m = 2 (right).

The errors are shown in Figure 5.16. When comparing with Example 5.10
we see much of the same, but it seems here that the error is bigger than before.
In the next section we will try to explain why this is the case, and construct
another wavelet based on piecewise linear functions which remedies this.

Example 5.29: DWT on the samples of a mathematical


function
Let us also repeat Example 5.11, where we plotted the detail/error at different
resolutions for the samples of a mathematical function.
Figure 5.17: The error (i.e. the contribution from W_0 ⊕ W_1 ⊕ ··· ⊕ W_{m−1}) for
N = 1025 when f is a square wave, the linear function f(t) = 1 − 2|1/2 − t/N|,
and the trigonometric function f(t) = 1/2 + cos(2πt/N)/2, respectively. The
detail is indicated for m = 6 and m = 8.

Figure 5.17 shows the new plot. With the square wave we see now that
there is an error. The reason is that a piecewise constant function can not be
represented exactly by piecewise linear functions, due to discontinuity. For the
second function we see that there is no error. The reason is that this function is
piecewise linear, so there is no error when we represent the function from the
space V0 . With the third function, however, we see an error.

Exercise 5.30: The vector of samples is the coordinate vec-


tor 2
Show that, for f ∈ V0 we have that [f ]φ0 = (f (0), f (1), . . . , f (N − 1)). This
shows that, also for the piecewise linear wavelet, there is no loss of information
in working with the samples of f rather than f itself.

Exercise 5.31: Computing projections


In this exercise we will show how the projection of φ1,1 onto V0 can be computed.
We will see from this that it is nonzero, and that its support is the entire [0, N ].
Let f = projV0 φ1,1 , and let xn = f (n) for 0 ≤ n < N . This means that, on
(n, n + 1), f (t) = xn + (xn+1 − xn )(t − n).
a) Show that ∫_n^{n+1} f(t)² dt = (x_n² + x_n·x_{n+1} + x_{n+1}²)/3.
b) Show that

∫_0^{1/2} (x_0 + (x_1 − x_0)t) φ_{1,1}(t) dt = 2√2 ( (1/12)x_0 + (1/24)x_1 )
∫_{1/2}^1 (x_0 + (x_1 − x_0)t) φ_{1,1}(t) dt = 2√2 ( (1/24)x_0 + (1/12)x_1 ).

c) Use the fact that

∫_0^N ( φ_{1,1}(t) − Σ_{n=0}^{N−1} x_n φ_{0,n}(t) )² dt
  = ∫_0^1 φ_{1,1}(t)² dt − 2∫_0^{1/2} (x_0 + (x_1 − x_0)t) φ_{1,1}(t) dt − 2∫_{1/2}^1 (x_0 + (x_1 − x_0)t) φ_{1,1}(t) dt
    + Σ_{n=0}^{N−1} ∫_n^{n+1} (x_n + (x_{n+1} − x_n)(t − n))² dt

and a) and b) to find an expression for ‖φ_{1,1}(t) − Σ_{n=0}^{N−1} x_n φ_{0,n}(t)‖².
d) To find the minimum least squares error, we can set the gradient of the
expression in c) to zero, and thus find the expression for the projection of φ_{1,1}
onto V_0. Show that the values {x_n}_{n=0}^{N−1} can be found by solving the equation
Sx = b, where S = (1/3){1, 4, 1} is an N × N symmetric filter, and b is the vector
with components b_0 = b_1 = √2/2, and b_k = 0 for k ≥ 2.
e) Solve the system in d) for some values of N to verify that the projection of
φ_{1,1} onto V_0 is nonzero, and that its support covers the entire [0, N].

Exercise 5.32: Non-orthogonality for the piecewise linear


wavelet
Show that

⟨φ_{0,n}, φ_{0,n}⟩ = 2/3,   ⟨φ_{0,n}, φ_{0,n±1}⟩ = 1/6,   ⟨φ_{0,n}, φ_{0,n±k}⟩ = 0 for k > 1.

As a consequence, the functions {φ_{0,n}}_n are neither orthogonal, nor do they have norm 1.

Exercise 5.33: Implement elementary lifting steps of odd


type
Write a function

lifting_odd_symm(lmbda, x, bd_mode)

which applies an elementary lifting matrix of odd type (Equation (5.35)) to


x. Assume that N is even. The parameter bd_mode should do nothing, as we
will return to this parameter later. The function should not perform matrix
multiplication, and apply as few multiplications as possible.

Exercise 5.34: Wavelets based on polynomials


The convolution of two functions defined on (−∞, ∞) is defined by

(f ∗ g)(x) = ∫_{−∞}^{∞} f(t) g(x − t) dt.

Show that we can obtain the piecewise linear φ we have defined as φ = χ[−1/2,1/2) ∗
χ[−1/2,1/2) (recall that χ[−1/2,1/2) is the function which is 1 on [−1/2, 1/2) and
0 elsewhere). This gives us a nice connection between the piecewise constant
scaling function (which is similar to χ[−1/2,1/2) ) and the piecewise linear scaling
function in terms of convolution.

5.5 Alternative wavelet based on piecewise lin-


ear functions
For the scaling function used for piecewise linear functions, the functions
{φ(t − n)}_{0≤n<N} were not orthogonal anymore, contrary to the case for piecewise
constant functions. We were still able to construct what we could call resolution
spaces and detail spaces. We also mentioned that having many vanishing moments
is desirable for a mother wavelet, and that the mother wavelet used for piecewise
constant functions had one vanishing moment. It is easily checked, however, that
the mother wavelet we now introduced for piecewise linear functions
(i.e. ψ(t) = (1/√2) φ_{1,1}(t)) has no vanishing moments. Therefore, this is not a
very good choice of mother wavelet.
We will attempt the following adjustment strategy to construct an alternative
mother wavelet ψ̂ which has two vanishing moments, i.e. one more than the Haar
wavelet.
Idea 5.27. Adjusting the wavelet construction.
Adjust the wavelet construction in Theorem 5.26 to

ψ̂ = ψ − αφ_{0,0} − βφ_{0,1}    (5.38)

and choose α, β so that

∫_0^N ψ̂(t) dt = ∫_0^N t·ψ̂(t) dt = 0,    (5.39)

and define ψ_m = {ψ̂_{m,n}}_{n=0}^{N·2^m−1}, and W_m as the space spanned by ψ_m.
We thus have two free variables α, β in Equation (5.38), to enforce the two
conditions in Equation (5.39). In Exercise 5.38 you are taken through the details
of solving this as two linear equations in the two unknowns α and β, and this
gives the following result:
Lemma 5.28. The new function ψ̂.
The function

ψ̂(t) = ψ(t) − (1/4)( φ_{0,0}(t) + φ_{0,1}(t) )    (5.40)

satisfies the conditions (5.39).
Using Equation (5.28), which stated that

φ_{0,n} = (1/√2) ( (1/2)φ_{1,2n−1} + φ_{1,2n} + (1/2)φ_{1,2n+1} ),    (5.41)

we get

ψ̂_{0,n} = ψ_{0,n} − (1/4)( φ_{0,n} + φ_{0,n+1} )
        = (1/√2) φ_{1,2n+1} − (1/(4√2)) ( (1/2)φ_{1,2n−1} + φ_{1,2n} + (1/2)φ_{1,2n+1} )
          − (1/(4√2)) ( (1/2)φ_{1,2n+1} + φ_{1,2n+2} + (1/2)φ_{1,2n+3} )
        = (1/√2) ( −(1/8)φ_{1,2n−1} − (1/4)φ_{1,2n} + (3/4)φ_{1,2n+1} − (1/4)φ_{1,2n+2} − (1/8)φ_{1,2n+3} ).    (5.42)

In summary we have

φ_{0,n} = (1/√2) ( (1/2)φ_{1,2n−1} + φ_{1,2n} + (1/2)φ_{1,2n+1} )
ψ̂_{0,n} = (1/√2) ( −(1/8)φ_{1,2n−1} − (1/4)φ_{1,2n} + (3/4)φ_{1,2n+1} − (1/4)φ_{1,2n+2} − (1/8)φ_{1,2n+3} ).    (5.43)

The new function ψ̂ is plotted in Figure 5.18.

We see that ψ̂ has support (−1, 2), and consists of four linear segments glued
together. This is in contrast with the old ψ, which was simpler in that it had the
shorter support (0, 1), and consisted of only two linear segments glued together.

Figure 5.18: The function ψ̂ we constructed as an alternative wavelet for
piecewise linear functions.

It may therefore seem surprising that ψ̂ is better suited for approximating


functions than ψ. This is indeed a more complex fact, which may not be deduced
by simply looking at plots of the functions.
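What is easy to confirm, however, is that ψ̂ indeed has two vanishing moments.
A quick numerical sketch using scipy.integrate.quad (the support of ψ̂ is
contained in (−1, 2)):

from scipy.integrate import quad

phi = lambda t: (abs(t) <= 1)*(1.0 - abs(t))       # the hat function
psi = lambda t: phi(2*t - 1)                       # psi(t) = (1/sqrt(2))*phi_{1,1}(t)
psihat = lambda t: psi(t) - (phi(t) + phi(t - 1))/4.0

for k in range(2):
    val, err = quad(lambda t: t**k*psihat(t), -1, 2)
    print(val)                                     # both values should be (numerically) 0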
The DWT in this new setting is the change of coordinates from φ_m to

Ĉ_m = {φ_{m−1,0}, ψ̂_{m−1,0}, φ_{m−1,1}, ψ̂_{m−1,1}, ..., φ_{m−1,2^{m−1}N−1}, ψ̂_{m−1,2^{m−1}N−1}}.

Equation (5.40) states that

P_{C_m ← Ĉ_m} =
    [ 1  −1/4   0    0   ···    0    0  −1/4 ]
    [ 0    1    0    0   ···    0    0    0  ]
    [ 0  −1/4   1  −1/4  ···    0    0    0  ]
    [ ⋮    ⋮    ⋮    ⋮          ⋮    ⋮    ⋮  ]
    [ 0    0    0    0   ···  −1/4   1  −1/4 ]
    [ 0    0    0    0   ···    0    0    1  ]

(Column j for j even equals e_j, since the basis functions φ_{0,n} are not altered.)
In general we will call a matrix on the form

    [ 1   λ   0   0   ···   0   0   λ ]
    [ 0   1   0   0   ···   0   0   0 ]
    [ 0   λ   1   λ   ···   0   0   0 ]
    [ ⋮   ⋮   ⋮   ⋮         ⋮   ⋮   ⋮ ]    (5.44)
    [ 0   0   0   0   ···   λ   1   λ ]
    [ 0   0   0   0   ···   0   0   1 ]
an elementary lifting matrix of even type, and denote it by A_λ. Using Equation
(5.34) we can write

G = P_{φ_m ← Ĉ_m} = P_{φ_m ← C_m} P_{C_m ← Ĉ_m} = (1/√2) · B_{1/2} A_{−1/4}.

This gives us a factorization of the IDWT in terms of lifting matrices. The inverse
of elementary lifting matrices of even type can be found similarly to how we
found the inverse of elementary lifting matrices of odd type, i.e. (A_λ)^{−1} = A_{−λ}.
This means that the matrix for the DWT is easily found also in this case,
and

H = P_{Ĉ_m ← φ_m} = √2 · A_{1/4} B_{−1/2}    (5.45)
G = P_{φ_m ← Ĉ_m} = (1/√2) · B_{1/2} A_{−1/4}.    (5.46)
2
Note that equations (5.43) also computes the matrix G, but we will rather
use these factorizations, since the elementary lifting operations are already
implemented in the exercises. We will also explain later why such a factorization
is attractive in terms of saving computations. In the exercises you will be asked
to implement a function lifting_even_symm which computes Aλ . Using this
the DWT kernel transformation for the alternative piecewise linear wavelet can
be applied to a vector x as follows.

x *= sqrt(2)
lifting_odd_symm(-0.5, x, 'symm')
lifting_even_symm(0.25, x, 'symm')

The IDWT kernel transformation is computed similarly. Functions dwt_kernel_pwl2,


idwt_kernel_pwl2 which perform these steps are included in the code base (2
stands for 2 vanishing moments).
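Following the factorization G = (1/√2) B_{1/2} A_{−1/4}, the IDWT kernel simply
undoes the steps above in the opposite order. A sketch (assuming the two lifting
functions from Exercises 5.33 and 5.37 are available, and sqrt from numpy):

def idwt_kernel_pwl2_sketch(x, bd_mode):
    lifting_even_symm(-0.25, x, bd_mode)   # apply A_{-1/4}
    lifting_odd_symm(0.5, x, bd_mode)      # apply B_{1/2}
    x /= sqrt(2)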

Example 5.35: DWT on sound


Using the new kernels, let us also here listen to the low resolution approximations
and the detail. First the low-resolution approximation:

x, fs = forw_comp_rev_DWT(m, 'pwl2')
play(x, fs)

The new, undesired effect in the castanets from Example 5.28 now seems to be
gone. The detail for m = 1 can be played as follows

x, fs = forw_comp_rev_DWT(1, 'pwl2', 0)
play(x, fs)

For m = 2 the detail can be played as follows



x, fs = forw_comp_rev_DWT(2, 'pwl2', 0)
play(x, fs)

Figure 5.19: The detail in our audio sample file for the alternative piecewise
linear wavelet, for m = 1 (left) and m = 2 (right).

The errors are shown in Figure 5.19. Again, when comparing with Example 5.10
we see much of the same, and it is difficult to see an improvement over the Haar
wavelet from this figure alone. However, the figure clearly shows a smaller error
than for the first piecewise linear wavelet (Figure 5.16). A partial explanation is
that the wavelet we now have constructed has two vanishing moments, while the
other had none.

Example 5.36: DWT on the samples of a mathematical


function
Let us also repeat Example 5.11 for our alternative wavelet, where we plotted the
detail/error at different resolutions, for the samples of a mathematical function.
Figure 5.20 shows the new plot. Again for the square wave there is an error,
which seems to be slightly lower than for the previous wavelet. For the second
function we see that there is no error, as before. The reason is the same as
before, since the function is piecewise linear. With the third function there is an
error. The error seems to be slightly lower than for the previous wavelet, which
fits well with the fact that this new wavelet has a bigger number of vanishing
moments.

Exercise 5.37: Implement elementary lifting steps of even


type
Write a function

lifting_even_symm(lmbda, x, bd_mode)

which applies an elementary lifting matrix of even type (Equation (5.44)) to x.


As before, assume that N is even, and that the parameter bd_mode does nothing.

Figure 5.20: The error (i.e. the contribution from W_0 ⊕ W_1 ⊕ ··· ⊕ W_{m−1}) for
N = 1025 when f is a square wave, the linear function f(t) = 1 − 2|1/2 − t/N|,
and the trigonometric function f(t) = 1/2 + cos(2πt/N)/2, respectively. The
detail is indicated for m = 6 and m = 8.

Exercise 5.38: Two vanishing moments


In this exercise we will show that there is a unique function on the form given
by Equation (5.38) which has two vanishing moments.
a) Show that, when ψ̂ is defined by Equation (5.38), we have that

ψ̂(t) = −αt − α                 for −1 ≤ t < 0
ψ̂(t) = (2 + α − β)t − α        for 0 ≤ t < 1/2
ψ̂(t) = (α − β − 2)t − α + 2    for 1/2 ≤ t < 1
ψ̂(t) = βt − 2β                 for 1 ≤ t < 2
ψ̂(t) = 0                       for all other t

b) Show that

∫_0^N ψ̂(t) dt = 1/2 − α − β,    ∫_0^N t·ψ̂(t) dt = 1/4 − β.
c) Explain why there is a unique function on the form given by Equation (5.38)
which has two vanishing moments, and that this function is given by Equation
(5.40).

Exercise 5.39: Implement finding ψ with vanishing mo-


ments
In the previous exercise we ended up with a lot of calculations to find α, β in
Equation (5.38). Let us try to make a program which does this for us, and which
also makes us able to generalize the result.
a) Define

Z 1 Z 2 Z 1
ak = tk (1 − |t|)dt, bk = tk (1 − |t − 1|)dt, ek = tk (1 − 2|t − 1/2|)dt,
−1 0 0

for k ≥ 0. Explain why finding α, β so that we have two vanishing moments in


Equation (5.38) is equivalent to solving the following equation:
    
a0 b0 α e
= 0
a1 b1 β e1
Write a program which sets up and solves this system of equations, and use this
program to verify the values for α, β we previously have found.

Hint. You can integrate functions in Python with the function quad in the
package scipy.integrate. As an example, the function φ(t), which is nonzero
only on [−1, 1], can be integrated as follows:

from scipy.integrate import quad
res, err = quad(lambda t: t**k*(1-abs(t)), -1, 1)

b) The procedure where we set up a matrix equation in a) allows for generalization
to more vanishing moments. Define

ψ̂ = ψ_{0,0} − αφ_{0,0} − βφ_{0,1} − γφ_{0,−1} − δφ_{0,2}.    (5.47)

We would like to choose α, β, γ, δ so that we have 4 vanishing moments. Define
also

g_k = ∫_{−2}^0 t^k (1 − |t + 1|) dt,   d_k = ∫_1^3 t^k (1 − |t − 2|) dt

for k ≥ 0. Show that α, β, γ, δ must solve the equation

    [ a_0  b_0  g_0  d_0 ] [ α ]   [ e_0 ]
    [ a_1  b_1  g_1  d_1 ] [ β ] = [ e_1 ]
    [ a_2  b_2  g_2  d_2 ] [ γ ]   [ e_2 ]
    [ a_3  b_3  g_3  d_3 ] [ δ ]   [ e_3 ]

and solve this with your computer.
c) Plot the function defined by (5.47), which you found in b).

Hint. If t is the vector of t-values, and you write

(t >= 0)*(t <= 1)*(1-2*abs(t-0.5))

you get the points φ1,1 (t).


d) Explain why the coordinate vector of ψ̂ in the basis (φ_0, ψ_0) is

[ψ̂]_{(φ_0,ψ_0)} = (−α, −β, −δ, 0, ..., 0, −γ) ⊕ (1, 0, ..., 0).

Hint. The placement of −γ may seem a bit strange here, and has to do with the fact
that φ_{0,−1} is not one of the basis functions {φ_{0,n}}_{n=0}^{N−1}. However, we have that
φ_{0,−1} = φ_{0,N−1}, i.e. φ(t + 1) = φ(t − N + 1), since we always assume that the
functions we work with have period N.
e) Sketch a more general procedure than the one you found in b), which can
be used to find wavelet bases where we have even more vanishing moments.

Exercise 5.40: ψ for the Haar wavelet with two vanishing


moments
Let φ(t) be the function we used when we defined the Haar-wavelet.
a) Compute proj_{V_0}(f(t)), where f(t) = t², and where f is defined on [0, N).
b) Find constants α, β so that ψ̂(t) = ψ(t) − αφ_{0,0}(t) − βφ_{0,1}(t) has two vanishing
moments, i.e. so that ⟨ψ̂, 1⟩ = 0 and ⟨ψ̂, t⟩ = 0. Plot also the function ψ̂.

Hint. Start with computing the integrals ∫ψ(t) dt, ∫t·ψ(t) dt, ∫φ_{0,0}(t) dt, ∫φ_{0,1}(t) dt,
and ∫t·φ_{0,0}(t) dt, ∫t·φ_{0,1}(t) dt.
c) Express φ and ψ̂ with the help of functions from φ1 , and use this to write
down the change of coordinate matrix from (φ0 , ψ̂0 ) to φ1 .

Exercise 5.41: More vanishing moments for the Haar wavelet


It is also possible to add more vanishing moments to the Haar wavelet. Define

ψ̂ = ψ_{0,0} − a_0 φ_{0,0} − ··· − a_{k−1} φ_{0,k−1}.

Define also c_{r,l} = ∫_l^{l+1} t^r dt, and e_r = ∫_0^1 t^r ψ(t) dt.

a) Show that ψ̂ has k vanishing moments if and only if a_0, ..., a_{k−1} solves the
equation

    [ c_{0,0}     c_{0,1}     ···  c_{0,k−1}   ] [ a_0     ]   [ e_0     ]
    [ c_{1,0}     c_{1,1}     ···  c_{1,k−1}   ] [ a_1     ]   [ e_1     ]
    [   ⋮            ⋮                ⋮        ] [  ⋮      ] = [  ⋮      ]    (5.48)
    [ c_{k−1,0}   c_{k−1,1}   ···  c_{k−1,k−1} ] [ a_{k−1} ]   [ e_{k−1} ]

b) Write a function vanishingmomshaar which takes k as input, solves Equation


(5.48), and returns the vector a = (a0 , a1 , . . . , ak−1 ).

Exercise 5.42: Listening experiments


Run the function forw_comp_rev_DWT for different m for the Haar wavelet, the
piecewise linear wavelet, and the alternative piecewise linear wavelet, but listen
to the detail components W0 ⊕ W1 ⊕ · · · ⊕ Wm−1 instead. Describe the sounds
you hear for different m, and try to explain why the sound seems to get louder
when you increase m.

5.6 Multiresolution analysis: A generalization


Let us summarize the properties of the spaces Vm . In both our examples we
showed that they were nested, i.e.

V0 ⊂ V1 ⊂ V2 ⊂ · · · ⊂ Vm ⊂ · · · .
We also showed that continuous functions could be approximated arbitrarily
well from Vm , as long as m was chosen large enough. Moreover, the space V0 is
closed under all translates, at least if we view the functions in V0 as periodic
with period N . In the following we will always identify a function with this
periodic extension, just as we did in Fourier analysis. When performing this
identification, we also saw that f (t) ∈ Vm if and only if g(t) = f (2t) ∈ Vm+1 .
We have therefore shown that the scaling functions we have considered fit into
the following general framework.
Definition 5.29. Multiresolution analysis.
A Multiresolution analysis, or MRA, is a nested sequence of function spaces

V0 ⊂ V1 ⊂ V2 ⊂ · · · ⊂ Vm ⊂ · · · , (5.49)
called resolution spaces, so that

• Any function can be approximated arbitrarily well from Vm , as long as m


is large enough,

• f (t) ∈ V0 if and only if f (2m t) ∈ Vm ,


• f (t) ∈ V0 if and only if f (t − n) ∈ V0 for all n.
• There is a function φ, called a scaling function, so that φ = {φ(t−n)}0≤n<N
is a basis for V0 .

When φ is an orthonormal basis we say that the MRA is orthonormal.



The wavelet of piecewise constant functions was an orthonormal MRA, while


the wavelets for piecewise linear functions were not. Although the definition
above states that any function can be approximated with MRA’s, in practice
one needs to restrict to certain functions: Certain pathological functions may be
difficult to approximate. In the literature one typically requires that the function
is in L2 (R), and also that the scaling function and the spaces Vm are in L2 (R).
MRA’s are much used, and one can find a wide variety of functions φ, not only
piecewise constant functions, which give rise to MRA’s.
In the examples we have considered we also chose a mother wavelet. The
term wavelet is used in very general terms. However, the term mother wavelet
is quite concrete, and is what gives rise to the theory of wavelets. This was
necessary in order to efficiently decompose the gm ∈ Vm into a low resolution
approximation gm−1 ∈ Vm−1 , and a detail/error em−1 in a detail space we called
Wm−1 . We have freedom in how we define these detail spaces, as well as how we
define a mother wavelet whose translates span the detail space (in general we
choose a mother wavelet which simplifies the computation of the decomposition
gm = gm−1 + em−1 , but we will see later that it also is desirable to choose a
ψ with other properties). Once we agree on the detail spaces and the mother
wavelet, we can perform a change of coordinates to find detail and low resolution
approximations. We thus have the following general recipe.
Idea 5.30. Recipe for constructing wavelets.
In order to construct MRA’s which are useful for practical purposes, we need
to do the following:

• Find a function φ which can serve as the scaling function for an MRA,
• Find a function ψ so that ψ = {ψ(t − n)}0≤n<N and φ = {φ(t − n)}0≤n<N
together form an orthonormal basis for V1 . The function ψ is also called a
mother wavelet.

With V0 the space spanned by φ = {φ(t − n)}0≤n<N , and W0 the space spanned
by ψ = {ψ(t − n)}0≤n<N , φ and ψ should be chosen so that we easily can
compute the decomposition of g1 ∈ V1 into g0 + e0 , where g0 ∈ V0 and e0 ∈ W0 .
If we can achieve this, the Discrete Wavelet Transform is defined as the change
of coordinates from φ1 to (φ0 , ψ0 ).

More generally, if

f(t) = Σ_n c_{m,n} φ_{m,n} = Σ_n c_{0,n} φ_{0,n} + Σ_{m′<m, n} w_{m′,n} ψ_{m′,n},

then the m-level DWT is defined by DWT(c_m) = (c_0, w_0, ..., w_{m−1}). It is
useful to interpret m as frequency, n as time, and w_{m,n} as the contribution
at frequency m and time n. In this sense, wavelets provide a time-frequency
representation of signals. This is what can make them more useful than Fourier
analysis, which only provides frequency representations.

While there are in general many possible choices of detail spaces, in the
case of an orthonormal wavelet we saw that it was natural to choose the detail
space W_{m−1} as the orthogonal complement of V_{m−1} in V_m, and obtain the
mother wavelet by projecting the scaling function onto the detail space. Thus,
for orthonormal MRA's, the low-resolution approximation and the detail can be
obtained by computing projections, and the least squares approximation of f
from V_m can be computed as

proj_{V_m}(f) = Σ_n ⟨f, φ_{m,n}⟩ φ_{m,n}(t).

Working with the samples of f rather than f itself: The first crime of
wavelets. In Exercise 5.1 we saw that for the piecewise constant wavelet the
coordinate vector of f in Φ_m equaled the sample vector of f. In Exercise 5.30 we
saw that the same held for the piecewise linear wavelet. The general statement
is false, however: The coordinate vector of f in Φ_0 may not equal the samples
(f(0), f(1), ...), so that Σ_n f(n)φ_{0,n} and f are two different functions.
In most applications, a function is usually only available through its samples.
In many books on wavelets, one starts with these samples, and computes their
DWT. This means that the underlying function is Σ_n f(n)φ_{0,n}, and since this is
different from f in general, we compute something completely different than we
want. This shows that many books apply a wrong procedure when computing
the DWT. This kind of error is also called the first crime of wavelets.
So, how bad is this crime? We will address this with two results. First we
will see how the samples are related to the wavelet coefficients. Then we will
see how the function Σ_s f(s/2^m)φ_{m,s}(t) is related to f (the wavelet crime assumes
equality).
Theorem 5.31. Relation between samples and wavelet coefficients.
Assume that φ̃ has compact support and is absolutely integrable, i.e.
∫_0^N |φ̃(t)| dt < ∞. Assume also that f is continuous and has wavelet coefficients
c_{m,n}. Then we have that

lim_{m→∞} 2^{m/2} c_{m, n·2^{m−m′}} = f(n/2^{m′}) ∫_0^N φ̃(t) dt.

Proof. Since φ̃ has compact support, φ̃_{m, n·2^{m−m′}} will be supported on a small
interval close to n/2^{m′} for large m. Since f is continuous, given ε > 0 we can
choose m so large that f = f(n/2^{m′}) + r(t), where |r(t)| < ε on this interval.
But then

c_{m, n·2^{m−m′}} = ∫_0^N f(t) φ̃_{m, n·2^{m−m′}}(t) dt
                 = f(n/2^{m′}) ∫_0^N φ̃_{m, n·2^{m−m′}}(t) dt + ∫_0^N r(t) φ̃_{m, n·2^{m−m′}}(t) dt
                 ≤ 2^{−m/2} f(n/2^{m′}) ∫_0^N φ̃(t) dt + ε ∫_0^N |φ̃_{m, n·2^{m−m′}}(t)| dt
                 = 2^{−m/2} f(n/2^{m′}) ∫_0^N φ̃(t) dt + 2^{−m/2} ε ∫_0^N |φ̃(t)| dt.

From this it follows that lim_{m→∞} 2^{m/2} c_{m, n·2^{m−m′}} = f(n/2^{m′}) ∫_0^N φ̃(t) dt, since ε
was arbitrary, and φ̃(t) was assumed to be absolutely integrable.
This result has an important application. It turns out that there is usually
no way to find analytical expressions for the scaling function and the mother
wavelet. Their coordinates in (φ0 , ψ0 ) are simple, however, since there is only
one non-zero coordinate:

• The coordinates of φ in (φ_0, ψ_0) are (1, 0, ..., 0), where there are 2^m N − 1
  zeros.

• The coordinates of ψ in (φ_0, ψ_0) are (0, ..., 0, 1, 0, ..., 0), where there are
  2^{m−1}N zeros at the beginning.

If we know that φ and ψ are continuous, we can apply an m stage IDWT to
these coordinates and use Theorem 5.31 to find arbitrarily good estimates of the
samples φ(n/2^{m′}), ψ(n/2^{m′}). The coordinates we find have to be scaled with
2^{m/2} in order for the values to be of comparable size. Also, this algorithm will
miss the actual samples by a factor of ∫_0^N φ̃(t) dt. Nevertheless, the graphs will
be similar. The algorithm is also called the cascade algorithm.
Definition 5.32. The cascade algorithm.
The cascade algorithm applies a change of coordinates for the functions φ, ψ
from the basis (φ_0, ψ_0, ψ_1, ...) to the basis φ_m, and uses the new coordinates as
an approximation to the function values of these functions.
Now for the second result.
Theorem 5.33. Using the samples.
If f is continuous and φ has compact support, then for all t with a finite
binary expansion (i.e. t = n/2^{m′} for some integers n and m′) we have that

lim_{m→∞} 2^{−m/2} Σ_{s=0}^{2^m N−1} f(s/2^m) φ_{m,s}(t) = f(t) Σ_s φ(s).

This says that, up to the constant factor c = Σ_n φ(n), the functions f_m ∈ V_m
with coordinates 2^{−m/2}(f(0/2^m), f(1/2^m), ...) in Φ_m converge pointwise to f as
m → ∞ (even though the samples of f_m may not equal those of f).
Proof. With t = n/2^{m′}, for m > m′ we have that

φ_{m,s}(t) = φ_{m,s}(n·2^{m−m′}/2^m) = 2^{m/2} φ(2^m·n·2^{m−m′}/2^m − s) = 2^{m/2} φ(n·2^{m−m′} − s).

We thus have that

2^{−m/2} Σ_{s=0}^{2^m N−1} f(s/2^m) φ_{m,s}(t) = Σ_{s=0}^{2^m N−1} f(s/2^m) φ(n·2^{m−m′} − s).

In the sum only finitely many s close to n·2^{m−m′} contribute (due to the finite support
of φ), and for these s/2^m ≈ t. Due to continuity of f this sum thus converges to
f(t)·Σ_s φ(s), and the proof is done.
Let us see how we can implement the cascade algorithm. As input to the
algorithm we must have the number of levels m, and the kernel to use for the
IDWT. Also we need to know an interval [a, b] so large that it contains the
supports of φ and ψ (we will see later how we can compute these supports).
Otherwise we can not obtain complete plots of the functions. a and b thus also
need to be input to the algorithm. We now set N = b − a. The basis φ_m then
has (b − a)2^m elements, so the cascade algorithm needs a coordinate vector of
this size as starting point. The coordinates of φ in the (b − a)2^m-dimensional
basis (φ_0, ψ_0, ψ_1, ...) are (1, 0, ..., 0), while the coordinates of ψ in the same
basis are

(0, ..., 0, 1, 0, ..., 0),
 \_______/
 b−a times

i.e. with b − a zeros before the 1. Our algorithm can take as input whether we
want to plot the φ or the ψ function (and thereby choose among these sets of
coordinates), and also the value of the dual parameter, which we will return to.
The following algorithm can be used for all this

def cascade_alg(m, a, b, wave_name, scaling, dual):


coords = zeros((b-a)*2**m)
if scaling:
coords[0] = 1
else:
coords[b - a] = 1
t = linspace(a, b, (b-a)*2**m)
IDWTImpl(coords, m, wave_name, ’per’, dual)
coords = concatenate([coords[(b*2**m):((b-a)*2**m)], \
coords[0:(b*2**m)]])
plt.figure()
plt.plot(t, 2**(m/2.)*coords, ’k-’)
CHAPTER 5. MOTIVATION FOR WAVELETS AND SOME SIMPLE EXAMPLES210

Example 5.43: Implementing the cascade algorithm


One thing should be noted in the function cascade_alg. As the scaling function
of the piecewise linear wavelet, it may be that the function is nonzero for small,
negative values. If we plot the function over [0, N ], we would see two disconnected
segments - one to the left, and one to the right. In the code we shift the values
so that the graph appears as one connected segment.
We will use a = −2 and b = 6 in what follows, since [−2, 6] will turn out to
contain all supports. We will also use m = 10 levels in the cascade algorithm.
The following code then runs the cascade algorithm for the three wavelets we
have considered, to reproduce all previous scaling functions and mother wavelets.

cascade_alg(10, -2, 6, ’Haar’, True, False)


cascade_alg(10, -2, 6, ’Haar’, False, False)

cascade_alg(10, -2, 6, ’pwl0’, True, False)


cascade_alg(10, -2, 6, ’pwl0’, False, False)

cascade_alg(10, -2, 6, ’pwl2’, True, False)


cascade_alg(10, -2, 6, ’pwl2’, False, False)

5.7 Summary
We started this chapter by motivating the theory of wavelets as a different
function approximation scheme, which solved some of the shortcomings of Fourier
series. While one approximates functions with trigonometric functions in Fourier
theory, with wavelets one instead approximates a function in several stages,
where one at each stage attempts to capture information at a given resolution,
using a function prototype. This prototype is localized in time, contrary to
the Fourier basis functions, and this makes the theory of wavelets suitable for
time-frequency representations of signals. We used an example based on Google
Earth to illustrate that the wavelet-based scheme can represent an image at
different resolutions in a scalable way, so that passing from one resolution to
another simply mounts to adding some detail information to the lower resolution
version of the image. This also made wavelets useful for compression, since the
images at different resolutions can serve as compressed versions of the image.
We defined the simplest wavelet, the Haar wavelet, which is a function
approximation scheme based on piecewise constant functions, and deduced its
properties. We defined the Discrete Wavelet Transform (DWT) as a change of
coordinates corresponding to the function spaces we defined. This transform is
the crucial object to study when it comes to more general wavelets also, since
it is the object which makes wavelets useful for computation. In the following
chapters, we will see that reordering of the source and target bases of the
CHAPTER 5. MOTIVATION FOR WAVELETS AND SOME SIMPLE EXAMPLES211

DWT will aid in expressing connections between wavelets and filters, and in
constructing optimized implementations of the DWT.
We then defined another wavelet, which corresponded to a function approxi-
mation scheme based on piecewise linear functions, instead of piecewise constant
functions. There were several differences with the new wavelet when compared
to the previous one. First of all, the basis functions were not orthonormal, and
we did not attempt to make them orthonormal. The resolution spaces we now
defined were not defined in terms of orthogonal bases, and we had some freedom
on how we defined the detail spaces, since they are not defined as orthogonal
complements anymore. Similarly, we had some freedom on how we define the
mother wavelet, and we mentioned that we could define it so that it is more
suitable for approximation of functions, by adding what we called vanishing
moments.
From these examples of wavelets and their properties we made a generalization
to what we called a multiresolution analysis (MRA). In an MRA we construct
successively refined spaces of functions that may be used to approximate functions
arbitrarily well. We will continue in the next chapter to construct even more
general wavelets, within the MRA framework.
The book [29] goes through developments for wavelets in detail. While
wavelets have been recognized for quite some time, it was with the important
work of Daubechies [12, 13] that they found new arenas in the 80’s. Since then
they found important applications. The main application we will focus on in
later chapters is image processing.

What you should have learned in this chapter.


• Definition of resolution spaces (Vm ), detail spaces (Wm ), scaling function
(φ), and mother wavelet (ψ) for the wavelet based on piecewise constant
functions.
• The nesting of resolution spaces, and how one can project from one reso-
lution space onto a lower order resolution space, and onto its orthogonal
complement.
• The definition of the Discrete Wavelet Transform as a change of coordinates,
and how this can be written down from relations between basis functions.
• Definition of the m-level Discrete Wavelet Transform.
• Implementation of the Haar wavelet transform and its inverse.
• Experimentation with wavelets on sound.
• Definition of scaling function, mother wavelet, resolution spaces, and detail
spaces for the wavelet of piecewise linear functions.
• How one alters the mother wavelet for piecewise linear functions, in order
to add a vanishing moment.
• Definition of a multiresolution analysis.
Chapter 6

The filter representation of


wavelets

Previously we saw that analog filters restricted to the Fourier spaces gave rise to
digital filters. These digital filters sent the samples of the input function to the
samples of the output function, and are easily implementable, in contrast to the
analog filters. We have also seen that wavelets give rise to analog filters. This
leads us to believe that the DWT also can be implemented in terms of digital
filters. In this chapter we will prove that this is in fact the case.
There are some differences between the Fourier and wavelet settings, however:

• The DWT is not constructed by looking at the samples of a function, but


rather by looking at coordinates in a given basis.
• The function spaces we work in (i.e. Vm ) are different from the Fourier
spaces.
• The DWT gave rise to two different types of analog filters: The filter
defined by Equation (7.16) for obtaining cm,n , and the filter defined by
Equation (7.17) for obtaining wm,n . We want both to correspond to digital
filters.

Due to these differences, the way we realize the DWT in terms of filters will
be a bit different. Despite the differences, this chapter will make it clear
that the output of a DWT can be interpreted as the combined output of two
different filters, and each filter will have an interpretation in terms of frequency
representations. We will also see that the IDWT has a similar interpretation in
terms of filters.
In this chapter we will also see that expressing the DWT in terms of filters
will also enable us to define more general transforms, where even more filters
are used. It is fruitful to think about each filter as concentrating on a particular
frequency range, and that these transforms thus simply splits the input into
different frequency bands. Such transforms have important applications to the

212
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 213

processing and compression of sound, and we will show that the much used MP3
standard for compression of sound takes use of such transforms.

6.1 The filters of a wavelet transformation


We will make the connection with digital filters by looking again at the different
examples of wavelet bases we have seen: In each case we saw that every second
row/column in the kernel transformations G = Pφm ←Cm and H = PCm ←φm
repeated, as in a circulant matrix. The matrices were not exactly circulant
Toeplitz matrices, however, since there are two different columns repeating. The
change of coordinate matrices occuring in the stages in a DWT are thus not
digital filters, but they seem to be related. Let us start by giving these new
matrices names:
Definition 6.1. MRA-matrices.
An N × N -matrix T , with N even, is called an MRA-matrix if the columns
are translates of the first two columns in alternating order, in the same way as
the columns of a circulant Toeplitz matrix.
From our previous calculations it is clear that, once φ and ψ are given
through an MRA, the corresponding change of coordinate matrices will always
be MRA-matrices. The MRA-matrices is our connection between filters and
wavelets. Let us make the following definition:
Definition 6.2. H0 and H1 .
We denote by H0 the (unique) filter with the same first row as H, and by H1
the (unique) filter with the same second row as H. H0 and H1 are also called
the DWT filter components.
Using this definition it is clear that

(
(H0 cm )k when k is even
(Hcm )k =
(H1 cm )k when k is odd,

since the left hand side depends only on row k in the matrix H, and this is equal
to row k in H0 (when k is even) or row k in H1 (when k is odd). This means
that Hcm can be computed with the help of H0 and H1 as follows:
Theorem 6.3. DWT expressed in terms of filters.
Let cm be the coordinates in φm , and let H0 , H1 be defined as above. Any
stage in a DWT can ble implemented in terms of filters as follows:

• Compute H0 cm . The even-indexed entries in the result are the cordinates


cm−1 in φm−1 .
• Compute H1 cm . The odd-indexed entries in the result are the coordinates
wm−1 in ψm−1 .
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 214

This gives an important connection between wavelets and filters: The DWT
corresponds to applying two filters, H0 and H1 , and the result from the DWT
is produced by assembling half of the coordinates from each. Keeping only
every second coordinate is called downsampling (with a factor of two). Had
we not performed downsampling, we would have ended up with twice as many
coordinates as we started with. Downsampling with a factor of two means that
we end up with the same number of samples as we started with. We also say that
the output of the two filters is critically sampled. Due to the critical sampling, it
is inefficient to compute the full application of the filters. We will return to the
issue of making efficient implementations of critically sampled filter banks later.
We can now complement Figure 5.9 by giving names to the arrows as follows:

H0 H0 H0 H0 H0
φm / φm−1 / φm−2 / ··· / φ1 / φ0
H1 H1 H1 H1

" # #
ψ m−1 ψ m−2 ψ m−3 ψ0

Figure 6.1: Detailed illustration of a wavelet transform.

Let us make a similar anlysis for the IDWT, and let us first make the following
definition:
Definition 6.4. G0 and G1 .
We denote by G0 the (unique) filter with the same first column as G, and by
G1 the (unique) filter with the same second column as G. G0 and G1 are also
called the IDWT filter components.
These filters are uniquely determined, since any filter is uniquely determined
from one of its columns. We can now write
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 215

     
cm−1,0 cm−1,0 0

 wm−1,0 


 0  
  wm−1,0 


 cm−1,1 


 cm−1,1  
  0 

cm = G
 wm−1,1  = G 
   0 +
  wm−1,1 


 ··· 


 ···  
  ··· 

 cm−1,2m−1 N −1  cm−1,2m−1 N −1   0 
wm−1,2m−1 N −1 0 wm−1,2m−1 N −1
   
cm−1,0 0

 0 


 w m−1,0



 cm−1,1 


 0 

= G
 0  + G
  wm−1,1 


 ··· 


 ··· 

cm−1,2m−1 N −1   0 
0 wm−1,2m−1 N −1
   
cm−1,0 0

 0 


 wm−1,0 


 cm−1,1 


 0 

= G0 
 0  + G1 
  wm−1,1 .


 ··· 


 ··· 

cm−1,2m−1 N −1   0 
0 wm−1,2m−1 N −1

Here we have split a vector into its even-indexed and odd-indexed elements,
which correspond to the coefficients from φm−1 and ψm−1 , respectively. In
the last equation, we replaced with G0 , G1 , since the multiplications with G
depend only on the even and odd columns in that matrix (due to the zeros
inserted), and these columns are equal in G0 , G1 . We can now state the following
characterization of the inverse Discrete Wavelet transform:

Theorem 6.5. IDWT expressed in terms of filters.


Let G0 , G1 be defined as above. Any stage in an IDWT can be implemented
in terms of filters as follows:
   
cm−1,0 0

 0 


 wm−1,0 


 cm−1,1



 0 

cm = G0   0  + G1 
  wm−1,1
.
 (6.1)

 · · · 


 · · · 

cm−1,2m−1 N −1   0 
0 wm−1,2m−1 N −1
Making a new vector where zeroes have been inserted in this way is also called
upsampling (with a factor of two). We can now also complement Figure 5.9 for
the IDWT with named arrows. This has bee done in Figure 6.2
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 216

φm o φm−1 o φm−2 o ··· o φ1 o φ0


b G0 c G0 c G0 G0 ` G0

G1 G1 G1 G1

ψ m−1 ψ m−2 ψ m−3 ψ0

Figure 6.2: Detailed illustration of an IDWT.

Note that the filters G0 , G1 were defined in terms of the columns of G, while
the filters H0 , H1 were defined in terms of the rows of H. This difference is seen
from the computations above to come from that the change of coordinates one
way splits the coordinates into two parts, while the inverse change of coordinates
performs the opposite. Let us summarize what we have found as follows.

Fact 6.6. Computing DWT/IDWT through filters.


The DWT can be computed with the help of two filters H0 , H1 , as explained
in Theorem 6.3. Any linear transformation computed from two filters H0 , H1 in
this way is called a forward filter bank transform. The IDWT can be computed
with the help of two filters G0 , G1 as explained in Theorem 6.5. Any linear
transformation computed from two filters G0 , G1 in this way is called a reverse
filter bank transform.
In Chapter 8 we will go through how any forward and reverse filter bank
transform can be implemented, once we have the filters H0 , H1 , G0 , and G1 .
When we are in a wavelet setting, the filter coefficients in these four filters can
be found from the relations between the bases φ1 and (φ0 , ψ0 ). The filters
H0 , H1 , G0 , G1 can also be constructed from outside a wavelet setting, i.e. that
they do not originate from change of coordinate matrices between certain function
bases. The important point is that the matrices invert each other, but in a signal
processing setting it may also be meaningful to allow for the reverse transform
not to invert the forward transform exactly. This corresponds to some loss of
information when we attempt to reconstruct the original signal using the reverse
transform. A small such loss can, as we will see at the end of this chapter, be
acceptable.
Note that Figure 6.1 and 6.2 do not indicate the additional downsampling and
upsampling steps described in Theorem 6.3 and 6.5. If we indicate downsampling
with ↓2 , and upsampling with ↑2 , the algorithms given in Theorem 6.3 and 6.5
can be summarized as in Figure 6.3.
Here ⊕ represents summing the elements which point inwards to the plus
sign. In this figure, the left side represents the DWT, the right side the IDWT.
In the literature, wavelet transforms are more often illustrated in this way using
filters, since it makes alle steps involved in the process more clear. This type of
figure also opens for generalization. We will shortly look into this.
There are several reasons why it is smart to express a wavelet transformation
in terms of filters. First of all, it enables us to reuse theoretical results from
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 217

H0O c1 / ↓2 / c0 / ↑2 / (cm−1,0 , 0, cm−1,1 , 0, · · · )

G0

c1 ⊕O
G1

H1 c1 / ↓2 / w0 / ↑2 / (0, wm−1,0 , 0, wm−1,1 , · · · )

Figure 6.3: Detailed illustration of a DWT.

the world of filters in the world of wavelets, and to give useful interpretations
of the wavelet transform in terms of frequencies. Secondly, and perhaps most
important, it enables us to reuse efficient implementations of filters in order
to compute wavelet transformations. A lot of work has been done in order to
establish efficient implementations of filters, due to their importance.
In Example 5.10 we argued that the elements in Vm−1 correspond to frequen-
cies at lower frequencies than those in Vm , since V0 = Span({φ0,n }n ) should be
interpreted as content of lower frequency than the φ1,n , with W0 = Span({ψ0,n }n )
the remaining high frequency detail. To elaborate more on this, we have that

2N
X −1
φ(t) = (G0 )n,0 φ1,n (t) (6.2)
n=0
2N
X −1
ψ(t) = (G1 )n−1,1 φ1,n (t)., (6.3)
n=0

where (Gk )i,j are the entries in the matrix Gk . Similar equations are true for
φ(t − k), ψ(t − k). Due to Equation (6.2), the filter G0 should have lowpass
characteristics, since it extracts the information at lower frequencies. Similarly,
G1 should have highpass characteristics due to Equation (6.3).

Example 6.1: The Haar wavelet


For the Haar wavelet we saw that, in G, the matrix
!
√1 √1
2 2 (6.4)
√1 − √12
2

repeated along the diagonal. The filters G0 and G1 can be found directly from
these columns:
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 218

√ √
G0 = {1/ 2, 1/ 2}
√ √
G1 = {1/ 2, −1/ 2}.

We have seen these filters previously: G0 is a moving average filter of two


elements (up to multiplication with a constant). This is a lowpass filter. G1 is
a bass-reducing filter, which is a high-pass filter. Up to a constant, this is also
an approximation to the derivative. Since G1 is constructed from G0 by adding
an alternating sign to the filter coefficients, we know from before that G1 is the
high-pass filter corresponding to the low-pass filter G0 , so that the frequency
response of the second is given by a shift of frequency with π in the first. The
frequency responses are

1 1 √
λG0 (ω) = √ + √ e−iω = 2e−iω/2 cos(ω/2)
2 2
1 iω 1 √
λG1 (ω) = √ e − √ = 2ieiω/2 sin(ω/2).
2 2
By considering the filters where the rows are as in Equation (6.4), it is clear that

√ √
H0 = {1/ 2, 1/ 2}
√ √
H1 = {−1/ 2, 1/ 2},

so that the frequency responses for the DWT have the same lowpass/highpass
characteristics.

Example 6.2: Wavelet for piecewise linear functions


For the wavelet for piecewise linear functions we looked at in the previous section,
Equation (5.34) gives that

1
G0 = √ {1/2, 1, 1/2}
2
1
G1 = √ {1}. (6.5)
2
G0 is again a filter we have seen before: Up to multiplication with a constant, it
is the treble-reducing filter with values from row 2 of Pascal’s triangle. We see
something different here when compared to the Haar wavelet, in that the filter
G1 is not the highpass filter corresponding to G0 . The frequency responses are
now
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 219

1 1 1 1
λG0 (ω) = √ eiω + √ + √ e−iω = √ (cos ω + 1)
2 2 2 2 2 2
1
λG1 (ω) = √ .
2

λG1 (ω) thus has magnitude √12 at all points. Comparing with Figure 6.5 we see
that here also the frequency response has a zero at π. The frequency response
seems also to be flatter around π. For the DWT we have that


H0 = 2{1}

H1 = 2{−1/2, 1, −1/2}. (6.6)

Even though G1 was not the highpass filter corresponding to G0 , we see that,
up to a constant, H1 is (it is a bass-reducing filter with values taken from row 2
of Pascals triangle).

Example 6.3: The alternative piecewise linear wavelet


We previously wrote down the first two columns in Pφm ←Cm for the alternative
piecewise linear wavelet. This gives us that the filters G0 ans G1 are

1
G0 = √ {1/2, 1, 1/2}
2
1
G1 = √ {−1/8, −1/4, 3/4, −1/4, −1/8}. (6.7)
2
Here G0 was as for the wavelet of piecewise linear functions since we use the
same scaling function. G1 was changed, however. Clearly, G1 now has highpass
characteristics, while the lowpass characteristic of G0 has been preserved.
The filters G0 , G1 , H0 , H
√1 are
√ particularly important in applications: Apart
from the scaling factors 1/ 2, 2 in front, we see that the filter coefficients are
all dyadic fractions, i.e. they are on the form β/2j . Arithmetic operations with
dyadic fractions can be carried out exactly on a computer, due to representations
as binary numbers in computers. These filters are thus important in applications,
since they can be used as transformations for lossless coding. The same argument
can be made for the Haar wavelet, but this wavelet had one less vanishing moment.
Note that the role of H1 as the high-pass filter corresponding to G0 is the
case in both previous examples. We will prove in the next chapter that this
is a much more general result which holds for all wavelets, not only for the
orthonormal ones.
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 220

6.1.1 The dual filter bank transform and the dual param-
eter
Since the reverse transform inverts the forward transform, GH = I. If we
transpose this expression we get that H T GT = I. Clearly H T is a reverse
filter bank transform with filters (H0 )T , (H1 )T , and GT is a forward filter bank
transform with filters (G0 )T , (G1 )T . Due to their usefulness, these transforms
have their own name:

Definition 6.7. Dual filter bank transforms.


Assume that H0 , H1 are the filters of a forward filter bank transform, and that
G0 , G1 are the filters of a reverse filter bank transform. By the dual transforms
we mean the forward filter bank transform with filters (G0 )T , (G1 )T , and the
reverse filter bank transform with filters (H0 )T , (H1 )T .
In other words, if H and G are the kernel transformations of the DWT and
the IDWT, respectively, the kernel transformations of the dual DWT and the
dual IDWT are GT and H T , respectively. In Section 5.3 we used a parameter
dual in our call to the DWT and IDWT kernel functions. This parameter can
now be explained as follows:
Fact 6.8. The dual-parameter in DWT kernel functions..

• If the dual parameter is false, the DWT is computed as the forward filter
bank transform with filters H0 , H1 , and the IDWT is computed as the
reverse filter bank transform with filters G0 , G1 .
• If the dual parameter is true, the DWT is computed as the forward filter
bank transform with filters (G0 )T , (G1 )T , and the IDWT is computed as
the reverse filter bank transform with filters (H0 )T , (H1 )T .

This means that we can differ between the DWT, IDWT, and their duals as
follows.

DWTImpl(x, m, wave_name, True, False) # DWT


IDWTImpl(x, m, wave_name, True, False) # IDWT
DWTImpl(x, m, wave_name, True, True) # Dual DWT
IDWTImpl(x, m, wave_name, True, True) # Dual IDWT

Note that, even though the reverse filter bank transform G can be associated
with certain function bases, it is not clear if the reverse filter bank transform
H T also can be associated with such bases. We will see in the next chapter that
such bases can in many cases be found. We will also denote these bases as dual
bases.
The construction of the dual wavelet transform was function-free - we have no
reason to believe that they correspond to scaling functions and mother wavelets.
In the next chapter we will show that such dual scaling functions and dual
mother wavelets exist in many cases. We can set the dual parameter to True in
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 221

the implementation of the cascade algorithm in Example 5.43 to see how the
functions must look. In Figure 6.4 we have plotted the result. We see that these
functions look very irregular. Also, they are very different from the original
scaling function and mother wavelet. We will later argue that this is bad, it
would be much better if φ ≈ φ̃ and ψ ≈ ψ̃.
1200 1200
φ̃(t) 1000 ψ̃(t)
1000
800
800 600
400
600
200
400 0
200
200
400
02 1 0 1 2 3 4 5 6 600 2 1 0 1 2 3 4 5 6

20
6 φ̃(t) ψ̃(t)
15
4 10
2 5

0 0
5
2
2 1 0 1 2 3 4 5 6 10 2 1 0 1 2 3 4 5 6

Figure 6.4: Dual functions for the two piecewise linear wavelets.

In the construction of the alternative piecewise linear wavelet we actually


made a DWT/IDWT implementation before we found the filter coefficients
themselves. But since the filter coefficients of G0 and G1 can be found in the
columns of the matrix, they can be extracted by applying the IDWT kernel
implementation to e0 (for the low-pass filter coefficients) and e1 (for the high-pass
filter coefficients). Finally the frequency responses can by plotted by copying
what we did in Section 3.3 The following algorithm does this.

def freqresp_alg(wave_name, lowpass, dual):


idwt_kernel = find_kernel(wave_name, 0, dual, False);
N = 128
n = arange(0,N)
omega = 2*pi*n/float(N)
g = zeros(N)
if lowpass:
g[0] = 1
else:
g[1] = 1
idwt_kernel(g, ’per’)
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 222

plt.figure()
plt.plot(omega, abs(fft.fft(g)), ’k-’)

If the parameter dual is set to True, the dual filters (H0 )T and (H1 )T are plotted
instead. If the filters have real coefficients, |λHiT (ω)| = |λHi (ω)|, so the correct
frequency responses are shown.

Example 6.4: Plotting the frequency responses


In order to verify the low-pass/high-pass characteristics of G0 and G1 , let us
plot the frequency responses of the wavelets we have considered. To plot λG0 (ω)
and λG1 (ω) for the Haar wavelet, we can write

freqresp_alg(’Haar’, True, False)


freqresp_alg(’Haar’, False, False)

To plot the same frequency response for the alternative piecewise linear wavelet,
we can write

freqresp_alg(’pwl2’, True, False)


freqresp_alg(’pwl2’, False, False)

The resulting frequency responses are shown in Figure 6.5. Low-pass/high-pass


characteristics are clearly seen here.

6.1.2 The support of the scaling function and the mother


wavelet
The scaling functions and mother wavelets we encounter will turn out to always
be functions with compact support. An interesting consequence of equations
(6.2) and (6.3) is that we can find the size of these supports from the number of
filter coefficients in G0 and G1 . In the following we will say that the support of
a filter is [E, F ] if t−E , ..., tF are the only nonzero filter coefficients.

Theorem 6.9. Support size.


Assume that G0 has support [M0 , M1 ], G1 has support [N0 , N1 ]. Then the
support of φ is [M0 , M1 ], and the support of ψ is [(M0 +N0 +1)/2, (M1 +N1 +1)/2].
Proof. Let [m0 , m1 ] be the support of φ. Then φ1,n clearly has support [(m0 +
n)/2, (m1 + n)/2]. In the equation
M1
X
φ(t) = (G0 )n φ1,n ,
n=M0

the function with the leftmost support is φ1,M0 , while the one with the rightmost
one is φ1,M1 . These supports are [(m0 + M0 )/2, (m1 + M1 )/2]. In order for the
supports of the two sides to match we clearly must have
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 223

1.4 1.4
1.2 1.2
1.0 1.0
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0.00 1 2 3 4 5 6 0.00 1 2 3 4 5 6

1.4 1.4
1.2 1.2
1.0 1.0
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0.00 1 2 3 4 5 6 0.00 1 2 3 4 5 6

Figure 6.5: The frequency responses λG0 (ω) and λG1 (ω) for the Haar wavelet
(top), and for the alternative piecewise linear wavelet (bottom).

m0 = (m0 + M0 )/2 m1 = (m1 + M1 )/2.

It follows that m0 = M0 and m1 = M1 , so that the support of φ is [M0 , M1 ].


Similarly, let [n0 , n1 ] be the support of ψ. We have that
NX
1 +1

ψ(t) = (G1 )n−1 φ1,n ,


n=N0 +1

and we get in the same way

n0 = (m0 + N0 + 1)/2 n1 = (m1 + N1 + 1)/2.

It follows that n0 = (M0 + N0 + 1)/2. n1 = (M1 + N1 + 1)/2, so that the support


of ψ is ((M0 + N0 + 1)/2, (M1 + N1 + 1)/2).

There are two special cases of the above we will run into.
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 224

Wavelets with symmetric filters. The results then say the support of φ is
[−M1 , M1 ] (i.e. symmetric around 0), and the support of ψ is 1/2 + [−(M1 +
N1 )/2, (M1 + N1 )/2], i.e. symmetric around 1/2. The wavelet with most such
filter coefficients we will consider has 7 and 9 filter coefficients, respectively, so
that the support of φ is [−3, 3], and the support of ψ is [−3, 4]. This is why
we have plotted these functions over [−4, 4], so that the entire function can be
seen. For the alternative piecewise linear wavelet the same argument gives that
support of φ is [−1, 1], and the support of ψ is [−1, 2] (which we already knew
from Figure 5.18). For the piecewise linear wavelet the support of ψ is deduced
to be [0, 1].

Orthonormal wavelets. For these wavelets it will turn that G0 has filter
coefficients evenly distributed around 1/2, and G1 has equally many, and evenly
distributed around −1/2. It is straightforward to check that the filters for the
Haar wavelet are of this kind, and this will turn out to be the simplest case of an
orthonormal wavelet. For such supports Theorem 6.9 says that both supports are
symmetric around 1/2, and that both φ, ψ, G0 and G1 have the same support
lengths. This can also be verified from the plots for the Haar wavelet. We
will only consider orthonormal wavelets with at most 8 filter coefficients. This
number of filter coefficients is easily seen to give the support [−3, 4], which is
why we have used [−4, 4] as a common range when we plot functions on this
form.

6.1.3 Symmetric extensions and the bd_mode parameter.


Continuous functions f : [0, N ] → R are approximated well from Vm , at least
if the wavelet has one vanishing moment. The periodic extension of f is not
continuous, however, if f (0) 6= f (N ). We can instead form the symmetric
extension f˘ of f , as given by Definition 1.22, which is defined on [0, 2N ], and
which can be periodically extended to a continuous function. We make a smaller
error if we restrict to the approximation of f˘ from Vm , when compared to that
of f from Vm .
The input to the DWT P is given in terms of a vector c, however, so that our
f is really the
P function n cm,n φm,n . If φ has compact support, then clearly
the function n c̆m,n φm,n retains the same values as the symmetric extension
(except near the boundary), when c̆ is some kind of symmetric extension of the
vector c. We are free to decide how vectors are symmetrically extended near the
boundary. In the theory of wavelets, the following symmetric extension strategy
for vectors is used.
Definition 6.10. Symmetric extension of a vector.
By the symmetric extension of x ∈ RN , we mean x̆ ∈ R2N −2 defined by

xk 0≤k<N
x̆k = (6.8)
x2N −2−k N ≤ k < 2N − 3
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 225

This is different from the symmetric extension given by Definition 4.1. Note
that (f˘(0), f˘(1), ..., f˘(N − 1), f˘(N ), f˘(N + 1), ..., f˘(2N − 1)) ∈ R2N is now the
symmetric extension of (f (0), f (1), ..., f (N )), so that this way of defining sym-
metric extensions is perhaps the most natural when it comes to sampling which
includes the boundaries.
2.0 2.0
1.5 1.5
1.0 1.0
0.5 0.5
0.0 0 10 20 30 40 0.0 0 10 20 30 40
Figure 6.6: A vector and its symmetric extension. Note that the period of the
vector is now 2N − 2, while it was 2N for the vector shown in Figure 4.1.

Consider applying the DWT to a symmetric extension x̆ of length 2N − 2.


Assume that all filters are symmetric (2N − 2) × (2N − 2) filters. Accorins to
Chapter 4, they preserve vectors symmetric around 0 and N − 1, so that there
exist N × N -matrices (H0 )r , (H1 )r , (G0 )r , and (G1 )r , so that for all x ∈ RN ,
Hi x̆ = (Hi˘)r x and Gi x̆ = (Gi˘)r x. In particular, the first N entries in Hi x̆
are (Hi )r x. The first N entries of H x̆ are thus obtained by assembling the
even-indexed entries from (H0 )r x, and the odd-indexed entries from (H1 )r x.
Since the difference between N − 1 − k and N − 1 + k is even, and since Hi x̆
is a symmetric extension, for one i we have that

(H x̆)N −1−k = (Hi x̆)N −1−k = (Hi x̆)N −1+k = (H x̆)N −1+k .
It follows that H preserves the same type of symmetric extensions, i.e. there
exists an N × N -matrix Hr so that H x̆ = H˘r x. Moreover, the entries in Hr x
are assembled from the entries in (Hi )r x, in the same way as the entries in Hx
are assembled from the entries in Hi x.
Note also that setting every second element to zero in a symmetric extension
only creates a new symmetric extension, so that G also preserves symmetric
extensions. It follows that there exist N × N -matrices Gr , (G0 )r , (G1 )r so that
Gx̆ = G˘r x, and so that the entries 0, ..., N − 1 in the output of Gr are obtained
by combining (G0 )r and (G1 )r as in Theorem 6.5.
Theorem 6.11. Symmetric filters and symmetric extensions.
If the filters H0 , H1 , G0 , and G1 in a wavelet transform are symmetric, then
the DWT/IDWT preserve symmetric extensions (as defined in Definition 6.10).
Also, applying the filters H0 , H1 , G0 , and G1 to x̆ ∈ R2N −2 in the DWT/IDWT
is equivalent to applying (H0 )r , (H1 )r , (G0 )r , and (G1 )r to x ∈ RN as described
in theorems 6.3 and 6.5.
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 226

In our implementations, we factored H and G in terms of elementary lifting


matrices. Note that the filters in these matrices are symmetric so that, after
applying Ax̆ = A˘r x repeatedly,
Y Y ˘
Aλ2i Bλ2i+1 x̆ = (Aλ2i )r (Bλ2i+1 )r x
i i
Q Q
It follows that ( i Aλ2i Bλ2i+1 )r = i (Aλ2i )r (Bλ2i+1 )r . It is straightforward
to find expressions for (Aλ )r and (Bλ )r (Exercise 6.7). DWT and IDWT
implementations can thus apply symmetric extensions by replacing Aλ and Bλ
with (Aλ )r and (Bλ )r .
The purpose of the bd_mode parameter is to control how we should han-
dle the boundary of the signal, in particular whether symmetric extensions
should be performed. This parameter is passed to lifting_even_symm and
lifting_odd_symm, which are the building blocks in our kernel functions. If this
parameter equals ’symm’ (this is the default value), the methods apply symmetric
extensions by using (Aλ )r /(Bλ )r rather than Aλ /Bλ . If the parameter equals
’per’, a periodic extension at the boundaries is made.
Fact 6.12. The bd_mode-parameter in DWT kernel functions.
Assume that the filters H0 , H1 , G0 , and G1 are symmetric. If the bd_mode
parameter is "symm", the symmetric versions (H0 )r , (H1 )r , (G0 )r , and (G1 )r
are applied in the DWT and IDWT, rather than the filters H0 , H1 , G0 , and
G1 themselves. If the ‘parameter is "per", the filters H0 , H1 , G0 , and G1 are
applied.

Exercise 6.5: Implement the dual filter bank transforms


a) Show that ATλ = Bλ and BλT = Aλ , i.e. that the transpose of an elementary
lifting matrix of even/odd type is an elementary lifting matrix of odd/even type.
b) Let H be the kernel of the DWT, and assume that we have a factorization
of it in terms of elementary lifting matrices. Use a) how that the dual DWT is
obtained from the DWT by replacing each Aλ with B−λ , and Bλ with A−λ in
this factorization.
c) Previously we expressed the DWT and the IDWT of the piecewise linear
wavelets in terms of elementary liftings. Use b) to write down the dual DWT and
IDWT of these two wavelets in terms of lifting matrices. Verify your answer by
going through the code in the functions dwt_kernel_pwl0, idwt_kernel_pwl0,
dwt_kernel_pwl2, and idwt_kernel_pwl2 where the dual parameter is set to
true..

Exercise 6.6: Transpose of the DWT and IDWT


Explain why
• The transpose of the DWT can be computed with an IDWT with the
kernel of the dual IDWT
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 227

• The transpose of the dual DWT can be computed with an IDWT with the
kernel of the IDWT
• The transpose of the IDWT can be computed with a DWT with the kernel
of the dual DWT
• The transpose of the dual IDWT can be computed with a DWT with the
kernel of the DWT

Exercise 6.7: Reduced matrices for elementary lifting


Show that the reduced matrices for elementary lifting are

 
1 2λ 0 0 ··· 0 00
0 1 0 0
 ··· 0 00 
0 λ 1 λ ··· 0 00
(Aλ )r =  . (6.9)
 
.. .. .. .. .. .... 
 .. . . . . . . .
 
0 0 0 0 · · · λ 1 λ
0 0 0 0 ··· 0 0 1
 
1 0 0 0 ··· 0 0 0
λ 1 λ 0
 · · · 0 0 0 
0 0 1 0 · · · 0 0 0
(Bλ )r =  . . . . ..  . (6.10)
 
.. .. ..
 .. .. .. .. . . . .
 
0 0 0 0 · · · 0 1 0
0 0 0 0 · · · 0 2λ 1

Also, change the implementations of liftingstevensymm and liftingstoddsymm


so that these expressions are used (rather than Aλ , Bλ ) when the parmeter symm
is set to True.

Exercise 6.8: Prove expression for Sr


Show that, with
 
S1 S2
S= ∈ R2N −2 × R2N −2
S3 S4
a symmetric filter, with S1 ∈ RN × RN , S2 ∈ RN × RN −2 , we have that

(S2 )f

Sr = S1 + 0 0 .
Use the proof of Theorem 4.9 as a guide.
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 228

Exercise 6.9: Orthonormal basis for the symmetric exten-


sions
In this exercise we will establish an orthonormal basis for the symmetric exten-
sions, as defined by Definition 6.10. This parallels Theorem 4.6.
a) Explain why, if x ∈ R2N −2 is a symmetric extension (according to Def-
x)n = zn e−πin , where z is a real vectors which satisfies
inition 4.1), then (b
zn = z2N −2−n
b) Show that
(  N −2 )
1
e0 , √ (ei + e2N −2−i ) , eN −1 (6.11)
2 n=1

b with x ∈ R2N −2 a
is an orthonormal basis for the vectors on the form x
symmetric extension.
c) Show that

 
1 0
√ cos 2π k
2N − 2 2N − 2
  N −2
1 n
√ cos 2π k
N −1 2N − 2 n=1
 
1 N −1
√ cos 2π k (6.12)
2N − 2 2N − 2

is an orthonormal basis for the symmetric extensions in R2N −2 .


d) Assume that S is symmetric. Show that the vectors listed in (6.12) are
eigenvectors for Sr , when the vectors are viewed as vectors in RN , and that they
are linearly independent. This shows that Sr is diagonalizable.

Exercise 6.10: Diagonalizing Sr


Let us explain how the matrix Sr can be diagonalized, similarly to how we
previously diagonalized using the DCT. In Exercise 6.9 we showed that the
vectors

  N −1
n
cos 2π k (6.13)
2N − 2 n=0

in RN is a basis of eigenvectors for Sr when S is symmetric. Sr itself is not


symmetric, however, so that this basis can not possibly be orthogonal (S is
symmetric if and only if it is orthogonally digonalizable). However, when the
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 229

vectors are viewed in R2N −2 we showed in Exercise 6.9c) an orthogonality


statement which can be written as


2N −3     2
 if n1 = n2 ∈ {0, N − 1}
X n1 n2
cos 2π k cos 2π k = (N − 1) × 1 if n1 = n2 6∈ {0, N − 1} .
2N − 2 2N − 2 
k=0 
0 if n1 6= n2
(6.14)
a) Show that


1
 if n1 = n2 ∈ {0, N − 1}
1
(N − 1) × if n1 = n2 6∈ {0, N − 1}
2
0 if n1 6= n2

   
1 n1 1 n2
= √ cos 2π · 0 √ cos 2π ·0
2 2N − 2 2 2N − 2
N −2    
X n1 n2
+ cos 2π k cos 2π k
2N − 2 2N − 2
k=1
   
1 n1 1 n2
+ √ cos 2π (N − 1) √ cos 2π (N − 1) .
2 2N − 2 2 2N − 2

Hint. Use that cos x = cos(2π − x) to pair the summands k and 2N − 2 − k.


(I)
Now, define the vector dn as

    N −2  !
1 n n 1 n
dn,N √ cos 2π · 0 , cos 2π k , √ cos 2π (N − 1) ,
2 2N − 2 2N − 2 k=1 2 2N − 2
(I) (I) √ (I) p
and define d0,N = dN −1,N = 1/ N − 1, and dn,N = 2/(N − 1) when n > 1.
(I)
The orthogonal N × N matrix where the rows are dn is called the DCT-I,
(I)
and we will denote it by DN . DCT-I is also much used, just as the DCT-II of
Chapter 4. The main difference from the previous cosine vectors is that 2N has
been replaced by 2N − 2.
(I)
b) Explain that the vectors dn are orthonormal, and that the matrix

 √  √  
1/ 2 0 ··· 0 0 1/ 2 0 ··· 0 0
r  0 1 ··· 0 0   0
 1 ··· 0 0 
2 
 .. .. .. ..

..

 . .. .. .. ..

 cos 2π 2Nn−2 k  ..
 
N −1 . . . . . . . . .
 
  
 0 0 ··· 1 0√   0 0 ··· 1 0√ 
0 0 ··· 0 1/ 2 0 0 ··· 0 1/ 2
is orthogonal.
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 230

  −1
c) Explain from b) that cos 2π 2Nn−2 k can be written as

   
1/2 0 ··· 0 0 1/2 0 ··· 0 0
 0 1 ··· 0 0   0 1 ··· 0 0 
2 
 .. .. .. .. ..
   
 . .. .. .. ..

 cos 2π 2Nn−2 k  ..
 
N −1 . . . . . . . . .
 
  
 0 0 ··· 1 0   0 0 ··· 1 0 
0 0 ··· 0 1/2 0 0 ··· 0 1/2

With the expression we found in c) Sr can now be diagonalized as


     −1
cos 2π 2Nn−2 k D cos 2π 2Nn−2 k .

Exercise 6.11: Compute filters and frequency responses 1


Write down the corresponding filters G0 og G1 for Exercise 5.40. Plot their
frequency responses, and characterize the filters as low-pass- or high-pass filters.

Exercise 6.12: Symmetry of MRA matrices vs. symmetry


of filters 1
Find two symmetric filters, so that the corresponding MRA-matrix, constructed
with alternating rows from these two filters, is not a symmetric matrix.

Exercise 6.13: Symmetry of MRA matrices vs. symmetry


of filters 2
Assume that an MRA-matrix is symmetric. Are the corresponding filters H0 ,
H1 , G0 , G1 also symmetric? If not, find a counterexample.

Exercise 6.14: Finding H0 , H1 from H


Assume that one stage in a DWT is given by the MRA-matrix

 
1/5 1/5 1/5 0 0 0 ··· 0 1/5 1/5
−1/3 1/3 −1/3 0 0 0 ··· 0 0 0 
 
H =  1/5
 1/5 1/5 1/5 1/5 0 ··· 0 0 0 
 0 0 −1/3 1/3 −1/3 0 ···0 0 0 
.. .. .. .. .. .. .. .. .. ..
 
. . . . . . . . . .

Write down the compact form for the corresponding filters H0 , H1 , and compute
and plot the frequency responses. Are the filters symmetric?
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 231

Exercise 6.15: Finding G0 ,G1 from G


Assume that one stage in the IDWT is given by the MRA-matrix
 
1/2 −1/4 0 0 ···
1/4 3/8
 1/4 1/16 · · · 
 0 −1/4 1/2 −1/4 · · ·
 
 0
 1/16 1/4 3/8 · · · 
 0 0 0 −1/4 · · ·
G= 0
 
 0 0 1/16 · · · 
 0
 0 0 0 · · ·

 .. .. .. .. .. 
 . . . . . 
 
 0 0 0 0 · · ·
1/4 1/16 0 0 ···
Write down the compact form for the filters G0 , G1 , and compute and plot the
frequency responses. Are the filters symmetric?

Exercise 6.16: Finding H from H0 , H1


Assume that H0 = {1/16, 1/4, 3/8, 1/4, 1/16}, and H1 = {−1/4, 1/2, −1/4}.
Plot the frequency responses of H0 and H1 , and verify that H0 is a lowpass
filter, and that H1 is a highpass filter. Also write down the change of coordinate
matrix PC1 ←φ1 for the wavelet corresponding to these filters.

Exercise 6.17: Finding G from G0 , G1


Assume that G0 = 13 {1, 1, 1}, and G1 = 15 {1, −1, 1, −1, 1}. Plot the frequency
responses of G0 and G1 , and verify that G0 is a lowpass filter, and that G1 is a
highpass filter. Also write down the change of coordinate matrix Pφ1 ←C1 for the
wavelet corresponding to these filters.

Exercise 6.18: Computing by hand


In Exercise 5.21 we computed the DWT of two very simple vectors x1 and x2 ,
using the Haar wavelet.
a) Compute H0 x1 , H1 x1 , H0 x2 , and H1 x2 , where H0 and H1 are the filters
used by the Haar wavelet.
b) Compare the odd-indexed elements in H1 x1 with the odd-indexed elements
in H1 x2 . From this comparison, attempt to find an explanation to why the two
vectors have very different detail components.

Exercise 6.19: Comment code


Suppose that we run the following algorithm on the sound represented by the
vector x:
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 232

c = (x[0::2] + x[1::2])/sqrt(2)
w = (x[0::2] - x[1::2])/sqrt(2)
newx = concatenate([c, w])
newx /= abs(newx).max()
play(newx,44100)

a) Comment the code and explain what happens. Which wavelet is used? What
do the vectors c and w represent? Describe the sound you believe you will hear.
b) Assume that we add lines in the code above which sets the elements in the
vector w to 0 before we compute the inverse operation. What will you hear if
you play the new sound you then get?

Exercise 6.20: Computing filters and frequency responses


Let us return to the piecewise linear wavelet from Exercise 5.39.
a) With ψ̂ as defined as in b) in Exercise 5.39, compute the coordinates of ψ̂ in
the basis φ1 (i.e. [ψ̂]φ1 ) with N = 8, i.e. compute the IDWT of

[ψ̂](φ0 ,ψ0 ) = (−α, −β, −δ, 0, 0, 0, 0, −γ) ⊕ (1, 0, 0, 0, 0, 0, 0, 0),


which is the coordinate vector you computed in d) in Exercise 5.39. For this,
you should use the function IDWTImpl, with the kernel of the piecewise linear
wavelet without symmetric extension as input. Explain that this gives you the
filter coefficients of G1 .
b) Plot the frequency response of G1 .

Exercise 6.21: Computing filters and frequency responses


2
Repeat the previous exercise for the Haar wavelet as in Exercise 5.41, and plot
the corresponding frequency responses for k = 2, 4, 6.

Exercise 6.22: Implementing with symmetric extension


In Exercise 3.9 we implemented a symmetric filter applied to a vector, i.e. when a
periodic extension is assumed. The corresponding function was called filterS(t,
x), and used the function numpy.convolve.
a) Reimplement the function filterS so that it also takes a third parameter
symm. If symm is false a periodic extension of x should be performed (i.e. filtering
as we have defined it, and as the previous version of filterS performs it). If
symm is true, symmetric extensions should be used (as given by Definition 6.10).
b) Implement functions
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 233

dwt_kernel_filters(H0, H1, G0, G1, x, bd_mode)


idwt_kernel_filters(H0, H1, G0, G1, x, bd_mode)

which return the DWT and IDWT kernels using theorems 6.3 and 6.5, respectively.
This function thus bases itself on that the filters of the wavelet are known. The
functions should call the function filterS from a). Recall also the definition of
the parameter dual from this section.
With the functions defined in b) you can now define standard DWT and
IDWT kernels in the following way, once the filters are known.

dwt_kernel = lambda x, bd_mode: dwt_kernel_filters(H0, H1, G0, G1, x, bd_mode)


idwt_kernel = lambda x, bd_mode: idwt_kernel_filters(H0, H1, G0, G1, x, bd_mode)

6.2 Properties of the filter bank transforms of a


wavelet
We have now described the DWT/IDWT as linear transformations G, H so
that GH = I, and where two filters G0 , G1 characterize G, two filters H0 , H1
characterize H. G and H are not Toeplitz matrices, however, so they are not
filters. Since filters produce the same output frequency from an input frequency,
we must have that G and H produce other (undesired) frequencies in the output
than those that are present in the input. We will call this phenomenon aliasing.
In order for GH = I, the undesired frequencies must cancel each other, so that
we end up with what we started with. Thus, GH must have what we will refer to
as alias cancellation. This is the same as saying that GH is a filter. In order for
GH = I, alias cancellation is not enough: We also need that the amount at the
given frequency is unchanged, i.e. that GHφn = φn for any Fourier basis vector
φn . We then say that we have perfect reconstruction. Perfect reconstruction
is always the case for wavelets by construction, but in signal processing many
interesting examples (G0 , G1 , H0 , H1 ) exist, for which we do not have perfect
reconstruction. Historically, forward and reverse filter bank transforms have
been around long before they appeared in a wavelet context. Operations where
GHφn = cn φn for all n may also be useful, in particular when cn is close to 1
for all n. If cn is real for all n, we say that we have no phase distortion. If we
have no phase distortion, the output from GH has the same phase, even if we
do not have perfect reconstruction. Such “near-perfect reconstruction systems"
have also been around long before many perfect reconstruction wavelet systems
were designed. In signal processing, these transforms also exist in more general
variants, and we will define these later. Let us summarize as follows.
Definition 6.13. Alias cancellation, phase distortion, and perfect reconstruction.
We say that we have alias cancellation if, for any n,

GHφn = cn φn ,
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 234

for some constant cn (i.e. GH is a filter). If all cn are real, we say that we
no phase distortion. If GH = I (i.e. cn = 1 for all n) we say that we have
perfect reconstruction. If all cn are close to 1, we say that we have near-perfect
reconstruction.
In signal processing, one also says that we have perfect- or near-perfect
reconstruction when GH equals Ed , or is close to Ed (i.e. the overall result is a
delay). The reason why a delay occurs has to do with that the transforms are
used in real-time processing, for which we may not be able to compute the output
at a given time instance before we know some of the following samples. Clearly
the delay is unproblematic, since one can still can reconstruct the input from
the output. We will encounter a useful example of near-perfect reconstruction
soon in the MP3 standard.
Let us now find a criterium for alias cancellation: When do we have that
GHe2πirk/N is a multiplum of e2πirk/N , for any r? We first remark that
(
λH0 ,r e2πirk/N k even
H(e2πirk/N ) =
λH1 ,r e2πirk/N k odd.
The frequency response of H(e2πirk/N ) is

N/2−1 N/2−1
X X
2πir(2k)/N −2πi(2k)n/N
λH0 ,r e e + λH1 ,r e2πir(2k+1)/N e−2πi(2k+1)n/N
k=0 k=0
N/2−1 N/2−1
X X
= λH0 ,r e2πi(r−n)(2k)/N + λH1 ,r e2πi(r−n)(2k+1)/N
k=0 k=0
N/2−1
X
= (λH0 ,r + λH1 ,r e2πi(r−n)/N ) e2πi(r−n)k/(N/2) .
k=0
PN/2−1
Clearly, k=0 e2πi(r−n)k/(N/2) = N/2 if n = r or n = r + N/2, and 0 else.
The frequency response is thus the vector
N N
(λH0 ,r + λH1 ,r )er + (λH0 ,r − λH1 ,r )er+N/2 ,
2 2
so that

1 1
H(e2πirk/N ) = (λH0 ,r + λH1 ,r )e2πirk/N + (λH0 ,r − λH1 ,r )e2πi(r+N/2)k/N .
2 2
(6.15)
Let us now turn to the reverse filter bank transform. We can write

1 2πirk/N
(e2πir·0/N , 0, e2πir·2/N , 0, . . . , e2πir(N −2)/N , 0) = (e + e2πi(r+N/2)k/N )
2
1
(0, e2πir·1/N , 0, e2πir·3/N , . . . , 0, e2πir(N −1)/N ) = (e2πirk/N − e2πi(r+N/2)k/N ).
2
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 235

This means that

     
1 2πirk/N 1 2πirk/N
G(e2πirk/N ) = G0 e + e2πi(r+N/2)k/N + G1 e − e2πi(r+N/2)k/N
2 2
1 1
= (λG0 ,r e2πirk/N + λG0 ,r+N/2 e2πi(r+N/2)k/N ) + (λG1 ,r e2πirk/N − λG1 ,r+N/2 e2πi(r+N/2)k/N )
2 2
1 1
= (λG0 ,r + λG1 ,r )e2πirk/N + (λG0 ,r+N/2 − λG1 ,r+N/2 )e2πi(r+N/2)k/N . (6.16)
2 2
Now, if we combine equations (6.15) and (6.16), we get

GH(e2πirk/N )
1 1
= (λH0 ,r + λH1 ,r )G(e2πirk/N ) + (λH0 ,r − λH1 ,r )G(e2πi(r+N/2)k/N )
2  2 
1 1 2πirk/N 1 2πi(r+N/2)k/N
= (λH0 ,r + λH1 ,r ) (λG0 ,r + λG1 ,r )e + (λG0 ,r+N/2 − λG1 ,r+N/2 )e )
2 2 2
 
1 1 2πi(r+N/2)k/N 1 2πirk/N
+ (λH0 ,r − λH1 ,r ) (λG0 ,r+N/2 + λG1 ,r+N/2 )e + (λG0 ,r − λG1 ,r )e )
2 2 2
1
= ((λH0 ,r + λH1 ,r )(λG0 ,r + λG1 ,r ) + (λH0 ,r − λH1 ,r )(λG0 ,r − λG1 ,r )) e2πirk/N
4
1
(λH0 ,r + λH1 ,r )(λG0 ,r+N/2 − λG1 ,r+N/2 ) + (λH0 ,r − λH1 ,r )(λG0 ,r+N/2 + λG1 ,r+N/2 ) e2πi(r+N/2)k/N

+
4
1 1
= (λH0 ,r λG0 ,r + λH1 ,r λG1 ,r )e2πirk/N + (λH0 ,r λG0 ,r+N/2 − λH1 ,r λG1 ,r+N/2 )e2πi(r+N/2)k/N .
2 2
If we also replace with the continuous frequency response, we obtain the following:
Theorem 6.14. Expression for aliasing.
We have that

1
GH(e2πirk/N ) = (λH0 ,r λG0 ,r + λH1 ,r λG1 ,r )e2πirk/N
2
1
+ (λH0 ,r λG0 ,r+N/2 − λH1 ,r λG1 ,r+N/2 )e2πi(r+N/2)k/N .
2
(6.17)

In particular, we have alias cancellation if and only if

λH0 (ω)λG0 (ω + π) = λH1 (ω)λG1 (ω + π). (6.18)


We will refer to this as the alias cancellation condition. If in addition

λH0 (ω)λG0 (ω) + λH1 (ω)λG1 (ω) = 2, (6.19)


we also have perfect reconstruction. We will refer to as the condition for perfect
reconstruction.
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 236

No phase distortion means that we have alias cancellation, and that

λH0 (ω)λG0 (ω) + λH1 (ω)λG1 (ω) is real.


Now let us turn to how we can construct wavelets/perfect reconstruction systems
from FIR-filters (recall from Chapter 3 that FIR filters where filters with a finite
number of filter coefficients). We will have use for some theorems which allow us
to construct wavelets from prototype filters. In particular we show that, when
G0 and H0 are given lowpass filters which satisfy a certain common property,
we can define unique (up to a constant) highpass filters H1 and G1 so that the
collection of these four filters can be used to implement a wavelet. We first state
the following general theorem.
Theorem 6.15. Criteria for perfect reconstruction.
The following statements are equivalent for FIR filters H0 , H1 , G0 , G1 :

• H0 , H1 , G0 , G1 give perfect reconstruction,


• there exist α ∈ R and d ∈ Z so that

(H1 )n = (−1)n α−1 (G0 )n−2d (6.20)


(G1 )n = (−1)n α(H0 )n+2d (6.21)
2 = λH0 ,n λG0 ,n + λH0 ,n+N/2 λG0 ,n+N/2 (6.22)

Let us translate this to continuous frequency responses. We first have that

X X
λH1 (ω) = (H1 )k e−ikω = (−1)k α−1 (G0 )k−2d e−ikω
k k
X X
−1
=α (−1)k (G0 )k e−i(k+2d)ω = α−1 e−2idω (G0 )k e−ik(ω+π)
k k
= α−1 e−2idω λG0 (ω + π).

We have a similar computation for λG1 (ω). We can thus state the following:
Theorem 6.16. Criteria for perfect reconstruction.
The following statements are equivalent for FIR filters H0 , H1 , G0 , G1 :

• H0 , H1 , G0 , G1 give perfect reconstruction,

• there exist α ∈ R and d ∈ Z so that

λH1 (ω) = α−1 e−2idω λG0 (ω + π) (6.23)


2idω
λG1 (ω) = αe λH0 (ω + π) (6.24)
2 = λH0 (ω)λG0 (ω) + λH0 (ω + π)λG0 (ω + π) (6.25)
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 237

Proof. Let us prove first that equations (6.23)- (6.25) for a FIR filter implies
that we have perfect reconstruction. Equations (6.23)-(6.24) mean that the alias
cancellation condition (6.18) is satisfied, since

λH1 (ω)λG1 (ω + π) =α−1 e−2idω λG0 (ω + π)(α)e2id(ω+π λH0 (ω)


=λH0 (ω)λG0 (ω + π).

Inserting this in the perfect reconstruction condition (6.25), we get

2 = λH0 (ω)λG0 (ω) + λG0 (ω + π)λH0 (ω + π)


= λH0 (ω)λG0 (ω) + α−1 e−2idω λG0 (ω + π)αe2idω λH0 (ω + π)
= λH0 (ω)λG0 (ω) + λH1 (ω)λG1 (ω),

which is Equation (6.19), so that equations (6.23)- (6.25) imply perfect recon-
struction. We therefore only need to prove that any set of FIR filters which give
perfect reconstruction, also satisfy these equations. Due to the calculation above,
it is enough to prove that equations (6.23)-(6.24) are satisfied. The proof of this
will wait till Section 8.1, since it uses some techniques we have not introduced
yet.

When constructing a wavelet it may be that we know one of the two pairs
(G0 , G1 ), (H0 , H1 ), and that we would like to construct the other two. This can
be achieved if we can find the constants d and α from above. If the filters are
symmetric we just saw that d = 0. If G0 , G1 are known, it follows from from
equations (6.20) and(6.21) that

X X X
1= (G1 )n (H1 )n = (G1 )n α−1 (−1)n (G0 )n = α−1 (−1)n (G0 )n (G1 )n ,
n n n

so that α = n (−1)n (G0 )n (G1 )n . On the other hand, if H0 , H1 are known


P
instead, we must have that

X X X
1= (G1 )n (H1 )n = α(−1)n (H0 )n (H1 )n = α (−1)n (H0 )n (H1 )n ,
n n n

so that α = 1/( n (−1)n (H0 )n (H1 )n ). Let us use these observations to state
P
the filters for the alternative wavelet of piecewise linear functions, which is the
only wavelet we have gone through we have not computed the filters and the
frequency response for.
Let us use Theorem 6.16 to compute the filters H0 and H1 for the alternative
piecewise linear wavelet. These filters are also symmetric, since G0 , G1 were. We
get that
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 238

    
X 1 1 1 3 1 1 1
α= (−1)n (G0 )n (G1 )n = − − +1· − − = .
n
2 2 4 4 2 4 2

We now get

(H0 )n = α−1 (−1)n (G1 )n = 2(−1)n (G1 )n


(H1 )n = α−1 (−1)n (G0 )n = 2(−1)n (G0 )n , (6.26)

so that


H0 = 2{−1/8, 1/4, 3/4, 1/4, −1/8}

H1 = 2{−1/2, 1, −1/2}. (6.27)

Note that, even though conditions (6.23) and (6.24) together ensure that the
alias cancellation condition is satisfied, alias cancellation can occur also if these
conditions are not satisfied. Conditions (6.23) and (6.24) thus give a stronger
requirement than alias cancellation. We will be particularly concerned with
wavelets where the filters are symmetric, for which we can state the following
corollary.
Corollary 6.17. Criteria for perfect reconstruction .
The following statements are equivalent:

• H0 , H1 , G0 , G1 are the filters of a symmetric wavelet,


• λH0 (ω), λH1 (ω), λG0 (ω), λG1 (ω) are real functions, and

λH1 (ω) = α−1 λG0 (ω + π) (6.28)


λG1 (ω) = αλH0 (ω + π) (6.29)
2 = λH0 (ω)λG0 (ω) + λH0 (ω + π)λG0 (ω + π). (6.30)

Thw delay d is thus 0 for symmetric wavelets.


Proof. Since H0 is symmetric, (H0 )n = (H0 )−n , and from equations (6.20) and
(6.21) it follows that

(G1 )n−2d = (−1)n−2d α(H0 )n = (−1)n α−1 (H0 )−n


= (−1)(−n−2d) α−1 (H0 )(−n−2d)+2d = (G1 )−n−2d

This shows that G1 is symmetric about both −2d, in addition to being symmetric
about 0 (by assumption). We must thus have that d = 0, so that (H1 )n =
(−1)n α(G0 )n and (G1 )n = (−1)n α−1 (H0 )n . We now get that
CHAPTER 6. THE FILTER REPRESENTATION OF WAVELETS 239

X X
λH1 (ω) = (H1 )k e−ikω = α−1 (−1)k (G0 )k e−ikω
k k
X X
−1 −ikπ −ikω
=α e (G0 )k e = α−1 (G0 )k e−ik(ω+π)
k k
= α−1 λG0 (ω + π),

which proves Equation (6.28). Equation (6.28) follows similarly.


In the literature, two particular cases of filter banks have been important.
They are both referred to as Quadrature Mirror Filter banks, or QMF filter
banks, and some confusion exists between the two. Let us therefore make precise
definitions of the two.
Definition 6.18. Classical QMF filter banks.
In the classical definition of a QMF filter bank it is required that G0 = H0
and G1 = H1 (i.e. the filters in the forward and reverse transforms are equal),
and that

λH1 (ω) = λH0 (ω + π). (6.31)


It is straightforward to check that, for a classical QMF filter bank, the
forward and reverse transforms are equal (i.e. G = H). It is easily checked that
conditions (6.23) and (6.24) are satisfied with α = 1, d = 0 for a classical QMF
filter bank. In particular, the alias cancellation condition is satisfied. The perfect
reconstruction condition can be written as

$$2 = \lambda_{H_0}(\omega)\lambda_{G_0}(\omega) + \lambda_{H_1}(\omega)\lambda_{G_1}(\omega) = \lambda_{H_0}(\omega)^2 + \lambda_{H_0}(\omega+\pi)^2.\qquad(6.32)$$

Unfortunately, it is impossible to find non-trivial FIR filters which satisfy this
quadrature formula (Exercise 6.23). Therefore, classical QMF filter banks which
give perfect reconstruction do not exist. Nevertheless, one can construct such
filter banks which give close to perfect reconstruction [23], and this together
with the fulfillment of the alias cancellation condition still makes them useful. In
fact, we will see in Section 8.3 that the MP3 standard makes use of such filters,
and this explains our previous observation that the MP3 standard does not give
perfect reconstruction. Note, however, that if the filters in a classical QMF filter
bank are symmetric (so that λH0(ω) is real), we have no phase distortion.
The second type of QMF filter bank is defined as follows.
Definition 6.19. Alternative QMF filter banks.
In the alternative definition of a QMF filter bank it is required that G0 =
(H0)^T and G1 = (H1)^T (i.e. the filter coefficients in the forward and reverse
transforms are the reverse of one another), and that

λH1 (ω) = λH0 (ω + π). (6.33)



The perfect reconstruction condition for an alternative QMF filter bank can
be written as

$$2 = \lambda_{H_0}(\omega)\lambda_{G_0}(\omega) + \lambda_{H_1}(\omega)\lambda_{G_1}(\omega) = \lambda_{H_0}(\omega)\overline{\lambda_{H_0}(\omega)} + \lambda_{H_0}(\omega+\pi)\overline{\lambda_{H_0}(\omega+\pi)} = |\lambda_{H_0}(\omega)|^2 + |\lambda_{H_0}(\omega+\pi)|^2.$$

We see that the perfect reconstruction properties of the two definitions of QMF
filter banks differ only in that the latter takes absolute values. It turns out that
the latter also has many interesting solutions, as we will see in Chapter 7. If we
in condition (6.23) substitute G0 = (H0)^T we get

$$\lambda_{H_1}(\omega) = \alpha^{-1}e^{-2id\omega}\lambda_{G_0}(\omega+\pi) = \alpha^{-1}e^{-2id\omega}\lambda_{H_0}(\omega+\pi).$$

If we set α = 1, d = 0, we get equality here. A similar computation follows for


condition (6.24). In other words, also alternative QMF filter banks satisfy the
alias cancellation condition. In the literature, a wavelet is called orthonormal if
G0 = (H0)^T, G1 = (H1)^T. From our little computation it follows that alternative
QMF filter banks with perfect reconstruction are examples of orthonormal
wavelets, and correspond to orthonormal wavelets which satisfy α = 1, d = 0.
For the Haar wavelet it is easily checked that G0 = (H0)^T, G1 = (H1)^T, but
it does not satisfy the relation λH1(ω) = λH0(ω + π). Instead it satisfies the
relation λH1(ω) = −λH0(ω + π). In other words, the Haar wavelet is not an
alternative QMF filter bank the way we have defined them. The difference lies
only in a sign, however. This is the reason why the Haar wavelet is still listed as
an alternative QMF filter bank in the literature. The additional sign leads to
orthonormal wavelets which satisfy α = −1, d = 0 instead.
The following is clear for orthonormal wavelets.

Theorem 6.20. Orthogonality of the DWT matrix.


A DWT matrix is orthogonal (i.e. the IDWT equals the transpose of the
DWT) if and only if the filters satisfy G0 = (H0 )T , G1 = (H1 )T , i.e. if and only
if the MRA equals the dual MRA.
This can be proved simply by observing that, if we transpose the DWT-matrix,
Theorem 6.23 says that we get an IDWT matrix with filters (H0 )T , (H1 )T , and
this is equal to the IDWT if and only if G0 = (H0 )T , G1 = (H1 )T . It follows that
QMF filter banks with perfect reconstruction give rise to orthonormal wavelets.
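As a quick numerical illustration of Theorem 6.20, the following sketch builds a small Haar DWT matrix (using one common sign convention for the Haar filters) and checks that it is orthogonal:

```python
import numpy as np

# 4x4 Haar DWT matrix: even rows use H0 = {1/sqrt(2), 1/sqrt(2)},
# odd rows use H1 = {1/sqrt(2), -1/sqrt(2)}.
H = np.array([[1,  1, 0, 0],
              [1, -1, 0, 0],
              [0,  0, 1, 1],
              [0,  0, 1, -1]]) / np.sqrt(2)
print(np.allclose(H @ H.T, np.eye(4)))  # True, so the IDWT equals the transpose of the DWT
```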

Exercise 6.23: Finding FIR filters


Show that it is impossible to find a non-trivial FIR-filter which satisfies Equation
(6.32).

Exercise 6.24: The Haar wavelet as an alternative QMF filter bank

Show that the Haar wavelet satisfies λH1(ω) = −λH0(ω + π), and G0 = (H0)^T,
G1 = (H1)^T. The Haar wavelet can thus be considered as an alternative QMF
filter bank.

6.3 A generalization of the filter representation, and its use in audio coding
It turns out that the filter representation, which we now have used for an
alternative representation of a wavelet transformation, can be generalized in
such a way that it also is useful for audio coding. In this section we will first
define this generalization. We will then state how the MP3 standard encodes
and decodes audio, and see how our generalization is connected to this. Much
literature fails to elaborate on this connection. We will call our generalizations
filter bank transforms, or simply filter banks. Just as for wavelets, filters are
applied differently for the forward and reverse transforms. The code for this
section can be found in a module called mp3funcs.
We start by defining the forward filter bank transform and its filters.
Definition 6.21. Forward filter bank transform.
Let H0 , H1 , . . . , HM −1 be N × N -filters. A forward filter bank transform H
produces output z ∈ RN from the input x ∈ RN in the following way:

• ziM = (H0 x)iM for any i so that 0 ≤ iM < N .


• ziM +1 = (H1 x)iM +1 for any i so that 0 ≤ iM + 1 < N .

• ...
• ziM +(M −1) = (HM −1 x)iM +(M −1) for any i so that 0 ≤ iM + (M − 1) < N .

In other words, the output of a forward filter bank transform is computed


by applying filters H0 , H1 , . . . , HM −1 to the input, and by downsampling and
assembling these so that we obtain the same number of output samples as
input samples (also in this more general setting this is called critical sampling).
H0, H1, ..., HM−1 are also called analysis filter components, the output of filter
Hi is called channel i, and M is called the number of channels. The
output samples z_{iM+k} are also called the subband samples of channel k.
Clearly this definition generalizes the DWT and its analysis filters, since
these can be obtained by setting M = 2. The DWT is thus a 2-channel forward
filter bank transform. While the DWT produces the output (c_{m−1}, w_{m−1}) from the
input c_m, an M-channel forward filter bank transform splits the output into
M components, instead of 2. Clearly, in the matrix of a forward filter bank
transform the rows repeat cyclically with period M, similarly to MRA-matrices.
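To make the definition concrete, here is a minimal sketch of a forward filter bank transform (our own illustration, not the implementation used in the book's modules). Each filter is represented by a dictionary mapping the delay j to the coefficient, and is applied circularly, as for the filters earlier in the book:

```python
import numpy as np

def forward_fbt(x, filters):
    """Apply the M filters circularly to x and keep every M'th output sample,
    offset by the channel number k, as in Definition 6.21."""
    N, M = len(x), len(filters)
    z = np.zeros(N)
    for k, h in enumerate(filters):
        Hx = sum(c * np.roll(x, j) for j, c in h.items())  # (Hx)_n = sum_j h_j x_{n-j}
        z[k::M] = Hx[k::M]
    return z
```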


In practice, the filters in a forward filter bank transform are chosen so that
they concentrate on specific frequency ranges. This parallels what we saw for
the filters of a wavelet, where one concentrated on high frequencies, one on low
frequencies. Using a filter bank to split a signal into frequency components is
also called subband coding. But the filters in a filter bank are usually not ideal
bandpass filters. There exist a variety of different filter banks, for many different
purposes [47, 39]. In Chapter 7 we will say more on how one can construct filter
banks which can be used for subband coding.
Let us now turn to reverse filter bank transforms.
Definition 6.22. Reverse filter bank transforms.
Let G0, G1, ..., GM−1 be N × N-filters. A reverse filter bank transform G
produces x ∈ R^N from z ∈ R^N in the following way:
Define z_k ∈ R^N as the vector where (z_k)_{iM+k} = z_{iM+k} for all i so that 0 ≤
iM + k < N, and (z_k)_s = 0 for all other s. Then

$$x = G_0z_0 + G_1z_1 + \cdots + G_{M-1}z_{M-1}.\qquad(6.34)$$
G0 , G1 , . . . , GM −1 are also called synthesis filter components.
Again, this generalizes the IDWT and its synthesis filters, and the IDWT
can be seen as a 2-channel reverse filter bank transform. Also, in the matrix of
a reverse filter bank transform, the columns repeat cyclically with period M ,
similarly to MRA-matrices. Also in this more general setting the filters Gi are
in general different from the filters Hi . But we will see that, just as we saw
for the Haar wavelet, there are important special cases where the analysis and
synthesis filters are equal, and where their frequency responses are simply shifts
of one another. It is clear that definitions 6.21 and 6.22 give the diagram for
computing forward and reverse filter bank transforms shown in Figure 6.7.
Here ↓M and ↑M mean that we extract every M'th element in the vector,
and add M − 1 zeros between the elements, respectively, similarly to how we
previously defined ↓2 and ↑2 . Comparing Figure 6.3 with Figure 6.7 makes the
similarities between wavelet transformations and the transformation used in
the MP3 standard very visible: Although the filters used are different, they are
subject to the same kind of processing, and can therefore be subject to the same
implementations.
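A corresponding sketch of a reverse filter bank transform, following Definition 6.22 and using the same hypothetical filter representation as the forward sketch above, could look as follows. If the filters satisfy perfect reconstruction, reverse_fbt(forward_fbt(x, H), G) should recover x.

```python
import numpy as np

def reverse_fbt(z, filters):
    """Split z into the channel vectors z_k (zero except on the indices iM + k),
    filter z_k with G_k, and sum, as in Definition 6.22."""
    N, M = len(z), len(filters)
    x = np.zeros(N)
    for k, g in enumerate(filters):
        zk = np.zeros(N)
        zk[k::M] = z[k::M]
        x += sum(c * np.roll(zk, j) for j, c in g.items())
    return x
```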
In general it may be that the synthesis filters do not invert exactly the
analysis filters. If the synthesis system exactly inverts the analysis system, we
say that we have a perfect reconstruction filter bank. Since the analysis system
introduces undesired frequencies in the different channels, these have to cancel
in the inverse transform, in order to reconstruct the input exactly.
We will have use for the following simple connection between forward and
reverse filter bank transforms, which follows immediately from the definitions.
[Figure 6.7: Illustration of forward and reverse filter bank transforms. The input x is sent through each of the analysis filters H_0, ..., H_{M−1}; the output of H_k is downsampled by M (↓M, keeping the samples z_{iM+k}) and then upsampled by M (↑M) to give the vector z_k, which is filtered with the synthesis filter G_k; the M filtered vectors are finally added to produce x.]

Theorem 6.23. Connection between forward and reverse filter bank transforms.

Assume that H is a forward filter bank transform with filters H0, ..., HM−1.
Then H^T is a reverse filter bank transform with filters G0 = (H0)^T, ..., GM−1 =
(HM−1)^T.

6.3.1 Forward filter bank transform in the MP3 standard


Now, let us turn to the MP3 standard. The MP3 standard document states
that it applies a filter bank, and explains the following procedure for applying
this filter bank, see p. 67 of the standard document (the procedure is slightly
modified with mathematical terminology adapted to this book):

• Input 32 audio samples at a time.


• Build an input sample vector X ∈ R512 , where the 32 new samples are
placed first, all other samples are delayed with 32 elements. In particular
the 32 last samples are taken out.
• Multiply X componentwise with a vector C (this vector is defined through
a table in the standard), to obtain a vector Z ∈ R512 . The standard calls
this windowing.
• Compute the vector Y ∈ R^64 where $Y_i = \sum_{j=0}^{7} Z_{i+64j}$. The standard calls
this a partial calculation.

• Calculate S = MY ∈ R^32, where M is the 32 × 64 matrix with entries M_{ik} =
cos((2i + 1)(k − 16)π/64). S is called the vector of output samples, or
output subband samples. The standard calls this matrixing.
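The steps above translate directly into code. The following is a minimal sketch of one iteration (the function name is ours; it is not the mp3forwardfbt asked for in Exercise 6.26, but can serve as a starting point). It assumes that the window table C is available as a length-512 numpy array, for instance from the mp3ctable function mentioned in Exercise 6.25:

```python
import numpy as np

def mp3_forward_step(X, new_samples, C):
    """One iteration of the analysis steps above: update the 512-long buffer X
    with 32 new samples (given oldest first, placed in reversed order), window
    with C, do the partial calculation, and matrix to obtain the 32 subband samples."""
    X = np.concatenate([new_samples[::-1], X[:-32]])  # new samples first, rest delayed 32
    Z = C * X                                         # windowing
    Y = Z.reshape(8, 64).sum(axis=0)                  # Y_i = sum_{j=0}^{7} Z_{i+64j}
    i, k = np.meshgrid(np.arange(32), np.arange(64), indexing='ij')
    M = np.cos((2 * i + 1) * (k - 16) * np.pi / 64)
    return X, M @ Y                                   # matrixing: S = MY
```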

The standard does not motivate these steps, and does not put them into the
filter bank transform framework which we have established. Also, the standard
does not explain how the values in the vector C have been constructed.
Let us start by proving that the steps above really corresponds to applying a
forward filter bank transform, and let us state the corresponding filters of this
transform. The procedure computes 32 outputs in each iteration, and each of
them is associated with a subband. Therefore, from the standard we would guess
that we have M = 32 channels, and we would like to find the corresponding 32
filters H0 , H1 , . . . , H31 .
It may seem strange to use the name matrixing here, for something which
obviously is matrix multiplication. The reason for this name must be that those
at the origin of the procedure came from outside a linear algebra framework.
The name windowing is a bit strange, too. It really does not correspond to
applying a window to the sound samples as we explained in Section 3.3.1. We
will see that it rather corresponds to applying a filter coefficient to a sound
sample. A third and final thing which seems a bit strange is that the order of the
input samples is reversed (the buffer is operated as a FIFO queue), since we are
used to having the first sound samples in time with the lowest index. This is
perhaps more usual in an engineering context, and less usual in a mathematical context.
Clearly, the procedure above defines a linear transformation, and we need to
show that this linear transformation coincides with the procedure we defined for a
forward filter bank transform, for a set of 32 filters. The input to the transforma-
tion are the audio samples, which we will denote by a vector x. At iteration s of
the procedure above the input audio samples are x_{32s−512}, x_{32s−511}, ..., x_{32s−1},
and X_i = x_{32s−i−1} due to the reversal of the input samples. The output of the
transformation at iteration s of the procedure is S_0, ..., S_{31}. We assemble
these into a vector z, so that the output at iteration s is z_{32(s−1)} = S_0,
z_{32(s−1)+1} = S_1, ..., z_{32(s−1)+31} = S_{31}.
We will have use for the following cosine-properties, which are easily verified:

cos (2π(n + 1/2)(k + 2N r)/(2N )) = (−1)r cos (2π(n + 1/2)k/(2N )) (6.35)


cos (2π(n + 1/2)(2N − k)/(2N )) = − cos (2π(n + 1/2)k/(2N )) . (6.36)

With the terminology above and using Property (6.35) the transformation can
be written as
$$\begin{aligned}
z_{32(s-1)+n} &= \sum_{k=0}^{63}\cos((2n+1)(k-16)\pi/64)Y_k = \sum_{k=0}^{63}\cos((2n+1)(k-16)\pi/64)\sum_{j=0}^{7}Z_{k+64j}\\
&= \sum_{k=0}^{63}\sum_{j=0}^{7}(-1)^j\cos((2n+1)(k+64j-16)\pi/64)Z_{k+64j}\\
&= \sum_{k=0}^{63}\sum_{j=0}^{7}\cos((2n+1)(k+64j-16)\pi/64)(-1)^jC_{k+64j}X_{k+64j}\\
&= \sum_{k=0}^{63}\sum_{j=0}^{7}\cos((2n+1)(k+64j-16)\pi/64)(-1)^jC_{k+64j}\,x_{32s-(k+64j)-1}.
\end{aligned}$$

Now, if we define $\{h_r\}_{r=0}^{511}$ by $h_{k+64j} = (-1)^jC_{k+64j}$, $0\le j<8$, $0\le k<64$, and
$h^{(n)}$ as the filter with coefficients $\{\cos((2n+1)(k-16)\pi/64)h_k\}_{k=0}^{511}$, the above
can be simplified as

$$z_{32(s-1)+n} = \sum_{k=0}^{511}\cos((2n+1)(k-16)\pi/64)h_k\,x_{32s-k-1} = \sum_{k=0}^{511}(h^{(n)})_k\,x_{32s-k-1} = (h^{(n)}x)_{32s-1} = (E_{n-31}h^{(n)}x)_{32(s-1)+n}.$$
This means that the output of the procedure stated in the MP3 standard can
be computed as a forward filter bank transform, and that we can choose the
analysis filters as $H_n = E_{n-31}h^{(n)}$.

Theorem 6.24. Forward filter bank transform for the MP3 standard.
Define $\{h_r\}_{r=0}^{511}$ by $h_{k+64j} = (-1)^jC_{k+64j}$, $0\le j<8$, $0\le k<64$, and $h^{(n)}$
as the filter with coefficients $\{\cos((2n+1)(k-16)\pi/64)h_k\}_{k=0}^{511}$. If we define
$H_n = E_{n-31}h^{(n)}$, the procedure stated in the MP3 standard corresponds to
applying the corresponding forward filter bank transform.
The filters Hn were shown in Example 3.34 as examples of filters which
concentrate on specific frequency ranges. The h_k are the filter coefficients of
what is called a prototype filter, and this kind of filter bank is also called a cosine-
modulated filter bank. Multiplying h_k with cos(2π(n + 1/2)(k − 16)/(2N))
modulates the filter coefficients so that the new filter has a frequency response
which is simply shifted in frequency in a symmetric manner: in Exercise 3.44,
we saw that, by multiplying with a cosine, we could construct new filters with
real filter coefficients, which also corresponded to shifting a prototype filter in
frequency.
frequency. Of course, multiplication with a complex exponential would also shift
the frequency response (such filter banks are called DFT-modulated filter banks),
but the problem with this is that the new filter has complex coefficients: It will
turn out that cosine-modulated filter banks can also be constructed so that they
are invertible, and that one can find such filter banks where the inverse is easily
found.

The effect of the delay in the definition of Hn is that, for each n, the
multiplications with the vector x are “aligned”, so that we can save a lot of
multiplications by performing this multiplication first, and summing these. We
actually save even more multiplications in the sum where j goes from 0 to 7, since
we here multiply with the same cosines. The steps defined in the MP3 standard
are clearly motivated by the desire to reduce the number of multiplications due
to these facts. A simple arithmetic count illutrates these savings: For every 32
output samples, we have the following number of multiplications:

• The first step computes 512 multiplications.


• The second step computes 64 sums of 8 elements each, i.e. a total of
7 × 64 = 448 additions (note that q = 512/64 = 8).

The standard says nothing about how the matrix multiplication in the third
step can be implemented. A direct multiplication would yield 32 × 64 = 2048
multiplications, leaving a total number of multiplications at 2560. In a direct
implementation of the forward filter bank transform, the computation of 32
samples would need 32 × 512 = 16384 multiplications, so that the procedure
sketched in the standard gives a big reduction.
The standard does not mention all possibilities for saving multiplications,
however: We can reduce the number of multiplications even further, since clearly
a DCT-type implementation can be used for the matrixing operation. We already
have an efficient implementation for multiplication with a 32 × 32 type-III cosine
matrix (this is simply the IDCT). We have seen that this implementation can
be chosen to reduce the number of multiplications to N log2 N/2 = 80, so that
the total number of multiplications is 512 + 80 = 592. Clearly then, when we
use the DCT, the first step is the computationally most intensive part.

6.3.2 Reverse filter bank transform in the MP3 standard


Let us now turn to how decoding is specified in the MP3 standard, and see that
we can associate this with a reverse filter bank transform. The MP3 standard
also states the following procedure for decoding:

• Input 32 new subband samples as the vector S.


• Change the vector V ∈ R^1024, so that all elements are delayed with 64 elements.
In particular the 64 last elements are taken out.
• Set the first 64 elements of V as NS ∈ R^64, where N is the 64 × 32
matrix with entries N_{ik} = cos((16 + i)(2k + 1)π/64). The standard also calls
this matrixing.
• Build the vector U ∈ R^512 from V from the formulas U_{64i+j} = V_{128i+j},
U_{64i+32+j} = V_{128i+96+j} for 0 ≤ i ≤ 7 and 0 ≤ j ≤ 31, i.e. V is
first split into segments of length 128, and U is constructed by
assembling the first and last 32 elements of each of these segments.

• Multiply U componentwise with a vector D (this vector is defined in the
standard), to obtain a vector W ∈ R^512. The standard also calls this
windowing.

• Compute the 32 next sound samples as $\sum_{i=0}^{15} W_{32i+j}$.
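Also these steps are straightforward to turn into code. The sketch below (again with a name of our own choosing, and assuming that the window table D is available as a length-512 numpy array from mp3dtable) performs one iteration of the decoding, and may serve as a starting point for Exercise 6.26:

```python
import numpy as np

def mp3_reverse_step(V, S, D):
    """One iteration of the synthesis steps above: update the 1024-long buffer V,
    matrix the 32 new subband samples S into it, build U, window with D, and
    sum to obtain the next 32 sound samples."""
    V = np.concatenate([np.zeros(64), V[:-64]])       # delay all elements by 64
    i, k = np.meshgrid(np.arange(64), np.arange(32), indexing='ij')
    N = np.cos((16 + i) * (2 * k + 1) * np.pi / 64)
    V[:64] = N @ S                                    # matrixing
    U = np.zeros(512)
    for r in range(8):                                # first and last 32 of each 128-segment
        U[64*r:64*r + 32] = V[128*r:128*r + 32]
        U[64*r + 32:64*r + 64] = V[128*r + 96:128*r + 128]
    W = D * U                                         # windowing
    return V, W.reshape(16, 32).sum(axis=0)           # x_j = sum_{i=0}^{15} W_{32i+j}
```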

To interpret this also in terms of filters, first rewrite steps 4 to 6 as

$$\begin{aligned}
x_{32(s-1)+j} &= \sum_{i=0}^{15}W_{32i+j} = \sum_{i=0}^{15}D_{32i+j}U_{32i+j}\\
&= \sum_{i=0}^{7}D_{64i+j}U_{64i+j} + \sum_{i=0}^{7}D_{64i+32+j}U_{64i+32+j}\\
&= \sum_{i=0}^{7}D_{64i+j}V_{128i+j} + \sum_{i=0}^{7}D_{64i+32+j}V_{128i+96+j}.\qquad(6.37)
\end{aligned}$$

The elements in V are obtained by “matrixing” different segments of the vector
z. More precisely, at iteration s we have that

$$\begin{pmatrix}V_{64r}\\ V_{64r+1}\\ \vdots\\ V_{64r+63}\end{pmatrix} = N\begin{pmatrix}z_{32(s-r-1)}\\ z_{32(s-r-1)+1}\\ \vdots\\ z_{32(s-r-1)+31}\end{pmatrix},$$

so that

$$V_{64r+j} = \sum_{k=0}^{31}\cos((16+j)(2k+1)\pi/64)\,z_{32(s-r-1)+k}$$

for 0 ≤ j ≤ 63. Since also

$$V_{128i+j} = V_{64(2i)+j}\qquad V_{128i+96+j} = V_{64(2i+1)+j+32},$$

we can rewrite Equation (6.37) as

$$\sum_{i=0}^{7}\sum_{k=0}^{31}D_{64i+j}\cos((16+j)(2k+1)\pi/64)\,z_{32(s-2i-1)+k} + \sum_{i=0}^{7}\sum_{k=0}^{31}D_{64i+32+j}\cos((16+j+32)(2k+1)\pi/64)\,z_{32(s-2i-2)+k}.$$

Again using Relation (6.35), this can be written as


$$\sum_{k=0}^{31}\sum_{i=0}^{7}(-1)^iD_{64i+j}\cos((16+64i+j)(2k+1)\pi/64)\,z_{32(s-2i-1)+k} + \sum_{k=0}^{31}\sum_{i=0}^{7}(-1)^iD_{64i+32+j}\cos((16+64i+j+32)(2k+1)\pi/64)\,z_{32(s-2i-2)+k}.$$

Now, if we define $\{g_r\}_{r=0}^{511}$ by $g_{64i+s} = (-1)^iD_{64i+s}$, $0\le i<8$, $0\le s<64$, and
$g^{(k)}$ as the filter with coefficients $\{\cos((r+16)(2k+1)\pi/64)g_r\}_{r=0}^{511}$, the above
can be simplified as

$$\begin{aligned}
&\sum_{k=0}^{31}\sum_{i=0}^{7}(g^{(k)})_{64i+j}\,z_{32(s-2i-1)+k} + \sum_{k=0}^{31}\sum_{i=0}^{7}(g^{(k)})_{64i+j+32}\,z_{32(s-2i-2)+k}\\
&= \sum_{k=0}^{31}\left(\sum_{i=0}^{7}(g^{(k)})_{32(2i)+j}\,z_{32(s-2i-1)+k} + \sum_{i=0}^{7}(g^{(k)})_{32(2i+1)+j}\,z_{32(s-2i-2)+k}\right)\\
&= \sum_{k=0}^{31}\sum_{r=0}^{15}(g^{(k)})_{32r+j}\,z_{32(s-r-1)+k},
\end{aligned}$$

where we observed that 2i and 2i + 1 together run through the values from 0 to
15 when i runs from 0 to 7. Since z has the same values as z_k on the indices
32(s − r − 1) + k, this can be written as

$$\sum_{k=0}^{31}\sum_{r=0}^{15}(g^{(k)})_{32r+j}\,(z_k)_{32(s-r-1)+k} = \sum_{k=0}^{31}(g^{(k)}z_k)_{32(s-1)+j+k} = \sum_{k=0}^{31}((E_{-k}g^{(k)})z_k)_{32(s-1)+j}.$$

By substituting a general s and j we see that $x = \sum_{k=0}^{31}(E_{-k}g^{(k)})z_k$. We have
thus proved the following.
thus proved the following.
Theorem 6.25. Reverse filter bank transform for the MP3 standard.
Define $\{g_r\}_{r=0}^{511}$ by $g_{64i+s} = (-1)^iD_{64i+s}$, $0\le i<8$, $0\le s<64$, and $g^{(k)}$
as the filter with coefficients $\{\cos((r+16)(2k+1)\pi/64)g_r\}_{r=0}^{511}$. If we define
$G_k = E_{-k}g^{(k)}$, the procedure stated in the MP3 standard corresponds to applying
the corresponding reverse filter bank transform.
In other words, the procedures for encoding and decoding stated in the
MP3 standard both correspond to filter banks: a forward filter bank transform
for the encoding, and a reverse filter bank transform for the decoding. Moreover,
both filter banks can be constructed by cosine-modulating prototype filters, and
the coefficients of these prototype filters are stated in the MP3 standard (up to

multiplication with an alternating sign). Note, however, that the two prototype
filters may be different. When we compare the two tables for these coefficients in
the standard, they do indeed seem to be different. On closer inspection, however,
one sees a connection: if you multiply the values in the C-table with 32, and
reverse them, you get the values in the D-table. This indicates that the analysis
and synthesis prototype filters are the same, up to multiplication with a scalar.
This connection will be explained in Section 8.3.
While the steps defined in the MP3 standard for decoding seem a bit more
complex than the steps for encoding, they are clearly also motivated by the
desire to reduce the number of multiplications. In both cases (encoding and
decoding), the window tables (C and D) are in direct connection with the filter
coefficients of the prototype filter: one simply adds a sign which alternates for
every 64 elements. The standard document does not mention this connection,
and it is perhaps not so simple to find this connection in the literature (but see
[35]).
The forward and reverse filter bank transforms are clearly very related. The
following result clarifies this.
Theorem 6.26. Connection between the forward and reverse filter bank trans-
forms in the MP3 standard.
Assume that a forward filter bank transform has filters of the form $H_i = E_{i-31}h^{(i)}$
for a prototype filter h. Then $G = E_{481}H^T$ is a reverse filter bank
transform with filters of the form $G_k = E_{-k}g^{(k)}$, where g is a prototype filter
whose elements equal the reverse of those in h. Vice versa, $H = E_{481}G^T$.
Proof. From Theorem 6.23 we know that H^T is a reverse filter bank transform
with filters

$$(H_i)^T = (E_{i-31}h^{(i)})^T = E_{31-i}(h^{(i)})^T.$$

$(h^{(i)})^T$ has filter coefficients $\cos((2i+1)(-k-16)\pi/64)h_{-k}$. If we delay all
$(H_i)^T$ with 481 = 512 − 31 elements as in the theorem, we get a total delay of
512 − 31 + 31 − i = 512 − i elements, so that we get the filter

$$\begin{aligned}
E_{512-i}\{\cos((2i+1)(-k-16)\pi/64)h_{-k}\}_k &= E_{-i}\{\cos((2i+1)(-(k-512)-16)\pi/64)h_{-(k-512)}\}_k\\
&= E_{-i}\{\cos((2i+1)(k+16)\pi/64)h_{-(k-512)}\}_k.
\end{aligned}$$

Now, we define the prototype filter g with elements $g_k = h_{-(k-512)}$. This has,
just as h, its support on [1, 511], and consists of the elements from h in reverse
order. If we define $g^{(i)}$ as the filter with coefficients $\cos((2i+1)(k+16)\pi/64)g_k$,
we see that $E_{481}H^T$ is a reverse filter bank transform with filters $E_{-i}g^{(i)}$. Since
$g^{(k)}$ now has been defined as for the MP3 standard, and its elements are the
reverse of those in h, the result follows.
We will have use for this result in Section 8.3, when we find conditions on
the prototype filter in order for the reverse transform to invert the forward

transform. Preferably, the reverse filter bank transform inverts exactly the
forward filter bank transform. In Exercise 6.26 we construct examples which
show that this is not the case. In the same exercise we also find many examples
where the reverse transform does what we would expect. These examples will
also be explained in Section 8.3, where we also will see how one can get around
this so that we obtain a system with perfect reconstruction. It may seem strange
that the MP3 standard does not do this.
In the MP3 standard, the output from the forward filter bank transform is
processed further, before the result is compressed using a lossless compression
method.

Exercise 6.25: Plotting frequency responses


The values Cq , Dq can be found by calling the functions mp3ctable, mp3dtable
which can be found on the book’s webpage.
a) Use your computer to verify the connection we stated between the tables C
and D, i.e. that Di = 32Ci for all i.
b) Plot the frequency responses of the corresponding prototype filters, and verify
that they both are lowpass filters. Use the connection from Theorem 6.24 to
find the prototype filter coefficients from the Cq .
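A possible starting point for this exercise (assuming that mp3ctable and mp3dtable return the tables as length-512 numpy arrays; the details are left to the reader) is sketched below:

```python
import numpy as np
import matplotlib.pyplot as plt

C, D = mp3ctable(), mp3dtable()             # functions from the book's webpage
print(np.max(np.abs(D - 32 * C)))           # a) should be (essentially) zero

j = np.arange(512) // 64                    # b) prototype filter h_k = (-1)^j C_{k+64j}
h = (-1.0) ** j * C
omega = np.linspace(0, 2 * np.pi, 512, endpoint=False)
plt.plot(omega, np.abs(np.fft.fft(h)))      # |frequency response| at omega = 2*pi*k/512
plt.show()
```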

Exercise 6.26: Implementing forward and reverse filter bank transforms
It is not too difficult to make implementations of the forward and reverse steps
as explained in the MP3 standard. In this exercise we will experiment with this.
In your code you can for simplicity assume that the input and output vectors to
your methods all have lengths which are multiples of 32. Also, use the functions
mp3ctable, mp3dtable mentioned in the previous exercise.
a) Write a function mp3forwardfbt which implements the steps in the forward
direction of the MP3 standard.
b) Write also a function mp3reversefbt which implements the steps in the
reverse direction.

6.4 Summary
We started this chapter by noting that, by reordering the target base of the
DWT, the change of coordinate matrix took a particular form. From this form
we understood that the DWT could be realized in terms of two filters H0 and
H1 , and that the IDWT could be realized in a similar way in terms of two filters
G0 and G1 . This gave rise to what we called the filter representation of wavelets.
The filter representation gives an entirely different view on wavelets: instead of
constructing function spaces with certain properties and deducing corresponding

filters from these, we can instead construct filters with certain properties (such
as alias cancellation and perfect reconstruction), and attempt to construct
corresponding mother wavelets, scaling functions, and function spaces. This
strategy, which replaces problems from function theory with discrete problems,
will be the subject of the next chapter. In practice this is what is done.
We stated what is required for filter bank matrices to invert each other: The
frequency responses of the lowpass filters needed to satisfy a certain equation,
and once this is satisfied the highpass filters can easily be obtained in the same
way we previously obtained highpass filters from lowpass filters. We will return
to this equation in the next chapter.
A useful consequence of the filter representation was that we could reuse
existing implementations of filters to implement the DWT and the IDWT, and
reuse existing theory, such as symmetric extensions. For wavelets, symmetric
extensions are applied in a slightly different way, when compared to the develop-
ments which lead to the DCT. We looked at the frequency responses of the filters
for the wavelets we have encountered up to now. From these we saw that G0, H0
were lowpass filters, and that G1, H1 were highpass filters, and we argued why
this is typically the case for other wavelets as well. The filter representation
was also easily generalized from 2 to M > 2 filters, and such transformations
had a similar interpretation in terms of splitting the input into a uniform set of
frequencies. Such transforms were generally called filter bank transforms, and
we saw that the processing performed by the MP3 standard could be interpreted
as a certain filter bank transform, called a cosine-modulated filter bank. This
is just one of many possible filter banks. In fact, the filter bank of the MP3
standard is largely outdated, since it is too simple, and as we will see it does not
even give perfect reconstruction (only alias cancellation and no phase distortion).
It is merely chosen here since it is the simplest to present theoretically, and
since it is perhaps the best known standard for compression of sound. Other
filters banks with better properties have been constructed, and they are used in
more recent standards. In many of these filter banks, the filters do not partition
frequencies uniformly, and have been adapted to the way the human auditory
system handles the different frequencies. Different construction methods are used
to construct such filter banks. The motivation behind filter bank transforms is
that their output is more suitable for further processing, such as compression, or
playback in an audio system, and that they have efficient implementations.
We mentioned that the MP3 standard does not say how the prototype filters
were chosen. We will have more to say on what dictates their choice in Section 8.3.
There are several differences between the use of wavelet transformations
in wavelet theory, and the use of filter bank transforms in signal processing
theory. One is that wavelet transforms are typically applied in stages, while filter
bank transforms often are not. Nevertheless, such use of filter banks also has
theoretical importance, and this gives rise to what is called tree-structured filter
banks [47]. Another difference lies in the use of the term perfect reconstruction
system. In wavelet theory this is a direct consequence of the wavelet construction,
since the DWT and the IDWT correspond to change of coordinates to and from
the same bases. The alternative QMF filter bank was used as an example

of a filter bank which stems from signal processing, and which also shows
up in wavelet transformation. In signal processing theory, one has a wider
perspective, since one can design many useful systems with fast implementations
when one replaces the perfect reconstruction requirement with a near perfect
reconstruction requirement. One instead requires that the reverse transform
gives alias cancellation. The classical QMF filter banks were an example of this.
The original definition of classical QMF filter banks is from [9], and differs only
in a sign from how they are defined here.
All filters we encounter in wavelets and filter banks in this book are FIR.
This is just done to limit the exposition. Much useful theory has been developed
using IIR-filters.

What you should have learned in this chapter.


• How one can find the filters of a wavelet transformation by considering its
matrix and its inverse.

• Forward and reverse filter bank transforms.


• How one can implement the DWT and the IDWT with the help of the
filters.
• Plot of the frequency responses for the filters of the wavelets we have
considered, and their interpretation as low-pass and high-pass filters.
Chapter 7

Constructing interesting
wavelets

In the previous chapter, from an MRA with corresponding scaling function


and mother wavelet, we defined what we called a forward filter bank transform.
We also defined a reverse filter bank transform, but we did not state an MRA
connected to this, or prove if any such association could be made. In this
chapter we will address this. We will also see, if we start with a forward and
reverse filter bank transform, how we can construct corresponding MRA’s, and
for which transforms we can make this construction. We will see that there
is a great deal of flexibility in the filter bank transforms we can construct (as
this is a discrete problem). Actually it is so flexible that we can construct
scaling functions/mother wavelets with any degree of regularity, and well suited
for approximation of functions. This will also explain our previous interest in
vanishing moments, and explain how we can find the simplest filters which give
rise to a given number of vanishing moments, or a given degree of differentiability.
Answering these questions also transfers much more theory between wavelets
will look at two of these. These are used for lossless and lossy compression in
JPEG2000, which is a much used standard. These wavelets all have symmetric
filters. We end the chapter by looking at a family of orthonormal wavelets with
different numbers of vanishing moments.

7.1 From filters to scaling functions and mother wavelets
From Theorem 6.9 it follows that the support sizes of these dual functions are
4 and 3, respectively, so that their supports should be [−2, 2] and [−1, 2],
respectively. This is the reason why we have plotted the functions over [−2, 2].
The plots seem to confirm the support sizes we have computed.


In our first examples of wavelets in Chapter 5, we started with some bases
of functions φ_m, and deduced filters G0 and G1 from these. If we instead start
with the filters G0 and G1, what properties must they fulfill in order for us to
make an association the opposite way? We should thus demand that there exist
functions φ, ψ so that

$$\phi(t) = \sum_{n=0}^{2N-1}(G_0)_{n,0}\,\phi_{1,n}(t)\qquad(7.1)$$
$$\psi(t) = \sum_{n=0}^{2N-1}(G_1)_{n,1}\,\phi_{1,n}(t)\qquad(7.2)$$

Using Equation (7.1), the Fourier transform of φ is

$$\begin{aligned}
\hat{\phi}(\omega) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\phi(t)e^{-i\omega t}\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left(\sum_n (G_0)_{n,0}\sqrt{2}\,\phi(2t-n)\right)e^{-i\omega t}\,dt\\
&= \frac{1}{\sqrt{2}\sqrt{2\pi}}\sum_n (G_0)_{n,0}\int_{-\infty}^{\infty}\phi(t)e^{-i\omega(t+n)/2}\,dt\\
&= \frac{1}{\sqrt{2}}\left(\sum_n (G_0)_{n,0}e^{-i\omega n/2}\right)\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\phi(t)e^{-i(\omega/2)t}\,dt = \frac{\lambda_{G_0}(\omega/2)}{\sqrt{2}}\hat{\phi}(\omega/2).\qquad(7.3)
\end{aligned}$$

Clearly this expression can be continued recursively. We can thus state the
following result.
Theorem 7.1. g_N.
Define

$$g_N(\omega) = \left(\prod_{s=1}^{N}\frac{\lambda_{G_0}(\omega/2^s)}{\sqrt{2}}\right)\chi_{[0,2\pi]}(2^{-N}\omega).\qquad(7.4)$$

Then on $[0, 2\pi\,2^N]$ we have that $\hat\phi(\nu) = g_N(\nu)\hat\phi(\nu/2^N)$.
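Numerically, the truncated product (7.4) is easy to evaluate. The following sketch (a helper of our own, with the filter again given as a dictionary of coefficients) computes g_N(ω) for moderate N, which by Theorem 7.1 approximates φ̂ when the limit in Lemma 7.2 below exists:

```python
import numpy as np

def g_N(omega, G0, N=10):
    """Evaluate the truncated product in (7.4), ignoring the characteristic
    function factor (i.e. assuming 0 <= omega <= 2*pi*2**N)."""
    omega = np.asarray(omega, dtype=float)
    result = np.ones_like(omega, dtype=complex)
    for s in range(1, N + 1):
        lam = sum(c * np.exp(-1j * n * omega / 2 ** s) for n, c in G0.items())
        result *= lam / np.sqrt(2)
    return result
```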


We can now prove the following.

Lemma 7.2. g_N(ν) converges.
Assume that $\sum_n (G_0)_n = \sqrt{2}$ (i.e. $\lambda_{G_0}(0) = \sqrt{2}$), and that G0 is a FIR
filter. Then g_N(ν) converges pointwise as N → ∞ to an infinitely differentiable
function.

Proof. We need to verify that the infinite product $\prod_{s=1}^{\infty}\frac{\lambda_{G_0}(2\pi\nu/2^s)}{\sqrt{2}}$ converges.
Taking logarithms we get $\sum_s \ln\left(\frac{\lambda_{G_0}(2\pi\nu/2^s)}{\sqrt{2}}\right)$. To see if this series converges,
we consider the ratio between two successive terms:

$$\frac{\ln\left(\frac{\lambda_{G_0}(2\pi\nu/2^{s+1})}{\sqrt{2}}\right)}{\ln\left(\frac{\lambda_{G_0}(2\pi\nu/2^s)}{\sqrt{2}}\right)}.$$

Since $\sum_n (G_0)_n = \sqrt{2}$, we see that $\lambda_{G_0}(0) = \sqrt{2}$. Since $\lim_{\nu\to 0}\lambda_{G_0}(\nu) = \sqrt{2}$,
both the numerator and the denominator above tend to 0 (to one inside the
logarithms), so that we can use L'Hôpital's rule on $\frac{\ln(\lambda_{G_0}(\nu/2)/\sqrt{2})}{\ln(\lambda_{G_0}(\nu)/\sqrt{2})}$ to obtain

$$\frac{\lambda_{G_0}(\nu)}{\lambda_{G_0}(\nu/2)}\cdot\frac{\sum_n (G_0)_n(-in)e^{-in\nu/2}/2}{\sum_n (G_0)_n(-in)e^{-in\nu}} \to \frac{1}{2} < 1$$

as ν → 0. It follows that the product converges for any ν. Clearly the convergence
is absolute and uniform on compact sets, so that the limit is infinitely
differentiable.

It follows that φ̂, when φ exists, must be an infinitely differentiable function
also. Similarly we get

$$\begin{aligned}
\hat\psi(\omega) &= \frac{1}{\sqrt{2}}\left(\sum_n (G_1)_{n-1,0}e^{-i\omega n/2}\right)\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\phi(t)e^{-i(\omega/2)t}\,dt\\
&= \frac{1}{\sqrt{2}}\left(\sum_n (G_1)_{n,0}e^{-i\omega(n+1)/2}\right)\hat\phi(\omega/2) = e^{-i\omega/2}\frac{\lambda_{G_1}(\omega/2)}{\sqrt{2}}\hat\phi(\omega/2).
\end{aligned}$$

It follows in the same way that ψ̂ must be an infinitely differentiable function
also.
Now consider the dual filter bank transform, as defined in Chapter 6. Its
synthesis filters are (H0)^T and (H1)^T. If there exist a scaling function φ̃ and a
mother wavelet ψ̃ for the dual transform, they must in the same way be infinitely
differentiable. Moreover, φ̂, ψ̂ and the Fourier transforms of φ̃, ψ̃ can be found as
infinite products of the known frequency responses. If these functions are in L2(R),
then we can find unique functions φ, ψ, φ̃, ψ̃ with these as Fourier transforms.
So, our goal is to find filters so that the derived infinite products of the
frequency responses lie in L2(R), and so that the constructed functions φ, ψ, φ̃, ψ̃
give rise to “nice” wavelet bases. Some more technical requirements will be
needed in order for this. In order to state these we should be clear on what we
mean by a “nice” basis in this context. First of all, the bases should together
span all of L2(R). But our bases are not orthogonal, so we should have some
substitute for this. We will need the following definitions.

Definition 7.3. Frame.
Let H be a Hilbert space. A set of vectors {u_n}_n is called a frame of H if
there exist constants A > 0 and B > 0 so that, for any f ∈ H,

$$A\|f\|^2 \le \sum_n |\langle f, u_n\rangle|^2 \le B\|f\|^2.$$

If A = B, the frame is said to be tight.
Note that, for a frame of H, any f ∈ H is uniquely characterized by the
inner products ⟨f, u_n⟩. Indeed, if both a, b ∈ H have the same inner products,
then a − b ∈ H has inner products 0, which implies that a = b from the left
inequality.
For every frame one can find a dual frame {ũ_n}_n which satisfies

$$\frac{1}{B}\|f\|^2 \le \sum_n |\langle f, \tilde u_n\rangle|^2 \le \frac{1}{A}\|f\|^2,$$

and

$$f = \sum_n \langle f, u_n\rangle \tilde u_n = \sum_n \langle f, \tilde u_n\rangle u_n.\qquad(7.5)$$

Thus, if the frame is tight, the dual frame is also tight.


A frame is called a Riesz basis if all its vectors also are linearly independent.
One can show that the vectors in the dual frame of a Riesz basis also are linearly
independent, so that the dual frame of a Riesz basis also is a Riesz basis. It is
also called the dual Riesz basis. We will also need the following definition.
Definition 7.4. Biorthogonal bases.
We say that two bases {f_n}_n, {g_m}_m are biorthogonal if ⟨f_n, g_m⟩ = 0
whenever n ≠ m, and 1 if n = m.
From Equation (7.5) and linear independence, it is clear that the vectors in
a Riesz basis and in its dual Riesz basis are biorthogonal. In the absence of
orthonormal bases for L2 (R), the best we can hope for is dual Riesz bases for
L2 (R). The following result explains how we can obtain this from the filters.
Proposition 7.5. Biorthogonality.
Assume that the frequency responses λG0 and λH0 can be written as

$$\frac{\lambda_{G_0}(\omega)}{\sqrt{2}} = \left(\frac{1+e^{-i\omega}}{2}\right)^{L}\mathcal{F}(\omega)\qquad \frac{\lambda_{H_0}(\omega)}{\sqrt{2}} = \left(\frac{1+e^{-i\omega}}{2}\right)^{\tilde L}\tilde{\mathcal{F}}(\omega),\qquad(7.6)$$

where $\mathcal{F}$ and $\tilde{\mathcal{F}}$ are trigonometric polynomials of finite degree. Assume also that,
for some $k, \tilde k > 0$,

$$B_k = \max_\omega\left|\mathcal{F}(\omega)\cdots\mathcal{F}(2^{k-1}\omega)\right|^{1/k} < 2^{L-1/2}\qquad(7.7)$$
$$\tilde B_{\tilde k} = \max_\omega\left|\tilde{\mathcal{F}}(\omega)\cdots\tilde{\mathcal{F}}(2^{\tilde k-1}\omega)\right|^{1/\tilde k} < 2^{\tilde L-1/2}.\qquad(7.8)$$

Then the following hold:



• φ, φ̃ ∈ L2 (R), and the corresponding bases φ0 and φ̃0 are biorthogonal.


• ψm,n is a Riesz basis of L2 (R).
• ψ̃_{m,n} is the dual Riesz basis of ψ_{m,n}. Thus, ψ_{m,n} and ψ̃_{m,n} are biorthogonal
bases, and for any f ∈ L2(R),

$$f = \sum_{m,n}\langle f, \tilde\psi_{m,n}\rangle\psi_{m,n} = \sum_{m,n}\langle f, \psi_{m,n}\rangle\tilde\psi_{m,n}.\qquad(7.9)$$

If also

$$B_k < 2^{L-1-m}\qquad \tilde B_{\tilde k} < 2^{\tilde L-1-\tilde m},\qquad(7.10)$$

then

• φ, ψ are m times differentiable and ψ̃ has m + 1 vanishing moments,


• φ̃, ψ̃ are m̃ times differentiable and ψ has m̃ + 1 vanishing moments.

The proof for Proposition 7.5 is long, technical, and split in many stages.
The entire proof can be found in [7], and we will not go through all of it, only
address some simple parts of it in the following subsections. After that we will
see how we can find G0, H0 so that equations (7.6), (7.7), (7.8) are fulfilled.
Before we continue on this path, several comments are in order.
1. The paper [7] gives much more general conditions for when filters give rise to a
Riesz basis than stated here. The conditions (7.7), (7.8) are simply chosen because
they apply for the filters we consider.
2. From Equation (7.6) it follows that the flatness in the frequency responses
close to π explains how good the bases are for approximations, since the number
of vanishing moments is inferred from the multiplicity of the zero at π for the
frequency response.
3. From the result we obtain an MRA (with scaling function φ), and a dual
MRA (with scaling function φ̃), as well as mother wavelets (ψ and ψ̃), and we
can define the resolution spaces Vm and the detail spaces Wm as before, as well
as the “dual resolution spaces” Ṽm , (the spaces spanned by φ̃m = {φ̃m,n }n ) and
“dual detail spaces” W̃m (the spaces spanned by ψ̃m = {ψ̃m,n }n ). In general
Vm is different from Ṽm (except when φ = φ̃), and Wm is in general different
from the orthogonal complement of Vm−1 in Vm (except when φ = φ̃, when all
bases are orthonormal), although constructed so that Vm = Vm−1 ⊕ Wm−1 . Our
construction thus involves two MRA’s

V0 ⊂ V1 ⊂ V2 ⊂ · · · ⊂ Vm ⊂ · · · Ṽ0 ⊂ Ṽ1 ⊂ Ṽ2 ⊂ · · · ⊂ Ṽm ⊂ · · ·

where there are different scaling functions, satisfying a biorthogonality relation-


ship. This is also called a dual multiresolution analysis.

4. The DWT and IDWT are defined as before, so that the same change
of coordinates can be applied, as dictated by the filter coefficients. As will
be seen below, while proving Proposition 7.5 it also follows that the bases
φ0 ⊕ ψ0 ⊕ ψ1 · · · ψm−1 and φ̃0 ⊕ ψ̃0 ⊕ ψ̃1 · · · ψ̃m−1 are biorthogonal (in addition
to that φ_m and φ̃_m are biorthogonal, as stated). For f ∈ V_m this means that

$$f(t) = \sum_n\langle f(t),\tilde\phi_{m,n}\rangle\phi_{m,n} = \sum_n\langle f(t),\tilde\phi_{0,n}\rangle\phi_{0,n} + \sum_{m'<m,\,n}\langle f(t),\tilde\psi_{m',n}\rangle\psi_{m',n},$$

since this relationship is fulfilled for any linear combination of the {φ_{m,n}}_n, or
for any of the {φ_{0,n}, ψ_{m',n}}_{m'<m,n}, due to biorthogonality. Similarly, for f̃ ∈ Ṽ_m

$$\tilde f(t) = \sum_n\langle \tilde f(t),\phi_{m,n}\rangle\tilde\phi_{m,n} = \sum_n\langle \tilde f(t),\phi_{0,n}\rangle\tilde\phi_{0,n} + \sum_{m'<m,\,n}\langle \tilde f(t),\psi_{m',n}\rangle\tilde\psi_{m',n}.$$

It follows that for f ∈ V_m and for f̃ ∈ Ṽ_m the DWT and the IDWT and their
duals can be expressed in terms of inner products as follows.

• The input to the DWT is c_{m,n} = ⟨f, φ̃_{m,n}⟩. The output of the DWT is
c_{0,n} = ⟨f, φ̃_{0,n}⟩ and w_{m',n} = ⟨f, ψ̃_{m',n}⟩.

• The input to the dual DWT is c̃_{m,n} = ⟨f̃, φ_{m,n}⟩. The output of the dual
DWT is c̃_{0,n} = ⟨f̃, φ_{0,n}⟩ and w̃_{m',n} = ⟨f̃, ψ_{m',n}⟩.

• In the DWT matrix, column k has entries ⟨φ_{1,k}, φ̃_{0,l}⟩ and ⟨φ_{1,k}, ψ̃_{0,l}⟩ (with
a similar expression for the dual DWT).

• In the IDWT matrix, column 2k has entries ⟨φ_{0,k}, φ̃_{1,l}⟩, and column 2k + 1
has entries ⟨ψ_{0,k}, φ̃_{1,l}⟩ (with a similar expression for the dual IDWT).

Equation (7.9) comes from eliminating the φ_{m,n} by letting m → ∞.
5. When φ = φ̃ (orthonormal MRA's), the approximations (finite sums)
above coincide with projections onto the spaces V_m, Ṽ_m, W_m, W̃_m. When φ ≠ φ̃,
however, there are no reasons to believe that these approximations equal the best
approximations to f from V_m. In this case we have no procedure for computing
best approximations. When f is not in V_m, Ṽ_m we can, however, consider the
approximations

$$\sum_n\langle f(t),\tilde\phi_{m,n}\rangle\phi_{m,n}(t) \in V_m\qquad\text{and}\qquad \sum_n\langle f(t),\phi_{m,n}\rangle\tilde\phi_{m,n}(t) \in \tilde V_m$$

(when the MRA is orthonormal, this coincides with the best approximation).
Now, we can choose m so large that $f(t) = \sum_n c_n\phi_{m,n}(t) + \epsilon(t)$, with ε(t) a
small function. The first approximation can now be written

$$\begin{aligned}
\sum_n\left\langle \sum_{n'}c_{n'}\phi_{m,n'}(t) + \epsilon(t),\,\tilde\phi_{m,n}\right\rangle\phi_{m,n}(t) &= \sum_n c_n\phi_{m,n}(t) + \sum_n\langle\epsilon(t),\tilde\phi_{m,n}\rangle\phi_{m,n}(t)\\
&= f(t) + \sum_n\langle\epsilon(t),\tilde\phi_{m,n}\rangle\phi_{m,n}(t) - \epsilon(t).
\end{aligned}$$

Clearly, the difference $\sum_n\langle\epsilon(t),\tilde\phi_{m,n}\rangle\phi_{m,n}(t) - \epsilon(t)$ from f is small. It may,
however, be hard to compute the c_n above, so that instead, as in Theorem 5.33,
one uses $\frac{2^{-m}}{\int_0^N\phi_{m,0}(t)\,dt}f(n/2^m)\phi_{m,n}(t)$ as an approximation to f (i.e. use sample
values as c_n) also in this more general setting.
6. Previously we were taught to think in a periodic or folded way, so that we
could restrict to an interval [0, N], and to bases of finite dimension ($\{\phi_{0,n}\}_{n=0}^{N-1}$).
But the results above are only stated for wavelet bases of infinite dimension. Let
us therefore say something on how the results carry over to our finite dimensional
setting. If f ∈ L2(R) we can define the functions

$$f^{\mathrm{per}}(t) = \sum_k f(t+kN)\qquad f^{\mathrm{fold}}(t) = \sum_k f(t+2kN) + \sum_k f(2kN-t).$$

f^per and f^fold are seen to be periodic with periods N and 2N. It is easy to see
that the restriction of f^per to [0, N] is in L2([0, N]), and that the restriction of
f^fold to [0, 2N] is in L2([0, 2N]). In [6] it is shown that the result above extends
to a similar result for the periodized/folded basis (i.e. ψ^fold_{m,n}), so that we obtain
dual Riesz bases for L2 ([0, N ]) and L2 ([0, 2N ]) instead of L2 (R). The result on
the vanishing moments does not extend, however. One can, however, alter some
of the basis functions so that one achieves this. This simply changes some of the
columns in the DWT/IDWT matrices. Note that our extension strategy is not
optimal. The extension is usually not differentiable at the boundary, so that the
corresponding wavelet coefficients may be large, even though the wavelet has
many vanishing moments. The only way to get around this would be to find an
extension strategy which gave a more regular extension. However, natural images
may not have high regularity, which would make such an extension strategy
useless.

Sketch of proof for the biorthogonality in Proposition 7.5 (1). We
first show that φ_0 and φ̃_0 are biorthogonal. Recall that definition (7.4) said
that $g_N(\omega) = \prod_{s=1}^{N}\frac{\lambda_{G_0}(\omega/2^s)}{\sqrt{2}}\chi_{[0,2\pi]}(2^{-N}\omega)$. Let us similarly define
$h_N(\omega) = \prod_{s=1}^{N}\frac{\lambda_{H_0}(\omega/2^s)}{\sqrt{2}}\chi_{[0,2\pi]}(2^{-N}\omega)$. Recall that $g_N\to\hat\phi$ and that $h_N$
converges to the Fourier transform of φ̃ pointwise as N → ∞. We have that

$$g_{N+1}(\omega) = \frac{\lambda_{G_0}(\omega/2)}{\sqrt{2}}g_N(\omega/2)\qquad h_{N+1}(\omega) = \frac{\lambda_{H_0}(\omega/2)}{\sqrt{2}}h_N(\omega/2).$$

g_N, h_N are compactly supported, and equal to trigonometric polynomials on
their support, so that g_N, h_N ∈ L2(R). Since the Fourier transform also is an
isomorphism of L2(R) onto itself, there exist functions u_N, v_N ∈ L2(R) so that
g_N = û_N, h_N = v̂_N. Since the above relationship equals that of Equation (7.3),
with φ̂ replaced with g_N, we must have that
$$u_{N+1}(t) = \sum_n (G_0)_{n,0}\sqrt{2}\,u_N(2t-n)\qquad v_{N+1}(t) = \sum_n (H_0)_{0,n}\sqrt{2}\,v_N(2t-n).$$

Now, note that $g_0(\omega) = h_0(\omega) = \chi_{[0,1]}(\omega)$. Since $\langle u_0, v_0\rangle = \langle g_0, h_0\rangle$ we get that

$$\int_{-\infty}^{\infty}u_0(t)v_0(t-k)\,dt = \int_{-\infty}^{\infty}g_0(\nu)h_0(\nu)e^{2\pi ik\nu}\,d\nu = \int_0^1 e^{2\pi ik\nu}\,d\nu = \delta_{k,0}.$$

Now assume that we have proved that $\langle u_N(t), v_N(t-k)\rangle = \delta_{k,0}$. We then get
that

$$\begin{aligned}
\langle u_{N+1}(t), v_{N+1}(t-k)\rangle &= 2\sum_{n_1,n_2}(G_0)_{n_1,0}(H_0)_{0,n_2}\langle u_N(2t-n_1), v_N(2(t-k)-n_2)\rangle\\
&= \sum_{n_1,n_2}(G_0)_{n_1,0}(H_0)_{0,n_2}\langle u_N(t), v_N(t+n_1-n_2-2k)\rangle\\
&= \sum_{n_1,n_2\,|\,n_1-n_2=2k}(G_0)_{n_1,0}(H_0)_{0,n_2} = \sum_n (H_0)_{0,n-2k}(G_0)_{n,0}\\
&= \sum_n (H_0)_{2k,n}(G_0)_{n,0} = \sum_n H_{2k,n}G_{n,0} = (HG)_{2k,0} = I_{2k,0} = \delta_{k,0},
\end{aligned}$$

where we did the change of variables u = 2t − n_1. There is an extra argument to
show that $g_N\to\hat\phi$ in L2 (stronger than the pointwise convergence stated above),
so that also $u_N\to\phi$ in L2(R), since the Fourier transform is an isomorphism
of L2(R) onto itself. It follows that

$$\langle\phi_{m,k}, \tilde\phi_{m,l}\rangle = \lim_{N\to\infty}\langle u_N(t-k), v_N(t-l)\rangle = \delta_{k,l}.$$

While proving this one also establishes that

$$|\hat\phi(\omega)| \le C(1+|\omega|)^{-1/2-\epsilon}\qquad |\hat{\tilde\phi}(\omega)| \le C(1+|\omega|)^{-1/2-\epsilon},\qquad(7.11)$$

where $\epsilon = L - 1/2 - \log B_k/\log 2 > 0$ due to Assumption (7.7). In the paper
it is proved that this condition implies that the bases constitute dual frames.
The biorthogonality is used to show that they also are dual Riesz bases (i.e. that
they also are linearly independent).

Sketch of proof for the biorthogonality in Proposition 7.5 (2). The
biorthogonality of ψ_{m,n} and ψ̃_{m,n} can be deduced from the biorthogonality of
φ_0 and φ̃_0 as follows. We have that

$$\begin{aligned}
\langle\psi_{0,k}, \tilde\psi_{0,l}\rangle &= \sum_{n_1,n_2}(G_1)_{n_1,1}(H_1)_{1,n_2}\langle\phi_{1,n_1+2k}(t), \tilde\phi_{1,n_2+2l}(t)\rangle\\
&= \sum_n (G_1)_{n,1}(H_1)_{1,n+2(k-l)} = \sum_n (H_1)_{1+2(l-k),n}(G_1)_{n,1} = \sum_n H_{1+2(l-k),n}G_{n,1}\\
&= (HG)_{1+2(l-k),1} = \delta_{k,l}.
\end{aligned}$$

Similarly,

$$\begin{aligned}
\langle\psi_{0,k}, \tilde\phi_{0,l}\rangle &= \sum_{n_1,n_2}(G_1)_{n_1,1}(H_0)_{0,n_2}\langle\phi_{1,n_1+2k}(t), \tilde\phi_{1,n_2+2l}(t)\rangle = \sum_n (G_1)_{n,1}(H_0)_{0,n+2(k-l)}\\
&= \sum_n (H_0)_{2(l-k),n}(G_1)_{n,1} = \sum_n H_{2(l-k),n}G_{n,1} = (HG)_{2(l-k),1} = 0\\
\langle\phi_{0,k}, \tilde\psi_{0,l}\rangle &= \sum_{n_1,n_2}(G_0)_{n_1,0}(H_1)_{1,n_2}\langle\phi_{1,n_1+2k}(t), \tilde\phi_{1,n_2+2l}(t)\rangle = \sum_n (G_0)_{n,0}(H_1)_{1,n+2(k-l)}\\
&= \sum_n (H_1)_{1+2(l-k),n}(G_0)_{n,0} = \sum_n H_{1+2(l-k),n}G_{n,0} = (HG)_{1+2(l-k),0} = 0.
\end{aligned}$$

From this we also get with a simple change of coordinates that

$$\langle\psi_{m,k}, \tilde\psi_{m,l}\rangle = \delta_{k,l},\qquad \langle\psi_{m,k}, \tilde\phi_{m,l}\rangle = \langle\phi_{m,k}, \tilde\psi_{m,l}\rangle = 0.$$

Finally, if m′ < m, then φ_{m′,k} and ψ_{m′,k} can be written as linear combinations of the φ_{m,l},
so that ⟨φ_{m′,k}, ψ̃_{m,l}⟩ = ⟨ψ_{m′,k}, ψ̃_{m,l}⟩ = 0 due to what we showed above. Similarly,
⟨φ̃_{m′,k}, ψ_{m,l}⟩ = ⟨ψ̃_{m′,k}, ψ_{m,l}⟩ = 0.

Regularity and vanishing moments. Now assume also that $B_k < 2^{L-1-m}$,
so that $\log B_k/\log 2 < L - 1 - m$. We have that $\epsilon = L - 1/2 - \log B_k/\log 2 > L - 1/2 -
L + 1 + m = m + 1/2$, so that $|\hat\phi(\omega)| < C(1+|\omega|)^{-1/2-\epsilon} = C(1+|\omega|)^{-m-1-\delta}$
for some δ > 0. This implies that $\hat\phi(\omega)(1+|\omega|)^m < C(1+|\omega|)^{-1-\delta} \in L^1$. An
important property of the Fourier transform is that $\hat\phi(\omega)(1+|\omega|)^m \in L^1$ if and
only if φ is m times differentiable. This property implies that φ, and thus ψ, is
m times differentiable. Similarly, φ̃, ψ̃ are m̃ times differentiable.
In [7] it is also proved that if

• ψ_{m,n} and ψ̃_{m,n} are biorthogonal bases,

• ψ is m times differentiable with all derivatives ψ^{(l)}(t) of order l ≤ m
bounded, and

• ψ̃(t) < C(1 + |t|)^{m+1},

then ψ̃ has m + 1 vanishing moments. In our case we have that ψ and ψ̃ have
compact support, so that these conditions are satisfied. It follows that ψ̃ has
m + 1 vanishing moments.

In the next section we will construct a wide range of forward and reverse
filter bank transforms which invert each other, and which give rise to wavelets.
In [7] one checks that many of these wavelets satisfy (7.7) and (7.8) (implying
that they give rise to dual Riesz bases for L2 (R)), or the more general (7.10)
(implying a certain regularity and a certain number of vanishing moments).
Requirements on the filter lengths in order to obtain a given number of vanishing
moments are also stated.

Exercise 7.1: Implementation of the cascade algorithm


a) In the code above, we turned off symmetric extensions (the symm-argument is
0). Attempt to use symmetric extensions instead, and observe the new plots you
obtain. Can you explain why these new plots do not show the correct functions,
while the previous plots are correct?

Exercise 7.2: Using the cascade algorithm


In Exercise 6.20 we constructed a new mother wavelet ψ̂ for piecewise linear
functions by finding constants α, β, γ, δ so that

ψ̂ = ψ − αφ0,0 − βφ0,1 − δφ0,2 − γφ0,N −1 .


Use the cascade algorithm to plot ψ̂. Do this by using the wavelet kernel for
the piecewise linear wavelet (do not use the code above, since we have not
implemented kernels for this wavelet yet).

7.2 Vanishing moments


The scaling functions and mother wavelets we constructed in Chapter 5 were
very simple. They were, however, not regular enough to provide scaling functions which
are differentiable. This may clearly be important for signal approximation, at
least in cases where we know certain things about the regularity of the functions
we approximate. However, there seemed to be nothing which dictated how the
mother wavelet should be chosen in order to be useful. To see that this may pose
a problem, consider the mother wavelet we chose for piecewise linear functions.
Set N = 1 and consider the space V_10, which has dimension 2^10. When we
apply a DWT, we start with a function g10 ∈ V10 . This may be a very good
representation of the underlying data. However, when we compute gm−1 we
just pick every other coefficient from gm . By the time we get to g0 we are just
left with the first and last coefficient from g10 . In some situations this may be
adequate, but usually not.
Idea 7.6. Approximation.
We would like a wavelet basis to be able to represent f efficiently. By this
we mean that the approximation $f^{(m)} = \sum_n c_{0,n}\phi_{0,n} + \sum_{m'<m,\,n}w_{m',n}\psi_{m',n}$
to f from Observation 7.9 should converge quickly for the f we work with, as
m increases. This means that, with relatively few ψ_{m,n}, we can create good
approximations of f.
In this section we will address a property which the mother wavelet must
fulfill in order to be useful in this respect. To motivate this property, let us first
decompose f ∈ V_m as

$$f = \sum_{n=0}^{N-1}\langle f, \tilde\phi_{0,n}\rangle\phi_{0,n} + \sum_{r=0}^{m-1}\sum_{n=0}^{2^rN-1}\langle f, \tilde\psi_{r,n}\rangle\psi_{r,n}.\qquad(7.12)$$

If f is s times differentiable, it can be represented as f = P_s(x) + Q_s(x), where
P_s is a polynomial of degree s, and Q_s is a function which is very small (P_s
could for instance be a Taylor series expansion of f). If in addition ⟨t^k, ψ̃⟩ = 0
for k = 1, ..., s, we have also that ⟨t^k, ψ̃_{r,t}⟩ = 0 for r ≤ s, so that ⟨P_s, ψ̃_{r,t}⟩ = 0
also. This means that Equation (7.12) can be written

$$\begin{aligned}
f &= \sum_{n=0}^{N-1}\langle P_s+Q_s, \tilde\phi_{0,n}\rangle\phi_{0,n} + \sum_{r=0}^{m-1}\sum_{n=0}^{2^rN-1}\langle P_s+Q_s,\tilde\psi_{r,n}\rangle\psi_{r,n}\\
&= \sum_{n=0}^{N-1}\langle P_s+Q_s,\tilde\phi_{0,n}\rangle\phi_{0,n} + \sum_{r=0}^{m-1}\sum_{n=0}^{2^rN-1}\langle P_s,\tilde\psi_{r,n}\rangle\psi_{r,n} + \sum_{r=0}^{m-1}\sum_{n=0}^{2^rN-1}\langle Q_s,\tilde\psi_{r,n}\rangle\psi_{r,n}\\
&= \sum_{n=0}^{N-1}\langle f,\tilde\phi_{0,n}\rangle\phi_{0,n} + \sum_{r=0}^{m-1}\sum_{n=0}^{2^rN-1}\langle Q_s,\tilde\psi_{r,n}\rangle\psi_{r,n}.
\end{aligned}$$

Here the first sum lies in V0. We see that the wavelet coefficients from Wr are
⟨Q_s, ψ̃_{r,n}⟩, which are very small since Q_s is small. This means that the detail in
the different spaces Wr is very small, which is exactly what we aimed for. Let
us summarize this as follows:
Theorem 7.7. Vanishing moments.
If a function f ∈ Vm is r times differentiable, and ψ̃ has r vanishing mo-
ments, then f can be approximated well from V0 . Moreover, the quality of this
approximation improves when r increases.
Having many vanishing moments is thus very useful for compression, since
the corresponding wavelet basis then represents regular functions efficiently. In particular,
if f is a polynomial of degree less than or equal to k − 1 and ψ̃ has k vanishing
moments, then the detail coefficients w_{m,n} are exactly 0. Since (φ, ψ) and (φ̃,
ψ̃) both are wavelet bases, it is equally important for both to have vanishing
moments. We will in the following concentrate on the number of vanishing
moments of ψ.
The Haar wavelet has one vanishing moment, since ψ̃ = ψ and $\int_0^N\psi(t)\,dt = 0$
as we noted in Observation 5.14. It is an exercise to see that the Haar wavelet
has only one vanishing moment, i.e. $\int_0^N t\psi(t)\,dt \ne 0$.
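A quick numerical illustration of what one vanishing moment means (a sketch using one common sign convention for the Haar detail coefficients, not the book's DWT kernels): sampling a constant gives detail coefficients which are exactly 0, while sampling a degree one polynomial does not.

```python
import numpy as np

def haar_details(x):
    """One level of Haar detail coefficients: w_n = (x_{2n} - x_{2n+1}) / sqrt(2)."""
    return (x[0::2] - x[1::2]) / np.sqrt(2)

print(haar_details(np.ones(16)))                   # constant signal: all zeros
print(haar_details(np.arange(16, dtype=float)))    # linear signal: nonzero details
```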

Theorem 7.8. Vanishing moments.


Assume that the filters are chosen so that the scaling functions exist. Then
the following hold

• The number of vanishing moments of ψ̃ equals the multiplicity of a zero at


ω = π for λG0 (ω).
• The number of vanishing moments of ψ equals the multiplicity of a zero at
ω = π for λH0 (ω).

That is, the number of vanishing moments of ψ, ψ̃ equal the multiplicities of the zeros of the
frequency responses λH0(ω), λG0(ω), respectively, at ω = π.
In other words, the flatter the frequency responses λH0 (ω) and λG0 (ω) are near
high frequencies (ω = π), the better the wavelet functions are for approximation
of functions. This is analogous to the smoothing filters we constructed previously,
where the use of values from Pascal's triangle resulted in filters which behaved
like the constant function one at low frequencies. The frequency response for the
Haar wavelet had just a simple zero at π, so that it cannot represent functions
efficiently. The result also proves why we should consider G0 , H0 as lowpass
filters, G1 , H1 as highpass filters.
Proof. We have that

$$\lambda_{s_{-\tilde\psi(-t)}}(\nu) = -\int_{-\infty}^{\infty}\tilde\psi(-t)e^{-2\pi i\nu t}\,dt.\qquad(7.13)$$

By differentiating this expression k times w.r.t. ν (differentiate under the integral
sign) we get

$$\left(\lambda_{s_{-\tilde\psi(-t)}}\right)^{(k)}(\nu) = -\int (-2\pi it)^k\tilde\psi(t)e^{-2\pi i\nu t}\,dt.\qquad(7.14)$$

Evaluating this at ν = 0 gives

$$\left(\lambda_{s_{-\tilde\psi(-t)}}\right)^{(k)}(0) = -\int (-2\pi it)^k\tilde\psi(t)\,dt.\qquad(7.15)$$

From this expression it is clear that the number of vanishing moments of ψ̃
equals the multiplicity of the zero at ν = 0 for $\lambda_{s_{-\tilde\psi(-t)}}(\nu)$, which we have already
shown equals the multiplicity of the zero at ω = 0 for λH1(ω). Similarly it follows
that the number of vanishing moments of ψ equals the multiplicity of the zero at
ω = 0 for λG1(ω). Since we know that λG0(ω) has the same number of zeros at
π as λH1(ω) has at 0, and λH0(ω) has the same number of zeros at π as λG1(ω)
has at 0, the result follows.
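Numerically, the multiplicity of the zero at π can be estimated directly from the filter coefficients, by counting how many derivatives of the frequency response vanish there. The following is a small sketch of this (a helper of our own, with the filter given as a dictionary mapping index to coefficient):

```python
import numpy as np

def zero_multiplicity_at_pi(coeffs, tol=1e-8, maxorder=20):
    """Count vanishing derivatives of lambda(omega) = sum_n c_n e^{-i n omega}
    at omega = pi; the k'th derivative there is sum_n c_n (-i n)^k (-1)^n."""
    n = np.array(list(coeffs.keys()), dtype=float)
    c = np.array(list(coeffs.values()), dtype=complex)
    k = 0
    while k < maxorder and abs(np.sum(c * (-1j * n) ** k * (-1.0) ** n)) < tol:
        k += 1
    return k
```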
These results explain how we can construct φ, ψ, φ̃, ψ̃ from FIR-filters H0 ,
H1 , G0 , G1 satisfying the perfect reconstruction condition. Also, the results
explain how we can obtain such functions with as much differentiability and
as many vanishing moments as we want. We will use these results in the next

section to construct interesting wavelets. There we will also cover how we can
construct the simplest possible such filters.
There are some details which have been left out in this section: We have not
addressed why the wavelet bases we have constructed are linearly independent,
and why they span L2 (R), and in which sense they form dual Riesz bases. These details are quite technical,
and we refer to [7] for them. Let us also express what we have found in terms of
analog filters.
Observation 7.9. Analog filters.
Let

f (t) = Σ_n c_{m,n} φ_{m,n} = Σ_n c_{0,n} φ_{0,n} + Σ_{m′<m, n} w_{m′,n} ψ_{m′,n} ∈ Vm .

cm,n and wm,n can be computed by sampling the output of an analog filter. To be more precise,

cm,n = ⟨f, φ̃m,n ⟩ = ∫_0^N f (t) φ̃m,n (t) dt = ∫_0^N (−φ̃m,0 (−t)) f (2^{−m} n − t) dt
wm,n = ⟨f, ψ̃m,n ⟩ = ∫_0^N f (t) ψ̃m,n (t) dt = ∫_0^N (−ψ̃m,0 (−t)) f (2^{−m} n − t) dt.

In other words, cm,n can be obtained by sampling s−φ̃m,0 (−t) (f (t)) at the points 2^{−m} n, and wm,n by sampling s−ψ̃m,0 (−t) (f (t)) at 2^{−m} n, where the analog filters s−φ̃m,0 (−t) , s−ψ̃m,0 (−t) were defined in Theorem 1.25, i.e.

s−φ̃m,0 (−t) (f (t)) = ∫_0^N (−φ̃m,0 (−s)) f (t − s) ds    (7.16)
s−ψ̃m,0 (−t) (f (t)) = ∫_0^N (−ψ̃m,0 (−s)) f (t − s) ds.    (7.17)

A similar statement can be made for f˜ ∈ Ṽm . Here the convolution kernels
of the filters were as before, with the exception that φ, ψ were replaced by φ̃, ψ̃.
Note also that, if the functions φ̃, ψ̃ are symmetric, we can increase the precision
in the DWT with the method of symmetric extension also in this more general
setting.

7.3 Characterization of wavelets w.r.t. number of vanishing moments
We have seen that wavelets are particularly suitable for approximation of func-
tions when the mother wavelet or the dual mother wavelet have vanishing
moments. The more vanishing moments they have, the more attractive they
are. In this section we will attempt to characterize wavelets which have a given

number of vanishing moments. In particular we will characterize the simplest


such, those where the filters have few filter coefficients.
There are two particular cases we will look at. First we will consider the case
when all filters are symmetric. Then we will look at the case of orthonormal
wavelets. It turns out that these two cases are mutually disjoint (except for
trivial examples), but that there is a common result which can be used to
characterize the solutions to both problems. We will state the results in terms
of the multiplicities of the zeros of λH0 , λG0 at π, which we proved are the same
as the number of vanishing moments.

7.3.1 Symmetric filters


The main result when the filters are symmetric looks as follows.
Theorem 7.10. Wavelet criteria.
Assume that H0 , H1 , G0 , G1 are the filters of a wavelet, and that

• the filters are symmetric,


• λH0 has a zero of multiplicity N1 at π,
• λG0 has a zero of multiplicity N2 at π.

Then N1 and N2 are even, and there exists a polynomial Q which satisfies

u^{(N1+N2)/2} Q(1 − u) + (1 − u)^{(N1+N2)/2} Q(u) = 2.    (7.18)

so that λH0 (ω), λG0 (ω) can be written on the form

λH0 (ω) = ((1 + cos ω)/2)^{N1/2} Q1 ((1 − cos ω)/2)    (7.19)
λG0 (ω) = ((1 + cos ω)/2)^{N2/2} Q2 ((1 − cos ω)/2),    (7.20)

where Q = Q1 Q2 .

Proof. Since the filters are symmetric, λH0 (ω) = λH0 (−ω) and λG0 (ω) =
λG0 (−ω). Since einω + e−inω = 2 cos(nω), and since cos(nω) is the real part of
(cos ω + i sin ω)n , which is a polynomial in cosk ω sinl ω with l even, and since
sin2 ω = 1 − cos2 ω, λH0 and λG0 can both be written on the form P (cos ω), with
P a real polynomial.
Note that a zero at π in λH0 , λG0 corresponds to a factor of the form 1 + e^{−iω}, so that we can write

λH0 (ω) = ((1 + e^{−iω})/2)^{N1} f (e^{iω}) = e^{−iN1 ω/2} cos^{N1}(ω/2) f (e^{iω}),

where f is a polynomial. In order for this to be real, we must have that


f (eiω ) = eiN1 ω/2 g(eiω ) where g is real-valued, and then we can write g(eiω ) as a
real polynomial in cos ω. This means that λH0 (ω) = cosN1 (ω/2)P1 (cos ω), and
similarly for λG0 (ω). Clearly this can be a polynomial in eiω only if N1 is even.
Both N1 and N2 must then be even, and we can write

λH0 (ω) = cosN1 (ω/2)P1 (cos ω) = (cos2 (ω/2))N1 /2 P1 (1 − 2 sin2 (ω/2))


= (cos2 (ω/2))N1 /2 Q1 (sin2 (ω/2)),
where we have used that cos ω = 1 − 2 sin2 (ω/2), and defined Q1 by the relation
Q1 (x) = P1 (1−2x). Similarly we can write λG0 (ω) = (cos2 (ω/2))N2 /2 Q2 (sin2 (ω/2))
for another polynomial Q2 . Using the identities

cos²(ω/2) = (1 + cos ω)/2,    sin²(ω/2) = (1 − cos ω)/2,
we see that λH0 and λG0 satisfy equations (7.19) and (7.20). With Q = Q1 Q2 ,
Equation (6.25) can now be rewritten as

2 = λG0 (ω)λH0 (ω) + λG0 (ω + π)λH0 (ω + π)
  = (cos²(ω/2))^{(N1+N2)/2} Q(sin²(ω/2)) + (cos²((ω + π)/2))^{(N1+N2)/2} Q(sin²((ω + π)/2))
  = (cos²(ω/2))^{(N1+N2)/2} Q(sin²(ω/2)) + (sin²(ω/2))^{(N1+N2)/2} Q(cos²(ω/2))
  = (cos²(ω/2))^{(N1+N2)/2} Q(1 − cos²(ω/2)) + (1 − cos²(ω/2))^{(N1+N2)/2} Q(cos²(ω/2)).
Setting u = cos2 (ω/2) we see that Q must fulfill the equation

u(N1 +N2 )/2 Q(1 − u) + (1 − u)(N1 +N2 )/2 Q(u) = 2,


which is Equation (7.18). This completes the proof.
While this result characterizes all wavelets with a given number of vanishing
moments, it does not say which of these have fewest filter coefficients. The
polynomial Q decides the length of the filters H0 , G0 , however, so that what we
need to do is to find the polynomial Q of smallest degree. In this direction, note
first that the polynomials uN1 +N2 and (1 − u)N1 +N2 have no zeros in common.
Bezout's theorem, proved in Section 7.3.3, states that the equation

uN q1 (u) + (1 − u)N q2 (u) = 1 (7.21)


has unique solutions q1 , q2 with deg(q1 ), deg(q2 ) < (N1 + N2 )/2. To find these
solutions, substituting 1 − u for u gives the following equations:

uN q1 (u) + (1 − u)N q2 (u) = 1


uN q2 (1 − u) + (1 − u)N q1 (1 − u) = 1,

and uniqueness in Bezout's theorem gives that q1 (u) = q2 (1 − u), and q2 (u) =
q1 (1 − u). Equation (7.21) can thus be stated as

uN q2 (1 − u) + (1 − u)N q2 (u) = 1,
and comparing with Equation (7.18) (set N = (N1 + N2 )/2) we see that Q(u) =
2q2 (u). uN q1 (u) + (1 − u)N q2 (u) = 1 now gives

q2 (u) = (1 − u)^{−N}(1 − u^N q1 (u)) = (1 − u)^{−N}(1 − u^N q2 (1 − u))
       = ( Σ_{k=0}^{N−1} \binom{N+k−1}{k} u^k + O(u^N) ) (1 − u^N q2 (1 − u))
       = Σ_{k=0}^{N−1} \binom{N+k−1}{k} u^k + O(u^N),

where we have used the first N terms in the Taylor series expansion of (1 − u)−N
around 0. Since q2 is a polynomial of degree N − 1, we must have that
Q(u) = 2q2 (u) = 2 Σ_{k=0}^{N−1} \binom{N+k−1}{k} u^k.    (7.22)

Define Q^{(N)}(u) = 2 Σ_{k=0}^{N−1} \binom{N+k−1}{k} u^k. The first Q^{(N)} are

Q^{(1)}(u) = 2                      Q^{(2)}(u) = 2 + 4u
Q^{(3)}(u) = 2 + 6u + 12u²          Q^{(4)}(u) = 2 + 8u + 20u² + 40u³,

for which we compute

Q^{(1)}((1 − cos ω)/2) = 2
Q^{(2)}((1 − cos ω)/2) = −e^{−iω} + 4 − e^{iω}
Q^{(3)}((1 − cos ω)/2) = (3/4)e^{−2iω} − (9/2)e^{−iω} + 19/2 − (9/2)e^{iω} + (3/4)e^{2iω}
Q^{(4)}((1 − cos ω)/2) = −(5/8)e^{−3iω} + 5e^{−2iω} − (131/8)e^{−iω} + 26 − (131/8)e^{iω} + 5e^{2iω} − (5/8)e^{3iω}.

Thus in order to construct wavelets where λH0 , λG0 have as many zeros at π as
possible, and where there are as few filter coefficients as possible, we need to
compute the polynomials above, factorize them into polynomials Q1 and Q2 ,
and distribute these among λH0 and λG0 . Since we need real factorizations, we
must in any case pair complex roots. If we do this we obtain the factorizations
Q^{(1)}((1 − cos ω)/2) = 2
Q^{(2)}((1 − cos ω)/2) = (1/3.7321)(e^{iω} − 3.7321)(e^{−iω} − 3.7321)
Q^{(3)}((1 − cos ω)/2) = (3/4)(1/9.4438)(e^{2iω} − 5.4255e^{iω} + 9.4438)(e^{−2iω} − 5.4255e^{−iω} + 9.4438)
Q^{(4)}((1 − cos ω)/2) = (5/8)(1/(3.0407 · 7.1495))(e^{iω} − 3.0407)(e^{2iω} − 4.0623e^{iω} + 7.1495)
                        × (e^{−iω} − 3.0407)(e^{−2iω} − 4.0623e^{−iω} + 7.1495).    (7.23)

The factors in these factorizations can be distributed as factors in the frequency


responses of λH0 (ω), and λG0 (ω). One possibility is to let one of these frequency
responses absorb all the factors, another possibility is to split the factors as evenly
as possible across the two. When a frequency response absorbs more factors, the
corresponding filter gets more filter coefficients. In the following examples, both
factor distribution strategies will be encountered. Note that it is straightforward
to use your computer to factor Q into a product of polynomials Q1 and Q2 .
First the roots function can be used to find the roots in the polynomials. Then
the conv function can be used to multiply together factors corresponding to
different roots, to obtain the coefficients in the polynomials Q1 and Q2 .
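To make this concrete, here is a small sketch of these two steps in Python (our own illustration, not part of the original text; it assumes NumPy and uses numpy.roots and numpy.convolve in the roles of the roots and conv functions mentioned above):

import numpy as np
from math import comb

def Q_coeffs(N):
    # Coefficients of Q^(N)(u) = 2*sum_{k=0}^{N-1} C(N+k-1, k) u^k,
    # highest degree first (the convention used by numpy.roots).
    return np.array([2.0 * comb(N + k - 1, k) for k in range(N - 1, -1, -1)])

N = 4
Q = Q_coeffs(N)
roots = np.roots(Q)          # the roots of Q; complex roots come in conjugate pairs

# Pair complex conjugate roots into real second degree factors,
# and keep real roots as first degree factors.
factors, used = [], np.zeros(len(roots), dtype=bool)
for i, r in enumerate(roots):
    if used[i]:
        continue
    if abs(r.imag) < 1e-10:
        factors.append(np.array([1.0, -r.real]))
        used[i] = True
    else:
        j = int(np.argmin(np.abs(roots - np.conj(r))))   # index of the conjugate root
        factors.append(np.real(np.convolve([1, -r], [1, -roots[j]])))
        used[i] = used[j] = True

# Multiplying the factors (and the leading coefficient) recovers Q.
prod = np.array([Q[0]])
for f in factors:
    prod = np.convolve(prod, f)
print(np.allclose(prod, Q))   # True

Distributing the factors among Q1 and Q2 (all to one of them, or as evenly as possible) then amounts to multiplying the chosen factors together with convolve.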

7.3.2 Orthonormal wavelets


Now we turn to the case of orthonormal wavelets, i.e. where G0 = (H0 )T ,
G1 = (H1 )T . For simplicity we will assume d = 0, α = −1 in conditions
(6.23) and (6.24) (this corresponded to requiring λH1 (ω) = −λH0 (ω + π) in the
definition of alternative QMF filter banks). We will also assume for simplicity
that G0 is causal, meaning that t−1 , t−2 , . . . all are zero (the other solutions can
be derived from this). We saw that the Haar wavelet was such an orthonormal
wavelet. We have the following result:
Theorem 7.11. Criteria for perfect reconstruction.
Assume that H0 , H1 , G0 , G1 are the filters of an orthonormal wavelet (i.e. H0 =
(G0 )T and H1 = (G1 )T ) which also is an alternative QMF filter bank (i.e. λH1 (ω) =
−λH0 (ω + π)). Assume also that λG0 (ω) has a zero of multiplicity N at π and
that G0 is causal. Then there exists a polynomial Q which satisfies

uN Q(1 − u) + (1 − u)N Q(u) = 2, (7.24)


so that, if f is another polynomial which satisfies f (e^{iω}) f (e^{−iω}) = Q((1 − cos ω)/2), then λG0 (ω) can be written on the form

λG0 (ω) = ((1 + e^{−iω})/2)^N f (e^{−iω}).    (7.25)

We avoided stating λH0 (ω) in this result, since the relation H0 = (G0 )T gives
that λH0 (ω) = λG0 (ω). In particular, λH0 (ω) also has a zero of multiplicity N
at π. That G0 is causal is included to simplify the expression further.
Proof. The proof is very similar to the proof of Theorem 7.10. N vanishing
moments and that G0 is causal means that we can write

λG0 (ω) = ((1 + e^{−iω})/2)^N f (e^{−iω}) = (cos(ω/2))^N e^{−iN ω/2} f (e^{−iω}),

where f is a real polynomial. Also

λH0 (ω) = λG0 (ω) = (cos(ω/2))N eiN ω/2 f (eiω ).

Condition (6.25) now says that

2 = λG0 (ω)λH0 (ω) + λG0 (ω + π)λH0 (ω + π)


= (cos2 (ω/2))N f (eiω )f (e−iω ) + (sin2 (ω/2))N f (ei(ω+π) )f (e−i(ω+π) ).

Now, the function f (eiω )f (e−iω ) is symmetric around 0, so that it can be written
on the form P (cos ω) with P a polynomial, so that

2 = (cos2 (ω/2))N P (cos ω) + (sin2 (ω/2))N P (cos(ω + π))


= (cos2 (ω/2))N P (1 − 2 sin2 (ω/2)) + (sin2 (ω/2))N P (1 − 2 cos2 (ω/2)).

If we as in the proof of Theorem 7.10 define Q by Q(x) = P (1 − 2x), we can


write this as

(cos2 (ω/2))N Q(sin2 (ω/2)) + (sin2 (ω/2))N Q(cos2 (ω/2)) = 2,


which again gives Equation (7.18) for finding Q. What we thus need to do
is to compute the polynomial Q((1 − cos ω)/2) as before, and consider the
different factorizations of this on the form f (e^{iω}) f (e^{−iω}). Since this polynomial
is symmetric, a is a root if and only if 1/a is, and if and only if ā is. If the real
roots are

b1 , . . . , bm , 1/b1 , . . . , 1/bm ,
and the complex roots are

a1 , . . . , an , ā1 , . . . , ān and 1/a1 , . . . , 1/an , 1/ā1 , . . . , 1/ān ,


we can write

Q((1 − cos ω)/2)
= K(e^{−iω} − b1 ) · · · (e^{−iω} − bm )
× (e^{−iω} − a1 )(e^{−iω} − ā1 )(e^{−iω} − a2 )(e^{−iω} − ā2 ) · · · (e^{−iω} − an )(e^{−iω} − ān )
× (e^{iω} − b1 ) · · · (e^{iω} − bm )
× (e^{iω} − a1 )(e^{iω} − ā1 )(e^{iω} − a2 )(e^{iω} − ā2 ) · · · (e^{iω} − an )(e^{iω} − ān ),

where K is a constant. We can now define the polynomial f by

f (e^{iω}) = √K (e^{iω} − b1 ) · · · (e^{iω} − bm )(e^{iω} − a1 )(e^{iω} − ā1 ) · · · (e^{iω} − an )(e^{iω} − ān )

in order to obtain a factorization Q((1 − cos ω)/2) = f (e^{iω}) f (e^{−iω}). This concludes the proof.
In the previous proof we note that the polynomial f is not unique - we could
pair the roots in many different ways. The new algorithm is thus as follows:

• As before, write Q((1 − cos ω)/2) as a polynomial in e^{iω}, and find the roots.

• Split the roots into the two classes
  {b1 , . . . , bm , a1 , . . . , an , ā1 , . . . , ān } and {1/b1 , . . . , 1/bm , 1/a1 , . . . , 1/an , 1/ā1 , . . . , 1/ān },
  and form the polynomial f as above.

• Compute λG0 (ω) = ((1 + e^{−iω})/2)^N f (e^{−iω}).

Clearly the filters obtained with this strategy are not symmetric since f is not
symmetric. In Section 7.6 we will take a closer look at wavelets constructed in
this way.
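The following is a rough Python sketch of this algorithm (ours, not from the original text; NumPy assumed). It writes Q^{(N)}((1 − cos ω)/2) as a polynomial in z = e^{iω} multiplied by z^{N−1}, finds the roots, and puts one root from each pair {a, 1/a} into f. Here the roots of modulus larger than one are chosen, which is only one of the possible splits:

import numpy as np
from math import comb

def Q_in_z(N):
    # Coefficients (highest power of z first) of z^(N-1) * Q^(N)((1 - cos w)/2),
    # where z = e^{iw}, using u = (1 - cos w)/2 = -(z - 1)^2/(4z).
    poly = np.zeros(2 * N - 1)
    for k in range(N):
        c_k = 2.0 * comb(N + k - 1, k)
        factor = np.array([1.0])
        for _ in range(2 * k):                     # (z - 1)^(2k)
            factor = np.convolve(factor, [1.0, -1.0])
        term = c_k * (-1) ** k / 4 ** k * factor
        pad = np.zeros(N - 1 - k)                  # multiply by z^(N-1-k) and align degrees
        poly += np.concatenate([pad, term, pad])
    return poly

def orthonormal_g0(N):
    roots = np.roots(Q_in_z(N))
    f = np.real(np.poly(roots[np.abs(roots) > 1]))  # keep one root from each pair {a, 1/a}
    f *= np.sqrt(2) / np.polyval(f, 1.0)            # scale so that f(1)^2 = Q(0) = 2
    g0 = np.array([1.0])
    for _ in range(N):                              # multiply with ((1 + e^{-iw})/2)^N
        g0 = np.convolve(g0, [0.5, 0.5])
    return np.convolve(g0, f)

print(orthonormal_g0(2))
# approximately [-0.1294, 0.2241, 0.8365, 0.4830]; up to ordering and an overall
# sign this agrees with the N = 2 filter listed in Section 7.6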

7.3.3 The proof of Bezout's theorem


Theorem 7.12. Existence of polynomials.
If p1 and p2 are two polynomials, of degrees n1 and n2 respectively, with no
common zeros, then there exist unique polynomials q1 , q2 , of degree less than
n2 , n1 , respectively, so that

p1 (x)q1 (x) + p2 (x)q2 (x) = 1. (7.26)


Proof. We first establish the existence of q1 , q2 satisfying Equation (7.26). Denote
by deg(P ) the degree of the polynomial P . Renumber the polynomials if necessary,
so that n1 ≥ n2 . By polynomial division, we can now write

p1 (x) = a2 (x)p2 (x) + b2 (x),


where deg(a2 ) = deg(p1 ) − deg(p2 ), deg(b2 ) < deg(p2 ). Similarly, we can write

p2 (x) = a3 (x)b2 (x) + b3 (x),


where deg(a3 ) = deg(p2 ) − deg(b2 ), deg(b3 ) < deg(b2 ). We can repeat this
procedure, so that we obtain a sequence of polynomials an (x), bn (x) so that

bn−1 (x) = an+1 (x)bn (x) + bn+1 (x), (7.27)


where deg(an+1 ) = deg(bn−1 ) − deg(bn ), deg(bn+1 ) < deg(bn ). Since deg(bn ) is
strictly decreasing, we must have that bN +1 = 0 and bN 6= 0 for some N ,
i.e. bN −1 (x) = aN +1 (x)bN (x). Since bN −2 = aN bN −1 + bN , it follows that bN −2
can be divided by bN , and by induction that all bn can be divided by bN , in
particular p1 and p2 can be divided by bN . Since p1 and p2 have no common
zeros, bN must be a nonzero constant.
Using Equation (7.27), we can write recursively

bN = bN −2 − aN bN −1
= bN −2 − aN (bN −3 − aN −1 bN −2 )
= (1 + aN aN −1 )bN −2 − aN bN −3 .

By induction we can write

bN = a^{(1)}_{N,k} bN−k + a^{(2)}_{N,k} bN−k−1 .

We see that the leading order term for a^{(1)}_{N,k} is aN · · · aN−k+1 , which has degree

(deg(bN−2 ) − deg(bN−1 )) + · · · + (deg(bN−k−1 ) − deg(bN−k )) = deg(bN−k−1 ) − deg(bN−1 ),

while the leading order term for a^{(2)}_{N,k} is aN · · · aN−k+2 , which similarly has order deg(bN−k ) − deg(bN−1 ). For k = N − 1 we find

bN = a^{(1)}_{N,N−1} b1 + a^{(2)}_{N,N−1} b0 = a^{(1)}_{N,N−1} p2 + a^{(2)}_{N,N−1} p1 ,    (7.28)

with deg(a^{(1)}_{N,N−1}) = deg(p1 ) − deg(bN−1 ) < deg(p1 ) (since by construction deg(bN−1 ) > 0), and deg(a^{(2)}_{N,N−1}) = deg(p2 ) − deg(bN−1 ) < deg(p2 ). From Equation (7.28) it follows that q1 = a^{(2)}_{N,N−1}/bN and q2 = a^{(1)}_{N,N−1}/bN satisfy Equation (7.26), and that they satisfy the required degree constraints.
Now we turn to uniqueness of solutions q1 , q2 . Assume that r1 , r2 are two
other solutions to Equation (7.26). Then

p1 (q1 − r1 ) + p2 (q2 − r2 ) = 0.
Since p1 and p2 have no zeros in common this means that every zero of p2 is a
zero of q1 − r1 , with at least the same multiplicity. If q1 6= r1 , this means that
deg(q1 − r1 ) ≥ deg(p2 ), which is impossible since deg(q1 ) < deg(p2 ), deg(r1 ) <
deg(p2 ). Hence q1 = r1 . Similarly q2 = r2 , establishing uniqueness.

Exercise 7.3: Compute filters


Compute the filters H0 , G0 in Theorem 7.10 when N = N1 = N2 = 4, and
Q1 = Q(4) , Q2 = 1. Compute also filters H1 , G1 so that we have perfect
reconstruction (note that these are not unique).

7.4 A design strategy suitable for lossless compression
We choose Q1 = Q, Q2 = 1. In this case there is no need to find factors in Q.
The frequency responses of the filters in the filter factorization are

λH0 (ω) = ((1 + cos ω)/2)^{N1/2} Q^{(N)}((1 − cos ω)/2)
λG0 (ω) = ((1 + cos ω)/2)^{N2/2},    (7.29)

where N = (N1 + N2 )/2. Since Q^{(N)} has degree N − 1, λH0 has degree N1 + N1 + N2 − 2 = 2N1 + N2 − 2, and λG0 has degree N2 . These are both even numbers, so that the filters have odd length. The names of these filters are indexed by the filter lengths, and they are called Spline wavelets, since, as we now will show, the scaling function for this design strategy is the B-spline of order N2 : we have that

λG0 (ω) = (1/2^{N2/2})(1 + cos ω)^{N2/2} = cos^{N2}(ω/2).

Letting s be the analog filter with convolution kernel φ we can as in Equation (7.3) write

λs (f ) = λs (f /2^k) ∏_{i=1}^{k} λG0 (2πf /2^i) = λs (f /2^k) ∏_{i=1}^{k} cos^{N2}(πf /2^i)
       = λs (f /2^k) ∏_{i=1}^{k} ( sin(2πf /2^i) / (2 sin(πf /2^i)) )^{N2} = λs (f /2^k) ( sin(πf ) / (2^k sin(πf /2^k)) )^{N2},

where we have used the identity cos ω = sin(2ω)/(2 sin ω). If we here let k → ∞, and use the identity lim_{f→0} (sin f )/f = 1, we get that

λs (f ) = λs (0) ( sin(πf ) / (πf ) )^{N2}.
On the other hand, the frequency response of χ[−1/2,1/2) (t) is

∫_{−1/2}^{1/2} e^{−2πif t} dt = [ e^{−2πif t}/(−2πif ) ]_{−1/2}^{1/2}
= (1/(−2πif ))(e^{−πif} − e^{πif}) = (1/(−2πif )) 2i sin(−πf ) = sin(πf )/(πf ).

Due to this, ( sin(πf )/(πf ) )^{N2} is the frequency response of ∗_{k=1}^{N2} χ[−1/2,1/2) (t). By the
uniqueness of the frequency response we have that φ(t) = φ̂(0) ∗_{k=1}^{N2} χ[−1/2,1/2) (t).
In Exercise 7.5 you will be asked to show that this scaling function gives rise to
the multiresolution analysis of functions which are piecewise polynomials which
are differentiable at the borders, also called splines. This explains why this type
of wavelet is called a spline wavelet. To be more precise, the resolution spaces
are as follows.

Definition 7.13. Resolution spaces of piecewise polynomials.


We define Vm as the subspace of functions which are r − 1 times continuously
differentiable and equal to a polynomial of degree r on any interval of the form
[n2−m , (n + 1)2−m ].
Note that the piecewise linear wavelet can be considered as the first Spline
wavelet. This is further considered in the following example.

7.4.1 The Spline 5/3 wavelet


For the case of N1 = N2 = 2 when the first design strategy is used, equations (7.19) and (7.20) take the form

λG0 (ω) = (1/2)(1 + cos ω) = (1/4)e^{iω} + 1/2 + (1/4)e^{−iω}
λH0 (ω) = (1/2)(1 + cos ω) Q^{(2)}((1 − cos ω)/2) = (1/4)(2 + e^{iω} + e^{−iω})(4 − e^{iω} − e^{−iω})
        = −(1/4)e^{2iω} + (1/2)e^{iω} + 3/2 + (1/2)e^{−iω} − (1/4)e^{−2iω}.

The filters G0 , H0 are thus

G0 = {1/4, 1/2, 1/4}        H0 = {−1/4, 1/2, 3/2, 1/2, −1/4}.

The lengths of the filters are 3 and 5 in this case, so that this wavelet is called
the Spline 5/3 wavelet. Up to a constant, the filters are seen to be the same as
those of the alternative piecewise linear wavelet, see Example 6.3. Now, how do
we find the filters (G1 , H1 )? Previously we saw how to find the constant α in
Theorem 6.16 when we knew one of the two pairs (G0 , G1 ), (H0 , H1 ). This was
the last part of information we needed in order to construct the other two filters.
Here we know (G0 , H0 ) instead. In this case it is even easier to find (G1 , H1 )
since we can set α = 1. This means that (G1 , H1 ) can be obtained simply by
adding alternating signs to (G0 , H0 ), i.e. they are the corresponding high-pass
filters. We thus can set

G1 = {−1/4, −1/2, 3/2, −1/2, −1/4}        H1 = {−1/4, 1/2, −1/4}.

We have now found all the filters. It is clear that the forward and reverse filter
bank transforms here differ only by multiplication with a constant from those of
the alternative piecewise linear wavelet, so that this gives the same scaling
function and mother wavelet as that wavelet.
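As a quick numerical sanity check (our addition, with NumPy assumed), one can verify that these filters satisfy the perfect reconstruction condition 2 = λG0 (ω)λH0 (ω) + λG0 (ω + π)λH0 (ω + π) from Chapter 6:

import numpy as np

# Frequency responses of the symmetric filters G0 = {1/4, 1/2, 1/4} and
# H0 = {-1/4, 1/2, 3/2, 1/2, -1/4} found above
def lambda_G0(omega):
    return 0.5 + 0.5 * np.cos(omega)

def lambda_H0(omega):
    return 1.5 + np.cos(omega) - 0.5 * np.cos(2 * omega)

omega = np.linspace(0, 2 * np.pi, 1000)
lhs = lambda_G0(omega) * lambda_H0(omega) + lambda_G0(omega + np.pi) * lambda_H0(omega + np.pi)
print(np.allclose(lhs, 2))   # True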
The coefficients for the Spline wavelets are always dyadic fractions, and are
therefore suitable for lossless compression, as they can be computed using low
precision arithmetic and bitshift operations. The particular Spline wavelet from
Example 7.4.1 is used for lossless compression in the JPEG2000 standard.

Exercise 7.4: Viewing the frequency response


In this exercise we will see how we can view the frequency responses, scaling
functions and mother wavelets for any spline wavelet.
a) Plot the frequency responses of the filters of some of the spline wavelets in
this section.
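A possible starting point (our sketch, assuming NumPy and matplotlib) is to evaluate the expressions in Equation (7.29) directly on a grid of ω-values:

import numpy as np
import matplotlib.pyplot as plt
from math import comb

def spline_freq_responses(N1, N2, num=1000):
    # lambda_H0 and lambda_G0 from Equation (7.29)
    N = (N1 + N2) // 2
    omega = np.linspace(0, 2 * np.pi, num)
    u = (1 - np.cos(omega)) / 2
    QN = sum(2 * comb(N + k - 1, k) * u ** k for k in range(N))
    lambda_H0 = ((1 + np.cos(omega)) / 2) ** (N1 / 2) * QN
    lambda_G0 = ((1 + np.cos(omega)) / 2) ** (N2 / 2)
    return omega, lambda_H0, lambda_G0

omega, lH0, lG0 = spline_freq_responses(2, 2)   # the Spline 5/3 wavelet
plt.plot(omega, lH0, label='lambda_H0')
plt.plot(omega, lG0, label='lambda_G0')
plt.legend()
plt.show()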

Exercise 7.5: Wavelets based on higher degree polynomials


Show that Br (t) = ∗rk=1 χ[−1/2,1/2) (t) is r − 2 times differentiable, and equals a
polynomial of degree r − 1 on subintervals of the form [n, n + 1]. Explain why
these functions can be used as basis for the spaces Vj of functions which are
piecewise polynomials of degree r − 1 on intervals of the form [n2−m , (n + 1)2−m ],
and r − 2 times differentiable. Br is also called the B-spline of order r.

7.5 A design strategy suitable for lossy compression
The factors of Q are split evenly among Q1 and Q2 . In this case we need to
factorize Q into a product of real polynomials. This can be done by finding
all roots, and pairing the complex conjugate roots into real second degree
polynomials (if Q is real, its roots come in conjugate pairs), and then distribute
these as evenly as possible among Q1 and Q2 . These filters are called the
CDF-wavelets, after Cohen, Daubechies, and Feauveau, who discovered them.

Example 7.6: The CDF 9/7 wavelet


We choose N1 = N2 = 4. In Equation (7.23) we pair inverse terms to obtain

Q^{(4)}((1 − cos ω)/2) = (5/8)(1/(3.0407 · 7.1495))(e^{iω} − 3.0407)(e^{−iω} − 3.0407)
                        × (e^{2iω} − 4.0623e^{iω} + 7.1495)(e^{−2iω} − 4.0623e^{−iω} + 7.1495)
                      = (5/8)(1/(3.0407 · 7.1495))(−3.0407e^{iω} + 10.2456 − 3.0407e^{−iω})
                        × (7.1495e^{2iω} − 33.1053e^{iω} + 68.6168 − 33.1053e^{−iω} + 7.1495e^{−2iω}).

We can write this as Q1 Q2 with Q1 (0) = Q2 (0) when

Q1 (ω) = −1.0326eiω + 3.4795 − 1.0326e−iω


Q2 (ω) = 0.6053e2iω − 2.8026eiω + 5.8089 − 2.8026e−iω + 0.6053e−2iω ,

from which we obtain

λG0 (ω) = ((1 + cos ω)/2)^2 Q1 (ω)
        = −0.0645e^{3iω} − 0.0407e^{2iω} + 0.4181e^{iω} + 0.7885 + 0.4181e^{−iω} − 0.0407e^{−2iω} − 0.0645e^{−3iω}
λH0 (ω) = ((1 + cos ω)/2)^2 Q2 (ω)
        = 0.0378e^{4iω} − 0.0238e^{3iω} − 0.1106e^{2iω} + 0.3774e^{iω} + 0.8527 + 0.3774e^{−iω} − 0.1106e^{−2iω} − 0.0238e^{−3iω} + 0.0378e^{−4iω}.

The filters G0 , H0 are thus

G0 = {0.0645, 0.0407, −0.4181, −0.7885, −0.4181, 0.0407, 0.0645}


H0 = {−0.0378, 0.0238, 0.1106, −0.3774, −0.8527, −0.3774, 0.1106, 0.0238, −0.0378}.
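As an aside (our own check, not part of the original example), the coefficient expansions above can be reproduced by polynomial multiplication, since multiplying frequency responses corresponds to convolving coefficient sequences:

import numpy as np

B = np.array([1, 4, 6, 4, 1]) / 16   # ((1 + cos w)/2)^2 as coefficients of e^{2iw}, ..., e^{-2iw}
Q1 = np.array([-1.0326, 3.4795, -1.0326])
Q2 = np.array([0.6053, -2.8026, 5.8089, -2.8026, 0.6053])

print(np.convolve(B, Q1))    # the seven coefficients in the expansion of lambda_G0 above
print(np.convolve(B, Q2))    # the nine coefficients in the expansion of lambda_H0 above
print(np.convolve(Q1, Q2))   # reproduces Q^(4)((1 - cos w)/2): -0.625, 5, -16.375, 26, ...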

The corresponding frequency responses are plotted in Figure 7.1.


Figure 7.1: The frequency responses λH0 (ω) (left) and λG0 (ω) (right) for the
CDF 9/7 wavelet.

It is seen that both filters are low-pass filters also here, and that they are
closer to an ideal bandpass filter. Here, the frequency response acts even more
like the constant zero function close to π, confirming that our construction has
worked. We also get

G1 = {−0.0378, −0.0238, 0.1106, 0.3774, −0.8527, 0.3774, 0.1106, −0.0238, −0.0378}


H1 = {−0.0645, 0.0407, 0.4181, −0.7885, 0.4181, 0.0407, −0.0645}.

The lengths of the filters are 9 and 7 in this case, so that this wavelet is called
the CDF 9/7 wavelet. This wavelet is for instance used for lossy compression with
JPEG2000 since it gives a good tradeoff between complexity and compression.

In Example 6.3 we saw that we had analytical expressions for the scaling
functions and the mother wavelet, but that we could not obtain this for the
dual functions. For the CDF 9/7 wavelet it turns out that none of the four
functions have analytical expressions. Let us therefore use the cascade algorithm
to plot these functions. Note first that since G0 has 7 filter coefficients, and
G1 has 9 filter coefficients, it follows from Theorem 6.9 that supp(φ) = [−3, 3],
supp(ψ) = [−3, 4], supp(φ̃) = [−4, 4], and supp(ψ̃) = [−3, 4]. The scaling
functions and mother wavelets over these supports are shown in Figure 7.2.
Again they have irregular shapes, but now at least the functions and dual
functions more resemble each other.

Figure 7.2: Scaling functions and mother wavelets for the CDF 9/7 wavelet.

In the above example there was a unique way of factoring Q into a product
of real polynomials. For higher degree polynomials there is no unique way to distribute the factors, and we will not go into what strategy can be used
for this. In general, the steps we must go through are as follows:

• Compute the polynomial Q, and find its roots.


• Pair complex conjugate roots into real second degree polynomials, and
form polynomials Q1 , Q2 .

• Compute the coefficients in equations (7.19) and (7.20).



7.6 Orthonormal wavelets


Since the filters here are not symmetric, the method of symmetric extension does
not work in the same simple way as before. This partially explains why symmetric
filters are used more often: They may not be as efficient in representing functions,
since the corresponding basis is not orthogonal, but their simple implementation
still makes them attractive.
In Theorem 7.11 we characterized orthonormal wavelets where G0 was causal.
All our filters have an even number, say 2L, of filter coefficients. We can also
find an orthonormal wavelet where H0 has a minimum possible overweight of
filter coefficients with negative indices, H1 a minimum possible overweight of
positive indices, i.e. that the filters can be written with the following compact
notation:

H0 = {t−L , . . . , t−1 , t0 , t1 , . . . , tL−1 } H1 = {s−L+1 , . . . , s−1 , s0 , s1 , . . . , sL }.


(7.30)
To see why, Theorem 6.16 says that we first can shift the filter coefficients of
H0 so that it has this form (we then need to shift G0 in the opposite direction).
H1 , G1 then can be defined by α = 1 and d = 0. We will follow this convention
for the orthonormal wavelets we look at.
The polynomials Q^{(1)}, Q^{(2)}, and Q^{(3)} require no further action to obtain the factorization f (e^{iω}) f (e^{−iω}) = Q((1 − cos ω)/2). The polynomial Q^{(4)} in Equation (7.23) can be factored further as

Q^{(4)}((1 − cos ω)/2) = (5/8)(1/(3.0407 · 7.1495))(e^{−3iω} − 7.1029e^{−2iω} + 19.5014e^{−iω} − 21.7391)
                        × (e^{3iω} − 7.1029e^{2iω} + 19.5014e^{iω} − 21.7391),

which gives that f (e^{iω}) = √((5/8)(1/(3.0407 · 7.1495))) (e^{3iω} − 7.1029e^{2iω} + 19.5014e^{iω} − 21.7391).
This factorization is not unique, however. This gives the frequency response λG0 (ω) = ((1 + e^{−iω})/2)^N f (e^{−iω}) as

(1/2)(e^{−iω} + 1) √2
(1/4)(e^{−iω} + 1)^2 √(1/3.7321) (e^{−iω} − 3.7321)
(1/8)(e^{−iω} + 1)^3 √((3/4)(1/9.4438)) (e^{−2iω} − 5.4255e^{−iω} + 9.4438)
(1/16)(e^{−iω} + 1)^4 √((5/8)(1/(3.0407 · 7.1495))) (e^{−3iω} − 7.1029e^{−2iω} + 19.5014e^{−iω} − 21.7391),

which gives the filters

√ √
G0 = (H0 )T =( 2/2, 2/2)
G0 = (H0 )T =(−0.4830, −0.8365, −0.2241, 0.1294)
G0 = (H0 )T =(0.3327, 0.8069, 0.4599, −0.1350, −0.0854, 0.0352)
G0 = (H0 )T =(−0.2304, −0.7148, −0.6309, 0.0280, 0.1870, −0.0308, −0.0329, 0.0106)

so that we get 2, 4, 6 and 8 filter coefficients in G0 = (H0 )T . We see that the


filter coefficients when N = 1 are those of the Haar wavelet. The three next
filters we have not seen before. The filter G1 = (H1 )T can be obtained from the
relation λG1 (ω) = −λG0 (ω + π), i.e. by reversing the elements and adding an
alternating sign, plus an extra minus sign, so that

√ √
G1 = (H1 )T =( 2/2, − 2/2)
G1 = (H1 )T =(0.1294, 0.2241, −0.8365, 0.4830)
G1 = (H1 )T =(0.0352, 0.0854, −0.1350, −0.4599, 0.8069, −0.3327)
G1 = (H1 )T =(0.0106, 0.0329, −0.0308, −0.1870, 0.0280, 0.6309, −0.7148, 0.2304).
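As a small numerical check (our addition), the coefficients for N = 2 satisfy Σn gn² = 1 and Σn gn gn+2 = 0, which for a filter of length 4 is equivalent to the condition |λG0 (ω)|² + |λG0 (ω + π)|² = 2:

import numpy as np

g0 = np.array([-0.4830, -0.8365, -0.2241, 0.1294])   # the N = 2 filter above
print(np.sum(g0 ** 2))             # approximately 1
print(np.sum(g0[:-2] * g0[2:]))    # approximately 0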

Frequency responses are shown in Figure 7.3 for N = 1 to N = 6. It is seen that


the frequency responses get increasingly flatter as N increases. The frequency
responses are now complex, so their magnitudes are plotted.


Figure 7.3: The magnitudes |λG0 (ω)| = |λH0 (ω)| for the first orthonormal
wavelets.

Clearly these filters have low-pass characteristic. We also see that the high-
pass characteristics resemble the low-pass characteristics. We also see that the

frequency response gets flatter near the high and low frequencies, as N increases.
One can verify that this is the case also when N is increased further. The shapes
for higher N are very similar to the frequency responses of those filters used in
the MP3 standard (see Figure 3.10). One difference is that the support of the
latter is concentrated on a smaller set of frequencies.
The way we have defined the filters, one can show in the same way as in
the proof of Theorem 6.9 that, when all filters have 2N coefficients, φ = φ̃ has
support [−N + 1, N ], ψ = ψ̃ has support [−N + 1/2, N − 1/2] (i.e. the support
of ψ is symmetric about the origin). In particular we have that

• for N = 2: supp(φ) = supp(ψ) = [−1, 2],


• for N = 3: supp(φ) = supp(ψ) = [−2, 3],
• for N = 4: supp(φ) = supp(ψ) = [−3, 4].

The scaling functions and mother wavelets are shown in Figure 7.4. All functions
have been plotted over [−4, 4], so that all these support sizes can be verified.
Also here we have used the cascade algorithm to approximate the functions.

Figure 7.4: The scaling functions and mother wavelets for orthonormal wavelets
with N vanishing moments, for different values of N .

7.7 Summary
We started the section by showing how filters from filter bank matrices can give
rise to scaling functions and mother wavelets. We saw that we obtained dual
function pairs in this way, which satisfied a mutual property called biorthogonal-
ity. We then saw how differentiable scaling functions or mother wavelets with
vanishing moments could be constructed, and we saw how we could construct
the simplest such. These could be found in terms of the frequency responses

of the involved filters. Finally we studied some examples with applications to


image compression.
For the wavelets we constructed in this chapter, we also plotted the cor-
responding scaling functions and mother wavelets (see figures 7.2, 7.4). The
importance of these functions is that they are particularly suited for approxi-
mation of regular functions, and providing a compact representation of these
functions which is localized in time. It seems difficult to guess that these strange
shapes are connected to such approximation. Moreover, it may seem strange
that, although these functions are useful, we can’t write down exact expressions
for them, and they are only approximated in terms of the Cascade Algorithm.
In the literature, the orthonormal wavelets with compact support we have
constructed were first constructed in [12]. Biorthogonal wavelets were first
constructed in [7].
Chapter 8

The polyphase
representation and wavelets

In Chapter 6 we saw that we could express wavelet transformations and more


general transformations in terms of filters. Through this we obtained intuition
for what information the different parts of a wavelet transformation represent,
in terms of lowpass and highpass filters. We also obtained some insight into the
filters used in the transformation used in the MP3 standard. We expressed the
DWT and IDWT implementations in terms of what we called kernel transforma-
tions, and these were directly obtained from the filters of the wavelet.
We have looked at many wavelets, but have only stated the kernel
transformation for the Haar wavelet. In order to use these wavelets in sound
and image processing, or in order to use the cascade algorithm to plot the
corresponding scaling functions and mother wavelets, we need to make these
kernel transformations. This will be one of the goals in this chapter. This will
be connected to what we will call the polyphase representation of the wavelet.
This representation will turn out to be useful for different reasons than the
filter representation as well. First of all, with the polyphase representation,
transformations can be viewed as block matrices where the blocks are filters.
This allows us to prove results in a different way than for filter bank transforms,
since we can prove results through block matrix manipulation. There will be
two major results we will prove in this way.
First, in Section 8.1 we obtain a factorization of a wavelet transformation
into sparse matrices, called elementary lifting matrices. We will show that this
factorization reduces the number of arithmetic operations, and also enables
us to compute the DWT in-place, in a similar way to how the FFT could
be computed in-place after a bit-reversal. This is important: recall that we
previously factored a filter into a product of smaller filters which is useful for
efficient hardware implementations. But this did not address the fact that
only every second component of the filters needs to be evaluated in the DWT,
something any efficient implementation of the DWT should take into account.


The factorization into sparse matrices will be called the lifting factorization, and
it will be clear from this factorization how the wavelet kernels and their duals can
be implemented. We will also see how we can use the polyphase representation
to prove the remaining parts of Theorem 6.16.
Secondly, in Section 8.3 we will use the polyphase representation to analyze
how the forward and reverse filter bank transforms from the MP3 standard can
be chosen in order for us to have perfect or near perfect reconstruction. Actually,
we will obtain a factorization of the polyphase representation into block matrices
also here, and the conditions we need to put on the prototype filters will be clear
from this.

8.1 The polyphase representation and the lifting factorization
Let us start by defining the basic concepts in the polyphase representation.

Definition 8.1. Polyphase components and representation.


Assume that S is a matrix, and that M is a number. By the polyphase components of S we mean the matrices S^{(i,j)} defined by S^{(i,j)}_{r1,r2} = S_{i+r1 M, j+r2 M}, i.e. the matrices obtained by taking every M ’th component of S. By the polyphase representation of S we mean the block matrix with entries S^{(i,j)}.
The polyphase representation applies in particular for vectors. Since a vector x only has one column, we write x^{(p)} for its polyphase components. As an example, consider the 6 × 6 MRA-matrix

S = [ 2 3 0 0 0 1
      4 5 6 0 0 0
      0 1 2 3 0 0
      0 0 4 5 6 0
      0 0 0 1 2 3
      6 0 0 0 4 5 ].    (8.1)
The polyphase components of S are

S^{(0,0)} = [ 2 0 0        S^{(0,1)} = [ 3 0 1
              0 2 0                      1 3 0
              0 0 2 ]                    0 1 3 ]

S^{(1,0)} = [ 4 6 0        S^{(1,1)} = [ 5 0 0
              0 4 6                      0 5 0
              6 0 4 ]                    0 0 5 ]

We will mainly be concerned with polyphase representations of MRA matrices.


For such matrices we have the following result (this result can be stated more
generally for any filter bank transform).

Theorem 8.2. Similarity.


When S is an MRA-matrix, the polyphase components S (i,j) are filters (in
general different from the filters considered in Chapter 6), i.e. the polyphase
representation is a 2 × 2-block matrix where all blocks are filters. Also, S is
similar to its polyphase representation, through a permutation matrix P which
places the even-indexed elements first.
To see why, note that when P is the permutation matrix defined above,
then P S consists of S with the even-indexed rows grouped first, and since
also SP T = (P S T )T , SP T groups the even-indexed columns first. From these
observations it is clear that P SP T is the polyphase representation of S, so that
S is similar to its polyphase representation.
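In code, the polyphase components are simply strided submatrices, and the similarity can be checked directly; a short NumPy illustration (ours) for the matrix S in Equation (8.1):

import numpy as np

S = np.array([[2, 3, 0, 0, 0, 1],
              [4, 5, 6, 0, 0, 0],
              [0, 1, 2, 3, 0, 0],
              [0, 0, 4, 5, 6, 0],
              [0, 0, 0, 1, 2, 3],
              [6, 0, 0, 0, 4, 5]])
M = 2

# The polyphase components S^(i,j) are obtained by slicing with stride M
poly = {(i, j): S[i::M, j::M] for i in range(M) for j in range(M)}

# P places the even-indexed elements first; P S P^T is the polyphase representation
P = np.eye(6)[[0, 2, 4, 1, 3, 5]]
block = np.block([[poly[(0, 0)], poly[(0, 1)]],
                  [poly[(1, 0)], poly[(1, 1)]]])
print(np.allclose(P @ S @ P.T, block))   # True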
We also have the following result on the polyphase representation. This result
is easily proved from manipulation with block matrices, and is therefore left to
the reader.
Theorem 8.3. Products and transpose.
Let A and B be (forward or reverse) filter bank transforms, and denote the
corresponding polyphase components by A(i,j) , B (i,j) . The following hold
• C = AB is also a filter bank transform, with polyphase components C^{(i,j)} = Σ_k A^{(i,k)} B^{(k,j)}.
• AT is also a filter bank transform, with polyphase components ((AT )(i,j) )k,l =
(A(j,i) )l,k .
Also, the polyphase components of the identity matrix is the M × M -block
matrix with the identity matrix on the diagonal, and 0 elsewhere.
To see an application of the polyphase representation, let us prove the final
ingredient of Theorem 6.16. We need to prove the following:
Theorem 8.4. Criteria for perfect reconstruction.
For any set of FIR filters H0 , H1 , G0 , G1 which give perfect reconstruction,
there exist α ∈ R and d ∈ Z so that

λH1 (ω) = α−1 e−2idω λG0 (ω + π) (8.2)


2idω
λG1 (ω) = αe λH0 (ω + π). (8.3)

Proof. Let H (i,j) be the polyphase components of H, G(i,j) the polyphase


components of G. GH = I means that

[G^{(0,0)}, G^{(0,1)}; G^{(1,0)}, G^{(1,1)}] [H^{(0,0)}, H^{(0,1)}; H^{(1,0)}, H^{(1,1)}] = [I, 0; 0, I].

If we here multiply with [G^{(1,1)}, −G^{(0,1)}; −G^{(1,0)}, G^{(0,0)}] on both sides to the left, or with [H^{(1,1)}, −H^{(0,1)}; −H^{(1,0)}, H^{(0,0)}] on both sides to the right, we get

[G^{(1,1)}, −G^{(0,1)}; −G^{(1,0)}, G^{(0,0)}] = [(G^{(0,0)}G^{(1,1)} − G^{(1,0)}G^{(0,1)})H^{(0,0)}, (G^{(0,0)}G^{(1,1)} − G^{(1,0)}G^{(0,1)})H^{(0,1)}; (G^{(0,0)}G^{(1,1)} − G^{(1,0)}G^{(0,1)})H^{(1,0)}, (G^{(0,0)}G^{(1,1)} − G^{(1,0)}G^{(0,1)})H^{(1,1)}]
[H^{(1,1)}, −H^{(0,1)}; −H^{(1,0)}, H^{(0,0)}] = [(H^{(0,0)}H^{(1,1)} − H^{(1,0)}H^{(0,1)})G^{(0,0)}, (H^{(0,0)}H^{(1,1)} − H^{(1,0)}H^{(0,1)})G^{(0,1)}; (H^{(0,0)}H^{(1,1)} − H^{(1,0)}H^{(0,1)})G^{(1,0)}, (H^{(0,0)}H^{(1,1)} − H^{(1,0)}H^{(0,1)})G^{(1,1)}]

Now since G(0,0) G(1,1) − G(1,0) G(0,1) and H (0,0) H (1,1) − H (1,0) H (0,1) also are
circulant Toeplitz matrices, the expressions above give that

l(H (0,0) ) ≤ l(G(1,1) ) ≤ l(H (0,0) )


l(H (0,1) ) ≤ l(G(0,1) ) ≤ l(H (0,1) )
l(H (1,0) ) ≤ l(G(1,0) ) ≤ l(H (1,0) )
so that we must have equality here, and with both

G(0,0) G(1,1) − G(1,0) G(0,1) and H (0,0) H (1,1) − H (1,0) H (0,1)


having only one nonzero diagonal. In particular we can define the diagonal matrix D = H^{(0,0)}H^{(1,1)} − H^{(0,1)}H^{(1,0)} = α^{−1}E_d (for some α, d), and we have that

[G^{(0,0)}, G^{(0,1)}; G^{(1,0)}, G^{(1,1)}] = [αE_{−d}H^{(1,1)}, −αE_{−d}H^{(0,1)}; −αE_{−d}H^{(1,0)}, αE_{−d}H^{(0,0)}].
The first columns here state a relation between G0 and H1 , while the second
columns state a relation between G1 and H0 . It is straightforward to show that
these relations imply equation (8.2)-(8.3). The details for this can be found in
Exercise 8.1.
In the following we will find factorizations of 2 × 2-block matrices where the
blocks are filters, into simpler such matrices. The importance of Theorem 8.2 is
then that MRA-matrices can be written as a product of simpler MRA matrices.
These simpler MRA matrices will be called elementary lifting matrices, and will
be of the following type.
Definition 8.5. Elementary lifting matrices.
A matrix on the form [I, S; 0, I], where S is a filter, is called an elementary lifting matrix of even type. A matrix on the form [I, 0; S, I] is called an elementary lifting matrix of odd type.

The following are the most useful properties of elementary lifting matrices:
Lemma 8.6. Lifting lemma.
The following hold:
([I, S; 0, I])^T = [I, 0; S^T, I],  and  ([I, 0; S, I])^T = [I, S^T; 0, I],

[I, S1; 0, I] [I, S2; 0, I] = [I, S1 + S2; 0, I],  and  [I, 0; S1, I] [I, 0; S2, I] = [I, 0; S1 + S2, I],

[I, S; 0, I]^{−1} = [I, −S; 0, I],  and  [I, 0; S, I]^{−1} = [I, 0; −S, I].
These statements follow directly from Theorem 8.3. Due to Property 2, one
can assume that odd and even types of lifting matrices appear in alternating
order, since matrices of the same type can be grouped together. The following
result states why elementary lifting matrices can be used to factorize general
MRA-matrices:

Theorem 8.7. Multiplying.
Any invertible matrix on the form S = [S^{(0,0)}, S^{(0,1)}; S^{(1,0)}, S^{(1,1)}], where the S^{(i,j)} are filters with a finite number of filter coefficients, can be written on the form

Λ1 · · · Λn [α0 Ep, 0; 0, α1 Eq],    (8.4)

where Λi are elementary lifting matrices, p, q are integers, α0 , α1 are nonzero scalars, and Ep , Eq are time delay filters. The inverse is given by

[α0^{−1} E−p, 0; 0, α1^{−1} E−q] (Λn )^{−1} · · · (Λ1 )^{−1}.    (8.5)

Note that (Λi )−1 can be computed with the help of Property 3 of Lemma 8.6.
Proof. The proof will use the concept of the length of a filter, as defined in Definition 3.3. Let S = [S^{(0,0)}, S^{(0,1)}; S^{(1,0)}, S^{(1,1)}] be an arbitrary invertible matrix. We will incrementally find an elementary lifting matrix Λi with filter Si in the lower left or upper right corner so that Λi S has filters of lower length in the first column. Assume first that l(S^{(0,0)}) ≥ l(S^{(1,0)}), where l(S) is the length of a filter as given by Definition 3.3. If Λi is of even type, then the first column in Λi S is

[I, Si; 0, I] [S^{(0,0)}; S^{(1,0)}] = [S^{(0,0)} + Si S^{(1,0)}; S^{(1,0)}].    (8.6)

Si can now be chosen so that l(S (0,0) + Si S (1,0) ) < l(S (1,0) ). To see how,
recall that we in Section 3.1 stated that multiplying filters corresponds to
multiplying polynomials. Si can thus be found from polynomial division with
remainder: when we divide S (0,0) by S (1,0) , we actually find polynomials Si
and P with l(P ) < l(S (1,0) ) so that S (0,0) = Si S (1,0) + P , so that the length of
P = S (0,0) − Si S (1,0) is less than l(S (1,0) ). The same can be said if Λi is of odd
type, in which case the first and second components are simply swapped. This
procedure can be continued until we arrive at a product

Λn · · · Λ1 S
where either the first or the second component in the first column is 0. If the
first component in the first column is 0, the identity

[I, 0; −I, I] [I, I; 0, I] [0, X; Y, Z] = [Y, X + Z; 0, −X]

explains that we can bring the matrix to a form where the second element in the first column is zero instead, with the help of the additional lifting matrices

Λn+1 = [I, I; 0, I]  and  Λn+2 = [I, 0; −I, I],

so that we always can assume that the second element in the first column is 0, i.e.

Λn · · · Λ1 S = [P, Q; 0, R],
for some matrices P, Q, R. From the proof of Theorem 6.16 we will see that
in order for S to be invertible, we must have that S (0,0) S (1,1) − S (0,1) S (1,0) =
−α−1 Ed for some nonzero scalar α and integer d. Since
 
P Q
0 R
is also invertible, we must thus have that P R must be on the form αEn . When
the filters have a finite number of filter coefficients, the only possibility for this
to happen is when P = α0 Ep and R = α1 Eq for some p, q, α0 , α1 . Using this,
and also isolating S on one side, we obtain that

S = (Λ1 )^{−1} · · · (Λn )^{−1} [α0 Ep, Q; 0, α1 Eq],    (8.7)

Noting that

[α0 Ep, Q; 0, α1 Eq] = [I, (1/α1 ) E−q Q; 0, I] [α0 Ep, 0; 0, α1 Eq],
we can rewrite Equation (8.7) as
S = (Λ1 )^{−1} · · · (Λn )^{−1} [I, (1/α1 ) E−q Q; 0, I] [α0 Ep, 0; 0, α1 Eq],
0 1 0 α1 Eq
which is a lifting factorization of the form we wanted to arrive at. The last matrix
in the lifting factorization is not really a lifting matrix, but it too can easily be
inverted, so that we arrive at Equation (8.5). This completes the proof.
Factorizations on the form given by Equation (8.4) will be called lifting
factorizations. Assume that we have applied Theorem 8.7 in order to get a
factorization of the polyphase representation of the DWT kernel of the form

Λn · · · Λ2 Λ1 H = [α, 0; 0, β].    (8.8)
Theorem 8.6 then immediately gives us the following factorizations.

H = (Λ1 )^{−1} (Λ2 )^{−1} · · · (Λn )^{−1} [α, 0; 0, β]    (8.9)
G = [1/α, 0; 0, 1/β] Λn · · · Λ2 Λ1    (8.10)
H^T = [α, 0; 0, β] ((Λn )^{−1})^T ((Λn−1 )^{−1})^T · · · ((Λ1 )^{−1})^T    (8.11)
G^T = (Λ1 )^T (Λ2 )^T · · · (Λn )^T [1/α, 0; 0, 1/β].    (8.12)

Since H T and GT are the kernel transformations of the dual IDWT and the
dual DWT, respectively, these formulas give us recipes for computing the DWT,
IDWT, dual IDWT, and the dual DWT, respectively. All in all, everything can
be computed by combining elementary lifting steps.
In practice, one starts with a given wavelet with certain proved properties
such as the ones from Chapter 7, and applies an algorithm to obtain a lifting
factorization of the polyphase representation of the kernels. The algorithm can
easily be written down from the proof of Theorem 8.7. The lifting factorization
is far from unique, and the algorithm only gives one of them.
It is desirable for an implementation to obtain a lifting factorization where the
lifting steps are as simple as possible. Let us restrict to the case of wavelets with
symmetric filters, since the wavelets used in most applications are symmetric.
In particular this means that S (0,0) is a symmetric matrix, and that S (1,0) is
symmetric about −1/2 (see Exercise 8.8).
Assume that we in the proof of Theorem 8.7 add an elementary lifting of
even type. At this step we then compute S (0,0) + Si S (1,0) in the first entry of
the first column. Since S (0,0) is now assumed symmetric, Si S (1,0) must also be
symmetric in order for the length to be reduced. And since the filter coefficients
of S (1,0) are assumed symmetric about −1/2, Si must be chosen with symmetry
around 1/2.

For most of our wavelets we will consider in the following examples it will
turn out the filters in the first column differ in the number of filter coefficients
by 1 at all steps. When this is the case, we can choose a filter of length 2 to
reduce the length by 2, so that the Si in an even lifting step can be chosen on
the form Si = λi {1, 1}. Similarly, for an odd lifting step, Si can be chosen on
the form Si = λi {1, 1}. Let us summarize this as follows:

Theorem 8.8. Differing by 1.


When the filters in a wavelet are symmetric and the lengths of the filters in
the first column differ by 1 at all steps in the lifting factorization, the lifting
steps of even and odd type take the simplified form

[I, λi {1, 1}; 0, I]  and  [I, 0; λi {1, 1}, I],
respectively.
The lifting steps mentioned in this theorem are quickly computed due to
their simple structure.
Each lifting step leaves every second element unchanged, while for the remain-
ing elements, we simply add the two neighbours. Clearly these computations
can be computed in-place, without the need for extra memory allocations. From
this it is also clear how we can compute the entire DWT/IDWT in-place. We
simply avoid the reorganizing into the (φm−1 , ψm−1 )-basis until after all the
lifting steps. After the application of the matrices above, we have coordinates
in the Cm -basis. Here only the coordinates with indices (0, 2, 4, . . .) need to be
further transformed, so the next step in the algorithm should work directly on
these. After the next step only the coordinates with indices (0, 4, 8, . . .) need to
be further transformed, and so on. From this it is clear that

• the ψm−k coordinates are found at indices 2k−1 + r2k , i.e. the last k bits
are 1 followed by k − 1 zeros.
• the φ0 coordinates are found at indices r2m , i.e. the last m bits are 0.

If we place the last k bits of the ψm−k -coordinates in front in reverse order, and
the the last m bits of the φ0 -coordinates in front, the coordinates have the same
order as in the (φm−1 , ψm−1 )-basis. This is also called a partial bit-reverse, and
is related to the bit-reversal performed in the FFT algorithm.
Clearly, these lifting steps are also MRA-matrices with symmetric filters, so
that our procedure factorizes an MRA-matrix with symmetric filters into simpler
MRA-matrices which also have symmetric filters.
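As an illustration (ours, with NumPy assumed), such symmetric lifting steps can be applied in place roughly as follows; the direction in which the neighbours are taken (here with periodic wrap-around) has to match the index conventions of the wavelet in question:

import numpy as np

def lift_even(x, lam):
    # Even-type step: each even-indexed entry gets lam times the sum of its two odd neighbours
    x[0::2] += lam * (x[1::2] + np.roll(x[1::2], 1))

def lift_odd(x, lam):
    # Odd-type step: each odd-indexed entry gets lam times the sum of its two even neighbours
    x[1::2] += lam * (x[0::2] + np.roll(x[0::2], -1))

x = np.random.rand(16)
y = x.copy()
lift_even(y, 0.25); lift_odd(y, -0.5)    # two lifting steps, computed in place
lift_odd(y, 0.5); lift_even(y, -0.25)    # undoing them with the negated filters (Lemma 8.6)
print(np.allclose(x, y))                 # True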

8.1.1 Reduction in the number of arithmetic operations


The number of arithmetic operations needed to apply matrices on the form
stated in Equation (6.10) is easily computed. The number of multiplications
is N/2 if symmetry is exploited as in Observation 4.20 (N if symmetry is not

exploited). Similarly, the number of additions is N . Let K be the total number


of filter coefficients in H0 , H1 . In the following we will see that each lifting step
can be chosen to reduce the number of filter coefficients in the MRA matrix by 4,
so that a total number of K/4 lifting steps are required. Thus, a total number of
KN/8 (KN/4) multiplications, and KN/4 additions are required when a lifting
factorization is used. In comparison, a direct implementation would require
KN/4 (KN/2) multiplications, and KN/2 additions. For the examples we will
consider, we therefore have the following result.
Theorem 8.9. Reducing arithmetic operations.
The lifting factorization approximately halves the number of additions and
multiplications needed, when compared with a direct implementation (regardless
of whether symmetry is exploited or not).

Exercise 8.1: The frequency responses of the polyphase


components
Let H and G be MRA-matrices for a DWT/IDWT, with corresponding filters
H0 , H1 , G0 , G1 , and polyphase components H (i,j) , G(i,j) .
a) Show that

λH0 (ω) = λH (0,0) (2ω) + eiω λH (0,1) (2ω)


λH1 (ω) = λH (1,1) (2ω) + e−iω λH (1,0) (2ω)
λG0 (ω) = λG(0,0) (2ω) + e−iω λG(1,0) (2ω)
λG1 (ω) = λG(1,1) (2ω) + eiω λG(0,1) (2ω).

b) In the proof of the last part of Theorem 6.16, we deferred the last part, namely that equations (8.2)-(8.3) follow from

[G^{(0,0)}, G^{(0,1)}; G^{(1,0)}, G^{(1,1)}] = [αE_{−d}H^{(1,1)}, −αE_{−d}H^{(0,1)}; −αE_{−d}H^{(1,0)}, αE_{−d}H^{(0,0)}].

Prove this based on the result from a).

Exercise 8.2: Finding new filters


Let S be a filter. Show that

a) G [I, 0; S, I] is an MRA matrix with filters G̃0 , G1 , where

λG̃0 (ω) = λG0 (ω) + λS (2ω)e^{−iω} λG1 (ω),

b) G [I, S; 0, I] is an MRA matrix with filters G0 , G̃1 , where

λG̃1 (ω) = λG1 (ω) + λS (2ω)e^{iω} λG0 (ω),

c) [I, 0; S, I] H is an MRA-matrix with filters H0 , H̃1 , where

λH̃1 (ω) = λH1 (ω) + λS (2ω)e^{−iω} λH0 (ω),

d) [I, S; 0, I] H is an MRA-matrix with filters H̃0 , H1 , where

λH̃0 (ω) = λH0 (ω) + λS (2ω)e^{iω} λH1 (ω).

In summary, this exercise shows that one can think of the steps in the lifting factorization as altering one of the filters of an MRA-matrix in alternating order.

Exercise 8.3: Relating to the polyphase components


Show that S is a filter of length kM if and only if the entries {S^{(i,j)}}_{i,j=0}^{M−1} in the polyphase representation of S satisfy S^{((i+r) mod M, (j+r) mod M)} = S^{(i,j)}. In other
words, S is a filter if and only if the polyphase representation of S is a “block-
circulant Toeplitz matrix”. This implies a fact that we will use: GH is a filter
(and thus provides alias cancellation) if blocks in the polyphase representations
repeat cyclically as in a Toeplitz matrix (in particular when the matrix is
block-diagonal with the same block repeating on the diagonal).

Exercise 8.4: QMF filter banks


Recall from Definition 6.18 that we defined a classical QMF filter bank as one
where M = 2, G0 = H0 , G1 = H1 , and λH1 (ω) = λH0 (ω + π). Show that the
forward and reverse filter bank transforms of a classical QMF filter bank take
the form

H = G = [A, −B; B, A].

Exercise 8.5: Alternative QMF filter banks


Recall from Definition 6.19 that we defined an alternative QMF filter bank as
one where M = 2, G0 = (H0 )T , G1 = (H1 )T , and λH1 (ω) = λH0 (ω + π). Show
that the forward and reverse filter bank transforms of an alternative QMF filter
bank take the form

H = [A^T, B^T; −B, A]        G = [A, −B^T; B, A^T] = [A^T, B^T; −B, A]^T.

Exercise 8.6: Alternative QMF filter banks with additional


sign
Consider alternative QMF filter banks where we take in an additional sign, so
that λH1 (ω) = −λH0 (ω + π) (the Haar wavelet was an example of such a filter
bank). Show that the forward and reverse filter bank transforms now take the
form

H = [A^T, B^T; B, −A]        G = [A, B^T; B, −A^T] = [A^T, B^T; B, −A]^T.
It is straightforward to check that also these satisfy the alias cancellation con-
dition, and that the perfect reconstruction condition also here takes the form
|λH0 (ω)|2 + |λH0 (ω + π)|2 = 2.

8.2 Examples of lifting factorizations


We have seen that the polyphase representations of wavelet kernels can be
factored into a product of elementary lifting matrices. In this section we will
compute the exact factorizations for the wavelets we have considered. In the
exercises we will then complete the implementations, so that we can make actual
experiments, such as listening to the low-resolution approximations in sound, or
using the cascade algorithm to plot scaling functions and mother wavelets. We
will omit the Haar wavelet. One can easily write down a lifting factorization for
this as well, but there is little to save in this factorization when compared to the
direct form of this we already have considered.
First we will consider the two piecewise linear wavelets we have looked at.
It turns out that their lifting factorizations can be obtained in a direct way by
considering the polyphase representations as a change of coordinates. To see
how, we first define

Dm = {φm,0 , φm,2 , φm,4 . . . , φm,1 , φm,3 , φm,5 , . . .}, (8.13)


PDm ←φm is clearly the permutation matrix P used in the similarity between
a matrix and its polyphase representation. Let now H and G be the kernel
transformations of a wavelet. The polyphase representation of H is

P HP T = PDm ←φm PCm ←φm Pφm ←Dm = P(φ1 ,ψ1 )←φm Pφm ←Dm = P(φ1 ,ψ1 )←Dm .

Taking inverses here we obtain that P GP T = PDm ←(φ1 ,ψ1 ) . We therefore have
the following result:

Theorem 8.10. The polyphase representation.


The polyphase representation of H equals the change of coordinates ma-
trix P(φ1 ,ψ1 )←Dm , and the polyphase representation of G equals the change of
coordinates matrix PDm ←(φ1 ,ψ1 ) .

8.2.1 The piecewise linear wavelet

The polyphase representation of G is (1/√2) [I, 0; (1/2){1, 1}, I]. Due to Theorem 8.6, the polyphase representation of H is √2 [I, 0; −(1/2){1, 1}, I]. We can summarize that the polyphase representations of the kernels H and G for the piecewise linear wavelet are

√2 [I, 0; −(1/2){1, 1}, I]    and    (1/√2) [I, 0; (1/2){1, 1}, I],    (8.14)

respectively.
respectively.

Example 8.7: Lifting factorization of the alternative piecewise linear wavelet

The polyphase representation of H is

√2 [I, (1/4){1, 1}; 0, I] [I, 0; −(1/2){1, 1}, I].

In this case we required one additional lifting step. We can thus conclude that the polyphase representations of the kernels H and G for the alternative piecewise linear wavelet are

√2 [I, (1/4){1, 1}; 0, I] [I, 0; −(1/2){1, 1}, I]    and    (1/√2) [I, 0; (1/2){1, 1}, I] [I, −(1/4){1, 1}; 0, I],    (8.15)

respectively.

8.2.2 The Spline 5/3 wavelet


Let us consider the Spline 5/3 wavelet, which we defined in Example 7.4.1. We recall that
H0 = {−1/4, 1/2, 3/2, 1/2, −1/4}        H1 = {−1/4, 1/2, −1/4},

from which we see that the polyphase components of H are

[H^{(0,0)}, H^{(0,1)}; H^{(1,0)}, H^{(1,1)}] = [{−1/4, 3/2, −1/4}, (1/2){1, 1}; −(1/4){1, 1}, (1/2)I].
We see here that the upper filter has most filter coefficients in the first column,
so that we must start with an elementary lifting of even type. We need to find a
filter S1 so that S1 {−1/4, −1/4} + {−1/4, 3/2, −1/4} has fewer filter coefficients
than {−1/4, 3/2, −1/4}. It is clear that we can choose S1 = {−1, −1}, and that

Λ1 H = [I, {−1, −1}; 0, I] [{−1/4, 3/2, −1/4}, (1/2){1, 1}; −(1/4){1, 1}, (1/2)I] = [2I, 0; −(1/4){1, 1}, (1/2)I].
− 41 {1, 1} 1
2I

Now we need to apply an elementary lifting of odd type, and we need to find a filter S2 so that S2 · 2I − (1/4){1, 1} = 0. Clearly we can choose S2 = {1/8, 1/8}, and we get

Λ2 Λ1 H = [I, 0; (1/8){1, 1}, I] [2I, 0; −(1/4){1, 1}, (1/2)I] = [2I, 0; 0, (1/2)I].

Multiplying with inverses of elementary lifting steps, we now obtain that the
polyphase representations of the kernels for the Spline 5/3 wavelet are

H = [I, {1, 1}; 0, I] [I, 0; −(1/8){1, 1}, I] [2I, 0; 0, (1/2)I]

and

G = [(1/2)I, 0; 0, 2I] [I, 0; (1/8){1, 1}, I] [I, {−1, −1}; 0, I],

respectively. Two lifting steps are thus required. We also see that the lifting
steps involve only dyadic fractions, just as the filter coefficients did. This means
that the lifting factorization also can be used for lossless operations.
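A sketch (ours) of how the two factorizations can be turned into in-place implementations of the forward and reverse kernels; the exact neighbour convention for the filters {1, 1} and {−1, −1} must match the definitions above, but perfect reconstruction holds as long as the reverse transform undoes the forward steps in opposite order:

import numpy as np

def spline53_forward(x):
    # Apply H = [I, {1,1}; 0, I] [I, 0; -(1/8){1,1}, I] [2I, 0; 0, (1/2)I] (rightmost factor first)
    x = np.asarray(x, dtype=float).copy()
    x[0::2] *= 2.0
    x[1::2] *= 0.5
    x[1::2] -= 0.125 * (x[0::2] + np.roll(x[0::2], -1))
    x[0::2] += x[1::2] + np.roll(x[1::2], 1)
    return x

def spline53_reverse(y):
    # Apply G = [(1/2)I, 0; 0, 2I] [I, 0; (1/8){1,1}, I] [I, {-1,-1}; 0, I] (rightmost factor first)
    y = np.asarray(y, dtype=float).copy()
    y[0::2] -= y[1::2] + np.roll(y[1::2], 1)
    y[1::2] += 0.125 * (y[0::2] + np.roll(y[0::2], -1))
    y[0::2] *= 0.5
    y[1::2] *= 2.0
    return y

x = np.random.rand(16)
print(np.allclose(spline53_reverse(spline53_forward(x)), x))   # True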

8.2.3 The CDF 9/7 wavelet


For the wavelet we considered in Example 7.6, it is more cumbersome to compute
the lifting factorization by hand. It is however, straightforward to write an
algorithm which computes the lifting steps, as these are performed in the proof
of Theorem 8.7. You will be spared the details of this algorithm. Also, when we

use these wavelets in implementations later they will use precomputed values of
these lifting steps, and you can take these implementations for granted too. If
we run the algorithm for computing the lifting factorization we obtain that the
polyphase representations of the kernels H and G for the CDF 9/7 wavelet are
$$\begin{pmatrix} I & 0.5861\{1,1\} \\ 0 & I \end{pmatrix}\begin{pmatrix} I & 0 \\ 0.6681\{1,1\} & I \end{pmatrix}\begin{pmatrix} I & -0.0700\{1,1\} \\ 0 & I \end{pmatrix}\begin{pmatrix} I & 0 \\ -1.2002\{1,1\} & I \end{pmatrix}\begin{pmatrix} -1.1496 & 0 \\ 0 & -0.8699 \end{pmatrix}$$

and

$$\begin{pmatrix} -0.8699 & 0 \\ 0 & -1.1496 \end{pmatrix}\begin{pmatrix} I & 0 \\ 1.2002\{1,1\} & I \end{pmatrix}\begin{pmatrix} I & 0.0700\{1,1\} \\ 0 & I \end{pmatrix}\begin{pmatrix} I & 0 \\ -0.6681\{1,1\} & I \end{pmatrix}\begin{pmatrix} I & -0.5861\{1,1\} \\ 0 & I \end{pmatrix},$$

respectively. In this case four lifting steps were required.


Perhaps more important than the reduction in the number of arithmetic
operations is the fact that the lifting factorization splits the DWT and IDWT
into simpler components, each very attractive for hardware implementations
since a lifting step only requires the additional value λi from Theorem 8.8. Lifting
actually provides us with a complete implementation strategy for the DWT and
IDWT, in which the λi are used as precomputed values.
Finally we will find a lifting factorization for orthonormal wavelets. Note
that here the filters H0 and H1 are not symmetric, and each of them has an
even number of filter coefficients. There are thus a different number of filter
coefficients with positive and negative indices, and in Section 7.6 we defined the
filters so that the filter coefficients were as symmetric as possible when it came
to the number of nonzero filter coefficients with positive and negative indices.

8.2.4 Orthonormal wavelets


We will attempt to construct a lifting factorization where the following property
is preserved after each lifting step:
P1: H (0,0) , H (1,0) have a minimum possible overweight of filter coefficients
with negative indices.
This property stems from the assumption in Section 7.6 that H0 is assumed
to have a minimum possible overweight of filter coefficients with negative indices.
To see that this holds at the start, assume as before that all the filters have 2L
nonzero filter coefficients, so that H0 and H1 are on the form given by Equation
(7.30). Assume first that L is even. It is clear that
$$\begin{aligned}
H^{(0,0)} &= \{t_{-L}, \ldots, t_{-2}, t_0, t_2, \ldots, t_{L-2}\} \\
H^{(0,1)} &= \{t_{-L+1}, \ldots, t_{-3}, t_{-1}, t_1, \ldots, t_{L-1}\} \\
H^{(1,0)} &= \{s_{-L+1}, \ldots, s_{-1}, s_1, s_3, \ldots, s_{L-1}\} \\
H^{(1,1)} &= \{s_{-L+2}, \ldots, s_{-2}, s_0, s_2, \ldots, s_L\}.
\end{aligned}$$

Clearly P1 holds. Assume now that L is odd. It is now clear that

$$\begin{aligned}
H^{(0,0)} &= \{t_{-L+1}, \ldots, t_{-2}, t_0, t_2, \ldots, t_{L-1}\} \\
H^{(0,1)} &= \{t_{-L}, \ldots, t_{-3}, t_{-1}, t_1, \ldots, t_{L-2}\} \\
H^{(1,0)} &= \{s_{-L+2}, \ldots, s_{-1}, s_1, s_3, \ldots, s_L\} \\
H^{(1,1)} &= \{s_{-L+1}, \ldots, s_{-2}, s_0, s_2, \ldots, s_{L-1}\}.
\end{aligned}$$

In this case it is seen that all filters have equally many filter coefficients with
positive and negative indices, so that P1 holds also here.
Now let us turn to the first lifting step. We will choose it so that the number of filter coefficients in the first column is reduced by 1, and so that $H^{(0,0)}$ has an odd number of coefficients. If $L$ is even, we saw that $H^{(0,0)}$ and $H^{(1,0)}$ had
an even number of coefficients, so that the first lifting step must be even. To
preserve P1, we must cancel $t_{-L}$, so that the first lifting step is
$$\Lambda_1 = \begin{pmatrix} I & -t_{-L}/s_{-L+1} \\ 0 & I \end{pmatrix}.$$
If L is odd, we saw that H (0,0) and H (1,0) had an odd number of coefficients, so
that the first lifting step must be odd. To preserve P1, we must cancel sL , so
that the first lifting step is
$$\Lambda_1 = \begin{pmatrix} I & 0 \\ -s_L/t_{L-1} & I \end{pmatrix}.$$
Now that we have a difference of one filter coefficient in the first column, we will reduce the entry with the most filter coefficients by two with a lifting step, until we have $H^{(0,0)} = \{K\}$, $H^{(1,0)} = 0$ in the first column.
Assume first that $H^{(0,0)}$ has the most filter coefficients. We then need to apply an even lifting step. Before an even step, the first column has the form
$$\begin{pmatrix} \{t_{-k}, \ldots, t_{-1}, t_0, t_1, \ldots, t_k\} \\ \{s_{-k}, \ldots, s_{-1}, s_0, s_1, \ldots, s_{k-1}\} \end{pmatrix}.$$
We can then choose
$$\Lambda_i = \begin{pmatrix} I & \{-t_{-k}/s_{-k}, -t_k/s_{k-1}\} \\ 0 & I \end{pmatrix}$$
as a lifting step.
Assume then that $H^{(1,0)}$ has the most filter coefficients. We then need to apply an odd lifting step. Before an odd step, the first column has the form
$$\begin{pmatrix} \{t_{-k}, \ldots, t_{-1}, t_0, t_1, \ldots, t_k\} \\ \{s_{-k-1}, \ldots, s_{-1}, s_0, s_1, \ldots, s_k\} \end{pmatrix}.$$
We can then choose
$$\Lambda_i = \begin{pmatrix} I & 0 \\ \{-s_{-k-1}/t_{-k}, -s_k/t_k\} & I \end{pmatrix}$$
as a lifting step.

If $L$ is even we end up with a matrix on the form
$$\begin{pmatrix} \alpha & \{0, K\} \\ 0 & \beta \end{pmatrix},$$
and we can choose the final lifting step as
$$\Lambda_n = \begin{pmatrix} I & \{0, -K/\beta\} \\ 0 & I \end{pmatrix}.$$
If $L$ is odd we end up with a matrix on the form
$$\begin{pmatrix} \alpha & K \\ 0 & \beta \end{pmatrix},$$
and we can choose the final lifting step as
$$\Lambda_n = \begin{pmatrix} I & -K/\beta \\ 0 & I \end{pmatrix}.$$
Again using equations (8.9)-(8.10), this gives us the lifting factorizations.
In summary we see that all even and odd lifting steps take the form
$$\begin{pmatrix} I & \{\lambda_1, \lambda_2\} \\ 0 & I \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} I & 0 \\ \{\lambda_1, \lambda_2\} & I \end{pmatrix}.$$
We see that symmetric lifting steps correspond to the special case when $\lambda_1 = \lambda_2$. The even and odd lifting matrices now used are
$$\begin{pmatrix}
1 & \lambda_1 & 0 & 0 & \cdots & 0 & 0 & \lambda_2 \\
0 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & \lambda_2 & 1 & \lambda_1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & \lambda_2 & 1 & \lambda_1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix} \quad\text{and}\quad
\begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\lambda_2 & 1 & \lambda_1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 0 \\
\lambda_1 & 0 & 0 & 0 & \cdots & 0 & \lambda_2 & 1
\end{pmatrix}, \qquad (8.16)$$

respectively. We note that when we reduce elements to the left and right in the upper and lower part of the first column, the same type of reductions must occur in the second column, since the determinant $H^{(0,0)}H^{(1,1)} - H^{(0,1)}H^{(1,0)}$ is a constant after any number of lifting steps.
This example explains the procedure for finding the lifting factorization
into steps of the form given in Equation (8.16). You will be spared the details
of writing an implementation which applies this procedure. In order to use
orthonormal wavelets in implementations, we have implemented a function
liftingfactortho, which takes N as input, and computes the steps in a lifting
factorization so that (8.8) holds. These are written to file, and read from file
when needed (you need not call liftingfactortho yourself, this is handled
behind the curtains). In the exercises, you will be asked to implement both these
non-symmetric elementary lifting steps, as well as the full kernel transformations
for orthonormal wavelets.
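
As an illustration of the matrices in (8.16), the following is our own minimal sketch (the exercises below ask for versions supporting different boundary modes) of applying one even and one odd lifting step to a numpy array of floats of even length, with periodic boundary handling only:

import numpy as np

def lifting_even_periodic(lmbda1, lmbda2, x):
    # Even lifting matrix in (8.16): x[2n] += lmbda1*x[2n+1] + lmbda2*x[2n-1]
    x[0::2] += lmbda1 * x[1::2] + lmbda2 * np.roll(x[1::2], 1)
    return x

def lifting_odd_periodic(lmbda1, lmbda2, x):
    # Odd lifting matrix in (8.16): x[2n+1] += lmbda2*x[2n] + lmbda1*x[2n+2]
    x[1::2] += lmbda2 * x[0::2] + lmbda1 * np.roll(x[0::2], -1)
    return x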

Exercise 8.8: Polyphase components for symmetric filters


Assume that the filters $H_0$, $H_1$ of a wavelet are symmetric, and denote by $S^{(i,j)}$ the polyphase components of the corresponding MRA-matrix $H$. Show that $S^{(0,0)}$ and $S^{(1,1)}$ are symmetric filters, that the filter coefficients of $S^{(1,0)}$ have symmetry about $-1/2$, and that $S^{(0,1)}$ has symmetry about $1/2$. Also show a similar statement for the MRA-matrix $G$ of the inverse DWT.
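
As an aside (not part of the exercise), the polyphase components referred to here are easy to inspect numerically. The following sketch, with our own function name and index convention, splits a filter into its even-indexed and odd-indexed coefficients; applied to the Spline 5/3 filter H0 it reproduces the two components that appeared in Section 8.2.2.

def split_even_odd(coeffs, first_index):
    # coeffs[i] is the filter coefficient at index first_index + i.
    # Returns two dicts mapping original filter index -> coefficient.
    even, odd = {}, {}
    for i, c in enumerate(coeffs):
        k = first_index + i
        (even if k % 2 == 0 else odd)[k] = c
    return even, odd

# The Spline 5/3 filter H0 = {-1/4, 1/2, 3/2, 1/2, -1/4} (indices -2,...,2):
even, odd = split_even_odd([-0.25, 0.5, 1.5, 0.5, -0.25], -2)
# even == {-2: -0.25, 0: 1.5, 2: -0.25}, odd == {-1: 0.5, 1: 0.5}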

Exercise 8.9: Implementing kernel transformations using lifting
Up to now in this chapter we have obtained lifting factorizations for four different
wavelets where the filters are symmetric. Let us now implement the kernel
transformations for these wavelets. Your functions should call the functions from
Exercise 5.33 and Exercise 5.37 in order to compute the individual lifting steps.
Recall that the kernel transformations should take the input vector x, symm
(i.e. whether symmetric extension should be applied), and dual (i.e. whether
the dual wavelet transform should be applied) as input. You will need equations
(8.9)-(8.12) here, in order to complete the kernels for both the transformations
and the dual transformations.
a) Write functions

dwt_kernel_53(x, bd_mode)
idwt_kernel_53(x, bd_mode)

which implement the DWT and IDWT kernel transformations for the Spline 5/3
wavelet. Use the lifting factorization obtained in Section 8.2.2.
b) Write functions

dwt_kernel_97(x, bd_mode)
idwt_kernel_97(x, bd_mode)

which implement the DWT and IDWT kernel transformations for the CDF 9/7
wavelet. Use the lifting factorization obtained in Section 8.2.3.
c) In Chapter 5, we listened to the low-resolution approximations and detail
components in sound for three different wavelets. Repeat these experiments
with the Spline 5/3 and the CDF 9/7 wavelet, using the new kernels we have
implemented in this exercise.
d) Plot all scaling functions and mother wavelets for the Spline 5/3 and the CDF
9/7 wavelets, using the cascade algorithm and the kernels you have implemented.

Exercise 8.10: Lifting orthonormal wavelets


In this exercise we will implement the kernel transformations for orthonormal
wavelets.

a) Write functions

lifting_even(lambda1, lambda2, x, bd_mode)


lifting_odd(lambda1, lambda2, x, bd_mode)

which apply the elementary lifting matrices (8.16) to x. Assume that N is even.
b) Write functions

dwt_kernel_ortho(x, filters, bd_mode)


idwt_kernel_ortho(x, filters, bd_mode)

which apply the DWT and IDWT kernel transformations for orthonormal
wavelets to x. You should call the functions lifting_even and lifting_odd.
You can assume that you can access the lifting steps so that the lifting factor-
ization (8.8) holds, through the object filters by writing filters.lambdas,
filters.alpha, and filters.beta. filters.lambdas is an $n \times 2$-matrix so that the filter coefficients $\{\lambda_1, \lambda_2\}$ of the $i$'th lifting step are found in row $i$. Recall that the last lifting step was even.
Due to the filters object, the functions dwt_kernel_ortho and idwt_kernel_ortho do not conform to the signature we have required for kernel functions up to now.
The code base creates such functions based on the functions above in the following
way:

filters = ...
dwt_kernel = lambda x, bd_mode: dwt_kernel_ortho(x, filters, bd_mode)

c) Listen to the low-resolution approximations and detail components in sound


for orthonormal wavelets for N = 1, 2, 3, 4.
d) Plot all scaling functions and mother wavelets for the orthonormal wavelets for
N = 1, 2, 3, 4, using the cascade algorithm. Since the wavelets are orthonormal,
we should have that φ = φ̃, and ψ = ψ̃. In other words, you should see that the
bottom plots equal the upper plots.

Exercise 8.11: 4 vanishing moments


In Exercise 5.39 we found constants α, β, γ, δ which give the coordinates of ψ̂ in
(φ1 , ψ̂1 ), where ψ̂ had four vanishing moments, and where we worked with the
multiresolution analysis of piecewise constant functions.
a) Show that the polyphase representation of G when ψ̂ is used as mother
wavelet can be factored as
$$\frac{1}{\sqrt{2}}\begin{pmatrix} I & 0 \\ \{1/2, 1/2\} & I \end{pmatrix}\begin{pmatrix} I & \{-\gamma, -\alpha, -\beta, -\delta\} \\ 0 & I \end{pmatrix}. \qquad (8.17)$$
You here need to reconstruct what you did in the lifting factorization for the alternative piecewise linear wavelet, i.e. write
$$P_{D_1\leftarrow(\phi_1,\hat\psi_1)} = P_{D_1\leftarrow(\phi_1,\psi_1)}\,P_{(\phi_1,\psi_1)\leftarrow(\phi_1,\hat\psi_1)}.$$
By inversion, find also a lifting factorization of $H$.

Exercise 8.12: Wavelet based on piecewise quadratic scaling function
In Exercise 7.3 you should have found the filters
$$\begin{aligned}
H_0 &= \frac{1}{128}\{-5, 20, -1, -96, 70, 280, 70, -96, -1, 20, -5\} \\
H_1 &= \frac{1}{16}\{1, -4, 6, -4, 1\} \\
G_0 &= \frac{1}{16}\{1, 4, 6, 4, 1\} \\
G_1 &= \frac{1}{128}\{5, 20, 1, -96, -70, 280, -70, -96, 1, 20, 5\}.
\end{aligned}$$
a) Show that
$$G = \begin{pmatrix} I & -\frac{1}{4}\{1,1\} \\ 0 & I \end{pmatrix}\begin{pmatrix} I & 0 \\ -\{1,1\} & I \end{pmatrix}\begin{pmatrix} I & -\frac{1}{128}\{5,-29,-29,5\} \\ 0 & I \end{pmatrix}\begin{pmatrix} \frac{1}{4}I & 0 \\ 0 & 4I \end{pmatrix}.$$
From this we can easily derive the lifting factorization of G.
b) Listen to the low-resolution approximations and detail components in sound
for this wavelet.
c) Plot all scaling functions and mother wavelets for this wavelet, using the
cascade algorithm.

8.3 Cosine-modulated filter banks and the MP3 standard
Previously we saw that the MP3 standard used a certain filter bank, called a
cosine-modulated filter bank. We also illustrated that, surprisingly for a much
used international standard, the synthesis system did not exactly invert the
analysis system, i.e. we do not have perfect reconstruction, only “near-perfect
reconstruction”. In this section we will first explain how this filter bank can be
constructed, and why it can not give perfect reconstruction. In particular it will
be clear how the prototype filter can be constructed. We will then construct a
very similar filter bank, which actually can give perfect reconstruction. Given this, it may seem surprising that the MP3 standard does not use this filter bank instead. The explanation may lie in that the MP3 standard was established at about the same time as these filter banks were developed, so that the standard did not take this very similar filter bank with perfect reconstruction into account.

8.3.1 Polyphase representations of the filter bank transforms
The main idea is to find the polyphase representations of the forward and reverse
filter bank transforms of the MP3 standard. We start with the expression
$$z_{32(s-1)+n} = \sum_{k=0}^{511}\cos((n+1/2)(k-16)\pi/32)\,h_k\,x_{32s-k-1}, \qquad (8.18)$$
which led to the expression of the forward filter bank transform (Theorem 6.24).
Using that any k < 512 can be written uniquely on the form k = m + 64r, where
0 ≤ m < 64, and 0 ≤ r < 8, we can rewrite this as

$$\begin{aligned}
&= \sum_{m=0}^{63}\sum_{r=0}^{7} (-1)^r \cos\left(2\pi(n+1/2)(m-16)/64\right) h_{m+64r}\, x_{32s-(m+64r)-1} \\
&= \sum_{m=0}^{63} \cos\left(2\pi(n+1/2)(m-16)/64\right) \sum_{r=0}^{7} (-1)^r h_{m+32\cdot 2r}\, x_{32(s-2r)-m-1}.
\end{aligned}$$

Here we also used Property (6.35). If we write

$$V^{(m)} = \{(-1)^0 h_m, 0, (-1)^1 h_{m+64}, 0, (-1)^2 h_{m+128}, \ldots, (-1)^7 h_{m+7\cdot 64}, 0\}, \qquad (8.19)$$
for $0 \leq m \leq 63$, we can write the expression above as

$$\begin{aligned}
&\sum_{m=0}^{63} \cos\left(2\pi(n+1/2)(m-16)/64\right) \sum_{r=0}^{15} V_r^{(m)} x_{32(s-r)-m-1} \\
&= \sum_{m=0}^{63} \cos\left(2\pi(n+1/2)(m-16)/64\right) \sum_{r=0}^{15} V_r^{(m)} x^{(32-m-1)}_{s-1-r} \\
&= \sum_{m=0}^{63} \cos\left(2\pi(n+1/2)(m-16)/64\right) \left(V^{(m)} x^{(32-m-1)}\right)_{s-1},
\end{aligned}$$

where we recognized x32(s−r)−m−1 in terms of the polyphase components of


x, and the inner sum as a convolution. We remark that the inner terms
{(V (m) x(32−m−1) )s−1 }63
m=0 here are what the standard calls partial calculations
(windowing refers to multiplication with the combined set of filter coefficients of
the V (m) ), and that matrixing here represents the multiplication with the cosine
entries. Since z (n) = {z32(s−1)+n }∞
s=0 is the n’th polyphase component of z, this
can be written as

$$z^{(n)} = \sum_{m=0}^{63} \cos\left(2\pi(n+1/2)(m-16)/64\right) I\, V^{(m)} x^{(32-m-1)}.$$
In terms of matrices this can be written as
$$z = \begin{pmatrix} \cos(2\pi(0+1/2)\cdot(-16)/64)I & \cdots & \cos(2\pi(0+1/2)\cdot 47/64)I \\ \vdots & \ddots & \vdots \\ \cos(2\pi(31+1/2)\cdot(-16)/64)I & \cdots & \cos(2\pi(31+1/2)\cdot 47/64)I \end{pmatrix}
\begin{pmatrix} V^{(0)} & 0 & \cdots & 0 & 0 \\ 0 & V^{(1)} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & V^{(62)} & 0 \\ 0 & 0 & \cdots & 0 & V^{(63)} \end{pmatrix}
\begin{pmatrix} x^{(31)} \\ x^{(30)} \\ \vdots \\ x^{(-32)} \end{pmatrix}.$$

If we place the 15 first columns in the cosine matrix last using Property (6.35)
(we must then also place the 15 first rows last in the second matrix), we obtain
$$z = \begin{pmatrix} \cos(2\pi(0+1/2)\cdot 0/64)I & \cdots & \cos(2\pi(0+1/2)\cdot 63/64)I \\ \vdots & \ddots & \vdots \\ \cos(2\pi(31+1/2)\cdot 0/64)I & \cdots & \cos(2\pi(31+1/2)\cdot 63/64)I \end{pmatrix}
\begin{pmatrix}
0 & \cdots & 0 & V^{(16)} & \cdots & 0 \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & 0 & \cdots & V^{(63)} \\
-V^{(0)} & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & -V^{(15)} & 0 & \cdots & 0
\end{pmatrix}
\begin{pmatrix} x^{(31)} \\ x^{(30)} \\ \vdots \\ x^{(-32)} \end{pmatrix}.$$

Using Equation (6.36) to combine column k and 64 − k in the cosine matrix (as
well as row k and 64 − k in the second matrix), we can write this as
$$\begin{pmatrix} \cos(2\pi(0+1/2)\cdot 0/64)I & \cdots & \cos(2\pi(0+1/2)\cdot 31/64)I \\ \vdots & \ddots & \vdots \\ \cos(2\pi(31+1/2)\cdot 0/64)I & \cdots & \cos(2\pi(31+1/2)\cdot 31/64)I \end{pmatrix}
\begin{pmatrix} A' & B' \end{pmatrix}
\begin{pmatrix} x^{(31)} \\ x^{(30)} \\ \vdots \\ x^{(-32)} \end{pmatrix},$$

where
$$A' = \begin{pmatrix}
0 & 0 & \cdots & 0 & V^{(16)} & 0 & \cdots & 0 \\
0 & 0 & \cdots & V^{(15)} & 0 & V^{(17)} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & V^{(1)} & \cdots & 0 & 0 & 0 & \cdots & V^{(31)} \\
V^{(0)} & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0
\end{pmatrix}$$
$$B' = \begin{pmatrix}
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
V^{(32)} & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
0 & V^{(33)} & \cdots & 0 & 0 & 0 & \cdots & -V^{(63)} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & V^{(47)} & 0 & -V^{(49)} & \cdots & 0
\end{pmatrix}.$$

Using Equation (4.3), the cosine matrix here can be written as
$$\sqrt{\frac{M}{2}}\,(D_M)^T\begin{pmatrix} \sqrt{2} & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$
The above can thus be written as
$$4(D_{32})^T\begin{pmatrix} A & B\end{pmatrix}\begin{pmatrix} x^{(31)} \\ x^{(30)} \\ \vdots \\ x^{(-32)} \end{pmatrix},$$

where $A$ and $B$ are the matrices $A'$, $B'$ with the first row multiplied by $\sqrt{2}$ (i.e. replace $V^{(16)}$ with $\sqrt{2}V^{(16)}$ in the matrix $A'$). Using that $x^{(-i)} = E_1x^{(32-i)}$ for $1 \leq i \leq 32$, we can write this as
$$4(D_{32})^T\begin{pmatrix} A & B\end{pmatrix}\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(0)} \\ E_1x^{(31)} \\ \vdots \\ E_1x^{(0)} \end{pmatrix} = 4(D_{32})^T\left(A\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(0)} \end{pmatrix} + B\begin{pmatrix} E_1x^{(31)} \\ \vdots \\ E_1x^{(0)} \end{pmatrix}\right),$$
which can be written as
$$4(D_{32})^T\begin{pmatrix}
0 & 0 & \cdots & 0 & 2V^{(16)} & 0 & \cdots & 0 \\
0 & 0 & \cdots & V^{(15)} & 0 & V^{(17)} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & V^{(1)} & \cdots & 0 & 0 & 0 & \cdots & V^{(31)} \\
V^{(0)} + E_1V^{(32)} & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
0 & E_1V^{(33)} & \cdots & 0 & 0 & 0 & \cdots & -E_1V^{(63)} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & E_1V^{(47)} & 0 & -E_1V^{(49)} & \cdots & 0
\end{pmatrix}\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(0)} \end{pmatrix},$$

which also can be written as
$$4(D_{32})^T\begin{pmatrix}
0 & \cdots & 0 & 2V^{(16)} & 0 & \cdots & 0 & 0 \\
0 & \cdots & V^{(17)} & 0 & V^{(15)} & \cdots & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
V^{(31)} & \cdots & 0 & 0 & 0 & \cdots & V^{(1)} & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & V^{(0)} + E_1V^{(32)} \\
-E_1V^{(63)} & \cdots & 0 & 0 & 0 & \cdots & E_1V^{(33)} & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & \cdots & -E_1V^{(49)} & 0 & E_1V^{(47)} & \cdots & 0 & 0
\end{pmatrix}\begin{pmatrix} x^{(0)} \\ \vdots \\ x^{(31)} \end{pmatrix}.$$

We have therefore proved the following result.


Theorem 8.11. Polyphase factorization of a forward filter bank transform based
on a prototype filter.
The polyphase form of a forward filter bank transform based on a prototype
filter can be factored as
$$4(D_{32})^T\begin{pmatrix}
0 & \cdots & 0 & 2V^{(16)} & 0 & \cdots & 0 & 0 \\
0 & \cdots & V^{(17)} & 0 & V^{(15)} & \cdots & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
V^{(31)} & \cdots & 0 & 0 & 0 & \cdots & V^{(1)} & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & V^{(0)} + E_1V^{(32)} \\
-E_1V^{(63)} & \cdots & 0 & 0 & 0 & \cdots & E_1V^{(33)} & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & \cdots & -E_1V^{(49)} & 0 & E_1V^{(47)} & \cdots & 0 & 0
\end{pmatrix}. \qquad (8.20)$$

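The filters $V^{(m)}$ appearing in this factorization are obtained directly from the prototype filter. The following is a minimal sketch of Equation (8.19); the function name and the choice of returning the coefficients as the rows of a 64-by-16 array are our own.

import numpy as np

def mp3_polyphase_filters(h):
    # h: prototype filter with 512 coefficients h[0], ..., h[511].
    # Row m of the returned array holds the 16 coefficients of V^(m),
    # where every other coefficient is zero, as in Equation (8.19).
    V = np.zeros((64, 16))
    for m in range(64):
        for r in range(8):
            V[m, 2 * r] = (-1) ** r * h[m + 64 * r]
    return V
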
Due to Theorem 6.26, it is also very simple to write down the polyphase factorization of the reverse filter bank transform as well. Since $E_{481}G^T$ is a forward filter bank transform where the prototype filter has been reversed, $E_{481}G^T$ can be factored as above, with $V^{(m)}$ replaced by $W^{(m)}$, with $W^{(m)}$ being the filters derived from the synthesis prototype filter in reverse order. This means that the polyphase form of $G$ can be factored as

$$4\begin{pmatrix}
0 & 0 & \cdots & (W^{(31)})^T & 0 & -E_{-1}(W^{(63)})^T & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & (W^{(17)})^T & \cdots & 0 & 0 & 0 & \cdots & -E_{-1}(W^{(49)})^T \\
\sqrt{2}(W^{(16)})^T & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
0 & (W^{(15)})^T & \cdots & 0 & 0 & 0 & \cdots & E_{-1}(W^{(47)})^T \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & (W^{(1)})^T & 0 & E_{-1}(W^{(33)})^T & \cdots & 0 \\
0 & 0 & \cdots & 0 & (W^{(0)})^T + E_{-1}(W^{(32)})^T & 0 & \cdots & 0
\end{pmatrix} D_{32}E_{481}. \qquad (8.21)$$

Now, if we define U (m) as the filters derived from the synthesis prototype filter
itself, we have that

$$(W^{(k)})^T = -E_{-14}U^{(64-k)}, \quad 1 \leq k \leq 15, \qquad (W^{(0)})^T = E_{-16}U^{(0)}.$$


Inserting this in Equation (8.21) we get the following result:
Theorem 8.12. Polyphase factorization of a reverse filter bank transform based
on a prototype filter.
Assume that $G$ is a reverse filter bank transform based on a prototype filter, and that $U^{(m)}$ are the filters derived from this prototype filter. Then the polyphase form of $G$ can be factored as

$$4\begin{pmatrix}
0 & 0 & \cdots & -U^{(33)} & 0 & E_{-1}U^{(1)} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & -U^{(47)} & \cdots & 0 & 0 & 0 & \cdots & E_{-1}U^{(15)} \\
-\sqrt{2}U^{(48)} & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
0 & -U^{(49)} & \cdots & 0 & 0 & 0 & \cdots & -E_{-1}U^{(17)} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & -U^{(63)} & 0 & -E_{-1}U^{(31)} & \cdots & 0 \\
0 & 0 & \cdots & 0 & E_{-2}U^{(0)} - E_{-1}U^{(32)} & 0 & \cdots & 0
\end{pmatrix} D_{32}E_{33}. \qquad (8.22)$$
Now, consider the matrices

$$\begin{pmatrix} V^{(32-i)} & V^{(i)} \\ -E_1V^{(64-i)} & E_1V^{(32+i)} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} -U^{(32+i)} & E_{-1}U^{(i)} \\ -U^{(64-i)} & -E_{-1}U^{(32-i)} \end{pmatrix} \qquad (8.23)$$
for $1 \leq i \leq 15$. These make up submatrices in the matrices in equations (8.20) and (8.22). Clearly, only the product of these matrices influences the result. Since

$$\begin{pmatrix} -U^{(32+i)} & E_{-1}U^{(i)} \\ -U^{(64-i)} & -E_{-1}U^{(32-i)} \end{pmatrix}\begin{pmatrix} V^{(32-i)} & V^{(i)} \\ -E_1V^{(64-i)} & E_1V^{(32+i)} \end{pmatrix} = \begin{pmatrix} -U^{(32+i)} & U^{(i)} \\ -U^{(64-i)} & -U^{(32-i)} \end{pmatrix}\begin{pmatrix} V^{(32-i)} & V^{(i)} \\ -V^{(64-i)} & V^{(32+i)} \end{pmatrix}, \qquad (8.24)$$

we have the following result.


Theorem 8.13. Filter bank transforms.
Let H, G be forward and reverse filter bank transforms defined from analysis
and synthesis prototype filters. Let also V (k) be the prototype filter of H, and
U (k) the reverse of the prototype filter of G. If

$$\begin{aligned}
\begin{pmatrix} -U^{(32+i)} & U^{(i)} \\ -U^{(64-i)} & -U^{(32-i)} \end{pmatrix}\begin{pmatrix} V^{(32-i)} & V^{(i)} \\ -V^{(64-i)} & V^{(32+i)} \end{pmatrix} &= c\begin{pmatrix} E_d & 0 \\ 0 & E_d \end{pmatrix} \\
(\sqrt{2}V^{(16)})(-\sqrt{2}U^{(48)}) &= cE_d \\
(V^{(0)} + E_1V^{(32)})(E_{-2}U^{(0)} - E_{-1}U^{(32)}) &= cE_d
\end{aligned} \qquad (8.25)$$
for $1 \leq i \leq 15$, then $GH = 16cE_{33+32d}$.

This result is the key ingredient we need in order to construct forward and
reverse systems which together give perfect reconstruction. In Exercise 8.15 we
go through how we can use lifting in order to express a wide range of possible
(U, V ) matrix pairs which satisfy Equation (8.25). This turns the problem of
constructing cosine-modulated filter banks which are useful for audio coding
into an optimization problem: the optimization variables are values λi which
characterize lifting steps, and the objective function is the deviation of the
corresponding prototype filter from an ideal bandpass filter. This optimization
problem has been subject to a lot of research, and we will not go into details on
this.

8.3.2 The prototype filters


Now, let us return to the MP3 standard. We previously observed that in this
standard the coefficients in the synthesis prototype filter seemed to equal 32
times the analysis prototype filter. This indicates that U (k) = 32V (k) . A closer
inspection also yields that there is a symmetry in the values of the prototype
filter: We see that Ci = −C512−i (i.e. antisymmetry) for most values of i. The
only exception is for i = 64, 128, . . . , 448, for which Ci = C512−i (i.e. symmetry).
The antisymmetry can be translated to that the filter coefficients of V (k) equal
those of V (64−k) in reverse order, with a minus sign. The symmetry can be
translated to that V (0) is symmetric. These observations can be rewritten as
$$V^{(64-k)} = -E_{14}(V^{(k)})^T, \quad 1 \leq k \leq 15, \qquad (8.26)$$
$$V^{(0)} = E_{16}(V^{(0)})^T. \qquad (8.27)$$
Inserting first that $U^{(k)} = 32V^{(k)}$ in Equation (8.24) gives

$$\begin{pmatrix} -U^{(32+i)} & U^{(i)} \\ -U^{(64-i)} & -U^{(32-i)} \end{pmatrix}\begin{pmatrix} V^{(32-i)} & V^{(i)} \\ -V^{(64-i)} & V^{(32+i)} \end{pmatrix} = 32\begin{pmatrix} -V^{(32+i)} & V^{(i)} \\ -V^{(64-i)} & -V^{(32-i)} \end{pmatrix}\begin{pmatrix} V^{(32-i)} & V^{(i)} \\ -V^{(64-i)} & V^{(32+i)} \end{pmatrix}.$$
Substituting for V (32+i) and V (64−i) after what we found by inspection now
gives

$$\begin{aligned}
&32\begin{pmatrix} E_{14}(V^{(32-i)})^T & V^{(i)} \\ E_{14}(V^{(i)})^T & -V^{(32-i)} \end{pmatrix}\begin{pmatrix} V^{(32-i)} & V^{(i)} \\ E_{14}(V^{(i)})^T & -E_{14}(V^{(32-i)})^T \end{pmatrix} \\
&= 32\begin{pmatrix} E_{14} & 0 \\ 0 & E_{14} \end{pmatrix}\begin{pmatrix} (V^{(32-i)})^T & V^{(i)} \\ (V^{(i)})^T & -V^{(32-i)} \end{pmatrix}\begin{pmatrix} V^{(32-i)} & V^{(i)} \\ (V^{(i)})^T & -(V^{(32-i)})^T \end{pmatrix} \\
&= 32\begin{pmatrix} E_{14} & 0 \\ 0 & E_{14} \end{pmatrix}\begin{pmatrix} (V^{(32-i)})^T & V^{(i)} \\ (V^{(i)})^T & -(V^{(32-i)})^T \end{pmatrix}\begin{pmatrix} V^{(32-i)} & V^{(i)} \\ (V^{(i)})^T & -(V^{(32-i)})^T \end{pmatrix} \\
&= 32\begin{pmatrix} E_{14} & 0 \\ 0 & E_{14} \end{pmatrix}\begin{pmatrix} V^{(i)}(V^{(i)})^T + V^{(32-i)}(V^{(32-i)})^T & 0 \\ 0 & V^{(i)}(V^{(i)})^T + V^{(32-i)}(V^{(32-i)})^T \end{pmatrix}. \qquad (8.28)
\end{aligned}$$
Due to Exercise 8.6 (set $A = (V^{(32-i)})^T$, $B = (V^{(i)})^T$), with
$$H = \begin{pmatrix} (V^{(32-i)})^T & V^{(i)} \\ (V^{(i)})^T & -(V^{(32-i)})^T \end{pmatrix} \qquad G = \begin{pmatrix} V^{(32-i)} & V^{(i)} \\ (V^{(i)})^T & -V^{(32-i)} \end{pmatrix},$$
we recognize an alternative QMF filter bank. We thus have alias cancellation, with perfect reconstruction only if $|\lambda_{H_0}(\omega)|^2 + |\lambda_{H_0}(\omega+\pi)|^2 = 2$. For the two remaining filters we compute

$$(\sqrt{2}V^{(16)})(-\sqrt{2}U^{(48)}) = -64V^{(16)}V^{(48)} = 64E_{14}V^{(16)}(V^{(16)})^T = 32E_{14}\left(V^{(16)}(V^{(16)})^T + V^{(16)}(V^{(16)})^T\right) \qquad (8.29)$$

and

$$\begin{aligned}
&(V^{(0)} + E_1V^{(32)})(E_{-2}U^{(0)} - E_{-1}U^{(32)}) \\
&= 32(V^{(0)} + E_1V^{(32)})(E_{-2}V^{(0)} - E_{-1}V^{(32)}) = 32E_{-2}(V^{(0)} + E_1V^{(32)})(V^{(0)} - E_1V^{(32)}) \\
&= 32E_{-2}\left((V^{(0)})^2 - E_2(V^{(32)})^2\right) = 32E_{14}\left(V^{(0)}(V^{(0)})^T + V^{(32)}(V^{(32)})^T\right). \qquad (8.30)
\end{aligned}$$

We see that the filters from equations (8.28)-(8.30) are similar, and that we thus
can combine them into

$$\left\{V^{(i)}(V^{(i)})^T + V^{(32-i)}(V^{(32-i)})^T\right\}_{i=0}^{16}. \qquad (8.31)$$
All of these can be the identity, except for $1024V^{(16)}(V^{(16)})^T$, since we know that the product of two FIR filters is never the identity, except when both are delays (and all $V^{(m)}$ are FIR, since the prototype filters defined by the MP3 standard are FIR). This single filter is thus what spoils perfect reconstruction,
so that we can only hope for alias cancellation, and this happens when the filters
from Equation (8.31) all are equal. Ideally this is close to cI for some scalar c,
and we then have that

$$GH = 16 \cdot 32cE_{33+448} = 512cE_{481}.$$
This explains the observation from the MP3 standard that GH seems to be close
to $E_{481}$. Since all the filters $V^{(i)}(V^{(i)})^T + V^{(32-i)}(V^{(32-i)})^T$ are symmetric, $GH$ is also a symmetric filter due to Theorem 8.3, so that its frequency response is real, and we have no phase distortion. We can thus summarize our findings as follows.
Observation 8.14. MP3 standard.
The prototype filters from the MP3 standard do not give perfect reconstruction. They are found by choosing 17 filters $\{V^{(k)}\}_{k=0}^{16}$ so that the filters from Equation (8.31) are equal, and so that their combination into a prototype filter using equations (8.19) and (8.26) is as close to an ideal bandpass filter as possible. When we have equality the alias cancellation condition is satisfied, and we also have no phase distortion. When the common value is close to $\frac{1}{512}I$, $GH$ is close to $E_{481}$, so that we have near-perfect reconstruction.
This states clearly the optimization problem which is solved by the values stated in the MP3 standard.

8.3.3 Perfect reconstruction


How can we overcome the problem that $1024V^{(16)}(V^{(16)})^T \neq I$, which spoiled perfect reconstruction in the MP3 standard? It turns out that we can address this with a simple change in our procedure. In Equation (8.18) we instead use
$$z_{32(s-1)+n} = \sum_{k=0}^{511}\cos((n+1/2)(k+1/2-16)\pi/32)\,h_k\,x_{32s-k-1}, \qquad (8.32)$$

i.e. 1/2 is added inside the cosine. We now have the properties
$$\cos\left(2\pi(n+1/2)(k+64r+1/2)/(2N)\right) = (-1)^r\cos\left(2\pi(n+1/2)(k+1/2)/(2N)\right) \qquad (8.33)$$
$$\cos\left(2\pi(n+1/2)(2N-k-1+1/2)/(2N)\right) = -\cos\left(2\pi(n+1/2)(k+1/2)/(2N)\right). \qquad (8.34)$$
Due to the first property, we can deduce as before that
$$z^{(n)} = \sum_{m=0}^{63}\cos\left(2\pi(n+1/2)(m+1/2-16)/64\right)I\,V^{(m)}x^{(32-m-1)},$$

where the filters $V^{(m)}$ are defined as before. Placing the 15 first columns of the cosine matrix last as before, but instead using Property (8.34) to combine columns $k$ and $64-k-1$ of the cosine matrix, we can write this as
$$\begin{pmatrix} \cos(2\pi(0+1/2)(0+1/2)/64)I & \cdots & \cos(2\pi(0+1/2)(31+1/2)/64)I \\ \vdots & \ddots & \vdots \\ \cos(2\pi(31+1/2)(0+1/2)/64)I & \cdots & \cos(2\pi(31+1/2)(31+1/2)/64)I \end{pmatrix}\begin{pmatrix} A & B \end{pmatrix}\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(-32)} \end{pmatrix},$$

where
$$A = \begin{pmatrix}
0 & 0 & \cdots & V^{(15)} & V^{(16)} & \cdots & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & & \vdots \\
0 & V^{(1)} & \cdots & 0 & 0 & \cdots & V^{(30)} & 0 \\
V^{(0)} & 0 & \cdots & 0 & 0 & \cdots & \cdots & V^{(31)} \\
0 & 0 & \cdots & 0 & 0 & \cdots & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & \cdots & 0
\end{pmatrix}$$
$$B = \begin{pmatrix}
0 & 0 & \cdots & 0 & 0 & \cdots & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & \cdots & 0 \\
V^{(32)} & 0 & \cdots & 0 & 0 & \cdots & \cdots & -V^{(63)} \\
0 & V^{(33)} & \cdots & 0 & 0 & \cdots & -V^{(62)} & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & & \vdots \\
0 & 0 & \cdots & V^{(47)} & -V^{(48)} & \cdots & \cdots & 0
\end{pmatrix}.$$
Since the cosine matrix can be written as $\sqrt{\frac{M}{2}}\,D_M^{(iv)}$, the above can be written as
$$4D_M^{(iv)}\begin{pmatrix} A & B\end{pmatrix}\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(-32)} \end{pmatrix}.$$
As before we can rewrite this as
$$4D_M^{(iv)}\begin{pmatrix} A & B\end{pmatrix}\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(0)} \\ E_1x^{(31)} \\ \vdots \\ E_1x^{(0)} \end{pmatrix} = 4D_M^{(iv)}\left(A\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(0)} \end{pmatrix} + B\begin{pmatrix} E_1x^{(31)} \\ \vdots \\ E_1x^{(0)} \end{pmatrix}\right),$$

which can be written as
$$4D_M^{(iv)}\begin{pmatrix}
0 & 0 & \cdots & V^{(15)} & V^{(16)} & \cdots & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & & \vdots \\
0 & V^{(1)} & \cdots & 0 & 0 & \cdots & V^{(30)} & 0 \\
V^{(0)} & 0 & \cdots & 0 & 0 & \cdots & \cdots & V^{(31)} \\
E_1V^{(32)} & 0 & \cdots & 0 & 0 & \cdots & \cdots & -E_1V^{(63)} \\
0 & E_1V^{(33)} & \cdots & 0 & 0 & \cdots & -E_1V^{(62)} & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & & \vdots \\
0 & 0 & \cdots & E_1V^{(47)} & -E_1V^{(48)} & \cdots & \cdots & 0
\end{pmatrix}\begin{pmatrix} x^{(31)} \\ \vdots \\ x^{(0)} \end{pmatrix},$$

which also can be written as
$$4D_M^{(iv)}\begin{pmatrix}
0 & 0 & \cdots & V^{(16)} & V^{(15)} & \cdots & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & & \vdots \\
0 & V^{(30)} & \cdots & 0 & 0 & \cdots & V^{(1)} & 0 \\
V^{(31)} & 0 & \cdots & 0 & 0 & \cdots & \cdots & V^{(0)} \\
-E_1V^{(63)} & 0 & \cdots & 0 & 0 & \cdots & \cdots & E_1V^{(32)} \\
0 & -E_1V^{(62)} & \cdots & 0 & 0 & \cdots & E_1V^{(33)} & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & & \vdots \\
0 & 0 & \cdots & -E_1V^{(48)} & E_1V^{(47)} & \cdots & \cdots & 0
\end{pmatrix}\begin{pmatrix} x^{(0)} \\ \vdots \\ x^{(31)} \end{pmatrix}.$$

We therefore have the following result.


Theorem 8.15. Polyphase factorization of a forward filter bank transform based
on a prototype filter, modified version.
The modified version of the polyphase form of a forward filter bank transform
based on a prototype filter can be factored as
$$4D_M^{(iv)}\begin{pmatrix}
0 & 0 & \cdots & V^{(16)} & V^{(15)} & \cdots & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & & \vdots \\
0 & V^{(30)} & \cdots & 0 & 0 & \cdots & V^{(1)} & 0 \\
V^{(31)} & 0 & \cdots & 0 & 0 & \cdots & \cdots & V^{(0)} \\
-E_1V^{(63)} & 0 & \cdots & 0 & 0 & \cdots & \cdots & E_1V^{(32)} \\
0 & -E_1V^{(62)} & \cdots & 0 & 0 & \cdots & E_1V^{(33)} & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & & \vdots \\
0 & 0 & \cdots & -E_1V^{(48)} & E_1V^{(47)} & \cdots & \cdots & 0
\end{pmatrix}. \qquad (8.35)$$
Clearly this factorization avoids having two blocks of filters: there are now 16 different $2\times 2$ polyphase matrices, and as we know, each of them can be inverted, so that the full matrix can be inverted in a similar fashion as before.
now possible to obtain perfect reconstruction. Although we do not state recipes
for implementing this, one has just as efficient implementations as in the MP3
standard.
Since we ended up with the 2 × 2 polyphase matrices Mk , we can apply the
lifting factorization in order to halve the number of multiplications/additions.
This is not done in practice, since a lifting factorization requires that we compute
all outputs at once. In audio coding it is required that we compute the output
progressively, due to the large size of the input vector. The procedure above is
therefore mostly useful for providing the requirements for the filters, while the
preceding comments can be used for the implementation.

Exercise 8.13: Run forward and reverse transform


Run the forward and then the reverse transform from Exercise 6.26 on the vector
(1, 2, 3, . . . , 8192). Verify that there seems to be a delay of 481 elements, as
promised by Observation 8.14. Do you get the exact same result back?

Exercise 8.14: Verify statement of filters


Use your computer to verify the symmetries we have stated for the prototype filters, i.e. that
$$C_i = \begin{cases} -C_{512-i} & i \neq 64, 128, \ldots, 448 \\ C_{512-i} & i = 64, 128, \ldots, 448. \end{cases}$$
Explain also that this implies that $h_i = h_{512-i}$ for $i = 1, \ldots, 511$. In other words, the prototype filter has symmetry around $(511+1)/2 = 256$, so that it has linear phase.

Exercise 8.15: Lifting


We mentioned that we could use the lifting factorization to construct filters on
the form stated in Equation (8.19), so that the matrices on the form given by
Equation (8.23), i.e.
$$\begin{pmatrix} V^{(32-i)} & V^{(i)} \\ -V^{(64-i)} & V^{(32+i)} \end{pmatrix},$$
are invertible. Let us see what kind of lifting steps produce such matrices.
a) Show that the lifting steps
$$\begin{pmatrix} I & \lambda E_2 \\ 0 & I \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} I & 0 \\ \lambda I & I \end{pmatrix}$$
applied in alternating order to a matrix on the form given by Equation (8.23), where the filters are on the form given by Equation (8.19), again produce matrices and filters on these forms. This explains how we can parametrize a larger number of such matrices with the help of lifting steps. It also explains why the inverse matrix is on the form stated in Equation (8.23) with filters on the same form, since the inverse lifting steps are of the same type.
b) Explain that 16 numbers $\{\lambda_i\}_{i=1}^{16}$ are needed (together with what we start with on the diagonal in the lifting construction), in order to construct filters so that the prototype filter has 512 coefficients. Since there are 15 submatrices, this gives 240 optimization variables.
Lifting gives the following strategy for finding a corresponding synthesis prototype filter which gives perfect reconstruction: First compute matrices $V, W$ which are inverses of one another using lifting (using the lifting steps of this exercise ensures that all filters will be on the form stated in Equation (8.19)), and write
$$VW = \begin{pmatrix} V^{(1)} & V^{(2)} \\ -V^{(3)} & V^{(4)} \end{pmatrix}\begin{pmatrix} W^{(1)} & -W^{(3)} \\ W^{(2)} & W^{(4)} \end{pmatrix} = \begin{pmatrix} V^{(1)} & V^{(2)} \\ -V^{(3)} & V^{(4)} \end{pmatrix}\begin{pmatrix} (W^{(1)})^T & (W^{(2)})^T \\ -(W^{(3)})^T & (W^{(4)})^T \end{pmatrix}^T$$
$$= \begin{pmatrix} V^{(1)} & V^{(2)} \\ -V^{(3)} & V^{(4)} \end{pmatrix}\begin{pmatrix} E_{15}(W^{(1)})^T & E_{15}(W^{(2)})^T \\ -E_{15}(W^{(3)})^T & E_{15}(W^{(4)})^T \end{pmatrix}\begin{pmatrix} E_{15} & 0 \\ 0 & E_{15} \end{pmatrix}^{-1} = I.$$

Now, the matrices $U^{(i)} = E_{15}(W^{(i)})^T$ are on the form stated in Equation (8.19), and we have that
$$\begin{pmatrix} V^{(1)} & V^{(2)} \\ -V^{(3)} & V^{(4)} \end{pmatrix}\begin{pmatrix} U^{(1)} & U^{(2)} \\ -U^{(3)} & U^{(4)} \end{pmatrix} = \begin{pmatrix} E_{-15} & 0 \\ 0 & E_{-15} \end{pmatrix}.$$
We can now conclude from Theorem 8.13 that if we define the synthesis prototype
filter as therein, and set c = 1, d = −15, we have that GH = 16E481−32·15 =
16E1 .

8.4 Summary
We defined the polyphase representation of a matrix, and proved some useful
properties. For filter bank transforms, the polyphase representation was a block
matrix where the blocks are filters, and these blocks/filters were called polyphase
components. In particular, the filter bank transforms of wavelets were 2 × 2-block
matrices of filters. We saw that, for wavelets, the polyphase representation could
be realized through a rearrangement of the wavelet bases, and thus paralleled
the development in Chapter 6 for expressing the DWT in terms of filters, where
we instead rearranged the target base of the DWT.
We showed with two examples that factoring the polyphase representation
into simpler matrices (also referred to as a polyphase factorization) could be
a useful technique. First, for wavelets (M = 2), we established the lifting
factorization. This is useful not only since it factorizes the DWT and the IDWT
into simpler operations, but also since it reduces the number of arithmetic
operations in these. The lifting factorization is therefore also used in practical
implementations, and we applied it to some of the wavelets we constructed in
Chapter 7. The JPEG2000 standard document [21] explains a procedure for
implementing some of these wavelet transforms using lifting, and the values of
the lifting steps used in the standard thus also appear here.
The polyphase representation was also useful for proving the characterization
of wavelets we encountered in Chapter 7, which we used to find expressions for
many useful wavelets.
The polyphase representation was also useful to explain how the prototype
filters of the MP3 standard should be chosen, in order for the reverse filter bank
transform to invert the forward filter bank transform. Again this was attacked
by factoring the polyphase representation of the forward and reverse filter bank
transforms. The parts of the factorization which represented the prototype
filters were represented by a sparse matrix, and it was clear from this matrix
what properties we needed to put on the prototype filter, in order to have alias
cancellation, and no phase distortion. In fact, we proved that the MP3 standard
could not possible give perfect reconstruction, but it was very clear from our
construction how the filter bank could be modified in order for the overall system
to provide perfect reconstruction.
The lifting scheme as introduced here was first proposed by Sweldens [45].
How to use lifting for in-place calculation for the DWT was also suggested by
Sweldens [44].
This development concludes the one-dimensional aspect of wavelets in this
book. In the following we will extend our theory to also apply for images. Images
will be presented in Chapter 9. After that we will define the tensor product
concept, which will be the key ingredient to apply wavelets to two-dimensional
objects such as images.
Chapter 9

Digital images

Up to now we have presented wavelets in a one-dimensional setting. Images,


however, are two-dimensional by nature. This poses another challenge, which we
did not encounter in the case of sound signals. In this chapter we will establish
the mathematics to handle this, but first we will present some basics on images,
as well as how they can be represented and manipulated with simple mathematics.
Images are a very important type of digital media, and this material is thus useful,
general knowledge for anyone with a digital camera and a computer. For many
scientists this material is also an essential tool. As an example, in astrophysics
data from both satellites and distant stars and galaxies is collected in the form
of images, and information is extracted from the images with advanced image
processing techniques. As another example, medical imaging makes it possible
to gather different kinds of information in the form of images, even from the
inside of the body. By analysing these images it is possible to discover tumours
and other disorders.
We will see how filter-based operations extend naturally to the two-dimensional
setting of images. Smoothing and edge detections are the two main examples
of filter-based operations we will consider for images. The key mathematical
concept in this extension is the tensor product, which can be thought of as
a general tool for constructing two-dimensional objects from one-dimensional
counterparts. We will also see that the tensor product allows us to establish an
efficient implementation of filtering for images, efficient meaning a complexity
substantially less than what is required by general linear transformations.
We will finally consider useful coordinate changes for images. Recall that
the DFT, the DCT, and the wavelet transform were all defined as changes of
coordinates for vectors or functions of one variable, and therefore cannot be
directly applied to two-dimensional data like images. It turns out that the tensor
product can also be used to extend changes of coordinates to a two-dimensional
setting.
Functionality for accessing images is collected in a module called images.


9.1 What is an image?


Before we do computations with images, it is helpful to be clear about what an
image really is. Images cannot be perceived unless there is some light present,
so we first review superficially what light is.

Light.
Fact 9.1. Light.
Light is electromagnetic radiation with wavelengths in the range 400–700 nm (1 nm is $10^{-9}$ m): violet has wavelength 400 nm and red has wavelength 700 nm. White light contains roughly equal amounts of all wavelengths.

Other examples of electromagnetic radiation are gamma radiation, ultraviolet


and infrared radiation and radio waves, and all electromagnetic radiation travels
at the speed of light ($\approx 3 \times 10^8$ m/s). Electromagnetic radiation consists of
waves and may be reflected and refracted, just like sound waves (but sound
waves are not electromagnetic waves).
We can only see objects that emit light, and there are two ways that this
can happen. The object can emit light itself, like a lamp or a computer monitor,
or it reflects light that falls on it. An object that reflects light usually absorbs
light as well. If we perceive the object as red it means that the object absorbs
all light except red, which is reflected. An object that emits light is different; if
it is to be perceived as being red it must emit only red light.

Digital output media. Our focus will be on objects that emit light, for
example a computer display. A computer monitor consists of a matrix of small
dots which emit light. In most technologies, each dot is really three smaller dots,
and each of these smaller dots emit red, green and blue light. If the amounts of
red, green and blue is varied, our brain merges the light from the three small
light sources and perceives light of different colors. In this way the color at each
set of three dots can be controlled, and a color image can be built from the total
number of dots.
It is important to realise that it is possible to generate most, but not all,
colors by mixing red, green and blue. In addition, different computer monitors
use slightly different red, green and blue colors, and unless this is taken into
consideration, colors will look different on the two monitors. This also means
that some colors that can be displayed on one monitor may not be displayable
on a different monitor.
Printers use the same principle of building an image from small dots. On
most printers however, the small dots do not consist of smaller dots of different
colors. Instead as many as 7–8 different inks (or similar substances) are mixed
to the right color. This makes it possible to produce a wide range of colors, but
not all, and the problem of matching a color from another device like a monitor
is at least as difficult as matching different colors across different monitors.

Video projectors build an image that is projected onto a wall. The final
image is therefore a reflected image and it is important that the surface is white
so that it reflects all colors equally.
The quality of a device is closely linked to the density of the dots.
Fact 9.2. Resolution.
The resolution of a medium is the number of dots per inch (dpi). The number
of dots per inch for monitors is usually in the range 70–120, while for printers it is
in the range 150–4800 dpi. The horizontal and vertical densities may be different.
On a monitor the dots are usually referred to as pixels (picture elements).

Digital input media. The two most common ways to acquire digital images
is with a digital camera or a scanner. A scanner essentially takes a photo of a
document in the form of a matrix of (possibly colored) dots. As for printers, an
important measure of quality is the number of dots per inch.
Fact 9.3. Scanners.
The resolution of a scanner usually varies in the range 75 dpi to 9600 dpi,
and the color is represented with up to 48 bits per dot.
For digital cameras it does not make sense to measure the resolution in dots
per inch, as this depends on how the image is printed (its size). Instead the
resolution is measured in the number of dots recorded.
Fact 9.4. Pixels.
The number of pixels recorded by a digital camera usually varies in the range
320 × 240 to 6000 × 4000 with 24 bits of color information per pixel. The total
number of pixels varies in the range 76 800 to 24 000 000 (0.077 megapixels to
24 megapixels).
For scanners and cameras it is easy to think that the more dots (pixels), the
better the quality. Although there is some truth to this, there are many other
factors that influence the quality. The main problem is that the measured color
information is very easily polluted by noise. And of course high resolution also
means that the resulting files become very big; an uncompressed 6000 × 4000
image produces a 72 MB file. The advantage of high resolution is that you can
magnify the image considerably and still maintain reasonable quality.

Definition of digital image. We have already talked about digital images,


but we have not yet been precise about what they are. From a mathematical
point of view, an image is quite simple.
Fact 9.5. Digital image.
A digital image $P$ is a matrix of intensity values $\{p_{i,j}\}_{i,j=1}^{M,N}$. For grey-level images, the value $p_{i,j}$ is a single number, while for color images each $p_{i,j}$ is a vector of three or more values. If the image is recorded in the rgb-model, each $p_{i,j}$ is a vector of three values,
$$p_{i,j} = (r_{i,j}, g_{i,j}, b_{i,j}),$$
that denote the amount of red, green and blue at the point $(i, j)$.
Note that, when referring to the coordinates $(i, j)$ in an image, $i$ will refer to the row index and $j$ to the column index, in the same way as for matrices. In particular, the top row in the image has coordinates $\{(0, j)\}_{j=0}^{N-1}$, while the left column in the image has coordinates $\{(i, 0)\}_{i=0}^{M-1}$. With this notation, the dimension of the image is $M \times N$. The value $p_{i,j}$ gives the color information at the point $(i, j)$.
It is important to remember that there are many formats for this. The simplest
case is plain black and white images in which case pi,j is either 0 or 1. For
grey-level images the intensities are usually integers in the range 0–255. However,
we will assume that the intensities vary in the interval [0, 1], as this sometimes
simplifies the form of some mathematical functions. For color images there are
many different formats, but we will just consider the rgb-format mentioned in
the fact box. Usually the three components are given as integers in the range
0–255, but as for grey-level images, we will assume that they are real numbers
in the interval [0, 1] (the conversion between the two ranges is straightforward,
see Example 9.3 below).

Figure 9.1: Our test image.

In Figure 9.1 we have shown the test image we will work with, called the
Lena image. It is named after the girl in the image. This image is also used as a
test image in many textbooks on image processing.
In Figure 9.2 we have shown the corresponding black and white, and grey-level
versions of the test image.
Fact 9.6. Intensity.
In these notes the intensity values pi,j are assumed to be real numbers in the
interval [0, 1]. For color images, each of the red, green, and blue intensity values
are assumed to be real numbers in [0, 1].

Figure 9.2: Black and white (left), and grey-level (right) versions of the image
in Figure 9.1.

Figure 9.3: 18 × 18 pixels excerpt of the color image in Figure 9.1. The grid
indicates the borders between the pixels.

If we magnify the part of the color image in Figure 9.1 around one of the
eyes, we obtain the images in figures 9.3-9.4. As we can see, the pixels have
been magnified to big squares. This is a standard representation used by many
programs — the actual shape of the pixels will depend on the output medium.
Nevertheless, we will consider the pixels to be square, with integer coordinates
at their centers, as indicated by the grids in figures 9.3-9.4.
Fact 9.7. Shape of pixel.

Figure 9.4: 50 × 50 pixels excerpt of the color image in Figure 9.1.

The pixels of an image are assumed to be square with sides of length one,
with the pixel with value pi,j centered at the point (i, j).

9.2 Some simple operations on images with Python


Images are two-dimensional matrices of numbers, contrary to the sound signals
we considered in the previous section. In this respect it is quite obvious that we
can manipulate an image by performing mathematical operations on the numbers.
In this section we will consider some of the simpler operations. In later sections
we will go through more advanced operations, and explain how the theory for
these can be generalized from the corresponding theory for one-dimensional
(sound) signals (which we will go through first).
In order to perform these operations, we need to be able to use images with
a programming environment.
An image can also be thought of as a matrix, by associating each pixel with
an element in a matrix. The matrix indices thus correspond to positions in the
pixel grid. Black and white images correspond to matrices where the elements
are natural numbers between 0 and 255. To store a color image, we need 3
matrices, one for each color component. We will also view this as a 3-dimensional
matrix. In the following, operations on images will be implemented in such
a way that they are applied to each color component simultaneously. This is
similar to the FFT and the DWT, where the operations were applied to each
sound channel simultaneously.
Since images are viewed as 2-dimensional or 3-dimensional matrices, we can
use any linear algebra software in order to work with images. After we now have
made the connection with matrices, we can create images from mathematical
formulas, just as we could with sound in the previous sections. But what we also need before we go through operations on images is, as in the sections on sound, a means of reading an image from a file so that its contents are accessible as a matrix, and of writing images represented by a matrix which we have constructed ourselves to file. Reading an image from file can be done with the help of the function imread. If we write

X = double(imread(’filename.fmt’, ’fmt’))

the image with the given path and format is read, and stored in the matrix
which we call X. ’fmt’ can be ’jpg’,’tif’, ’gif’, ’png’, and so on. This parameter is
optional: If it is not present, the program will attempt to determine the format
from the first bytes in the file, and from the filename. After the call to imread,
we have a matrix where the entries represent the pixel values, and of integer
data type (more precisely, the data type uint8). To perform operations on the
image, we must first convert the entries to the data type double, as shown above.
Similarly, the function imwrite
can be used to write the image represented by a matrix to file. If we write

imwrite(uint8(X), ’filename.fmt’, ’fmt’)

the image represented by the matrix X is written to the given path, in the given
format. Before the image is written to file, you see that we have converted the
matrix values back to the integer data type. In other words: imread and imwrite
both assume integer matrix entries, while operations on matrices assume double
matrix entries. If you want to print images you have created yourself, you can
use this function first to write the image to a file, and then send that file to
the printer using another program. Finally, we need an alternative to playing a
sound, namely displaying an image. The function imshow(uint8(X)) displays
the matrix X as an image in a separate window. Also here we needed to convert
the samples using the function uint8.
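
Since this is the Python version of the text, here is a minimal sketch of how the same read/write/display workflow can look with common Python libraries. The use of matplotlib and numpy here is our own assumption (the images module mentioned above may wrap this differently), and we assume an RGB image.

import numpy as np
import matplotlib.pyplot as plt

img = plt.imread('lena.png')               # PNG files are returned as floats in [0, 1]
X = np.asarray(img, dtype=float) * 255     # work with values in [0, 255] as in the text
# ... operations on X ...
plt.imsave('newfile.png', (X / 255).clip(0, 1))   # write the result back to file
plt.imshow(X.astype(np.uint8))                    # display the image
plt.show()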
The following examples go through some much used operations on images.

Example 9.1: Normalising the intensities


We have assumed that the intensities all lie in the interval [0, 1], but as we noted,
many formats in fact use integer values in the range [0,255]. And as we perform
computations with the intensities, we quickly end up with intensities outside
[0, 1] even if we start out with intensities within this interval. We therefore need
to be able to normalise the intensities. This we can do with the simple linear
function
x−a
g(x) = , a < b,
b−a
which maps the interval [a, b] to [0, 1]. A simple case is mapping [0, 255] to [0, 1]
which we accomplish with the scaling g(x) = x/255. More generally, we typically
perform computations that result in intensities outside the interval [0, 1]. We
can then compute the minimum and maximum intensities $p_{\min}$ and $p_{\max}$ and map the interval $[p_{\min}, p_{\max}]$ back to $[0, 1]$. Below we have shown a function mapto01 which achieves this task.

def mapto01(X):
minval, maxval = X.min(), X.max()
X -= minval
X /= (maxval-minval)

Several examples of using this function will be shown below. A good question here is why the minimum and maximum are computed over the entire array at once. The reason is that there is a third "dimension" in play, besides the spatial x- and y-directions. This dimension describes the color components in each pixel, which are usually the red, green, and blue color components, and the minimum and maximum are taken over this dimension as well.
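
As a small usage illustration (our own example), assuming a numpy array of floating point values:

X = array([[1.2, -0.3], [0.5, 0.9]])
mapto01(X)
# X is now [[1.0, 0.0], [0.53333333, 0.8]]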

Example 9.2: Extracting the different colors


If we have a color image
$$P = (r_{i,j}, g_{i,j}, b_{i,j})_{i,j=1}^{m,n},$$
it is often useful to manipulate the three color components separately as the three images
$$P_r = (r_{i,j})_{i,j=1}^{m,n}, \quad P_g = (g_{i,j})_{i,j=1}^{m,n}, \quad P_b = (b_{i,j})_{i,j=1}^{m,n}.$$

As an example, let us first see how we can produce three separate images, showing the R, G, and B color components, respectively. Let us take the image lena.png used in Figure 9.1. When the image has been read into the array img, the returned object has three dimensions. The first two dimensions represent the spatial directions (the row-index and column-index). The third dimension represents the color component. One can therefore view images representing the different color components with the help of the following code:

X1 = zeros_like(img)
X1[:, :, 0] = img[:, :, 0]
X2 = zeros_like(img)
X2[:, :, 1] = img[:, :, 1]
X3 = zeros_like(img)
X3[:, :, 2] = img[:, :, 2]

The resulting images are shown in Figure 9.5.

Figure 9.5: The red, green, and blue components of the color image in Figure 9.1.

Example 9.3: Converting from color to grey-level


If we have a color image we can convert it to a grey-level image. This means
that at each point in the image we have to replace the three color values (r, g, b)
by a single value p that will represent the grey level. If we want the grey-level
image to be a reasonable representation of the color image, the value p should
somehow reflect the intensity of the image at the point. There are several ways
to do this.
It is not unreasonable to use the largest of the three color components as a measure of the intensity, i.e., to set $p = \max(r, g, b)$. An alternative is to use the sum of the three values as a measure of the total intensity at the point. This corresponds to setting $p = r + g + b$. Here we have to be a bit careful with a subtle point. We have required each of the $r$, $g$ and $b$ values to lie in the range $[0, 1]$, but their sum may of course become as large as 3. We also require our grey-level values to lie in the range $[0, 1]$ so after having computed all the sums we must normalise as explained above. A third possibility is to think of the intensity of $(r, g, b)$ as the length of the color vector, in analogy with points in space, and set $p = \sqrt{r^2 + g^2 + b^2}$. Again, we may end up with values in the range $[0, \sqrt{3}]$ so we have to normalise like we did in the second case.
Let us sum this up as follows: A color image $P = (r_{i,j}, g_{i,j}, b_{i,j})_{i,j=1}^{m,n}$ can be converted to a grey level image $Q = (q_{i,j})_{i,j=1}^{m,n}$ by one of the following three operations:

• Set $q_{i,j} = \max(r_{i,j}, g_{i,j}, b_{i,j})$ for all $i$ and $j$.

• Compute $\hat q_{i,j} = r_{i,j} + g_{i,j} + b_{i,j}$ for all $i$ and $j$, and transform all the values to the interval $[0, 1]$ by setting
$$q_{i,j} = \frac{\hat q_{i,j}}{\max_{k,l}\hat q_{k,l}}.$$

• Compute $\hat q_{i,j} = \sqrt{r_{i,j}^2 + g_{i,j}^2 + b_{i,j}^2}$ for all $i$ and $j$, and transform all the values to the interval $[0, 1]$ by setting
$$q_{i,j} = \frac{\hat q_{i,j}}{\max_{k,l}\hat q_{k,l}}.$$

In practice one of the last two methods is preferred, perhaps with a preference for the last method, but the actual choice depends on the application. These can be implemented as follows.

mx = maximum(img[:, :, 0], img[:, :, 1])


X1 = maximum(img[:, :, 2], mx)
X2 = img[:, :, 0] + img[ :, :, 1] + img[ :, :, 2]
mapto01(X2)
X2 *= 255
X3 = sqrt(img[:,:,0]**2 + img[:, :, 1]**2 + img[:, :, 2]**2)
mapto01(X3)
X3 *= 255

The results of applying these three operations can be seen in Figure 9.6.

Figure 9.6: Alternative ways to convert the color image in Figure 9.1 to a grey
level image.

Example 9.4: Computing the negative image


In film-based photography a negative image was obtained when the film was
developed, and then a positive image was created from the negative. We can
easily simulate this and compute a negative digital image.
Suppose we have a grey-level image P = (pi,j )m,ni,j=1 with intensity values in
the interval [0, 1]. Here intensity value 0 corresponds to black and 1 corresponds
to white. To obtain the negative image we just have to replace an intensity p
by its ’mirror value’ 1 − p. This is also easily translated to code as above. The
resulting image is shown in Figure 9.7.
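
A possible implementation (our own sketch, in the same style as the other functions above, and with a function name of our own choosing) is:

def negativeimage(X):
    """
    Assumes that the values are in [0,255]
    """
    X *= -1
    X += 255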

Figure 9.7: The negative versions of the corresponding images in Figure 9.6.

Example 9.5: Increasing the contrast


A common problem with images is that the contrast often is not good enough.
This typically means that a large proportion of the grey values are concentrated
in a rather small subinterval of [0, 1]. The obvious solution to this problem is
to somehow spread out the values. This can be accomplished by applying a
monotone function f which maps [0, 1] onto [0, 1]. If we choose f so that its
derivative is large in the area where many intensity values are concentrated, we
obtain the desired effect. We will consider two such families of functions:

$$f_n(x) = \frac{\arctan(n(x - 1/2))}{2\arctan(n/2)} + \frac{1}{2} \qquad (9.1)$$
$$g_\epsilon(x) = \frac{\ln(x + \epsilon) - \ln\epsilon}{\ln(1 + \epsilon) - \ln\epsilon}. \qquad (9.2)$$
The first type of function has a quite large derivative near x = 0.5 and will
therefore increase the contrast in images with a concentration of intensities with
value around 0.5. The second type of function has a large derivative near
x = 0 and will therefore increase the contrast in images with a large proportion
of small intensity values, i.e., very dark images. Figure 9.8 shows some examples
of these functions. The three functions in the left plot in Figure 9.8 are f_4, f_10,
and f_100; the ones shown in the right plot are g_0.1, g_0.01, and g_0.001.

Figure 9.8: Some functions that can be used to improve the contrast of an
image. Left: f_n for n = 4, 10, 100. Right: g_ε for ε = 0.1, 0.01, 0.001.

Figure 9.9: The result after applying f10 and g0.01 to the test image.

In Figure 9.9 f10 and g0.01 have been applied to the image in the right part
of Figure 9.6. Since the image was quite well balanced, f10 made the dark areas
too dark and the bright areas too bright. g0.01 on the other hand has made the
image as a whole too bright.
Increasing the contrast is easy to implement. The following function uses the
contrast adjusting function from Equation (9.2), with ε as parameter.

def contrastadjust(X, epsilon):
    """
    Apply the contrast adjusting function g_epsilon from Equation (9.2), in place.
    Assumes that the values are in [0,255]
    """
    X /= 255.                              # map the values to [0, 1]
    X += epsilon
    log(X, X)                              # in-place logarithm: X = log(X + epsilon)
    X -= log(epsilon)
    X /= (log(1+epsilon)-log(epsilon))
    X *= 255                               # map the values back to [0,255]

def contrastadjust0(X,n):
"""
Assumes that the values are in [0,255]
"""
X /= 255.
X -= 1/2.
X *= n
arctan(X, X)
X /= (2*arctan(n/2.))
X += 1/2.0
X *= 255 # Maps the values back to [0,255]

This has been used to generate the right image in Figure 9.9.
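A hypothetical usage of the two functions (assuming X is a NumPy array of
grey-level values in [0, 255], stored as floating point numbers):

Y1 = X.copy()
contrastadjust(Y1, 0.01)    # apply g_0.01 from Equation (9.2)
Y2 = X.copy()
contrastadjust0(Y2, 10)     # apply f_10 from Equation (9.1)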

Exercise 9.6: Generate black and white images


Black and white images can be generated from greyscale images (with values
between 0 and 255) by replacing each pixel value with the one of 0 and 255
which is closest. Use this strategy to generate the black and white image shown
in the right part of Figure 9.2.

Exercise 9.7: Adjust contrast in images


a) Write a function contrastadjust0 which instead uses the function from
Equation (9.1) to increase the contrast. n should be a parameter to the function.
b) Generate the left and right images in Figure 9.9 on your own by writing code
which uses the two functions contrastadjust0 and contrastadjust.

Exercise 9.8: Adjust contrast with another function


In this exercise we will look at another function for increasing the contrast of a
picture.
a) Show that the functions f_n : R → R given by

f_n(x) = xⁿ

map the interval [0, 1] into [0, 1] for all n, and that f_n′(1) → ∞ as n → ∞.
b) The color image secret.jpg, shown in Figure 9.10, contains some informa-
tion that is nearly invisible to the naked eye on most computer monitors. Use
the functions f_n(x) to reveal the secret message.

Hint. You will first need to convert the image to a greyscale image. You can
then use the function contrastadjust as a starting point for your own program.

Figure 9.10: Secret message.

9.3 Filter-based operations on images


The next examples of operations on images we consider will use filters. These
examples define what it means to apply a filter to two-dimensional data. We
start with the following definition of a computational molecule. This term stems
from image processing, and seems at the outset to be unrelated to filters.
Definition 9.8. Computational molecules.
We say that an operation S on an image X is given by the computational
molecule

$$A = \begin{pmatrix}
\ddots & \vdots & \vdots & \vdots & \ddots \\
\cdots & a_{-1,-1} & a_{-1,0} & a_{-1,1} & \cdots \\
\cdots & a_{0,-1} & \underline{a_{0,0}} & a_{0,1} & \cdots \\
\cdots & a_{1,-1} & a_{1,0} & a_{1,1} & \cdots \\
\ddots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}$$

if we have that

$$(SX)_{i,j} = \sum_{k_1,k_2} a_{k_1,k_2} X_{i-k_1,j-k_2}. \qquad (9.3)$$

In the molecule, indices are allowed to be both positive and negative, we underline
the element with index (0, 0) (the center of the molecule), and assume that ai,j
with indices falling outside those listed in the molecule are zero (as for compact
filter notation).
In Equation (9.3), it is possible for the indices i − k1 and j − k2 to fall
outside the legal range for X. We will solve this case in the same way as we

did for filters, namely that we assume that X is extended (either periodically
or symmetrically) in both directions. The interpretation of a computational
molecule is that we place the center of the molecule on a pixel, multiply the
pixel and its neighbors by the corresponding weights ai,j in reverse order, and
finally sum up in order to produce the resulting value. This type of operation
will turn out to be particularly useful for images. The following result expresses
how computational molecules and filters are related. It states that, if we apply
one filter to all the columns, and then another filter to all the rows, the end
result can be expressed with the help of a computational molecule.
Theorem 9.9. Filtering and computational molecules.
Let S1 and S2 be filters with compact filter notation t1 and t2 , respectively,
and consider the operation S where S1 is first applied to the columns in the
image, and then S2 is applied to the rows in the image. Then S is an operation
which can be expressed in terms of the computational molecule ai,j = (t1 )i (t2 )j .
Proof. Let X_{i,j} be the pixels in the image. When we apply S1 to the columns
of X we get the image Y defined by

$$Y_{i,j} = \sum_{k_1} (t_1)_{k_1} X_{i-k_1,j}.$$

When we apply S2 to the rows of Y we get the image Z defined by

$$Z_{i,j} = \sum_{k_2} (t_2)_{k_2} Y_{i,j-k_2} = \sum_{k_2} (t_2)_{k_2} \sum_{k_1} (t_1)_{k_1} X_{i-k_1,j-k_2}
= \sum_{k_1}\sum_{k_2} (t_1)_{k_1} (t_2)_{k_2} X_{i-k_1,j-k_2}.$$

Comparing with Equation (9.3) we see that S is given by the computational
molecule with entries $a_{i,j} = (t_1)_i (t_2)_j$.
Note that, when we filter an image with S1 and S2 in this way, the order
does not matter: since computing S1 X is the same as applying S1 to all columns
of X, and computing Y (S2 )T is the same as applying S2 to all rows of Y , the
combined filtering operation, denoted S, takes the form

S(X) = S1 X(S2 )T , (9.4)


and the fact that the order does not matter simply boils down to the fact that
it does not matter which of the left or right multiplications we perform first.
Applying S1 to the columns of X is what we call a vertical filtering operation,
while applying S2 to the rows of X is what we call a horizontal filtering operation.
We can thus state the following.

Observation 9.10. Order of vertical and horizontal filtering.


The order of vertical and horizontal filtering of an image does not matter.

Most computational molecules we will consider in the following can be
expressed in terms of filters as in this theorem, but clearly there also exist
computational molecules which are not of this form, since the matrix A with
entries a_{i,j} = (t_1)_i(t_2)_j has rank one, and a general computational molecule can
have any rank. In most of the examples the filters are symmetric.
Assume that the image is stored as the matrix X. In Exercise 9.13 you will be
asked to implement a function tensor_impl which computes the transformation
S(X) = S1 X(S2)^T, where X, S1, and S2 are input. If the computational molecule
is obtained by applying the filter S1 to the columns, and the filter S2 to the
rows, we can compute it with the following code (we have assumed that the
filter lengths are odd, and that the middle filter coefficient has index 0):

def S1fun(x):
    filterS(t1, x, True)   # t1: the compact filter coefficients of S1
def S2fun(x):
    filterS(t2, x, True)   # t2: the compact filter coefficients of S2
tensor_impl(X, S1fun, S2fun)

We have here used the function filterS to implement the filtering, so that we
assume that the image is periodically or symmetrically extended. The above
code uses symmetric extension, and can thus be used for symmetric filters. If
the filter is non-symmetric, we should use a periodic extension instead, for which
the last parameter to filterS should be changed.
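A possible skeleton for such a function is sketched below (this anticipates
Exercise 9.13; it assumes that S1 and S2 modify their one-dimensional argument
in place, as filterS does, and relies on NumPy slices of X being views so that
the in-place changes propagate to X):

def tensor_impl(X, S1, S2):
    # Apply S1 to every column of X, then S2 to every row, in place.
    M, N = shape(X)[0:2]
    for n in range(N):
        S1(X[:, n])
    for m in range(M):
        S2(X[m, :])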

9.3.1 Tensor product notation for operations on images


Filter-based operations on images can be written compactly using what we
will call tensor product notation. This is part of a very general tensor product
framework, and we will review parts of this framework for the sake of completeness.
Let us first define the tensor product of vectors.
Definition 9.11. Tensor product of vectors.
If x, y are vectors of length M and N , respectively, their tensor product
x ⊗ y is defined as the M × N -matrix defined by (x ⊗ y)i,j = xi yj . In other
words, x ⊗ y = xy T .
The tensor product xy T is also called the outer product of x and y (contrary
to the inner product hx, yi = xT y). In particular x ⊗ y is a matrix of rank 1,
which means that most matrices cannot be written as a tensor product of two
vectors. The special case ei ⊗ ej is the matrix which is 1 at (i, j) and 0 elsewhere,
and the set of all such matrices forms a basis for the set of M × N -matrices.
Observation 9.12. Standard basis for L_{M,N}(R).
Let E_M = {e_i}_{i=0}^{M-1} and E_N = {e_j}_{j=0}^{N-1} be the standard bases for R^M and R^N.
Then

$$\mathcal{E}_{M,N} = \{e_i \otimes e_j\}_{(i,j)=(0,0)}^{(M-1,N-1)}$$

is a basis for LM,N (R), the set of M × N -matrices. This basis is often referred
to as the standard basis for LM,N (R).
The standard basis thus consists of rank 1-matrices. An image can simply be
thought of as a matrix in LM,N (R), and a computational molecule is simply a
special type of linear transformation from LM,N (R) to itself. Let us also define
the tensor product of matrices.
Definition 9.13. Tensor product of matrices.
If S1 : RM → RM and S2 : RN → RN are matrices, we define the linear
mapping S1 ⊗ S2 : LM,N (R) → LM,N (R) by linear extension of (S1 ⊗ S2 )(ei ⊗
ej ) = (S1 ei ) ⊗ (S2 ej ). The linear mapping S1 ⊗ S2 is called the tensor product
of the matrices S1 and S2 .
A couple of remarks are in order. First, from linear algebra we know that,
when S is linear mapping from V and S(vi ) is known for a basis {vi }i of V , S is
uniquely determined. In particular, since the {ei ⊗ej }i,j form a basis, there exists
a unique linear transformation S1 ⊗S2 so that (S1 ⊗S2 )(ei ⊗ej ) = (S1 ei )⊗(S2 ej ).
This unique linear transformation is what we call the linear extension from
the values in the given basis. Clearly, by linearity, also (S1 ⊗ S2 )(x ⊗ y) =
(S1 x) ⊗ (S2 y), since

$$\begin{aligned}
(S_1 \otimes S_2)(x \otimes y) &= (S_1 \otimes S_2)\Big(\big(\sum_i x_i e_i\big) \otimes \big(\sum_j y_j e_j\big)\Big) = (S_1 \otimes S_2)\Big(\sum_{i,j} x_i y_j (e_i \otimes e_j)\Big) \\
&= \sum_{i,j} x_i y_j (S_1 \otimes S_2)(e_i \otimes e_j) = \sum_{i,j} x_i y_j (S_1 e_i) \otimes (S_2 e_j) \\
&= \sum_{i,j} x_i y_j S_1 e_i (S_2 e_j)^T = S_1\big(\sum_i x_i e_i\big)\big(S_2\big(\sum_j y_j e_j\big)\big)^T \\
&= S_1 x (S_2 y)^T = (S_1 x) \otimes (S_2 y).
\end{aligned}$$
Here we used the result from Exercise 9.17. We can now prove the following.
Theorem 9.14. Compact filter notation and computational molecules.
If S1 : RM → RM and S2 : RN → RN are matrices of linear transformations,
then (S1 ⊗ S2 )X = S1 X(S2 )T for any X ∈ LM,N (R). In particular S1 ⊗ S2 is
the operation which applies S1 to the columns of X, and S2 to the resulting rows.
In other words, if S1 , S2 have compact filter notations t1 and t2 , respectively,
then S1 ⊗ S2 has computational molecule t1 ⊗ t2 .
We have not formally defined the tensor product of compact filter notations.
This is a straightforward extension of the usual tensor product of vectors, where
we additionally mark the element at index (0, 0).
Proof. We have that

$$(S_1 \otimes S_2)(e_i \otimes e_j) = (S_1 e_i) \otimes (S_2 e_j) = S_1 e_i (S_2 e_j)^T
= S_1 e_i (e_j)^T (S_2)^T = S_1 (e_i \otimes e_j)(S_2)^T.$$

This means that (S1 ⊗ S2 )X = S1 X(S2 )T for any X ∈ LM,N (R) also, since
equality holds on the basis vectors ei ⊗ ej . Since the matrix A with entries
ai,j = (t1 )i (t2 )j also can be written as t1 ⊗ t2 , the result follows.
We have thus shown that we alternatively can write S1 ⊗S2 for the operations
we have considered. This notation also makes it easy to combine several two-
dimensional filtering operations:
Corollary 9.15. Composing tensor products.
We have that (S1 ⊗ T1 )(S2 ⊗ T2 ) = (S1 S2 ) ⊗ (T1 T2 ).
Proof. By Theorem 9.14 we have that

$$(S_1 \otimes T_1)(S_2 \otimes T_2)X = S_1(S_2 X T_2^T)T_1^T = (S_1 S_2)X(T_1 T_2)^T = ((S_1 S_2) \otimes (T_1 T_2))X$$

for any X ∈ LM,N (R). This proves the result.


Suppose that we want to apply the operation S1 ⊗ S2 to an image. We can
factorize S1 ⊗ S2 as

S1 ⊗ S2 = (S1 ⊗ I)(I ⊗ S2 ) = (I ⊗ S2 )(S1 ⊗ I). (9.5)


Moreover, since

$(S_1 \otimes I)X = S_1 X$ and $(I \otimes S_2)X = X(S_2)^T = (S_2 X^T)^T$,

S1 ⊗ I is a vertical filtering operation, and I ⊗ S2 is a horizontal filtering


operation in this factorization. For filters we have an even stronger result: If
S1 , S2 , S3 , S4 all are filters, we have from Corollary 9.15 that (S1 ⊗S2 )(S3 ⊗S4 ) =
(S3 ⊗ S4 )(S1 ⊗ S2 ), since all filters commute. This does not hold in general since
general matrices do not commute.
We will now consider two important examples of filtering operations on
images: smoothing and edge detection/computing partial derivatives. For all
examples we will use the tensor product notation for these operations.

Example 9.9: Smoothing an image


When we considered filtering of digital sound, low-pass filters dampened high
frequencies. We will here similarly see that an image can be smoothed by
applying a low-pass filters to the rows and the columns. Let us consider such
computational molecules. In particular, let us as before take filter coefficients
taken from Pascal's triangle. If we use the filter S = (1/4){1, 2, 1} (row 2 from
Pascal's triangle), Theorem 9.9 gives the computational molecule

$$A = \frac{1}{16}\begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}. \qquad (9.6)$$

If the pixels in the image are p_{i,j}, this means that we compute the new pixels by

$$\hat{p}_{i,j} = \frac{1}{16}\big(4p_{i,j} + 2(p_{i,j-1} + p_{i-1,j} + p_{i+1,j} + p_{i,j+1})
+ p_{i-1,j-1} + p_{i+1,j-1} + p_{i-1,j+1} + p_{i+1,j+1}\big).$$

If we instead use the filter S = (1/64){1, 6, 15, 20, 15, 6, 1} (row 6 from Pascal's
triangle), we get the computational molecule

$$\frac{1}{4096}\begin{pmatrix}
1 & 6 & 15 & 20 & 15 & 6 & 1 \\
6 & 36 & 90 & 120 & 90 & 36 & 6 \\
15 & 90 & 225 & 300 & 225 & 90 & 15 \\
20 & 120 & 300 & 400 & 300 & 120 & 20 \\
15 & 90 & 225 & 300 & 225 & 90 & 15 \\
6 & 36 & 90 & 120 & 90 & 36 & 6 \\
1 & 6 & 15 & 20 & 15 & 6 & 1
\end{pmatrix}. \qquad (9.7)$$
We anticipate that both molecules give a smoothing effect, but that the second
molecule provides more smoothing. The result of applying the two molecules
in (9.6) and (9.7) to our greyscale-image is shown in the two right images in
Figure 9.11. With the help of the function tensor_impl, smoothing with the
first molecule (9.6) above can be obtained by writing

def S(x):
    filterS(array([1., 2., 1.])/4., x, True)
tensor_impl(X, S, S)

To make the smoothing effect visible, we have zoomed in on the face in the
image. The smoothing effect is clearly best visible in the second image.

Figure 9.11: The two right images show the effect of smoothing the left image.

Smoothing effects are perhaps more visible if we use a simple image, as the
one in the left part of Figure 9.12.
Again we have used the filter S = (1/4){1, 2, 1}. Here we have also shown what
happens if we only smooth the image in one of the directions. In the right

Figure 9.12: The results of smoothing the simple image to the left in three
different ways.

image we have smoothed in both directions, and there we clearly see the combined
effect of the two one-dimensional smoothing operations.

Example 9.10: Edge detection


Another operation on images which can be expressed in terms of computational
molecules is edge detection. An edge in an image is characterised by a large
change in intensity values over a small distance in the image. For a continuous
function this corresponds to a large derivative. An image is only defined at
isolated points, so we cannot compute derivatives, but since a grey-level image
is a scalar function of two variables, we have a perfect situation for applying
numerical differentiation techniques.

Partial derivative in x-direction. Let us first consider computation of


the partial derivative ∂P/∂x at all points in the image. Note first that it is
the second coordinate in an image which refers to the x-direction used when
plotting functions. This means that the familiar symmetric Newton quotient
approximation for the partial derivative [31] takes the form

$$\frac{\partial P}{\partial x}(i, j) \approx \frac{p_{i,j+1} - p_{i,j-1}}{2}, \qquad (9.8)$$

where we have used the convention h = 1, which means that the derivative is
measured in terms of 'intensity per pixel'. This corresponds to applying the
bass-reducing filter S = (1/2){1, 0, −1} to all the rows (alternatively, applying the
tensor product I ⊗ S to the image). We can thus express this in terms of the
computational molecule

$$\frac{1}{2}\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & -1 \\ 0 & 0 & 0 \end{pmatrix}.$$
We have included the two rows of 0s just to make it clear how the computa-
tional molecule is to be interpreted when we place it over the pixels. Let us first
apply this molecule to the usual excerpt of the Lena image. This gives the first
image in Figure 9.13. This image shows many artefacts since the pixel values

lie outside the legal range: many of the intensities are in fact negative. More
specifically, the intensities turn out to vary in the interval [−0.424, 0.418]. Let
us therefore normalise and map all intensities to [0, 1]. This gives the second
image in Figure 9.13. The predominant color of this image is an average grey,
i.e. an intensity of about 0.5. To get more detail in the image we therefore try
to increase the contrast by applying the function f50 in equation (9.1) to each
intensity value. The result is shown in the third image in Figure 9.13. This does
indeed show more detail.
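The steps just described can be sketched as follows (the function names follow
this chapter's conventions, but the exact calls are assumptions; the filter is
applied with periodic extension since it is not symmetric):

def diffx(x):
    filterS(array([1., 0., -1.])/2, x, False)   # the filter S = (1/2){1, 0, -1}
def ident(x):
    pass                                         # leave the other direction untouched
tensor_impl(X, ident, diffx)    # I tensor S: differentiate along the rows
mapto01(X)
X *= 255                        # normalise the values to the legal range
contrastadjust0(X, 50)          # increase the contrast with f_50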

Figure 9.13: Experimenting with the partial derivative in the x-direction for
the image in 9.6. The left image has artefacts, since the pixel values are outside
the legal range. We therefore normalize the intensities to lie in [0, 255] (middle),
before we increase the contrast (right).

It is important to understand the colors in these images. We have computed


the derivative in the x-direction, and we recall that the computed values varied
in the interval [−0.424, 0.418]. The negative value corresponds to the largest
average decrease in intensity from a pixel pi−1,j to a pixel pi+1,j . The positive
value on the other hand corresponds to the largest average increase in intensity.
A value of 0 in the left image in Figure 9.13 corresponds to no change in intensity
between the two pixels.
When the values are mapped to the interval [0, 1] in the second image, the
small values are mapped to something close to 0 (almost black), the maximal
values are mapped to something close to 1 (almost white), and the values near 0
are mapped to something close to 0.5 (grey). In the third image these values
have just been emphasised even more.
The third image tells us that in large parts of the image there is very little
variation in the intensity. However, there are some small areas where the intensity
changes quite abruptly, and if you look carefully you will notice that in these
areas there are typically both black and white pixels close together, like down the
vertical front corner of the bus. This will happen when there is a stripe of bright
or dark pixels that cut through an area of otherwise quite uniform intensity.

Partial derivative in y-direction. The partial derivative ∂P/∂y can be


computed analogously to ∂P/∂x, i.e. we apply the filter −S = (1/2){−1, 0, 1} to

all columns of the image (alternatively, apply the tensor product −S ⊗ I to the
image), where S is the filter which we used for edge detection in the x-direction.
Note that the positive direction of this axis in an image is opposite to the
direction of the y-axis we use when plotting functions. We can express this in
terms of the computational molecule

$$\frac{1}{2}\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}.$$
Let us compare the partial derivatives in both directions. The result is shown in
Figure 9.14.

Figure 9.14: The first-order partial derivatives in the x- and y-direction,


respectively. In both images, the computed numbers have been normalised and
the contrast enhanced.

The intensities have been normalised and the contrast enhanced by the
function f50 from Equation (9.1).

The gradient. The gradient of a scalar function is often used as a measure of


the size of the first derivative. The gradient is defined by the vector
$$\nabla P = \Big(\frac{\partial P}{\partial x}, \frac{\partial P}{\partial y}\Big),$$

so its length is given by

$$|\nabla P| = \sqrt{\Big(\frac{\partial P}{\partial x}\Big)^2 + \Big(\frac{\partial P}{\partial y}\Big)^2}.$$

When the two first derivatives have been computed it is a simple matter to
compute the gradient vector and its length. Note that, as for the first order
derivatives, it is possible for the length of the gradient to be outside the legal
range of values. The computed gradient values, the gradient mapped to the legal
range, and the gradient with contrast adjusted, are shown in Figure 9.15.
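A sketch of this computation (X1 and X2 are hypothetical arrays holding the
already computed partial derivatives ∂P/∂x and ∂P/∂y):

Z = sqrt(X1**2 + X2**2)   # the length of the gradient at each pixel
mapto01(Z)
Z *= 255                  # map the values to the legal range
contrastadjust(Z, 0.01)   # enhance the contrast with g_0.01, as described below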

Figure 9.15: The computed gradient (left). In the middle the intensities have
been normalised to [0, 255], and to the right the contrast has been increased.

The image of the gradient looks quite different from the images of the two
partial derivatives. The reason is that the numbers that represent the length of
the gradient are (square roots of) sums of squares of numbers. This means that
the parts of the image that have virtually constant intensity (partial derivatives
close to 0) are colored black. In the images of the partial derivatives these
values ended up in the middle of the range of intensity values, with a final color
of grey, since there were both positive and negative values. To enhance the
contrast for this image we should thus do something different from what was
done in the other images, since we now have a large number of intensities near
0. The solution was to apply a function like the ones shown in the right plot in
Figure 9.8. Here we have used the function g0.01 .
Figures 9.14 and 9.15 show the two first-order partial derivatives and the gradient. If
we compare the two partial derivatives we see that the x-derivative seems to
emphasise vertical edges while the y-derivative seems to emphasise horizontal
edges. This is precisely what we must expect. The x-derivative is large when
the difference between neighbouring pixels in the x-direction is large, which is
the case across a vertical edge. The y-derivative enhances horizontal edges for a
similar reason.
The gradient contains information about both derivatives and therefore
emphasises edges in all directions. It also gives a simpler image since the sign of
the derivatives has been removed.

Example 9.11: Second-order derivatives


To compute the three second order derivatives we can combine the two
computational molecules which we already have described. For the mixed second
order derivative we get (I ⊗ S)((−S) ⊗ I) = −S ⊗ S. For the last two second
order derivatives ∂²P/∂x², ∂²P/∂y², we can also use the three point approximation
to the second derivative [31]

$$\frac{\partial^2 P}{\partial x^2}(i, j) \approx p_{i,j+1} - 2p_{i,j} + p_{i,j-1} \qquad (9.9)$$

(again we have set h = 1). This gives a smaller molecule than if we combine
the two molecules for order one differentiation (i.e. (I ⊗ S)(I ⊗ S) = I ⊗ S²
and ((−S) ⊗ I)((−S) ⊗ I) = S² ⊗ I), since S² = (1/2){1, 0, −1} (1/2){1, 0, −1} =
(1/4){1, 0, −2, 0, 1}. The second order derivatives of an image P can thus be
computed by applying the computational molecules

$$\frac{\partial^2 P}{\partial x^2}: \quad \begin{pmatrix} 0 & 0 & 0 \\ 1 & -2 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \qquad (9.10)$$

$$\frac{\partial^2 P}{\partial y \partial x}: \quad \frac{1}{4}\begin{pmatrix} -1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & -1 \end{pmatrix}, \qquad (9.11)$$

$$\frac{\partial^2 P}{\partial y^2}: \quad \begin{pmatrix} 0 & 1 & 0 \\ 0 & -2 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \qquad (9.12)$$

With these molecules it is quite easy to compute the second-order derivatives.


The results are shown in Figure 9.16.
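A sketch of how the three second-order derivatives could be computed with
tensor_impl (the names Xxx, Xxy, Xyy denote three copies of the image, and the
filterS calls follow the assumptions made earlier in this chapter):

def S2nd(x):
    filterS(array([1., -2., 1.]), x, True)       # three-point second derivative
def S1st(x):
    filterS(array([1., 0., -1.])/2, x, False)    # first derivative
def negS1st(x):
    filterS(array([-1., 0., 1.])/2, x, False)
def ident(x):
    pass

tensor_impl(Xxx, ident, S2nd)     # d^2P/dx^2: filter along the rows
tensor_impl(Xyy, S2nd, ident)     # d^2P/dy^2: filter along the columns
tensor_impl(Xxy, negS1st, S1st)   # mixed derivative, (-S) tensor S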

Figure 9.16: The second-order partial derivatives in the xx-, xy-, and yy-
directions, respectively. In all images, the computed numbers have been nor-
malised and the contrast enhanced.

The computed derivatives were first normalised and then the contrast en-
hanced with the function f100 in each image, see equation (9.1).
As for the first derivatives, the xx-derivative seems to emphasise vertical
edges and the yy-derivative horizontal edges. However, we also see that the
second derivatives are more sensitive to noise in the image (the areas of grey are

less uniform). The mixed derivative behaves a bit differently from the other two,
and not surprisingly it seems to pick up both horizontal and vertical edges.
This procedure can be generalized to higher order derivatives also. To apply
$\frac{\partial^{k+l} P}{\partial x^k \partial y^l}$ to an image we can compute $S_l \otimes S_k$, where $S_r$ corresponds to any point
method for computing the r'th order derivative. We can also compute $(S^l) \otimes (S^k)$,
where we iterate the filter S = (1/2){1, 0, −1} for the first derivative, but this gives
longer filters.

Example 9.12: Chess pattern image


Let us apply the molecules for differentiation to a chess pattern test image. In
Figure 9.17 we have applied S ⊗ I, I ⊗ S, and S ⊗ S, I ⊗ S 2 , and S 2 ⊗ I to the
example image shown in the upper left.

Figure 9.17: Different tensor products applied to the simple chess pattern image
shown in the upper left.

These images make it clear that S ⊗ I detects all horizontal edges, I ⊗ S


detects all vertical edges, and that S ⊗ S detects all points where abrupt changes
appear in both directions. We also see that the second order partial derivative
detects exactly the same edges which the first order partial derivative found.
Note that the edges detected with I ⊗ S 2 are wider than the ones detected
with I ⊗ S. The reason is that the filter S 2 has more filter coefficients than
S. Also, edges are detected with different colors. This reflects whether the
difference between the neighbouring pixels is positive or negative. The values
after we have applied the tensor product may thus not lie in the legal range

of pixel values (since they may be negative). The figures have taken this into
account by mapping the values back to a legal range of values, as we did in
Chapter 9. Finally, we also see additional edges at the first and last rows/edges
in the images. The reason is that the filter S is defined by assuming that the
pixels repeat periodically (i.e. it is a circulant Toeplitz matrix). Due to this, we
have additional edges at the first/last rows/edges. This effect can also be seen in
Chapter 9, although there we did not assume that the pixels repeat periodically.
Defining a two-dimensional filter by filtering columns and then rows is not
the only way we can define a two-dimensional filter. Another possible way is
to let the M N × M N -matrix itself be a filter. Unfortunately, this is a bad way
to define filtering of an image, since there are some undesirable effects near the
boundaries between rows: in the vector we form, the last element of one row
is followed by the first element of the next row. These boundary effects are
unfortunate when a filter is applied.

Exercise 9.13: Implement a tensor product


Implement a function tensor_impl which takes a matrix X, and functions S1
and S2 as parameters, and applies S1 to the columns of X, and S2 to the rows of
X.

Exercise 9.14: Generate images


Write code which calls the function tensor_impl with appropriate filters and
which generate the following images:
a) The right image in Figure 9.11.
b) The right image in Figure 9.13.
c) The images in Figure 9.15.
d) The images in Figure 9.14.
e) The images in Figure 9.16.

Exercise 9.15: Interpret tensor products


Let the filter S be defined by S = {−1, 1}.
a) Let X be a matrix which represents the pixel values in an image. What can
you say about how the new images (S ⊗ I)X and (I ⊗ S)X look? What are the
interpretations of these operations?
b) Write down the 4 × 4-matrix X = (1, 1, 1, 1) ⊗ (0, 0, 1, 1). Compute (S ⊗ I)X
by applying the filters to the corresponding rows/columns of X as we have
learned, and interpret the result. Do the same for (I ⊗ S)X.

Exercise 9.16: Computational molecule of moving average filter

Let S be the moving average filter of length 2L + 1, i.e.
S = (1/L){1, . . . , 1, 1, 1, . . . , 1} (2L + 1 ones in total).
What is the computational molecule of S ⊗ S?

Exercise 9.17: Bilinearity of the tensor product


Show that the mapping F (x, y) = x ⊗ y is bi-linear, i.e. that F (αx1 + βx2 , y) =
αF (x1 , y) + βF (x2 , y), and F (x, αy1 + βy2 ) = αF (x, y1 ) + βF (x, y2 ).

Exercise 9.18: Attempt to write as tensor product


Attempt to find matrices S1 : RM → RM and S2 : RN → RN so that the
following mappings from LM,N (R) to LM,N (R) can be written on the form
X → S1 X(S2 )T = (S1 ⊗ S2 )X. In all the cases, it may be that no such S1 , S2
can be found. If this is the case, prove it.
a) The mapping which reverses the order of the rows in a matrix.
b) The mapping which reverses the order of the columns in a matrix.
c) The mapping which transposes a matrix.

Exercise 9.19: Computational molecules


Let the filter S be defined by S = {1, 2, 1}.
a) Write down the computational molecule of S ⊗ S.
b) Let us define x = (1, 2, 3), y = (3, 2, 1), z = (2, 2, 2), and w = (1, 4, 2).
Compute the matrix A = x ⊗ y + z ⊗ w.
c) Compute (S ⊗ S)A by applying the filter S to every row and column in the
matrix the way we have learned. If the matrix A was more generally an image,
what can you say about how the new image will look?

Exercise 9.20: Computational molecules 2


Let S = (1/4){1, 2, 1} be a filter.
a) What is the effect of applying the tensor products S ⊗ I, I ⊗ S, and S ⊗ S
on an image represented by the matrix X?
b) Compute (S ⊗ S)(x ⊗ y), where x = (4, 8, 8, 4), y = (8, 4, 8, 4) (i.e. both x
and y are column vectors).

Exercise 9.21: Comment on code


Suppose that we have an image given by the M × N -matrix X, and consider the
following code:

for n in range(N):
    X[0, n] = 0.25*X[M-1, n] + 0.5*X[0, n] + 0.25*X[1, n]
    X[1:(M-1), n] = 0.25*X[0:(M-2), n] + 0.5*X[1:(M-1), n] \
                    + 0.25*X[2:M, n]
    X[M-1, n] = 0.25*X[M-2, n] + 0.5*X[M-1, n] + 0.25*X[0, n]
for m in range(M):
    X[m, 0] = 0.25*X[m, N-1] + 0.5*X[m, 0] + 0.25*X[m, 1]
    X[m, 1:(N-1)] = 0.25*X[m, 0:(N-2)] + 0.5*X[m, 1:(N-1)] \
                    + 0.25*X[m, 2:N]
    X[m, N-1] = 0.25*X[m, N-2] + 0.5*X[m, N-1] + 0.25*X[m, 0]

Which tensor product is applied to the image? Comment what the code does, in
particular the first and third line in the inner for-loop. What effect does the
code have on the image?

Exercise 9.22: Eigenvectors of tensor products


Let vA be an eigenvector of A with eigenvalue λA , and vB an eigenvector of
B with eigenvalue λB . Show that vA ⊗ vB is an eigenvector of A ⊗ B with
eigenvalue λA λB . Explain from this why kA ⊗ Bk = kAkkBk, where k · k denotes
the operator norm of a matrix.

Exercise 9.23: The Kronecker product


The Kronecker tensor product of two matrices A and B, written A ⊗k B, is
defined as

$$A \otimes_k B = \begin{pmatrix}
a_{1,1}B & a_{1,2}B & \cdots & a_{1,M}B \\
a_{2,1}B & a_{2,2}B & \cdots & a_{2,M}B \\
\vdots & \vdots & \ddots & \vdots \\
a_{p,1}B & a_{p,2}B & \cdots & a_{p,M}B
\end{pmatrix},$$
where the entries of A are ai,j . The tensor product of a p × M -matrix, and a
q × N -matrix is thus a (pq) × (M N )-matrix. Note that this tensor product in
particular gives meaning for vectors: if x ∈ RM , y ∈ RN are column vectors,
then x ⊗k y ∈ RM N is also a column vector. In this exercise we will investigate
how the Kronecker tensor product is related to tensor products as we have
defined them in this section.
a) Explain that, if x ∈ RM , y ∈ RN are column vectors, then x ⊗k y is the
column vector where the rows of x ⊗ y have first been stacked into one large
row vector, and this vector transposed. The linear extension of the operation
defined by

x ⊗ y ∈ RM,N → x ⊗k y ∈ RM N

thus stacks the rows of the input matrix into one large row vector, and transposes
the result.
b) Show that (A ⊗k B)(x ⊗k y) = (Ax) ⊗k (By). We can thus use any of
the defined tensor products ⊗, ⊗k to produce the same result, i.e. we have the
commutative diagram shown in Figure 9.18, where the vertical arrows represent
stacking the rows in the matrix, and transposing, and the horizontal arrows
represent the two tensor product linear transformations we have defined. In
particular, we can compute the tensor product in terms of vectors, or in terms
of matrices, and it is clear that the Kronecker tensor product gives the matrix
of tensor product operations.

Figure 9.18: Tensor products. The commutative diagram shows x ⊗ y mapped to
(Ax) ⊗ (By) by A ⊗ B (top row), and x ⊗_k y mapped to (Ax) ⊗_k (By) by A ⊗_k B
(bottom row), with the vertical arrows representing the stacking of rows followed
by transposition.

c) Using the Euclidean inner product on L(M, N) = R^{MN}, i.e.

$$\langle X, Y\rangle = \sum_{i=0}^{M-1}\sum_{j=0}^{N-1} X_{i,j}Y_{i,j},$$

and the correspondence in a) we can define the inner product of x_1 ⊗ y_1 and
x_2 ⊗ y_2 by

$$\langle x_1 \otimes y_1, x_2 \otimes y_2\rangle = \langle x_1 \otimes_k y_1, x_2 \otimes_k y_2\rangle.$$

Show that

$$\langle x_1 \otimes y_1, x_2 \otimes y_2\rangle = \langle x_1, x_2\rangle\langle y_1, y_2\rangle.$$


Clearly this extends linearly to an inner product on LM,N .
d) Show that the FFT factorization can be written as

$$\begin{pmatrix} F_{N/2} & D_{N/2}F_{N/2} \\ F_{N/2} & -D_{N/2}F_{N/2} \end{pmatrix}
= \begin{pmatrix} I_{N/2} & D_{N/2} \\ I_{N/2} & -D_{N/2} \end{pmatrix}(I_2 \otimes_k F_{N/2}).$$
Also rewrite the sparse matrix factorization for the FFT from Equation (2.18)
in terms of tensor products.

9.4 Change of coordinates in tensor products


Filter-based operations were not the only operations we considered for sound.
We also considered the DFT, the DCT, and the wavelet transform, which
were changes of coordinates which gave us useful frequency- or time-frequency
information. We would like to define similar changes of coordinates for images,
which also give useful such information. Tensor product notation will also be
useful in this respect, and we start with the following result.
Theorem 9.16. The basis B1 ⊗ B2.
If B1 = {v_i}_{i=0}^{M-1} is a basis for R^M, and B2 = {w_j}_{j=0}^{N-1} is a basis for R^N, then
{v_i ⊗ w_j}_{(i,j)=(0,0)}^{(M-1,N-1)} is a basis for L_{M,N}(R). We denote this basis by B1 ⊗ B2.

Proof. Suppose that $\sum_{(i,j)=(0,0)}^{(M-1,N-1)} \alpha_{i,j}(v_i \otimes w_j) = 0$. Setting $h_i = \sum_{j=0}^{N-1} \alpha_{i,j}w_j$
we get

$$\sum_{j=0}^{N-1} \alpha_{i,j}(v_i \otimes w_j) = v_i \otimes \Big(\sum_{j=0}^{N-1} \alpha_{i,j}w_j\Big) = v_i \otimes h_i,$$

where we have used the bi-linearity of the tensor product mapping (x, y) → x ⊗ y
(Exercise 9.17). This means that

$$0 = \sum_{(i,j)=(0,0)}^{(M-1,N-1)} \alpha_{i,j}(v_i \otimes w_j) = \sum_{i=0}^{M-1} v_i \otimes h_i = \sum_{i=0}^{M-1} v_i h_i^T.$$

Column k in this matrix equation says $0 = \sum_{i=0}^{M-1} h_{i,k}v_i$, where $h_{i,k}$ are the
components in $h_i$. By linear independence of the $v_i$ we must have that $h_{0,k} =
h_{1,k} = \cdots = h_{M-1,k} = 0$. Since this applies for all k, we must have that all
$h_i = 0$. This means that $\sum_{j=0}^{N-1} \alpha_{i,j}w_j = 0$ for all i, from which it follows by
linear independence of the $w_j$ that $\alpha_{i,j} = 0$ for all j, and for all i. This means
that B1 ⊗ B2 is a basis.
In particular, as we have already seen, the standard basis for LM,N (R) can be
written EM,N = EM ⊗ EN . This is the basis for a useful convention: For a tensor
product the bases are most naturally indexed in two dimensions, rather than
the usual sequential indexing. This difference translates also to the meaning
of coordinate vectors, which now are more naturally thought of as coordinate
matrices:
Definition 9.17. Coordinate matrix.
Let B = {b_i}_{i=0}^{M-1}, C = {c_j}_{j=0}^{N-1} be bases for R^M and R^N, and let A ∈ L_{M,N}(R).
By the coordinate matrix of A in B ⊗ C we mean the M × N-matrix X (with
components $X_{k,l}$) such that $A = \sum_{k,l} X_{k,l}(b_k \otimes c_l)$.
We will have use for the following theorem, which shows how change of
coordinates in RM and RN translate to a change of coordinates in the tensor
product:

Theorem 9.18. Change of coordinates in tensor products.


Assume that
• B1 , C1 are bases for RM , and that S1 is the change of coordinates matrix
from B1 to C1 ,
• B2 , C2 are bases for RN , and that S2 is the change of coordinates matrix
from B2 to C2 .
Both B1 ⊗ B2 and C1 ⊗ C2 are bases for LM,N (R), and if X is the coordinate
matrix in B1 ⊗ B2 , and Y the coordinate matrix in C1 ⊗ C2 , then the change of
coordinates from B1 ⊗ B2 to C1 ⊗ C2 can be computed as

Y = S1 X(S2 )T . (9.13)
Proof. Denote the change of coordinates from B1 ⊗ B2 to C1 ⊗ C2 by S. Since
any change of coordinates is linear, it is enough to show that S(ei ⊗ ej ) =
S1 (ei ⊗ ej )(S2 )T for any i, j. We can write

$$\begin{aligned}
b^1_i \otimes b^2_j &= \Big(\sum_k (S_1)_{k,i} c^1_k\Big) \otimes \Big(\sum_l (S_2)_{l,j} c^2_l\Big) = \sum_{k,l} (S_1)_{k,i}(S_2)_{l,j}(c^1_k \otimes c^2_l) \\
&= \sum_{k,l} (S_1)_{k,i}((S_2)^T)_{j,l}(c^1_k \otimes c^2_l) = \sum_{k,l} (S_1 e_i (e_j)^T (S_2)^T)_{k,l}(c^1_k \otimes c^2_l) \\
&= \sum_{k,l} (S_1 (e_i \otimes e_j)(S_2)^T)_{k,l}(c^1_k \otimes c^2_l).
\end{aligned}$$

This shows that the coordinate matrix of b1i ⊗ b2j in C1 ⊗ C2 is S1 (ei ⊗ ej )(S2 )T .
Since the coordinate matrix of b1i ⊗ b2j in B1 ⊗ B2 is ei ⊗ ej , this shows that
S(ei ⊗ ej ) = S1 (ei ⊗ ej )(S2 )T . The result follows.
In both cases of filtering and change of coordinates in tensor products, we
see that we need to compute the mapping X → S1 X(S2 )T . As we have seen,
this amounts to a row/column-wise operation, which we restate as follows:
Observation 9.19. Change of coordinates in tensor products.
The change of coordinates from B1 ⊗ B2 to C1 ⊗ C2 can be implemented as
follows:
• For every column in the coordinate matrix in B1 ⊗ B2 , perform a change
of coordinates from B1 to C1 .
• For every row in the resulting matrix, perform a change of coordinates
from B2 to C2 .

We can again use the function tensor_impl in order to implement change


of coordinates for a tensor product. We just need to replace the filters with the
functions S1 and S2 for computing the corresponding changes of coordinates:

tensor_impl(X, S1, S2)

The operation X → S_1 X(S_2)^T, which we now have encountered in two different
ways, is one particular type of linear transformation from R^{N²} to itself (see
Exercise 9.23 for how the matrix of this linear transformation can be constructed).
While a general such linear transformation requires N⁴ multiplications (i.e. when
we perform a full matrix multiplication), X → S_1 X(S_2)^T can be implemented
generally with only 2N³ multiplications (since multiplication of two N × N-matrices
requires N³ multiplications in general). The operation X → S_1 X(S_2)^T
is thus computationally simpler than linear transformations in general. In
practice the operations S1 and S2 are also computationally simpler, since they
can be filters, FFTs, or wavelet transformations, so that the complexity in
X → S_1 X(S_2)^T can be even lower.
In the following examples, we will interpret the pixel values in an image as
coordinates in the standard basis, and perform a change of coordinates.

Example 9.24: Change of coordinates with the DFT


The DFT is one particular change of coordinates which we have considered.
It was the change of coordinates from the standard basis to the Fourier basis.
A corresponding change of coordinates in a tensor product is obtained by
substituting the DFT as the functions S1 , S2 for implementing the changes
of coordinates above. The change of coordinates in the opposite direction is
obtained by using the IDFT instead of the DFT.
Modern image standards do typically not apply a change of coordinates to
the entire image. Rather the image is split into smaller squares of appropriate
size, called blocks, and a change of coordinates is performed independently for
each block. In this example we have split the image into blocks of size 8 × 8.
Recall that the DFT values express frequency components. The same applies
for the two-dimensional DFT and thus for images, but frequencies are now
represented in two different directions. Let us introduce a neglection threshold
in the same way as in Example 2.17, to view the image after we set certain
frequencies to zero. As for sound, this has little effect on the human perception
of the image, if we use a suitable neglection threshold. After we have performed
the two-dimensional DFT on an image, we can neglect DFT-coefficients below a
threshold on the resulting matrix X with the following code:
X *= (abs(X) >= threshold)

abs(X) >= threshold now returns a matrix of ones and zeros of the same size as X,
so that the multiplication zeroes out the DFT coefficients below the threshold.
In Figure 9.19 we have applied the two-dimensional DFT to our test image.
We have then neglected DFT coefficients which are below certain thresholds,
and transformed the samples back to reconstruct the image. When increasing
the threshold, the image becomes more and more unclear, but the image is quite
clear in the first case, where as much as more than 76.6% of the samples have
been zeroed out. A blocking effect at the block boundaries is clearly visible.
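A sketch of how this can be done with NumPy's FFT (the book's own FFT
implementation could be used instead; we assume that the grey-level image X has
dimensions divisible by 8, and that threshold is given):

from numpy.fft import fft2, ifft2

M, N = shape(X)[0:2]
for i in range(0, M, 8):
    for j in range(0, N, 8):
        block = fft2(X[i:(i+8), j:(j+8)])         # two-dimensional DFT of the block
        block *= (abs(block) >= threshold)        # neglect small DFT coefficients
        X[i:(i+8), j:(j+8)] = real(ifft2(block))  # reconstruct the block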

Figure 9.19: The effect on an image when it is transformed with the DFT, and
the DFT-coefficients below a certain threshold are zeroed out. The threshold
has been increased from left to right, from 100, to 200, and 400. The percentage
of pixel values that were zeroed out are 76.6, 89.3, and 95.3, respectively.

Example 9.25: Change of coordinates with the DCT


Similarly to the DFT, the DCT was the change of coordinates from the standard
basis to what we called the DCT basis. Change of coordinates in tensor products
between the standard basis and the DCT basis is obtained by substituting with
the DCT and the IDCT for the changes of coordinates S1 , S2 above.
The DCT is used more than the DFT in image processing. In particular, the
JPEG standard applies a two-dimensional DCT, rather than a two-dimensional
DFT. With the JPEG standard, the blocks are always 8 × 8, as in the previous
example. It is of course not a coincidence that a power of 2 is chosen here: the
DCT, as the DFT, has an efficient implementation for powers of 2.
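As an illustration, the two-dimensional DCT of a single 8 × 8 block can be
computed by applying a one-dimensional DCT first to the columns and then to the
rows. A sketch using SciPy's DCT (the book's DCTImpl functions could be used
instead; block is a hypothetical 8 × 8 subarray of the image, and threshold is given):

from scipy.fftpack import dct, idct

Y = dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')     # 2D DCT of the block
Y *= (abs(Y) >= threshold)                                          # neglect small coefficients
block = idct(idct(Y, axis=0, norm='ortho'), axis=1, norm='ortho')   # reconstruct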
If we follow the same strategy for the DCT as for the DFT example, so that
we zero out DCT-coefficients which are below a given threshold¹, and use the
same block sizes, we get the images shown in Figure 9.20. We see similar effects
as with the DFT.
It is also interesting to compare with what happens when we drop splitting
the image into blocks. Of course, when we neglect many of the DCT-coefficients,
we should see some artifacts, but there is no reason to believe that these should
be at the old block boundaries. The new artifacts can be seen in Figure 9.21,
where the same thresholds as before have been used. Clearly, the new artifacts
take a completely different shape.
In the exercises you will be asked to implement functions which generate the
images shown in these examples.

¹ The JPEG standard does not do exactly the kind of thresholding described here.
Rather it performs what is called a quantization.

Figure 9.20: The effect on an image when it is transformed with the DCT, and
the DCT-coefficients below a certain threshold are zeroed out. The threshold
has been increased from left to right, from 30, to 50, and 100. The percentage of
pixel values that were zeroed out are 93.2, 95.8, and 97.7, respectively.

Figure 9.21: The effect on an image when it is transformed with the DCT, and
the DCT-coefficients below a certain threshold are zeroed out. The image has
not been split into blocks here, and the same thresholds as in Figure 9.20 were
used. The percentage of pixel values that were zeroed out are 93.2, 96.6, and
98.8, respectively.

Exercise 9.26: Implement DFT and DCT on blocks

In this section we have used functions which apply the DCT and the DFT either
to subblocks of size 8 × 8, or to the full image. Implement functions DFTImpl8,
IDFTImpl8, DCTImpl8, and IDCTImpl8 which apply the DFT, IDFT, DCT, and
IDCT to consecutive segments of length 8.

Exercise 9.27: Implement two-dimensional FFT and DCT


Implement functions DFTImplFull, IDFTImplFull, DCTImplFull, and IDCTImplFull
which apply the DFT, IDFT, DCT, and IDCT to the entire vector, and use
these to implement FFT2, IFFT2, DCT2, and IDCT2 on an image, with the
help of the function tensor_impl.

Exercise 9.28: Zeroing out DCT coefficients


The function forw_comp_rev_DCT2 in the module forw_comp_rev applies the
DCT to a part of our sample image, and sets DCT coefficients below a certain
threshold to be zero. This is very similar to what the JPEG standard does.
Run forw_comp_rev_DCT2 for different threshold parameters, and with the
functions DCTImpl8, IDCTImpl8, DCTImplFull, and IDCTImplFull as parame-
ters. Check that this reproduces the DCT test images of this section, and that
the correct numbers of values which have been neglected (i.e. which are below
the threshold) are printed on screen.

Exercise 9.29: Comment code


Suppose that we have given an image by the matrix X. Consider the following
code:

threshold = 30
[M, N] = shape(X)[0:2]
for n in range(N):
    FFTImpl(X[:, n], FFTKernelStandard)
for m in range(M):
    FFTImpl(X[m, :], FFTKernelStandard)
X = X*(abs(X) >= threshold)
for n in range(N):
    FFTImpl(X[:, n], FFTKernelStandard, 0)
for m in range(M):
    FFTImpl(X[m, :], FFTKernelStandard, 0)

Comment what the code does. Comment in particular on the meaning of the
parameter threshold, and what effect this has on the image.

9.5 Summary
We started by discussing the basic question what an image is, and took a closer
look at digital images. We then went through several operations which give
meaning for digital images. Many of these operations could be described in
terms of a row/column-wise application of filters, and more generally in term
of what we called computational molecules. We defined the tensor product,
and saw how our operations could be expressed within this framework. The
tensor product framework could also be used to state change of coordinates for
images, so that we could consider changes of coordinates such as the DFT and
the DCT also for images. The algorithm for computing filtering operations or
changes of coordinates for images turned out to be similar, in the sense that the
one-dimensional counterparts were simply applied to the rows and the columns
in the image.
In introductory image processing textbooks, many other image processing
methods are presented. We have limited ourselves to the techniques presented here, since

our interest in images is mainly for transformation operations which are useful
for compression. An excellent textbook on image processing which uses Matlab is
[18]. This contains important topics such as image restoration and reconstruction,
geometric transformations, morphology, and object recognition. None of these
are considered in this book.
In much literature, one only mentions that filtering can be extended to images
by performing one-dimensional filtering for the rows, followed by one-dimensional
filtering for the columns, without properly explaining why this is the natural
thing to do. The tensor product may be the most natural concept to explain this,
and a concept which is firmly established in mathematical literature. Tensor
products are usually not part of beginning courses in linear algebra. We have
limited the focus here to an introduction to tensor products, and the theory
needed to explain filtering an image, and computing the two-dimensional wavelet
transform. Some linear algebra books (such as [30]) present tensor products in
exercise form only, and often only mentions the Kronecker tensor product, as we
defined it.
Many international standards exist for compression of images, and we will
take a closer look at two of them in this book. The JPEG standard, perhaps the
most popular format for images on the Internet, applies a change of coordinates
with a two-dimensional DCT, as described in this chapter. The compression
level in JPEG images is selected by the user and may result in conspicuous
artefacts if set too high. JPEG is especially prone to artefacts in areas where
the intensity changes quickly from pixel to pixel. JPEG is usually lossy, but may
also be lossless. The standard defines both the algorithms for
encoding and decoding and the storage format. The extension of a JPEG-file is
.jpg or .jpeg. JPEG is short for Joint Photographic Experts Group, and was
approved as an international standard in 1994. A more detailed description of
the standard can be found in [36].
The second standard we will consider is JPEG2000. It was developed to
address some of the shortcomings of JPEG, and is based on wavelets. The
standard document for this [21] does not focus on explaining the theory behind
the standard. As the MP3 standard document, it rather states step-by-step
procedures for implementing the standard.
The theory we present related to these image standards concentrate on
transforming the image (either with a DWT or a DCT) to obtain something
which is more suitable for (lossless or lossy) compression. However, many other
steps are also needed in order to obtain a full image compression system. One of
these is quantization. In the simplest form of quantization, every resulting sample
from the transformation is rounded to a fixed number of bits. Quantization can
also be done in more advanced ways than this: We have already mentioned that
the MP3 standard may use different number of bits for values in the different
subbands, depending on the importance of the samples for the human perception.
The JPEG2000 standard quantizes in such a way that there is bigger interval
around 0 which is quantized to 0, i.e. the rounding error is allowed to be bigger
in an interval around 0. Standards which are lossless do not apply quantization,
since this always leads to loss.

Somewhere in the image processing or sound processing pipeline, we also


need a step which actually achieves compression of the data. Different standards
use different lossless coding techniques for this. JPEG2000 uses an advances
type of arithmetic coding for this. JPEG can also use arithmetic coding, but
also Huffman coding.
Besides transformation, quantization, and coding, many other steps are
used, which have different tasks. Many standards preprocess the pixel values
before a transform is applied. Preprocessing may mean to center the pixel
values around a certain value (JPEG2000 does this), or extracting the different
image components before they are processed separately. Also, the image is often
split into smaller parts (often called tiles), which are processed separately. For
big images this is very important, since it allows users to zoom in on a small
part of the image, without processing larger uninteresting parts of the image.
Independent processing of the separate tiles makes the image compression what
we call error-resilient, to errors such as transmission errors, since errors in one
tile does not propagate to errors in the other tiles. It is also much more memory-
friendly to process the image in several smaller parts, since it is not required
to have the entire image in memory at any time. It also gives possibilities for
parallel computing. For standards such as JPEG and JPEG2000, tiles are split
into even smaller parts, called blocks, where parts of the processing within each
block also is performed independently. This makes the possibilities for parallel
computing even bigger.
An image standard also defines how to store metadata about an image, and
what metadata is accepted, like resolution, time when the image was taken,
where the image was taken (such as GPS coordinates), and similar information.
Metadata can also tell us how the color in the image are represented. As we have
already seen, in most color images the color of a pixel is represented in terms of
the amount of red, green and blue or (r, g, b). But there are other possibilities
as well: Instead of storing all 24 bits of color information in cases where each of
the three color components needs 8 bits, it is common to create a table of up to
256 colors with which a given image could be represented quite well. Instead of
storing the 24 bits, one then just stores a color table in the metadata, and at
each pixel, the eight bits corresponding to the correct entry in the table. This is
usually referred to as eight-bit color, and the table is called a look-up table or
palette. For large photographs, however, 256 colors is far from sufficient to obtain
reasonable colour reproduction. Metadata is usually stored in the beginning of
the file, formatted in a very specific way.

What you should have learned in this chapter.

• How to read, write, and show images on your computer.


• How to extract different color components.
• How to convert from color to grey-level images.

• How to use functions for adjusting the contrast.



• The operation X → S1 X(S2 )T can be used to define operations on images,


based on one-dimensional operations S1 and S2 . This amounts to applying
S1 to all columns in the image, and then S2 to all rows in the result. You
should know how this operation can be conveniently expressed with tensor
product notation, and that in the typical case when S1 and S2 are filters,
this can equivalently be expressed in terms of computational molecules.

• The case when the Si are smoothing filters gives rise to smoothing opera-
tions on images.
• A simple highpass filter, corresponding to taking the derivative, gives rise
to edge-detection operations on images.

• The operation X → S1 X(S2 )T can also be used to facilitate change of


coordinates in images, in addition to filtering images. In other words,
change of coordinates is done first column by column, then row by row.
The DCT and the DFT are particular changes of coordinates used for
images.
Chapter 10

Using tensor products to


apply wavelets to images

Previously we have used the theory of wavelets to analyze sound. We would also
like to use wavelets in a similar way to analyze images. Since the tensor product
concept constructs two dimensional objects (matrices) from one-dimensional
objects (vectors), we are lead to believe that tensor products can also be used to
apply wavelets to images. In this chapter we will see that this can indeed be
done. The vector spaces Vm we encountered for wavelets were function spaces,
however. What we therefore need first is to establish a general definition of
tensor products of function spaces. This will be done in the first section of this
chapter. In the second section we will then specialize the function spaces to the
spaces Vm we use for wavelets, and interpret the tensor product of these and the
wavelet transform applied to images more carefully. Finally we will look at some
examples on this theory applied to some example images.
The examples in this chapter can be run from the notebook applinalgnbchap10.ipynb.

10.1 Tensor product of function spaces


In the setting of functions, it will turn out that the tensor product of two
univariate functions can be most intuitively defined as a function in two variables.
This seems somewhat different from the strategy of Chapter 9, but we will see
that the results we obtain will be very similar.

Definition 10.1. Tensor product of function spaces.


Let U1 and U2 be vector spaces of functions, defined on the intervals [0, M )
and [0, N ), respectively, and suppose that f1 ∈ U1 and f2 ∈ U2 . The tensor
product of f1 and f2 , denoted f1 ⊗ f2 , is the function in two variables defined
on [0, M ) × [0, N ) by

(f1 ⊗ f2 )(t1 , t2 ) = f1 (t1 )f2 (t2 ).


f1 ⊗ f2 is also called the separable extension of f1 and f2 to two variables.


The tensor product of the spaces U1 ⊗ U2 is the vector space spanned by the
two-variable functions {f1 ⊗ f2 }f1 ∈U1 ,f2 ∈U2 .
We will always assume that the spaces U1 and U2 consist of functions which
are at least integrable. In this case U1 ⊗ U2 is also an inner product space, with
the inner product given by a double integral,
$$\langle f, g\rangle = \int_0^N\!\!\int_0^M f(t_1, t_2)g(t_1, t_2)\,dt_1\,dt_2. \qquad (10.1)$$

In particular, this says that

$$\langle f_1 \otimes f_2, g_1 \otimes g_2\rangle = \int_0^N\!\!\int_0^M f_1(t_1)f_2(t_2)g_1(t_1)g_2(t_2)\,dt_1\,dt_2
= \int_0^M f_1(t_1)g_1(t_1)\,dt_1 \int_0^N f_2(t_2)g_2(t_2)\,dt_2 = \langle f_1, g_1\rangle\langle f_2, g_2\rangle. \qquad (10.2)$$

This means that for tensor products, a double integral can be computed as the
product of two one-dimensional integrals. This formula also ensures that inner
products of tensor products of functions obey the same rule as we found for
tensor products of vectors in Exercise 9.23.
The tensor product space defined in Definition 10.1 is useful for approximation
of functions of two variables if each of the two spaces of univariate functions
have good approximation properties.
Idea 10.2. Using tensor products for approximation.
If the spaces U1 and U2 can be used to approximate functions in one variable,
then U1 ⊗ U2 can be used to approximate functions in two variables.
We will not state this precisely, but just consider some important examples.

10.1.1 Tensor products of polynomials


Let U1 = U2 be the space of all polynomials of finite degree. We know that
U1 can be used for approximating many kinds of functions, such as continuous
functions, for example by Taylor series. The tensor product U1 ⊗ U1 consists of
all functions on the form $\sum_{i,j} \alpha_{i,j} t_1^i t_2^j$. It turns out that polynomials in several
variables have approximation properties analogous to univariate polynomials.

10.1.2 Tensor products of Fourier spaces


Let U1 = U2 = VN,T be the N th order Fourier space which is spanned by the
functions

$$e^{-2\pi iNt/T}, \ldots, e^{-2\pi it/T}, 1, e^{2\pi it/T}, \ldots, e^{2\pi iNt/T}.$$



The tensor product space U1 ⊗ U1 now consists of all functions of the form

    ∑_{k,l=−N}^{N} α_{k,l} e^{2πik t1/T} e^{2πil t2/T}.

One can show that this space has approximation properties similar to VN,T for
functions in two variables. This is the basis for the theory of Fourier series in
two variables.
In the following we think of U1 ⊗ U2 as a space which can be used for
approximating a general class of functions. By associating a function with the
vector of coordinates relative to some basis, and a matrix with a function in two
variables, we have the following parallel to Theorem 9.16:
Theorem 10.3. Bases for tensor products of function spaces.
If {f_i}_{i=0}^{M−1} is a basis for U1 and {g_j}_{j=0}^{N−1} is a basis for U2, then
{f_i ⊗ g_j}_{(i,j)=(0,0)}^{(M−1,N−1)} is a basis for U1 ⊗ U2. Moreover, if the bases for U1 and U2 are
orthogonal/orthonormal, then the basis for U1 ⊗ U2 is orthogonal/orthonormal.
Proof. The proof is similar to that of Theorem 9.16: if

    ∑_{(i,j)=(0,0)}^{(M−1,N−1)} α_{i,j} (f_i ⊗ g_j) = 0,

we define h_i(t2) = ∑_{j=0}^{N−1} α_{i,j} g_j(t2). It follows as before that ∑_{i=0}^{M−1} h_i(t2) f_i = 0
for any t2, so that h_i(t2) = 0 for any t2 due to linear independence of the f_i. But
then α_{i,j} = 0 also, due to linear independence of the g_j. The statement about
orthogonality follows from Equation (10.2).

We can now define the tensor product of two bases of functions as before,
and coordinate matrices as before:
Definition 10.4. Coordinate matrix.
If B = {f_i}_{i=0}^{M−1} and C = {g_j}_{j=0}^{N−1}, we define B ⊗ C as the basis
{f_i ⊗ g_j}_{(i,j)=(0,0)}^{(M−1,N−1)} for U1 ⊗ U2. We say that X is the coordinate matrix of f if
f(t1, t2) = ∑_{i,j} X_{i,j} (f_i ⊗ g_j)(t1, t2), where X_{i,j} are the elements of X.

Theorem 9.18 can also be proved in the same way in the context of function
spaces. We state this as follows:
Theorem 10.5. Change of coordinates in tensor products of function spaces.
Assume that U1 and U2 are function spaces, and that

• B1, C1 are bases for U1, and S1 is the change of coordinates matrix from B1 to C1,

• B2, C2 are bases for U2, and S2 is the change of coordinates matrix from B2 to C2.

Both B1 ⊗ B2 and C1 ⊗ C2 are bases for U1 ⊗ U2, and if X is the coordinate matrix
in B1 ⊗ B2, and Y the coordinate matrix in C1 ⊗ C2, then the change of coordinates
from B1 ⊗ B2 to C1 ⊗ C2 can be computed as

    Y = S1 X (S2)^T.    (10.3)
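As a small illustration of (10.3) (a sketch only; the matrices below are arbitrary examples and not from the book's code), the change of coordinates of a coordinate matrix amounts to two matrix products in NumPy:

import numpy as np

# Hypothetical change of coordinates matrices for the two component spaces
S1 = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)   # for U1 (2-dimensional)
S2 = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)   # for U2 (2-dimensional)

X = np.array([[1., 2.], [3., 4.]])   # coordinate matrix in B1 ⊗ B2

# Equation (10.3): S1 acts on the columns of X, S2 on the rows
Y = S1 @ X @ S2.T
print(Y)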

10.2 Tensor product of function spaces in a wavelet


setting
We will now specialize the spaces U1 , U2 from Definition 10.1 to the resolution
spaces Vm and the detail spaces Wm , arising from a given wavelet. We can in
particular form the tensor products φ0,n1 ⊗ φ0,n2 . We will assume that
• the first component φ_{0,n1} has period M (so that {φ_{0,n1}}_{n1=0}^{M−1} is a basis for
the first component space),

• the second component φ_{0,n2} has period N (so that {φ_{0,n2}}_{n2=0}^{N−1} is a basis
for the second component space).

When we speak of V0 ⊗ V0 we thus mean an MN-dimensional space with basis
{φ_{0,n1} ⊗ φ_{0,n2}}_{(n1,n2)=(0,0)}^{(M−1,N−1)}, where the coordinate matrices are M × N. This
difference in the dimension of the two components is done to allow for images
where the number of rows and columns may be different. In the following we
will implicitly assume that the component spaces have dimension M and N, to
ease notation. If we use that (φ_{m−1}, ψ_{m−1}) also is a basis for Vm, we get the
following corollary to Theorem 10.3:
Corollary 10.6. Bases for tensor products.
Let φ, ψ be a scaling function and a mother wavelet. Then the two sets of
tensor products given by

φm ⊗ φm = {φm,n1 ⊗ φm,n2 }n1 ,n2


and

(φm−1 , ψm−1 ) ⊗ (φm−1 , ψm−1 )


= {φm−1,n1 ⊗ φm−1,n2 ,
φm−1,n1 ⊗ ψm−1,n2 ,
ψm−1,n1 ⊗ φm−1,n2 ,
ψm−1,n1 ⊗ ψm−1,n2 }n1 ,n2

are both bases for Vm ⊗ Vm . This second basis is orthogonal/orthonormal


whenever the first basis is.
CHAPTER 10. USING TENSOR PRODUCTS TO APPLY WAVELETS TO IMAGES358

From this we observe that, while the one-dimensional wavelet decomposition


splits Vm into a direct sum of the two vector spaces Vm−1 and Wm−1 , the
corresponding two-dimensional decomposition splits Vm ⊗ Vm into a direct sum
of four tensor product vector spaces. These vector spaces deserve individual
names:
Definition 10.7. Tensor product spaces.
We define the following tensor product spaces:
• The space W_m^{(0,1)} spanned by {φ_{m,n1} ⊗ ψ_{m,n2}}_{n1,n2},

• The space W_m^{(1,0)} spanned by {ψ_{m,n1} ⊗ φ_{m,n2}}_{n1,n2},

• The space W_m^{(1,1)} spanned by {ψ_{m,n1} ⊗ ψ_{m,n2}}_{n1,n2}.

Since these spaces are linearly independent, we can write

    Vm ⊗ Vm = (V_{m−1} ⊗ V_{m−1}) ⊕ W_{m−1}^{(0,1)} ⊕ W_{m−1}^{(1,0)} ⊕ W_{m−1}^{(1,1)}.    (10.4)
Also in the setting of tensor products we refer to V_{m−1} ⊗ V_{m−1} as the space of
low-resolution approximations. The remaining parts, W_{m−1}^{(0,1)}, W_{m−1}^{(1,0)}, and W_{m−1}^{(1,1)},
are referred to as detail spaces. The coordinate matrix of

    ∑_{n1,n2=0}^{2^{m−1}N} ( c_{m−1,n1,n2} (φ_{m−1,n1} ⊗ φ_{m−1,n2}) + w_{m−1,n1,n2}^{(0,1)} (φ_{m−1,n1} ⊗ ψ_{m−1,n2})
                           + w_{m−1,n1,n2}^{(1,0)} (ψ_{m−1,n1} ⊗ φ_{m−1,n2}) + w_{m−1,n1,n2}^{(1,1)} (ψ_{m−1,n1} ⊗ ψ_{m−1,n2}) )    (10.5)

in the basis (φ_{m−1}, ψ_{m−1}) ⊗ (φ_{m−1}, ψ_{m−1}) is

    [ c_{m−1,0,0}          · · ·   w_{m−1,0,0}^{(0,1)}    · · · ]
    [      ...                          ...                      ]
    [ w_{m−1,0,0}^{(1,0)}  · · ·   w_{m−1,0,0}^{(1,1)}    · · · ]    (10.6)
    [      ...                          ...                      ]
The coordinate matrix is thus split into four submatrices:

• The c_{m−1}-values, i.e. the coordinates for V_{m−1} ⊗ V_{m−1}. This is the upper
left corner in Equation (10.6).

• The w_{m−1}^{(0,1)}-values, i.e. the coordinates for W_{m−1}^{(0,1)}. This is the upper right
corner in Equation (10.6).

• The w_{m−1}^{(1,0)}-values, i.e. the coordinates for W_{m−1}^{(1,0)}. This is the lower left
corner in Equation (10.6).

• The w_{m−1}^{(1,1)}-values, i.e. the coordinates for W_{m−1}^{(1,1)}. This is the lower right
corner in Equation (10.6).

The w_{m−1}^{(i,j)}-values are, as in the one-dimensional situation, often referred to as
wavelet coefficients. Let us consider the Haar wavelet as an example.

Example 10.1: Piecewise constant functions


If Vm is the vector space of piecewise constant functions on any interval of the
form [k2−m , (k + 1)2−m ) (as in the piecewise constant wavelet), Vm ⊗ Vm is the
vector space of functions in two variables which are constant on any square of
the form [k1 2−m , (k1 + 1)2−m ) × [k2 2−m , (k2 + 1)2−m ). Clearly φm,k1 ⊗ φm,k2
is constant on such a square and 0 elsewhere, and these functions are a basis for
Vm ⊗ Vm .
Let us compute the orthogonal projection of φ1,k1 ⊗ φ1,k2 onto V0 ⊗ V0 . Since
the Haar wavelet is orthonormal, the basis functions in (10.4) are orthonormal,
and we can thus use the orthogonal decomposition formula to find this projection.
Clearly φ1,k1 ⊗ φ1,k2 has different support from all except one of φ0,n1 ⊗ φ0,n2 .
Since

    ⟨φ_{1,k1} ⊗ φ_{1,k2}, φ_{0,n1} ⊗ φ_{0,n2}⟩ = ⟨φ_{1,k1}, φ_{0,n1}⟩⟨φ_{1,k2}, φ_{0,n2}⟩ = (√2/2)(√2/2) = 1/2
when the supports intersect, we obtain

1

 2 (φ0,k1 /2 ⊗ φ0,k2 /2 ) when k1 , k2 are even
 1 (φ

0,k1 /2 ⊗ φ0,(k2 −1)/2 ) when k1 is even, k2 is odd
projV0 ⊗V0 (φ1,k1 ⊗φ1,k2 ) = 12
 (φ0,(k1 −1)/2 ⊗ φ0,k2 /2 ) when k1 is odd, k2 is even
 12


2 (φ0,(k1 −1)/2 ⊗ φ0,(k2 −1)/2 ) when k1 , k2 are odd

So, in this case there were 4 different formulas, since there were 4 different
combinations of even/odd. Let us also compute the projection onto the orthogonal
complement of V0 ⊗V0 in V1 ⊗V1 , and let us express this in terms of the φ0,n , ψ0,n ,
like we did in the one-variable case. Also here there are 4 different formulas.
When k1 , k2 are both even we obtain

    φ_{1,k1} ⊗ φ_{1,k2} − proj_{V0⊗V0}(φ_{1,k1} ⊗ φ_{1,k2})
    = φ_{1,k1} ⊗ φ_{1,k2} − (1/2)(φ_{0,k1/2} ⊗ φ_{0,k2/2})
    = ( (1/√2)(φ_{0,k1/2} + ψ_{0,k1/2}) ) ⊗ ( (1/√2)(φ_{0,k2/2} + ψ_{0,k2/2}) ) − (1/2)(φ_{0,k1/2} ⊗ φ_{0,k2/2})
    = (1/2)(φ_{0,k1/2} ⊗ φ_{0,k2/2}) + (1/2)(φ_{0,k1/2} ⊗ ψ_{0,k2/2})
      + (1/2)(ψ_{0,k1/2} ⊗ φ_{0,k2/2}) + (1/2)(ψ_{0,k1/2} ⊗ ψ_{0,k2/2}) − (1/2)(φ_{0,k1/2} ⊗ φ_{0,k2/2})
    = (1/2)(φ_{0,k1/2} ⊗ ψ_{0,k2/2}) + (1/2)(ψ_{0,k1/2} ⊗ φ_{0,k2/2}) + (1/2)(ψ_{0,k1/2} ⊗ ψ_{0,k2/2}).

Here we have used the relation φ_{1,ki} = (1/√2)(φ_{0,ki/2} + ψ_{0,ki/2}), which we have from
our first analysis of the Haar wavelet. Checking the other possibilities we find
similar formulas for the projection onto the orthogonal complement of V0 ⊗ V0
in V1 ⊗ V1 when either k1 or k2 is odd. In all cases, the formulas use the basis
functions for W_0^{(0,1)}, W_0^{(1,0)}, W_0^{(1,1)}. These functions are shown in Figure 10.1,
together with the function φ ⊗ φ ∈ V0 ⊗ V0 .


Figure 10.1: The functions φ ⊗ φ, φ ⊗ ψ, ψ ⊗ φ, ψ ⊗ ψ, which are bases for
(V0 ⊗ V0) ⊕ W_0^{(0,1)} ⊕ W_0^{(1,0)} ⊕ W_0^{(1,1)} for the Haar wavelet.

Example 10.2: Piecewise linear functions


If we instead use any of the wavelets for piecewise linear functions, the wavelet
basis functions are not orthogonal anymore, just as in the one-dimensional case.
The new basis functions are shown in Figure 10.2 for the alternative piecewise
linear wavelet.


Figure 10.2: The functions φ ⊗ φ, φ ⊗ ψ, ψ ⊗ φ, ψ ⊗ ψ, which are bases for
(V0 ⊗ V0) ⊕ W_0^{(0,1)} ⊕ W_0^{(1,0)} ⊕ W_0^{(1,1)} for the alternative piecewise linear wavelet.

10.2.1 Interpretation
An immediate corollary of Theorem 10.5 is the following:
Corollary 10.8. Implementing tensor product.
Let

    A_m = P_{(φ_{m−1},ψ_{m−1}) ← φ_m}
    B_m = P_{φ_m ← (φ_{m−1},ψ_{m−1})}

be the stages in the DWT and the IDWT, and let

    X = (c_{m,i,j})_{i,j}      Y = [ (c_{m−1,i,j})_{i,j}          (w_{m−1,i,j}^{(0,1)})_{i,j} ]    (10.7)
                                   [ (w_{m−1,i,j}^{(1,0)})_{i,j}  (w_{m−1,i,j}^{(1,1)})_{i,j} ]

be the coordinate matrices in φ_m ⊗ φ_m and (φ_{m−1}, ψ_{m−1}) ⊗ (φ_{m−1}, ψ_{m−1}),
respectively. Then

    Y = A_m X A_m^T    (10.8)
    X = B_m Y B_m^T    (10.9)

By the m-level two-dimensional DWT/IDWT (or DWT2/IDWT2) we mean the


change of coordinates where this is repeated m times as in a DWT/IDWT.
It is straightforward to make implementations of DWT2 and IDWT2, in the
same way we implemented DWTImpl and IDWTImpl. In Exercise 10.7 you will
be asked to program functions DWT2Impl and IDWT2Impl for this. Each stage
in DWT2 and IDWT2 can now be implemented by substituting the matrices
Am , Bm above into the code following Theorem 9.18. When using many levels
of the DWT2, the next stage is applied only to the upper left corner of the
matrix, just as the DWT at the next stage only is applied to the first part of
the coordinates. At each stage, the upper left corner of the coordinate matrix
(which gets smaller at each iteration), is split into four equally big parts. This is
illustrated in Figure 10.3, where the different types of coordinates which appear
in the first two stages in a DWT2 are indicated.

Figure 10.3: Illustration of the different coordinates in a two level DWT2 before
the first stage is performed (left), after the first stage (middle), and after the
second stage (right).
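As a sketch of how one such stage may be organized (an illustration only, not the book's DWT2Impl code; it assumes a kernel function f which applies one DWT stage in place to a 1-D coordinate vector, as for the one-dimensional DWTImpl, and a bd_mode argument as in Exercise 10.7):

import numpy as np

def dwt2_stage(X, f, bd_mode='symm'):
    # Apply one DWT2 stage in place to the matrix X:
    # first the DWT along every column, then along every row.
    M, N = X.shape
    for n in range(N):
        col = X[:, n].copy()
        f(col, bd_mode)          # assumed in-place 1-D DWT kernel
        X[:, n] = col
    for m in range(M):
        row = X[m, :].copy()
        f(row, bd_mode)
        X[m, :] = row

An m-level DWT2 would then call such a stage repeatedly on the upper left block of the matrix, which halves in size in each direction at every level.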

It is instructive to see what information the different types of coordinates


in an image represent. In the following examples we will discard some types of
coordinates, and view the resulting image. Discarding a type of coordinates will
be illustrated by coloring the corresponding regions from Figure 10.3 black. As
an example, if we perform a two-level DWT2 (i.e. we start with a coordinate
matrix in the basis φ2 ⊗ φ2 ), Figure 10.4 illustrates first the collection of all
coordinates, and then the resulting collection of coordinates after removing
subbands at the first level successively.
Figure 10.5 illustrates in the same way incremental removal of the subbands
at the second level.

Figure 10.4: Graphical representation of neglecting the wavelet coefficients
at the first level. After applying DWT2, the wavelet coefficients are split into
four parts, as shown in the left figure. In the following figures we have removed
coefficients from W_1^{(1,1)}, W_1^{(1,0)}, and W_1^{(0,1)}, in that order.

Figure 10.5: Graphical representation of neglecting the wavelet coefficients
at the second level. After applying the second stage in DWT2, the wavelet
coefficients from the upper left corner are also split into four parts, as shown in
the left figure. In the following figures we have removed coefficients from W_2^{(1,1)},
W_2^{(1,0)}, and W_2^{(0,1)}, in that order.

Before we turn to experiments on images using wavelets, we would like to give
another interpretation of the corners in the matrices after the DWT2, which
correspond to the different coordinates (c_{m−1,i,j})_{i,j}, (w_{m−1,i,j}^{(0,1)})_{i,j}, (w_{m−1,i,j}^{(1,0)})_{i,j},
and (w_{m−1,i,j}^{(1,1)})_{i,j}. It turns out that these corners have natural interpretations
in terms of the filter characterization of wavelets, as given in Chapter 6. Recall
that in a DWT2, the DWT is first applied to all columns in the image, and then
to all rows in the resulting matrix.
First the DWT is applied to all columns in the image. Since the first half of
the coordinates in a DWT are outputs from a lowpass filter H0 (Theorem 6.3),
the upper half after the DWT has been obtained by applying a lowpass filter to
the columns. Similarly, the second half of the coordinates in a DWT are outputs
from a highpass filter H1 (Theorem 6.3 again), so the bottom half after the
DWT has been obtained by applying a highpass filter to the columns.
Then the DWT is applied to all rows in the image. As when we applied the
DWT to the columns, the left half after the DWT has been obtained by applying
the same lowpass filter to the rows, and the right half by applying the same
highpass filter to the rows.
These observations split the resulting matrix after DWT2 into four blocks,
with each block corresponding to a combination of lowpass and highpass filters.
The following names are thus given to these blocks:

• The upper left corner is called the LL-subband,


• The upper right corner is called the LH-subband,
• The lower left corner is called the HL-subband,
• The lower right corner is called the HH-subband.

The two letters indicate the type of filters which have been applied (L=lowpass,
H=highpass). The first letter indicates the type of filter which is applied to the
columns, the second indicates which is applied to the rows. The order is therefore
important. The name subband comes from the interpretation of these filters as
being selective on a certain frequency band. In conclusion, a block in the matrix
after the DWT2 corresponds to applying a combination of lowpass/highpass filters
to the rows and the columns of the image. Due to this, and since lowpass filters
extract slow variations and highpass filters abrupt changes, the following holds:
Observation 10.9. Visual interpretation of the DWT2.
After the DWT2 has been applied to an image, we expect to see the following:

• In the upper left corner, slow variations in both the vertical and horizontal
directions are captured, i.e. this is a low-resolution version of the image.
• In the upper right corner, slow variations in the vertical direction are
captured, together with abrupt changes in the horizontal direction.
• In the lower left corner, slow variations in the horizontal direction are
captured, together with abrupt changes in the vertical direction.
• In the lower right corner, abrupt changes in both directions are captured.

These effects will be studied through examples in the next section.

10.3 Experiments with images using wavelets


In this section we will make some experiments with images using the wavelets we
have considered. The wavelet theory is applied to images in the following way:
We first visualize the pixels in the image as coordinates in the basis φm ⊗ φm
(so that the image has size (2m M ) × (2m N )). As in the case for sound, this will
represent a good approximation when m is large. We then perform a change
of coordinates with the DWT2. As we did for sound, we can then either set

(i,j)
the detail components from the Wk -spaces to zero, or the low-resolution
approximation from V0 ⊗ V0 to zero, depending on whether we want to inspect
the detail components or the low-resolution approximation. Finally we apply
the IDWT2 to end up with coordinates in φm ⊗ φm again, and display the new
image with pixel values equal to these coordinates.
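A sketch of this procedure in code could look as follows (an illustration only; it assumes the DWT2Impl/IDWT2Impl functions from Exercise 10.7, called as in Example 10.4, and keeps only the low-resolution approximation):

m = 2
DWT2Impl(X, m, 'Haar')        # change of coordinates with an m-level DWT2
M, N = X.shape[0], X.shape[1]
X[(M >> m):, :] = 0           # zero out all detail coefficients, keeping only
X[:, (N >> m):] = 0           # the upper left corner (the V_0 ⊗ V_0 part)
IDWT2Impl(X, m, 'Haar')       # back to coordinates in the phi_m ⊗ phi_m basis
# display X as an image to see the low-resolution approximation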

Example 10.3: Applying the Haar wavelet to a very simple


example image
Let us apply the Haar wavelet to the sample chess pattern example image from
Figure 9.17. The lowpass filter of the Haar wavelet was essentially a smoothing
filter with two elements. Also, as we have seen, the highpass filter essentially
computes an approximation to the partial derivative. Clearly, abrupt changes in
the vertical and horizontal directions appear here only at the edges in the chess
pattern, and abrupt changes in both directions appear only at the grid points in
the chess pattern. Due to Observation 10.9, after a DWT2 we expect to see the
following:
• In the upper left corner, we should see a low-resolution version of the
image.
• In the upper right corner, only the vertical edges in the chess pattern
should be visible.
• In the lower left corner, only the horizontal edges in the chess pattern
should be visible.
• In the lower right corner, only the grid points in the chess pattern should
be visible.
These effects are seen clearly if we apply one level of the DWT2 to the chess
pattern example image. The result of this can be seen in Figure 10.6.

Example 10.4: Creating thumbnail images


Let us apply the Haar wavelet to our sample image. After the DWT2, the upper
left submatrices represent the low-resolution approximations from Vm−1 ⊗ Vm−1 ,
Vm−2 ⊗ Vm−2 , and so on. We can now use the following code to store the
low-resolution approximation for m = 1:

DWT2Impl(X, 1, 'Haar')
X = X[0:(shape(X)[0]//2), 0:(shape(X)[1]//2)]
mapto01(X); X *= 255

Note that here it is necessary to map the result back to [0, 255].
In Figure 10.7 the results are shown up to 4 resolutions. In Figure 10.8 we
have also shown the entire result after a 1- and 2-stage DWT2 on the image.
The first two thumbnail images can be seen as the upper left corners of the
first two images. The other corners represent detail.

Figure 10.6: The chess pattern example image after application of the DWT2.
The Haar wavelet was used.

Figure 10.7: The corresponding thumbnail images for the Image of Lena,
obtained with a DWT of 1, 2, 3, and 4 levels.

Example 10.5: Detail and low-resolution approximations


for different wavelets
Let us take a closer look at the images generated when we use different wavelets.
Above we viewed the low-resolution approximation as a smaller image. Let us
compare with the image resulting from setting the wavelet detail coefficients to
zero, and viewing the result as an image of the same size. In particular, let us
neglect the wavelet coefficients as pictured in Figure 10.4 and Figure 10.5. We
should expect that the lower order resolution approximations from V0 are worse
when m increases.
Figure 10.9 confirms this for the lower order resolution approximations for
the Haar wavelet. Alternatively, we should see that the higher order detail spaces
contain more information as m increases. The result is shown in Figure 10.10.
Figures 10.11 and 10.12 confirm the same for the CDF 9/7 wavelet, which
also shows some improvement in the low resolution approximations. The black

Figure 10.8: The corresponding image resulting from a wavelet transform with
the Haar-wavelet for m = 1 and m = 2.

color indicates values which are close to 0. In other words, most of the coefficients
are close to 0.

Example 10.6: The Spline 5/3 wavelet and removing bands


in the detail spaces
Since the detail components in images are split into three bands, another thing
we can try is to neglect only parts of the detail components (i.e. some of W_m^{(1,1)},
W_m^{(1,0)}, W_m^{(0,1)}), contrary to the one-dimensional case. Let us use the Spline 5/3
wavelet.
The resulting images when the bands on the first level indicated in Figure 10.4
are removed are shown in Figure 10.13. The corresponding plot for the second
level is shown in Figure 10.14.
The image is seen still to resemble the original one, even after two levels of
wavelet coefficients have been neglected. This in itself is good for compression
purposes, since we may achieve compression simply by dropping the given
coefficients. However, if we continue to neglect more levels of coefficients, the
result will look poorer.
In Figure 10.15 we have also shown the resulting image after the third and
fourth levels of detail have been neglected. Although we can still see details in
the image, the quality is definitely poorer. Even though the quality deteriorates
when we neglect levels of wavelet coefficients, no information is lost as long as
we also keep the neglected detail bands.
In Figure 10.16, we have shown the corresponding detail for DWT’s with 1,
2, 3, and 4 levels. Clearly, more detail can be seen in the image when more of
the detail is included.

Figure 10.9: Low resolution approximations of the Lena image, for the Haar
wavelet.

As mentioned, the procedure developed in this section for applying a wavelet


transform to an image with the help of the tensor product construction is
adopted in the JPEG2000 standard. This image format, which can be used for
both lossy and lossless compression, was developed by the Joint Photographic
Experts Group and published in 2000. After significant processing of the wavelet coefficients, the
final coding with JPEG2000 uses an advanced version of arithmetic coding.
At the cost of increased encoding and decoding times, JPEG2000 leads to as
much as 20 % improvement in compression ratios for medium compression rates,
possibly more for high or low compression rates. The artefacts are less visible
than in JPEG and appear at higher compression rates. Although a number of
components in JPEG2000 are patented, the patent holders have agreed that the
core software should be available free of charge, and JPEG2000 is part of most

Figure 10.10: Detail of the Lena image, for the Haar wavelet.

Linux distributions. However, there appear to be some further, rather obscure,


patents that have not been licensed, and this may be the reason why JPEG2000
is not used more. The extension of JPEG2000 files is .jp2.

Exercise 10.7: Implement two-dimensional DWT


Implement functions

DWT2Impl_internal(x, nres, f, bd_mode)


IDWT2Impl_internal(x, nres, f, bd_mode)

which perform the m-level DWT2 and the IDWT2, respectively, on an image.
The arguments are the same as those in DWTImpl_internal and IDWTImpl_internal,
with the input vector x replaced with a two-dimensional object/image. The

Figure 10.11: Low resolution approximations of the Lena image, for the CDF
9/7 wavelet.

functions should at each stage apply the kernel function f to the appropriate
rows and columns. If the image has several color components, the functions
should be applied to each color component (there are three color components in
the test image ’lena.png’).

Exercise 10.8: Comment code


Assume that we have an image represented by the M × N -matrix X, and consider
the following code:

for n in range(N):
    c = (X[0:M:2, n] + X[1:M:2, n])/sqrt(2)
    w = (X[0:M:2, n] - X[1:M:2, n])/sqrt(2)
    X[:, n] = concatenate([c, w])
for m in range(M):
    c = (X[m, 0:N:2] + X[m, 1:N:2])/sqrt(2)
    w = (X[m, 0:N:2] - X[m, 1:N:2])/sqrt(2)
    X[m, :] = concatenate([c, w])

Figure 10.12: Detail of the Lena image, for the CDF 9/7 wavelet.

a) Comment what the code does, and explain what you will see if you display X
as an image after the code has run.
b) The code above has an inverse transformation, which reproduce the original
image from the transformed values which we obtained. Assume that you zero
out the values in the lower left and the upper right corner of the matrix X after
the code above has run, and that you then reproduce the image by applying this
inverse transformation. What changes can you then expect in the image?

Figure 10.13: Image of Lena, with various bands of detail at the first level
(1,1) (1,0) (0,1)
zeroed out. From left to right, the detail at W1 , W1 , W1 , as illustrated
in Figure 10.4. The Spline 5/3 wavelet was used.

Figure 10.14: Image of Lena, with various bands of detail at the second level
(1,1) (1,0) (0,1)
zeroed out. From left to right, the detail at W2 , W2 , W2 , as illustrated
in Figure 10.5. The Spline 5/3 wavelet was used.

Exercise 10.9: Comment code


In this exercise we will use the filters G0 = {1, 1}, G1 = {1, −1}.
a) Let X be a matrix which represents the pixel values in an image. Define
x = (1, 0, 1, 0) and y = (0, 1, 0, 1). Compute (G0 ⊗ G0 )(x ⊗ y).
b) For a general image X, describe how the images (G0 ⊗ G0 )X, (G0 ⊗ G1 )X,
(G1 ⊗ G0 )X, and (G1 ⊗ G1 )X may look.
c) Assume that we run the following code on an image represented by the matrix
X:

M, N = shape(X)
for n in range(N):
    c = X[0:M:2, n] + X[1:M:2, n]
    w = X[0:M:2, n] - X[1:M:2, n]
    X[:, n] = hstack([c, w])
for m in range(M):
    c = X[m, 0:N:2] + X[m, 1:N:2]
    w = X[m, 0:N:2] - X[m, 1:N:2]
    X[m, :] = hstack([c, w])

Figure 10.15: Image of Lena, with detail including level 3 and 4 zeroed out.
The Spline 5/3 wavelet was used.

Comment the code. Describe what will be shown in the upper left corner of
X after the code has run. Do the same for the lower left corner of the matrix.
What is the connection with the images (G0 ⊗ G0 )X, (G0 ⊗ G1 )X, (G1 ⊗ G0 )X,
and (G1 ⊗ G1 )X?

Exercise 10.10: Experiments on a test image


In Figure 10.17 we have applied the DWT2 with the Haar wavelet to an image
very similar to the one you see in Figure 10.6. You see here, however, that there
seems to be no detail components, which is very different from what you saw
in Example 10.3, even though the images are very similar. Attempt to explain
what causes this to happen.

Hint. Compare with Exercise 5.21.



Figure 10.16: The corresponding detail for the image of Lena. The Spline 5/3
wavelet was used.

Figure 10.17: A simple image before and after one level of the DWT2. The
Haar wavelet was used.

10.4 An application to the FBI standard for compression of fingerprint images
In the beginning of the 1990s, the FBI had a major problem with their
archive of fingerprint images. With more than 200 million fingerprint records,
the digital storage exploded in size, so some compression strategy needed
to be employed. Several strategies were tried, for instance the widely adopted
JPEG standard. The problem with JPEG had to do with the blocking artefacts,
which we saw in Section 9.4. Among the strategies considered, the FBI chose a
wavelet-based one due to its nice properties. The particular way wavelets are applied in

Figure 10.18: A typical fingerprint image.

Fingerprint images are a very specific type of images, as seen in Figure 10.18.
They differ from natural images by having a large number of abrupt changes.
One may ask whether other wavelets than the ones we have used up to now are
more suitable for compressing such images. After all, the technique of vanishing
moments we have used for constructing wavelets is most suitable when the
images display some regularity (as many natural images do). Extensive tests
were undertaken to compare different wavelets, and the CDF 9/7 wavelet used
by JPEG2000 turned out to perform very well, also for fingerprint images. One
advantage with the choice of this wavelet for the FBI standard is that one then
can exploit existing wavelet transformations from the JPEG2000 standard.
Besides the choice of wavelet, one can also ask other questions in the quest to
compress fingerprint images: What number of levels is optimal in the application
of the DWT2? And, while the levels in a DWT2 (see Figure 10.3) have an
interpretation as change of coordinates, one can apply a DWT2 to the other
subbands as well. This can not be interpreted as a change of coordinates, but
if we assume that these subbands have the same characteristics as the original

image, the DWT2 will also help us with compression when applied to them.
Let us illustrate how the FBI standard applies the DWT2 to the different
subbands. We will split this process into five stages. The subband structures
and the resulting images after stage 1-4 are illustrated in Figure 10.19 and in
Figure 10.20, respectively.

Figure 10.19: Subband structure after the different stages of the wavelet
applications in the FBI fingerprint compression scheme.

1. First apply the first stage in a DWT2. This gives the upper left corners in
the two figures.

2. Then apply a DWT2 to all four resulting subbands. This is different from
the DWT2, which only continues on the upper left corner. This gives the
upper right corners in the two figures.

Figure 10.20: The fingerprint image after several DWT’s.

3. Then apply a DWT2 in three of the four resulting subbands. This gives
the lower left corners.

4. In all remaining subbands, the DWT2 is again applied. This gives the
lower right corners.

Now for the last stage. A DWT2 is again applied, but this time only to the upper
left corner. The subbands are illustrated in Figure 10.21, and in Figure 10.22
the resulting image is shown.
When establishing the standard for compression of fingerprint images, the
FBI chose this subband decomposition. In Figure 10.23 we also show the
corresponding low resolution approximation and detail.
As can be seen from the subband decomposition, the low-resolution approxi-
mation is simply the approximation after a five stage DWT2.

Figure 10.21: Subband structure after all stages.

Figure 10.22: The resulting image obtained with the subband decomposition
employed by the FBI.

The original JPEG2000 standard did not allow for this type
of subband decomposition. This has been added to a later extension of the
standard, which makes the two standards more compatible. In the FBI's system,
there are also other important parts besides the actual compression strategy,
such as fingerprint pattern matching: In order to match a fingerprint quickly
with the records in the database, several characteristics of the fingerprints are
stored, such as the number of lines in the fingerprint, and points where the lines
split or join. When the database is indexed with this information, one may not

Figure 10.23: The low-resolution approximation and the detail obtained by the
FBI standard for compression of fingerprint images, when applied to our sample
fingerprint image.

need to decompress all images in the database to perform matching. We will


not go into details on this here.

Exercise 10.11: Implement the fingerprint compression scheme


Write code which generates the images shown in figures 10.20, 10.22, and 10.23.
Use the functions DWT2Impl and IDWT2Impl with the CDF 9/7 wavelet kernel
functions as input.

10.5 Summary
We extended the tensor product construction to functions by defining the tensor
product of functions as a function in two variables. We explained with some
examples that this made the tensor product formalism useful for approximation
of functions in several variables. We extended the wavelet transform to the tensor
product setting, so that it too could be applied to images. We also performed
several experiments on our test image, such as creating low-resolution images and
neglecting wavelet coefficients. We also used different wavelets, such as the Haar
wavelet, the Spline 5/3 wavelet, and the CDF 9/7 wavelet. The experiments
confirmed what we previously have proved, that wavelets with many vanishing
moments are better suited for compression purposes.
The specification of the JPEG2000 standard can be found in [21]. In [46],
most details of this theory are covered, in particular details on how the wavelet
coefficients are coded (which is not covered here).
One particular application of wavelets in image processing is the compression
of fingerprint images. The standard which describes how this should be performed

can be found in [15]. In [4], the theory is described. The book [16] uses the
application to compression of fingerprint images as an example of the usefulness
of recent developments in wavelet theory.

What you should have learned in this chapter.


• The special interpretation of DWT2 applied to an image as splitting into
four types of coordinates (each being one corner of the image), which rep-
resent lowpass/highpass combinations in the horizontal/vertical directions.

• How to call functions which perform different wavelet transformations on


an image.
• Be able to interpret the detail components and low-resolution approxima-
tions in what you see.
Chapter 11

The basics and applications

The problem of minimizing a function of several variables, possibly subject


to constraints on these variables, is what optimization is about. So the main
problem is easy to state! And, more importantly, such problems arise in many
applications in natural science, engineering, economics and business as well as in
mathematics itself.
Nonlinear optimization differs from Fourier analysis and wavelet theory in that
classical multivariate analysis also is an important ingredient. A recommended
book on this, used here at the University of Oslo, is [26] (in Norwegian). It
contains a significant amount of fixed point theory, nonlinear equations, and
optimization.
There are many excellent books on nonlinear optimization (or nonlinear
programming, as it is also called). Some of these books that have influenced
these notes are [2, 3, 27, 19, 41, 32]. These are all recommended books for those
who want to go deeper into the subject. These lecture notes are particularly
influenced by the presentations in [2, 3].
Optimization has its mathematical foundation in linear algebra and multi-
variate calculus. In analysis the area of convexity is especially important. For
the brief presentation of convexity given here the author’s own lecture notes [11]
(originally from 2001), and the very nice book [49], have been useful sources.
But, of course, anyone who wants to learn convexity should study the work by
R.T. Rockafellar, see e.g. the classic text [40].
Linear optimization (LP, linear programming) is a special case of nonlinear
optimization, but we do not discuss this in any detail here. The reason for this is
that we, at the University of Oslo, have a separate course in linear optimization
which covers many parts of that subject in some detail.
This first chapter introduces some of the basic concepts in optimization and
discusses some applications. Many of the ideas and results that you will find
in these lecture notes may be extended to more general linear spaces, even
infinite-dimensional. However, to keep life a bit easier and still cover most
applications, we will only be working in Rn .


Due to its character this chapter is a “proof-free zone”, but in the remaining
text we usually give full proofs of the main results.

Notation. For z ∈ Rn and ε > 0 define the (closed) ball B̄(z; ε) = {x ∈ Rn :
‖x − z‖ ≤ ε}. It consists of all points with distance at most ε from z. Similarly,
define the open ball B(z; ε) = {x ∈ Rn : ‖x − z‖ < ε}. A neighborhood of z
is a set N containing B(z; ε) for some ε > 0. Vectors are treated as column
vectors and they are identified with the corresponding n-tuple, denoted by
x = (x1, x2, . . . , xn). A statement like

    P(x)   (x ∈ H)

means that the statement P(x) is true for all x ∈ H.

11.1 The basic concepts


Optimization deals with finding optimal solutions! So we need to define what
this is.
Let f : Rn → R be a real-valued function in n variables. The function value
is written as f (x), for x ∈ Rn , or f (x1 , x2 , . . . , xn ). This is the function we
want to minimize (or maximize) and it is often called the objective function. Let
x∗ ∈ Rn. Then x∗ is a local minimum (or local minimizer) of f if there is an
ε > 0 such that

    f(x∗) ≤ f(x) for all x ∈ B(x∗; ε).


So, no point “sufficiently near” x∗ has smaller f -value than x∗ . A local maximum
is defined similarly, but with the inequality reversed. A stronger notion is that
x∗ is a global minimum of f which means that

f (x∗ ) ≤ f (x) for all x ∈ Rn .


A global maximum satisfies the opposite inequality.
The definition of local minimum has a “variational character”; it concerns the
behavior of f near x∗ . Due to this it is perhaps natural that Taylor’s formula,
which gives an approximation of f in such a neighborhood, becomes a main tool
for characterizing and finding local minima. We present Taylor’s formula, in
different versions, in Section 11.3.
An extension of the notion of minimum and maximum is for constrained
problems where we want, for instance, to minimize f (x) over all x lying in a
given set C. Then x∗ ∈ C is a local minimum of f over the set C, or subject to
x ∈ C as we shall say, provided no point in C in some neighborhood of x∗ has
smaller f -value than x∗ . A similar extension holds for global minimum over C,
and for maxima.

An example from plane geometry. Consider the point set C = {(x1 , x2 ) :


x1 ≥ 0, x2 ≥ 0, x1 + x2 ≤ 1} in the plane. We want to find a point x = (x1 , x2 ) ∈
C which is closest possible to the point a = (3, 2). This can be formulated as
the minimization problem

minimize (x1 − 3)2 + (x2 − 2)2


subject to
x1 + x2 ≤ 1
x1 ≥ 0, x2 ≥ 0.
The function we want to minimize is f (x) = (x1 − 3)2 + (x2 − 2)2 which is a
quadratic function. This is the square of the distance between x and a; and
minimizing the distance or the square of the distance is equivalent (why?). A
minimum here is x∗ = (1, 0), as can be seen from a simple geometric argument
where we draw the normal from (3, 2) to the line x1 + x2 = 1. If we instead
minimize f over R2 , the unique global minimum is clearly x∗ = a = (3, 2). It is
also useful, and not too hard, to find these minima analytically.
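For readers who want to check such small examples numerically, a sketch using SciPy (one possible solver choice, not part of these notes' required tools) could look like this:

import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 3)**2 + (x[1] - 2)**2

# Constraints: x1 + x2 <= 1 (written as 1 - x1 - x2 >= 0), and x1, x2 >= 0
cons = ({'type': 'ineq', 'fun': lambda x: 1 - x[0] - x[1]},)
bnds = ((0, None), (0, None))

res = minimize(f, x0=np.array([0.5, 0.5]), bounds=bnds, constraints=cons)
print(res.x)   # should be close to the minimum (1, 0)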
In optimization one considers minimization and maximization problems. As

max{f (x) : x ∈ S} = − min{−f (x) : x ∈ S}


it is clear how to convert a maximization problem into a minimization problem
(or vice versa). This transformation may, however, change the properties of the
function you work with. For instance, if f is convex (definitions come later!),
then −f is not convex (unless f is linear), so rewriting between minimization
and maximization may take you out of a class of “good problems”. Note that a
minimum or maximum may not exist. A main tool one uses to establish that
optimal solutions really exist is the extreme value theorem as stated next. You
may want to look these notions up in [26].
Theorem 11.1. Continuous functions on closed and bounded sets.
Let C be a subset of Rn which is closed and bounded, and let f : C → R be
a continuous function. Then f attains both its (global) minimum and maximum,
so these are points x1 , x2 ∈ C with

f (x1 ) ≤ f (x) ≤ f (x2 ) (x ∈ C).

11.2 Some applications


It is useful to see some application areas for optimization. They are many, and
here we mention a few in some detail. The methods we will learn later will be
applied to these examples.

Portfolio optimization. The following optimization problem was introduced


by Markowitz in order to find an optimal portfolio in a financial market; he later

received the Nobel prize in economics 1 (in 1990) for his contributions in this
area:
    minimize   α ∑_{i,j≤n} c_{ij} x_i x_j − ∑_{j=1}^n µ_j x_j
    subject to
               ∑_{j=1}^n x_j = 1
               x_j ≥ 0   (j ≤ n).
The model may be understood as follows. The decision variables are x1 , x2 ,
. . . , xn where xi is the fraction of a total investment that is made in (say) stock
i. Thus one has available a set of stocks in different companies (Statoil, IBM,
Apple etc.) or bonds. The fractions xi must be nonnegative (so we consider no
short sale) and add up to 1. The function f to be minimized is

    f(x) = α ∑_{i,j≤n} c_{ij} x_i x_j − ∑_{j=1}^n µ_j x_j.
i,j≤n j=1

It can be explained in terms of random variables. Let Rj be the return on stock


j; this is a random variable, and let µ_j = E R_j be the expectation of R_j. So
if X denotes the random variable X = ∑_{j=1}^n x_j R_j, which is the return on our
portfolio (= mix among investments), then EX = ∑_{j=1}^n µ_j x_j, which is the second
term in f . The minus sign in front explains that we really want to maximize the
expected return. The first term in f is there because just looking at expected
return is too simple. We want to spread our investments to reduce the risk. The
first term in f is the variance of X multiplied by a weight factor α; the constant
cij is the covariance of Ri and Rj , defined by

cij = E(Ri − µi )(Rj − µj ).


cii is also called the variance of Ri .
So f is a weighted difference of variance and expected return. This is what
we want to minimize. The optimization problem is to minimize a quadratic
function subject to linear constraints. We shall discuss theory and methods for
such problems later.
In order to use such a model one needs to find good values for all the
parameters µj and cij ; this is done using historical data from the stock markets.
The weight parameter α is often varied and the optimization problem is solved
for each such “interesting” value. This makes it possible to find a so-called
efficient frontier of expectation versus variance for optimal solutions.
The Markowitz model is a useful tool for financial investments, and now
extensions and variations of the model exist, e.g., by using different ways of
measuring risk. All such models involve a balance between risk and expected
return.
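To make the objective concrete, here is a small sketch in Python (illustrative only; the covariance matrix, expected returns and the weight α below are made-up numbers) which minimizes the Markowitz objective over the simplex with SciPy:

import numpy as np
from scipy.optimize import minimize

# Made-up data for three stocks: covariance matrix C and expected returns mu
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])
mu = np.array([0.05, 0.08, 0.12])
alpha = 2.0                          # weight on risk (variance)

f = lambda x: alpha * x @ C @ x - mu @ x

cons = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1},)   # fractions sum to 1
bnds = [(0, None)] * 3                                     # no short sale

res = minimize(f, x0=np.ones(3)/3, bounds=bnds, constraints=cons)
print(res.x)   # an optimal mix of the three investments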
1 The precise term is "Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred

Nobel".

Fitting a model. In many applications one has a mathematical model of


some phenomenon where the model has some parameters. These parameters
represent a flexibility of the model, and they may be adjusted so that the model
explains the phenomenon as well as possible.
To be more specific consider a model

    y = F_α(x)

for some function F_α : R^m → R. Here α = (α1, α2, . . . , αn) ∈ Rn is a parameter
vector (so we may have several parameters). Perhaps there are natural constraints
on the parameter, say α ∈ A for a given set A in Rn.
For instance, consider

    y = α1 cos x1 + x2^{α2},

so here n = m = 2, α = (α1, α2) and F_α(x) = α1 cos x1 + x2^{α2}, where (say)
α1 ∈ R and α2 ∈ [1, 2].


The general model may also be thought of as

y = Fα (x) + error
since it is usually a simplification of the system one considers. In statistics
one specifies this error term as a random variable with some (partially) known
distribution. Sometimes one calls y the dependent variable and x the explaining
variable. The goal is to understand how y depends on x.
To proceed, assume we are given a number of observations of the phenomenon
given by points

(xi , y i ) (i = 1, 2, . . . , m).
meaning that one has observed y i corresponding to x = xi . We have m such
observations. Usually (but not always) we have m ≥ n. The model fit problem
is to adjust the parameter α so that the model fits the given data as well as
possible. This leads to the optimization problem

    minimize ∑_{i=1}^m (y^i − F_α(x^i))^2   subject to α ∈ A.
The optimization variable is the parameter α. Here the model error is
quadratic (corresponding to the Euclidean norm), but other norms are also used.
This optimization problem above is a constrained nonlinear optimization
problem. When the function Fα depends linearly on α, which often is the case in
practice, the problem becomes the classical least squares approximation problem
which is treated in basic linear algebra courses. The solution is then characterized
by a certain linear system of equations, the so-called normal equations.
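As a small sketch of such a model fit in code (illustrative only; the model below is the y = α1 cos x1 + x2^{α2} example with synthetic data, and SciPy's general-purpose solver is just one possible choice):

import numpy as np
from scipy.optimize import minimize

# Synthetic observations (x^i, y^i) generated from the model with alpha = (2, 1.5)
rng = np.random.default_rng(0)
xs = rng.uniform(0.1, 2.0, size=(50, 2))
ys = 2.0*np.cos(xs[:, 0]) + xs[:, 1]**1.5 + 0.01*rng.standard_normal(50)

def F(alpha, x):
    return alpha[0]*np.cos(x[:, 0]) + x[:, 1]**alpha[1]

def sq_error(alpha):
    return np.sum((ys - F(alpha, xs))**2)

# Constraint set A: alpha1 free, alpha2 in [1, 2]
res = minimize(sq_error, x0=np.array([1.0, 1.2]),
               bounds=[(None, None), (1.0, 2.0)])
print(res.x)   # should be close to (2, 1.5)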

Maximum likelihood. A very important problem in statistics, arising in


many applications, is parameter estimation and, in particular, maximum likeli-
hood estimation. It leads to optimization.
Let X be a “continuous” real-valued random variable with probability density
p(x; α). Here α is a parameter (often one uses other symbols for the parameter,

like ξ, θ etc.). For instance, if X is a normal (Gaussian) variable with expectation
α and variance 1, then p(x; α) = (1/√(2π)) e^{−(x−α)²/2} and

    P(a ≤ X ≤ b) = ∫_a^b (1/√(2π)) e^{−(x−α)²/2} dx,
where P denotes probability.
Assume X is the outcome of an experiment, and that we have observed
X = x (so x is a known real number or a vector, if several observations were
made). On the basis of x we want to estimate the value of the parameter α
which best “explains” our observation X = x. We have now available
the probability density p(x; ·). The function α → p(x; α), for fixed x, is called
the likelihood function. It gives the “probability mass” in x as a function of the
parameter α. The maximum likelihood problem is to find a parameter value α
which maximizes the likelihood, i.e., which maximizes the probability of getting
precisely x. This is an optimization problem

    max_α p(x; α)

where x is fixed and the optimization variable is α. We may here add a constraint
on α, say α ∈ C for some set C, which may incorporate possible knowledge of
α and assure that p(x; α) is positive for α ∈ C. Often it is easier to solve the
equivalent optimization problem of maximizing the logarithm of the likelihood
function

    max_α ln p(x; α).

This is a nonlinear optimization problem. Often, in statistics, there are several


parameters, so α ∈ Rn for some n, and we need to solve a nonlinear optimization
problem in several variables, possibly with constraints on these variables. If
the likelihood function, or its logarithm, is a concave function, we have (after
multiplying by −1) a convex optimization problem. Such problems are easier to
solve than general optimization problems. This will be discussed later.
As a specific example assume we have the linear statistical model

x = Aα + w
where A is a given m × n matrix, α ∈ Rn is an unknown parameter, w ∈ Rm is a
random variable (the “noise”), and x ∈ Rm is the observed quantity. We assume
that the components of w, i.e., w1 , w2 , . . . , wm are independent and identically
distributed with common density function p on R. This leads to the likelihood
function
    p(x; α) = ∏_{i=1}^m p(x_i − a_i α)

where ai is the i’th row in A. Taking the logarithm we obtain the maximum
likelihood problem

    max_α ∑_{i=1}^m ln p(x_i − a_i α).

In many applications of statistics it is central to solve this optimization problem


numerically.
Let us take a look at a model taken from physics for the disintegration of muons.
The angle θ in electron radiation for the disintegration of muons has a probability
density
    p(x; α) = (1 + αx)/2    (11.1)
for x ∈ [−1, 1], where x = cos θ, and where α is an unknown parameter in
[−1, 1]. Our goal is to estimate α from n measurements x = (x1 , . . . , xn ). In
this case the likelihood function, which we seek to maximize, takes the form
g(α) = ∏_{i=1}^n p(x_i; α). Taking logarithms and multiplying by −1, our problem is
to minimize

    f(α) = − ln g(α) = − ln( ∏_{i=1}^n p(x_i; α) ) = − ∑_{i=1}^n ln((1 + αx_i)/2).    (11.2)

We compute

    f′(α) = − ∑_{i=1}^n (x_i/2) / ((1 + αx_i)/2) = − ∑_{i=1}^n x_i / (1 + αx_i)
    f″(α) = ∑_{i=1}^n x_i² / (1 + αx_i)².

We see that f″(α) ≥ 0, so that f is convex. As explained, this will make the
problem easier to solve using numerical methods. If we try to solve f′(α) = 0
analytically we will run into problems, however. We see that f′(α) → 0 when
α → ±∞, and since x_i/(1 + αx_i) = 1/(1/x_i + α), we must have that f′(α) → ∞ when
α → −1/x_i from below, and f′(α) → −∞ when α → −1/x_i from above. It is
therefore clear that f has exactly one minimum in every interval of the form
[−1/x_i, −1/x_{i+1}] when we list the x_i in increasing order. It is not certain that
there is a minimum within [−1, 1] at all. If all measurements have the same sign
we are guaranteed that there is no such point; in this case the minimum must be
one of the end points of the interval. We will later look into numerical methods
for finding this minimum.
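A small numerical sketch (illustration only; the measurements below are synthetic and SciPy's bounded scalar minimizer is just one way to do this) could look like:

import numpy as np
from scipy.optimize import minimize_scalar

# Synthetic measurements x_i = cos(theta_i) in [-1, 1]
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=200)

def f(alpha):
    # minus the log-likelihood, Equation (11.2)
    return -np.sum(np.log((1 + alpha*x)/2))

# alpha is restricted to [-1, 1]; the density (11.1) is then nonnegative
res = minimize_scalar(f, bounds=(-0.999, 0.999), method='bounded')
print(res.x)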

Optimal control problems. Recall that a discrete dynamical system is an


equation

xt+1 = ht (xt ) (t = 0, 1, . . .)

where xt ∈ Rn , x0 is the initial solution, and ht is a given function for each


t. We here think of t as time and xt is the state of the process at time t. For
instance, let n = 1 and consider ht (x) = ax (t = 0, 1, . . .) for some a ∈ R. Then
the solution is xt = at x0 . Another example is when A is an n × n matrix,
xt ∈ Rn and ht (x) = Ax for each t. Then the solution is xt = At x0 . For the
more general situation, where the system functions ht may be different, it may
be difficult to find an explicit solution for xt . Numerically, however, we compute
xt simply in a for-loop: starting from x0 , we compute x1 = h0 (x0 ), then x2 = h1 (x1 ),
etc.
Now, consider a dynamical system where we may “control” the system in
each time step. We restrict the attention to a finite time span, t = 0, 1, . . . , T .
A proper model is then

xt+1 = ht (xt , ut ) (t = 0, 1, . . . , T − 1)
where xt is the state of the system at time t and the new variable ut is the
control at time t. We assume xt ∈ Rn and ut ∈ Rm for each t (but these
things also work if these vectors lie in spaces of different dimensions). Thus,
when we choose the controls u0 , u1 , . . . , uT −1 and x0 is known, the sequence
{xt } of states is uniquely determined. Next, assume there are given functions
ft : Rn × Rm → R that we call cost functions. We think of ft (xt , ut ) as the
“cost” at time t when the system is in state xt and we choose control ut . The
optimal control problem is

    minimize   f_T(x_T) + ∑_{t=0}^{T−1} f_t(x_t, u_t)
    subject to                                              (11.3)
               x_{t+1} = h_t(x_t, u_t)   (t = 0, 1, . . . , T − 1)
where the control is the sequence (u0 , u1 , . . . , uT −1 ) to be determined. This
problem arises in many applications, in engineering, finance, economics etc. We
now rewrite this problem. First, let u = (u0 , u1 , . . . , uT −1 ) ∈ RN where N = T m.
Since, as we noted, xt is uniquely determined by u, there is a function vt such
that xt = vt (u) (t = 1, 2, . . . , T ); x0 is given. Therefore the total cost may be
written

    f_T(x_T) + ∑_{t=0}^{T−1} f_t(x_t, u_t) = f_T(v_T(u)) + ∑_{t=0}^{T−1} f_t(v_t(u), u_t) := f(u)

which is a function of u. Thus, we see that the optimal control problem may be
transformed to the unconstrained optimization problem

    min_{u ∈ R^N} f(u)

Sometimes there may be constraints on the control variables, for instance that
they each lie in some interval, and then the transformation above results in a
constrained optimization problem.
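To make the reduction to a function of u concrete, here is a small sketch (illustration only; the dynamics, costs and horizon are made-up examples) of how f(u) can be evaluated by simulating the system in a for-loop:

import numpy as np

T = 5
x0 = np.array([1.0])

def h(t, x, u):                 # example dynamics x_{t+1} = h_t(x_t, u_t)
    return 0.9*x + u

def cost(t, x, u):              # example running cost f_t(x_t, u_t)
    return x @ x + 0.1*(u @ u)

def terminal(x):                # example terminal cost f_T(x_T)
    return 10.0*(x @ x)

def f(u):
    # Total cost as a function of the controls u = (u_0, ..., u_{T-1})
    u = u.reshape(T, -1)
    x, total = x0, 0.0
    for t in range(T):
        total += cost(t, x, u[t])
        x = h(t, x, u[t])       # simulate one step of the dynamics
    return total + terminal(x)

print(f(np.zeros(T)))           # cost of doing nothing; f could now be minimized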

Linear optimization. This is not an application, but rather a special case


of the general nonlinear optimization problem where all functions are linear. A
linear optimization problem, also called linear programming, has the form

minimize cT x subject to Ax = b and x ≥ 0. (11.4)


Here A is an m × n matrix, b ∈ Rm, and x ≥ 0 means that xi ≥ 0 for each i ≤ n.
So in linear optimization one minimizes (or maximizes) a linear function subject
to linear equations and nonnegativity on the variables. Actually, one can show
any problem with constraints that are linear equations and/or linear inequalities
may be transformed into the form above. Such problems have a wide range
of application in science, engineering, economics, business etc. Applications
include portfolio optimization and many planning problems for e.g. production,
transportation etc. Some of these problems are of a combinatorial nature, but
linear optimization is a main tool here as well.
We shall not treat linear optimization in detail here since this is the topic
of a separate course, MAT-INF3100 Linear optimization. In that course one
presents some powerful methods for such problems, the simplex algorithm and
interior point methods. In addition one considers applications in network flow
models and game theory.
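For completeness, here is a tiny example of solving a problem of the form (11.4) numerically (illustrative only; the data are made up, and SciPy's linprog is just one available solver):

import numpy as np
from scipy.optimize import linprog

# minimize c^T x subject to Ax = b, x >= 0
c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0]])
b = np.array([1.0, 0.0])

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)]*3)
print(res.x, res.fun)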

11.3 Multivariate calculus and linear algebra


We first recall some useful facts from linear algebra.
The spectral theorem says that if A is a real symmetric matrix, then there
is an orthogonal matrix P (i.e., its columns are orthonormal) and a diagonal
matrix D such that A = P DP T . The diagonal of D contains the eigenvalues of
A, and A has an orthonormal set of eigenvectors (the columns of P ).
A real symmetric matrix is positive semidefinite2 if xT Ax ≥ 0 for all x ∈ Rn .
The following statements are equivalent

1. A is positive semi-definite
2. all eigenvalues of A are nonnegative
3. A = W T W for some matrix W .

Similarly, a real symmetric matrix is positive definite if xT Ax > 0 for all nonzero
x ∈ Rn . The following statements are equivalent.
1. A is positive definite

2. all eigenvalues of A are positive


3. A = W T W for some invertible matrix W .
2 See Section 7.2 in [25].

Every positive definite matrix is therefore invertible.
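As a quick numerical illustration of these equivalences (a sketch only; the matrix below is an arbitrary example), one can check positive definiteness either via the eigenvalues or by attempting a Cholesky factorization:

import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])       # symmetric example matrix

print(np.linalg.eigvalsh(A))      # all eigenvalues positive => positive definite

try:
    W = np.linalg.cholesky(A)     # succeeds exactly when A is positive definite
    print(np.allclose(W @ W.T, A))    # A = W W^T with W invertible
except np.linalg.LinAlgError:
    print("A is not positive definite")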


We also recall some central facts from multivariate calculus. They will be
used repeatedly in these notes. Let f : Rn → R be a real-valued function defined
on Rn. The gradient of f at x is the n-tuple

    ∇f(x) = ( ∂f(x)/∂x1, ∂f(x)/∂x2, . . . , ∂f(x)/∂xn ).
We will always identify an n-tuple with the corresponding column vector3 . Of
course, the gradient only exists if all the partial derivatives exist. Second order
information is contained in a matrix: assuming f has second order partial
derivatives we define the Hessian matrix4 ∇2 f (x) as the n × n matrix whose
(i, j)'th entry is

    ∂²f(x) / (∂xi ∂xj).
If these second order partial derivatives are continuous, then we may switch the
order in the derivations, and ∇2 f (x) is a symmetric matrix.
For vector-valued functions we also need the derivative. Consider the vector-
valued function F given by

    F(x) = ( F1(x), F2(x), . . . , Fn(x) )^T,

so Fi : Rn → R is the i-th component function of F. F′ denotes the Jacobi
matrix5, or simply the derivative, of F:

    F′(x) = [ ∂F1(x)/∂x1   ∂F1(x)/∂x2   · · ·   ∂F1(x)/∂xn ]
            [ ∂F2(x)/∂x1   ∂F2(x)/∂x2   · · ·   ∂F2(x)/∂xn ]
            [     ...                                       ]
            [ ∂Fn(x)/∂x1   ∂Fn(x)/∂x2   · · ·   ∂Fn(x)/∂xn ]

The ith row of this matrix is therefore the gradient of Fi , now viewed as a row
vector.
Next we recall Taylor’s theorems from multivariate calculus6 :
3 This is somewhat different from [26], since the gradient there is always considered as a row vector.
4 See Section 5.9 in [26].
5 See Section 2.6 in [26].
6 This theorem is also the mean value theorem of functions in several variables, see Section 5.5 in [26].

Theorem 11.2. First order Taylor theorem.

Let f : Rn → R be a function having continuous partial derivatives in some


ball B(x; r). Then, for each h ∈ Rn with ‖h‖ < r there is some t ∈ (0, 1) such
that

    f(x + h) = f(x) + ∇f(x + th)^T h.

The next one is known as Taylor’s formula, or the second order Taylor’s
theorem7 :
Theorem 11.3. Second order Taylor theorem.
Let f : Rn → R be a function having second order partial derivatives that
are continuous in some ball B(x; r). Then, for each h ∈ Rn with ‖h‖ < r there
is some t ∈ (0, 1) such that

f (x + h) = f (x) + ∇f (x)T h + (1/2) hT ∇2 f (x + th)h.
This may be shown by considering the one-variable function g(t) = f (x + th)
and applying the chain rule and Taylor’s formula in one variable.
There is another version of the second order Taylor theorem in which the
Hessian is evaluated in x and, as a result, we get an error term. This theorem
shows how f may be approximated by a quadratic polynomial in n variables8 :

Theorem 11.4. Second order Taylor theorem, version 2.


Let f : Rn → R be a function having second order partial derivatives that
are continuous in some ball B(x; r). Then there is a function ε : Rn → R such
that, for each h ∈ Rn with ‖h‖ < r,

f (x + h) = f (x) + ∇f (x)T h + (1/2) hT ∇2 f (x)h + ε(h)‖h‖2 .

Here ε(y) → 0 when y → 0.
The first and second order Taylor approximations can thus be summarized
as follows:

f (x + h) = f (x) + ∇f (x)T h + O(‖h‖)

f (x + h) = f (x) + ∇f (x)T h + (1/2) hT ∇2 f (x)h + O(‖h‖2 ).

We introduce the following notation for the approximations

Tf1 (x; x + h) = f (x) + ∇f (x)T h

Tf2 (x; x + h) = f (x) + ∇f (x)T h + (1/2) hT ∇2 f (x)h.
7 See Section 5.9 in [26].
8 See Section 5.9 in [26].

As we shall see, one can get a lot of optimization out of these approximations!
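To make the approximations concrete, here is a small sketch (not from the text) that compares f (x + h) with Tf1 and Tf2 for an arbitrarily chosen smooth function and point:

import numpy as np

def f(x):
    return np.exp(x[0]) + x[0] * x[1] ** 2

def grad_f(x):
    return np.array([np.exp(x[0]) + x[1] ** 2, 2 * x[0] * x[1]])

def hess_f(x):
    return np.array([[np.exp(x[0]), 2 * x[1]],
                     [2 * x[1], 2 * x[0]]])

x = np.array([0.5, -1.0])
for t in [1e-1, 1e-2, 1e-3]:
    h = t * np.array([1.0, 2.0])
    T1 = f(x) + grad_f(x) @ h
    T2 = T1 + 0.5 * h @ hess_f(x) @ h
    # The errors shrink with the size of h; the second order error much faster
    print(abs(f(x + h) - T1), abs(f(x + h) - T2))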
We also need a Taylor theorem for vector-valued functions, which follows by
applying Taylor's theorem above to each component function:
Theorem 11.5. First order Taylor theorem for vector-valued functions.
Let F : Rn → Rm be a vector-valued function which is continuously differen-
tiable in a neighborhood N of x. Then

F (x + h) = F (x) + F ′(x)h + O(‖h‖)

when x + h ∈ N .
Finally, if F : Rn → Rm and G : Rk → Rn , the composition H(x) = F (G(x))
is a function from Rk to Rm . Under the natural differentiability assumptions
the following chain rule9 holds:

H ′(x) = F ′(G(x)) G′(x).

Here the right-hand side is a product of two matrices, the respective Jacobi
matrices evaluated at the right points.
Finally, we discuss some notions concerning the convergence of sequences.
Definition 11.6. Linear convergence.
We say that a sequence {xk }∞_{k=0} converges to x∗ linearly (or that the
convergence speed is linear) if there is a γ < 1 such that

‖xk+1 − x∗ ‖ ≤ γ‖xk − x∗ ‖  (k = 0, 1, . . .).

A faster convergence rate is superlinear convergence, which means that

lim_{k→∞} ‖xk+1 − x∗ ‖/‖xk − x∗ ‖ = 0.

A special type of superlinear convergence is quadratic convergence, where

‖xk+1 − x∗ ‖ ≤ γ‖xk − x∗ ‖^2  (k = 0, 1, . . .)

for some γ < 1.
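As a rough illustration (not part of the text), the type of convergence can often be guessed from the ratios of successive errors; the two error sequences below are constructed for the purpose of the example:

import numpy as np

def error_ratios(errors):
    # Ratios e_{k+1}/e_k: roughly constant < 1 for linear convergence,
    # tending to 0 for superlinear convergence
    return [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]

linear = [0.5 ** k for k in range(8)]            # e_{k+1} = 0.5 e_k
quadratic = [0.5 ** (2 ** k) for k in range(5)]  # e_{k+1} = e_k^2
print(np.round(error_ratios(linear), 4))
print(np.round(error_ratios(quadratic), 8))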

Exercise 11.1: Solve


Give an example of a function f : R → R with 10 global minima.

Exercise 11.2: Solve


Consider the function f (x) = x sin(1/x) defined for x > 0. Find its local minima.
What about global minimum?
9 See Section 2.7 in [26].

Exercise 11.3: Solve


Let f : X → R+ be a function (with nonnegative function values). Explain why
it is equivalent to minimize f over x ∈ X or minimize f 2 (x) over X.

Exercise 11.4: Solve


In Example 11.2 we mentioned that optimizing the function px (y) is equivalent
to optimizing the function ln px (y). Explain why maximizing/minimizing g is
the same as maximizing/minimizing ln g for any positive function g.

Exercise 11.5: Solve


Consider f : R2 → R given by f (x) = (x1 − 3)2 + (x2 − 2)2 . How would you
explain to anyone that x∗ = (3, 2) is a minimum point?

Exercise 11.6: Level sets


The level sets of a function f : R2 → R are sets of the form Lα = {x ∈ R2 :
f (x) = α}. Let f (x) = (1/4)(x1 − 1)2 + (x2 − 3)2 . Draw the level sets in the plane
for α = 10, 5, 1, 0.1.

Exercise 11.7: Sub-level sets


The sub-level set of a function f : Rn → R is the set Sα (f ) = {x ∈ R2 : f (x) ≤
α}, where α ∈ R. Assume that inf{f (x) : x ∈ Rn } = η exists.
a) What happens to the sub-level sets Sα as α decreases? Give an example.
b) Show that if f is continuous and there is an x0 such that with α = f (x0 ) the
sub-level set Sα (f ) is bounded, then f attains its minimum.

Exercise 11.8: Portfolio optimization


Consider the portfolio optimization problem in Section 11.2.
a) Assume that cij = 0 for each i 6= j. Find, analytically, an optimal solution.
Describe the set of all optimal solutions.
b) Consider the special case where n = 2. Solve the problem and discuss how
minimum point depends on α.

Hint. Eliminate one variable.

Exercise 11.9: Solve


Later in these notes we will need the expression for the gradient of functions
which are expressed in terms of matrices.

a) Let f : Rn → R be defined by f (x) = q T x = xT q, where q is a vector. Show


that ∇f (x) = q, and that ∇2 f (x) = 0.
b) Let f : Rn → R be the quadratic function f (x) = (1/2)xT Ax, where A is
symmetric. Show that ∇f (x) = Ax, and that ∇2 f (x) = A.
c) Show that, with f defined as in b), but with A not symmetric, we obtain
that ∇f (x) = (1/2)(A + AT )x, and ∇2 f = (1/2)(A + AT ). Verify that these formulas
are compatible with what you found in b) when A is symmetric.

Exercise 11.10: Solve


Consider f (x) = f (x1 , x2 ) = x21 + 3x1 x2 − 5x22 + 3. Determine the first order
Taylor approximation to f at each of the points (0, 0) and (2, 1).

Exercise 11.11: Solve


 
Let A be the symmetric 2 × 2 matrix

A = [ 1  2
      2  8 ].

Show that A is positive definite. (Try to give two different proofs.)

Exercise 11.12: Solve


Show that if A is positive definite, then its inverse is also positive definite.
Chapter 12

A crash course in convexity

Convexity is a branch of mathematical analysis dealing with convex sets and


convex functions. It also represents a foundation for optimization.
We just summarize concepts and some results. For proofs one may consult
[11] or [49], see also [2].

12.1 Convex sets


A set C ⊆ Rn is called convex if (1 − λ)x + λy ∈ C whenever x, y ∈ C and
0 ≤ λ ≤ 1. Geometrically, this means that C contains the line segment between
each pair of points in C, so, loosely speaking, a convex set contains no “holes”.
For instance, the ball B(a; δ) = {x ∈ Rn : kx − ak ≤ δ} is a convex set. Let
us show this. Recall the triangle inequality which says that ku + vk ≤ kuk + kvk
whenever u, v ∈ Rn . Let x, y ∈ B(a; δ) and λ ∈ [0, 1]. Then

k((1 − λ)x + λy) − ak = k(1 − λ)(x − a) + λ(y − a)k


≤ k(1 − λ)(x − a)k + kλ(y − a)k
= (1 − λ)kx − ak + λky − ak
≤ (1 − λ)δ + λδ = δ.
Therefore B(a; δ) is convex.
Every linear subspace is also a convex set, as well as the translate of every
subspace (which is called an affine set). Some other examples of convex sets in
R2 are shown in Figure 12.1.
We will come back to why each of these sets are convex later. Another
important property is that the intersection of a family of convex sets is a convex
set.
By a linear system we mean a finite system of linear equations and/or linear
inequalities involving n variables. For example

x1 + x2 = 3, x1 ≥ 0, x2 ≥ 0


Figure 12.1: Examples of some convex sets. A square, the ellipse x2 /4 + y 2 ≤ 1,
and the area x4 + y 4 ≤ 1.

is a linear system in the variables x1 , x2 . The solution set is the set of points
(x1 , 3 − x1 ) where 0 ≤ x1 ≤ 3. The set of solutions of a linear system is called a
polyhedron. These sets often occur in optimization. Thus, a polyhedron has the
form

P = {x ∈ Rn : Ax ≤ b}
where A ∈ Rm,n and b ∈ Rm (m is arbitrary, but finite) and ≤ means compo-
nentwise inequality. There are simple techniques for rewriting any linear system
in the form Ax ≤ b.
Proposition 12.1. Polyhedra are convex.
Every polyhedron is a convex set.

Proof. Assume that P is the polyhedron given by all points x where Ax ≤ b.
Assume that x and y lie in P , so that Ax ≤ b and Ay ≤ b, and let 0 ≤ λ ≤ 1.
Since λ ≥ 0 and 1 − λ ≥ 0, we then have that

A(λx + (1 − λ)y) = λAx + (1 − λ)Ay ≤ λb + (1 − λ)b = b.

This shows that λx + (1 − λ)y also lies in P , so that P is convex.


The square from Figure 12.1(a) is defined by the inequalities −1 ≤ x, y ≤ 1.
It is therefore a polyhedron, and therefore convex. The next result shows that
convex sets are preserved under linear maps.

Proposition 12.2. Linear transformations of convex sets are convex.


If T : Rn → Rm is a linear transformation, and C ⊆ Rn is a convex set, then
the image T (C) of this set is also convex.

12.2 Convex functions


The notion of a convex function also makes sense for real-valued functions of
several variables. Consider a real-valued function f : C → R where C ⊆ Rn is a
convex set. We say that f is convex provided that

f ((1 − λ)x + λy) ≤ (1 − λ)f (x) + λf (y) (x, y ∈ C, 0 ≤ λ ≤ 1) (12.1)

(This inequality holds for all x, y and λ as specified). Due to the convexity of
C, the point (1 − λ)x + λy lies in C, so the inequality is well-defined. The
geometrical interpretation in one dimension is that, for any x, y, the graph of
f on [x, y] lies below the secant through (x, f (x)) and (y, f (y)). For z ∈ (x, y),
since f (z) lies below that secant, the secant through (x, f (x)) and (z, f (z)) has
a smaller slope than the secant through (x, f (x)) and (y, f (y)). Since the slope
of the secant through (x, f (x)) and (y, f (y)) is (f (y) − f (x))/(y − x), it follows
that the slope function

gx (y) = (f (y) − f (x))/(y − x)
is increasing for any x. This characterizes all convex functions in one dimension
in terms of slope functions.
A function g is called concave if −g is convex.
For every linear function we have that f ((1−λ)x+λy) = (1−λ)f (x)+λf (y),
so that every linear function is convex. Some other examples of convex functions
in n variables are

• f (x) = L(x) + α where L is a linear function from Rn into R (a linear


functional) and α is a real number. In fact, for such functions we have
that f ((1 − λ)x + λy) = (1 − λ)f (x) + λf (y), just as for linear functions.
Functions on the form f (x) = L(x) + α are called affine functions, and
may be written on the form f (x) = cT x + α for a suitable vector c.
• f (x) = ‖x‖ (Euclidean norm). That this is convex can be proved by
writing ‖(1 − λ)x + λy‖ ≤ ‖(1 − λ)x‖ + ‖λy‖ = (1 − λ)‖x‖ + λ‖y‖. In
fact, the same argument can be used to show that every norm defines
a convex function. Such an example is the l1 -norm, also called the sum
norm, defined by ‖x‖1 = Σ_{j=1}^{n} |xj |.

• f (x) = eh(x) where h : Rn → R is a convex function (exercise 12.4 gives a


more general result).
• f (x) = maxi gi (x) where gi : Rn → R is an affine function (i ≤ m). This
means that the pointwise maximum of affine functions is a convex function.
Note that such convex functions are typically not differentiable everywhere.
A more general result is that the pointwise supremum of an arbitrary
family of affine functions (or even convex functions) is convex. This is a
very useful fact in convexity and its applications.

The following result is an exercise to prove, and it gives a method for proving
convexity of a function.

Proposition 12.3. Composition of convex and affine maps are convex.


Assume that f : Rn → R is convex and H : Rm → Rn is affine. Then the
composition f (H(x)) is convex.
As a consequence, f (x) = exp(x1 + · · · + xn ) is convex. The next result is often used,
and is called Jensen’s inequality. It can be proved using induction.
Theorem 12.4. Jensen’s inequality.
Let f : C → R be a convex function defined on a convex set C ⊆ Rn . If
x1 , x2 , . . . , xr ∈ C and λ1 , . . . , λr ≥ 0 satisfy Σ_{j=1}^{r} λj = 1, then

f ( Σ_{j=1}^{r} λj xj ) ≤ Σ_{j=1}^{r} λj f (xj ).     (12.2)

A point of the form Σ_{j=1}^{r} λj xj , where the λj 's are nonnegative and sum to
1, is called a convex combination of the points x1 , x2 , . . . , xr . One can show that
a set is convex if and only if it contains all convex combinations of its points.
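As a quick sanity check (not from the text), the sketch below verifies Jensen's inequality numerically for the convex function f (x) = e^x and randomly drawn convex combinations:

import numpy as np

rng = np.random.default_rng(0)
f = np.exp  # a convex function on R

for _ in range(5):
    x = rng.normal(size=4)      # points x^1, ..., x^r (here r = 4, scalars)
    lam = rng.random(size=4)
    lam /= lam.sum()            # nonnegative weights summing to 1
    lhs = f(lam @ x)            # f( sum_j lambda_j x^j )
    rhs = lam @ f(x)            # sum_j lambda_j f(x^j)
    assert lhs <= rhs + 1e-12
print("Jensen's inequality held in all trials")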
Finally, one connection between convex sets and convex functions is the
following fact whose proof is an exercise.
Proposition 12.5. sub-level sets of convex functions are convex.
Let C ⊆ Rn be a convex set and consider a convex function f : C → R. Let
α ∈ R. Then the “sub-level” set

{x ∈ C : f (x) ≤ α}
is a convex set.

12.3 Properties of convex functions


A convex function need not be differentiable at every point. However, one can
show that a convex function always has one-sided directional derivatives at any
point. But what about continuity? To address this, we will first establish the
following lemma.
Lemma 12.6. Convex functions on unit balls.
Define the norm ‖ · ‖1 on Rn by ‖x‖1 = Σ_{i=1}^{n} |xi |, and let B1 (x, r) consist
of all points z where ‖z − x‖1 ≤ r. Any convex function f defined on B1 (x, r)
has a maximum on B1 (x, r), and this maximum is achieved in at least one of the
points x ± rei .
Proof. Since z → z − x is affine (and affine mappings preserve convexity), we
can assume without loss of generality that x = 0. If z = (z1 , . . . , zn ) ∈ B1 (0, r)
we have that Σ_{i=1}^{n} |zi | ≤ r, and we can write

z = ((r − Σ_{i=1}^{n} |zi |)/r) · 0 + (|z1 |/r) sign(z1 )re1 + · · · + (|zn |/r) sign(zn )ren .

This shows that any point in B1 (0, r) can be written as a convex combination
of the points 0, {±rei }ni=1 . Labeling these as y1 , y2 ,...,y2n+1 and using the
convexity of f we obtain
f (z) = f (λ1 y1 + · · · + λ2n+1 y2n+1 ) ≤ λ1 f (y1 ) + · · · + λ2n+1 f (y2n+1 ) ≤ max_i f (yi ),

which proves that f has a maximum on B1 (0, r), and this maximum is achieved
in one of the yi . Since

f (0) = f ( (1/2) rei + (1/2)(−rei ) ) ≤ (1/2) f (rei ) + (1/2) f (−rei ),
this maximum must be achieved in a point of the form ±rei . The result
follows.
Theorem 12.7. Convex functions are continuous on open sets.
Let f : C → R be a convex function defined on an open set C ⊆ Rn . Then f
is continuous on C.
Proof. Let x be in C, and let us show that f is continuous at x. Since C is open
we can find an r so that B(x, r) ⊂ C. We claim first that we can assume that f
is bounded from above on B(x, r). To prove this, note first that ‖z‖1 ≥ ‖z‖ for all z,
so that B1 (x, r) ⊂ B(x, r). On the other hand, B(x, r/n) ⊂ B1 (x, r). Using
Lemma 12.6 we see that f is bounded from above on B1 (x, r), and hence also
on a ball of the form B(x, s) (choose s = r/n for instance).
Assume now that f (y) ≤ M on B(x, r), and let z ∈ B(x, r) with z ≠ x. Define the
function g(t) = f (x + t(z − x)/‖z − x‖) for t ∈ (−r, r). Note that g(‖z − x‖) = f (z).
The function H(t) = x + t(z − x)/‖z − x‖ takes its values in B(x, r), and since f is convex
and H is affine, g(t) = f (H(t)) is convex, and then g has an increasing slope
function s → (g(s) − g(t))/(s − t). In particular, with s = −r, ‖z − x‖, r and
t = 0 we obtain

(g(−r) − g(0))/(−r) ≤ (g(‖z − x‖) − g(0))/‖z − x‖ ≤ (g(r) − g(0))/r.

The expression in the middle can be written as (f (z) − f (x))/‖z − x‖. Since
g(t) is bounded above by M , we have −M ≤ −g(−r) and g(r) ≤ M , so that

(−M + f (x))/r ≤ (f (z) − f (x))/‖z − x‖ ≤ (M − f (x))/r.

From this it follows that

|f (z) − f (x)| ≤ ((|M | + |f (x)|)/r) ‖z − x‖,

and the continuity of f follows.

However, a convex function may be discontinuous in points on the boundary


of its domain. For instance, the function f : [0, 1] → R given by f (0) = 1 and
f (x) = 0 for x ∈ (0, 1] is convex, but discontinuous at x = 0. Next we give a
useful technique for checking that a function is convex.
Theorem 12.8. Convex functions and positive semidefinite Hessians.
Let f be a real-valued function defined on an open convex set C ⊆ Rn and
assume that f has continuous second-order partial derivatives on C.
Then f is convex if and only if the Hessian matrix ∇2 f (x) is positive
semidefinite for each x ∈ C.
Using Theorem 12.8 it is straightforward to prove that the remaining sets from
Figure 12.1 are convex. They can be written as sub-level sets of the functions
f (x, y) = x2 /4 + y 2 and f (x, y) = x4 + y 4 . For the first of these the level sets are
ellipses, and are shown in Figure 12.2, together with f itself. One can quickly
verify that the Hessian matrices of these functions are positive semidefinite. It
follows from Proposition 12.5 that the corresponding sets are convex.

Figure 12.2: The function f (x, y) = x2 /4 + y 2 and some of its level curves.

An important class of convex functions consists of (certain) quadratic func-
tions. Let A ∈ Rn×n be a symmetric matrix which is positive semidefinite and
consider the quadratic function f : Rn → R given by

f (x) = (1/2) xT Ax − bT x = (1/2) Σ_{i,j} aij xi xj − Σ_{j=1}^{n} bj xj .

(If A = 0, then the function is linear, and it may be strange to call it quadratic.
But we still do this, for simplicity.) Then (Exercise 11.9) the Hessian matrix
of f is A, i.e., ∇2 f (x) = A for each x ∈ Rn . Therefore, by Theorem 12.8, f is a
convex function.
We remark that sometimes it may be easy to check that a symmetric matrix
A is positive semidefinite. A (real) symmetric n × n matrix A is called diagonally
dominant if |aii | ≥ Σ_{j≠i} |aij | for i = 1, . . . , n. These matrices arise in many

applications, e.g. splines and differential equations. It can be shown that every
symmetric diagonally dominant matrix is positive semidefinite. For a simple
proof of this fact using convexity, see [10]. Thus, we get a simple criterion
for convexity of a function: check if the Hessian matrix ∇2 f (x) is diagonally
dominant for each x. Be careful here: this matrix may be positive semidefinite
without being diagonally dominant!
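The following sketch (not from the text) implements the diagonal dominance check and compares it with the eigenvalue test; the two matrices are chosen to illustrate the warning above:

import numpy as np

def is_diagonally_dominant(A):
    # |a_ii| >= sum_{j != i} |a_ij| for every row i
    off_diag = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    return np.all(np.abs(np.diag(A)) >= off_diag)

def is_positive_semidefinite(A, tol=1e-12):
    return np.all(np.linalg.eigvalsh(A) >= -tol)

A = np.array([[2.0, 1.0], [1.0, 3.0]])  # diagonally dominant, hence PSD
B = np.array([[1.0, 2.0], [2.0, 5.0]])  # PSD, but not diagonally dominant
for M in (A, B):
    print(is_diagonally_dominant(M), is_positive_semidefinite(M))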
We now look at differentiability properties of convex functions.
Theorem 12.9. Convexity, partial derivatives and differentiability.
Let f be a real-valued convex function defined on an open convex set C ⊆ Rn .
Assume that all the partial derivatives ∂f (x)/∂x1 , . . . , ∂f (x)/∂xn exist at a
point x ∈ C. Then f is differentiable at x.

A convex function may not be differentiable everywhere, but it is differentiable


“almost everywhere”. More precisely, for a convex function defined on an open
convex set in Rn , the set of points for which f is not differentiable has Lebesgue
measure zero. We do not go into further details on this here, but refer to e.g. [19]
for a proof and a discussion.
Another characterization of convex functions that involves the gradient may
now be presented.
Theorem 12.10. Equivalent conditions for convex functions.
Let f : C → R be a differentiable function defined on an open convex set
C ⊆ Rn . Then the following conditions are equivalent:

1. f is convex.
2. f (x) ≥ f (x0 ) + ∇f (x0 )T (x − x0 ) for all x, x0 ∈ C.
3. (∇f (x) − ∇f (x0 ))T (x − x0 ) ≥ 0 for all x, x0 ∈ C.

This theorem is important. Property 2 says that the first-order Taylor


approximation of f at x0 (which is the right-hand side of the inequality) always
underestimates f . This result has interesting consequences for optimization as
we shall see later.
Proof. Assume first that n = 1. If f is convex we have that

f (x0 + t(x − x0 )) = f ((1 − t)x0 + tx) ≤ (1 − t)f (x0 ) + tf (x),


which can be rewritten as

f (x0 + t(x − x0 )) − f (x0 ) f (x0 + t(x − x0 )) − f (x0 )


f (x) ≥ f (x0 )+ = f (x0 )+ (x−x0 ).
t t(x − x0 )

Taking the limit as t → 0 shows that (ii) holds. (iii) follows from (ii) by adding
the two equations

f (x) ≥ f (x0 ) + ∇f (x0 )T (x − x0 )


f (x0 ) ≥ f (x) + ∇f (x)T (x0 − x)

and reorganizing the terms (actually this holds for any n). (iii) says that the
derivative is increasing. Given x1 < x2 < x3 , the mean value theorem says that
there exist x1 ≤ c ≤ x2 , x2 ≤ d ≤ x3 , so that

(f (x2 ) − f (x1 ))/(x2 − x1 ) = f ′(c), and (f (x3 ) − f (x2 ))/(x3 − x2 ) = f ′(d).

Since f ′(c) ≤ f ′(d), the slope of the secant from x1 to x2 is smaller than the slope
of the secant from x2 to x3 . But then clearly the slope function is increasing, so
that f is convex. This completes the proof for n = 1.
When n > 1, define g(t) = f (tx + (1 − t)x0 ). If f is convex, then g is
also convex, and from (ii) it follows that g(1) ≥ g(0) + g ′(0). The chain rule
then gives that g ′(0) = ∇f (x0 )T (x − x0 ), and (ii) follows since g(0) = f (x0 ),
g(1) = f (x).
If we also show that (iii) implies (i) the proof will be complete. Let 0 ≤ t1 ≤
t2 < 1, and define yi = ti x + (1 − ti )x0 for i = 1, 2. Note first that (iii) is the
same as

∇f (x0 )T (x − x0 ) ≤ ∇f (x)T (x − x0 ).

We have that y2 − y1 = (t2 − t1 )(x − x0 ), and we get

g ′(ti ) = ∇f (yi )T (x − x0 ) = ∇f (yi )T (y2 − y1 )/(t2 − t1 ).

Since (iii) also holds if we replace x0 , x with y1 , y2 , it follows that g ′(t1 ) ≤ g ′(t2 ),
and it follows that g is convex. From this it also follows that f is convex, since

g(t) = f ((1−t)x0 +tx) = g((1−t)·0+t·1) ≤ (1−t)g(0)+tg(1) = (1−t)f (x0 )+tf (x).

Exercise 12.1: The intersection of convex sets is convex.


We recall that A ∩ B consists of all points which lie both in A and B. Show that
A ∩ B is convex when A and B are.

Exercise 12.2: Solve


Suppose that f is a convex function defined on R which also is positive. Show
that g(x) = (f (x))n also is convex.

Exercise 12.3: The convexity of a product of functions.


a) Assume that f , g are convex, positive, and increasing functions, both two
times differentiable and defined on R. Compute the second derivative of h(x) =
f (x)g(x), consider its sign, and prove from this that f (x)g(x) is convex.
This can also be generalized to functions which are not differentiable. For
this we first need the following result.
b) Show that, for any functions f, g defined on R which are convex, positive,
and increasing, we have that

λf (x)g(x) + (1 − λ)f (y)g(y) − f (λx + (1 − λ)y)g(λx + (1 − λ)y)


≥ λ(1 − λ)(f (x)g(x) + f (y)g(y) − f (x)g(y) − f (y)g(x))

c) Explain why it follows from b) that f (x)g(x) is convex, under the same
conditions on f and g.

Hint. Start by writing f (x)g(x) + f (y)g(y) − f (x)g(y) − f (y)g(x) as a product.

Exercise 12.4: The convexity of the composition of func-


tions.
a) Let f and g both be two times (continuously) differentiable functions both-
defined on R. Suppose also that f and g are convex, and that f is increasing.
Compute the second derivative of h(x) = f (g(x)), consider its sign, and deduce
from this that f (g(x)) is convex. This states that, in particular the function
f (x) = eh(x) (which we previously just stated as convex without proof), is convex
when h is.
b) Construct two convex functions f, g so that h(x) = f (g(x)) is not convex.
The result from a) holds also when f and g are not differentiable. In fact, g
can be defined on Rn :
c) Let f and g be convex functions, and suppose that f is increasing and defined
on R, and g defined on Rn . Show that h(x) = f (g(x)) also is convex.

Exercise 12.5: Solve


Let S = {(x, y, z) : z ≥ x2 + y 2 } ⊂ R3 . Sketch the set and verify that it is a
convex set.

Exercise 12.6: Solve


Let f : S → R be a twice differentiable function, where S is an open interval in R. Show
that f is convex if and only if f ′′(x) ≥ 0 for all x ∈ S.

Exercise 12.7: Solve


Prove Proposition 12.3.

Exercise 12.8: Solve


Prove Proposition 12.5.

Exercise 12.9: Solve


Explain how you can write the LP problem max {cT x : Ax ≥ b, Bx = d, x ≥ 0}
as an LP problem of the form

max{cT x : Hx ≤ h, x ≥ 0}
for suitable matrix H and vector h.

Exercise 12.10: The set of convex combinations is convex


Let x1 , . . . , xt ∈ Rn and let C be the set of vectors of the form

λ1 x1 + · · · + λt xt ,

where λj ≥ 0 for each j = 1, . . . , t, and Σ_{j=1}^{t} λj = 1. Show that C is convex.
Make a sketch of such a set in R3 .

Exercise 12.11: Solve


Assume that f and g are convex functions defined on an interval I. Which of
the following functions are convex or concave?
a) λf where λ ∈ R,
b) min{f, g},
c) |f |.

Exercise 12.12: A convex function defined on a closed real


interval attains its maximum in one of the end points.
Let f : [a, b] → R be a convex function. Show that

max{f (x) : x ∈ [a, b]} = max{f (a), f (b)}.



Exercise 12.13: The maximum of convex functions is con-


vex.
Show that max{f, g} is a convex function when f and g are convex (we define
max{f, g} by max{f, g}(x) = max(f (x), g(x))).

Exercise 12.14: Solve


Let f : (0, ∞) → R be a convex function, and define the function g : (0, ∞) → R by g(x) = xf (1/x).
Show that g is also convex. Why is the function x → xe1/x convex?

Exercise 12.15: The distance to a convex set is a convex


function.
Let C ⊆ Rn be a convex set and consider the distance function dC defined by
dC (x) = inf{kx − yk : y ∈ C}. Show that dC is a convex function.
Chapter 13

Nonlinear equations

A basic mathematical problem is to solve a system of equations in several


unknowns (variables). There are numerical methods that can solve such equations,
at least within a small error tolerance. We shall briefly discuss such methods
here; for further details, see [24, 32].

13.1 Equations and fixed points


In linear algebra one works a lot with linear equations in several variables, and
Gaussian elimination is a central method for solving such equations. There
are also other faster methods, so-called iterative methods, for linear equations.
But what about nonlinear equations? For instance, consider the system in two
variables x1 and x2 :

x1^2 − x1 x2^(−3) + cos(x1 ) = 1
5x1^4 + 2x1^3 − tan(x1 x2^8 ) = 3
Clearly, such equations can be very hard to solve. The general problem is to
solve the equation

F (x) = 0 (13.1)
for a given function F : Rn → Rn . If F (x) = 0 we call x a root of F
(or of the equation). The example above is equivalent to finding roots in
F (x) = (F1 (x), F2 (x)) where

F1 (x) = x1^2 − x1 x2^(−3) + cos(x1 ) − 1
F2 (x) = 5x1^4 + 2x1^3 − tan(x1 x2^8 ) − 3
In particular, if F (x) = Ax − b where A is an n × n matrix and b ∈ Rn , then we
are back to linear equations (a square system). More generally one may consider
equations G(x) = 0 where G : Rn → Rm , but we here only discuss the case
m = n.


Often the problem F (x) = 0 has the following form, or may be rewritten to
it:

K(x) = x. (13.2)
for some function K : Rn → Rn . This corresponds to the special choice
F (x) = K(x) − x. A point x ∈ Rn such that x = K(x) is called a fixed point of
the function K. In finding such a fixed point it is tempting to use the following
iterative method: choose a starting point x0 and repeat the following iteration

xk+1 = K(xk ) for k = 0, 1, 2, . . . (13.3)


This is called a fixed-point iteration. We note that if K is continuous and this
procedure converges to some point x∗ , then x∗ must be a fixed point. The fixed-
point iteration is an extremely simple algorithm, and very easy to implement.
Perhaps surprisingly, it also works very well for many such problems.
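A minimal Python sketch of the fixed-point iteration (13.3) is shown below (not from the text); the map K(x) = cos x is chosen only as an example, and happens to be a contraction near its fixed point:

import numpy as np

def fixed_point_iteration(K, x0, tol=1e-10, max_iter=1000):
    # Iterate x_{k+1} = K(x_k) until successive iterates are close
    x = x0
    for _ in range(max_iter):
        x_next = K(x)
        if np.linalg.norm(np.atleast_1d(x_next - x)) < tol:
            return x_next
        x = x_next
    return x

x_star = fixed_point_iteration(np.cos, 1.0)
print(x_star, np.cos(x_star) - x_star)  # approx 0.7390851, residual approx 0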
When does the fixed-point iteration work? Let k · k be a fixed norm, e.g. the
Euclidean norm, on Rn . We say that the function K : Rn → Rn is a contraction
if there is a constant 0 ≤ c < 1 such that

kK(x) − K(y)k ≤ ckx − yk (x, y ∈ Rn ).


We also say that K is c-Lipschitz in this case. The following theorem is called
the Banach contraction principle. It also holds in Banach spaces, i.e., complete
normed vector spaces (possibly infinite-dimensional).
Theorem 13.1. Banach contraction principle.
Assume that K is c-Lipschitz with 0 < c < 1. Then K has a unique fixed
point x∗ . For any starting point x0 the fixed-point iteration (13.3) generates a
sequence {xk }∞_{k=0} that converges to x∗ . Moreover

kxk+1 − x∗ k ≤ ckxk − x∗ k for k = 0, 1, . . . (13.4)


so that

kxk − x∗ k ≤ ck kx0 − x∗ k.
Proof. First, note that if both x and y are fixed points of K, then

kx − yk = kK(x) − K(y)k ≤ ckx − yk


which means that x = y (as c < 1); therefore K has at most one fixed point.
Next, we compute

kxk+1 − xk k = kK(xk ) − K(xk−1 )k ≤ ckxk − xk−1 k = · · · ≤ ck kx1 − x0 k

so

‖xm − x0 ‖ = ‖Σ_{k=0}^{m−1} (xk+1 − xk )‖ ≤ Σ_{k=0}^{m−1} ‖xk+1 − xk ‖
           ≤ ( Σ_{k=0}^{m−1} c^k ) ‖x1 − x0 ‖ ≤ (1/(1 − c)) ‖x1 − x0 ‖.
From this we derive that {xk } is a Cauchy sequence; as we have

kxs+m − xs k = kK(xs+m−1 ) − K(xs−1 )k ≤ ckxs+m−1 − xs−1 k = · · ·


≤ cs kxm − x0 k ≤ (cs /(1 − c))kx1 − x0 k.

and 0 < c < 1. Any Cauchy sequence in Rn has a limit point, so xm → x∗ for
some x∗ ∈ Rn . We now prove that the limit point x∗ is a (actually, the) fixed
point:

kx∗ − K(x∗ )k ≤ kx∗ − xm k + kxm − K(x∗ )k


= kx∗ − xm k + kK(xm−1 ) − K(x∗ )k
≤ kx∗ − xm k + ckxm−1 − x∗ k
and letting m → ∞ here gives kx∗ − K(x∗ )k ≤ 0 so x∗ = K(x∗ ) as desired.
Finally,

kxk+1 − x∗ k = kK(xk ) − K(x∗ )k ≤ ckxk − x∗ k ≤ ck+1 kx0 − x∗ k.

which completes the proof.


We see that xk → x∗ linearly, and that Equation (13.4) gives an estimate on
the convergence speed.

13.2 Newton’s method


We return to the main problem (13.1). Our goal is to present Newton’s method, a
highly efficient iterative method for solving this equation. The method constructs
a sequence

x0 , x1 , x2 , . . .
in Rn which, hopefully, converges to a root x∗ of F , so F (x∗ ) = 0. The idea is

to linearize F at the current iterate xk and choose the next iterate xk+1 as a
zero of this linearized function. The first order Taylor approximation of F at xk
is

TF1 (xk ; x) = F (xk ) + F 0 (xk )(x − xk ).


We solve TF1 (xk ; x) = 0 for x and define the next iterate as xk+1 = x. This
gives

xk+1 = xk − F 0 (xk )−1 F (xk ) (13.5)



which leads to Newton's method. One here assumes that the derivative F ′ is
known analytically. Note that we do not (and hardly ever do!) compute the
inverse of the matrix F ′(xk ). In the main step, which is to compute the Newton
step p = xk+1 − xk , one needs to solve an n × n linear system of equations where the
coefficient matrix is the Jacobi matrix of F , evaluated at xk . In MAT1110 [26]
we implemented the following code for Newton's method for nonlinear equations:

function x=newtonmult(x0,F,J)
% Performs Newtons method in many variables
% x: column vector which contains the start point
% F: computes the values of F
% J: computes the Jacobi matrix
epsilon=0.0000001; N=30; n=0;
x=x0;
while norm(F(x)) > epsilon && n<=N
x=x-J(x)\F(x);
fval = F(x);
%fprintf(’itnr=%2d x=[%13.10f,%13.10f] F(x)=[%13.10f,%13.10f]\n’,...
% n,x(1),x(2),fval(1),fval(2))
n = n + 1;
end

This code also terminates after a given number of iterations, and when a given
accuracy is obtained. Note that this function should work for any function F ,
since it is a parameter to the function.
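Since these notes otherwise use Python, a rough Python/NumPy counterpart of newtonmult may look as follows (a sketch, not the implementation from [26]); np.linalg.solve plays the role of the backslash operator:

import numpy as np

def newtonmult(x0, F, J, epsilon=1e-7, N=30):
    # Newton's method for F(x) = 0 in several variables.
    # x0: start point, F: computes F(x), J: computes the Jacobi matrix of F
    x = np.asarray(x0, dtype=float)
    n = 0
    while np.linalg.norm(F(x)) > epsilon and n <= N:
        # Solve J(x) p = -F(x) instead of inverting J(x)
        p = np.linalg.solve(J(x), -F(x))
        x = x + p
        n += 1
    return x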
The convergence of Newton’s method may be analyzed using fixed point
theory since one may view Newton’s method as a fixed point iteration. Observe
that the Newton iteration (13.5) may be written

xk+1 = G(xk )
where G is the function

G(x) = x − F 0 (x)−1 F (x)


From this it is possible to show that if the starting point is sufficiently close to
the root, then Newton’s method will converge to this root at a linear convergence
rate. With more clever arguments one may show that the convergence rate
of Newton’s method is even faster: it has superlinear convergence. Actually,
for many functions one even has quadratic convergence rate. The proof of the
following convergence theorem relies purely on Taylor’s theorem.
Theorem 13.2. Convergence of the Newton method is superlinear.
Assume that Newton’s method with initial point x0 produces a sequence
{xk }∞ ∗
k=0 which converges to a solution x of (13.1). Then the convergence rate
is superlinear.
Proof. From Taylor's theorem for vector-valued functions, Theorem 11.5, applied at the
point xk we have

0 = F (x∗ ) = F (xk + (x∗ − xk )) = F (xk ) + F ′(xk )(x∗ − xk ) + O(‖xk − x∗ ‖).



Multiplying this equation by F ′(xk )−1 (which is assumed to exist!) gives

xk − x∗ − F ′(xk )−1 F (xk ) = O(‖xk − x∗ ‖).

Combining this with the Newton iteration xk+1 = xk − F ′(xk )−1 F (xk ) we get

xk+1 − x∗ = O(‖xk − x∗ ‖).

So

lim_{k→∞} ‖xk+1 − x∗ ‖/‖xk − x∗ ‖ = 0,

which proves the superlinear convergence.


The previous result is interesting, but it does not say how near to the root
the starting point need to be in order to get convergence. This is the next topic.
Let F : U → Rn where U is an open, convex set in Rn . Consider the following conditions
on the derivative F ′:

(i) ‖F ′(x) − F ′(y)‖2 ≤ L‖x − y‖ for all x, y ∈ U
(ii) ‖F ′(x0 )‖2 ≤ K for some x0 ∈ U     (13.6)

where K and L are some constants. Here ‖F ′(x0 )‖2 denotes the spectral norm
of the square matrix F ′(x0 ). For a square matrix A this is defined by

‖A‖2 = max{‖Ax‖ : ‖x‖ = 1}.

It is a fact that ‖A‖2 is equal to the largest singular value of A, and that it
measures how much the operator F ′(x0 ) may increase the size of vectors. The
following convergence result for Newton's method is known as Kantorovich's
theorem.
Theorem 13.3. Kantorovich's theorem.
Let F : U → Rn be a differentiable function satisfying (13.6). Assume that
B̄(x0 ; 1/(KL)) ⊆ U and that

‖F ′(x0 )−1 F (x0 )‖ ≤ 1/(2KL).

Then F ′(x) is invertible for all x ∈ B(x0 ; 1/(KL)), and Newton's method with
initial point x0 will produce a sequence {xk }∞_{k=0} contained in B(x0 ; 1/(KL))
with limk→∞ xk = x∗ for some limit point x∗ ∈ B̄(x0 ; 1/(KL)) satisfying

F (x∗ ) = 0.
A proof of this theorem is quite long (but not very difficult to understand)
[26].
One disadvantage with Newton’s method is that one needs to know the
Jacobi matrix F 0 explicitly. For complicated functions, or functions being the

output of a simulation, the derivative may be hard or impossible to find. The


quasi-Newton method, also called the secant-method, is then a good alternative.
The idea is to approximate F 0 (xk ) by some matrix Bk and to compute the new
search direction from

Bk p = −F (xk )
The method we define will make the following assumption:
Definition 13.4. Broyden’s method.
Assume that we have chosen the next iterate xk+1 . Broyden’s method updates
Bk to Bk+1 in such a way that

Bk+1 (xk+1 − xk ) = F (xk+1 ) − F (xk ), (13.7)


and so that Bk+1 u = Bk u for all u orthogonal to xk+1 − xk .
Equation (13.7) is close to true if we replace Bk+1 with F 0 (xk ) (due to the
Taylor series of first order), so this is a reasonable assumption to make. The
assumption that Bk+1 acts as Bk on vectors orthogonal to sk comes from that
the only new information we need to encapsulate in Bk+1 is given by Equation
(13.7).
It is straightforward to find an expression for Bk+1 . Define sk = xk+1 − xk
and yk = F (xk+1 ) − F (xk ). We require that Bk+1 sk = yk . The projection
onto the space spanned by sk is given by the matrix sk sTk /sTk sk , and the
projection onto the orthogonal complement of this space is I − sk sTk /sTk sk . Since
B = yk sTk /sTk sk satisfies Bsk = yk and Bu = 0 for all vectors in the orthogonal
complement, we have that

sk sT yk sT (yk − Bk sk )sTk
 
Bk+1 = Bk I − T k + T k = Bk + . (13.8)
sk sk sk sk sTk sk
Note that the matrix in Equation (13.8) is a rank one update of Bk , so that
it can be computed efficiently. In an algorithm for Broyden’s method Bk+1 is
computed from Equation (13.8), then xk+2 is computed by following the search
direction p obtained by solving Bk+1 p = −F (xk+1 ), and so on. Finally sk+1
and yk+1 are updated. An algorithm also computes an α through what we call a
line search, to attempt to find the optimal distance to follow the search direction.
We do not here specify how this line search can be performed. Also, we do
not specify how the initial values can be chosen. For B0 , any approximation of
the Jacobian of F at x0 can be used, using a numerical differentiation method
of your own choosing. One can show that Broyden’s method, under certain
assumptions, also converges superlinearly, see [32].
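As an illustration of the update (13.8) alone (the full method, with line search and stopping criteria, is the topic of Exercise 13.7), a sketch of the rank one update in Python could be:

import numpy as np

def broyden_update(B, s, y):
    # Rank one update (13.8): B_{k+1} = B_k + (y_k - B_k s_k) s_k^T / (s_k^T s_k)
    return B + np.outer(y - B @ s, s) / (s @ s)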

Exercise 13.1: Solve


Show that the problem of solving nonlinear equations (13.1) may be transformed
into a nonlinear optimization problem.

Hint. Square each component function and sum these up!

Exercise 13.2: Solve


Let T : R → R be given by T (x) = (3/2)(x − x3 ). Draw the graph of this
function, and determine its fixed points. Let x∗ denote the largest fixed point.
Find, using your graph, an interval I containing x∗ such that the fixed point
algorithm with an initial point in I is guaranteed to converge towards x∗ . Then
try the fixed point algorithm with starting point x0 = √(5/3).

Exercise 13.3: Solve



Let α ∈ R+ be fixed, and consider f (x) = x2 − α. Then the zeros are ±√α.
Write down Newton's iteration for this problem. Let α = 2 and compute the
first three iterates in Newton's method when x0 = 1.

Exercise 13.4: Solve


For any vector norm k · k on Rn , we can more generally define a corresponding
operator norm for n × n matrices by

‖A‖ = sup{‖Ax‖ : ‖x‖ = 1}.

a) Explain why this supremum is attained.


PnIn the rest nof this exercise we will use the vector norm kxk = kxk1 =
j=1 |xj | on R .

b) For n = 2, draw the sub-level set {x ∈ R2 : kxk1 ≤ 1}.


c) Show that f (x) = kAxk1 is convex for any n. It follows from Lemma 12.6
that the maximum of f on the set {x : kxk1 = 1} is attained in a point on the
form ±ek .
d) Show that, for any n × n matrix A, ‖A‖ = sup_k Σ_{i=1}^{n} |aik |, where aij are the
entries of A (i.e. the biggest sum of absolute values in a column).

Exercise 13.5: Solve


Consider a linear map T : Rn → Rn given by T (x) = Ax where A is an n × n
matrix. When is T a contraction w.r.t. the vector norm ‖ · ‖1 ?

Exercise 13.6: Solve


Test the function newtonmult on the equations given initially in Section 13.1.

Exercise 13.7: Broyden’s method


In this exercise we will implement Broyden’s method.

a) Given a value x0 , implement a function which computes an estimate of F 0 (x0 )


by estimating the partial derivatives of F , using a numerical differentiation
method and step size of your own choosing.
b) Implement a function

function x=broyden(x0,F)

which returns an estimate of a zero of F using Broyden’s method. Your method


should set B0 to be the matrix obtained from the function in a. Just indicate
where line search along the search direction should be performed in your function,
without implementing it. The function should work as newtonmult in that it
terminates after a given number of iterations, or after a given accuracy has been
obtained.

Exercise 13.8: Solve


The Frobenius norm is defined by
‖A‖F = ( Σ_{i=1}^{m} Σ_{j=1}^{n} aij^2 )^{1/2}

for any matrix A. Show that kABkF ≤ kAkF kBk2 whenever the matrix product
AB is well-defined.

Exercise 13.9: Solve


Show that kvwT k2 = kvkkwk for any column vectors v and w.

Exercise 13.10: Solve


Show that the matrix Bk+1 obtained by Broyden’s method (Equation (13.8))
minimizes f (B) = kB − Bk kF subject to the constraint Bsk = yk .
Note that the Frobenius norm is not the only norm for which this result holds.
It is chosen since it is sensitive to changes in all components in the same way.
Chapter 14

Unconstrained optimization

How can we know whether a given point x∗ is a minimum, local or global, of


some given function f : Rn → R? And how can we find such a point x∗ ?
These are, of course, some main questions in optimization. In order to give
good answers to these questions we need optimality conditions. They provide
tests for optimality, and serve as the basis for algorithms. We here focus on
differentiable functions; the corresponding results for the nondifferentiable case
are more difficult (but they exist, and are based on convexity, see [19, 41]).
For unconstrained problems it is not difficult to find powerful optimality
conditions from Taylor’s theorem for functions in several variables.

14.1 Optimality conditions


In order to establish optimality conditions in unconstrained optimization, Taylor’s
theorem is the starting point, see Section 11.3. We only consider minimization
problems, as maximization problems are turned into minimization problems by
multiplying the function f by −1.
First we look at some necessary optimality conditions.
Theorem 14.1. Minima are stationary points.
Assume that f : Rn → R has continuous partial derivatives, and assume that

x∗ is a local minimum of f . Then

∇f (x∗ ) = 0.     (14.1)

If, moreover, f has continuous second order partial derivatives, then ∇2 f (x∗ ) is

positive semidefinite.
Proof. Assume that x∗ is a local minimum of f and that ∇f (x∗ ) 6= 0. Let
h = −α∇f (x∗ ) where α > 0. Then ∇f (x∗ )T h = −αk∇f (x∗ )k2 < 0 and
by continuity of the partial derivatives of f , ∇f (x)T h < 0 for all x in some
neighborhood of x∗ . From Theorem 11.2 (first order Taylor) we obtain


f (x∗ + h) − f (x∗ ) = ∇f (x∗ + th)T h (14.2)


for some t ∈ (0, 1) (depending on α). By choosing α small enough, the right-hand
side of (14.2) is negative (as just said), and so f (x∗ + h) < f (x∗ ), contradicting
that x∗ is a local minimum. This proves that ∇f (x∗ ) = 0.
To prove the second statement, we get from Theorem 11.3 (second order
Taylor)

f (x∗ + h) = f (x∗ ) + ∇f (x∗ )T h + (1/2) hT ∇2 f (x∗ + th)h
           = f (x∗ ) + (1/2) hT ∇2 f (x∗ + th)h     (14.3)

If ∇2 f (x∗ ) is not positive semidefinite, there is an h such that hT ∇2 f (x∗ )h < 0
and, by continuity of the second order partial derivatives, hT ∇2 f (x)h < 0 for
all x in some neighborhood of x∗ . But then (14.3) gives f (x∗ + h) − f (x∗ ) < 0;
a contradiction. This proves that ∇2 f (x∗ ) is positive semidefinite.
The two necessary optimality conditions in Theorem 14.1 are called the first-
order and the second-order conditions, respectively. The first-order condition
says that the gradient must be zero at x∗ , and such a point is often called a
stationary point. The second-order condition may be interpreted by f being
"convex locally" at x∗ , although this is not a precise term. A stationary point
which is neither a local minimum or a local maximum is called a saddle point.
So, every neighborhood of a saddle point contains points with larger and points
with smaller f -value.
Theorem 14.1 gives a connection to nonlinear equations. In order to find a
stationary point we may solve ∇f (x) = 0, which is a n × n (usually nonlinear)
system of equations. (The system is linear whenever f is a quadratic function.)
One may solve this equation, for instance, by Newton’s method and thereby
get a candidate for a local minimum. Sometimes this approach works well,
in particular if f has a unique local minimum and we have an initial point
"sufficiently close". However, there are other better methods which we discuss
later.
It is important to point out that any algorithm for finding a minimum of f
has to be able to find a stationary point. Therefore algorithms in this area are
typically iterative and move to gradually better points where the norm of the
gradient becomes smaller, and eventually almost equal to zero.

A simple example. Consider a convex quadratic function

f (x) = (1/2) xT Ax − bT x
where the (symmetric) Hessian matrix is constant, equal to A, and this
matrix is positive semidefinite. Then ∇f (x) = Ax − b, so the first-order necessary
optimality condition is

Ax = b
which is a linear system of equations. If f is strictly convex, which happens when
A is positive definite, then A is invertible and the unique solution is x∗ = A−1 b.
Thus, there is only one candidate for a local (and global) minimum, namely
x∗ = A−1 b. Actually, this is indeed a unique global minimum, but to verify
this we need a suitable argument. One way is to use convexity (with results
presented later) or an alternative is to use sufficient optimality conditions which
we discuss next. The linear system Ax = b, when A is positive definite, may be
solved by several methods. A popular, and very fast, method is the conjugate
gradient method. This method, and related methods, are discussed in detail in
the course INF-MAT4360 Numerical linear algebra [28].
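As a small numerical illustration of this example (not from the text), one can pick a positive definite A, solve Ax = b, and check that the gradient vanishes at the solution; the matrix and right-hand side below are chosen arbitrarily:

import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, -1.0])

x_star = np.linalg.solve(A, b)           # the unique stationary point
grad = A @ x_star - b                    # gradient of (1/2) x^T A x - b^T x
print(x_star, np.linalg.norm(grad))      # gradient norm is (numerically) zero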
In order to present a sufficient optimality condition we need a result from
linear algebra. Recall from linear algebra that a symmetric positive definite
matrix has only real eigenvalues and all these are positive.
Lemma 14.2. Smallest eigenvalue.
Let A be an n × n symmetric positive definite matrix, and let λn > 0 denote
its smallest eigenvalue. Then

hT Ah ≥ λn khk2 (h ∈ Rn ).

Proof. By the spectral theorem there is an orthogonal matrix P (containing the


orthonormal eigenvectors as its columns) such that

A = P DP T
where D is the diagonal matrix with the eigenvalues λ1 , . . . , λn on the diagonal.
Let h ∈ Rn and define y = P T h. Then kyk = khk and

hT Ah = hT P DP T h = y T Dy = Σ_{i=1}^{n} λi yi^2 ≥ λn Σ_{i=1}^{n} yi^2 = λn ‖y‖^2 = λn ‖h‖^2 .

Next we consider sufficient optimality conditions in the general differentiable


case. These conditions are used to prove that a candidate point (say, found by
an algorithm) is really a local minimum.
Theorem 14.3. Sufficient conditions for a minimum.
Assume that f : Rn → R has continuous second order partial derivatives in
some neighborhood of a point x∗ . Assume that ∇f (x∗ ) = 0 and ∇2 f (x∗ ) is
positive definite. Then x∗ is a local minimum of f .

Proof. From Theorem 11.4 (second order Taylor) and Lemma 14.2 we get

f (x∗ + h) = f (x∗ ) + ∇f (x∗ )T h + (1/2) hT ∇2 f (x∗ )h + ε(h)‖h‖2
           ≥ f (x∗ ) + (1/2) λn ‖h‖2 + ε(h)‖h‖2 ,

where λn > 0 is the smallest eigenvalue of ∇2 f (x∗ ). Dividing here by ‖h‖2 gives

(f (x∗ + h) − f (x∗ ))/‖h‖2 ≥ (1/2) λn + ε(h).

Since limh→0 ε(h) = 0, there is an r such that for ‖h‖ < r, |ε(h)| < λn /4. This
implies that

(f (x∗ + h) − f (x∗ ))/‖h‖2 ≥ λn /4

for all h with ‖h‖ < r. This proves that x∗ is a local minimum of f .
We remark that the proof of the previous theorem actually shows that x∗
is a strict local minimum of f meaning that f (x∗ ) is strictly smaller than f (x)
for all other points x in some neighborhood of x∗ . Note the difference between
the necessary and the sufficient optimality conditions: a necessary condition is
that ∇2 f (x) is positive semidefinite, while a part of the sufficient condition is
the stronger property that ∇2 f (x) is positive definite.
Let us see what happens when we work with a convex function.
Theorem 14.4. Minima for convex functions.
Let f : Rn → R be a convex function. Then a local minimum is also a global
minimum. If, in addition, f is differentiable, then a point x∗ is a local (and then
global) minimum of f if and only if

∇f (x∗ ) = 0.
Proof. Let x1 be a local minimum. If x1 is not a global minimum, there is an
x2 ≠ x1 with f (x2 ) < f (x1 ). Then for 0 < λ < 1

f ((1 − λ)x1 + λx2 ) ≤ (1 − λ)f (x1 ) + λf (x2 ) < f (x1 ),

and since (1 − λ)x1 + λx2 lies in any given neighborhood of x1 when λ is small enough,
this contradicts that f (x) ≥ f (x1 ) for all x in a neighborhood of x1 .
Therefore x1 must be a global minimum.
Assume f is convex and differentiable. Due to Theorem 14.1 we only need to
show that if ∇f (x∗ ) = 0, then x∗ is a local and global minimum. So assume
that ∇f (x∗ ) = 0. Then, from Theorem 12.10 we have

f (x) ≥ f (x∗ ) + ∇f (x∗ )T (x − x∗ )


for all x ∈ Rn . If ∇f (x∗ ) = 0, this directly shows that x∗ is a global minimum.

14.2 Methods
Algorithms for unconstrained optimization are iterative methods that generate
a sequence of points with gradually smaller values on the function f which is
to be minimized. There are two main types of algorithms in unconstrained
optimization:

• Line search methods: Here one first chooses a search direction dk from
the current point xk , using information about the function f . Then one
chooses a step length αk so that the new point xk+1 = xk + αk dk has a
small, perhaps smallest possible, value on the half-line {xk + αdk : α ≥ 0}.
αk describes how far one should go along the search direction. The problem
of choosing αk is a one-dimensional optimization problem. Sometimes we
can find αk exactly, and in such cases we refer to the method as exact line
search. In cases where αk can not be found analytically, algorithms can be
used to approximate how we can get close to the minimum on the half-line.
Such a method is also referred to as inexact line search.

• Trust region methods: In these methods one chooses an approximation


fˆk to the function in some neighborhood of the current point xk . The
function fˆk is simpler than f and one minimizes fˆk (in the mentioned
neighborhood) and let the next iterate xk+1 be this minimizer.

These types are typically both based on quadratic approximation of f , but they
differ in the order in which one chooses search direction and step size. In the
following we only discuss the first type, the line search methods.
A very natural choice for search direction at a point xk is the negative
gradient, dk = −∇f (xk ). Recall that the direction of maximum increase of a
(differentiable) function f at a point x is ∇f (x), and the direction of maximum
decrease is −∇f (x). To verify this, Taylor’s theorem gives
f (x + h) = f (x) + ∇f (x)T h + (1/2) hT ∇2 f (x + th)h.
So, for small h, the first order term dominates and we would like to make this
term small. By the Cauchy-Schwarz inequality 1 .

∇f (x)T h ≥ −k∇f (x)k khk


and equality holds for h = −α∇f (x) for some α ≥ 0. In general, we call h a
descent direction at x if ∇f (x)T h < 0. Thus, if we move in a descent direction
from x and make a sufficiently small step, the new point has a smaller f -value.
With this background we shall in the following focus on gradient methods given
by

xk+1 = xk + αk dk (14.4)
1 The Cauchy-Schwarz’ inequality says: |u · v| ≤ kuk kvk for u, v ∈ Rn .

where the direction dk satisfies

∇f (xk )T dk < 0 (14.5)


There are two gradient methods we shall discuss:

The steepest descent method. Here we choose the search direction dk =


−∇f (xk ), we get

xk+1 = xk − αk ∇f (xk ).
In each step it moves in the direction of the negative gradient. Sometimes
this gives slow convergence, so other methods have been developed where other
choices of direction dk are made.

Newton’s method. Here we choose

xk+1 = xk − αk (∇2 f (xk ))−1 ∇f (xk ). (14.6)


This is the gradient method with dk = −(∇2 f (xk ))−1 ∇f (xk ); this vector dk is
called the Newton step. The so-called pure Newton method is when one simply
chooses step size αk = 1 for each k. We then also say that we take a full Newton
step. To interpret this method consider the second order Taylor approximation
of f in xk

f (xk + h) ≈ f (xk ) + ∇f (xk )T h + (1/2)hT ∇2 f (xk )h


If we minimize this quadratic function w.r.t. h, assuming ∇2 f (xk ) is positive
definite, we get (see Exercise 14.8)

h = −(∇2 f (xk ))−1 ∇f (xk )


which explains the Newton step.
In the following we follow the presentation in [2]. In a gradient method we
need to choose the step length. This is the one-dimensional optimization problem

min{f (x + αd) : α ≥ 0}.


Sometimes (maybe not too often) we may solve this problem exactly. Most
practical methods try some candidate α’s and pick the one with smallest f -value.
Note that it is not necessary to compute the exact minimum (this may take too
much time). The main thing is to assure that we get a sufficiently large decrease
in f without making a too small step.
A popular method for choosing the step size is backtracking line search:
Definition 14.5. Backtracking line search.
The method of backtracking line search for choosing a step size is defined
as follows: We assume that (in advance) we have chosen parameters s ≤ 1, a

reduction factor β satisfying 0 < β < 1, and 0 < σ < 1 (typically this is chosen
very small, e.g. σ = 10−3 ). We define the integer

mk = min{m : m ≥ 0, f (xk ) − f (xk + β m sdk ) ≥ −σβ m s∇f (xk )T dk }. (14.7)

The step size is then defined to be αk = β mk s. The inequality

f (xk ) − f (xk + β m sdk ) ≥ −σβ m s∇f (xk )T dk (14.8)


is also called the stopping condition of backtracking line search.
The parameter s fixes the search for step size to lie within the interval [0, s].
This can be important: for instance, we can set s so small that the initial step
size we try is within the domain of definition for f . The natural thing would
be to choose s = 1: if the stopping condition then applies immediately, then
αk = 1. If Newton’s method is used this corresponds to using the pure Newton
step, i.e. a full Newton step is chosen.
According to [2] β is usually chosen in [1/10, 1/2]. In the literature one may
find a lot more information about step size rules and how they may be adjusted
to the methods for finding search direction, see [2], [32].
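A minimal sketch (not taken from [2]) of the gradient method (14.4) with steepest descent directions and the backtracking rule (14.7) could look as follows; the parameter values and the test function are chosen for illustration only:

import numpy as np

def backtracking_step(f, grad_fx, x, d, s=1.0, beta=0.5, sigma=1e-3):
    # Find the smallest m >= 0 with
    # f(x) - f(x + beta^m s d) >= -sigma beta^m s grad_f(x)^T d
    alpha = s
    while f(x) - f(x + alpha * d) < -sigma * alpha * (grad_fx @ d):
        alpha *= beta
    return alpha

def steepest_descent(f, grad_f, x0, tol=1e-8, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        d = -g                                 # steepest descent direction
        alpha = backtracking_step(f, g, x, d)
        x = x + alpha * d
    return x

# Example: a convex quadratic with known minimum (1, 2)
f = lambda x: (x[0] - 1) ** 2 + 4 * (x[1] - 2) ** 2
grad_f = lambda x: np.array([2 * (x[0] - 1), 8 * (x[1] - 2)])
print(steepest_descent(f, grad_f, [0.0, 0.0]))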
Now, we return to the choice of search direction in the gradient method (14.4).
A main question is whether it generates a sequence {xk }∞ k=1 which converges
to a stationary point x∗ , i.e., where ∇f (x∗ ) = 0. It turns out that this may
not be the case; one needs to be careful about the choice of dk to assure this
convergence. The problem is that if dk tends to be nearly orthogonal to ∇f (xk )
one may get into trouble. For this reason one introduces the following notion:
Definition 14.6. Gradient related.
{dk } is called gradient related to {xk } if, for any subsequence {xkp }∞_{p=1} of
{xk } converging to a nonstationary point, the corresponding subsequence
{dkp }∞_{p=1} of {dk } is bounded and lim sup_{p→∞} ∇f (xkp )T dkp < 0.

What this condition assures is that kdk k is not too small or large compared
to k∇f (xk )k and that the angle between the vectors dk and ∇f (xk ) is not too
close to 90◦ . The proof of the following theorem may be found in [2].
Theorem 14.7. Backtracking line search and gradient related.
Let {xk }∞_{k=0} be generated by the gradient method (14.4), where {dk }∞_{k=0} is
gradient related to {xk }∞_{k=0} and the step size αk is chosen using backtracking
line search. Then every limit point of {xk }∞_{k=0} is a stationary point.

We remark that in Theorem 14.7 the same conclusion holds if we use exact
minimization as step size rule, i.e., f (xk + αdk ) is minimized exactly with respect
to α.
A very important property of a numerical algorithm is its convergence
speed. Let us consider the steepest descent method first. It turns out that the
convergence speed for this algorithm is very well explained by its performance on
minimizing a quadratic function, so therefore the following result is important.

In this theorem A is a symmetric positive definite matrix with eigenvalues


λ1 ≥ λ2 ≥ · · · ≥ λn > 0.
Theorem 14.8. Minima and the smallest eigenvalue.
If the steepest descent method xk+1 = xk −αk ∇f (xk ) using exact line search
is applied to the quadratic function f (x) = xT Ax where A is positive definite,
then (the minimum value is 0 and)

f (xk+1 ) ≤ mA f (xk )
where mA = ((λ1 − λn )/(λ1 + λn ))2 .
The proof may be found in [2]. Thus, if the largest eigenvalue is much
larger than the smallest one, mA will be nearly 1 and one typically has slow
convergence. Note that mA = ((cond(A) − 1)/(cond(A) + 1))2 , where cond(A) = λ1 /λn is
the condition number of the matrix A. So the rule is: if the condition number
of A is small we get fast convergence, but if cond(A) is large, there will be
slow convergence. A similar behavior holds for most functions f because locally
near a minimum point the function is very close to its second order Taylor
approximation in x∗ which is a quadratic function with A = ∇2 f (x∗ ).
Thus, Theorem 14.8 says that the sequence obtained in the steepest descent
method converges linearly to a stationary point (at least for quadratic functions).
We now turn to Newton’s method.
Recall that the pure Newton step minimizes the second order Taylor ap-
proximation of f at the current iterate xk . Thus, if the function we minimize
is quadratic, we are done in one step. Similarly, if the function can be well
approximated by a quadratic function, then one would expect fast convergence.
We shall give a result on the convergence of Newton’s method (see [3] for
further details). When A is symmetric, we let λmin (A) denote the smallest
eigenvalue of A.
For the convergence result we need a lemma on strictly convex functions.
Assume that x0 is a starting point for Newton’s method and let S = {x ∈ Rn :
f (x) ≤ f (x0 )}. We shall assume that f is continuous and convex, and this
implies that S is a closed convex set. We also assume that f has a minimum
point x∗ which then must be a global minimum. Moreover the minimum point
will be unique due to a strict convexity assumption on f . Let f ∗ = f (x∗ ) be the
optimal value.
The following lemma says that for a convex function as just described, a
point is nearly a minimum point (in terms of the f -value) whenever the gradient
is small in that point.

Lemma 14.9. Norm of the gradient.
Assume that f is convex as above and that λmin (∇2 f (x)) ≥ m for all x ∈ S.
Then

f (x) − f ∗ ≤ (1/(2m)) ‖∇f (x)‖2 .     (14.9)

Proof. From Theorem 11.3, the second order Taylor theorem, we have for each
x, y ∈ S

f (y) = f (x) + ∇f (x)T (y − x) + (1/2)(y − x)T ∇2 f (z)(y − x)


for suitable z on the line segment between x and y. Here a lower bound for the
quadratic term is (m/2)ky − xk2 , due to Lemma 14.2. Therefore

f (y) ≥ f (x) + ∇f (x)T (y − x) + (m/2)ky − xk2 .


Now, fix x and view the expression on the right-hand side as a quadratic function
of y. This function is minimized for y ∗ = x − (1/m)∇f (x). So, by inserting
y = y ∗ above we get

$$f(y) \ge f(x) + \nabla f(x)^T(y^* - x) + (m/2)\|y^* - x\|^2 = f(x) - \frac{1}{2m}\|\nabla f(x)\|^2.$$
This holds for every y ∈ S so letting y = x∗ gives
$$f^* = f(x^*) \ge f(x) - \frac{1}{2m}\|\nabla f(x)\|^2$$
which proves the desired inequality.
In the following convergence result we consider a function f as in Lemma 14.9.
Moreover, we assume that the Hessian matrix is Lipschitz continuous over S;
this is essentially a bound on the third derivatives of f . We do not give the
complete proof (it is quite long), but consider some of the main ideas. Recall
the definition of the set S from above.
Theorem 14.10. Quadratic convergence of Newton’s method for convex func-
tions.
Let f be convex and twice continuously differentiable and assume that
1. λmin (∇2 f (x)) ≥ m for all x ∈ S.
2. k∇2 f (x) − ∇2 f (y)k2 ≤ Lkx − yk for all x ∈ S.

Moreover, assume that f has a minimum point x∗ . Then Newton’s method


generates a sequence $\{x_k\}_{k=0}^\infty$ that converges to $x^*$. From a certain $k_0$ the
convergence speed is quadratic.
Proof. The proof is based on [3]. Define f ∗ = f (x∗ ). We will prove the result
by establishing two lemmas. The proofs of these lemmas are rather technical, so
they are put in their own sections which are not part of the curriculum, and are
only included for the sake of completeness.
The first lemma applies to the first iterations of Newton’s method. In
this phase the convergence of the method may be slow, and we will see that
backtracking line search may choose a step size which is very small. This phase
of Newton’s method is therefore called the damped Newton phase:

Lemma 14.11. First lemma.


For any η, there exists γ > 0 so that, for each k, if k∇f (xk )k ≥ η, then

f (xk+1 ) ≤ f (xk ) − γ. (14.10)


The proof can be found in Section 14.2. After the damped Newton phase,
the Newton method will enter a phase where the convergence is much quicker,
as the following result shows. It is in this phase that we have a quadratic
convergence rate, so this phase is also called the quadratically convergent
phase. It turns out that backtracking line search always chooses a step size equal
to 1 in this phase:
Lemma 14.12. Second lemma.
There exists η with 0 < η ≤ m2 /L so that, for each k, if k∇f (xk )k < η, then
αk = 1 satisfies the stopping criterion of backtracking line search in Newton’s
method, and Newton’s method with backtracking line search gives
$$\frac{L}{2m^2}\|\nabla f(x_{k+1})\| \le \left(\frac{L}{2m^2}\|\nabla f(x_k)\|\right)^2. \qquad (14.11)$$
The proof can be found in Section 14.2. Now, let us combine these two lemmas
to prove the theorem. In each iteration where (14.10) occurs f is decreased by
at least γ, so the number of such iterations must be bounded by

(f (x0 ) − f ∗ )/γ
which is a finite number. For some k we must thus have by Lemma 14.11 that
k∇f (xk )k < η, and we can then use (14.11) and Lemma 14.12 to obtain

$$\|\nabla f(x_{k+1})\| \le \frac{2m^2}{L}\left(\frac{L}{2m^2}\|\nabla f(x_k)\|\right)^2 = \frac{L}{2m^2}\|\nabla f(x_k)\|^2 \le \frac{L}{2m^2}\eta^2 = \frac{L}{2m^2}\eta\,\eta \le \frac{1}{2}\eta \le \eta.$$
Therefore, as soon as (14.11) occurs in the iterative process, in all the remaining
iterations (14.11) will occur. Actually, let us show that as soon as (14.11) “kicks
in”, quadratic convergence starts:
Define $\mu_l = \frac{L}{2m^2}\|\nabla f(x_l)\|$ for each $l \ge k$. Then $0 \le \mu_k < 1/2$ as $\eta \le m^2/L$.

From (14.11) it follows that

$$\mu_{l+1} \le \mu_l^2 \qquad (l \ge k).$$


So (by induction)
$$\mu_l \le \mu_k^{2^{l-k}} \le (1/2)^{2^{l-k}} \qquad (l = k+1, k+2, \ldots).$$
Next, from Lemma 14.9

$$f(x_l) - f^* \le \frac{1}{2m}\|\nabla f(x_l)\|^2 = \frac{1}{2m}\frac{4m^4}{L^2}\left(\frac{L}{2m^2}\|\nabla f(x_l)\|\right)^2 = \frac{2m^3}{L^2}\mu_l^2 \le \frac{2m^3}{L^2}(1/2)^{2^{l-k+1}},$$
for l ≥ k. This inequality shows that f (xl ) → f ∗ , and since the minimum
point is unique due to convexity, we must have xl → x∗ . It follows that the
convergence is quadratic.
From the proof it is also possible to say something about how many iterations
are needed to reach a certain accuracy. In fact, if ε > 0, a bound on the
number of iterations until f(xk) ≤ f∗ + ε is
$$(f(x_0) - f^*)/\gamma + \log_2\log_2\left(\frac{2m^3}{L^2\epsilon}\right).$$
Here γ is the parameter introduced in the proof above. The second term in
this expression (the logarithmic term) grows very slowly as ε is decreased, and
it may roughly be replaced by the constant 6. So, whenever the second stage
(14.11) occurs, the convergence is extremely fast: only about 6 more Newton
iterations are needed. Note that quadratic convergence means, roughly, that the
number of correct digits in the answer doubles in every iteration.
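To get a feeling for why the logarithmic term may be replaced by a small constant, here is a short illustrative computation with made-up numbers: if $2m^3/(L^2\epsilon) = 10^{20}$, then
$$\log_2\log_2 10^{20} = \log_2\left(20\log_2 10\right) \approx \log_2 66.4 \approx 6.1,$$
so even requiring a very small tolerance ε only adds about six Newton iterations in the quadratically convergent phase.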

*The proof for Lemma 14.11. We have that

$$\|d_k\|^2 = (\nabla f(x_k))^T(\nabla^2 f(x_k))^{-2}\nabla f(x_k) \le \frac{1}{m}(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k),$$
since the largest eigenvalue of $(\nabla^2 f(x_k))^{-1}$ is at most $1/m$.
Since there also is an upper bound M on the highest eigenvalue of ∇2 f (x), the
second order Taylor approximation gives

$$\begin{aligned}
f(x_k + \alpha_k d_k) &= f(x_k) + \alpha_k \nabla f(x_k)^T d_k + \frac{1}{2}\alpha_k^2 (d_k)^T \nabla^2 f(z) d_k \\
&\le f(x_k) + \alpha_k \nabla f(x_k)^T d_k + \frac{M\|d_k\|^2}{2}\alpha_k^2 \\
&\le f(x_k) - \alpha_k (\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k) + \frac{M(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k)}{2m}\alpha_k^2.
\end{aligned}$$
If we try the value αˆk = m/M we get

$$f(x_k + \hat\alpha_k d_k) \le f(x_k) - \frac{1}{2}\hat\alpha_k(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k),$$
which can be written as

$$\begin{aligned}
f(x_k) - f(x_k + \hat\alpha_k d_k) &\ge \frac{1}{2}\hat\alpha_k(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k) \\
&\ge \sigma\hat\alpha_k(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k) \\
&= -\sigma\hat\alpha_k(\nabla f(x_k))^T d_k,
\end{aligned}$$

which shows that α̂k = m/M satisfies the stopping criterion of backtracking line
search. Although we may not have exactly m/M = β^n s for any n, we may still
conclude that backtracking line search stops at some αk ≥ βm/M, so that

$$\begin{aligned}
f(x_{k+1}) &\le f(x_k) + \sigma\alpha_k(\nabla f(x_k))^T d_k \\
&= f(x_k) - \sigma\alpha_k(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k) \\
&\le f(x_k) - \sigma\beta\frac{m}{M}\frac{1}{M}\|\nabla f(x_k)\|^2 \\
&\le f(x_k) - \sigma\beta\eta^2\frac{m}{M^2}.
\end{aligned}$$
This shows that we can choose $\gamma = \sigma\beta\eta^2\frac{m}{M^2}$.

*The proof for Lemma 14.12. We will first show that backtracking line
search chooses unit steps provided that η ≤ 3(1 − 2σ)m²/L. By condition 2 of Theorem 14.10,

k∇2 f (xk + αk dk ) − ∇2 f (xk )k2 ≤ αk Lkdk k,


so that

$$\left|(d_k)^T\left(\nabla^2 f(x_k + \alpha_k d_k) - \nabla^2 f(x_k)\right)d_k\right| \le \alpha_k L\|d_k\|^3.$$




Now we define the function g(t) = f(xk + t dk). The chain rule gives that

$$g'(t) = \nabla f(x_k + td_k)^T d_k, \qquad g''(t) = (d_k)^T\nabla^2 f(x_k + td_k)d_k.$$

In particular, note that g 00 (0) = (∇f (xk ))T (∇2 f (xk ))−1 ∇f (xk ). The inequality
above can therefore be written as

|g 00 (t) − g 00 (0)| ≤ tLkdk k3 ,


so that

$$g''(t) \le g''(0) + tL\|d_k\|^3 \le (\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k) + t\frac{L}{m^{3/2}}\left((\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k)\right)^{3/2},$$
where we have used that

mkdk k2 = m(∇f (xk ))T (∇2 f (xk ))−2 ∇f (xk ) ≤ (∇f (xk ))T (∇2 f (xk ))−1 ∇f (xk ).

We integrate this inequality to get

$$\begin{aligned}
g'(t) &\le g'(0) + t(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k) + \frac{L}{2m^{3/2}}t^2\left((\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k)\right)^{3/2} \\
&= -(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k) + t(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k) \\
&\quad + \frac{L}{2m^{3/2}}t^2\left((\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k)\right)^{3/2}.
\end{aligned}$$
We integrate this once more and get

$$\begin{aligned}
g(t) &\le g(0) - t(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k) + \frac{1}{2}t^2(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k) \\
&\quad + \frac{L}{6m^{3/2}}t^3\left((\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k)\right)^{3/2}.
\end{aligned}$$
If we here set t = 1 we get

$$f(x_k + d_k) \le f(x_k) - \frac{1}{2}(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k) + \frac{L}{6m^{3/2}}\left((\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k)\right)^{3/2}.$$
Assume now that also ‖∇f(xk)‖ ≤ 3(1 − 2σ)m²/L. Since the largest eigenvalue
of (∇²f(xk))⁻¹ is at most 1/m, we have that

$$(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k) \le \frac{1}{m}\left(3(1-2\sigma)m^2/L\right)^2 = \left(3(1-2\sigma)m^{3/2}/L\right)^2.$$
This implies that

$$\frac{1}{2} - \frac{L\left((\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k)\right)^{1/2}}{6m^{3/2}} \ge \sigma.$$
We therefore have that

 
$$\begin{aligned}
f(x_k + d_k) &\le f(x_k) - (\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k)\left(\frac{1}{2} - \frac{L}{6m^{3/2}}\left((\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k)\right)^{1/2}\right) \\
&\le f(x_k) - \sigma(\nabla f(x_k))^T(\nabla^2 f(x_k))^{-1}\nabla f(x_k) = f(x_k) + \sigma(\nabla f(x_k))^T d_k,
\end{aligned}$$

which proves that αk = 1 is accepted by the stopping criterion of backtracking


line search. We also have that

$$\begin{aligned}
\|\nabla f(x_k + d_k)\| &= \|\nabla f(x_k + d_k) - \nabla f(x_k) - \nabla^2 f(x_k)d_k\| \\
&= \left\|\int_0^1 (\nabla^2 f(x_k + td_k) - \nabla^2 f(x_k))d_k\,dt\right\| \\
&\le \int_0^1 \|(\nabla^2 f(x_k + td_k) - \nabla^2 f(x_k))d_k\|_2\,dt \\
&\le \int_0^1 tL\|d_k\|^2\,dt = \frac{L}{2}\|d_k\|^2 = \frac{L}{2}\|(\nabla^2 f(x_k))^{-1}\nabla f(x_k)\|^2 \\
&\le \frac{L}{2m^2}\|\nabla f(x_k)\|^2.
\end{aligned}$$
This proves the lemma.

Exercise 14.1: Solve


Consider the function f (x1 , x2 ) = x21 + ax22 where a > 0 is a parameter. Draw
some of the level sets of f (for different levels) for each a in the set {1, 4, 100}.
Also draw the gradient in a few points on these level sets.

Exercise 14.2: Solve


State and prove a theorem similar to Theorem 14.1 for maximization problems.

Exercise 14.3: Solve


Let f (x) = xT Ax where A is a symmetric n × n matrix. Assume that A is
indefinite, so it has both positive and negative eigenvalues. Show that x = 0 is
a saddlepoint of f .

Exercise 14.4: Solve


Let f (x1 , x2 ) = 4x1 + 6x2 + x21 + 2x22 . Find all stationary points and determine
if they are minimum, maximum or saddlepoints. Do the same for the function
g(x1 , x2 ) = 4x1 + 6x2 + x21 − 2x22 .

Exercise 14.5: Solve


Let the function f be given by f (x1 , x2 ) = (x1 − 1)2 + (x2 − 2)2 + 1.
a) Compute the search direction dk which is chosen by the steepest descent
method in the point xk = (2, 3).
b) Compute in the same way the search direction dk which is chosen when we
instead use Newton’s method in the point xk = (2, 3).

Exercise 14.6: Solve


The function f (x1 , x2 ) = 100(x2 − x21 )2 + (1 − x1 )2 is called the Rosenbrock
function. Compute the gradient and the Hessian matrix at every point x. Find
every local minimum. Also draw some of the level sets (contour lines) of f .

Exercise 14.7: When steepest descent finds the minimum in one step
Let f (x) = (1/2)xT Ax − bT x where A is a positive definite n × n matrix.
Consider the steepest descent method applied to the minimization of f , where
we assume exact line search is used. Assume that the search direction happens
to be equal to an eigenvector of A. Show that then the minimum is reached in
just one step.

Hint. Start by writing out one step with Newton’s method when the search
direction happens to be equal to an eigenvector of A, and establish a connection
with the steepest descent method.

Exercise 14.8: Solve


Consider the second order Taylor approximation

Tf2 (x; x + h) = f (x) + ∇f (x)T h + (1/2)hT ∇2 f (x)h.


a) Show that ∇h Tf2 = ∇f (x) + ∇2 f (x)h.
b) Minimizing Tf2 with respect to h implies solving ∇h Tf2 = 0, i.e. ∇f(x) +
∇2 f(x)h = 0 from a). If ∇2 f(x) is positive definite, explain that it also is invertible,
so that this equation has the unique solution h = −(∇2 f(x))−1 ∇f(x),
as previously noted for the Newton step.

Exercise 14.9: Solve


We want to find the minimum of f(x) = (1/2)xT Ax − bT x, defined on Rn. Formulate
one step with Newton’s method, and one step with the steepest descent method,
where you set the step size to αk = 1. Which of these methods works best for
finding the minimum for functions of this form?

Exercise 14.10: Solve


Implement the steepest descent method. Test the algorithm on the functions in
exercises 14.4 and 14.6. Use different starting points.

Exercise 14.11: Solve


What can go wrong when you apply backtracking line search (Equation (14.7))
to a function f where ∇2 f is negative definite (i.e. all eigenvalues of ∇2 f are
negative)?

Hint. Substitute the Taylor approximation

f (xk + β m sdk ) ≈ f (xk ) + ∇f (xk )T (β m sdk )


in Equation (14.7), and remember that σ there is chosen so that σ < 1.

Exercise 14.12: Solve


Write a function newtonbacktrack which performs Newton’s method for un-
constrained optimization. The input parameters are the function, its gradient,
its Hessian matrix, and the initial point. The function should also return the
number of iterations, and at each iteration write the corresponding function
value. Use backtracking line search to compute the step size, i.e. compute mk
from Equation (14.7) with β = 0.2, s = 0.5, σ = 10−3 , and use α = β mk s as the
step size. Test the algorithm on the functions in exercises 14.4 and 14.6. Use
different starting points.
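One possible way to structure such a function in Python is sketched below; this is only one of many reasonable solutions, and the stopping criterion based on ‖∇f(x)‖ is our own choice since the exercise does not prescribe one.

import numpy as np

def newtonbacktrack(f, df, d2f, x0, beta=0.2, s=0.5, sigma=1e-3,
                    epsilon=1e-3, maxit=100):
    # Newton's method with backtracking (Armijo) line search.
    x = np.asarray(x0, dtype=float)
    numit = 0
    for numit in range(1, maxit + 1):
        print('f(x) =', f(x))                       # write the function value
        if np.linalg.norm(df(x)) < epsilon:          # our chosen stopping criterion
            break
        d = -np.linalg.solve(d2f(x), df(x))          # Newton direction
        # Armijo's rule: increase m until the decrease is sufficient
        m = 0
        while f(x) - f(x + beta**m * s * d) < -sigma * beta**m * s * df(x).dot(d):
            m += 1
        x = x + beta**m * s * d
    return x, numit

The function can then be tested on the functions from Exercises 14.4 and 14.6 by passing their gradients and Hessians as df and d2f.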

Exercise 14.13: Solve


Let us return to the maximum likelihood example on the disintegration of muons.
a) Run the function newtonbacktrack with parameters being the function f
and its derivatives defined as in the example with n = 10 and

x = (0.4992, −0.8661, 0.7916, 0.9107, 0.5357, 0.6574, 0.6353, 0.0342, 0.4988, −0.4607)

Use the start value α0 = 0 for Newton's method. What estimate for the minimum
of f (and thereby α) did you obtain?
b) The ten measurements from a) were generated from a probability distribution
where α = 0.5. The answer you obtained was quite far from this. Let us therefore
take a look at how many measurements we should use in order to get quite
precise estimates for α. You can use the function

function ret=randmuon(alpha,m,n)

to generate an m × n-matrix with measurements generated with a probability


distribution with a given parameter α. This function can be found at the
homepage of the book.
With α = 0.5, generate n = 10 measurements with the help of the function
randmuon, and find the maximum likelihood estimate as above. Repeat this

10 times, and plot the ten estimates you obtain. Repeat for n = 1000, and
for n = 100000 (in all cases you are supposed to plot 10 maximum likelihood
estimates). How many measurements do we need in order to obtain maximum
likelihood estimates which are reliable?
Note that it is possible for the maximum likelihood estimates you obtain here to
be outside the domain of definition [−1, 1]. You need not take this into account.
Chapter 15

Constrained optimization - theory

In this chapter we consider constrained optimization problems. A general


optimization problem is

minimize f (x) subject to x ∈ S.


where S ⊆ Rn is a given set and f : S → R. We here focus on a very general
optimization problem which often occurs in applications. Consider the nonlinear
optimization problem with equality/inequality constraints

minimize f (x)
subject to
hi (x) = 0 (i ≤ m)
gj (x) ≤ 0 (j ≤ r)        (15.1)
where f , h1 , h2 , . . . , hm and g1 , g2 , . . . , gr are continuously differentiable functions
from Rn into R. A point x satisfying all the m + r constraints will be called
feasible. Thus, we look for a feasible point with smallest f -value.
Our goal is to establish optimality conditions for this problem, starting with
the special case with only equality constraints. Then we discuss algorithms for
solving this problem. Our presentation is strongly influenced by [3] and [2].

15.1 Equality constraints and the Lagrangian


Consider the nonlinear optimization problem with equality constraints

minimize f (x)
subject to (15.2)
hi (x) = 0 (i ≤ m)


where f and h1 , h2 , . . . , hm are continuously differentiable functions from Rn


into R. We introduce the vector field H = (h1 , h2 , . . . , hm ), so H : Rn → Rm
and H(x) = (h1 (x), h2 (x), . . . , hm (x)).
We first establish necessary optimality conditions for this problem. A point
x∗ ∈ Rn is called regular if the gradient vectors ∇hi (x∗ ) (i ≤ m) are linearly
independent.
Theorem 15.1. Lagrange.
Let x∗ be a local minimum in problem (15.2) and assume that x∗ is a regular
point. Then there is a unique vector λ∗ = (λ∗1 , λ∗2 , . . . , λ∗m ) ∈ Rm such that
$$\nabla f(x^*) + \sum_{i=1}^m \lambda_i^*\nabla h_i(x^*) = 0. \qquad (15.3)$$

If f and each hi are twice continuously differentiable, then the following also
holds

$$h^T\Big(\nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i^*\nabla^2 h_i(x^*)\Big)h \ge 0 \quad\text{for all } h \in T(x^*) \qquad (15.4)$$

where T (x∗ ) is the subspace T (x∗ ) = {h ∈ Rn : ∇hi (x∗ ) · h = 0 (i ≤ m)}.


The numbers λ∗i in this theorem are called the Lagrangian multipliers. Note
that the Lagrangian multiplier vector λ∗ is unique; this follows directly from
the linear independence assumption as x∗ is assumed regular. The theorem may
also be stated in terms of the Lagrangian function L : Rn × Rm → R given by

$$L(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i h_i(x) = f(x) + \lambda^T H(x) \qquad (x \in \mathbb{R}^n,\ \lambda \in \mathbb{R}^m).$$

Then

$$\nabla_x L(x, \lambda) = \nabla f(x) + \sum_i \lambda_i\nabla h_i(x), \qquad \nabla_\lambda L(x, \lambda) = H(x).$$

Therefore, the first order conditions in Theorem 15.1 may be rewritten as follows

∇x L(x∗ , λ∗ ) = 0, ∇λ L(x∗ , λ∗ ) = 0.
Here the second equation simply means that H(x∗) = 0. These two equations
say that (x∗ , λ∗ ) is a stationary point for the Lagrangian, and it is a system of
n + m (possibly nonlinear) equations in n + m variables.
Let us interpret Theorem 15.1. First of all, T (x∗ ) can be interpreted as a
linear subspace consisting of the “first order feasible directions” at x∗ , i.e. search

directions we can choose which do not violate the constraints (so that hi (x∗ +h) =
0 whenever hi (x∗ ) = 0, i ≤ m). To see this, note that ∇hi (x∗ ) · h is what is
called the directional derivative of hi in the direction h. This quantity measures
the change of hi in direction h, and if this is zero, hi remains zero when we move
in direction h, so that the constraints are kept. Actually, if each hi is linear, then
T (x∗ ) consists of those h such that x∗ + h is also feasible, i.e., hi (x∗ + h) = 0
for each i ≤ m. Thus, Equation (15.3) says that in a local minimum x∗ the
gradient ∇f (x∗ ) is orthogonal to the subspace T (x∗ ) of the first order feasible
variations. This is reasonable since otherwise there would be a feasible direction
in which f would decrease. In Figure 15.1 we have plotted a curve where two
constraints are fulfilled. In Figure 15.2 we have then shown an interpretation of
Theorem 15.1. Note that this necessary optimality condition corresponds to the
condition ∇f (x∗ ) = 0 in the unconstrained case. The second condition (15.4) is
a similar generalization of the second order condition in Theorem 14.1 (saying
that ∇2 f (x∗ ) is positive semidefinite).


Figure 15.1: The two surfaces h1 (x) = b1 and h2 (x) = b2 intersect each other in
a curve. Along this curve the constraints are fulfilled.

It is possible to prove the theorem by eliminating variables based on the


equations and thereby reducing the problem to an unconstrained one. Another
proof, which we shall present below is based on the penalty approach. This
approach is also interesting as it leads to algorithms for actually solving the
problem.
Proof. (Theorem 15.1) For k = 1, 2, . . . consider the modified objective function

F k (x) = f (x) + (k/2)kH(x)k2 + (α/2)kx − x∗ k2


where x∗ is the local minimum under consideration, and α is a positive constant.
The second term is a penalty term for violating the constraints and the last term
is there for proof technical reasons. As x∗ is a local minimum there is an ε > 0
such that f (x∗ ) ≤ f (x) for all x ∈ B̄(x∗ ; ε). Choose now an optimal solution xk

Figure 15.2: ∇f (x∗ ) as a linear combination of ∇h1 (x∗ ) and ∇h2 (x∗ ).

of the problem min{F k (x) : x ∈ B̄(x∗ ; ε)}; the existence here follows from the
extreme value theorem (F k is continuous and the ball is compact). For every k

F k (xk ) = f (xk ) + (k/2)kH(xk )k2 + (α/2)kxk − x∗ k2 ≤ F k (x∗ ) = f (x∗ ).

By letting k → ∞ in this inequality we conclude that limk→∞ kH(xk )k =


0. So every limit point x̄ of the sequence {xk } satisfies H(x̄) = 0. The
inequality above also implies (by dropping a term on the left-hand side) that
f (xk ) + (α/2)kxk − x∗ k2 ≤ f (x∗ ) for all k, so by passing to the limit we get

f (x̄) + (α/2)kx̄ − x∗ k2 ≤ f (x∗ ) ≤ f (x̄)


where the last inequality follows from the facts that x̄ ∈ B̄(x∗ ; ε) and H(x̄) = 0.
Clearly, this gives x̄ = x∗ . We have therefore shown that the sequence {xk }
converges to the local minimum x∗ . Since x∗ is the center of the ball B̄(x∗ ; ε),
the points xk lie in the interior of the ball for suitably large k. The conclusion is then
that xk is the unconstrained minimum of F k when k is sufficiently large. We
may therefore apply Theorem 14.1 so ∇F k (xk ) = 0. Note first that the Jacobi
matrix of (k/2)kH(x)k2 is the row vector kH(xk )T H 0 (xk ), so that the gradient
is kH 0 (xk )T H(xk ). We now obtain

0 = ∇F k (xk ) = ∇f (xk ) + kH 0 (xk )T H(xk ) + α(xk − x∗ ). (15.5)

For suitably large k the matrix H 0 (xk )H 0 (xk )T is invertible (as the rows of
H 0 (xk ) are linearly independent due to rank(H 0 (x∗ )) = m and a continuity
argument). Multiply equation (15.5) by (H 0 (xk )H 0 (xk )T )−1 H 0 (xk ) on the left
to obtain

kH(xk ) = −(H 0 (xk )H 0 (xk )T )−1 H 0 (xk )(∇f (xk ) + α(xk − x∗ )).

Letting k → ∞ we see that the sequence {kH(xk )} is convergent and its limit
point λ∗ is given by

λ∗ = −(H 0 (x∗ )H 0 (x∗ )T )−1 H 0 (x∗ )∇f (x∗ ).


Finally, by passing to the limit in (15.5) we get

0 = ∇f (x∗ ) + H 0 (x∗ )T λ∗
This proves the first part of the theorem; we omit proving the second part which
may be found in [2].
The first order necessary condition (15.3) along with the constraints H(x) =
0 is a system of n + m equations in the n + m variables x1 , x2 , . . . , xn and
λ1 , λ2 , . . . , λm . One may use e.g. Newton’s method for solving these equations
and find a candidate for an optimal solution. But usually there are better
numerical methods for solving the optimization problem (15.1), as we shall see soon.
Necessary optimality conditions are used for finding a candidate solution
for being optimal. In order to verify optimality we need sufficient optimality
conditions.
Theorem 15.2. Lagrange, sufficient condition.
Assume that f and H are twice continuously differentiable functions. More-
over, let x∗ be a point satisfying the first order necessary optimality condition
(15.3) and the following condition

y T ∇2 L(x∗ , λ∗ )y > 0 for all y ≠ 0 with H′(x∗ )y = 0 (15.6)


where ∇2 L(x∗ , λ∗ ) is the Hessian of the Lagrangian function with second order
partial derivatives with respect to x. Then x∗ is a (strict) local minimum of f
subject to H(x) = 0.
This theorem may be proved (see [2] for details) by considering the augmented
Lagrangian function

Lc (x, λ) = f (x) + λT H(x) + (c/2)kH(x)k2 (15.7)


where c is a positive scalar. This is in fact the Lagrangian function in the
modified problem

minimize f (x) + (c/2)kH(x)k2 subject to H(x) = 0 (15.8)

and this problem must have the same local minima as the problem of minimizing
f (x) subject to H(x) = 0. The objective function in (15.8) contains the penalty
term (c/2)kH(x)k2 which may be interpreted as a penalty (increased function

value) for violating the constraint H(x) = 0. In connection with the proof of
Theorem 15.2 based on the augmented Lagrangian one also obtains the following
interesting and useful fact:
if x∗ and λ∗ satisfy the sufficient conditions in Theorem 15.2 then there
exists a positive c̄ such that for all c ≥ c̄ the point x∗ is also a local minimum of
the augmented Lagrangian Lc (·, λ∗ ).
Thus, the original constrained problem has been converted to an uncon-
strained one involving the augmented Lagrangian. And, as we know, uncon-
strained problems are easier to solve (solve the equations saying that the gradient
is equal to zero).

15.2 Inequality constraints and KKT


We now consider the general nonlinear optimization problem where there are
both equality and inequality constraints. The problem is then

minimize f (x)
subject to
hi (x) = 0 (i ≤ m)
gj (x) ≤ 0 (j ≤ r)        (15.9)
We assume, as usual, that all these functions are continuously differentiable
real-valued functions defined on Rn . In short form we write the constraints
as H(x) = 0 and G(x) ≤ 0 where we let H = (h1 , h2 , . . . , hm ) and G =
(g1 , g2 , . . . , gr ).
A main difficulty in problems with inequality constraints is to determine which
of the inequalities that are active in an optimal solution. If we knew the active
inequalities, we would essentially have a problem with only equality constraints,
H(x) = 0 plus the active equalities, i.e., a problem of the form discussed in
the previous section. For very small problems (solvable by hand-calculation) a
direct method is to consider all possible choices of active inequalities and solve
the corresponding equality-constrained problem by looking at the Lagrangian
function.
Interestingly, one may also transform the problem (15.9) into the following
equality-constrained problem

minimize f (x)
subject to
hi (x) = 0 (i ≤ m)
gj (x) + zj2 = 0 (j ≤ r).        (15.10)
We have introduced extra variables zj , one for each inequality. The square of
these variables represent slack in each of the original inequalities. Note that
there is no sign constraint on zj . Clearly, the problems (15.9) and (15.10) are
equivalent. This transformation can also be useful computationally. Moreover,
it is useful theoretically as one may apply the optimality conditions from the
previous section to problem (15.10) to derive the theorem below (see [2]).

We now present a main result in nonlinear optimization. It gives optimality


conditions for this problem, and these conditions are called the Karush-Kuhn-
Tucker conditions, or simply the KKT conditions. In order to present the KKT
conditions we introduce the Lagrangian function L : Rn × Rm × Rr → R given
by

$$L(x, \lambda, \mu) = f(x) + \sum_{i=1}^m \lambda_i h_i(x) + \sum_{j=1}^r \mu_j g_j(x) = f(x) + \lambda^T H(x) + \mu^T G(x). \qquad (15.11)$$
The gradient of L with respect to x is given by
$$\nabla_x L(x, \lambda, \mu) = \nabla f(x) + \sum_{i=1}^m \lambda_i\nabla h_i(x) + \sum_{j=1}^r \mu_j\nabla g_j(x).$$

The Hessian matrix of L at (x, λ, µ) containing second order partial derivatives


of L with respect to x will be denoted by ∇2xx L(x, λ, µ). Finally, the set of indices of
the active inequalities at x is denoted by A(x), so A(x) = {j ≤ r : gj (x) = 0}.
A point x is called regular if {∇h1 (x), . . . ∇hm (x)} ∪ {∇gi (x) : i ∈ A(x)} is
linearly independent.
In the following theorem the first part contains necessary conditions while
the second part contains sufficient conditions for optimality.
Theorem 15.3. KKT.
Consider problem (15.9) with the usual differentiability assumptions.
1. Let x∗ be a local minimum of this problem and assume that x∗ is a regular
point. Then there are unique Lagrange multiplier vectors λ∗ = (λ∗1 , λ∗2 , . . . , λ∗m )
and µ∗ = (µ∗1 , µ∗2 , . . . , µ∗r ) such that

∇x L(x∗ , λ∗ , µ∗ ) = 0
µ∗j ≥ 0 (j ≤ r) (15.12)
µ∗j = 0 (j ∉ A(x∗ )).
If f , g and h are twice continuously differentiable, then the following also holds

$$y^T\nabla^2_{xx}L(x^*, \lambda^*, \mu^*)y \ge 0 \qquad (15.13)$$
for all y with $\nabla h_i(x^*)^T y = 0$ (i ≤ m) and $\nabla g_j(x^*)^T y = 0$ (j ∈ A(x∗)).
2. Assume that x∗ , λ∗ and µ∗ are such that x∗ is a feasible point and (15.12)
holds. Assume, moreover, that (15.13) holds with strict inequality for each y.
Then x∗ is a (strict) local minimum in problem (15.9).

Proof. We shall derive this result from Theorem 15.1.


1. By assumption x∗ is a local minimum of problem (15.9), and x∗ is a
regular point. Consider the constrained problem

minimize f (x)
subject to
hi (x) = 0 (i ≤ m)
gj (x) = 0 (j ∈ A(x∗ ))        (15.14)
which is obtained by removing all inactive constraints in x∗ . Then x∗ must
be a local minimum in (15.14); otherwise there would be a point x0 in the
neighborhood of x∗ which is feasible in (15.14) and satisfying f (x0 ) < f (x∗ ).
By choosing x0 sufficiently near x∗ we would get gj (x0 ) < 0 for all j 6∈ A(x∗ ),
contradicting that x∗ is a local minimum in (15.9). Therefore we may apply
Theorem 15.1 to problem (15.14) and by regularity of x∗ there must be unique
Lagrange multiplier vectors λ∗ = (λ∗1 , λ∗2 , . . . , λ∗m ) and µ∗j (j ∈ A(x∗ )) such that
$$\nabla f(x^*) + \sum_{i=1}^m \lambda_i^*\nabla h_i(x^*) + \sum_{j\in A(x^*)} \mu_j^*\nabla g_j(x^*) = 0.$$

By defining µ∗j = 0 for j ∉ A(x∗ ) we get (15.12), except for the nonnegativity of µ∗.
The remaining part of the theorem may be proved, after some work, by
studying the equality-constrained reformulation (15.10) of (15.9) and applying
Theorem 15.1 to (15.10). The details may be found in [2].
The KKT conditions have an interesting geometrical interpretation. They
say that −∇f (x∗ ) may be written as linear combination of the gradients of the
hi ’s plus a nonnegative linear combination of the gradients of the gj ’s that are
active at x∗ .

Example 15.1: A simple optimization problem


Let us consider the following optimization problem:

min{x1 : x2 ≥ 0, 1 − (x1 − 1)2 − x22 ≥ 0}.


Here there are two inequality constraints:

g1 (x1 , x2 ) = −x2 ≤ 0
g2 (x1 , x2 ) = (x1 − 1)2 + x22 − 1 ≤ 0.

If we compute the gradients we see that the KKT conditions take the form
     
$$\begin{pmatrix}1\\0\end{pmatrix} + \mu_1\begin{pmatrix}0\\-1\end{pmatrix} + \mu_2\begin{pmatrix}2(x_1-1)\\2x_2\end{pmatrix} = 0,$$
where the last two terms on the left-hand side are only included if the corresponding
inequalities are active. It is clear that we find no solutions if no inequalities
are active. If only the first inequality is active we find no solution either. If only
the second inequality is active we get the equations

(x1 − 1)2 + x22 = 1


1 + 2µ2 (x1 − 1) = 0
2µ2 x2 = 0.

From the last equation we see that either x2 = 0 or µ2 = 0. But here x2 > 0
since only the second inequality is active, so that µ2 = 0. µ2 = 0 is in conflict
with the second equation, however. Finally, let us consider the case where both
equalities are active. This occurs only in the points (0, 0) and (2, 0). These two
points give the gradients ∇g2 = (∓2, 0), so that the gradient equation can be
written as
     
$$\begin{pmatrix}1\\0\end{pmatrix} + \mu_1\begin{pmatrix}0\\-1\end{pmatrix} + \mu_2\begin{pmatrix}\mp 2\\0\end{pmatrix} = 0.$$
These give µ1 = 0 and µ2 = ±1/2. Since we require µ2 ≥ 0, the only candidate
we obtain is (0, 0).
Finally we should comment on any points which are not regular. If only the
first inequality is active it is impossible to have that ∇g1 = 0. If only the second
inequality is active it is impossible to have ∇g2 = 0, since this would require
x1 = 1, x2 = 0, a point where the second inequality is not active. If both inequalities are active,
we saw that (0, 0) and (2, 0) are the only possible points. This gave the gradients
(0, −1) and (∓2, 0), which clearly are linearly independent. We therefore have
that all points are regular.
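As a sanity check on this example, the following small Python sketch (not part of the original text) solves the problem numerically with scipy.optimize.minimize; the solver and the starting point are our own choices for illustration.

import numpy as np
from scipy.optimize import minimize

# minimize x1 subject to x2 >= 0 and 1 - (x1-1)^2 - x2^2 >= 0
f = lambda x: x[0]
constraints = [
    {'type': 'ineq', 'fun': lambda x: x[1]},                       # x2 >= 0
    {'type': 'ineq', 'fun': lambda x: 1 - (x[0]-1)**2 - x[1]**2},  # inside the circle
]
res = minimize(f, x0=np.array([1.0, 0.5]), constraints=constraints)
print(res.x)   # should be close to the KKT candidate (0, 0)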
We remark that the assumption that x∗ is a regular point may be too
restrictive in some situations, for instance there may be more than n active
inequalities in x∗ . There exist several other weaker assumptions that assure the
existence of Lagrangian multipliers (and similar necessary conditions).
In the proof of Theorem 15.3 we did not prove the nonnegativity of µ. To
show this is actually quite hard, but let us comment on the main lines. We first
need the concept of a tangent vector.

Definition 15.4. Tangent vector.


Let C ⊆ Rn and let x ∈ C. A vector d ∈ Rn is called a tangent (vector) to
C at x if there is a sequence {xk } in C and a sequence {αk } in R+ such that

$$\lim_{k\to\infty} (x_k - x)/\alpha_k = d.$$

The set of tangent vectors at x is denoted by TC (x).


TC (x) always contains the zero vector and it is a cone, meaning that it
contains each positive multiple of its vectors. We now restrict to the problem
(15.9) and let C be the set of feasible solutions (those x satisfying all the
equality and inequality constraints). One first shows that (see [32]) x∗ satisfies
∇f (x∗ )T d ≥ 0 for all d ∈ TC (x∗ ). After this, the following concept is needed.

Definition 15.5. Linearized feasible directions.


A linearized feasible direction at x ∈ C is a vector d such that

d · ∇hi (x) = 0 (i ≤ m)
d · ∇gj (x) ≤ 0 (j ∈ A(x))
(since H 0 (x) is the matrix with rows ∇hi (x), the first condition is the same as
H 0 (x)d = 0. Similarly, when all constraints are active the second condition is
the same as G0 (x)d ≤ 0). We denote by LFC (x) the set of all linearized feasible
directions at x.
So, if we move from x along a linearized feasible direction with a suitably
small step, then the new point is feasible if we only care about the linearized
constraints at x∗ (the first order Taylor approximations) of each hi and each
gj for active constraints at x∗ , i.e., those inequality constraints that hold with
equality. With this notation we have the following lemma. The proof may be
found in [32] and it involves the implicit function theorem from multivariate
calculus [26].
Lemma 15.6. Tangent cone and feasible directions.
Let x∗ ∈ C. Then TC (x∗ ) ⊆ LFC (x∗ ). If x∗ is a regular point, then
TC (x∗ ) = LFC (x∗ ).
Putting these things together, when x∗ is regular, ∇f (x∗ )T d ≥ 0 for all
d ∈ LFC (x∗ ). Now we need a lemma called Farkas’ lemma.
Lemma 15.7. Farkas' lemma.
If B and C are matrices with n rows, and K is the cone defined by K =
{By + Cw : y ≥ 0}, then exactly one of the following two alternatives is
true:

1. g ∈ K
2. There exists a d ∈ Rn so that g T d < 0, B T d ≥ 0, and C T d = 0.

If we apply this lemma with g = ∇f (x∗ ), B = −G′(x∗ )T , and C = −H′(x∗ )T ,
the conditions B T d ≥ 0 and C T d = 0 simply say that d ∈ LFC (x∗ ) = TC (x∗ ).
But for all such d we have proved that g T d = ∇f (x∗ )T d ≥ 0, so that point 2
of Farkas' lemma does not hold for g = ∇f (x∗ ). We conclude
that g = ∇f (x∗ ) ∈ K, so that we can find y ≥ 0 and w so that

g = ∇f (x∗ ) = −H 0 (x∗ )T w − G0 (x∗ )T y = By + Cw.


But this states exactly what we want to prove, namely that ∇f (x∗ )+H 0 (x∗ )T w+
G0 (x∗ )T y = 0, and that w contains the Lagrange multipliers λi , and y contains
the µi , which must be non-negative. For a more thorough discussion of these
matters, see e.g. [32, 2].
In the remaining part of this section we discuss some examples; the main
tool is to establish the KKT conditions.

Example 15.2: a one-variable problem


Consider the one-variable problem: minimize f (x) subject to x ≥ 0, where
f : R → R is a differentiable convex function. We here let g1 (x) = −x and
m = 0. The KKT conditions then become: there is a number µ such that
f 0 (x) − µ = 0, µ ≥ 0 and µ = 0 if x > 0. This is one of the (rare) occasions
where we can eliminate the Lagrangian variable µ via the equation µ = f 0 (x).
So the optimality conditions are: x ≥ 0 (feasibility), f 0 (x) ≥ 0, and f 0 (x) = 0 if
x > 0 (x is an interior point of the domain so the derivative must be zero), and
if x = 0 we must have f 0 (0) ≥ 0.

Example 15.3: a multi-variable problem


More generally, consider the problem to minimize f (x) subject to x ≥ 0, where
f : Rn → R. So here C = {x ∈ Rn : x ≥ 0} is the nonnegative orthant. We have
that gi (x) = −xi , so that ∇gi = −ei . The KKT conditions say that −∇f (x∗ ) is
a nonnegative combination of −ei for i so that xi = 0. In other words, ∇f (x∗ )
is a nonnegative combination of ei for i so that xi = 0. This means that

∂f (x∗ )/∂xi = 0 for all i ≤ n with x∗i > 0, and


∂f (x∗ )/∂xi ≥ 0 for all i ≤ n with x∗i = 0.
If we interpret this for n = 3 we get the following cases:

• No active constraints: This means that x, y, z > 0. The KKT-conditions


say that all partial derivatives are 0, so that ∇f (x∗ ) = 0. This is reasonable,
since these points are internal points.
• One active constraint, such as x = 0, y, z > 0. The KKT-conditions say
that ∂f (x∗ )/∂y = ∂f (x∗ )/∂z = 0, so that ∇f (x∗ ) points in the positive
direction of e1 , as shown in Figure 15.3(a).
• Two active constraints, such as x = y = 0, z > 0. The KKT-conditions
say that ∂f (x∗ )/∂z = 0, so that ∇f (x∗ ) lies in the cone spanned by
e1 , e2 , i.e. ∇f (x∗ ) lies in the first quadrant of the xy-plane, as shown in
Figure 15.3(b).
• Three active constraints: This means that x = y = z = 0. The KKT
conditions say that ∇f (x∗ ) is in the cone spanned by e1 , e2 , e3 , as shown
in Figure 15.3(c).

In all cases ∇f (x∗ ) points into a cone spanned by gradients corresponding


to the active inequalities (in general, by a cone we mean the set of all linear
combinations of a set of vectors with nonnegative coefficients). Note that for the
third case above, we are used to finding minimum values from before: if we
restrict f to values where x = y = 0, we have a one-dimensional problem where
we want to minimize g(z) = f (x, y, z), which is equivalent to finding z so that
g 0 (z) = ∂f (x∗ )/∂z = 0, as stated by the KKT-conditions.


Figure 15.3: The different possibilities (one, two, and three active constraints)
for ∇f in a minimum of f , under the constraints x ≥ 0.

Example 15.4: Quadratic optimization problem with linear equality constraints
Consider the problem

minimize (1/2) xT Dx − q T x
subject to
Ax = b
where D is positive semidefinite and A ∈ Rm×n , b ∈ Rm . This is a special
case of (15.16) where f (x) = (1/2) xT Dx − q T x. Then ∇f (x) = Dx − q (see
Exercise 11.9 in Chapter 11). Thus, the KKT conditions are: there is some
λ ∈ Rm such that Dx − q + AT λ = 0. In addition, the vector x is feasible so
we have Ax = b. Thus, solving the quadratic optimization problem amounts to
solving the linear system of equations

Dx + AT λ = q, Ax = b
which may be written as

$$\begin{pmatrix} D & A^T \\ A & 0 \end{pmatrix}\begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} q \\ b \end{pmatrix}. \qquad (15.15)$$
Under the additional assumption that D is positive definite and A has full row
rank, one can show that the coefficient matrix in (15.15) is invertible so this
system has a unique solution x, λ. Thus, for this problem, we may write down an
explicit solution (in terms of the inverse of the block matrix). Numerically, one
finds x (and the Lagrangian multiplier λ) by solving the linear system (15.15)
by e.g. Gaussian elimination or some faster (direct or iterative) method.
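To illustrate how (15.15) can be solved numerically, here is a small Python sketch (not part of the original text); the matrices D, A and the vectors q, b are made-up data for the example.

import numpy as np

# Made-up data: D positive definite, A with full row rank
D = np.array([[4.0, 1.0], [1.0, 3.0]])
A = np.array([[1.0, 1.0]])
q = np.array([1.0, 2.0])
b = np.array([1.0])

m, n = A.shape
# Assemble the KKT matrix of (15.15) and solve for (x, lambda)
KKT = np.block([[D, A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([q, b])
sol = np.linalg.solve(KKT, rhs)
x, lam = sol[:n], sol[n:]
print('x =', x, 'lambda =', lam)
print('check Ax = b:', A.dot(x))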

Example 15.5: Extension


Consider an extension of the previous example by allowing linear inequality
constraints as well:

minimize (1/2) xT Dx − q T x
subject to
Ax = b
x≥0
Here D, A and b are as above. Then ∇f (x) = Dx − q and ∇gk (x) = −ek .
Thus, the KKT conditions for this problem are: there are λ ∈ Rm and µ ∈ Rn
such that Dx − q + AT λ − µ = 0, µ ≥ 0 and µk = 0 if xk > 0 (k ≤ n). We
eliminate µ from the first equation and obtain the equivalent condition: there is
a λ ∈ Rm such that Dx + AT λ ≥ q and (Dx + AT λ − q)k · xk = 0 (k ≤ n). In
addition, we have Ax = b, x ≥ 0. This problem may be solved numerically, for
instance, by a so-called active set method, see [27].

Example 15.6: Linear optimization


Linear optimization is a problem of the form

minimize cT x subject to Ax = b and x ≥ 0


This is a special case of the convex programming problem (15.16) where gj (x) =
−xj (j ≤ n). Here ∇f (x) = c and ∇gk (x) = −ek . Let x be a feasible solution.
The KKT conditions state that there are vectors λ ∈ Rm and µ ∈ Rn such that
c + AT λ − µ = 0, µ ≥ 0 and µk = 0 if xk > 0 (k ≤ n). Here we eliminate µ and
obtain the equivalent set of KKT conditions: there is a vector λ ∈ Rm such that
c + AT λ ≥ 0, (c + AT λ)k · xk = 0 (k ≤ n). These conditions are the familiar
optimality conditions in linear optimization theory. The vector λ is feasible in
the so-called dual problem and complementary slack holds. We do not go into
details on this here, but refer to the course INF-MAT3370 Linear optimization
where these matters are treated in detail.

15.3 Convex optimization


A convex optimization problem is to minimize a convex function f over a convex
set C in Rn . These problems are especially attractive, both from a theoretic and
algorithmic perspective.
First, let us consider some general results.
Theorem 15.8. Optimizing convex functions.
Let f : C → R be a convex function defined on a convex set C ⊆ Rn .
1. Then every local minimum of f over C is also a global minimum.
2. If f is continuous and C is closed, then the set of local (and therefore
global) minimum points of f over C is a closed convex set.
3. Assume, furthermore, that f : C → R is differentiable and C is open.
Let x∗ ∈ C. Then x∗ ∈ C is a local (global) minimum if and only if
∇f (x∗ ) = 0.

Proof. 1.) The proof of property 1 is exactly as the proof of the first part of
Theorem 14.4, except that we work with local and global minimum of f over C.
2.) Assume the set C ∗ of minimum points is nonempty and let α =
$\min_{x\in C} f(x)$. Then C ∗ = {x ∈ C : f (x) ≤ α} is a convex set, see Proposition 12.5.
Moreover, this set is closed as f is continuous.
3.) This follows directly from Theorem 12.10.
Next, we consider a quite general convex optimization problem which is of
the form (15.9):

minimize f (x)
subject to
Ax = b
gj (x) ≤ 0 (j ≤ r)        (15.16)
where all the functions f and gj are differentiable convex functions, and A ∈
Rm×n and b ∈ Rm . Let C denote the feasible set of problem (15.16). Then C is a
convex set, see Proposition 12.5. A special case of (15.16) is linear optimization.
An important concept in convex optimization is duality. To briefly explain
this introduce again the Lagrangian function L : Rn × Rm × Rr+ → R given by

L(x, λ, ν) = f (x) + λT (Ax − b) + ν T G(x) (x ∈ Rn , λ ∈ Rm , ν ∈ Rr+ )


Remark: we use the variable name ν here instead of the µ used before because
of another parameter µ to be used soon. Note that we require ν ≥ 0.
Define the new function g : Rm × Rr+ → R̄ by

$$g(\lambda, \nu) = \inf_x L(x, \lambda, \nu).$$

Note that this infimum may sometimes be equal to −∞ (meaning that the
function x → L(x, λ, ν) is unbounded below). The function g is the pointwise
infimum of a family of affine functions in (λ, ν), one function for each x, and this
implies that g is a concave function. We are interested in g due to the following
fact, which is easy to prove. It is usually referred to as weak duality.
Lemma 15.9. Weak duality.
Let x be feasible in problem (15.16) and let λ ∈ Rm , ν ∈ Rr where ν ≥ 0.
Then

g(λ, ν) ≤ f (x).
Proof. For λ ∈ Rm , ν ∈ Rr with ν ≥ 0 and x feasible in problem (15.16) we
have

g(λ, ν) ≤ L(x, λ, ν)
= f (x) + λT (Ax − b) + ν T G(x)
≤ f (x)
as Ax = b, ν ≥ 0 and G(x) ≤ 0.

Thus, g(λ, ν) provides a lower bound on the optimal value in (15.16). It is


natural to look for a best possible such lower bound and this is precisely the
so-called dual problem, which is

maximize g(λ, ν)
subject to (15.17)
ν ≥ 0.
Actually, in this dual problem, we may further restrict the attention to those
(λ, ν) for which g(λ, ν) is finite. g(λ, ν) is also called the dual objective function.
The original problem (15.16) will be called the primal problem. It follows
from Lemma 15.9 that

g∗ ≤ f ∗
where f ∗ denotes the optimal value in the primal problem and g ∗ the optimal
value in the dual problem. If g ∗ < f ∗ , we say that there is a duality gap. Note
that the derivation above, and weak duality, holds for arbitrary functions f and
gj (j ≤ r). The concavity of g also holds generally.
The dual problem is useful when the dual objective function g may be
computed efficiently, either analytically or numerically. Duality provides a
powerful method for proving that a solution is optimal or, possibly, near-optimal.
If we have a feasible x in (15.16) and we have found a dual solution (λ, ν) with
ν ≥ 0 such that

f (x) = g(λ, ν) + ε
for some ε (which then has to be nonnegative), then we can conclude that x is
“nearly optimal”: it is not possible to improve f by more than ε. Such a point x
is sometimes called ε-optimal, where the case ε = 0 means optimal.
So, how good is this duality approach? For convex problems it is often perfect
as the next theorem says. We omit most of the proof, see [19, 2, 49]. For
non-convex problems one should expect a duality gap. Recall that G0 (x) denotes
the Jacobi matrix of G = (g1 , g2 , . . . , gr ) at x.
Theorem 15.10. Convex optimization.
Consider convex optimization problem (15.16) and assume this problem has
a feasible point satisfying

gj (x0 ) < 0 (j ≤ r).


Then f ∗ = g ∗ , so there is no duality gap. Moreover, x is a (local and global )
minimum in (15.16) if and only if there are λ ∈ Rm and ν ∈ Rr with ν ≥ 0 and

∇f (x) + AT λ + G0 (x)T ν = 0
and

νj gj (x) = 0 (j ≤ r).

Proof. We only prove the second part (see the references above). So assume that
f ∗ = g ∗ and the infimum and supremum are attained in the primal and dual
problems, respectively. Let x be a feasible point in the primal problem. Then x
is a minimum in the primal problem if and only if there are λ ∈ Rm and ν ∈ Rr
such that all the inequalities in the proof of Lemma 15.9 hold with equality.
This means that g(λ, ν) = L(x, λ, ν) and ν T G(x) = 0. But L(x, λ, ν) is convex
in x so it is minimized by x if and only if its gradient is the zero vector, i.e.,
∇f (x) + AT λ + G′(x)T ν = 0. This leads to the desired characterization.
The assumption stated in the theorem, that gj (x0 ) < 0 for each j, is called
the weak Slater condition.

Example 15.7: Comparing the primal and the dual problem


Consider the convex optimization problem where we want to minimize the
function f (x) = x2 +1 subject to the inequality constraint g(x) = (x−3)2 −1 ≤ 0.
From Figure 15.4(a) it is quite clear that the minimum is attained for x = 2,
and is f (2) = 5. Since both the constraint and the objective function are convex,
and since here the weak Slater condition holds, Theorem 15.10 guarantees that
the dual problem has the same solution as the primal problem. Let us verify this
by considering the dual problem as well. The Lagrangian function is given by

L(x, ν) = f (x) + νg(x) = x2 + 1 + ν((x − 3)2 − 1).



It is easy to see that this function attains its minimum for x = 3ν/(1 + ν). This means
that the dual objective function is given by

$$g(\nu) = L\left(\frac{3\nu}{1+\nu}, \nu\right) = \left(\frac{3\nu}{1+\nu}\right)^2 + 1 + \nu\left(\left(\frac{3\nu}{1+\nu} - 3\right)^2 - 1\right).$$

This is shown in Figure 15.4(b).


It is quite clear from this figure that the maximum is 5, which we already
found by solving the primal problem. To prove this requires some more work, by
setting the derivative of the dual objective function to zero. Therefore, the primal
and the dual problem are two very different problems, where we in practice
choose the one which is simplest to solve.
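To make the comparison concrete, the following short Python sketch (not part of the original text) evaluates the dual objective function above on a grid and compares its maximum with the primal optimal value f(2) = 5.

import numpy as np

f = lambda x: x**2 + 1
g_dual = lambda nu: (3*nu/(1+nu))**2 + 1 + nu*((3*nu/(1+nu) - 3)**2 - 1)

nus = np.linspace(0.0, 20.0, 20001)
vals = g_dual(nus)
print('primal optimal value f(2):', f(2.0))
print('max of dual objective:    ', vals.max())          # should be close to 5
print('attained at nu approx.:   ', nus[vals.argmax()])  # should be close to 2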

15.3.1 A useful theorem on convex optimization


Finally, we mention a theorem on convex optimization which is used in several
applications.
Theorem 15.11. Characterization of a convex function.
Let f : C → R be a differentiable convex function defined on a convex set C ⊆ Rn , and
x∗ ∈ C. Then x∗ is a (local and therefore global) minimum of f over C if and

only if


Figure 15.4: The objective function and the dual objective function of Exam-
ple 15.7.

∇f (x∗ )T (x − x∗ ) ≥ 0 for all x ∈ C. (15.18)


Proof. Assume first that ∇f (x∗ )T (x − x∗ ) < 0 for some x ∈ C. Consider the
function g(ε) = f (x∗ + ε(x − x∗ )) and apply the first order Taylor theorem to
this function. Thus, for every ε > 0 there exists a t ∈ [0, 1] with

f (x∗ + ε(x − x∗ )) = f (x∗ ) + ε∇f (x∗ + tε(x − x∗ ))T (x − x∗ ).

Since ∇f (x∗ )T (x − x∗ ) < 0 and the gradient function is continuous (our standard
assumption!) we have for sufficiently small ε > 0 that ∇f (x∗ + tε(x − x∗ ))T (x −
x∗ ) < 0. This implies that f (x∗ + ε(x − x∗ )) < f (x∗ ). But, as C is convex, the
point x∗ + ε(x − x∗ ) = εx + (1 − ε)x∗ also lies in C and so we conclude that
x∗ is not a local minimum. This proves that (15.18) is necessary for x∗ to be a
local minimum of f over C.
Next, assume that (15.18) holds. Using Theorem 12.10 we then get

f (x) ≥ f (x∗ ) + ∇f (x∗ )T (x − x∗ ) ≥ f (x∗ ) for every x ∈ C



so x∗ is a (global) minimum.

Exercise 15.8: Find min


In the plane consider a rectangle R with sides of length x and y and with
perimeter equal to α (so 2x + 2y = α). Determine x and y so that the area of R
is largest possible.

Exercise 15.9: Find min


Consider the optimization problem

minimize f (x1 , x2 ) subject to (x1 , x2 ) ∈ C


where C = {(x1 , x2 ) ∈ R2 : x1 , x2 ≥ 0, 4x1 + x2 ≥ 8, 2x1 + 3x2 ≤ 12}. Draw the
feasible set C in the plane. Find the set of optimal solutions in each of the cases
given below.
a) f (x1 , x2 ) = 1.
b) f (x1 , x2 ) = x1 .
c) f (x1 , x2 ) = 3x1 + x2 .
d) f (x1 , x2 ) = (x1 − 1)2 + (x2 − 1)2 .
e) f (x1 , x2 ) = (x1 − 10)2 + (x2 − 8)2 .

Exercise 15.10: Find min


Solve
$$\max\Big\{x_1 x_2\cdots x_n : \sum_{j=1}^n x_j = 1,\ x_j \ge 0\Big\}.$$

Exercise 15.11: Find min


Let S = {x ∈ R2 : kxk = 1} be the unit circle in the plane. Let a ∈ R2 be a
given point. Formulate the problem of finding a nearest point in S to a as a
nonlinear optimization problem. How can you solve this problem directly using
a geometrical argument?

Exercise 15.12: Find min


Let S be the unit circle from the previous exercise. Let a1 , a2 be two given points
in the plane. Let $f(x) = \sum_{i=1}^2 \|x - a_i\|^2$. Formulate this as an optimization
problem and find its Lagrangian function L. Find the stationary points of L,
and use this to solve the optimization problem.

Exercise 15.13: Find min


Solve

minimize x1 + x2 subject to x21 + x22 = 1.


using the Lagrangian, see Theorem 15.1. Next, solve the problem by eliminating
x2 (using the constraint).

Exercise 15.14: Find min


Let g(x1 , x2 ) = 3x21 + 10x1 x2 + 3x22 − 2. Solve

min{k(x1 , x2 )k : g(x1 , x2 ) = 0}.

Exercise 15.15: Find min


Same question as in previous exercise, but with g(x1 , x2 ) = 5x21 − 4x1 x2 + 4x22 − 6.

Exercise 15.16: Find min


Let f be a two times differentiable function f : Rn → R. Consider the optimiza-
tion problem

minimize f (x) subject to x1 + x2 + · · · + xn = 1.


Characterize the stationary points (find the equation they satisfy).

Exercise 15.17: Find min


Consider the previous exercise. Explain how to convert this into an unconstrained
problem by eliminating xn .

Exercise 15.18: Find min


Let A be a real symmetric n × n matrix. Consider the optimization problem
 
$$\max\Big\{\frac{1}{2}x^T Ax : \|x\| = 1\Big\}$$
Rewrite the constraint as kxk − 1 = 0 and show that an optimal solution of this
problem must be an eigenvector of A. What can you say about the Lagrangian
multiplier?

Exercise 15.19: Find min


Solve

min{(1/2)(x21 + x22 + x23 ) : x1 + x2 + x3 ≤ −6}.

Exercise 15.20: Find min


Solve

min{(x1 − 3)2 + (x2 − 5)2 + x1 x2 : 0 ≤ x1 , x2 ≤ 1}.



Exercise 15.21: Find min


Solve

min{x1 + x2 : x21 + x22 ≤ 2}.

Exercise 15.22: Find min


Write down the KKT conditions for the portfolio optimization problem of
Section 11.2.

Exercise 15.23: Find min


Write down the KKT conditions for the optimization problem
$$\min\Big\{f(x_1, x_2, \ldots, x_n) : x_j \ge 0\ (j \le n),\ \sum_{j=1}^n x_j \le 1\Big\}$$

where f : Rn → R is a differentiable function.

Exercise 15.24: Find min


Consider the following optimization problem

$$\min\Big\{\Big(x_1 - \frac{3}{2}\Big)^2 + x_2^2 : x_1 + x_2 \le 1,\ x_1 - x_2 \le 1,\ -x_1 + x_2 \le 1,\ -x_1 - x_2 \le 1\Big\}.$$

a) Draw the region which we minimize over, and find the minimum of $f(x) = \left(x_1 - \frac{3}{2}\right)^2 + x_2^2$ by a direct geometric argument.
b) Write down the KKT conditions for this problem. From a), decide which
two constraints g1 and g2 are active at the minimum, and verify that you can
find µ1 ≥ 0, µ2 ≥ 0 so that ∇f + µ1 ∇g1 + µ2 ∇g2 = 0 (as the KKT conditions
guarantee in a minimum). You are not meant to go through all possibilities for
active inequalities here, only those you can see must be active from a).

Exercise 15.25: Find min


Consider the following optimization problem

min{−x1 x2 : x21 + x22 ≤ 1}


Write down the KKT conditions for this problem, and find the minimum.
Chapter 16

Constrained optimization - methods

In this final chapter we present numerical methods for solving nonlinear opti-
mization problems. This is a huge area, so we can here only give a small taste of
it! The algorithms we present are well-established methods which are known to work well.

16.1 Equality constraints


We here consider the nonlinear optimization problem with linear equality con-
straints

minimize f (x)
subject to (16.1)
Ax = b
Newton’s method may be applied to this problem. The method is very similar
to the unconstrained case, but with two modifications. First, the initial point x0
must be chosen so that it is feasible, i.e., Ax0 = b. Next, the search direction d
must be such that the new iterate is feasible as well. This means that Ad = 0,
so the search direction lies in the nullspace of A.
The second order Taylor approximation of f at an iterate xk is

Tf2 (xk ; xk + h) = f (xk ) + ∇f (xk )T h + (1/2)hT ∇2 f (xk )h


and we want to minimize this under the constraint Axk+1 = A(xk + h) = Axk = b,
i.e.

Ah = 0 (16.2)
Since the gradient of Tf2 w.r.t. h is ∇f (xk ) + ∇2 f (xk )h, setting the gradient of
the Lagrangian w.r.t. h equal to zero gives


∇f (xk ) + ∇2 f (xk )h + AT λ = 0, (16.3)


where λ is the Lagrange multiplier. Equations (16.2)-(16.3) together give

$$\begin{pmatrix}\nabla^2 f(x_k) & A^T \\ A & 0\end{pmatrix}\begin{pmatrix} h \\ \lambda\end{pmatrix} = \begin{pmatrix}-\nabla f(x_k) \\ 0\end{pmatrix}.$$
The Newton step is only defined when the coefficient matrix in the KKT problem
is invertible. In that case, the problem has a unique solution (h, λ) and we define
dN t = h and call this the Newton step. Newton’s method for solving Equation
(16.1) can now be extended from the previous code.
Let us briefly explain the stop criterion η²/2 < ε with η := dTNt ∇2 f (x)dNt which is used below. The Newton step dNt minimizes the second order Taylor approximation of f subject to the constraint, and the predicted decrease is f (x) − Tf2 (x; x + dNt ) = (1/2)dTNt ∇2 f (x)dNt = η/2. The quantity η/2 is therefore an estimate of how far f (x) is from the optimal value, and it is natural to stop once this estimate, here used in the form η²/2, is smaller than a given tolerance ε.
This leads to an algorithm for Newton's method for linear equality con-
strained optimization which is very similar to the function newtonbacktrack
from Exercise 14.12. We do not state a formal convergence theorem for this
method, but it behaves very much like Newton’s method for unconstrained opti-
mization. Actually, it can be seen that the method just described corresponds to
eliminating variables based on the equations Ax = b and using the unconstrained
Newton method for the resulting (smaller) problem. So as soon as the solution
is “sufficiently near” an optimal solution, the convergence rate is quadratic, so
extremely few iterations are needed in this final stage.
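As a sketch of how such a Newton step can be computed, here is a small Python version (not from the original text) of the iteration for (16.1) on a made-up quadratic example; the full step x + d is used instead of backtracking line search to keep the sketch short.

import numpy as np

def constrained_newton_step(df, d2f, A, x):
    # One Newton step for minimizing f(x) subject to Ax = b (x assumed feasible).
    m, n = A.shape
    KKT = np.block([[d2f(x), A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-df(x), np.zeros(m)])
    sol = np.linalg.solve(KKT, rhs)
    d_nt, lam = sol[:n], sol[n:]
    eta = d_nt.dot(d2f(x)).dot(d_nt)        # squared Newton decrement
    return d_nt, lam, eta

# Made-up example: minimize (1/2)x^T D x - q^T x subject to x1 + x2 = 1
D = np.array([[3.0, 0.0], [0.0, 1.0]])
q = np.array([1.0, 1.0])
df = lambda x: D.dot(x) - q
d2f = lambda x: D
A = np.array([[1.0, 1.0]])

x = np.array([0.5, 0.5])                    # feasible starting point
for _ in range(5):
    d, lam, eta = constrained_newton_step(df, d2f, A, x)
    if eta**2/2 < 1e-3:                     # the stop criterion discussed above
        break
    x = x + d                               # full Newton step
print('x =', x, 'Ax =', A.dot(x))

Since the objective in this illustration is quadratic, the iteration reaches the constrained minimum (1/4, 3/4) after a single step, in line with the remark above.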

16.2 Inequality constraints


We here briefly discuss an algorithm for inequality constrained nonlinear opti-
mization problems. The presentation is mainly based on [3, 32]. We restrict the
attention to convex optimization problems, but many of the ideas are used for
nonconvex problems as well.
The method we present is an interior-point method, more precisely, an
interior-point barrier method. This is an iterative method which produces a
sequence of points lying in the relative interior of the feasible set. The barrier
idea is to approximate the problem by a simpler one in which constraints are
replaced by a penalty term. The purpose of this penalty term is to give large
objective function values to points near the (relative) boundary of the feasible
set, which effectively becomes a barrier against leaving the feasible set.
Consider again the convex optimization problem

minimize f (x)
subject to
Ax = b
gj (x) ≤ 0 (j ≤ r)        (16.4)
where A is an m × n matrix and b ∈ Rm . The feasible set here is
F = {x ∈ Rn : Ax = b, gj (x) ≤ 0 (j ≤ r)}. We assume that the weak Slater condition holds,


and therefore by Theorem 15.10 the KKT conditions for problem (16.4) are

Ax = b, gj (x) ≤ 0 (j ≤ r)
ν ≥ 0, ∇f (x) + AT λ + G0 (x)T ν = 0 (16.5)
νj gj (x) = 0 (j ≤ r).
So, x is a minimum in (16.4) if and only if there are λ ∈ Rm and ν ∈ Rr such
that (16.5) holds.
Let us state an algorithm for Newton’s method for linear equality constrained
optimization with inequality constraints. Before we do this there is one final
problem we need to address: The α we get from backtracking line search may be
so that x + αdN t does not satisfy the inequality constraints (in the exercises you
will be asked to verify that this is the case for a certain function). The problem
comes from the fact that the iterates xk + β m sdk from Armijo's rule do not necessarily
satisfy the inequality constraints. However, we can choose m large enough so
that all succeeding iterates satisfy these constraints. We can modify the function
newtonbacktrack from Exercise 14.12 to a function newtonbacktrackg1g2 in
an obvious way so that, in addition to applying Armijos rule, we also choose a
step size so small that the inequality constraints are satisfied:

function [x,numit]=newtonbacktrackg1g2LEC(f,df,d2f,A,b,x0,g1,g2)
  epsilon=10^(-3);
  x=x0;
  maxit=100;
  for numit=1:maxit
    % Solve the KKT system for the Newton step d
    matr=[d2f(x) A'; A zeros(size(A,1))];
    vect=[-df(x); zeros(size(A,1),1)];
    solvedvals=matr\vect;
    d=solvedvals(1:size(A,2));
    eta=d'*d2f(x)*d;
    if eta^2/2<epsilon
      break;
    end
    % Armijos rule with two inequalities
    beta=0.2; s=0.5; sigma=10^(-3);
    m=0;
    while (f(x)-f(x+beta^m*s*d) < -sigma*beta^m*s*(df(x))'*d) ...
        || (g1(x+beta^m*s*d)>0) || (g2(x+beta^m*s*d)>0)
      m=m+1;
    end
    alpha=beta^m*s;
    x=x+alpha*d;
  end

Here g1 and g2 are function handles which represent the inequality constraints.
The new function works only in the case when there are exactly two inequality
constraints.
The interior-point barrier method is based on an approximation of problem
(16.4) by the barrier problem

    minimize    f(x) + µφ(x)
    subject to  Ax = b                                              (16.6)

where

    φ(x) = − ∑_{j=1}^{r} ln(−gj(x))

and µ > 0 is a parameter (in R). The function φ is called the (logarithmic)
barrier function and its domain is the relative interior of the feasible set

F ◦ = {x ∈ Rn : Ax = b, gj (x) < 0 (j ≤ r)}.


The same set F ◦ is the feasible set of the barrier problem. The key properties of
the barrier function are:
1. φ is twice differentiable and

       ∇φ(x) = ∑_{j=1}^{r} (1/(−gj(x))) ∇gj(x)                                        (16.7)

       ∇²φ(x) = ∑_{j=1}^{r} (1/gj(x)²) ∇gj(x)∇gj(x)^T + ∑_{j=1}^{r} (1/(−gj(x))) ∇²gj(x)   (16.8)

2. φ is convex. For this it is enough to show that ∇²φ is positive semidefinite
   at all points, which can be shown from Equation (16.8) as follows:

       h^T ∇²φ(x) h = ∑_{j=1}^{r} ( (1/gj(x)²) h^T ∇gj(x)∇gj(x)^T h + (1/(−gj(x))) h^T ∇²gj(x) h )
                    = ∑_{j=1}^{r} ( (1/gj(x)²) (∇gj(x)^T h)² + (1/(−gj(x))) h^T ∇²gj(x) h ) ≥ 0,

   since 1/(−gj(x)) > 0 and h^T ∇²gj(x) h ≥ 0 (since all gj are convex, ∇²gj(x) is
   positive semidefinite).
3. If {xk } is a sequence in F ◦ such that gj (xk ) → 0 for some j ≤ r, then
φ(xk ) → ∞. This is the barrier property.
The idea here is that for points x near the boundary of F the value of φ(x)
is very large. So, an iterative method which moves around in the interior F ◦ of
F will typically avoid points near the boundary as the logarithmic penalty term
makes the function value f (x) + µφ(x) very large.
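As a small illustration, the following Python sketch (our own helper, not part of
the book's code) evaluates φ, ∇φ and ∇²φ from Equations (16.7) and (16.8) for a
list of constraint functions:

import numpy as np

def barrier(gs, dgs, d2gs, x):
    # Logarithmic barrier phi(x) = -sum ln(-g_j(x)), its gradient (16.7)
    # and Hessian (16.8). gs, dgs, d2gs are lists of callables returning
    # g_j(x), grad g_j(x) and the Hessian of g_j at x, respectively.
    phi = -sum(np.log(-g(x)) for g in gs)
    grad = sum(dg(x) / (-g(x)) for g, dg in zip(gs, dgs))
    hess = sum(np.outer(dg(x), dg(x)) / g(x)**2 + d2g(x) / (-g(x))
               for g, dg, d2g in zip(gs, dgs, d2gs))
    return phi, grad, hess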
The interior point method consists in solving the barrier problem, using
Newton’s method, for a sequence {µk } of (positive) barrier parameters; these
are called the outer iterations. The solution xk found for µ = µk is used as the
starting point in Newton’s method in the next outer iteration where µ = µk+1 .
The sequence {µk } is chosen such that µk → 0. When µ is very small, the barrier
function approximates the "ideal" penalty function η(x) which is zero in F and
−∞ when one of the inequalities gj (x) ≤ 0 is violated.
A natural question is why one bothers to solve the barrier problems for more
than one single µ, typically a very small value. The reason is that it would be

hard to find a good starting point for Newton’s method in that case; the Hessian
matrix of µφ is typically ill-conditioned for small µ.
Assume now that the barrier problem has a unique optimal solution x(µ);
this is true under reasonable assumptions that we shall return to. The point x(µ)
is called a central point. Assume also that Newton’s method may be applied to
solve the barrier problem. The set of points x(µ) for µ > 0 is called the central
path; it is a path (or curve) as we know it from multivariate calculus. In order
to investigate the central path we prefer to work with the problem equivalent
to (16.6) (equivalent here meaning that it has the same minimum points),
obtained by multiplying the objective function by 1/µ, so

    minimize    (1/µ)f(x) + φ(x)
    subject to  Ax = b.                                             (16.9)
A central point x(µ) is characterized by

    Ax(µ) = b,   gj(x(µ)) < 0   (j ≤ r)

and the existence of λ ∈ R^m (the Lagrange multiplier vector) such that

    (1/µ)∇f(x(µ)) + ∇φ(x(µ)) + A^T λ = 0,

i.e.,

    (1/µ)∇f(x(µ)) + ∑_{j=1}^{r} (1/(−gj(x(µ)))) ∇gj(x(µ)) + A^T λ = 0.      (16.10)

A fundamental question is: how far from being optimal is the central point
x(µ)? We now show that duality provides a very elegant way of answering this
question.

Theorem 16.1. Distance from minimum.


For each µ > 0 the central point x(µ) satisfies

f ∗ ≤ f (x(µ)) ≤ f ∗ + rµ.
Proof. Define ν(µ) = (ν1(µ), . . . , νr(µ)) ∈ R^r and λ(µ) ∈ R^m as Lagrange
parameters for the original problem by

    νj(µ) = −µ/gj(x(µ))   (j ≤ r),
    λ(µ) = µλ,                                                      (16.11)

where λ and x(µ) satisfy Equation (16.10), i.e. they are Lagrange parameters
for the barrier problem. We need to return to the dual problem (of the original
problem), defined in Section 15.3. We first claim that the pair (λ(µ), ν(µ))

is feasible in the dual problem to (16.4). We thus need to show that ν(µ)
is nonnegative. This is immediate: since gj (x(µ)) < 0 and µ > 0, we get
νj (µ) = −µ/gj (x(µ)) > 0 for each j. We now also want to show that x(µ)
satisfies

g(λ(µ), ν(µ)) = inf L(x, λ(µ), ν(µ)) = L(x(µ), λ(µ), ν(µ)),


x

where g is the dual objective function. To see this, note first that the Lagrangian
function L(x, λ, ν) = f (x) + λT (Ax − b) + ν T G(x) is convex in x for given λ
and µ ≥ 0. Thus, x minimizes this function if and only if ∇x L = 0. Now,

∇x L(x(µ), λ(µ), ν(µ))


r
X
T
= ∇f (x(µ)) + A λ(µ) + νj (µ)∇gj (x(µ))
j=1
r
X 1
= ∇f (x(µ)) + µAT λ + µ ∇gj (x(µ))
j=1
(−gj (x(µ)))
 
r
1 X 1
= µ  ∇f (x(µ)) + AT λ + ∇gj (x(µ)) = 0,
µ j=1
(−gj (x(µ)))

by (16.10) and the definition of the dual variables (16.11). This shows that
g(λ(µ), ν(µ)) = L(x(µ), λ(µ), ν(µ)).
By weak duality and Lemma 15.9, we now obtain

f ∗ ≥ g(λ(µ), ν(µ))
= L(x(µ), λ(µ), ν(µ))
r
X
= f (x(µ)) + λ(µ)T (Ax(µ) − b) + νj (µ)gj (x(µ))
j=1

= f (x(µ)) − rµ
which proves the result.

This theorem is very useful and shows why letting µ → 0 (more accurately
µ → 0+ ) is a good idea.
Corollary 16.2. Convergence of the central path.
The central path has the following property:

    lim_{µ→0} f(x(µ)) = f*.

In particular, if f is continuous and lim_{µ→0} x(µ) = x* for some x*, then x* is a
global minimum in (16.4).

Proof. This follows from Theorem 16.1 by letting µ → 0. The second part follows
from

    f(x*) = f(lim_{µ→0} x(µ)) = lim_{µ→0} f(x(µ)) = f*

by the first part and the continuity of f; moreover x* must be a feasible point
by elementary topology.

After these considerations we may now present the interior-point barrier
method. The following code uses a tolerance ε > 0 in its stopping criterion, and
assumes two inequality constraints:

function xopt=IPBopt(f,g1,g2,df,dg1,dg2,d2f,d2g1,d2g2,A,b,x0)
    % Interior-point barrier method with two inequality constraints g1,g2
    % and linear equality constraints Ax=b, started from the feasible point x0.
    xopt=x0;
    mu=1;
    alpha=0.1;
    r=2;
    epsilon=10^(-3);
    numitouter=0;
    while (r*mu>epsilon)
        % Solve the centering problem for the current mu with Newton's method.
        % The gradient and Hessian of the barrier term are inserted from (16.7)-(16.8).
        [xopt,numit]=newtonbacktrackg1g2LEC(...
            @(x)(f(x)-mu*log(-g1(x))-mu*log(-g2(x))),...
            @(x)(df(x) - mu*dg1(x)/g1(x) - mu*dg2(x)/g2(x)),...
            @(x)(d2f(x) + mu*dg1(x)*dg1(x)'/(g1(x)^2) ...
                 + mu*dg2(x)*dg2(x)'/(g2(x)^2) - mu*d2g1(x)/g1(x)...
                 - mu*d2g2(x)/g2(x) ),A,b,xopt,g1,g2);
        mu=alpha*mu;
        numitouter=numitouter+1;
        fprintf('Iteration %i:',numitouter);
        fprintf('(%f,%f)\n',xopt,f(xopt));
    end

Note that we here have inserted the expressions from Equation (16.7) and Equa-
tion (16.8) for the gradient and the Hessian matrix of the barrier function. The
inputs are f, g1, g2, their gradients and their Hessian matrices, the matrix A, the
vector b, and an initial feasible point x0. The function calls
newtonbacktrackg1g2LEC, and returns the optimal solution x*. It also prints
some information on the values of f during the iterations. The iterations used in
Newton's method are called the inner iterations. There are various implementation
details here that we do not discuss in depth. A typical value of α is 0.1. The
choice of the initial µ0 can be difficult; if it is chosen too large, one may need
many outer iterations. Another issue is how accurately one solves (16.6). It may
be sufficient to find a near-optimal solution here, as this saves inner iterations.
For this reason the method is also called a path-following method; it stays in a
neighborhood of the central path.
Finally, it should be mentioned that there exists a variant of the interior-point
barrier method which permits an infeasible starting point. For more details on
this and various implementation issues one may consult [3] or [32].
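To summarize the structure of the outer iterations, here is a small Python sketch
(our own pseudocode-style helper, not the book's implementation); the callable
solve_centering stands for a Newton solver for the barrier problem (16.6), such
as the one used in IPBopt above:

def barrier_method(solve_centering, x0, r, mu0=1.0, alpha=0.1, eps=1e-3):
    # Outer iterations of the interior-point barrier method.
    # solve_centering(mu, x) should return an (approximate) minimizer of
    # f + mu*phi subject to Ax = b, started from x.
    x, mu = x0, mu0
    while r * mu > eps:
        # by Theorem 16.1, f(x(mu)) <= f* + r*mu, so stop when r*mu is small
        x = solve_centering(mu, x)
        mu = alpha * mu
    return x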

Example 16.1: Numeric test of the interior-point barrier method

Consider the function f(x) = x² + 1, 2 ≤ x ≤ 4. Minimizing f can be considered
as the problem of finding a minimum subject to the constraints g1(x) = 2 − x ≤ 0
and g2(x) = x − 4 ≤ 0. The barrier problem is to minimize the function

    f(x) + µφ(x) = x² + 1 − µ ln(x − 2) − µ ln(4 − x).


Some of these are drawn in Figure 16.1, where we clearly can see the effect of
decreasing µ in the barrier function: The function converges to f pointwise as
µ → 0+ , except at the boundaries x = 2, x = 4.

Figure 16.1: The function from Example 16.1 and its barrier functions with
µ = 0.2, µ = 0.5, and µ = 1.
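Plots like those in Figure 16.1 can be generated with a few lines of Python (a
sketch of our own, assuming numpy and matplotlib are available):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(2.001, 3.999, 500)       # the open interval (2, 4)
f = x**2 + 1
for mu in [1, 0.5, 0.2]:
    # the barrier objective f(x) + mu*phi(x) from Example 16.1
    plt.plot(x, f - mu*np.log(x - 2) - mu*np.log(4 - x), label='mu = %g' % mu)
plt.plot(x, f, 'k--', label='f(x) = x^2 + 1')
plt.legend()
plt.show()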

It is easy to see that x = 2 is the minimum of f under the given constraints,
and that f(2) = 5 is the minimum value. There are no equality constraints in
this case, so we can use the barrier method with Newton's method for
unconstrained optimization, as implemented in Exercise 14.12. We need,
however, to make sure also here that the iterates from Armijo's rule satisfy the
inequality constraints. In fact, in the exercises you will be asked to verify that,
for the function f considered here, some of the iterates from Armijo's rule do
not satisfy the constraints.
It is straightforward to implement a function newtonbacktrackg1g2 which
implements Newton's method for two inequality constraints and no equality
constraints, similarly to how we implemented the function
newtonbacktrackg1g2LEC. This leads to the following algorithm for the
interior-point barrier method for the case of no equality constraints, but two
inequality constraints:

function xopt=IPBopt2(f,g1,g2,df,dg1,dg2,d2f,d2g1,d2g2,x0)
    % Interior-point barrier method with two inequality constraints g1,g2
    % and no equality constraints, started from the feasible point x0.
    xopt=x0;
    mu=1; alpha=0.1; r=2; epsilon=10^(-3);
    numitouter=0;
    while (r*mu>epsilon)
        [xopt,numit]=newtonbacktrackg1g2(...
            @(x)(f(x)-mu*log(-g1(x))-mu*log(-g2(x))),...
            @(x)(df(x) - mu*dg1(x)/g1(x) - mu*dg2(x)/g2(x)),...
            @(x)(d2f(x) + mu*dg1(x)*dg1(x)'/(g1(x)^2) ...
                 + mu*dg2(x)*dg2(x)'/(g2(x)^2) ...
                 - mu*d2g1(x)/g1(x) - mu*d2g2(x)/g2(x) ),xopt,g1,g2);
        mu=alpha*mu;
        numitouter=numitouter+1;
        fprintf('Iteration %i:',numitouter);
        fprintf('(%f,%f)\n',xopt,f(xopt));
    end

Note that this function also prints a summary for each of the outer iterations,
so that we can see the progress in the barrier method. We can now find the
minimum of f with the following code, where we have substituted functions for
f , gi , their gradients and Hessians.

IPBopt2(@(x)(x.^2+1),@(x)(2-x),@(x)(x-4),...
@(x)(2*x),@(x)(-1),@(x)(1),...
@(x)(2),@(x)(0),@(x)(0),3)

Running this code gives a good approximation to the minimum x = 2 after 4
outer iterations.

Example 16.2: Analytic test of the interior-point barrier method

Let us consider the problem of finding the minimum of x1² + x2² subject to the
constraint x1 + x2 ≥ 2. We set f(x1, x2) = x1² + x2², and write the constraint as
g1(x1, x2) = 2 − x1 − x2 ≤ 0. Here it is not difficult to state the KKT conditions
and solve these, so let us do this first. The gradients are ∇f = (2x1 , 2x2 ),
∇g1 = (−1, −1), so that the KKT conditions take the form

(2x1 , 2x2 ) + ν1 (−1, −1) = 0

for a ν1 ≥ 0, where the last term is included only if x1 + x2 = 2 (i.e. when the
constraint is active). If the constraint is not active we see that x1 = x2 = 0,
which does not satisfy the inequality constraint. If the constraint is active we

see that x1 = x2 = ν1/2, so that x1 = x2 = 1 and ν1 = 2 ≥ 0 in order for
x1 + x2 = 2. The minimum value is thus f(1, 1) = 2. It is clear that this must be
a minimum: since f is bounded below and approaches ∞ when either x1 or x2
grows large, it must have a minimum (f has no global maximum). For this one
can also argue that the Hessian of the Lagrangian for the constrained problem
becomes positive definite. All points are regular for this problem since ∇g1 ≠ 0.
Let us also see if we can come to this same solution by solving the barrier
problem. The barrier function is φ(x1, x2) = − ln(x1 + x2 − 2), which has
gradient ∇φ = (−1/(x1 + x2 − 2), −1/(x1 + x2 − 2)). We set the gradient of
f(x1, x2) + µφ(x1, x2) to 0 and get

    (2x1, 2x2) + µ(−1/(x1 + x2 − 2), −1/(x1 + x2 − 2)) = 0.

From this we see that x1 = x2 must fulfill 2x1 = µ/(2x1 − 2), so that
4x1(x1 − 1) = µ, i.e. 4x1² − 4x1 − µ = 0. If we solve this equation we find that
x1 = (4 ± √(16 + 16µ))/8 = (1 ± √(1 + µ))/2. If we choose the negative sign
here we find that x1 < 0, which does not lie inside the domain of definition for
the function we optimize (i.e. points where x1 + x2 > 2). If we choose the
positive sign we find x1 = x2 = (1 + √(1 + µ))/2. It is clear that, when µ → 0,
this will converge to x1 = x2 = 1, which equals the solution we found when we
solved the KKT conditions.
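We can also check this numerically. The following small Python sketch (our own,
and assuming SciPy is available) minimizes the barrier objective along the line
x1 = x2 = t, where it reduces to 2t² − µ ln(2t − 2), and compares the result with
the analytic central point (1 + √(1 + µ))/2:

import numpy as np
from scipy.optimize import minimize_scalar

for mu in [1, 0.1, 0.01, 0.001]:
    # minimize 2 t^2 - mu*ln(2t - 2) over t > 1 (the feasible part of the line x1 = x2)
    res = minimize_scalar(lambda t: 2*t**2 - mu*np.log(2*t - 2),
                          bounds=(1 + 1e-12, 10), method='bounded')
    print(mu, res.x, (1 + np.sqrt(1 + mu))/2)   # numeric vs. analytic central point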

Exercise 16.3: Solve


Consider problem (16.1) in Section 16.1. Verify that the KKT conditions for
this problem are as stated there.

Exercise 16.4: Solve


Define the function f(x, y) = x + y. We will attempt to minimize f under the
constraints y − x = 1 and x, y ≥ 0.
a) Find A, b, and functions g1 , g2 so that the problem takes the same form as
in Equation (16.4).
b) Draw the contours of the barrier function f (x, y) + µφ(x, y) for µ = 0.1, 0.2,
0.5, 1, where φ(x, y) = − ln(−g1 (x, y)) − ln(−g2 (x, y)).
c) Solve the barrier problem analytically using the Lagrange method.
d) It is straightforward to find the minimum of f under the mentioned con-
straints. State a simple argument for finding this minimum.
e) State the KKT conditions for finding the minimum, and solve these.
f) Show that the central path converges to the same solution which you found
in d) and e).

Exercise 16.5: Solve


Use the function IPBopt to verify the solution you found in Exercise 16.4. Initially
you must compute a feasible starting point x0 .

Exercise 16.6: Solve


State the KKT conditions for finding the minimum for the constrained problem
of Example 16.1, and solve these. Verify that you get the same solution as in
Example 16.1.

Exercise 16.7: Solve


In the function IPBopt2, replace the call to the function newtonbacktrackg1g2
with a call to the function newtonbacktrack, with the obvious modification to
the parameters. Verify that the code does not return the expected minimum in
this case.

Exercise 16.8: Solve


Consider the function f(x) = (x − 3)², with the same constraints 2 ≤ x ≤ 4
as in Example 16.1. Verify in this case that the function IPBopt2 returns
the correct minimum regardless of whether you call newtonbacktrackg1g2 or
newtonbacktrack. This shows that, at least in some cases where the minimum
is an interior point, the iterates from Newton's method satisfy the inequality
constraints as well.

Exercise 16.9: Solve


In this exercise we will find the minimum of the function f (x, y) = 3x + 2y under
the constraints x + y = 1 and x, y ≥ 0.
a) Find a matrix A and a vector b so that the constraint x + y = 1 can be
written in the form Ax = b.
b) State the KKT conditions for this problem, and find the minimum by solving
these.
c) Write down the barrier function φ(x, y) = − ln(−g1(x, y)) − ln(−g2(x, y)) for
this problem, where g1 and g2 represent the two constraints of the problem. Also
compute ∇φ.
d) Solve the barrier problem with parameter µ, and denote the solution by x(µ).
Is it the case that the limit lim_{µ→0} x(µ) equals the solution you found in b)?
Appendix A

Basic Linear Algebra

This book assumes that the student has taken a beginning course in linear algebra
at university level. In this appendix we summarize the most important concepts
one needs to know from linear algebra. Note that what is listed here should not
be considered as a substitute for such a course: It is important for the student
to go through a full course in linear algebra, in order to get good intuition for
these concepts through extensive exercises. Such exercises are omitted here.

A.1 Matrices
An m × n-matrix is simply a set of mn numbers, stored in m rows and n columns.
We write akn for the entry in row k and column n of the matrix A. The zero
matrix, denoted 0, is the matrix with all entries zero. A square matrix (i.e. where
m = n) is said to be diagonal if akn = 0 whenever k ≠ n. The identity matrix,
denoted I, or In to make the dimension of the matrix clear, is the diagonal
matrix where the entries on the diagonal are 1 and all other entries are zero. If A
is a matrix we will denote the transpose of A by A^T. If A is invertible we denote
its inverse by A^(-1). We say that a matrix A is orthogonal if A^T A = AA^T = I.
A matrix is called sparse if most of the entries in the matrix are zero.

A.2 Vector spaces


A set of vectors V is called a vector space if . . . We say that the vectors
{v0, v1, . . . , vn−1} are linearly independent if, whenever ∑_{i=0}^{n−1} ci vi = 0, we
must have that all ci = 0. We will say that a set of vectors B = {v0, v1, . . . , vn−1}
from V is a basis for V if the vectors are linearly independent, and span V.
Subspaces of R^N, and function spaces.


A.3 Inner products and orthogonality


Most vector spaces in this book are inner product spaces. A (real) inner product
on a vector space is a binary operation, written as (u, v) → ⟨u, v⟩, which fulfills
the following properties for any vectors u, v, and w:

• ⟨u, v⟩ = ⟨v, u⟩
• ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩
• ⟨cu, v⟩ = c⟨u, v⟩ for any scalar c
• ⟨u, u⟩ ≥ 0, and ⟨u, u⟩ = 0 if and only if u = 0.

u and v are said to be orthogonal if ⟨u, v⟩ = 0. In this book we have seen
two important examples of inner product spaces. First of all the Euclidean inner
product, which is defined by

    ⟨u, v⟩ = ∑_{i=0}^{n−1} ui vi                                    (A.1)

for any u, v in R^n. For functions we have seen examples which are variants of
the following form:

    ⟨f, g⟩ = ∫ f(t)g(t) dt.                                         (A.2)

Any set of mutually orthogonal nonzero elements is also linearly independent. A
basis where all basis vectors are mutually orthogonal is called an orthogonal basis.
If additionally the vectors all have length 1, we say that the basis is orthonormal.
If x is in a vector space with an orthogonal basis B = {vk}_{k=0}^{n−1}, we can
express x as

    x = (⟨x, v0⟩/⟨v0, v0⟩) v0 + (⟨x, v1⟩/⟨v1, v1⟩) v1 + · · · + (⟨x, vn−1⟩/⟨vn−1, vn−1⟩) vn−1.   (A.3)

In other words, the weights in linear combinations are easily found when the
basis is orthogonal. This is also called the orthogonal decomposition theorem.
By the projection of a vector x onto a subspace U we mean the vector
y = proj_U x which minimizes the distance ‖y − x‖. If {vi} is an orthogonal basis
for U, we have that proj_U x can be written as in Equation (A.3).
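As a small illustration (a sketch of our own, not an example from the book), the
projection formula (A.3) can be evaluated directly with numpy:

import numpy as np

def project(x, basis):
    # Projection of x onto span(basis) for a mutually orthogonal basis,
    # using the weights <x, v_k>/<v_k, v_k> from Equation (A.3).
    return sum((np.dot(x, v) / np.dot(v, v)) * v for v in basis)

v0 = np.array([1.0, 1.0, 0.0])
v1 = np.array([1.0, -1.0, 0.0])          # orthogonal to v0
x  = np.array([3.0, 1.0, 2.0])
print(project(x, [v0, v1]))              # (3, 1, 0): the part of x in the xy-plane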

A.4 Coordinates and change of coordinates

If B = {v0, v1, . . . , vn−1} is a basis for a vector space, and x = ∑_{i=0}^{n−1} xi vi, we
say that (x0, x1, . . . , xn−1) is the coordinate vector of x w.r.t. the basis B. We
also write [x]B for this coordinate vector.

If B and C are two different bases for the same vector space, we can write
down the two coordinate vectors [x]B and [x]C. A useful operation is to transform
the coordinates in B to those in C, i.e. apply the transformation which sends [x]B
to [x]C. This is a linear transformation, and we will denote the n × n-matrix
of this linear transformation by PC←B, and call this the change of coordinate
matrix from B to C. In other words, the change of coordinate matrix is defined
by requiring that

    PC←B [x]B = [x]C.                                               (A.4)

It is straightforward to show that PC←B = (PB←C)^(-1), so that matrix inversion
can be used to compute the change of coordinate matrix the opposite way. It
is also straightforward to show that the columns of the change of coordinate
matrix can be obtained by expressing the old basis vectors in terms of the new
basis, i.e. column i of PC←B is the coordinate vector [vi]C of the basis vector vi
from B.
If L is a linear transformation between the spaces V and W , and B is a
basis for V , C a basis for W , we can consider the operation which sends the
coordinates of v ∈ V in the basis B to the coordinates of Lv ∈ W in the basis C.
This is represented by a matrix, called the matrix of L relative to the bases B
and C. Similarly to change of coordinate matrices, the columns of the matrix of
L relative to the bases B and C are given by [L(vi )]C .
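As a small illustration (our own made-up bases, not an example from the book),
the change of coordinate matrix can be computed column by column with numpy
by solving C y = b_i for each basis vector b_i of B:

import numpy as np

# Basis vectors written as columns w.r.t. the standard basis.
B = np.column_stack(([1.0, 0.0], [1.0, 1.0]))    # basis B = {(1,0), (1,1)}
C = np.column_stack(([1.0, 1.0], [0.0, 2.0]))    # basis C = {(1,1), (0,2)}

# Column i of P_{C<-B} is [b_i]_C, i.e. the solution of C y = b_i.
P_CB = np.linalg.solve(C, B)

x_B = np.array([2.0, 3.0])                 # coordinates of some x in B
x   = B @ x_B                              # the same x in standard coordinates
# P_CB @ x_B are the C-coordinates of x, so reconstructing from them gives back x
print(np.allclose(C @ (P_CB @ x_B), x))    # prints True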

A.5 Eigenvectors and eigenvalues

If A is a linear transformation from a vector space to itself, a nonzero vector v is
called an eigenvector if there exists a scalar λ so that Av = λv. λ is called the
corresponding eigenvalue.
If the matrix A is symmetric, the following hold:

• The eigenvalues of A are real,

• the eigenspaces of A are mutually orthogonal (so that the eigenvectors can be
  chosen orthonormal),

• any vector can be decomposed as a sum of eigenvectors of A.

For non-symmetric matrices, these results do not hold in general. But for filters,
clearly the second and third property always hold, regardless of whether the
filter is symmetric or not.

A.6 Diagonalization
One can show that, for a symmetric matrix, A = P D P^T, where D is a diagonal
matrix with the eigenvalues of A on the diagonal, and P is an orthogonal matrix
whose columns are eigenvectors of A, with the corresponding eigenvalue appearing
in the same column of D.
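As a small illustration (a sketch of our own), numpy computes such a
diagonalization for symmetric matrices with eigh:

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])       # a symmetric matrix
eigvals, P = np.linalg.eigh(A)               # columns of P are orthonormal eigenvectors
D = np.diag(eigvals)
print(np.allclose(A, P @ D @ P.T))           # True: A = P D P^T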
Appendix B

Signal processing and linear algebra: a translation guide

This book should not be considered as a standard signal processing textbook.


There are several reasons for this. First of all, much signal processing literature
is written for people with an engineering background. This book is written for
people with a basic linear algebra background. Secondly, the book does not
give a comprehensive treatment of all basic signal processing concepts. Signal
processing concepts are introduced whenever they are needed to support
the mathematical exposition. In order to learn more about the different signal
processing concepts, the reader can consult many excellent textbooks, such
as [37, 1, 34, 43]. The translation guide of this chapter may be of some help
in this respect, when one tries to unify material presented here with material
from these signal processing textbooks. The translation guide handles both
differences in notation between this book and signal processing literature, and
topical differences. Most topical differences are also elaborated further in the
summaries of the different chapters. The book has adopted most of its notation
and concepts from mathematical literature.

B.1 Complex numbers

There are several differences between engineering literature and mathematics.
In mathematical literature, i is used for the imaginary unit, which
satisfies i² = −1. In engineering literature, the name j is used instead.

B.2 Functions
What in signal processing are referred to as continuous-time signals are here
referred to as functions. Usually we refer to a function by the letter f, according
to the mathematical tradition. The variable is mostly time, represented by the
symbol t.
In signal processing, one often uses capital letters to denote a function which
is the Fourier transform of another function, so that the Fourier transform of
x would be denoted by X. Here we simply denote a periodic function by its
Fourier coefficients yn, and we avoid the CTFT. We use analog filters, however,
which also work in continuous time. Analog filters preserve frequencies, and we
have used ν to denote frequency (variations per second), and not used angular
frequency ω. In signal processing literature it is common to jump between the
two.

B.3 Vectors
Discrete-time signals, as they are used in signal processing, are here mostly
referred to as vectors. To as great an extent as possible, we have attempted to
keep vectors finite-dimensional. Vectors are written in boldface (i.e. x), but their
elements are not in boldface, and carry subscripts (i.e. xn). Superscripts are also
used to distinguish between vectors with the same base name (i.e. x^(1), x^(2)
etc.), so that this does not interfere with the vector indices. In signal processing
literature the corresponding notation would be x for the signal, and x[n] for its
elements, and signals with equal base names could be named like x1[n], x2[n].
We have sometimes denoted the Fourier transform of x by x̂, according to
the mathematical tradition. More often we have distinguished between a vector
and its Discrete Fourier transform by using x for the former, and y for the latter.
This also makes us distinguish between the input and output of a filter, where
we instead use z for the latter. Much signal processing literature writes (capital)
X for the DFT of the vector x.

B.4 Inner products and orthogonality

Throughout the book we have defined inner products for functions (for Fourier
analysis and wavelets), and we have also used the standard inner product of
R^N. From this we have deduced the orthogonality of several basis functions used
in signal processing theory. The orthogonality of these functions, as well as the
inner product itself, is however often not commented on in signal processing
literature. As an unfortunate consequence, one has to explain the expression for
the Fourier series using other means than the orthogonal decomposition formula
and the least squares method. Also, one does not mention that the DFT is a
unitary transformation.

B.5 Matrices and filters

Boldface notation is not used for matrices, according to the mathematical
tradition. In signal processing, it is not common to formulate matrix equations,
such as for the DFT and DCT, or matrix factorizations. Instead one typically
writes down each equation separately, one equation for each row in y = Ax, i.e.
not recognizing matrix/vector multiplication. We have stuck to the name filtering
operations, but made it clear that a filtering operation is nothing but a linear
transformation with a Toeplitz matrix as its matrix. In particular, we alternately
use the terms filtering and multiplication with a Toeplitz matrix. The
characterization of filters as circulant Toeplitz matrices is usually not done in
signal processing literature (but see [16]). In this text we also allow matrices to
have infinite dimensions, expanding on the common use in linear algebra. When
infinite dimensions are assumed, infinite in both directions is assumed. Matrices
are scaled if necessary to make them unitary, in particular the DCT and the DFT.
This scaling is usually not done in signal processing literature.
We have also discussed how to represent a filter in terms of a finite matrix,
and how to restrict a filter to a finite signal. This is usually omitted in signal
processing literature.
One of the most important statements in signal processing is that convolution
in time is equivalent to multiplication in frequency. We have presented a
compelling interpretation of this in linear algebra terms. Since the frequency
response simply consists of the eigenvalues of the filter, and convolution of filters
simply is matrix multiplication, multiplication in frequency simply means
multiplying two diagonal matrices to obtain the frequency response of the product.
Moreover, the Fourier basis vectors can be interpreted as the corresponding
eigenvectors.
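To illustrate this point of view concretely (a small Python sketch of our own,
not code from the book): the eigenvalues (frequency response) of a circulant
Toeplitz matrix are the DFT of its first column, and the frequency response of a
product of two filters is the pointwise product of the individual frequency
responses:

import numpy as np

def circulant(c):
    # circulant Toeplitz matrix with first column c; column j is c rolled j steps
    return np.column_stack([np.roll(c, j) for j in range(len(c))])

s1 = np.array([1.0, 2.0, 0.0, 0.0])
s2 = np.array([3.0, 0.0, 1.0, 0.0])
S1, S2 = circulant(s1), circulant(s2)

# The frequency response of S1 @ S2 (the DFT of its first column) equals the
# pointwise product of the two individual frequency responses.
print(np.allclose(np.fft.fft((S1 @ S2)[:, 0]), np.fft.fft(s1) * np.fft.fft(s2)))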

B.6 Convolution
While we have defined the concept of convolution, readers familiar with signal
processing may have noticed that this concept has not been used much. The
reason is that we have wanted to present convolution as a matrix multiplication
(to adapt to mathematical tradition), and that we have used the concept of
filtering often instead. In signal processing literature one defines convolution in
terms of vectors of infinite length. We have avoided this, since in practice vectors
always need to be truncated to finite lengths. Due to this, we also have analyzed
how a finite vector may be turned into a periodic vector (periodic or symmetric
extension), and how this affects our analysis. Also we have concentrated on
FIR-filters, and this makes us avoid convergence issues. Note that we do not
present matrix multiplication as a method of implementing filtering, due to
the special structure of this operation. We do not suggest other methods for
implementation than applying the convolution formula in a brute-force way, or
factoring the filter in simpler components.

B.7 Polyphase factorizations and lifting


In signal processing literature, it is not common to associate polyphase com-
ponents with matrices, but rather with Laurent polynomials generated from
the corresponding filter. The Laurent polynomial is nothing else than the Z-

transform of the associated filter. Associating polyphase components with blocks


in a block matrix makes this book fit with block matrix methods in linear algebra
textbooks.
The polyphase factorization serves two purposes in this book. Firstly, the
lifting factorization (as used for wavelets) is derived from it, and put in a linear
algebra framework as a factorization into sparse matrices, similarly to the FFT
factorization. Thereby it fits together with many of the matrix factorization
results from classical linear algebra, where also sparsity is what makes the
factorization good for computation.
Secondly, the polyphase factorization of the filter bank transforms in the
MP3 standard are derived (also as a sparse matrix factorization), and from this
it is apparent what properties to put on the prototype filters in order to obtain
useful transforms. In fact, from this factorization it became apparent that the
MP3 filter bank transforms could be expressed in terms of alternative QMF filter
banks (i.e. M = 2).
These two topics (lifting and the polyphase factorization of the MP3 filter bank
transforms) are usually not presented in a unified way in textbooks. We see here
that there is a big advantage in doing this, since the second can build on theory
from the first.

B.8 Transforms in general


In signal processing, one often refers to the forward and reverse filter bank
transforms as analysis and synthesis, respectively, and for obvious reasons. These
terms are not normally used in mathematical literature, where one would instead
speak of a change of coordinates (in a wavelet setting), basis vectors, and change
of coordinate matrices. Also, the output from a forward filter bank transform is
often referred to as the transformed vector, and the result we get when we apply
the reverse filter bank transform to this is called the reconstructed vector.
This exposition takes extra care in presenting how the DCT is derived
naturally from the DFT. In particular both the DFT and the DCT are derived as
matrices of eigenvectors for finite-dimensional filters. The DCT is derived from
the DFT in that one restricts to a certain subset of vectors. The orthogonality
of these matrices follows from the orthogonality of distinct eigenspaces.

B.9 Perfect reconstruction systems

The term biorthogonality is here not used to describe a mutual property of the
filters of wavelets: biorthogonality corresponds simply to two matrices being
inverses of one another. For the same reason, the term perfect reconstruction is
not used much. Much wavelet theory refers to a property called delay
normalization. This term has been avoided by mostly considering wavelets with
symmetric filters, for which delay normalization is automatic. There are, however,
many examples of wavelets where this term is important.

B.10 Z-transform and frequency response

The Z-transform and the frequency response are much used in signal processing
literature, and are important concepts for filter design. We have deliberately
dropped the Z-transform. Due to this, much signal processing has of course
been left out: placement of poles and zeros outside or inside the unit circle is not
treated, since the frequency response only captures the values on the unit circle.
Placement of poles and zeros is perhaps the most-used design feature in filter
design. The focus here is on implementing filters, not designing them, however.
In signal processing literature, the DTFT and the Z-transform are used,
assuming that the inputs and outputs are vectors of infinite length. In practice
of course, some truncation is needed, since only finite-dimensional arithmetic is
performed by the computer. How this truncation is to be done without affecting
the computations is thus never mentioned in signal processing, although it is
always performed somehow. This exposition shows that this truncation can be
taken as part of the theory, without seriously affecting the results.
Nomenclature

symbol                                   definition
f_s                                      Sampling frequency
T_s                                      Sampling period
T                                        Period of a function
ν                                        Frequency
f_N                                      Nth order Fourier series of f
V_{N,T}                                  Nth order Fourier space
D_{N,T}                                  Order N real Fourier basis for V_{N,T}
F_{N,T}                                  Order N complex Fourier basis for V_{N,T}
f̆                                        Symmetric extension of the function f
λ_s(ν)                                   Frequency response of a filter
N                                        Number of points in a DFT/DCT
F_N = {φ_0, φ_1, ..., φ_{N−1}}           Fourier basis for R^N
F_N                                      N × N Fourier matrix
x̂                                        DFT of the vector x
Ā                                        Conjugate of a matrix
A^H                                      Conjugate transpose of a matrix
x^(e)                                    Vector of even samples
x^(o)                                    Vector of odd samples
O(N)                                     Order of an algorithm
l(S)                                     Length of a filter
x ∗ y                                    Convolution of vectors
λ_{S,n}                                  Vector frequency response of a digital filter
E_d                                      Filter which delays with d samples
ω                                        Angular frequency
λ_S(ω)                                   Continuous frequency response of a digital filter
x̆                                        Symmetric extension of a vector
S_r                                      Symmetric restriction of S
S^f                                      Matrix with the columns reversed
D_N = {d_0, d_1, ..., d_{N−1}}           N-point DCT basis for R^N
DCT_N                                    N × N DCT matrix
φ                                        Scaling function
V_m                                      Resolution space
φ_m                                      Basis for V_m
c_{m,n}                                  Coordinates in φ_m
W_m                                      Detail space
U ⊕ V                                    Direct sum of vector spaces
ψ_m                                      Basis for W_m
w_{m,n}                                  Coordinates in ψ_m
C_m                                      Reordering of (φ_{m−1}, ψ_{m−1})
φ̃                                        Dual scaling function
ψ̃                                        Dual mother wavelet
Ṽ_m                                      Dual resolution space
W̃_m                                      Dual detail space
D_m                                      Reordering of φ_m
E_N = {e_0, e_1, ..., e_{N−1}}           Standard basis for R^N
⊗                                        Tensor product
W_m^{(0,1)}                              Resolution m complementary wavelet space, LH
W_m^{(1,0)}                              Resolution m complementary wavelet space, HL
W_m^{(1,1)}                              Resolution m complementary wavelet space, HH
A^T                                      Transpose of a matrix
A^{−1}                                   Inverse of a matrix
⟨u, v⟩                                   Inner product
[x]_B                                    Coordinate vector of x relative to the basis B
P_{C←B}                                  Change of coordinate matrix from B to C
Bibliography

[1] A. Ambardar. Digital Signal Processing: a Modern Introduction. Cengage


Learning, 2006.
[2] D.P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University
Press, 2004.
[4] C. M. Brislawn. Fingerprints go digital. Notices of the AMS, 42(11):1278–
1283, 1995.
[5] B. A. Cipra. The best of the 20th century: Editors name top 10 algorithms.
SIAM News, 33(4), 2000. http://www.uta.edu/faculty/rcli/TopTen/topten.pdf.
[6] A. Cohen and I. Daubechies. Wavelets on the interval and fast wavelet
transforms. Applied and computational harmonic analysis, 1:54–81, 1993.
[7] A. Cohen, I. Daubechies, and J-C. Feauveau. Biorthogonal bases of com-
pactly supported wavelets. Communications on Pure and Appl. Math.,
45(5):485–560, June 1992.
[8] J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation
of complex Fourier series. Math. Comp., 19:297–301, 1965.
[9] A. Croisier, D. Esteban, and C. Galand. Perfect channel splitting by use
of interpolation/decimation/tree decomposition techniques. Int. Conf. on
Information Sciences and Systems, pages 443–446, August 1976.
[10] G. Dahl. A note on diagonally dominant matrices. Linear Algebra and its
Appl., 317(1-3):217–224, 2000.
[11] G. Dahl. An Introduction to Convexity. Report, University of Oslo, 2010.
[12] I. Daubechies. Orthonormal bases of compactly supported wavelets. Com-
munications on Pure and Appl. Math., 41(7):909–996, October 1988.
[13] I. Daubechies. Ten Lectures on Wavelets. CBMS-NSF conference series in
applied mathematics. SIAM Ed., 1992.


[14] P. Duhamel and H. Hollmann. 'Split-radix' FFT algorithm. Electronics
Letters, 20(1):14–16, 1984.
[15] FBI. WSQ gray-scale fingerprint image compression specification. Technical
report, IAFIS-IC, 1993.
[16] M. W. Frazier. An Introduction to Wavelets Through Linear Algebra.
Springer, 1999.
[17] M. Frigo and S. G. Johnson. The design and implementation of FFTW3.
Proceedings of the IEEE, 93(2):216–231, 2005.
[18] R. C. Gonzalez, R. E. Woods, and S. L. Eddins. Digital Image Processing
Using MATLAB. Gatesmark publishing, 2009.
[19] J. B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization
Algorithms I. Springer, 1993.
[20] ISO/IEC. Information technology - coding of moving pictures and associated
audio for digital storage media at up to about 1.5 Mbit/s. Technical report,
ISO/IEC, 1993.
[21] ISO/IEC. JPEG2000 Part 1 Final Draft International Standard, ISO/IEC FDIS
15444-1. Technical report, ISO/IEC, 2000.
[22] S. G. Johnson and M. Frigo. A modified split-radix FFT with fewer arithmetic
operations. IEEE Transactions on Signal Processing, 54, 2006.
[23] J.D. Johnston. A filter family designed for use in quadrature mirror filter
banks. Proc. Int. Conf. Acoust. Speech and Sig. Proc., pages 291–294, 1980.
[24] C. T. Kelley. Iterative Methods for Linear and Nonlinear Equations. SIAM,
1995.
[25] D. C. Lay. Linear Algebra and Its Applications (4th Edition). Addison-
Wesley, 2011.
[26] T. Lindstrøm and K. Hveberg. Flervariabel Analyse Med Lineær Algebra.
Pearson, 2011.
[27] D.G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley,
1984.
[28] T. Lyche. Numerical Linear Algebra. Report, University of Oslo, 2010.
[29] S. Mallat. A Wavelet Tour of Signal Processing. Tapir Academic Press,
1998.
[30] C. D. Meyer. Matrix Analysis and Applied Linear Algebra. SIAM, 2000.
[31] Knut Mørken. Numerical Algorithms and Digital Representation. UIO,
2013.

[32] J. Nocedal and S.J. Wright. Numerical Optimization. Springer, 2006.


[33] P. Noll. MPEG digital audio coding. IEEE Signal processing magazine,
pages 59–81, September 1997.
[34] A. V. Oppenheim and R. W. Schafer. Discrete-Time Signal Processing.
Prentice Hall, 1989.
[35] D. Pan. A tutorial on MPEG/audio compression. IEEE Multimedia, pages
60–74, Summer 1995.
[36] W. B. Pennebaker and J. L. Mitchell. JPEG Still Image Data Compression
Standard. Van Nostrand Reihnold, 1993.

[37] J. G. Proakis and D. G. Manolakis. Digital Signal Processing. Principles,


Algorithms, and Applications. Fourth Edition. Pearson, 2007.
[38] C. M. Rader. Discrete Fourier transforms when the number of data samples
is prime. Proceedings of the IEEE, 56:1107–1108, June 1968.

[39] T. A. Ramstad, S. O. Aase, and J. H. Husøy. Subband Compression of


Images: Principles and Examples: Principles and Examples, volume 6.
Elsevier Science, 1995.
[40] R.T. Rockafellar. Convex Analysis. Princeton University Press, 1970.
[41] A. Ruszczynski. Nonlinear Optimization. Princeton University Press, 2006.

[42] C. E. Shannon. Communication in the presence of noise. Proc. Institute of


Radio Engineers, 37(1):10–21, Jan. 1949.
[43] P. Stoica and R. Moses. Spectral Analysis of Signals. Prentice Hall, 2005.
[44] W. Sweldens. The lifting scheme: a new philosophy in biorthogonal wavelet
constructions. Wavelet Applications in Signal and Image Processing III,
pages 68–79, 1995.
[45] W. Sweldens. The lifting scheme: a custom-design construction of biorthog-
onal wavelets. Applied and computational harmonic analysis, 3:186–200,
1996.

[46] D. S. Taubman and M. W. Marcellin. JPEG2000. Image Compression.


Fundamentals, Standards and Practice. Kluwer Academic Publishers, 2002.
[47] M. Vetterli and J. Kovacevic. Wavelets and Subband Coding. Prentice Hall,
1995.

[48] M. Vetterli and H. J. Nussbaumer. Simple FFT and DCT algorithms with
reduced number of operations. Signal Processing, 6:267–278, 1984.
[49] R. Webster. Convexity. Oxford University Press, Oxford, 1994.

[50] S. Winograd. On computing the discrete Fourier transform. Math. Comp.,


32:175–199, 1978.
[51] R. Yavne. An economical method for calculating the discrete Fourier
transform. Proc. AFIPS Fall Joint Computer Conf., 33:115–125, 1968.
Index

AD conversion, 1 Biorthogonal
affine function, 398 bases, 256
algebra, 96 Biorthogonality, 256
Alias cancellation, 233 bit rate, 1
Alias cancellation condition, 236 Bit-reversal
Aliasing, 233 DWT, 291
analysis, 14 FFT, 73
equations, 14 block diagonal matrices, 175
Analysis filter components of a forward block matrix, 71
filter bank transform, 241 Blocks, 352
Angular frequency, 104
Arithmetic operation count Cascade algorithm, 208
DCT, 155 Causal filter, 269
DFT direct implementation, 55 central path, 456
FFT, 77 central point, 456
revised DCT, 158 chain rule, 393
revised FFT, 158 Change of coordinate matrix, 464
symmetric filters, 149 Change of coordinates, 464
with tensor products, 347 in tensor product, 345
audioread, 6 Channel, 241
audiowrite, 6 Compact support, 37
Complex Fourier coefficients, 23
Backtracking line search, 420 Computational molecule, 329
Bandpass filter, 112 Partial derivative in x-direction,
barrier method, 453 335
barrier problem, 454 Partial derivative in y-direction,
Basis 336
C, 175 Second order derivatives, 338
D, 294 smoothing, 334
φm , 166 concave, 398
ψm , 170 condition number, 422
DCT, 141 Conjugate transpose, 52
for VN,T , 14, 22 continuous
Fourier, 50 sound, 1
basis, 463 Continuous-time Fourier transform, 47


contraction, 408 Discrete Fourier transform, 51


convex combination, 399 Discrete Wavelet Transform, 173
convex function (multivariate), 397 downsampling, 214
convex optimization, 444 Dual
convex set, 396 detail space, 257
Convolution mother wavelet, 255
analog, 37 multiresolution analysis, 257
kernel, 37 resolution space, 257
vectors, 91 scaling function, 255
convolve, 92 wavelet transforms, 220
coordinate matrix, 345 dual problem, 446
Coordinate vector, 464 duality, 445
Coordinates in φm , 167 duality gap, 446
Coordinates in ψm , 170 DWT kernel parameter dual, 220
Cosine matrices, 142 DWT parameter bd_mode, 226
Cosine matrix inverse
type I, 230 eigenvalue, 465
type II, 142 eigenvector, 465
type III, 142, 146 elementary lifting matrix
critical sampling, 214 even type, 287
CTFT, 47 odd type, 287
used for non-symmetric filters, 299
DCT used for symmetric filters, 291
I, 229 error-resilient, 352
dct, 143
DCT basis, 141 Fejer kernel, 44
DCT coefficients, 141 FFT, 70
DCT matrix, 141 twiddle factors, 79
DCT-I factorization, 229 fft, 74
DCT-II factorization, 142 FFT algorithm
DCT-III factorization, 142 Non-recursive, 81
DCT-IV factorization, 146 Radix, 82
Detail space, 169 Split-radix, 82
DFT coefficients, 51 Filter
DFT matrix factorization, 72 bandpass, 112
Diagonalization high-pass, 112
with FN , 95 ideal high-pass, 112
diagonally dominant, 401 ideal low-pass, 112
digital length, 90
sound, 1 linear phase, 138
digital filter, 95 low-pass, 112
Direct sum moving average, 113
linear transformations, 185 MP3 standard, 117
vector spaces, 169 time delay, 96
Dirichlet kernel, 44 Filter bank, 241
Discrete Cosine transform, 141 Cosine-modulated, 245

Filter bank transform, 241 revised, 157


Filter coefficients, 88 Split-radix, 82
Filter echo, 112 FFT2, 349
FIR filters, 125 Filtering an image, 331
First crime of wavelets, 207 Generic DWT, 179
fixed point, 407 Generic DWT2, 369
flop count, 85 Generic IDWT, 179
Forward filter bank transform, 241 Generic IDWT2, 369
in a wavelet setting, 216 IDCT, 154
Fourier analysis, 21 IDCT2, 349
Fourier coefficients, 12 IFFT2, 349
Fourier domain, 14 lifting step
Fourier matrix, 51 non-symmetric, 300
Fourier series, 12 listening to detail part in sound,
square wave, 14 181
triangle wave, 17 listening to high-frequency part in
Fourier space, 12 sound, 65
Frequency domain, 14 listening to low-frequency part in
Frequency response sound, 65
continuous, 104 listening to low-resolution part in
filter, 37 sound, 181
vector, 95 Tensor product, 341
Wavelet kernel
gradient, 391 alternative piecewise linear wavelet,
gradient method, 420 200
gradient related, 421 alternative piecewise linear wavelet
with 4 vanishing moments, 302
Haar wavelet, 180 CDF 9/7 wavelet, 300
Hessian matrix, 391 Haar wavelet, 178
Highpass filter, 112 orthonormal wavelets, 301
piecewise linear wavelet, 194
idct, 143
piecewise quadratic wavelet, 302
Ideal high-pass filter, 112
Spline 5/3 wavelet, 300
Ideal low-pass filter, 112
impulse response, 98
IDFT, 52
imread, 322
IDFT matrix factorization, 72
imshow, 322
ifft, 74
imwrite, 322
IMDCT, 147
In-place
Implementation
bit-reversal implementation, 73
Cascade algorithm to plot wavelet
DWT implementation, 178
functions, 262
FFT implementation, 73
DCT, 153
lifting implementation, 291
DCT2, 349
In-place implementation
DFT, 55
DWT, 291
FFT
Inner product
Nonrecursive, 81

of functions in a Fourier setting, Lipschitz, 408


11 logarithmic barrier function, 455
of functions in a tensor product loglog, 80
setting, 355 Lowpass filter, 112
of functions in a wavelet setting, LTI filters, 97
163
of vectors, 50 matrix of a linear transformation rela-
interior point method, 453 tive to bases, 465
interpolating polynomial, 62 maximum, 383
interpolation formula, 64 maximum likelihood, 387
ideal MDCT, 146
periodic functions, 64 minimum, 383
Inverse Discrete Wavelet Transform, 173 mother wavelets, 173
MP3
Jacobi matrix, 391 and the DCT, 159
Jensen ’s inequality, 399 FFT, 67
JPEG filters, 117
standard, 351 standard, 45
JPEG2000 window, 108
lossless compression, 275 MP3 standard
lossy compression, 277 matrixing, 243
standard, 275 partial calculation, 243
windowing, 243
Karush-Kuhn-Tucker conditions, 438 MRA-matrix, 213
Kernel transformations, 175 multiresolution analysis, 205
KKT conditions, 438 multiresolution model, 163
Kronecker tensor product, 343
Near-perfect reconstruction, 233
Lagrangian function, 433 Newton’s method, 410, 420, 422
least square error, 11 Newton’s method with equality con-
length of a filter, 90 straints, 453
Lifting factorization, 290 nonlinear equations, 407
alternative piecewise linear wavelet, nonlinear optimization problem, 432
295
alternative piecewise linear wavelet objective function, 383
with 4 vanishing moments, 301 optimal control, 389
CDF 9/7 wavelet, 296 optimality conditions, 416
orthonormal wavelets, 299 Order N complex Fourier basis for VN,T ,
piecewise linear wavelet, 295 22
piecewise quadratic wavelet, 302 Order of an algorithm, 75
Spline 5/3 wavelet, 296 Orthogonal
linear convergence, 393 basis, 464
linear optimization, 389 matrix, 463
Linear phase filter, 138 vectors, 464
linearized feasible direction, 440 Orthogonal decomposition theorem, 464
linearly independent, 463 Orthonormal

basis, 464 scaling function, 167, 206


MRA, 206 separable extension, 354
Orthonormal wavelets, 240 sound channel, 7
Outer product, 331 Sparse matrix, 463
square wave, 7
Parallel computing Standard
with the DCT, 155 JPEG, 351
with the DWT, 352 JPEG2000, 275
with the FFT, 77 MP3, 47
penalty term, 437 stationary point, 416
Perfect reconstruction, 233 steepest descent method, 420
perfect reconstruction condition, 236 strictly convex, 416
Perfect reconstruction filter bank, 242 subband
Periodic function, 5 HH, 364
Phase distortion, 233 HL, 364
play, 6 LH, 364
polyhedron, 397 LL, 364
Polyphase Subband coding, 241
component of a vector, 77 Subband samples of a filter bank trans-
Polyphase components, 285 form, 241
Polyphase representation, 285 superlinear convergence, 393
positive definite, 390 Support, 37
positive semidefinite, 390 Symmetric
primal problem, 446 vector, 133
projection, 464 Symmetric extension
psycho-acoustic model, 45 of function, 33
pure digital tone, 50 used by the DCT, 133
pure tone, 5 used by wavelets, 224
Symmetric restriction of a symmetric
QMF filter banks, 239 filter, 138
Alternative definition, 240 synthesis, 14
Classical definition, 239 equation, 14
quasi-Newton method, 411 vectors, 52
Synthesis filter components of a reverse
Regular point, 433
filter bank transform, 242
Resolution space, 165
Reverse filter bank transform tangent vector, 440
in a wavelet setting, 216 tensor product, 316
Reverse filter bank transforms, 242 of function spaces, 354
roots, 268 of functions, 354
of matrices, 332
samples, 1
of vectors, 331
sampling, 1
Tiles, 352
frequency, 1
time domain, 14
period, 1
time-invariant, 96
rate, 1
toc, 80

Toeplitz matrix, 88
circulant, 88
Transpose DWT, 226
Transpose IDWT, 226
triangle wave, 8

Unitary matrix, 52
upsampling, 215

Vector space
of symmetric vectors, 133

Wavelets
Alternative piecewise linear, 198
CDF 9/7, 276
Orthonormal, 279
Piecewise linear, 191
Spline, 273
Spline 5/3, 275
weak duality, 445
weak Slater condition, 447
window, 108
Hamming, 108
Hanning, 111
in the MP3 standard, 108
rectangular, 108
