
DIGITAL SIGNAL & IMAGE PROCESSING

B Option 8 lectures

Stephen Roberts
[email protected]

Lecture 1 Foundations
1.1 Recommended books

Lynn. An Introduction to the Analysis and Processing of Signals. Macmillan.
Oppenheim & Schafer. Digital Signal Processing. Prentice Hall.
Orfanidis. Introduction to Signal Processing. Prentice Hall.
Proakis & Manolakis. Digital Signal Processing: Principles, Algorithms and Applications.
Matlab Signal Processing Toolbox manual.

1.2 Introduction

This course is ultimately concerned with the problem of computation, inference, manipulation and decision making using 1- and 2-dimensional data streams (signals and images). The course starts by considering the foundations of modern signal processing theory, defining terms and offering a simple but profound framework under which data may be manipulated. Although theory is very important in this subject area, an effort is made to provide examples of the major points throughout the course.
1.3 Summary/Revision of basic definitions

For the sake of brevity in the nomenclature, definitions and theorems are often going to be introduced in their 1-d form (i.e. as signals) with the index variable t or x. Please note that such an indexed set of samples is just a 1-d case of a generic ordered set which could be 2-d or more, i.e. dependent on a set of indices or variables, i, j or x, y for example. I try to keep to the convention that f(x) is a 1-d system with 1-d frequency support indexed by u, and f(x, y) is a 2-d system with 2-d frequency support indexed by u, v.

1.3.1 Linear Systems

A linear system may be defined as one which obeys the Principle of Superposition. If f1(x) and f2(x) are inputs to a linear system which give rise to outputs r1(x) and r2(x) respectively, then the combined input af1(x) + bf2(x) will give rise to an output ar1(x) + br2(x), where a and b are arbitrary constants.

Notes

If we represent an input signal by some support in a frequency domain, F_in (i.e. the set of frequencies present in the input), then no new frequency support will be required to model the output, i.e.

    F_{out} \subseteq F_{in}

Linear systems can be broken down into simpler sub-systems which can be re-arranged in any order, i.e.

    f \rightarrow g_1 \rightarrow g_2 \equiv f \rightarrow g_2 \rightarrow g_1 \equiv f \rightarrow g_{1,2}

1.3.2 Time or space Invariance

A time-invariant system is one whose properties do not vary with time (i.e. the input signals are treated the same way regardless of their time of arrival); for example, with discrete systems, if an input sequence f(x) produces an output sequence r(x), then the input sequence f(x - x_0) will produce the output sequence r(x - x_0) for all x_0. In the case of 2-d images, the system response does not depend on the position within the image, i.e. not on x_0, y_0.
1.4 Linear Processes

Some of the common signal processing functions are amplification (or attenuation), mixing (the addition of two or more signal waveforms) or un-mixing, and filtering. Each of these can be represented by a linear time-invariant block with an input-output characteristic which can be defined by:

The impulse response g(x) in the time domain.
The transfer function G(u) in a frequency domain. We will see that the choice of frequency basis may be subtly different from time to time.

As we will see, there is (for many of the systems we examine in this course) an invertible mapping between the time (image index) and (spatial) frequency domain representations.
1.5 Convolution

Convolution allows the evaluation of the output signal from an LTI system, given its impulse response and input signal.

The input signal can be considered as being composed of a succession of impulse functions, each of which generates a weighted version of the impulse response at the output, as shown in Figure 1.1.

Figure 1.1: Convolution as a summation over shifted impulse responses.

The output at time x, r(x), is obtained simply by adding the effect of each separate impulse function; this gives rise to the convolution integral:

    r(x) = \lim_{d\tau \to 0} \sum_\tau \{ f(x - \tau) \, d\tau \} \, g(\tau) = \int_0^\infty f(x - \tau) g(\tau) \, d\tau

where \tau is a dummy variable which represents time measured back into the past from the instant x at which the output r(x) is to be calculated.
1.5.1 Notes

Convolution is commutative. Thus r(x) is also given by:

    r(x) = \int_0^\infty f(\tau) g(x - \tau) \, d\tau

For discrete systems convolution is a summation operation:

    r[n] = \sum_{k=0}^{\infty} f[k] g[n-k] = \sum_{k=0}^{\infty} f[n-k] g[k]
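As a concrete illustration (a sketch, not part of the original notes; the sequences f and g are invented), the discrete convolution sum can be evaluated directly and checked against numpy's library routine:

import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])   # example input sequence f[k]
g = np.array([0.5, 0.25, 0.25])      # example impulse response g[k]

# direct evaluation of r[n] = sum_k f[k] g[n-k]
N = len(f) + len(g) - 1
r = np.zeros(N)
for n in range(N):
    for k in range(len(f)):
        if 0 <= n - k < len(g):
            r[n] += f[k] * g[n - k]

assert np.allclose(r, np.convolve(f, g))   # matches the library routine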

1.6 Frequency-Domain Analysis

Linear (time-invariant) systems, by definition, may be represented (in the continuous case) by linear differential equations (in the discrete case by linear difference equations). Consider the application of the linear differential operator, D = d/dx, to the function f(x) = e^{sx}:

    D f(x) = s f(x)

An equation of this form means that f(x) is an eigenfunction of D. Just like the eigen-analysis you know from matrix theory, this means that f(x) and any linear operation on f(x) may be represented using a set of functions of exponential form, and that these functions may be chosen to be orthogonal. This naturally gives rise to the use of the Laplace and Fourier representations.
The Laplace transform:

    F(s) \rightarrow [\text{Transfer function } G(s)] \rightarrow R(s)

where

    F(s) = \int_0^\infty f(x) e^{-sx} \, dx    (the Laplace transform of f(x))

and

    R(s) = G(s) F(s)

where G(s) can be expressed as a pole-zero representation of the form:

    G(s) = \frac{A (s - z_1) \ldots (s - z_m)}{(s - p_1)(s - p_2) \ldots (s - p_n)}

The Fourier transform:

    F(u) = \int_{-\infty}^{\infty} f(x) e^{-i 2\pi u x} \, dx    (the Fourier transform of f(x))

and

    R(iu) = G(iu) F(iu)

The output time function can be obtained by taking the inverse Fourier transform:

    f(x) = \int_{-\infty}^{\infty} F(u) e^{i 2\pi u x} \, du

Theorem
Convolution in the time domain is equivalent to multiplication in the frequency domain, i.e.

    r(x) = g(x) * f(x) = \mathcal{F}^{-1}\{ G(u) F(u) \}

and

    r(x) = g(x) * f(x) = \mathcal{L}^{-1}\{ G(s) F(s) \}

Proof
Consider the general integral (Laplace) transform of a shifted function:

    \mathcal{L}\{f(x - \tau)\} = \int f(x - \tau) e^{-sx} \, dx = e^{-s\tau} \mathcal{L}\{f(x)\}

Now consider the Laplace transform of the convolution integral:

    \mathcal{L}\{f(x) * g(x)\} = \int_x \int_\tau f(x - \tau) g(\tau) \, d\tau \, e^{-sx} \, dx
                              = \int_\tau g(\tau) e^{-s\tau} \, d\tau \; \mathcal{L}\{f(x)\}
                              = \mathcal{L}\{g(x)\} \, \mathcal{L}\{f(x)\}

By allowing s \rightarrow iu we prove the result for the Fourier transform as well.
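A quick numerical check of this theorem (a sketch, not from the notes; sequences invented): convolving in the signal domain matches multiplying FFTs, provided the transform length covers the full linear convolution:

import numpy as np

f = np.random.randn(64)
g = np.random.randn(16)

r_direct = np.convolve(f, g)               # r(x) = f(x) * g(x)
L = len(f) + len(g) - 1                    # zero-pad to avoid circular wrap-around
R = np.fft.fft(f, L) * np.fft.fft(g, L)    # R(u) = F(u) G(u)
r_fourier = np.fft.ifft(R).real

assert np.allclose(r_direct, r_fourier)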

1.7 Simple filtering - basics

It is very useful in image and signal processing to view transformations as input-output relationships, specified by a transfer function.

Figure 1.2: Transfer function model. An input f(x) passes through a system with impulse response h(x) to give the output r(x).

The transfer function is defined by the impulse response of the system. This specifies the response of the system to an impulse input (a Dirac delta). The convolution theorem states that the response, r(x), to such an impulse input, i(x), may be given as

    r(x) = i(x) * h(x)

where h(x) is the impulse response function. For an arbitrary input, f(x),

    r(x) = f(x) * h(x)

or, by taking the FT,

    R(u) = F(u) H(u)

where H(u) is the transfer function in the Fourier domain. As F(u) represents the spectrum of f(x), so multiplying by some function H(u) may be regarded as modifying or filtering F(u) and hence filtering the data sequence f(x).

Key point: Filtering may be regarded as a multiplication in the spectral domain, or as a convolution in the image/signal domain.

The Ideal LPF (ILPF): consists of H(u, v) = 1 if \sqrt{u^2 + v^2} \le D_0 and 0 otherwise. As the ILPF is symmetric about the origin in Fourier space, it requires that the FT of the image is also centred on the origin.

The Butterworth LPF (BLPF): consists of

    H(u, v) = \left[ 1 + \left( \frac{D(u, v)}{D_0} \right)^{2n} \right]^{-1}

where D(u, v) = \sqrt{u^2 + v^2} and D_0 and n are filter settings.

The IHPF is the inverse transfer function of the ILPF, and the corresponding Butterworth filter, the BHPF, is the inverse of the BLPF.
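Both low-pass transfer functions are easy to build as frequency-domain masks; a sketch (grid size, cutoff D_0 and order n are arbitrary choices, and the high-pass versions are taken here as 1 - H), assuming the image spectrum has been centred with fftshift:

import numpy as np

M = 128                                       # image is M x M
u = np.arange(M) - M // 2                     # centred frequency coordinates
U, V = np.meshgrid(u, u)
D = np.sqrt(U**2 + V**2)                      # D(u, v)

D0, n = 20.0, 2                               # cutoff and Butterworth order
H_ideal = (D <= D0).astype(float)             # ILPF
H_butter = 1.0 / (1.0 + (D / D0)**(2 * n))    # BLPF

H_ihpf = 1.0 - H_ideal                        # corresponding high-pass filters
H_bhpf = 1.0 - H_butter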

Figure 1.3: The ILPF and the BLPF. (a) 1-d magnitude responses |G(u)| for Butterworth orders n = 1, 2, 6 around the cutoff u_c (linear and log scales); (b) the corresponding 2-d transfer functions.

Lecture 2 The Fourier Domain & Digital Filters


2.1 The Fourier transform

Consider again the 1-D case of a signal f(x); the FT is defined as

    F(u) = \int_{-\infty}^{+\infty} f(x) \exp[-i 2\pi u x] \, dx

and the inverse as

    f(x) = \int_{-\infty}^{+\infty} F(u) \exp[i 2\pi u x] \, du

which form a Fourier transform pair. We note that the FT is, in general, a complex function of the form F(u) = R(u) + i I(u). We call |F(u)| the Fourier spectrum of f(x) and \phi(u) = \tan^{-1}[I(u)/R(u)] the phase spectrum.

2.1.1 2-D Fourier transform

There is no inherent change in theory for the 2-dimensional case, where f(x, y) exists, so the FT, F(u, v), is given as

    F(u, v) = \int \int_{-\infty}^{+\infty} f(x, y) \exp[-i 2\pi (ux + vy)] \, dx \, dy

and the inverse as

    f(x, y) = \int \int_{-\infty}^{+\infty} F(u, v) \exp[i 2\pi (ux + vy)] \, du \, dv

We note that the above are separable.

2.1.2 Some basic theorems

Here are some basic (and useful) theorems related to the FT. They are shown for a 1-D system, for ease of reading and notation, and directly translate into higher dimensions as above.

Similarity theorem: if f(x) \leftrightarrow F(u) then f(ax) \leftrightarrow \frac{1}{|a|} F(u/a)

Addition theorem: if f(x), g(x) \leftrightarrow F(u), G(u) then af(x) + bg(x) \leftrightarrow aF(u) + bG(u)

Shift or twist theorem: if f(x) \leftrightarrow F(u) then f(x - a) \leftrightarrow \exp[-i 2\pi u a] F(u)

Convolution theorem: if

    f(x) * g(x) = \int f(\tau) g(x - \tau) \, d\tau

then FT[f(x) * g(x)] = F(u) G(u). Note that this is of great use in filtering.

Power theorem:

    \int |f(x)|^2 \, dx = \int |F(u)|^2 \, du

i.e. a statement about conservation of energy.

Derivative theorem: if f(x) \leftrightarrow F(u) then f'(x) = d/dx[f(x)] \leftrightarrow i 2\pi u F(u)
2.1.3 The discrete FT (DFT)

We sample the continuous (start with 1-D) function, f(x), at M points spaced \Delta x apart. We now describe the function as

    f(x) = f(x_0 + x \Delta x)

where x now describes an index. With this transformation, u, the Fourier variable paired to x, is discretised into M points. We thus obtain:

    F(u) = \frac{1}{M} \sum_{x=0}^{M-1} f(x) \exp[-i 2\pi u x / M]

and

    f(x) = \sum_{u=0}^{M-1} F(u) \exp[i 2\pi u x / M]

and, for 2-D systems,

    F(u, v) = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \exp[-i 2\pi (ux/M + vy/N)]

and

    f(x, y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v) \exp[i 2\pi (ux/M + vy/N)]

if y is sampled evenly at N sample points.

The sampling in the space domain, \Delta x, \Delta y, corresponds to a sampling in the frequency domain of

    \Delta u = \frac{1}{M \Delta x}, \qquad \Delta v = \frac{1}{N \Delta y}
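One caveat when experimenting: the 1/M factor above sits on the forward transform, whereas numpy puts it on the inverse. A sketch (not from the notes) checking the definition against the library:

import numpy as np

M = 8
f = np.random.randn(M)

# forward DFT with the 1/M normalisation used above
x = np.arange(M)
F = np.array([(f * np.exp(-2j * np.pi * u * x / M)).sum() for u in range(M)]) / M

assert np.allclose(F, np.fft.fft(f) / M)           # same up to the normalisation
assert np.allclose(f, np.fft.ifft(np.fft.fft(f)))  # inverse recovers f(x)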

2.1.4 Some useful results using the DFT

The total width of the samples in the x, y directions determines the lowest spatial frequency we can resolve, u_min = 1/(M \Delta x).

The sample interval, \Delta x, \Delta y, dictates the highest spatial frequency we can resolve, u_max = 1/(2 \Delta x).

The number of samples, M, N, dictates the number of spatial frequency bins that can be resolved.

Addition & linearity: as with continuous functions.

Shift theorem: if f(x) \leftrightarrow F(u) then f(x - a) \leftrightarrow \exp[-i 2\pi u a / M] F(u), hence

    f(x, y) \exp[i 2\pi (u_0 x + v_0 y)/M] \leftrightarrow F(u - u_0, v - v_0)

and

    f(x - x_0, y - y_0) \leftrightarrow F(u, v) \exp[-i 2\pi (u x_0 + v y_0)/M]

If we let u_0 = v_0 = M/2 then we can shift the frequency space to the centre of the frequency square:

    f(x, y) (-1)^{x+y} \leftrightarrow F(u - M/2, v - M/2)

Discrete convolution:

    f(x) * g(x) = \sum_{m=0}^{M-1} f(m) g(x - m)

and

    DFT[f(x) * g(x)] = M F(u) G(u)

Power theorem:

    \sum_{x=0}^{M-1} |f(x)|^2 = M \sum_{u=0}^{M-1} |F(u)|^2

Periodicity:

    F(u, v) = F(u + M, v) = F(u, v + M) = F(u + M, v + M)

Note that this leads us to deduce the aliasing theorem.

Rotation:

    f(r, \theta + \theta_0) \leftrightarrow F(\rho, \phi + \theta_0)

Average value: If we define the average value of the 2-D function as

    \bar{f}(x, y) = \frac{1}{M^2} \sum_{x=0}^{M-1} \sum_{y=0}^{M-1} f(x, y)

then

    \bar{f}(x, y) = F(0, 0)

Laplacian: The Laplacian of a 2-D variable is defined as

    \nabla^2 f(x, y) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}

The DFT of the above is hence

    -(2\pi)^2 (u^2 + v^2) F(u, v)

(each time we take a derivative we get a factor of i 2\pi).

Trick: Plots of |F(u, v)| often decay very rapidly from a central peak, so it is good to display them on a log scale. Often the transform F'(u, v) = \log[1 + |F(u, v)|] is used.
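Both the centring trick and the log-display trick are one-liners in practice; the sketch below (not from the notes, image invented) checks that multiplying by (-1)^(x+y) before the FFT matches np.fft.fftshift applied afterwards, then forms the log-scaled display F':

import numpy as np

img = np.random.rand(128, 128)
x, y = np.meshgrid(np.arange(128), np.arange(128), indexing="ij")

# centring via (-1)^(x+y), equivalent to fftshift for even image sizes
F_shifted = np.fft.fft2(img * (-1.0)**(x + y))
assert np.allclose(F_shifted, np.fft.fftshift(np.fft.fft2(img)))

F_display = np.log(1.0 + np.abs(F_shifted))   # F'(u,v) = log[1 + |F(u,v)|]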

Figure 2.1: Image (a), Fourier spectrum (b) and shifted Fourier spectrum (c).

Figure 2.2: Some 2-D functions and their resultant DFTs


Figure 2.3: Rotation in FT space (top) and 2-D Convolution (bottom).

2.2 Other transforms

The FT represents a specific case in a more general transform theory. In the FT the image is decomposed into a series of harmonic functions (sines and cosines). These have the property of being orthogonal functions and form a complete basis set. They are not the only such functions, however. Alternatives to the FT kernel basis include:

Walsh
Hadamard
Discrete cosine transform (DCT)
Wavelets - localised functions.

We concentrate later in the course on the use of the DCT as this is the most widely used of the above and forms the basis of JPEG compression. We also look at wavelets as these form the basis of the JPEG-2000 compression scheme.

2.3 Digital filtering

2.3.1 The sampling process

This is performed by an analogue to digital converter (ADC) in which the continuous function f(x) is replaced by a discrete function f[k], which is defined only at x = kT, with k = 0, 1, 2, ... We thence only need consider the digitised sample set f[k] and the sample interval T. A simple generalisation allows for a sampled set over the 2-D plane, indexed so that each pair of indices labels an image pixel.

Aliasing

Consider f(t) = \cos(\frac{\pi}{2} \frac{t}{T}) (one cycle every 4 samples) and also f(t) = \cos(\frac{3\pi}{2} \frac{t}{T}) (3 cycles every 4 samples), as shown in the Figure. Note that the resultant samples are the same. This result is referred to as aliasing.

Figure 2.4: Aliasing.

2.4 Introduction to the principles of digital filtering

We can see that numerical processing is at the heart of the digital filtering process. How can the arithmetic manipulation of a set of numbers produce a filtered version of that set? Consider the noisy signal of Figure 2.5, together with its sampled version.

Figure 2.5: Noisy data.

One way to reduce the noise, for example, might be to try and smooth the data. We could try a polynomial fit using a least-squares criterion. If we choose, say, to fit a parabola to every group of 5 points in the sequence, then, for every point, we will make a parabolic approximation to that point using the value of the sample at that point together with the values of the 4 nearest samples (this forms a parabolic filter), as in Fig. 2.6.

Figure 2.6: Parabolic fit (5-point window, centre point k = 0).

    p[k] = s_0 + k s_1 + k^2 s_2

where p[k] is the value of the parabola at each of the 5 possible values of k = \{-2, -1, 0, 1, 2\} and s_0, s_1, s_2 are the variables used to fit each of the parabolae to 5 input data points.

We obtain a fit by finding a parabola (coefficients s_0, s_1 and s_2) which best approximates the 5 data points as measured by the least-squares error E:

    E(s_0, s_1, s_2) = \sum_{k=-2}^{2} \left( x[k] - [s_0 + k s_1 + k^2 s_2] \right)^2

Minimizing the least-squares error gives:

    \frac{\partial E}{\partial s_0} = 0, \qquad \frac{\partial E}{\partial s_1} = 0, \qquad \frac{\partial E}{\partial s_2} = 0

and thus:

    5 s_0 + 10 s_2 = \sum_{k=-2}^{2} x[k]

    10 s_1 = \sum_{k=-2}^{2} k \, x[k]

    10 s_0 + 34 s_2 = \sum_{k=-2}^{2} k^2 x[k]

which therefore gives:

    s_0 = \frac{1}{35} (-3x[-2] + 12x[-1] + 17x[0] + 12x[1] - 3x[2])

    s_1 = \frac{1}{10} (-2x[-2] - x[-1] + x[1] + 2x[2])

    s_2 = \frac{1}{14} (2x[-2] - x[-1] - 2x[0] - x[1] + 2x[2])

The centre point of the parabola is given by:

    p[k] \big|_{k=0} = s_0 + k s_1 + k^2 s_2 \big|_{k=0} = s_0

Thus, the parabola coefficient s_0 given above is the output sample calculated from a set of 5 input sequence points. The output sequence so obtained is similar to the input sequence, but with less noise (i.e. low-pass filtered) because the parabolic filtering provides a smoothed approximation to each set of five data points in the sequence. Fig. 2.7 shows this filtering effect.

Figure 2.7: Noisy data (thin line) and 5-point parabolic filtered (thick line).

The magnitude response (which we will re-consider later) for the 5-point parabolic filter is shown below in Fig. 2.8.

Figure 2.8: Frequency response of 5-point parabolic filter.

The filter which has just been described is an example of a non-recursive digital filter, which is defined by the following relationship (known as a difference equation):

    r[k] = \sum_{i=0}^{N} a_i f[k - i]

where the a_i coefficients determine the filter characteristics. The difference equation for the 5-point smoothing filter, therefore, is:

    r[k] = \frac{1}{35} (-3f[k+2] + 12f[k+1] + 17f[k] + 12f[k-1] - 3f[k-2])

This is a non-causal filter since a given output value r[k] depends not only on previous inputs, but also on the current input f[k], the input f[k+1] and the input f[k+2]. The problem is solved by delaying the calculation of the output value (the centre point of the parabola) until all the 5 input values have been sampled (i.e. a delay of 2T, where T = sampling period), i.e.:

    r[k] = \frac{1}{35} (-3f[k] + 12f[k-1] + 17f[k-2] + 12f[k-3] - 3f[k-4])

It is of importance to note that the equation r[k] = \sum_i a_i f[k-i] represents a discrete convolution of the input data with the filter coefficients; hence these coefficients constitute the impulse response of the filter.


Proof:
Let f[k] = 0, except at k = 0, where f[0] = 1. Then

    r[k] = \sum_i a_i f[k - i] = a_k f[0]

(all terms zero except when i = k). This is equal to a_k since f[0] = 1. Therefore r[0] = a_0; r[1] = a_1; etc. As there is a finite number of a's, the impulse response is finite. For this reason, non-recursive filters are also called Finite Impulse Response (FIR) filters.
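Because the coefficients are the impulse response, applying the 5-point parabolic filter is a single convolution. A sketch (test signal invented); these coefficients are in fact the Savitzky-Golay coefficients for a quadratic fit over 5 points:

import numpy as np

h = np.array([-3.0, 12.0, 17.0, 12.0, -3.0]) / 35.0   # 5-point parabolic filter

t = np.arange(200)
f = np.sin(2 * np.pi * t / 50) + 0.3 * np.random.randn(200)  # noisy test signal

r = np.convolve(f, h, mode="same")   # smoothed (low-pass filtered) output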
As we will see, we may also formulate a digital filter as a recursive filter, in which the output r[k] is also a function of previous outputs:

    r[k] = \sum_{i=0}^{N} a_i f[k-i] + \sum_{i=1}^{M} b_i r[k-i]

Before we can describe methods for the design of both types of filter, we need to review the concept of the z-transform.

2.5 The z-transform

The z-transform is important in digital filtering because it describes the sampling process and plays a role in the digital domain similar to that of the Laplace transform in analogue filtering.

The Laplace transform of a unit impulse occurring at time x = kT is e^{-kTs}. Consider the discrete function f[k] to be a succession of impulses, for example of area f(0) occurring at x = 0, f(1) occurring at x = T, etc. The Laplace transform of the whole sequence would be:

    F_d(s) = f(0) + f(1) e^{-Ts} + f(2) e^{-2Ts} + \ldots + f[k] e^{-kTs}

The suffix d denotes the transform of the discrete sequence, not of the continuous f(t).

Let us replace e^{Ts} by a new variable z, and rename F_d(s) as F(z):

    F(z) = f(0) + f(1) z^{-1} + f(2) z^{-2} + \ldots + f[k] z^{-k}

For many functions, the infinite series can be represented in closed form, in general as the ratio of two polynomials in z^{-1}.

2.5.1 The Pulse Transfer Function

This is the name for (z-transform of output)/(z-transform of input).

Let the impulse response, for example of an FIR filter, be a_0 at x = 0, a_1 at x = T, ..., a_n at x = nT, with n = 0 to N.

Let G(z) be the z-transform of this sequence:

    G(z) = a_0 + a_1 z^{-1} + a_2 z^{-2} + \ldots + a_i z^{-i} + \ldots + a_N z^{-N}

Let F(z) be an input expressed in the z-domain as:

    F(z) = f[0] + f[1] z^{-1} + f[2] z^{-2} + \ldots + f[k] z^{-k} + \ldots

The product G(z)F(z) is:

    G(z)F(z) = (a_0 + a_1 z^{-1} + \ldots + a_n z^{-n} + \ldots + a_N z^{-N})(f[0] + f[1] z^{-1} + \ldots + f[k] z^{-k} + \ldots)

in which the coefficient of z^{-k} is:

    a_0 f[k] + a_1 f[k-1] + \ldots + a_n f[k-n] + \ldots + a_N f[k-N]

This is nothing else than the value of the output sample at x = kT. Hence the whole sequence is the z-transform of the output, say R(z), where R(z) = G(z)F(z). Hence the pulse transfer function, G(z), is the z-transform of the impulse response.

For non-recursive filters:

    G(z) = \sum_{n=0}^{N} a_n z^{-n}

For recursive filters:

    R(z) = \sum_{n=0}^{N} a_n z^{-n} F(z) + \sum_{m=1}^{M} b_m z^{-m} R(z)

    G(z) = \frac{R(z)}{F(z)} = \frac{\sum_n a_n z^{-n}}{1 - \sum_m b_m z^{-m}}
2.5.2 z-plane pole-zero plot

Let z = e^{sT}, where T = sampling period. Since s = \sigma + i 2\pi u, we have:

    z = e^{\sigma T} e^{i 2\pi u T}

If \sigma = 0, then |z| = 1 and z = e^{i 2\pi u T} = \cos 2\pi u T + i \sin 2\pi u T, i.e. the equation of a circle of unit radius (the unit circle) in the z-plane.

Thus, the imaginary axis in the s-plane (\sigma = 0) maps onto the unit circle in the z-plane and the left half of the s-plane (\sigma < 0) onto the interior of the unit circle.

We know that all the poles of G(s) must be in the left half of the s-plane for a continuous filter to be stable. We can therefore state the equivalent rule for stability in the z-plane:

For stability all poles in the z-plane must be inside the unit circle.
2.6 Frequency response of a digital filter

This can be obtained by evaluating the (pulse) transfer function on the unit circle (i.e. z = e^{i 2\pi u T}).

Proof
Consider the general filter

    r[k] = \sum_{n=0}^{\infty} a_n f[k - n]

NB: a recursive type can always be expressed as an infinite sum by dividing out, e.g. for

    G(z) = \frac{a_0}{1 - b_1 z^{-1}}, \quad \text{we have} \quad r[k] = \sum_{n=0}^{\infty} a_0 b_1^n f[k - n]

Let the input before sampling be \cos(2\pi u t + \phi), sampled at t = 0, T, \ldots, kT. Therefore

    f[k] = \cos(2\pi u k T + \phi) = \frac{1}{2} \left\{ e^{i(2\pi u k T + \phi)} + e^{-i(2\pi u k T + \phi)} \right\}

i.e.

    r[k] = \frac{1}{2} \sum_{n=0}^{\infty} a_n e^{i(2\pi u [k-n] T + \phi)} + \frac{1}{2} \sum_{n=0}^{\infty} a_n e^{-i(2\pi u [k-n] T + \phi)}

         = \frac{1}{2} e^{i(2\pi u k T + \phi)} \sum_{n=0}^{\infty} a_n e^{-i 2\pi u n T} + \frac{1}{2} e^{-i(2\pi u k T + \phi)} \sum_{n=0}^{\infty} a_n e^{i 2\pi u n T}

Now

    \sum_{n=0}^{\infty} a_n e^{-i 2\pi u n T} = \sum_{n=0}^{\infty} a_n (e^{i 2\pi u T})^{-n}

But G(z) for this filter is \sum_{n=0}^{\infty} a_n z^{-n}, and so

    \sum_{n=0}^{\infty} a_n e^{-i 2\pi u n T} = G(z) \big|_{z = e^{i 2\pi u T}}

Let G(z)|_{z = e^{i 2\pi u T}} = A e^{i\theta}. Then

    \sum_{n=0}^{\infty} a_n e^{+i 2\pi u n T} = A e^{-i\theta}    (the complex conjugate)

Hence

    r[k] = \frac{1}{2} e^{i(2\pi u k T + \phi)} A e^{i\theta} + \frac{1}{2} e^{-i(2\pi u k T + \phi)} A e^{-i\theta}

or

    r[k] = A \cos(2\pi u k T + \phi + \theta) \quad \text{when} \quad f[k] = \cos(2\pi u k T + \phi)

Thus A and \theta represent the gain and phase of the frequency response, i.e. the frequency response (as a complex quantity) is

    G(z) \big|_{z = e^{i 2\pi u T}}
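Evaluating G(z) on the unit circle is exactly what scipy.signal.freqz does; a sketch for the 5-point parabolic filter of earlier reproduces a magnitude response like Fig. 2.8:

import numpy as np
from scipy import signal

a = np.array([-3.0, 12.0, 17.0, 12.0, -3.0]) / 35.0   # FIR coefficients a_n

# G(z) evaluated at points z = exp(i 2 pi u T) on the unit circle
w, G = signal.freqz(a, worN=512)

gain = np.abs(G)       # A, the gain
phase = np.angle(G)    # theta, the phase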

Lecture 3 Restoration: global filters


3.1 Noise removal

3.1.1 Signal in additive noise

Consider a signal f(x) corrupted by noise n(x) such that

    s(x) = f(x) + n(x)

If the frequency support of the signal and the noise is different then we may recover the signal by (typically) low-pass filtering.

Figure 3.1: Simple low-pass filtering of additive noise

3.1.2 Signal in multiplicative noise

Consider our signal corrupted by noise of the form:

    s(x) = f(x) n(x)

One way we can work with this is to use the idea of homomorphic filtering. Without loss of generality we can make s(x) > 0 for all x (i.e. just a DC shift). Then we can take logs,

    \log s(x) = \log f(x) + \log n(x)

which is just an additive noise system. We can then (low-pass) filter this system and invert the log (i.e. exponentiate) to obtain an enhanced signal or image.
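A 1-d sketch of the homomorphic recipe (signal, noise and the crude moving-average low-pass are all invented for illustration):

import numpy as np

t = np.linspace(0, 1, 1000)
f = 2.0 + np.sin(2 * np.pi * 3 * t)            # slowly varying signal, > 0
n = np.exp(0.1 * np.random.randn(1000))        # multiplicative noise, ~ 1
s = f * n                                      # s(x) = f(x) n(x), positive

log_s = np.log(s)                              # log s = log f + log n
kernel = np.ones(31) / 31.0                    # crude low-pass (moving average)
log_f_hat = np.convolve(log_s, kernel, mode="same")
f_hat = np.exp(log_f_hat)                      # exponentiate to undo the log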

Figure 3.2: Homomorphic filtering of multiplicative noise.

3.2 Inverse filters

In general, we will have something more to contend with than just a noise process. Consider a camera with a blur, a picture taken of a moving car, or a microphone with a non-perfect transfer function.

In general then, the signal or image we obtain will be (considering here the additive noise case)

    s(x) = b(x) * f(x) + n(x)

where n(x) is our noise process and b(x) is a blur or convolution kernel. How do we proceed to obtain our enhanced, restored f(x)?

If we take a Fourier transform of s(x) we obtain

    S(u) = B(u) F(u) + N(u)

which yields

    F(u) = B^{-1}(u) (S(u) - N(u))

or, if we can filter out the effects of n(x) easily (using a low-pass filter, for example),

    F(u) = B^{-1}(u) S(u)

where B^{-1}(u) is the inverse filter which corrects for the deterministic corruption of the original. Consider a simple raised cosine blur, shown below.


When we know the convolution kernel, b(x), we can achieve almost perfect results...


The example above was all in the absence of the noise, n(x).

Noisy deconvolution is a messy business! Also, in all the examples looked at above we assume we know the convolution kernel, b(x). This may be reasonable if we can measure the spread of a camera lens, for example, but for arbitrary images and signals we don't always know this. There are methods, so-called blind deconvolution approaches, which can solve this, but they are outside the scope of this course.
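A noise-free sketch of inverse filtering (signal and raised-cosine kernel invented; the small guard on |B(u)| is a practical necessity wherever the blur spectrum is close to zero):

import numpy as np

N = 512
f = np.zeros(N); f[100:150] = 1.0                      # original signal
x = np.arange(-25, 26)
b = 0.5 * (1 + np.cos(np.pi * x / 25)); b /= b.sum()   # raised cosine blur

s = np.fft.ifft(np.fft.fft(f) * np.fft.fft(b, N)).real  # blurred signal

B = np.fft.fft(b, N)
eps = 1e-6                                             # guard against tiny |B(u)|
F_hat = np.fft.fft(s) / np.where(np.abs(B) > eps, B, eps)
f_hat = np.fft.ifft(F_hat).real                        # near-perfect restoration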
3.2.1 Motion blur

Let the true image be f(x, y). Let the displacement in the x, y directions be \alpha(t), \beta(t). Hence, over a time period [-T/2, T/2] the resultant image is given by

    g(x, y) = \int_{-T/2}^{T/2} f(x - \alpha(t), y - \beta(t)) \, dt

Taking the FT gives

    G(u, v) = \int_{-T/2}^{T/2} \int \int f(x - \alpha(t), y - \beta(t)) \exp\{-i 2\pi (ux + vy)\} \, dx \, dy \, dt

Letting \xi = x - \alpha(t) and \eta = y - \beta(t), so

    G(u, v) = \int_{-T/2}^{T/2} \left[ \int \int f(\xi, \eta) e^{-i 2\pi (u\xi + v\eta)} \, d\xi \, d\eta \right] \exp\{-i 2\pi [u\alpha(t) + v\beta(t)]\} \, dt

The term in the brackets is just the FT of f(x, y), i.e. F(u, v), which is independent of t, so

    G(u, v) = F(u, v) \int_{-T/2}^{T/2} \exp\{-i 2\pi [u\alpha(t) + v\beta(t)]\} \, dt

which is of the form

    G(u, v) = F(u, v) H(u, v)

or

    g(x, y) = f(x, y) * h(x, y)

If the motion is constant, i.e. \alpha(t) = Kt and \beta(t) = Lt, then

    H(u, v) = \int_{-T/2}^{T/2} \exp\{-i 2\pi [uK + vL] t\} \, dt = \frac{\sin(\pi [uK + vL] T)}{\pi [uK + vL]}

i.e. a sinc function.

3.2.2 Other noise

The noise looked at above was additive or multiplicative noise in which each datum of our signal was corrupted. Other noise exists; for example we may have a certain probability of corruption, such that most pixels or samples are fine, but once in a while we get a huge noise spike. We will look at this kind of noise, which replaces the true sample with an outlying one, later in this course.
3.3 The Wiener filter

The Wiener filter is a noise filter based on Fourier methods. Its main advantage is the short computational time it takes to find a solution.

Consider a situation in which there is some underlying, uncorrupted signal f(t) that we wish to measure. Errors occur in the measurement due to imperfections in the equipment, so the output signal is corrupted. There are two ways the signal can be corrupted. First, the equipment can convolve, or smear, the signal. This occurs when the equipment doesn't have a perfect delta function response to the signal. Let r(t) be the smeared signal and g(t) be the (known) response that caused the convolution. Then r(t) is related to f(t) by:

    r(t) = f(t) * g(t)

or

    R(u) = F(u) G(u)

where R, F, G are the Fourier transforms of r, f, g. The second source of signal corruption is the unknown background noise n(t). Therefore the measured signal c(t) is a sum of r(t) and n(t):

    c(t) = r(t) + n(t)

To deconvolve r to find f, simply divide R(u) by G(u), i.e.

    F(u) = \frac{R(u)}{G(u)}

in the absence of noise n. To deconvolve c where n is present, one needs to find an optimum filter function \phi(t), or \Phi(u), which filters out the noise and gives an estimate \tilde{F}(u) by:

    \tilde{F}(u) = \frac{C(u) \Phi(u)}{G(u)}

where \tilde{f}(t) is as close to the original signal as possible. For \tilde{f}(t) to be similar to f(t), their squared difference should be as close to zero as possible, i.e.

    \int |f(t) - \tilde{f}(t)|^2 \, dt

or

    \int |F(u) - \tilde{F}(u)|^2 \, du

is minimised. After substitution, and using the fact that the expectation of cross terms between R(u) and N(u) is zero, we require the following to be minimised:

    \int |G(u)|^{-2} \left[ |R(u)|^2 |1 - \Phi(u)|^2 + |N(u)|^2 |\Phi(u)|^2 \right] du

The best filter is one where the above integral is a minimum at every value of u. This is when

    \Phi(u) = \frac{|R(u)|^2}{|R(u)|^2 + |N(u)|^2}

Now we can make the approximation

    |R(u)|^2 + |N(u)|^2 \approx |C(u)|^2

hence

    \Phi(u) \approx \frac{|R(u)|^2}{|C(u)|^2}

From the above theory, it can be seen that a program can be written to Wiener filter a signal from noise using the Fourier transform.
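A minimal 1-d sketch of this recipe (an illustration, not from the notes): the smearing is taken as trivial (G(u) = 1, pure denoising), the noise power per frequency bin is assumed known rather than extrapolated from |C(u)|^2, and the test signal is invented.

import numpy as np

N = 1024
t = np.arange(N)
f = np.sin(2 * np.pi * t / 64)                      # true signal
c = f + 0.5 * np.random.randn(N)                    # measured; G(u) = 1 here

C = np.fft.fft(c)
noise_power = 0.25 * N                              # |N(u)|^2, assumed known
R2 = np.maximum(np.abs(C)**2 - noise_power, 0.0)    # |R|^2 ~ |C|^2 - |N|^2

phi = R2 / (R2 + noise_power)                       # Phi = |R|^2/(|R|^2+|N|^2)
f_hat = np.fft.ifft(phi * C).real                   # F = C Phi / G with G = 1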

Figure 3.3: Extracting the Fourier terms needed in the Wiener filter from the log power spectrum: |C|^2 (measured), |N|^2 (extrapolated) and |R|^2 (deduced), as functions of u.

There is another way to Wiener filter a signal, this time without Fourier transforming the data: the Mean-Squared Method. This uses the fact that the Wiener filter is one that is based on the least-squares principle, i.e. the filter minimises the squared error between the actual output and the desired output. To do this, first the variance of the data matrix is found. Then, a box window is moved over the image matrix, moving one pixel at a time. In every box window, the local mean and variance are found. The filtered value of each datum (at index x, y say) is found by the following formula:

    \tilde{f}_{x,y} = \mu_{x,y} + \frac{\sigma^2_{x,y} - s^2}{\sigma^2_{x,y}} (c_{x,y} - \mu_{x,y})

where \tilde{f}_{x,y} is the filtered signal, \mu_{x,y} is the local mean, \sigma^2_{x,y} is the local variance, s^2 is the noise variance of the entire data matrix, and c_{x,y} is the original signal value.

From the above formula, it can be seen that if the original signal is similar to the local mean then the filtered value will be the local mean, and if the original signal is very different from the local mean then it will be filtered to give a higher/lower intensity signal depending on the differences. Also, if the local variance is similar to the matrix variance, which is around 1 (i.e. only noise exists in the box), then the filtered signal will be that of the local mean, which should be close to zero. But if the local variance is much bigger than the matrix variance (i.e. when the box is at the actual signal), then the signal will be amplified. As the box moves through the entire matrix, it will calculate the solution for each pixel using the above formula, thus filtering the data.

The Wiener filter has been introduced here in its 2-d guise, but it is trivial to go to 1-d.
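The mean-squared method is essentially the local adaptive filter implemented by scipy.signal.wiener; a minimal sketch with the local statistics computed by uniform filtering (image and noise variance invented):

import numpy as np
from scipy.ndimage import uniform_filter

img = np.random.randn(64, 64)          # stand-in noisy image
box = 5                                # box window size

mu = uniform_filter(img, box)                      # local mean
var = uniform_filter(img**2, box) - mu**2          # local variance
s2 = 0.1                                           # noise variance, assumed known

gain = np.maximum(var - s2, 0.0) / np.maximum(var, 1e-12)
filtered = mu + gain * (img - mu)      # f = mu + ((sigma^2 - s^2)/sigma^2)(c - mu)

When the noise variance is not supplied, scipy.signal.wiener estimates it as the mean of the local variances.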

Figure 3.4: Wiener filter (bottom right) improves over flat average filter (bottom left). 5 pixel square used in
both cases.

Figure 3.5: Wiener filter adaptively removing noise - estimation of mean and one σ.


Lecture 4 Local derivative filters


4.1 Estimating derivatives in signals & images

In many applications it is useful to be able to estimate simple measures of derivatives of signals & images, to detect for example level changes or edges.

Consider a simple 1-d signal, f(t). A simple backwards-difference (first order) estimate of the gradient may be written as

    f'(t) = \frac{1}{\Delta t} (f(t) - f(t-1))

We can also represent this as

    f'(t) = f(t) * g(t) / \Delta t

where g(t) = [1, -1] is a filter kernel (i.e. a simple linear filter). We must remember that such a filter has error of order O(\Delta t), whereas the central difference estimate of the gradient is better, with O(\Delta t^2). Letting \Delta t = 1 for ease of nomenclature gives

    f'(t) = \frac{1}{2} (f(t+1) - f(t-1)) = f(t) * g(t)

where now g(t) = \frac{1}{2} [1, 0, -1]. So we have a local filter. This general approach can, of course, be extended to higher-order derivatives. The second derivative, f''(t), may be estimated as

    f''(t) = \frac{d}{dt} f'(t) = f(t) * (g(t) * g(t))

where the filter kernel that does the work is (g(t) * g(t)). Letting g(t) = [1, -1] then we get a filter for the 2nd derivative as

    g_2(t) = [1, -2, 1]

which you will have seen before. To make sure that the signal power out equals that in, we can normalise with a factor of |1| + |-2| + |1| = 4, so now g_2(t) = \frac{1}{4} [1, -2, 1]. The figures show this working on a signal.

Figure 4.1: Derivative estimates

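A sketch of these 1-d derivative estimates as convolutions (the test signal is invented; an illustration, not part of the notes):

import numpy as np

t = np.linspace(0, 1, 100)
f = np.sin(2 * np.pi * t)

g = 0.5 * np.array([1.0, 0.0, -1.0])      # central difference kernel
g2 = 0.25 * np.array([1.0, -2.0, 1.0])    # normalised second-derivative kernel

df = np.convolve(f, g, mode="same")       # ~ f'(t), up to the sample spacing
d2f = np.convolve(f, g2, mode="same")     # ~ f''(t), scaled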

We can, of course, simply extend this to images by running over the rows and columns using one of the above filters, but normally we will extend these 1-d filters to a 2-d mask, typically making the mask square. For example, the 1-d central difference filter has a kernel (ignoring the factor of 1/2 for a moment) g(t) = [1, 0, -1]. In its 2-d guise, operating along rows (x-direction), a 2-d mask may be made:

    g_x(x, y) = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 1 & 0 & -1 \end{bmatrix}

and similarly in the y-direction:

    g_y(x, y) = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix}

Figure 4.2: Derivative estimates on a simple image in y, x directions and magnitude.


We may formulate the total gradient in a 2-d setting via

    g(x, y) = \sqrt{g_x^2 + g_y^2}

which is shown in the right-hand subplot of the figure.

The 2-d second derivatives may also easily be inferred via

    g_{2x}(x, y) = \begin{bmatrix} 1 & -2 & 1 \\ 1 & -2 & 1 \\ 1 & -2 & 1 \end{bmatrix}

and similarly in the y-direction

    g_{2y}(x, y) = \begin{bmatrix} 1 & 1 & 1 \\ -2 & -2 & -2 \\ 1 & 1 & 1 \end{bmatrix}

and hence the 2-d Laplacian may be given as

    \nabla^2(x, y) = g_{2x}(x, y) + g_{2y}(x, y) = \begin{bmatrix} 2 & -1 & 2 \\ -1 & -4 & -1 \\ 2 & -1 & 2 \end{bmatrix}

and these give edge information, as valid potential edge points have \nabla^2(x, y) = 0. The Figure shows |\nabla^2(x, y)| applied to the house image.
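A sketch applying the 2-d masks (image invented); scipy.ndimage.convolve does the 2-d convolution, and the gradient magnitude and Laplacian follow directly:

import numpy as np
from scipy.ndimage import convolve

img = np.random.rand(64, 64)

gx_mask = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], float)
gy_mask = gx_mask.T
lap_mask = np.array([[2, -1, 2], [-1, -4, -1], [2, -1, 2]], float)

gx = convolve(img, gx_mask)
gy = convolve(img, gy_mask)
grad_mag = np.sqrt(gx**2 + gy**2)     # total gradient
lap = convolve(img, lap_mask)         # 2-d Laplacian estimate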

Figure 4.3: Laplacian estimates on a simple image.


The major problems with such simple gradient estimators, in both 1- and 2-d,
lie in their lack of robustness in the presence of noise. The way around this is
to use simple parametric functions, such as polynomials (as covered in the A1
course) or splines and then take derivatives of the fitted functions. The Figure
shows the profound improvement for a simple signal using this approach.

Figure 4.4: Gradient estimates on a simple signal. Top: noisy signal, middle: simple gradient filter, bottom:
gradient of polynomial fit.

4.2 Motion analysis

In many machine vision applications we would like to estimate the motion of objects, which can be caused by:

Moving objects
Moving camera, e.g. on a robot arm or an autonomous robot
Moving objects and camera

In essence we ask the question: is there a difference between frames? We may use a simple difference operation, either globally or over a small window,

    \Delta f(x, y, t) = \sum_{x,y} |f(x, y, t) - f(x, y, t-1)|

however this only tells us about total pixel intensity changes; we would like to obtain motion vectors on a pixel by pixel basis.
4.3 The motion field

A 2-D representation of a 3-D motion is called the motion field. Each pixel in the image has a velocity vector v = (v_x, v_y)^T. This is the motion that we wish to estimate and use for moving object evaluation.

We first consider some different motions.

Figure 4.5: Basic motion configurations.

Note that twisting and scaling both resemble deformation in that the object appears to be non-rigid. This makes them more difficult to handle. To simplify much motion analysis, a set of motion assumptions are often made:

1. Motion has a maximum velocity
2. Small (negligible) accelerations occur over the interval dt
3. Common motion: all points in the same object move in the same way
4. Mutual correspondence: objects remain rigid

One popular method of motion field estimation is optic flow.

Optic flow:

Assumes we have access to frames with a small dt
Can give false information, e.g. a spinning sphere or changing illumination
Assumes, therefore, that the illumination is constant so the observed brightness of objects is constant over t
Nearby points in the motion field have similar motion, i.e. the velocity field is smooth

Let f(x, y, t) be the grey-level at pixel x, y at time t. Consider the expansion in partial derivatives

    df = f_x \, dx + f_y \, dy + f_t \, dt

If f remains a constant function (brightness) then df = 0, whence

    -f_t = f_x v_x + f_y v_y = \nabla f \cdot v

The most commonly used method to solve for the velocities is the Lucas-Kanade method, which places a small neighbourhood around the pixel of interest, in which it is assumed that the motion vector is constant. If we define the set \{p_1, p_2, \ldots, p_n\} as the pixels in the neighbourhood then

    f_x(p_1) v_x + f_y(p_1) v_y = -f_t(p_1)
    f_x(p_2) v_x + f_y(p_2) v_y = -f_t(p_2)
    \vdots
    f_x(p_n) v_x + f_y(p_n) v_y = -f_t(p_n)

which can be written in the form

    A v = b

with solution given as

    v = (A^T A)^{-1} A^T b

using the pseudo-inverse of A.
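A sketch of the per-pixel Lucas-Kanade solve (an illustration; the derivative arrays fx, fy, ft over the neighbourhood are assumed to have been estimated already, e.g. with the derivative filters of Section 4.1):

import numpy as np

def lucas_kanade_pixel(fx, fy, ft):
    """fx, fy, ft: length-n arrays of derivatives at the neighbourhood pixels."""
    A = np.stack([fx, fy], axis=1)              # n x 2 matrix of gradients
    b = -ft                                     # right-hand side
    v, *_ = np.linalg.lstsq(A, b, rcond=None)   # v = (A^T A)^-1 A^T b
    return v                                    # (vx, vy)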


Alternative approaches, such as the Horn-Schunck method, assume that the motion field is smooth. If we add the smoothness constraint from the motion assumptions, then we may seek to minimise an energy function given by

    E(x, y, t) = (f_x v_x + f_y v_y + f_t)^2 + \lambda^2 (|\nabla v_x|^2 + |\nabla v_y|^2)

where \lambda is the smoothness multiplier of the system. We may reduce this to solving a set of simultaneous differential equations:

    f_x^2 v_x + f_x f_y v_y + f_x f_t - \lambda^2 \nabla^2 v_x = 0
    f_y f_x v_x + f_y^2 v_y + f_y f_t - \lambda^2 \nabla^2 v_y = 0

in which \nabla^2 is the Laplacian operator,

    \nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}

The choice of \lambda changes the importance we give to the criterion of velocity smoothness in the optical flow estimation. If we allow \lambda to be small we are liable to find optic flow in textured areas as the motion assumptions inevitably fail. What we wish, therefore, is to estimate the smoothest velocity field consistent with the motion. The optimal value of \lambda may be found by trial and error or by a more principled approach.
OF in motion analysis
We will assume that perceived motion in the image may arise from four mechanisms:

1. Translation perpendicular to the line of sight
2. Scaling: translation along the line of sight
3. Rotation in a plane perpendicular to the line of sight
4. Rotation in the line of sight plane

Note that we assume a rigid body (no deformation). Optic flow can analyse these four motions:

1. A set of parallel velocity vectors
2. A set of vectors with a common focal point
3. A set of concentric vectors
4. Sets of vectors anti-parallel to one another

Figure 4.6: Four basic motions detectable by optic flow.

Figure 4.7: Example of optical flow applied to car motion, using the Lucas-Kanade method.

Lecture 5 - Stochastic Models


5.1 Introduction

We now discuss autocorrelation and autoregressive processes; that is, the correlation between successive values of a time series and the linear relations between
them. We also show how these models can be used for spectral estimation.
5.2 Autocorrelation

Given a time series x_t we can produce a lagged version of the time series x_{t-T} which lags the original by T samples. We can then calculate the covariance between the two signals:

    \sigma_{xx}(T) = \frac{1}{N-1} \sum_{t=1}^{N} (x_{t-T} - \mu_x)(x_t - \mu_x)    (5.1)

where \mu_x is the signal mean and there are N samples. We can then plot \sigma_{xx}(T) as a function of T. This is known as the autocovariance function. The autocorrelation function is a normalised version of the autocovariance:

    r_{xx}(T) = \frac{\sigma_{xx}(T)}{\sigma_{xx}(0)}    (5.2)

Note that \sigma_{xx}(0) = \sigma_x^2. We also have r_{xx}(0) = 1. Also, because \sigma_{xy} = \sigma_{yx} we have r_{xx}(T) = r_{xx}(-T); the autocorrelation (and autocovariance) are symmetric functions, or even functions. Figure 5.1 shows a signal and a lagged version of it and Figure 5.2 shows the autocorrelation function.
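A sketch of equations 5.1 and 5.2 (test series invented):

import numpy as np

def autocov(x, T):
    """sigma_xx(T) for lag T >= 0, as in equation 5.1."""
    x = x - x.mean()
    N = len(x)
    return (x[:N - T] * x[T:]).sum() / (N - 1)

x = np.sin(np.arange(200) / 5.0) + 0.5 * np.random.randn(200)
lags = np.arange(50)
r = np.array([autocov(x, T) for T in lags]) / autocov(x, 0)   # r_xx(T)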

Figure 5.1: Signal xt (top) and xt+5 (bottom). The bottom trace leads the top trace by 5 samples. Or we
may say it lags the top by -5 samples.

Figure 5.2: Autocorrelation function for xt . Notice the negative correlation at lag 20 and positive correlation
at lag 40. Can you see from Figure 5.1 why these should occur?

5.3 Autoregressive models

An autoregressive (AR) model predicts the value of a time series from previous values. A pth order AR model is defined as

    x_t = \sum_{i=1}^{p} x_{t-i} a_i + e_t    (5.3)

where a_i are the AR coefficients and e_t is the prediction error. These errors are assumed to be Gaussian with zero mean and variance \sigma_e^2. It is also possible to include an extra parameter a_0 to soak up the mean value of the time series. Alternatively, we can first subtract the mean from the data and then apply the zero-mean AR model described above. We would also subtract any trend from the data (such as a linear or exponential increase) as the AR model assumes stationarity.

The above expression shows the relation for a single time step. To show the relation for all time steps we can use matrix notation. We can write the AR model in matrix form by making use of the embedding matrix, M, and by writing the signal and AR coefficients as vectors. We now illustrate this for p = 4. This gives

    M = \begin{bmatrix} x_4 & x_3 & x_2 & x_1 \\ x_5 & x_4 & x_3 & x_2 \\ \vdots & \vdots & \vdots & \vdots \\ x_{N-1} & x_{N-2} & x_{N-3} & x_{N-4} \end{bmatrix}    (5.4)

We can also write the AR coefficients as a vector a = [a_1, a_2, a_3, a_4]^T, the errors as a vector e = [e_5, e_6, \ldots, e_N]^T and the signal itself as a vector X = [x_5, x_6, \ldots, x_N]^T. This gives

    \begin{bmatrix} x_5 \\ x_6 \\ \vdots \\ x_N \end{bmatrix} = \begin{bmatrix} x_4 & x_3 & x_2 & x_1 \\ x_5 & x_4 & x_3 & x_2 \\ \vdots & \vdots & \vdots & \vdots \\ x_{N-1} & x_{N-2} & x_{N-3} & x_{N-4} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} + \begin{bmatrix} e_5 \\ e_6 \\ \vdots \\ e_N \end{bmatrix}    (5.5)

which can be compactly written as

    X = M a + e    (5.6)

The AR model is therefore a special case of the multivariate regression model (compare the above equation to that given in the second lecture). The AR coefficients can therefore be computed from the equation

    \hat{a} = (M^T M)^{-1} M^T X    (5.7)

The AR predictions can then be computed as the vector

    \hat{X} = M \hat{a}    (5.8)

and the error vector is then e = X - \hat{X}. The variance of the noise is then calculated as the variance of the error vector.

To illustrate this process we analyse our data set using an AR(4) model. The AR coefficients were estimated to be

    \hat{a} = [1.46, 1.08, 0.60, 0.186]^T    (5.9)

and the AR predictions are shown in Figure 5.3. The noise variance was estimated to be \sigma_e^2 = 0.079, which corresponds to a standard deviation of 0.28. The variance of the original time series was 0.3882, giving a signal to noise ratio of (0.3882 - 0.079)/0.079 = 3.93.
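A sketch of the embedding-matrix fit of equations 5.4-5.8 (test series invented):

import numpy as np

def fit_ar(x, p):
    """Least-squares AR(p) fit via the embedding matrix M."""
    N = len(x)
    M = np.column_stack([x[p - i - 1:N - i - 1] for i in range(p)])  # lags 1..p
    X = x[p:]
    a, *_ = np.linalg.lstsq(M, X, rcond=None)   # a = (M^T M)^-1 M^T X
    e = X - M @ a                               # prediction errors
    return a, e.var()

x = np.sin(np.arange(500) / 3.0) + 0.1 * np.random.randn(500)
a_hat, noise_var = fit_ar(x, 4)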

Figure 5.3: (a) Original signal (solid line), X, and predictions (dotted line), \hat{X}, from an AR(4) model and (b) the prediction errors, e. Notice that the variance of the errors is much less than that of the original signal.

5.3.1 Random walks

If p = 1 and a_1 = 1 then the AR model reduces to a random walk model, an example of which is shown in Figure 5.4.

Figure 5.4: A random walk.

5.3.2 Relation to autocorrelation

The autoregressive model can be written as

    x_t = a_1 x_{t-1} + a_2 x_{t-2} + \ldots + a_p x_{t-p} + e_t    (5.10)

If we multiply both sides by x_{t-k} we get

    x_t x_{t-k} = a_1 x_{t-1} x_{t-k} + a_2 x_{t-2} x_{t-k} + \ldots + a_p x_{t-p} x_{t-k} + e_t x_{t-k}    (5.11)

If we now sum over t and divide by N - 1, and assume that the signal is zero mean (if it isn't we can easily make it so, just by subtracting the mean value from every sample), the above equation can be re-written in terms of covariances at different lags:

    \sigma_{xx}(k) = a_1 \sigma_{xx}(k-1) + a_2 \sigma_{xx}(k-2) + \ldots + a_p \sigma_{xx}(k-p) + \sigma_{e,x}    (5.12)

where the last term \sigma_{e,x} is the covariance between the noise and the signal. But as the noise is assumed to be independent from the signal, \sigma_{e,x} = 0. If we now divide every term by the signal variance we get a relation between the correlations at different lags:

    r_{xx}(k) = a_1 r_{xx}(k-1) + a_2 r_{xx}(k-2) + \ldots + a_p r_{xx}(k-p)    (5.13)

This holds for all lags. For an AR(p) model we can write this relation out for the first p lags. For p = 4:

    \begin{bmatrix} r_{xx}(1) \\ r_{xx}(2) \\ r_{xx}(3) \\ r_{xx}(4) \end{bmatrix} = \begin{bmatrix} r_{xx}(0) & r_{xx}(-1) & r_{xx}(-2) & r_{xx}(-3) \\ r_{xx}(1) & r_{xx}(0) & r_{xx}(-1) & r_{xx}(-2) \\ r_{xx}(2) & r_{xx}(1) & r_{xx}(0) & r_{xx}(-1) \\ r_{xx}(3) & r_{xx}(2) & r_{xx}(1) & r_{xx}(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix}    (5.14)

which can be compactly written as

    r = R a    (5.15)

where r is the autocorrelation vector and R is the autocorrelation matrix. The above equations are known, after their discoverers, as the Yule-Walker relations. They provide another way to estimate AR coefficients:

    a = R^{-1} r    (5.16)

This leads to a more efficient algorithm than the general method for multivariate linear regression (equation 5.7) because we can exploit the structure in the autocorrelation matrix. By noting that r_{xx}(-k) = r_{xx}(k) we can rewrite the correlation matrix as

    R = \begin{bmatrix} 1 & r_{xx}(1) & r_{xx}(2) & r_{xx}(3) \\ r_{xx}(1) & 1 & r_{xx}(1) & r_{xx}(2) \\ r_{xx}(2) & r_{xx}(1) & 1 & r_{xx}(1) \\ r_{xx}(3) & r_{xx}(2) & r_{xx}(1) & 1 \end{bmatrix}    (5.17)

Because this matrix is both symmetric and a Toeplitz matrix (the terms along any diagonal are the same) we can use a recursive estimation technique known as the Levinson-Durbin algorithm.
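A sketch of the Yule-Walker estimate a = R^{-1} r (illustrative; a Levinson-Durbin implementation would solve the same Toeplitz system recursively rather than by a general matrix solve):

import numpy as np
from scipy.linalg import toeplitz

def yule_walker(x, p):
    x = x - x.mean()
    N = len(x)
    r = np.array([(x[:N - k] * x[k:]).sum() for k in range(p + 1)]) / (N - 1)
    r = r / r[0]                      # r_xx(0), ..., r_xx(p)
    R = toeplitz(r[:p])               # symmetric Toeplitz correlation matrix
    return np.linalg.solve(R, r[1:])  # a = R^-1 r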
5.3.3 Relation to partial autocorrelation

The partial correlation coefficients in an AR model are known as reflection coefficients. At lag m, the partial correlation between x_{t-m} and x_t is written as k_m, the mth reflection coefficient. It can be calculated as the relative reduction in prediction error:

    k_m = \frac{E_{m-1} - E_m}{E_{m-1}}    (5.18)

where E_m is the prediction error from an AR(m) model. The reflection coefficients are to the AR coefficients what the correlation is to the slope in a univariate AR model; if the mth reflection coefficient is significantly non-zero then so is the mth AR coefficient, and vice-versa. This enables an intelligent selection of the order of the model.

The Levinson-Durbin algorithm computes reflection coefficients as part of a recursive algorithm for computing the AR coefficients. It finds k_1 and from it calculates the AR coefficient for an AR(1) model, a_1. It then computes k_2 and from it calculates the AR coefficients for an AR(2) model (a_2 is computed afresh and a_1 is re-estimated from a_1 for the AR(1) model, as it will be different). The algorithm continues by calculating k_m and the coefficients for AR(m) from AR(m-1).
5.4 Moving Average Models

A Moving Average (MA) model of order q is defined as

    x_t = \sum_{i=0}^{q} b_i e_{t-i}    (5.19)

where e_t is Gaussian random noise with zero mean and variance \sigma_e^2. They are a type of FIR filter. These can be combined with AR models to get Autoregressive Moving Average (ARMA) models:

    x_t = \sum_{i=1}^{p} a_i x_{t-i} + \sum_{i=0}^{q} b_i e_{t-i}    (5.20)

which can be described as an ARMA(p,q) model. They are a type of IIR filter.

Usually, however, FIR and IIR filters have a set of fixed coefficients which have been chosen to give the filter particular frequency characteristics. In MA or ARMA modelling the coefficients are tuned to a particular time series so as to capture the spectral characteristics of the underlying process.
5.5 Spectral Estimation

Autoregressive models can also be used for spectral estimation. An AR(p) model predicts the next value in a time series as a linear combination of the p previous values:

    x_t = \sum_{k=1}^{p} a_k x_{t-k} + e_t    (5.21)

where a_k are the AR coefficients and e_t is IID Gaussian noise with zero mean and variance \sigma^2.

The above equation can be solved by using the z-transform. This allows the equation to be written as

    a_p z^{t-p} + a_{p-1} z^{t-(p-1)} + \ldots + z^t = e_t    (5.22)

It can then be rewritten in terms of the pulse transfer function

    H(z) = \frac{e_t}{1 + \sum_{k=1}^{p} a_k z^{-k}}    (5.23)

Given that any complex number can be written in exponential form

    z = \exp(i 2\pi u T_s)    (5.24)

where u is frequency and T_s is the sampling period, we can see that the frequency domain characteristics of an AR model are given by

    P(u) = \frac{\sigma_e^2 T_s}{\left| 1 + \sum_{k=1}^{p} a_k \exp(-i k 2\pi u T_s) \right|^2}    (5.25)

An AR(p) model can provide spectral estimates with p/2 peaks; therefore if you know how many peaks you're looking for in the spectrum you can define the AR model order. Alternatively, AR model order estimation methods should automatically provide the appropriate level of smoothing of the estimated spectrum.
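A sketch of the spectral formula (an illustration, not from the notes). Note the sign convention: the 1 + \sum a_k form in equation 5.25 corresponds to writing the model as x_t + \sum_k a_k x_{t-k} = e_t; for the model exactly as written in equation 5.21 the coefficients enter with a minus sign, which is the convention used below. The AR(2) coefficients are invented to place a spectral peak near u = 0.1.

import numpy as np

def ar_spectrum(a, noise_var, Ts, freqs):
    """P(u) for the AR model x_t = sum_k a_k x_{t-k} + e_t (note the minus sign)."""
    k = np.arange(1, len(a) + 1)
    denom = np.array([1.0 - np.sum(a * np.exp(-1j * 2 * np.pi * u * k * Ts))
                      for u in freqs])
    return noise_var * Ts / np.abs(denom) ** 2

# AR(2) resonator: poles at radius 0.95, angle 2*pi*0.1
a = np.array([2 * 0.95 * np.cos(2 * np.pi * 0.1), -0.95 ** 2])
freqs = np.linspace(0, 0.5, 512)          # up to Nyquist for Ts = 1
P = ar_spectrum(a, 1.0, 1.0, freqs)       # peaks near u = 0.1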

Figure 5.5: Power spectral estimates of two sine waves in additive noise using (a) the discrete Fourier transform method and (b) autoregressive spectral estimation.

AR spectral estimation has two distinct advantages over methods based on the Fourier transform: (i) power can be estimated over a continuous range of frequencies (not just at fixed intervals) and (ii) the power estimates have less variance.

Lecture 6 - Multi-variate systems


6.1 Introduction

We now consider the situation where we have a number of time series and wish to explore the relations between them. We first look at the relation between cross-correlation and multivariate autoregressive models, and then at the cross-spectral density and coherence.
6.2 Cross-correlation

Given two time series x_t and y_t we can delay x_t by T samples and then calculate the cross-covariance between the pair of signals. That is:

    \sigma_{xy}(T) = \frac{1}{N-1} \sum_{t=1}^{N} (x_{t-T} - \mu_x)(y_t - \mu_y)    (6.26)

where \mu_x and \mu_y are the means of each time series and there are N samples in each. The function \sigma_{xy}(T) is the cross-covariance function. The cross-correlation is a normalised version:

    r_{xy}(T) = \frac{\sigma_{xy}(T)}{\sqrt{\sigma_{xx}(0) \sigma_{yy}(0)}}    (6.27)

where we note that \sigma_{xx}(0) = \sigma_x^2 and \sigma_{yy}(0) = \sigma_y^2 are the variances of each signal. Note that

    r_{xy}(0) = \frac{\sigma_{xy}}{\sigma_x \sigma_y}    (6.28)

which is the correlation between the two variables. Therefore, unlike the autocorrelation, r_{xy} is not equal to 1 even at zero lag/lead. Figure 6.1 shows two time series and their cross-correlation.

Figure 6.1: Signals xt (top) and yt (bottom).

Figure 6.2: Cross-correlation function r_{xy}(T) for the data in Figure 6.1. A lag of T denotes the top series, x, lagging the bottom series, y. Notice the big positive correlation at a lag of 25. Can you see from Figure 6.1 why this should occur?

6.2.1 Cross-correlation is asymmetric

First, we recap as to why the auto-correlation is a symmetric function. The autocovariance, for a zero mean signal, is given by

    \sigma_{xx}(T) = \frac{1}{N-1} \sum_{t=1}^{N} x_{t-T} x_t    (6.29)

This can be written in the shorthand notation

    \sigma_{xx}(T) = \langle x_{t-T} x_t \rangle    (6.30)

where the angled brackets denote the average value or expectation. Now, for negative lags,

    \sigma_{xx}(-T) = \langle x_{t+T} x_t \rangle    (6.31)

Subtracting T from the time index (this will make no difference to the expectation) gives

    \sigma_{xx}(-T) = \langle x_t x_{t-T} \rangle    (6.32)

which is identical to \sigma_{xx}(T), as the ordering of variables makes no difference to the expected value. Hence, the autocorrelation is a symmetric function.

The cross-correlation is a normalised cross-covariance which, assuming zero mean signals, is given by

    \sigma_{xy}(T) = \langle x_{t-T} y_t \rangle    (6.33)

and for negative lags

    \sigma_{xy}(-T) = \langle x_{t+T} y_t \rangle    (6.34)

Subtracting T from the time index now gives

    \sigma_{xy}(-T) = \langle x_t y_{t-T} \rangle    (6.35)

which is different to \sigma_{xy}(T). To see this more clearly we can subtract T once more from the time index to give

    \sigma_{xy}(-T) = \langle x_{t-T} y_{t-2T} \rangle    (6.36)

Hence, the cross-covariance, and therefore the cross-correlation, is an asymmetric function.

To summarise: moving signal A right (forward in time) and multiplying with signal B is not the same as moving signal A left and multiplying with signal B, unless signal A equals signal B.
6.2.2 Windowing

When calculating cross-correlations there are fewer data points at larger lags than
at shorter lags. The resulting estimates are commensurately less accurate. To
take account of this the estimates at long lags can be smoothed using various
window operators.
6.2.3 Time-Delay Estimation

If we suspect that one signal is a, possibly noisy, time-delayed version of another


signal then the peak in the cross-correlation will identify the delay. For example,
figure 6.1 suggests that the top signal lags the bottom by a delay of 25 samples.
Given that the sample rate is 125Hz this corresponds to a delay of 0.2 seconds.
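A sketch of this idea (signals invented, with a delay of 25 samples planted; circular shifts are used for brevity):

import numpy as np

N, delay = 1000, 25
y = np.random.randn(N)
x = np.roll(y, delay) + 0.2 * np.random.randn(N)   # x lags y by 25 samples

lags = np.arange(-100, 101)
xc = [np.corrcoef(np.roll(x, -L), y)[0, 1] for L in lags]
print(lags[np.argmax(xc)])   # peak at ~ 25, identifying the delay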
6.3 Multivariate Autoregressive models

A multivariate autoregressive (MAR) model is a linear predictor used for modelling multiple time series. An MAR(p) model predicts the next vector value in a d-dimensional time series, x_t (a row vector), as a linear combination of the p previous vector values of the time series:

    x(t) = \sum_{k=1}^{p} x(t-k) a(k) + e_t    (6.37)

where each a(k) is a d by d matrix of AR coefficients and e_t is an IID Gaussian noise vector with zero mean and covariance C. There are a total of n_p = p \times d \times d AR coefficients and the noise covariance matrix has d \times d elements.

If we write the lagged vectors as a single augmented row vector

    \tilde{x}(t) = [x(t-1), x(t-2), \ldots, x(t-p)]    (6.38)

and the AR coefficients as a single augmented matrix

    A = [a(1), a(2), \ldots, a(p)]^T    (6.39)

then we can write the MAR model as

    x(t) = \tilde{x}(t) A + e(t)    (6.40)

The above equation shows the model at a single time point t.

The equation for the model over all time steps can be written in terms of the embedding matrix, \tilde{M}, whose tth row is \tilde{x}(t), the error matrix E having rows e(t + p + 1) and the target matrix X having rows x(t + p + 1). This gives

    X = \tilde{M} A + E    (6.41)

which is now in the standard form of a multivariate linear regression problem. The AR coefficients can therefore be calculated from

    \hat{A} = \left( \tilde{M}^T \tilde{M} \right)^{-1} \tilde{M}^T X    (6.42)

and the AR predictions are then given by

    \hat{x}(t) = \tilde{x}(t) \hat{A}    (6.43)

The prediction errors are

    e(t) = x(t) - \hat{x}(t)    (6.44)

and the noise covariance matrix is estimated as

    \hat{C} = \frac{1}{N - n_p} e^T(t) e(t)    (6.45)

The denominator N - n_p arises because n_p degrees of freedom have been used up in calculating the AR coefficients (and we want the estimates of covariance to be unbiased).
6.3.1 Example

Given two time series and a MAR(3) model, for example, the MAR predictions are

    \hat{x}(t) = \tilde{x}(t) \hat{A}

    [\hat{x}_1(t) \;\; \hat{x}_2(t)] = [x_1(t-1), x_2(t-1), x_1(t-2), x_2(t-2), x_1(t-3), x_2(t-3)] \begin{bmatrix} a_{11}(1) & a_{12}(1) \\ a_{21}(1) & a_{22}(1) \\ a_{11}(2) & a_{12}(2) \\ a_{21}(2) & a_{22}(2) \\ a_{11}(3) & a_{12}(3) \\ a_{21}(3) & a_{22}(3) \end{bmatrix}    (6.46)

Applying an MAR(3) model to our data set gave the following estimates for the AR coefficients, a(k), and noise covariance C, which were estimated from equations 6.42 and 6.45:

    a(1) = \begin{bmatrix} 1.2813 & 0.2394 \\ 0.0018 & 1.0816 \end{bmatrix}, \quad a(2) = \begin{bmatrix} 0.7453 & 0.2822 \\ 0.0974 & 0.6044 \end{bmatrix}, \quad a(3) = \begin{bmatrix} 0.3259 & 0.0576 \\ 0.0764 & 0.2699 \end{bmatrix}

    C = \begin{bmatrix} 0.0714 & 0.0054 \\ 0.0054 & 0.0798 \end{bmatrix}

Figure 6.3: Signals x_1(t) (top) and x_2(t) (bottom) and predictions from the MAR(3) model.

6.4 Cross Spectral Density

Just as the Power Spectral Density (PSD) is the Fourier transform of the autocovariance function we may define the Cross Spectral Density (CSD) as the
Fourier transform of the cross-covariance function

X
P12() =
x1x2 (n) exp(in)
(6.47)
n=

Note that if x1 = x2, the CSD reduces to the PSD. Now, the cross-covariance
of a signal is given by
x1x2 (n) =

x1(l)x2(l n)

(6.48)

l=

Substituting this into the earlier expression gives


P12() =

X
X

x1(l)x2(l n) exp(in)

(6.49)

n= l=

By noting that

\exp(-i\omega n) = \exp(-i\omega l)\,\exp(+i\omega k)    (6.50)

where k = l − n, we can see that the CSD splits into the product of two sums,

P_{12}(\omega) = X_1(\omega)\, X_2(-\omega)    (6.51)

where

X_1(\omega) = \sum_{l=-\infty}^{\infty} x_1(l) \exp(-i\omega l), \qquad X_2(-\omega) = \sum_{k=-\infty}^{\infty} x_2(k) \exp(+i\omega k)    (6.52)

For real signals X_2(-\omega) = X_2^*(\omega), where * denotes the complex conjugate. Hence, the cross spectral density is given by

P_{12}(\omega) = X_1(\omega)\, X_2^*(\omega)    (6.53)

This means that the CSD can be evaluated in one of two ways: (i) by first estimating the cross-covariance and then Fourier transforming, or (ii) by taking the Fourier transforms of each signal and multiplying (after taking the conjugate of one of them). A number of algorithms exist which enhance the spectral estimation ability of each method.
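A minimal Python/NumPy sketch of route (ii), with crude averaging over non-overlapping segments to tame the variance of the raw estimate (the segment length and all names are our own choices):

    import numpy as np

    def csd(x1, x2, T=128):
        """Estimate P12(w) by averaging X1(w) X2*(w) over segments."""
        nseg = len(x1) // T
        S1 = np.fft.fft(x1[:nseg * T].reshape(nseg, T), axis=1)
        S2 = np.fft.fft(x2[:nseg * T].reshape(nseg, T), axis=1)
        return (S1 * np.conj(S2)).mean(axis=0)

Setting x2 = x1 recovers a periodogram-style PSD estimate, consistent with the remark above that the CSD reduces to the PSD.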

The CSD is complex

The CSD is complex because the cross-covariance is asymmetric (the PSD is real because the auto-covariance is symmetric; in this special case the Fourier transform reduces to a cosine transform).
6.4.1 More than two time series

The frequency domain characteristics of a multivariate time series may be summarised by the power spectral density matrix. For d time series,

P(\omega) = \begin{bmatrix} P_{11}(\omega) & P_{12}(\omega) & \cdots & P_{1d}(\omega) \\ P_{21}(\omega) & P_{22}(\omega) & \cdots & P_{2d}(\omega) \\ \vdots & \vdots & \ddots & \vdots \\ P_{d1}(\omega) & P_{d2}(\omega) & \cdots & P_{dd}(\omega) \end{bmatrix}    (6.54)

where the diagonal elements contain the spectra of individual channels and the off-diagonal elements contain the cross-spectra. The matrix is Hermitian: the off-diagonal elements are complex and satisfy P_{ij}(\omega) = P_{ji}^*(\omega).

6.4.2 Coherence and Phase

The complex coherence function is given by

r_{ij}(\omega) = \frac{P_{ij}(\omega)}{\sqrt{P_{ii}(\omega)}\,\sqrt{P_{jj}(\omega)}}    (6.55)

The coherence, or magnitude squared coherence (MSC), between two channels is given by

MSC_{ij}(\omega) = |r_{ij}(\omega)|^2    (6.56)

The phase spectrum between two channels is given by

\phi_{ij}(\omega) = \tan^{-1}\!\left[\frac{\mathrm{Im}(r_{ij}(\omega))}{\mathrm{Re}(r_{ij}(\omega))}\right]    (6.57)

The MSC measures the linear correlation between two time series at each frequency and is directly analogous to the squared correlation coefficient in linear regression. As such, the MSC is intimately related to linear filtering, where one signal is viewed as a filtered version of the other. This can be interpreted as a linear regression at each frequency. The optimal regression coefficient, or linear filter, is given by

H(\omega) = \frac{P_{xy}(\omega)}{P_{xx}(\omega)}    (6.58)

This is analogous to the expression for the regression coefficient, a = \sigma_{xy}/\sigma_{xx}. The MSC is related to the optimal filter as follows:

r_{xy}^2(\omega) = |H(\omega)|^2 \, \frac{P_{xx}(\omega)}{P_{yy}(\omega)}    (6.59)

which is analogous to the equivalent expression in linear regression, r^2 = a^2 (\sigma_{xx}/\sigma_{yy}).


At a given frequency, if the phase of one signal is fixed relative to the other, then the signals can have a high coherence at that frequency. This holds even if one signal is entirely out of phase with the other (note that this is different from adding up signals which are out of phase, where the signals cancel out; here we are talking about the coherence between the signals).

At a given frequency, if the phase of one signal changes relative to the other, then the signals will not be coherent at that frequency. The time over which the phase relationship is constant is known as the coherence time.
6.4.3 Welch's method for estimating coherence

Algorithms based on Welch's method (such as the cohere function in the Matlab system identification toolbox) are widely used. The signal is split up into a number of segments, N, each of length T, and the segments may be overlapping. The

complex coherence estimate is then given as

\hat{r}_{ij}(\omega) = \frac{\sum_{n=1}^{N} X_i^n(\omega) \left( X_j^n(\omega) \right)^*}{\sqrt{\sum_{n=1}^{N} |X_i^n(\omega)|^2}\,\sqrt{\sum_{n=1}^{N} |X_j^n(\omega)|^2}}    (6.60)

where n sums over the data segments. This equation has exactly the same form as that for estimating correlation coefficients. Note that if we have only N = 1 data segment then the estimate of coherence will be 1 regardless of the true value (this would be like regression with a single data point). Therefore, we need a number of segments.

Note that this only applies to Welch-type algorithms which compute the CSD from a product of Fourier transforms. We can trade off good spectral resolution (requiring large T) against low-variance estimates of coherence (requiring large N and therefore small T). To an extent, by increasing the overlap between segments (and therefore the amount of computation, i.e. the number of FFTs computed), we can have the best of both worlds.
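Such an estimator is available off the shelf; for example, scipy.signal.coherence computes a Welch-style MSC. A minimal sketch, with the signal construction anticipating the example of section 6.5:

    import numpy as np
    from scipy.signal import coherence

    fs = 128
    t = np.arange(0, 5, 1 / fs)
    x = np.sin(2 * np.pi * 10 * t) + 0.3 * np.random.randn(t.size)
    y = x + 0.3 * np.random.randn(t.size)

    # segments of 128 samples, 50% overlap, Hanning window
    f, msc = coherence(x, y, fs=fs, window='hann', nperseg=128, noverlap=64)
    print(f[np.argmax(msc)])    # the peak should sit near 10 Hz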


6.4.4 Multivariate Autoregressive (MAR) models

Just as the PSD can be calculated from AR coefficients, so the PSDs and CSDs can be calculated from MAR coefficients. First we compute

A(\omega) = I + \sum_{k=1}^{p} a_k \exp(-ik\omega T)    (6.61)

where I is the identity matrix, \omega is the frequency of interest and T is the sampling period. A(\omega) will be complex. This is analogous to the denominator in the equivalent AR expression, 1 + \sum_{k=1}^{p} a_k \exp(-ik\omega T). Then we calculate the PSD matrix as follows

P_{MAR}(\omega) = T\, [A(\omega)]^{-1} C \left( [A(\omega)]^{-1} \right)^H    (6.62)

where C is the residual covariance matrix and ^H denotes the conjugate transpose. Once the PSD matrix has been calculated, we can calculate the coherences of interest using equation 6.56.
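A minimal Python/NumPy sketch of equations 6.61 and 6.62; here a is the list of fitted d × d coefficient matrices and C the residual covariance, and the sign and conjugate-transpose conventions follow the reconstruction above, so they should be checked against whichever fitting convention is in use:

    import numpy as np

    def mar_psd_matrix(a, C, T, omegas):
        """PSD/CSD matrix P(w) at each frequency in omegas (eqs 6.61-6.62)."""
        d = C.shape[0]
        P = []
        for w in omegas:
            A = np.eye(d, dtype=complex)
            for k, ak in enumerate(a, start=1):
                A += ak * np.exp(-1j * k * w * T)    # eq 6.61
            Ainv = np.linalg.inv(A)
            P.append(T * Ainv @ C @ Ainv.conj().T)   # eq 6.62
        return np.array(P)

The MSC between channels i and j then follows from equation 6.56 as abs(P[:, i, j])**2 / (P[:, i, i] * P[:, j, j]).real.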
6.5 Example

To illustrate the estimation of coherence we generated two signals: the first, x, a 10Hz sine wave with additive Gaussian noise of standard deviation 0.3, and the second, y, equal to the first but with further additive noise of the same standard deviation. Five seconds of data were generated at a sample rate of 128Hz. We then calculated the coherence using (a) Welch's modified periodogram method, with N = 128 samples per segment, a 50% overlap between segments and smoothing via a Hanning window, and (b) an MAR(8) model.

Figure 6.4: Coherence estimates from (a) the Fourier transform method and (b) the Multivariate Autoregressive model.

Ideally, we should see a coherence near to 1 at 10Hz and zero elsewhere. However, the coherence is highly non-zero at other frequencies. This is because, due to the noise component of the signal, there is power (and some cross-power) at all frequencies. As coherence is a ratio of cross-power to power, it will have a high variance unless the number of data samples is large. You should therefore be careful when interpreting coherence values.

Lecture 7 Non-linear filters

7.1 Order statistic filters: the median filter

Consider a set of p digital samples f[n−p+1] : f[n]. Order the set from smallest to largest, and let this ordered sample set be z[1] : z[p]. Order statistic filters then perform linear filtering on this ordered set, rather than on the original. What use might this be? Consider a signal corrupted by spiking noise, as in Fig 7.1.

Figure 7.1: (top) Spiky signal. (bottom) Median filtered output, p = 5

We see that the signal (a sine wave) is corrupted by impulsive noise. The effect of this noise is that the maximal and minimal values we observe in any small window on the data are likely to be caused by the noise process, not by the signal. Suppose we apply a filter mask to the ordered set z[k] such that the output of the filter is just the middle value in the set (assuming that p is odd). This value is the median of the values, and the filter is referred to as the median filter. How well it does at removing impulsive noise is shown in the lower plot of the figure.
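A minimal Python/NumPy sketch of the running median; end samples are handled by replication, which is one of several reasonable conventions:

    import numpy as np

    def median_filter(f, p=5):
        """Running median of odd length p over the signal f."""
        half = p // 2
        fpad = np.pad(f, half, mode='edge')   # replicate the end samples
        out = np.empty(len(f))
        for n in range(len(f)):
            z = np.sort(fpad[n:n + p])        # the ordered set z[1] : z[p]
            out[n] = z[half]                  # middle value, i.e. the median
        return out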
7.2 Mathematical morphology

In general, such order statistic filters can be generalised to a set of filter operations which respond to the shape of the data. Mathematical morphological filters allow both 1-d and 2-d filtering (grey-scale and binary images) using this idea. Morphology provides:

- an algebraic system of operators acting on complex shapes
- decomposition into meaningful parts
- separation from extraneous parts
- optimal reconstruction from distorted, noisy forms


An example of a non-morphological algebraic system is convolution and its frequency-domain representation: we may describe any function as a sum of harmonic components, which enables the undoing of noise corruption etc. by filtering, i.e. convolving with some kernel function. What the algebra of convolution does for linear systems, the algebra of mathematical morphology does for shape. Shape is a primary carrier of information in machine vision and in many signal processing applications. The topics that follow cover:

- identification & decomposition of objects
- binary morphology
- grey-scale morphology
- set theory, which applies to sets in any dimension (signals & binary vision d = 2, grey-scale d = 3)
7.3 Primary morphological operations

The primary operations are dilation and erosion.

7.3.1 Dilation

Dilation combines two sets by vector addition of their elements. If A, B are sets in N-space (E^N) with elements a = (a_1, ..., a_N) and b = (b_1, ..., b_N), then the dilation of A by B is defined as

A ⊕ B = { c ∈ E^N | c = a + b for some a ∈ A, b ∈ B }

This operation may be more simply regarded as a translation operation. If B_a represents a translation such that the origin of B now lies on point a of A, then

A ⊕ B = ⋃_{a ∈ A} B_a

The following figure illustrates this.

Figure 7.2: Binary dilation

- Dilation is commutative: A ⊕ B = B ⊕ A, and equally A ⊕ B = ⋃_{b ∈ B} A_b
- A represents the image, B the structuring element

- Is the origin O ∈ B or not? (see figure)
- Dilation is associative, (A ⊕ B) ⊕ C = A ⊕ (B ⊕ C), i.e. a chain rule

Figure 7.3: O ∈ B or not?

Figure 7.4: Dilation. Empty (blank) squares are now coloured dark, and binary 1 squares are white. (left)
Original data, (middle) Structure element with origin at middle left and (right) resultant dilation.

Figure 7.5: Dilation. (left) data, (middle) structure element with origin at centre and (right) dilated data.


What do different structure elements do?

- Dilation by a disk: isotropic expansion
- Dilation by a rod: row or column expansion

Example: noise removal by dilation. Consider

J = I ∩ (I ⊕ N_4)

If a pixel has any 4-neighbours then it will be in I ⊕ N_4; if not then it won't. So I ∩ (I ⊕ N_4) gives only those points which are 4-connected.

Some other points:

- If B_t is a translated structure element, then A ⊕ B_t = (A ⊕ B)_t
- (B ∪ C) ⊕ A = (B ⊕ A) ∪ (C ⊕ A)
- If J = A ⊕ B where the origin ∈ B, then A ⊆ J; ⊕ is then known as an extensive operator
- Dilation is increasing: if A ⊆ B then A ⊕ K ⊆ B ⊕ K

Figure 7.6: Noise removal by intersection with dilated image. (top left) data, (top right) structure element,
origin in centre, (bottom left) dilation and (bottom right) output of intersection.


7.4 Binary erosion

Erosion is the morphological dual to dilation:

A ⊖ B = { x ∈ E^N | x + b ∈ A for all b ∈ B }

i.e. this may be regarded as a search for B in A.

Figure 7.7: Binary erosion. (left) data, (middle) structure element, origin in centre, (right) eroded output.

Figure 7.8: Binary erosion. (left) data, (middle) structure element, origin in centre, (right) eroded output.


- Dilation may be regarded as a union of translates; erosion may be regarded as an intersection of negative translates:

A ⊖ B = ⋂_{b ∈ B} A_{−b}

- Like dilation, erosion is increasing, so if A ⊆ B then A ⊖ K ⊆ B ⊖ K
- Eroding with a larger element gives a smaller result

Dilation & erosion bear a foreground/background relationship. This is formally represented as a duality relationship. For example, de Morgan's laws for binary sets give dual relationships between the 'and' and 'or' functions:

(A ∪ B)^c = A^c ∩ B^c

For binary images, there are two complement operators:

- logical negation, A ↦ A^c
- geometric reflection about the origin, A ↦ A^R

The dilation-erosion dualities are then

(A ⊕ B)^c = A^c ⊖ B^R
(A ⊖ B)^c = A^c ⊕ B^R

Some caveats:

- Erosion is not associative, i.e. (A ⊖ B) ⊖ C ≠ A ⊖ (B ⊖ C)
- Any large erosion can be accomplished using a series of smaller ones
- Duality does not imply cancellation: if A = B ⊕ C then A ⊖ C = (B ⊕ C) ⊖ C ≠ B in general
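These set definitions translate directly into code. Below is a minimal Python/NumPy sketch for binary images, implementing dilation as a union of translates and erosion as an intersection of negative translates; A is a boolean array and B a list of (row, column) offsets of the structure element relative to its origin:

    import numpy as np

    def translate(A, dr, dc):
        """Shift boolean image A by (dr, dc), filling with background."""
        out = np.zeros_like(A)
        src = A[max(0, -dr):A.shape[0] - max(0, dr),
                max(0, -dc):A.shape[1] - max(0, dc)]
        out[max(0, dr):max(0, dr) + src.shape[0],
            max(0, dc):max(0, dc) + src.shape[1]] = src
        return out

    def dilate(A, B):        # union of translates
        return np.any([translate(A, dr, dc) for dr, dc in B], axis=0)

    def erode(A, B):         # intersection of negative translates
        return np.all([translate(A, -dr, -dc) for dr, dc in B], axis=0)

Library routines such as scipy.ndimage.binary_dilation do the same job far more efficiently; the version above simply mirrors the algebra.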


7.5 Uses

There are many uses of erosion/dilation, for example:

- Use of erosion to calculate the genus of a binary image (the number of connected components minus the number of holes)
- The hit-and-miss transform (HaM), which selects out pixels that have certain geometric properties (e.g. corner points)
- Thickening and thinning
- Template matching: let the template be the set T contained in a window W. Look at all x in I for T, and for W − T in I^c:

T_x ⊆ I and (W − T)_x ⊆ I^c

or, in terms of the HaM transform, I ⊛ (T, [W − T])


7.6 Opening and Closing

- These perform morphological (shape) filtering, analogous to bandpass/bandstop filters in the frequency domain
- They can select sub- & super-shapes of an image
- Opening/closing are like ideal BPFs: once performed, repeated application changes nothing (images are then said to be open or closed with respect to some structure element)


Opening

The opening of image A with structure element K is defined, in terms of erosion/dilation, as:

A ∘ K = (A ⊖ K) ⊕ K

If we let B = A ⊖ K then the above becomes

A ∘ K = B ⊕ K

Remembering that the set-element definition of dilation is the union of translates, i.e.

B ⊕ K = ⋃_{b ∈ B} K_b

hence

A ∘ K = ⋃_{y ∈ A ⊖ K} K_y

and, as the erosion satisfies A ⊖ K ⊆ A, so

A ∘ K = ⋃_{K_y ⊆ A} K_y

This means that we can picture opening as sweeping K over A from the inside, never allowing any part of K to go outside the boundary of A.
Figure 7.9: Opening: sweeping of the element inside the image

7.6.1 An example of shape extraction

Figure 7.10: Shape extraction using opening

In morphological shape extraction, the order is important:

- apply the larger primitive first
- eliminate the small structures first

7.7 Closing

Opening sweeps over the inside of a binary object. Consider now the case of complementing the image, A ↦ A^c:

- open this shape with element K^R (the reflection of K)
- this sweeps over the inside of A^c
- any small gaps between structures in A are small links in A^c and are removed
- complement the resultant image, giving (A^c ∘ K^R)^c

This is defined as the closing of A by K, written A • K. We may think of this as sweeping the outside of A using the structure element K^R. A duality exists (this can be proved from the dilation-erosion dualities also) between opening and closing:

(A ∘ K)^c = A^c • K^R
(A • K)^c = A^c ∘ K^R


Figure 7.11: Closing: sweeping of the element outside the image

Some more identities:

A ∘ K = (A ⊖ K) ⊕ K
A • K = (A ⊕ K) ⊖ K
(A ∘ K) ∘ K = A ∘ K
(A • K) • K = A • K

There are a variety of uses that morphological processing is put to, for example to enhance fine details in an image or to remove noise.

7.7.1 The top hat transform

The THT is defined as

T = A − (A ∘ K)

The resultant image, T, hence consists of those pixels in A whose local distribution is smaller than that of the structure element K. We may thus think of the THT as a kind of morphological high-pass filter.

Figure 7.12: Original binary image, opened image, THT image. The structure element is a 5 pixel square.

7.7.2 Removing noise

There are many types of noise in grey-scale images; if we concentrate on binary images then the noise of interest is salt-and-pepper (a.k.a. drop-out) noise, where pixels are randomly complemented. One strategy for removing such noise is to open then close with a disk element, D, where the radius of D is small compared to the local boundary curvature of components in the image, but is larger than the local noise (which is pixel-by-pixel). A good choice may be r = 3 pixels, say.

- Perform (I ∘ D) • D (a sketch follows this list)
- Opening removes impulsive noise in the background
- Closing removes drop-out noise in objects
- There is a trade-off between noise removal and boundary changing
- At high noise levels, noise may by chance clump together into groups which are not removed by a small disk; but a larger disk smooths the object boundaries...
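A minimal sketch of this open-then-close cleaner, reusing the dilate and erode functions sketched earlier, with the suggested disk radius r = 3:

    def disk(r):
        """Offsets of a digital disk of radius r centred on the origin."""
        return [(dr, dc) for dr in range(-r, r + 1)
                         for dc in range(-r, r + 1)
                         if dr * dr + dc * dc <= r * r]

    def open_close(I, r=3):
        D = disk(r)
        opened = dilate(erode(I, D), D)      # opening: background specks go
        return erode(dilate(opened, D), D)   # closing: drop-outs in objects go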


7.8 Grey-scale Morphology

We have concentrated so far on binary images; often we want to directly process the grey-scale original. Grey-scale (G-S) morphology lets us operate on shapes defined by their grey-scale levels and spatial attributes (i.e. the image space is now 3-d, not 2-d). We can then consider a grey-scale image with only one row, which is just a signal, and apply the same theory to obtain operators for signal processing.

7.8.1 The top surface & the umbra

Consider a grey-scale scan line (a row of an image, say) as shown in the figure. The umbra (shadow) is defined by the G-S space lying below the scan line, f_c, and is denoted U[f_c]; each point on the G-S line f_c generates a set of umbra points, {u_{i,c}}. The top of an umbra set is the set of maximum values of {u_{i,c}} for each column co-ordinate, i.e.

T_c = max {u_{i,c}}

so top and umbra are inverse operators:

T{U[f]} = f

Figure 7.13: The top and umbra operations.

7.9 Grey-scale dilation & erosion

G-S dilation is defined as

f ⊕ k = T{ U[f] ⊕ U[k] }

i.e. via the binary dilation of the two umbrae. In the same way we may define the G-S erosion as

f ⊖ k = T{ U[f] ⊖ U[k] }

We note that, like the binary version, G-S dilation is order independent,

f ⊕ k = k ⊕ f

while erosion, as in the binary case, is not commutative.

7.10 Grey-scale opening and closing

By the definitions of opening and closing in terms of erosions and dilations:

f ∘ k = (f ⊖ k) ⊕ k
f • k = (f ⊕ k) ⊖ k

And a duality relationship holds; remember that for the binary operators

(A ∘ K)^c = A^c • K^R

For G-S,

−(f • k) = (−f) ∘ k^R

Note that the G-S equivalent of complementing is inversion. But how do we interpret the G-S opening and closing more easily? Using the definitions for G-S erosion & dilation:

f ∘ k = (f ⊖ k) ⊕ k
      = (T{ U[f] ⊖ U[k] }) ⊕ k
      = T{ U[ T{ U[f] ⊖ U[k] } ] ⊕ U[k] }

and, as T{U[f]} = f, so

f ∘ k = T{ (U[f] ⊖ U[k]) ⊕ U[k] }

The operation on the umbrae in the above is equivalent to a binary opening between U[f] and U[k], hence

f ∘ k = T{ U[f] ∘ U[k] }

Remember that binary opening was thought of as the sweeping of an element round the inside boundary of an image structure. We may thus interpret the G-S opening as the top surface of what remains of U[f] after sweeping U[k] along the underside of f. By application of the duality between opening and closing, we may interpret G-S closing as sweeping the reflection of U[k], U[k]^R, over the topside of f.

- Opening smooths from the underside
- Closing smooths from the topside
- The concepts of being open or closed with respect to an element still hold:

(f ∘ k) ∘ k = f ∘ k
(f • k) • k = f • k

- We can perform, for example, a top-hat transform or noise cleaning

Going from a set of scan lines to a 3-d (G-S) image is easy:

- Opening sweeps the umbra of the structure element under the G-S surface, smoothing from below
- Closing sweeps the inverted umbra of the element over the G-S surface, smoothing from above
- As for the choice of structure element: the ball and the paraboloid are commonly used

Figure 7.14: Grey-scale opening

Figure 7.15: Grey-scale closing

Figure 7.16: Grey-scale morphological operations. Original, dilation and erosion

Figure 7.17: Grey-scale morphological operations. Opening and closing.


7.10.1 Signals

Applying the previous theory to signals is straightforward, given that we can consider an N point signal as just a 1 × N grey-scale image. We may apply all the same operations; the examples below show how the shape of the signal is changed and how easily noise components may be removed.
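For the common special case of a flat (constant-height) structure element, the umbra machinery collapses to running minimum and maximum filters, which gives a particularly simple sketch (the element length m and the edge handling are our own choices):

    import numpy as np

    def gs_erode(f, m=9):      # running minimum: smooths from the underside
        h = m // 2
        fp = np.pad(f, h, mode='edge')
        return np.array([fp[n:n + m].min() for n in range(len(f))])

    def gs_dilate(f, m=9):     # running maximum
        h = m // 2
        fp = np.pad(f, h, mode='edge')
        return np.array([fp[n:n + m].max() for n in range(len(f))])

    def gs_open(f, m=9):       # suppresses narrow positive spikes
        return gs_dilate(gs_erode(f, m), m)

    def gs_close(f, m=9):      # suppresses narrow negative spikes
        return gs_erode(gs_dilate(f, m), m)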
Figure 7.18: Simple signal and structure element

Figure 7.19: (a) Dilation, (b) Erosion, (c) Closing and (d) Opening of the signal.

Figure 7.20: (a) Closing and (b) Opening of the noisy sine signal.

Lecture 8 Compression & coding

Why do we want to compress signals and images?

- Storage
- Bandwidth of transmission

When we consider compression, we rely on the fact that there is redundancy in the data.

Example: PAL colour video is 768 × 576 pixels at 25 frames per second, i.e. 33 Mbytes per second. Hence a 2-hour video is 238 Gbytes (cf. a CD holds 0.7 Gbytes, a DVD holds 7 Gbytes). So we need compression!

Lossless or lossy? Lossless compression has no errors in reconstruction. Examples include gzip, zip and winzip. Typical compression is of order 2:1 to 3:1.

Lossy compression can achieve much better compression ratios, at the loss of some detail. Achieved rates can be 10:1 to 50:1 and still show no perceivable difference! For example...

(Figure: original image and JPEG compression at 16:1.)

8.1 Lossless compression

Let's look at an image of a really simple object: a blob in an image... There is a lot of redundancy here. We can use a coding which tells us which particular pixels, as we scan from top left to bottom right, change their value.

(Figure: a binary blob image and the corresponding scan-line signal.)

Hence

[0], [218, +1], [223, −1], [0]

tells us all we need to know about this image.
Consider the signal below.

(Figure: an example signal and its difference representation.)

It is clear that if we only describe changes to the data then we need less storage, and so achieve compression!
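A minimal Python/NumPy sketch of the change-only idea: store the first value plus the differences. For slowly varying or piecewise-constant data the differences are mostly zeros, which a generic lossless coder then packs very compactly:

    import numpy as np

    def delta_encode(x):
        return np.concatenate(([x[0]], np.diff(x)))

    def delta_decode(d):
        return np.cumsum(d)

    x = np.repeat([4, 0, 4, 6], 250)     # a piecewise-constant signal
    assert np.array_equal(delta_decode(delta_encode(x)), x)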


Although very clever lossless compression schemes exist, much better compression ratios can be achieved if we are prepared to lose some information.

8.2 Lossy compression

Let's look at the intensity histogram.

(Figure: image intensity histogram; frequently used intensities get short codes, rarely used ones long codes.)

We can save by using short codes for the intensity values that are very frequently used, such as around 200, and long codes for the ones that are hardly used, such as around 150 or close to 0. Optimally it turns out that the code length satisfies

codelength \propto -\log p(\text{intensity})

(Figure: intensity PDF and the corresponding optimal code length in bits.)
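A minimal Python/NumPy sketch of these optimal (Shannon) code lengths computed from an image histogram; the average length is the entropy, the best average rate any code of this kind can achieve:

    import numpy as np

    def code_lengths(img):
        counts = np.bincount(img.ravel(), minlength=256)
        p = counts / counts.sum()
        bits = -np.log2(np.where(p > 0, p, 1.0))   # -log2 p(intensity)
        return bits, (p * bits).sum()              # per-level lengths, entropy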

If we took this approach we would end up with lossless compression, but we could go even further and quantise little-used values more coarsely. Doing this kind of compression on the original signal or image is not the easiest way, though; it is better first to represent the data in terms of a basis.
8.2.1 Representations, DFT, DCT

f(x) = \sum_i w_i \phi_i(x)

is a general linear basis expansion. We can see that the Discrete Fourier Transform is of this form:

F(u, v) = \frac{1}{M}\frac{1}{N} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \exp[-i 2\pi (ux/M + vy/N)]

It turns out that it is simpler to use a cosine expansion, as it saves on coding. This is the Discrete Cosine Transform (DCT), defined as

F(u, v) = \frac{4 c(u) c(v)}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \cos\left[\frac{\pi u}{M}\left(x + \tfrac{1}{2}\right)\right] \cos\left[\frac{\pi v}{N}\left(y + \tfrac{1}{2}\right)\right]

where

c(a) = \frac{1}{\sqrt{2}} \ \text{for } a = 0, \qquad c(a) = 1 \ \text{for } a = 1, 2, \ldots

The inverse DCT is defined as

f(x, y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} c(u) c(v) F(u, v) \cos\left[\frac{\pi u}{M}\left(x + \tfrac{1}{2}\right)\right] \cos\left[\frac{\pi v}{N}\left(y + \tfrac{1}{2}\right)\right]
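A minimal sketch of transform-domain compression with the DCT: keep only the largest-magnitude coefficients and invert. This mirrors the truncation experiments shown below; note that scipy's dctn with norm='ortho' differs from the normalisation above only in constant factors:

    import numpy as np
    from scipy.fft import dctn, idctn

    def dct_compress(img, keep=0.05):
        F = dctn(img, norm='ortho')
        thresh = np.quantile(np.abs(F), 1 - keep)   # keep the top 5%
        F[np.abs(F) < thresh] = 0.0                 # truncate the rest
        return idctn(F, norm='ortho')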

(Figure: DCT and DFT bases.)

(Figure: DFT → truncate → IDFT, giving 14:1 compression.)

(Figure: DCT → truncate → IDCT, giving 20:1 compression.)

8.2.2 JPEG

The Joint Photographic Experts Group (JPEG) compression scheme uses lossy and lossless compression. The JPEG method:

- is the world standard for compression of general images
- allows progressive display of images (i.e. from low quality to high)

Rather than apply a transform to the entire image in one go, it uses the fact that information is local, and so divides the image into 8 × 8 blocks on which the DCT is performed. The encoder divides the image into 8 × 8 blocks, takes the DCT of each, quantises the DCT coefficients (the lossy step) and losslessly codes the coefficients; the decoder decodes the coefficients, reconstructs each block from its DCT coefficients and recomposes the image.

(Figure: log(abs(J) + 1) of the block DCT coefficients; truncated at J = 100, giving 60:1 compression.)

The quantisation consists of two parts:

Weighting matrix. The human eye does not have the same sensitivity to all frequencies: a coarse quantisation of the DCT coefficients corresponding to high frequencies is less annoying to a human observer than the same quantisation applied to low frequencies. Hence, to obtain minimal perceptual distortion, each coefficient should be individually weighted. This is achieved by the use of a weighting matrix, which is sent as side information:

out(u, v) = round[F(u, v)/Z(u, v)]

Z(u, v) is chosen to weight coefficients which are visually salient, and we can scale Z(u, v) to alter compression quality.

Uniform quantisation factor. In addition to this weighting matrix, a uniform quantisation factor can be applied to the entire image. This will quantise into a set of M levels. The larger M, the greater the number of coefficients that are not set to zero and so the lower the compression. The resultant quantised coefficients are then de-normalised by multiplication by Z(u, v).

Differential coding of DC coefficients. The DC DCT coefficient of each block can become quite large, therefore only the difference between the DC coefficient of the current and the previous block is coded and transmitted.
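A minimal sketch of the quantisation round trip on a single 8 × 8 block; Z below is an illustrative weighting matrix of our own, not the standard JPEG table:

    import numpy as np
    from scipy.fft import dctn, idctn

    u, v = np.meshgrid(np.arange(8), np.arange(8), indexing='ij')
    Z = 1.0 + 4.0 * (u + v)                  # coarser steps at high frequencies

    def jpeg_block(block, Z):
        F = dctn(block, norm='ortho')
        q = np.round(F / Z)                  # out(u,v) = round[F(u,v)/Z(u,v)]
        return idctn(q * Z, norm='ortho')    # de-normalise and reconstruct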

Figure 8.1: JPEG performance.


8.3 Wavelets

The WT is an operation that forms a signal representation that, unlike the Fourier transform for example, is local in both time and frequency domains. The WT relies upon smoothing the time-domain signal at different scales; thus if \psi_s(x) represents a wavelet at scale s, then the WT of a function f(x) \in L^2(\mathbb{R}) is defined as a convolution given by:

W f(s, x) = f * \psi_s(x)    (8.63)

Moreover, the signal f(x) may be reconstructed to a good approximation from a knowledge of the modulus maxima of the WT alone, and hence these maxima offer a compact representation ideal for noise removal. The scaled wavelets are constructed from a mother wavelet, \psi(x), such that:

\psi_s(x) = (1/s)\, \psi(x/s)    (8.64)

with the function \psi(x) chosen to be a derivative of a smoothing function. Just as a simple example, we choose this smoothing function to be a Gaussian, G(x), and define the mother wavelet as:

\psi(x) = \frac{dG(x)}{dx}    (8.65)

(Figure: the derivative-of-Gaussian mother wavelet.)

We can allow the scale to vary; it turns out that we can simply retain decimation by a constant factor, normally 2: s_n = 2 s_{n−1}, hence s_n = 2^n.

We note that, as these wavelets are to be represented as a digital sequence, the smallest wavelet requires at least 4 sample points for its definition. This implies that the wavelet decomposition has a high-frequency cut-off at f_{samp}/4. Wavelets can thus be thought of (explicitly) as a filter bank whose frequencies lie in a scale-space.
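A minimal Python/NumPy sketch of this dyadic derivative-of-Gaussian wavelet transform, built directly from equations 8.63-8.65; the truncation width of the sampled wavelet is our own choice:

    import numpy as np

    def dog_wavelet(s, width=8):
        """psi_s(x) = (1/s) psi(x/s) with psi = d/dx of a unit Gaussian."""
        x = np.arange(-width * s, width * s + 1)
        return -(x / s**2) * np.exp(-x**2 / (2.0 * s**2))

    def wavelet_transform(f, nscales=5):
        """W f(s, x) = (f * psi_s)(x) at scales s = 2, 4, ..., 2**nscales."""
        return [np.convolve(f, dog_wavelet(2.0 ** n), mode='same')
                for n in range(1, nscales + 1)]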

8.3.1 JPEG-2000

This new version of JPEG uses a similar approach to the original JPEG but, instead of DCT coding, uses wavelets. This gives a big improvement in image quality at the same compression levels.

(Figure: JPEG-2000 example at 158:1 compression.)
