Digital Signal & Image Processing
B Option - 8 lectures
Stephen Roberts
[email protected]
Lecture 1 Foundations
1.1 Recommended books
1.2 Introduction
For the sake of brevity in the nomenclature, definitions and theorems will often be introduced in their 1-d form (i.e. as signals) with the index variable t or x. Please note that such an indexed set of samples is just a 1-d case of a generic ordered set which could be 2-d or more, i.e. dependent on a set of indices or variables, i, j or x, y for example. I try to keep to the convention that f(x) is a 1-d system with 1-d frequency support indexed by u, and f(x, y) is a 2-d system with 2-d frequency support indexed by u, v.
1.3.1 Linear Systems
A linear system may be defined as one which obeys the Principle of Superposition. If f1(x) and f2(x) are inputs to a linear system which gives rise to outputs
r1(x) and r2(x) respectively, then the combined input af1(x) + bf2(x) will give
rise to an output ar1(x) + br2(x), where a and b are arbitrary constants.
Notes
1.3.2 Time-Invariant Systems
A time-invariant system is one whose properties do not vary with time (i.e. the input signals are treated the same way regardless of their time of arrival); for example, with discrete systems, if an input sequence f(x) produces an output sequence r(x), then the input sequence f(x - x_o) will produce the output sequence r(x - x_o) for all x_o. In the case of 2-d images, the system response does not depend on the position within the image, i.e. not on x_o, y_o.
1.4 Linear Processes
Some of the common signal processing functions are amplification (or attenuation), mixing (the addition of two or more signal waveforms) or un-mixing, and filtering. Each of these can be represented by a linear time-invariant block with an input-output characteristic which can be defined by:
- The impulse response g(x) in the time domain.
- The transfer function G(u) in a frequency domain. We will see that the choice of frequency basis may be subtly different from case to case.
As we will see, there is (for many of the systems we examine in this course) an invertible mapping between the time (image index) and (spatial) frequency domain representations.
1.5 Convolution
Convolution allows the evaluation of the output signal from an LTI system, given its impulse response and input signal.
The input signal can be considered as being composed of a succession of impulse functions, each of which generates a weighted version of the impulse response at the output, as shown in Figure 1.1.
[Figure 1.1: an input signal decomposed into weighted impulse components, and the total output formed from the sum of the corresponding impulse responses.]
The output at time x, r(x), is obtained simply by adding the effect of each separate impulse function; this gives rise to the convolution integral:
$$ r(x) = \int_0^{\infty} f(x-\tau)\,g(\tau)\,d\tau $$
τ is a dummy variable which represents time measured back into the past from the instant x at which the output r(x) is to be calculated.
1.5.1 Notes
Discrete convolution is commutative:
$$ \sum_{k=0}^{n} f[k]\,g[n-k] = \sum_{k=0}^{n} f[n-k]\,g[k] $$
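As an aside, this commutativity is easy to verify numerically. The following is a minimal sketch (the sequences are illustrative, not from the notes):

```python
import numpy as np

# Two illustrative finite sequences.
f = np.array([1.0, 2.0, 3.0, 0.0])
g = np.array([0.5, 0.25, 0.25])

# np.convolve computes sum_k f[k] g[n-k]; swapping the arguments
# computes sum_k f[n-k] g[k], and the two agree.
r1 = np.convolve(f, g)
r2 = np.convolve(g, f)
assert np.allclose(r1, r2)
```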
1.6 Frequency-Domain Analysis
Linear (time-invariant) systems, by definition, may be represented in the continuous case by linear differential equations and in the discrete case by linear difference equations. Consider the application of the linear differential operator, D = d/dx, to the function f(x) = e^{sx}:
$$ D f(x) = s f(x) $$
An equation of this form means that f(x) is an eigenfunction of D. Just like the eigen-analysis you know from matrix theory, this means that f(x) and any linear operation on f(x) may be represented using a set of functions of exponential form, and that these functions may be chosen to be orthogonal. This naturally gives rise to the use of the Laplace and Fourier representations.
The Laplace transform:
$$ F(s) \longrightarrow \left[\text{Transfer function } G(s)\right] \longrightarrow R(s) $$
where
$$ F(s) = \int_0^{\infty} f(x)\,e^{-sx}\,dx $$
and G(s) is a rational function of s,
$$ G(s) = \frac{A(s-z_1)\cdots(s-z_m)}{(s-p_1)(s-p_2)\cdots(s-p_n)} $$
and
$$ R(iu) = G(iu)\,F(iu) $$
The output time function can be obtained by taking the inverse Fourier transform:
$$ f(x) = \int_{-\infty}^{+\infty} F(u)\,e^{i2\pi ux}\,du $$
Theorem
Convolution in the time domain is equivalent to multiplication in the frequency domain, i.e.
$$ r(x) = g(x) * f(x) = \mathcal{F}^{-1}\{R(u)\} \quad \text{where} \quad R(u) = G(u)F(u) $$
and
$$ r(x) = g(x) * f(x) = \mathcal{L}^{-1}\{R(s)\} \quad \text{where} \quad R(s) = G(s)F(s) $$
Proof
Consider the general integral (Laplace) transform of a shifted function:
$$ \mathcal{L}\{f(x-\tau)\} = \int f(x-\tau)\,e^{-sx}\,dx = e^{-s\tau}\,\mathcal{L}\{f(x)\} $$
Applying this to the convolution integral $r(x) = \int g(\tau)\,f(x-\tau)\,d\tau$ gives
$$ \mathcal{L}\{r(x)\} = \int g(\tau)\,e^{-s\tau}\,\mathcal{L}\{f(x)\}\,d\tau = \mathcal{L}\{g(x)\}\,\mathcal{L}\{f(x)\} $$
By allowing s → iu we prove the result for the Fourier transform as well.
1.7 Transfer functions
It is very useful in image and signal processing to view transformations as input-output relationships, specified by a transfer function.
[Figure 1.2: Transfer function model: input f(x) passes through a system with impulse response h(x) to give output r(x).]
The transfer function is defined by the impulse response of the system. This specifies the response of the system to an impulse input (a Dirac delta). The convolution theorem states that the response, r(x), to such an impulse input, i(x), may be given as
$$ r(x) = i(x) * h(x) $$
where h(x) is the impulse response function. And for an arbitrary input, f(x),
$$ r(x) = f(x) * h(x) $$
or, by taking the FT,
$$ R(u) = F(u)H(u) $$
where H(u) is the transfer function in the Fourier domain. As F (u) represents
the spectrum of f (x), so multiplying by some function H(u) may be regarded as
modifying or filtering F (u) and hence filtering the data sequence f (x).
Key point : Filtering may be regarded as a multiplication in the spectral domain, or as a convolution in the image/signal domain.
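As a sketch of this key point, the following (with an arbitrary random signal and a hypothetical 5-point averaging impulse response) checks that multiplying DFTs is the same as circular convolution in the signal domain:

```python
import numpy as np

M = 64
f = np.random.randn(M)               # data sequence f(x)
h = np.zeros(M)
h[:5] = 1.0 / 5.0                    # impulse response h(x): 5-point average

# Filtering in the spectral domain: R(u) = F(u) H(u).
r_freq = np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)).real

# Equivalent circular convolution in the signal domain.
r_time = np.zeros(M)
for x in range(M):
    for m in range(M):
        r_time[x] += f[m] * h[(x - m) % M]

assert np.allclose(r_freq, r_time)
```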
where $D(u,v) = \sqrt{u^2 + v^2}$ and $D_0$ and n are filter settings. The IHPF is the inverse transfer function of the ILPF, and the corresponding Butterworth high-pass filter, the BHPF, is the inverse of the BLPF.
[Figure: (a) Butterworth low-pass filter magnitude responses |G(u)| in 1-d for orders n = 1, 2, 6, with cutoff u_c; (b) log|G| surfaces for the 1-d and 2-d filters.]
$$ F(u) = \int_{-\infty}^{+\infty} f(x)\,e^{-i2\pi ux}\,dx, \qquad f(x) = \int_{-\infty}^{+\infty} F(u)\,e^{i2\pi ux}\,du $$
which form a Fourier transform pair. We note that the FT is, in general, a complex function of the form F(u) = R(u) + iI(u). We call |F(u)| the Fourier spectrum of f(x) and φ(u) = tan⁻¹[I(u)/R(u)] the phase spectrum.
2.1.1 The Fourier transform in 2-d
There is no inherent change in theory for the 2-dimensional case, where f(x, y) exists, so the FT, F(u, v), is given as
$$ F(u,v) = \int\!\!\int_{-\infty}^{+\infty} f(x,y)\,\exp[-i2\pi(ux+vy)]\,dx\,dy $$
and
$$ f(x,y) = \int\!\!\int_{-\infty}^{+\infty} F(u,v)\,\exp[+i2\pi(ux+vy)]\,du\,dv $$
Here are some basic (and useful) theorems related to the FT. They are shown
for a 1-D system, for ease of reading and notation, and directly translate into
higher dimensions as above.
Similarity theorem: if f(x) ↔ F(u) then
$$ f(ax) \leftrightarrow \frac{1}{|a|}\,F(u/a) $$
Convolution theorem: if
$$ f(x) * g(x) = \int_{-\infty}^{+\infty} f(\tau)\,g(x-\tau)\,d\tau $$
then FT[f(x) ∗ g(x)] = F(u)G(u). Note that this is of great use in filtering.
Power theorem:
$$ \int |f(x)|^2\,dx = \int |F(u)|^2\,du $$
We sample the continuous (start with 1-D) function, f(x), at M points spaced Δx apart. We now describe the function as
$$ f(x) = f(x_o + x\,\Delta x) $$
where x now describes an index. With this transformation, u, the Fourier variable paired to x, is discretised into M points. We thus obtain:
$$ F(u) = \frac{1}{M}\sum_{x=0}^{M-1} f(x)\,\exp[-i2\pi ux/M] $$
and
$$ f(x) = \sum_{u=0}^{M-1} F(u)\,\exp[+i2\pi ux/M] $$
and in 2-d
$$ f(x,y) = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F(u,v)\,\exp[+i2\pi(ux/M + vy/N)] $$
2.1.4 Sampling considerations
- The total width of the samples in the x, y directions determines the lowest spatial frequency we can resolve, u_min = 1/(MΔx).
- The sample interval, Δx, Δy, dictates the highest spatial frequency we can resolve, u_max = 1/(2Δx).
- The number of samples, M, N, dictates the number of spatial frequency bins that can be resolved.
Addition & linearity: as with continuous functions.
Shift theorem: if f(x) ↔ F(u) then f(x - a) ↔ exp[-i2πua/M] F(u), hence
$$ f(x,y)\,\exp[+i2\pi(u_o x + v_o y)/M] \leftrightarrow F(u-u_o,\, v-v_o) $$
and
$$ f(x-x_o,\, y-y_o) \leftrightarrow F(u,v)\,\exp[-i2\pi(ux_o + vy_o)/M] $$
If we let u_o = v_o = M/2 then we can shift the frequency space to the centre of the frequency square:
$$ f(x,y)\,(-1)^{x+y} \leftrightarrow F(u-M/2,\, v-M/2) $$
Discrete convolution:
$$ f(x) * g(x) = \sum_{m=0}^{M-1} f(m)\,g(x-m) $$
and
$$ DFT[f(x) * g(x)] = M\,F(u)G(u) $$
Power theorem:
$$ \sum_{x=0}^{M-1} |f(x)|^2 = M \sum_{u=0}^{M-1} |F(u)|^2 $$
Periodicity :
F (u, v ) = F (u + M, v ) = F (u, v + M) = F (u + M, v + M)
Note that this leads us to deduce the aliasing theorem
Rotation:
$$ f(r, \theta + \theta_o) \leftrightarrow F(\rho, \phi + \theta_o) $$
i.e. rotating the image rotates its spectrum by the same angle.
Average value: the image mean is given by the DC term,
$$ \bar{f}(x,y) = F(0,0) $$
Trick: Plots of |F(u,v)| often decay very rapidly from a central peak, so it is good to display them on a log scale. Often the transform F′(u,v) = log[1 + |F(u,v)|] is used.
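A minimal sketch of this display trick (the image is a stand-in array):

```python
import numpy as np

img = np.random.rand(128, 128)           # stand-in for an image f(x, y)
F = np.fft.fftshift(np.fft.fft2(img))    # shift DC to the centre of the square
display = np.log(1.0 + np.abs(F))        # F'(u,v) = log[1 + |F(u,v)|]
```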
Figure 2.1: Image (a), Fourier spectrum (b) and shifted Fourier spectrum (c).
2.2 Other transforms
2.3 Digital filtering
2.3.1 The sampling process
This is performed by an analogue to digital converter (ADC) in which the continuous function f(x) is replaced by a discrete function f[k], which is defined only at x = kT, with k = 0, 1, 2, .... We then need only consider the digitised sample set f[k] and the sample interval T. A simple generalisation allows for a sampled set over the 2-D plane, with an M × N grid of samples, so that u, v index the image pixels.
Aliasing
Consider $f(t) = \cos\left(\frac{2\pi}{4}\,\frac{t}{T}\right)$ (one cycle every 4 samples) and also $f(t) = \cos\left(\frac{3 \cdot 2\pi}{4}\,\frac{t}{T}\right)$ (3 cycles every 4 samples), as shown in the Figure. Note that the resultant samples are the same. This result is referred to as aliasing.
[Figure: two sinusoids, at 1 and 3 cycles per 4 samples, passing through identical sample points.]
2.4
We can see that numerical processing is at the heart of the digital filtering process. How can the arithmetic manipulation of a set of numbers produce a filtered version of that set? Consider the noisy signal of Figure 2.5, together with its sampled version:
One way to reduce the noise, for example, might be to try and smooth the data. We could try a polynomial fit using a least-squares criterion. If we choose, say, to fit a parabola to every group of 5 points in the sequence, then, for every point, we will make a parabolic approximation to that point using the value of the sample at that point together with the values of the 4 nearest samples (this forms a parabolic filter), as in Fig. 2.6.
[Figure 2.6: a parabolic fit to 5 samples, k = -2, ..., 2, about the centre point k = 0.]
Minimising the squared error of the parabola $s_0 + s_1 k + s_2 k^2$ over the 5 points,
$$ E = \sum_{k=-2}^{2} \left( s_0 + s_1 k + s_2 k^2 - x[k] \right)^2, $$
by setting
$$ \frac{\partial E}{\partial s_0} = 0, \qquad \frac{\partial E}{\partial s_1} = 0, \qquad \frac{\partial E}{\partial s_2} = 0 $$
thus gives the normal equations
$$ 5 s_0 + 10 s_2 = \sum_{k=-2}^{2} x[k] $$
$$ 10 s_1 = \sum_{k=-2}^{2} k\,x[k] $$
$$ 10 s_0 + 34 s_2 = \sum_{k=-2}^{2} k^2\,x[k] $$
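Solving the first and third normal equations for the centre value s0 gives s0 = (34 Σx[k] - 10 Σk²x[k])/70, i.e. fixed weights [-3, 12, 17, 12, -3]/35 on the five samples. A minimal sketch of the resulting filter (the test signal is illustrative):

```python
import numpy as np

# Weights for the centre value s0 of the least-squares parabola.
w = np.array([-3.0, 12.0, 17.0, 12.0, -3.0]) / 35.0

def parabolic_filter(x):
    # Replace each sample by the centre value of the local parabola.
    return np.convolve(x, w, mode='same')

t = np.linspace(0.0, 1.0, 100)
noisy = np.sin(2 * np.pi * 3 * t) + 0.2 * np.random.randn(t.size)
smooth = parabolic_filter(noisy)
```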
Figure 2.7: Noisy data (thin line) and 5-point parabolic filtered (thick line).
The magnitude response (which we will re-consider later) for the 5-point
parabolic filter is shown below in Fig. 2.8.
[Figure 2.8: magnitude response |H| of the 5-point parabolic filter, falling away towards ω_samp/2.]
The filter which has just been described is an example of a non-recursive digital filter, defined by the following relationship (known as a difference equation):
$$ r[k] = \sum_{i=0}^{N} a_i\,f[k-i] $$
A recursive digital filter, by contrast, also feeds back past outputs:
$$ r[k] = \sum_{i=0}^{N} a_i\,f[k-i] + \sum_{i=1}^{M} b_i\,r[k-i] $$
Before we can describe methods for the design of both types of filter, we need
to review the concept of the z-transform.
2.5 The z-transform
2.5.1 The pulse transfer function
Taking the z-transform of the recursive difference equation gives
$$ R(z) = \sum_{n=0}^{N} a_n z^{-n}\,F(z) + \sum_{m=1}^{M} b_m z^{-m}\,R(z) $$
so that
$$ G(z) = \frac{R(z)}{F(z)} = \frac{\sum_n a_n z^{-n}}{1 - \sum_m b_m z^{-m}} $$
2.5.2 Stability
We know that all the poles of G(s) must be in the left half of the s-plane for
a continuous filter to be stable. We can therefore state the equivalent rule for
stability in the z-plane:
For stability all poles in the z-plane must be inside the unit circle.
2.6 The frequency response of a digital filter
This can be obtained by evaluating the (pulse) transfer function on the unit circle, i.e. $z = e^{i2\pi uT}$.
Proof
Consider the general filter
$$ r[k] = \sum_{n=0}^{\infty} a_n\,f[k-n] $$
NB: A recursive type can always be expressed as an infinite sum by dividing out; e.g., for
$$ G(z) = \frac{a_0}{1 - b_1 z^{-1}} $$
we have
$$ r[k] = \sum_{n=0}^{\infty} a_0\,b_1^n\,f[k-n] $$
Let the input be a sampled cosine, $f[k] = \cos(2\pi ukT + \phi)$, written as a sum of complex exponentials. Then
$$ r[k] = \frac{1}{2}\sum_{n=0}^{\infty} a_n e^{i\{2\pi u[k-n]T + \phi\}} + \frac{1}{2}\sum_{n=0}^{\infty} a_n e^{-i\{2\pi u[k-n]T + \phi\}} $$
$$ = \frac{1}{2} e^{i(2\pi ukT + \phi)} \sum_{n=0}^{\infty} a_n e^{-i2\pi unT} + \frac{1}{2} e^{-i(2\pi ukT + \phi)} \sum_{n=0}^{\infty} a_n e^{+i2\pi unT} $$
Now
$$ \sum_{n=0}^{\infty} a_n e^{-i2\pi unT} = \sum_{n=0}^{\infty} a_n (e^{i2\pi uT})^{-n} = \sum_{n=0}^{\infty} a_n z^{-n} \Big|_{z = e^{i2\pi uT}} $$
and so we may write
$$ \sum_{n=0}^{\infty} a_n e^{-i2\pi unT} = A e^{i\theta}, \qquad \sum_{n=0}^{\infty} a_n e^{+i2\pi unT} = A e^{-i\theta} \quad \text{(complex conjugate)} $$
Thus A and θ represent the gain and phase of the frequency response, i.e. the frequency response (as a complex quantity) is
$$ G(z)\big|_{z = e^{i2\pi uT}} $$
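A minimal numerical sketch of this result, evaluating Σ a_n z^{-n} on the unit circle for the 5-point parabolic filter of section 2.4 (the frequency grid is illustrative):

```python
import numpy as np

a = np.array([-3.0, 12.0, 17.0, 12.0, -3.0]) / 35.0   # filter coefficients a_n
T = 1.0                                               # sampling period
u = np.linspace(0.0, 0.5 / T, 256)                    # frequencies up to u_samp/2
z = np.exp(1j * 2 * np.pi * u * T)                    # points on the unit circle

# G(z) = sum_n a_n z^{-n}; gain A and phase theta follow directly.
G = np.array([np.sum(a * zi ** -np.arange(a.size)) for zi in z])
gain, phase = np.abs(G), np.angle(G)
```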
Noise removal
Signal in additive noise
3.1.2
3.2 Inverse filters
In general, we will have something more to contend with than just a noise process: consider a camera with a blur, a picture taken of a moving car, or a microphone with an imperfect transfer function.
In general then, the signal or image we obtain will be (considering here the additive noise case)
$$ s(x) = b(x) * f(x) + n(x) $$
where n(x) is our noise process and b(x) is a blur or convolution kernel. How do we proceed to obtain our enhanced, restored f(x)?
If we take a Fourier transform of s(x) we obtain
$$ S(u) = B(u)F(u) + N(u) $$
which yields
$$ F(u) = B^{-1}(u)\,(S(u) - N(u)) $$
or, if we can filter out the effects of n(x) easily (using a low-pass filter, for example),
$$ F(u) = B^{-1}(u)\,S(u) $$
where B⁻¹(u) is the inverse filter which corrects for the deterministic corruption of the original. Consider a simple raised cosine blur, shown below.
[Figure: a raised cosine blur kernel b(x).]
When we know the convolution kernel, b(x), we can achieve almost perfect results...
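A minimal sketch of noise-free inverse filtering with a known kernel (the signal and kernel here are illustrative):

```python
import numpy as np

N = 512
f = np.zeros(N)
f[100] = 1.0
f[300:320] = 0.5                                  # an illustrative original f(x)

b = np.zeros(N)
b[:50] = 1.0 - np.cos(2 * np.pi * np.arange(50) / 49)   # raised cosine blur
b /= b.sum()

B = np.fft.fft(b)
S = np.fft.fft(f) * B                             # S(u) = B(u) F(u), no noise
f_hat = np.fft.ifft(S / (B + 1e-12)).real         # F(u) = B^{-1}(u) S(u)
```

Note the small constant guarding the division: where B(u) is close to zero the inverse filter blows up, which is exactly what makes the noisy case so messy.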
The example above was all in the absence of the noise, n(x).
Noisy deconvolution is a messy business! Also, in all the examples looked at above we assume we know the convolution kernel, b(x). This may be reasonable if we can measure the spread of a camera lens, for example, but for arbitrary images and signals we don't always know this. There are methods, so-called blind deconvolution approaches, which can solve this, but they are outside the scope of this course.
3.2.1 Motion blur
Let the true image be f(x, y). Let the displacements in the x, y directions be α(t), β(t). Hence, over a time period [-T/2, T/2] the resultant image is given by
$$ g(x,y) = \int_{-T/2}^{T/2} f(x - \alpha(t),\, y - \beta(t))\,dt $$
Taking the Fourier transform of g(x,y) and using the shift theorem, the term in the brackets is just the FT of f(x, y), i.e. F(u,v), which is independent of t, so
$$ G(u,v) = F(u,v) \int_{-T/2}^{T/2} \exp\{-i2\pi[u\alpha(t) + v\beta(t)]\}\,dt $$
For uniform (linear) motion, α(t) = Kt and β(t) = Lt, the integral evaluates to
$$ G(u,v) = F(u,v)\,\frac{\sin(\pi[uK + vL]T)}{\pi[uK + vL]} $$
3.2.2 Other noise
The noise looked at above was additive or multiplicative noise, in which each datum of our signal was corrupted. Other noise exists; for example, we may have a certain probability of corruption, such that most pixels or samples are fine, but once in a while we get a huge noise spike. We will look at this kind of noise, which replaces the true sample with an outlying one, later in this course.
3.3 The Wiener filter
The Wiener filter is a noise filter based on Fourier iteration. Its main advantage is the short computational time it takes to find a solution.
Consider a situation in which there is some underlying, uncorrupted signal f(t) that we wish to measure. Errors occur in the measurement due to imperfections in the equipment, and thus the output signal is corrupted. There are two ways the signal can be corrupted. First, the equipment can convolve, or smear, the signal. This occurs when the equipment doesn't have a perfect delta function response to the signal. Let r(t) be the smeared signal and g(t) be the (known) response that caused the convolution. Then r(t) is related to f(t) by:
$$ r(t) = f(t) * g(t) $$
or
$$ R(u) = F(u)G(u) $$
where R, F, G are the Fourier transforms of r, f, g. The second source of signal corruption is the unknown background noise n(t). Therefore the measured signal c(t) is a sum of r(t) and n(t):
$$ c(t) = r(t) + n(t) $$
To deconvolve r to find f in the absence of noise, we simply divide R(u) by G(u), i.e.
$$ \tilde{F}(u) = \frac{R(u)}{G(u)} $$
In the presence of noise we instead seek a filter Φ(u) such that the estimate $\tilde{F}(u) = C(u)\Phi(u)/G(u)$ is as close as possible to the true F(u), i.e. such that
$$ \int |\tilde{F}(u) - F(u)|^2\,du $$
is minimised. After substitution, and using the fact that the expectations of cross terms between R(u) and N(u) are zero, we require the following to be minimised:
$$ \int |G(u)|^{-2}\left[\,|R(u)|^2\,|1-\Phi(u)|^2 + |N(u)|^2\,|\Phi(u)|^2\,\right] du $$
The best filter is one where the above integral is a minimum at every value of u. This is when
$$ \Phi(u) = \frac{|R(u)|^2}{|R(u)|^2 + |N(u)|^2} $$
Now we can make the approximation
$$ |R(u)|^2 + |N(u)|^2 \approx |C(u)|^2 $$
hence
$$ \Phi(u) \approx \frac{|R(u)|^2}{|C(u)|^2} $$
From the above theory, it can be seen that a program can be written to Wiener filter a signal using the Fourier transform.
[Figure: the power spectrum |C(u)|² of the measured signal, with the noise power |N|² extrapolated from its flat tail and the signal power |R|² deduced, as functions of u.]
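A minimal sketch of the Fourier-domain Wiener filter under the |R|² + |N|² ≈ |C|² approximation, assuming the (white) noise power is known rather than extrapolated from the spectrum:

```python
import numpy as np

def wiener_fourier(c, noise_power):
    # noise_power is |N(u)|^2, assumed flat (white noise).
    C = np.fft.fft(c)
    C2 = np.abs(C) ** 2
    R2 = np.maximum(C2 - noise_power, 0.0)     # deduced signal power |R(u)|^2
    phi = R2 / np.maximum(C2, 1e-12)           # Phi(u) ~ |R|^2 / |C|^2
    return np.fft.ifft(phi * C).real

t = np.linspace(0.0, 1.0, 1024)
noisy = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.randn(t.size)
# For white noise of variance 0.25, E|N(u)|^2 = 0.25 * len(signal) with
# numpy's unnormalised FFT convention.
denoised = wiener_fourier(noisy, noise_power=0.25 * t.size)
```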
The mean-squared method
There is another way to Wiener filter a signal, this time without Fourier transforming the data: the mean-squared method. This uses the fact that the Wiener filter is one that is based on the least-squares principle, i.e. the filter minimises the squared error between the actual output and the desired output. To do this, first the variance of the data matrix is found. Then a box window is moved over the image matrix, moving one pixel at a time. In every box window the local mean and variance are found. The filtered value of each datum (at index x, y say) is found by the following formula:
$$ f_{x,y} = \mu_{x,y} + \frac{\sigma^2_{x,y} - s^2}{\sigma^2_{x,y}}\,(c_{x,y} - \mu_{x,y}) $$
where $f_{x,y}$ is the filtered signal, $\mu_{x,y}$ is the local mean, $\sigma^2_{x,y}$ is the local variance, $s^2$ is the noise variance of the entire data matrix, and $c_{x,y}$ is the original signal value.
From the above formula, it can be seen that if the original signal is similar to the local mean then the filtered value will be the local mean, and if the original signal is very different from the local mean then it will be filtered to give a higher/lower intensity signal depending on the difference. Also, if the local variance is similar to the matrix variance, which is around 1 (i.e. only noise exists in the box), then the filtered signal will be that of the local mean, which should be close to zero. But if the local variance is much bigger than the matrix variance (i.e. when the box is at the actual signal), then the signal will be amplified. As the box moves through the entire matrix, it calculates the solution for each pixel using the above formula, thus filtering the data.
The Wiener filter has been here introduced in its 2-d guise, but it is trivial to
go to 1-d.
Figure 3.4: Wiener filter (bottom right) improves over flat average filter (bottom left). 5 pixel square used in
both cases.
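A minimal sketch of the mean-squared method using a 5-pixel box window (scipy's uniform_filter supplies the moving local mean; estimating the noise variance as the average local variance is one common convention):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def wiener_adaptive(c, box=5, noise_var=None):
    mu = uniform_filter(c, box)                    # local mean
    var = uniform_filter(c * c, box) - mu * mu     # local variance
    s2 = np.mean(var) if noise_var is None else noise_var
    gain = np.maximum(var - s2, 0.0) / np.maximum(var, 1e-12)
    return mu + gain * (c - mu)                    # f = mu + gain * (c - mu)

img = np.random.rand(64, 64)                       # stand-in image
filtered = wiener_adaptive(img, box=5)
```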
Figure 3.5: Wiener filter adaptively removing noise - estimation of mean and one standard deviation.
In many applications it is useful to be able to estimate simple measures of derivatives of signals & images, to detect for example level changes or edges.
Consider a simple 1-d signal, f(t). A simple backwards-difference (first-order) estimate of the gradient may be written as
$$ f'(t) = \frac{1}{\Delta t}\,\left(f(t) - f(t - \Delta t)\right) $$
This estimate is accurate only to O(Δt); a central-difference estimate of the gradient is better, with error O(Δt²). Letting Δt = 1 for ease of nomenclature gives
$$ f'(t) = \frac{1}{2}\left(f(t+1) - f(t-1)\right) = f(t) * g(t) $$
where now g(t) = ½[1, 0, -1]. So we have a local filter. This general approach can, of course, be extended to higher-order derivatives. The second derivative, f″(t), may be estimated as
$$ f''(t) = \frac{d}{dt}f'(t) = f(t) * (g(t) * g(t)) $$
where the filter kernel that does the work is (g(t) ∗ g(t)). Letting g(t) = [1, -1] we get a filter for the 2nd derivative as
$$ g_2(t) = [1, -2, 1] $$
which you will have seen before. To make sure that the signal power out equals that in, we can normalise with a factor of |1| + |-2| + |1| = 4, so now g₂(t) = ¼[1, -2, 1]. The figures show this working on a signal.
[Figure: a signal f(t) (top), its first-derivative estimate f(t) ∗ g(t) (middle) and second-derivative estimate f(t) ∗ g(t) ∗ g(t) (bottom).]
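A minimal sketch of both derivative filters in action (the signal is illustrative):

```python
import numpy as np

g = np.array([1.0, 0.0, -1.0]) / 2.0      # central-difference kernel
g2 = np.array([1.0, -2.0, 1.0]) / 4.0     # normalised second-derivative kernel

t = np.linspace(0.0, 1.0, 100)
f = np.sin(2 * np.pi * t)
df = np.convolve(f, g, mode='same')       # ~ f'(t), up to the sample spacing
d2f = np.convolve(f, g2, mode='same')     # ~ f''(t)
```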
We can, of course, simply extend this to images by running over the rows and columns using one of the above filters, but normally we will extend these 1-d filters to a 2-d mask, typically making the mask square. For example, the 1-d central difference filter has a kernel (ignoring the factor of 1/2 for a moment) g(t) = [1, 0, -1]. In its 2-d guise, operating along rows (x-direction), a 2-d mask may be made:
$$ g_x(x,y) = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix} $$
and similarly in the y-direction
$$ g_y(x,y) = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix} $$
$$ g_{2x}(x,y) = \begin{bmatrix} 1 & -2 & 1 \\ 1 & -2 & 1 \\ 1 & -2 & 1 \end{bmatrix} $$
and similarly in the y-direction
$$ g_{2y}(x,y) = \begin{bmatrix} 1 & 1 & 1 \\ -2 & -2 & -2 \\ 1 & 1 & 1 \end{bmatrix} $$
Adding the two gives
$$ \nabla^2(x,y) = g_{2x}(x,y) + g_{2y}(x,y) = \begin{bmatrix} 2 & -1 & 2 \\ -1 & -4 & -1 \\ 2 & -1 & 2 \end{bmatrix} $$
and these give edge information, as valid potential edge points have ∇²(x,y) = 0. The Figure shows |∇²(x,y)| applied to the house image.
The major problems with such simple gradient estimators, in both 1- and 2-d,
lie in their lack of robustness in the presence of noise. The way around this is
to use simple parametric functions, such as polynomials (as covered in the A1
course) or splines and then take derivatives of the fitted functions. The Figure
shows the profound improvement for a simple signal using this approach.
Figure 4.4: Gradient estimates on a simple signal. Top: noisy signal, middle: simple gradient filter, bottom:
gradient of polynomial fit.
4.2 Motion analysis
In many machine applications we would like to estimate the motion of objects which can be caused by
Moving objects
Moving camera, e.g. on a robot arm or an autonomous robot
Moving objects and camera
In essence we ask the question: is there a difference between frames? We may use a simple difference operation, either globally or over a small window,
$$ \Delta f(t) = \sum_{x,y} |f(x,y,t) - f(x,y,t-1)| $$
however this only tells us about total pixel intensity changes - we would like to obtain motion vectors on a pixel by pixel basis.
4.3 The motion field
A 2-D representation of a 3-D motion is called the motion field. Each pixel in the image has a velocity vector v = (v_x, v_y)ᵀ. This is the motion that we wish to estimate.
Note that twisting and scaling both resemble deformation in that the object appears to be non-rigid. This makes them more difficult to handle. To simplify much motion analysis a set of motion assumptions are often made:
1. Motion has a maximum velocity
2. Small (negligible) accelerations occur over the interval dt
3. Common motion: all points in the same object move in the same way
4. Mutual correspondence: objects remain rigid
One popular method of motion field estimation is optic flow.
Optic flow:
- Assumes we have access to frames separated by a small dt
- Can give false information, e.g. for a spinning sphere or changing illumination
- Assumes, therefore, that the illumination is constant, so the observed brightness of objects is constant over t
- Nearby points in the motion field have similar motion, i.e. the velocity field is smooth
Assuming the velocity v is constant over a small neighbourhood containing the points p₁, ..., p_n, the brightness-constancy constraint at each point gives a set of linear equations,
$$ \nabla f(p_1) \cdot \mathbf{v} = -f_t(p_1) $$
$$ \nabla f(p_2) \cdot \mathbf{v} = -f_t(p_2) $$
$$ \vdots $$
$$ \nabla f(p_n) \cdot \mathbf{v} = -f_t(p_n) $$
Solving this over-determined system in a least-squares sense gives a principled approach.
OF in motion analysis
We will assume that perceived motion in the image may arise from four mechanisms
1. Translation perpendicular to the line of sight
2. Scaling: translation along the line of sight
3. Rotation in a plane perpendicular to the line of sight
4. Rotation in the line-of-sight plane
Note that we assume a rigid body (no deformation).
Optic flow can analyse these four motions
1. A set of parallel velocity vectors
2. A set of vectors with a common focal point
3. A set of concentric vectors
4. Sets of vectors anti-parallel to one another
Figure 4.7: Example of optical flow applied to car motion, using the Lucas-Kanade method.
Introduction
We now discuss autocorrelation and autoregressive processes; that is, the correlation between successive values of a time series and the linear relations between
them. We also show how these models can be used for spectral estimation.
5.2 Autocorrelation
Given a time series x_t we can produce a lagged version of the time series, x_{t-T}, which lags the original by T samples. We can then calculate the covariance between the two signals
$$ \sigma_{xx}(T) = \frac{1}{N-1}\sum_{t=1}^{N} (x_{t-T} - \mu_x)(x_t - \mu_x) \tag{5.1} $$
where μ_x is the signal mean and there are N samples. We can then plot σ_xx(T) as a function of T. This is known as the autocovariance function. The autocorrelation function is a normalised version of the autocovariance,
$$ r_{xx}(T) = \frac{\sigma_{xx}(T)}{\sigma_{xx}(0)} \tag{5.2} $$
Note that σ_xx(0) = σ_x². We also have r_xx(0) = 1. Also, because σ_xy = σ_yx we have r_xx(T) = r_xx(-T); the autocorrelation (and autocovariance) are symmetric, or even, functions. Figure 5.1 shows a signal and a lagged version of it and Figure 5.2 shows the autocorrelation function.
Figure 5.1: Signal xt (top) and xt+5 (bottom). The bottom trace leads the top trace by 5 samples. Or we
may say it lags the top by -5 samples.
Figure 5.2: Autocorrelation function for xt . Notice the negative correlation at lag 20 and positive correlation
at lag 40. Can you see from Figure 5.1 why these should occur?
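A minimal sketch of computing r_xx(T) directly from equations 5.1 and 5.2 (the test signal is illustrative):

```python
import numpy as np

def autocorr(x, max_lag):
    x = x - x.mean()
    denom = np.sum(x * x)                      # proportional to sigma_xx(0)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.array([np.sum(x[abs(T):] * x[:x.size - abs(T)]) / denom
                  for T in lags])
    return lags, r                             # r is 1 at zero lag, symmetric

x = np.sin(np.linspace(0, 10 * np.pi, 200)) + 0.3 * np.random.randn(200)
lags, r = autocorr(x, 100)
```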
5.3 Autoregressive models
An autoregressive (AR) model predicts the value of a time series from previous
values. A pth order AR model is defined as
$$ x_t = \sum_{i=1}^{p} x_{t-i}\,a_i + e_t \tag{5.3} $$
where a_i are the AR coefficients and e_t is the prediction error. These errors are assumed to be Gaussian with zero mean and variance σ_e². It is also possible
to include an extra parameter a0 to soak up the mean value of the time series.
Alternatively, we can first subtract the mean from the data and then apply the
zero-mean AR model described above. We would also subtract any trend from
the data (such as a linear or exponential increase) as the AR model assumes
stationarity.
The above expression shows the relation for a single time step. To show the
relation for all time steps we can use matrix notation.
We can write the AR model in matrix form by making use of the embedding
matrix, M, and by writing the signal and AR coefficients as vectors. For an AR(4) model, say, the embedding matrix is
$$ M = \begin{bmatrix} x_4 & x_3 & x_2 & x_1 \\ x_5 & x_4 & x_3 & x_2 \\ \vdots & \vdots & \vdots & \vdots \\ x_{N-1} & x_{N-2} & x_{N-3} & x_{N-4} \end{bmatrix} \tag{5.4} $$
We can also write the AR coefficients as a vector a = [a1, a2, a3, a4]T , the
errors as a vector e = [e5, e6, ..., eN ]T and the signal itself as a vector X =
[x5, x6, ..., xN ]T . This gives
$$ \begin{bmatrix} x_5 \\ x_6 \\ \vdots \\ x_N \end{bmatrix} = \begin{bmatrix} x_4 & x_3 & x_2 & x_1 \\ x_5 & x_4 & x_3 & x_2 \\ \vdots & \vdots & \vdots & \vdots \\ x_{N-1} & x_{N-2} & x_{N-3} & x_{N-4} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} + \begin{bmatrix} e_5 \\ e_6 \\ \vdots \\ e_N \end{bmatrix} \tag{5.5} $$
which can be compactly written as
$$ X = Ma + e \tag{5.6} $$
The least-squares estimate of the AR coefficients is then
$$ \hat{a} = (M^T M)^{-1} M^T X \tag{5.7} $$
and the AR predictions are shown in Figure 5.3. The noise variance was estimated to be σ_e² = 0.079, which corresponds to a standard deviation of 0.28. The variance of the original time series was 0.3882, giving a signal to noise ratio of (0.3882 - 0.079)/0.079 = 3.91.
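A minimal sketch of the whole procedure - build the embedding matrix, solve the least-squares problem of equation 5.7 and estimate the noise variance (the data here is illustrative):

```python
import numpy as np

def fit_ar(x, p):
    N = x.size
    # Row t of M holds [x_{t-1}, x_{t-2}, ..., x_{t-p}] for t = p ... N-1.
    M = np.column_stack([x[p - i - 1 : N - i - 1] for i in range(p)])
    X = x[p:]
    a, *_ = np.linalg.lstsq(M, X, rcond=None)   # a = (M^T M)^{-1} M^T X
    e = X - M @ a
    return a, np.var(e)                         # coefficients, noise variance

x = np.sin(np.linspace(0, 20 * np.pi, 500)) + 0.1 * np.random.randn(500)
a_hat, sigma2_e = fit_ar(x, p=4)
```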
5.3.1 Random walks
5.3.2 Relation to autocorrelation
If we multiply both sides of the AR model (equation 5.3) by x_{t-k} we get
$$ x_{t-k}\,x_t = a_1\,x_{t-k}\,x_{t-1} + a_2\,x_{t-k}\,x_{t-2} + ... + a_p\,x_{t-k}\,x_{t-p} + x_{t-k}\,e_t \tag{5.10} $$
If we now sum over t and divide by N - 1, and assume that the signal is zero mean (if it isn't we can easily make it so, just by subtracting the mean value from every sample), the above equation can be re-written in terms of covariances at different lags:
$$ \sigma_{xx}(k) = a_1\sigma_{xx}(k-1) + a_2\sigma_{xx}(k-2) + ... + a_p\sigma_{xx}(k-p) + \sigma_{e,x} \tag{5.12} $$
where the last term σ_{e,x} is the covariance between the noise and the signal. But as the noise is assumed to be independent of the signal, σ_{e,x} = 0. If we now divide every term by the signal variance we get a relation between the correlations at different lags:
$$ r_{xx}(k) = a_1 r_{xx}(k-1) + a_2 r_{xx}(k-2) + ... + a_p r_{xx}(k-p) \tag{5.13} $$
For an AR(4) model, say, evaluating this for lags k = 1, ..., 4 gives the Yule-Walker equations
$$ \begin{bmatrix} r_{xx}(1) \\ r_{xx}(2) \\ r_{xx}(3) \\ r_{xx}(4) \end{bmatrix} = \begin{bmatrix} r_{xx}(0) & r_{xx}(-1) & r_{xx}(-2) & r_{xx}(-3) \\ r_{xx}(1) & r_{xx}(0) & r_{xx}(-1) & r_{xx}(-2) \\ r_{xx}(2) & r_{xx}(1) & r_{xx}(0) & r_{xx}(-1) \\ r_{xx}(3) & r_{xx}(2) & r_{xx}(1) & r_{xx}(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} \tag{5.14} $$
which can be written compactly as r = Ra, giving the solution a = R⁻¹r.
This leads to a more efficient algorithm than the general method for multivariate linear regression (equation 5.7), because we can exploit the structure in the autocorrelation matrix. By noting that r_xx(-k) = r_xx(k) we can rewrite the correlation matrix as
$$ R = \begin{bmatrix} 1 & r_{xx}(1) & r_{xx}(2) & r_{xx}(3) \\ r_{xx}(1) & 1 & r_{xx}(1) & r_{xx}(2) \\ r_{xx}(2) & r_{xx}(1) & 1 & r_{xx}(1) \\ r_{xx}(3) & r_{xx}(2) & r_{xx}(1) & 1 \end{bmatrix} \tag{5.17} $$
Because this matrix is both symmetric and a Toeplitz matrix (the terms along
any diagonal are the same) we can use a recursive estimation technique known
as the Levinson-Durbin algorithm.
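A minimal sketch of the Yule-Walker route, using scipy's Toeplitz solver (which exploits exactly this structure via Levinson recursion):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker(x, p):
    x = x - x.mean()
    r = np.array([np.sum(x[k:] * x[:x.size - k]) for k in range(p + 1)])
    r = r / r[0]                                 # r_xx(0) ... r_xx(p)
    # Solve R a = [r(1), ..., r(p)] with R symmetric Toeplitz.
    return solve_toeplitz(r[:p], r[1:p + 1])

x = np.sin(np.linspace(0, 20 * np.pi, 500)) + 0.1 * np.random.randn(500)
a_hat = yule_walker(x, p=4)
```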
5.3.3 Reflection coefficients
The partial correlation coefficients in an AR model are known as reflection coefficients. At lag m, the partial correlation between x_{t-m} and x_t is written as k_m, the mth reflection coefficient. It can be calculated as the relative reduction in prediction error,
$$ k_m = \frac{E_{m-1} - E_m}{E_{m-1}} \tag{5.18} $$
where E_m is the prediction error from an AR(m) model. The reflection coefficients are to the AR coefficients what the correlation is to the slope in a univariate linear regression.
5.4 Moving-average models
A moving-average model, MA(q), is defined as
$$ x_t = \sum_{i=0}^{q} b_i\,e_{t-i} \tag{5.19} $$
where e_t is Gaussian random noise with zero mean and variance σ_e². They are a type of FIR filter. These can be combined with AR models to give
$$ x_t = \sum_{i=1}^{p} a_i\,x_{t-i} + \sum_{i=0}^{q} b_i\,e_{t-i} \tag{5.20} $$
which can be described as an ARMA(p,q) model. They are a type of IIR filter.
Usually, however, FIR and IIR filters have a set of fixed coefficients which
have been chosen to give the filter particular frequency characteristics. In MA or
ARMA modelling the coefficients are tuned to a particular time series so as to
capture the spectral characteristics of the underlying process.
5.5 Spectral Estimation
Autoregressive models can also be used for spectral estimation. An AR(p) model
predicts the next value in a time series as a linear combination of the p previous
values
$$ x_t = \sum_{k=1}^{p} a_k\,x_{t-k} + e_t \tag{5.21} $$
where a_k are the AR coefficients and e_t is IID Gaussian noise with zero mean and variance σ_e².
The above equation can be solved by using the z-transform. This allows the equation to be written as
$$ a_p z^{t-p} + a_{p-1} z^{t-(p-1)} + ... + z^t = e_t \tag{5.22} $$
It can then be rewritten in terms of the pulse transfer function
$$ H(z) = \frac{1}{1 + \sum_{k=1}^{p} a_k z^{-k}} \tag{5.23} $$
Making the substitution
$$ z = \exp(i2\pi u T_s) \tag{5.24} $$
where u is frequency and T_s is the sampling period, we can see that the frequency domain characteristics of an AR model are given by
$$ P(u) = \frac{\sigma_e^2\,T_s}{\left|1 + \sum_{k=1}^{p} a_k \exp(-ik2\pi u T_s)\right|^2} \tag{5.25} $$
An AR(p) model can provide spectral estimates with up to p/2 peaks; therefore if you know how many peaks you're looking for in the spectrum you can define the AR model order. Alternatively, AR model order estimation methods should automatically provide the appropriate level of smoothing of the estimated spectrum.
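A minimal sketch of evaluating equation 5.25 on a frequency grid (the coefficients here are arbitrary illustrations, using the 1 + Σ sign convention above):

```python
import numpy as np

def ar_spectrum(a, sigma2_e, Ts, freqs):
    k = np.arange(1, a.size + 1)
    return np.array([
        sigma2_e * Ts /
        np.abs(1.0 + np.sum(a * np.exp(-1j * k * 2 * np.pi * u * Ts))) ** 2
        for u in freqs])

freqs = np.linspace(0.0, 0.5, 256)           # up to Nyquist for Ts = 1
P = ar_spectrum(np.array([-1.3, 0.4]), sigma2_e=1.0, Ts=1.0, freqs=freqs)
```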
Figure 5.5: Power spectral estimates of two sinewaves in additive noise using (a) the discrete Fourier transform method and (b) autoregressive spectral estimation.
Introduction
We now consider the situation where we have a number of time series and wish to explore the relations between them. We first look at the relation between cross-correlation and multivariate autoregressive models, and then at the cross-spectral density and coherence.
6.2 Cross-correlation
Given two time series x_t and y_t we can delay x_t by T samples and then calculate the cross-covariance between the pair of signals. That is,
$$ \sigma_{xy}(T) = \frac{1}{N-1}\sum_{t=1}^{N} (x_{t-T} - \mu_x)(y_t - \mu_y) \tag{6.26} $$
where μ_x and μ_y are the means of each time series and there are N samples in each. The function σ_xy(T) is the cross-covariance function. The cross-correlation is a normalised version,
$$ r_{xy}(T) = \frac{\sigma_{xy}(T)}{\sqrt{\sigma_{xx}(0)\,\sigma_{yy}(0)}} \tag{6.27} $$
where we note that σ_xx(0) = σ_x² and σ_yy(0) = σ_y² are the variances of each signal. Note that
$$ r_{xy}(0) = \frac{\sigma_{xy}}{\sigma_x\,\sigma_y} \tag{6.28} $$
which is the correlation between the two variables. Therefore, unlike the autocorrelation, r_xy is not equal to 1 even at zero lag/lead. Figure 6.1 shows two time series and their cross-correlation.
[Figure 6.1: the two time series, x (top) and y (bottom).]
Figure 6.2: Cross-correlation function r_xy(T) for the data in Figure 6.1. A lag of T denotes the top series, x, lagging the bottom series, y. Notice the big positive correlation at a lag of 25. Can you see from Figure 6.1 why this should occur?
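A minimal sketch of r_xy(T) from equation 6.27 for both positive and negative lags (the signals are illustrative):

```python
import numpy as np

def xcorr(x, y, max_lag):
    x, y = x - x.mean(), y - y.mean()
    denom = np.sqrt(np.sum(x * x) * np.sum(y * y))
    lags = np.arange(-max_lag, max_lag + 1)
    r = []
    for T in lags:
        if T >= 0:
            r.append(np.sum(x[:x.size - T] * y[T:]) / denom)   # x_{t-T} y_t
        else:
            r.append(np.sum(x[-T:] * y[:y.size + T]) / denom)
    return lags, np.array(r)

x = np.random.randn(200)
y = np.roll(x, 25) + 0.5 * np.random.randn(200)   # y lags x by 25 samples
lags, r = xcorr(x, y, 100)                        # peak near T = 25
```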
6.2.1 Cross-correlation is asymmetric
The autocovariance of a zero-mean signal may be written
$$ \sigma_{xx}(T) = \langle x_{t-T}\,x_t \rangle \tag{6.30} $$
where the angled brackets denote the average value or expectation. Now, for negative lags,
$$ \sigma_{xx}(-T) = \langle x_{t+T}\,x_t \rangle \tag{6.31} $$
Subtracting T from the time index (this will make no difference to the expectation) gives
$$ \sigma_{xx}(-T) = \langle x_t\,x_{t-T} \rangle \tag{6.32} $$
which is identical to σ_xx(T), as the ordering of variables makes no difference to the expected value. Hence, the autocorrelation is a symmetric function.
The cross-correlation is a normalised cross-covariance which, assuming zero-mean signals, is given by
$$ \sigma_{xy}(T) = \langle x_{t-T}\,y_t \rangle \tag{6.33} $$
Now, for negative lags,
$$ \sigma_{xy}(-T) = \langle x_{t+T}\,y_t \rangle \tag{6.34} $$
and subtracting T from the time index gives
$$ \sigma_{xy}(-T) = \langle x_t\,y_{t-T} \rangle = \sigma_{yx}(T) \tag{6.35, 6.36} $$
which is not, in general, equal to σ_xy(T). Hence the cross-correlation (and cross-covariance) is an asymmetric function.
Windowing
When calculating cross-correlations there are fewer data points at larger lags than
at shorter lags. The resulting estimates are commensurately less accurate. To
take account of this the estimates at long lags can be smoothed using various
window operators.
6.2.3 Time-Delay Estimation
6.3 Multivariate autoregressive models
A multivariate autoregressive (MAR) model predicts each value of a multivariate time series from the previous values of all its channels,
$$ x(t) = \sum_{k=1}^{p} x(t-k)\,a(k) + e_t \tag{6.37} $$
where x(t) is now a row vector and the a(k) are matrices of AR coefficients. As in the univariate case, by stacking the lagged vectors into an embedding matrix M this can be put into the standard form of a multivariate linear regression problem. The AR coefficients can therefore be calculated from
$$ \hat{A} = (M^T M)^{-1} M^T X \tag{6.42} $$
and the MAR predictions are then given by
$$ \hat{x}(t) = \tilde{x}(t)\,\hat{A} \tag{6.43} $$
where x̃(t) is the row of the embedding matrix corresponding to time t. The prediction errors and their covariance follow as
$$ e(t) = x(t) - \hat{x}(t), \qquad C = \langle e(t)^T e(t) \rangle \tag{6.45} $$
Example
Given two time series and a MAR(3) model, for example, the MAR predictions are
$$ \hat{x}(t) = \tilde{x}(t)\,\hat{A} $$
i.e.
$$ [\hat{x}_1(t), \hat{x}_2(t)] = [x_1(t-1), x_2(t-1), x_1(t-2), x_2(t-2), x_1(t-3), x_2(t-3)] \begin{bmatrix} a_{11}(1) & a_{12}(1) \\ a_{21}(1) & a_{22}(1) \\ a_{11}(2) & a_{12}(2) \\ a_{21}(2) & a_{22}(2) \\ a_{11}(3) & a_{12}(3) \\ a_{21}(3) & a_{22}(3) \end{bmatrix} \tag{6.46} $$
Applying an MAR(3) model to our data set gave the following estimates for the AR coefficient matrices, a_p, and noise covariance C, which were estimated from equations 6.42 and 6.45:
$$ a_1 = \begin{bmatrix} 1.2813 & 0.2394 \\ 0.0018 & 1.0816 \end{bmatrix}, \quad a_2 = \begin{bmatrix} 0.7453 & 0.2822 \\ 0.0974 & 0.6044 \end{bmatrix}, \quad a_3 = \begin{bmatrix} 0.3259 & 0.0576 \\ 0.0764 & 0.2699 \end{bmatrix} $$
$$ C = \begin{bmatrix} 0.0714 & 0.0054 \\ 0.0054 & 0.0798 \end{bmatrix} $$
Figure 6.3: Signals x1 (t) (top) and x2 (t) (bottom) and predictions from MAR(3) model.
6.4 The cross spectral density
Just as the Power Spectral Density (PSD) is the Fourier transform of the autocovariance function, we may define the Cross Spectral Density (CSD) as the Fourier transform of the cross-covariance function,
$$ P_{12}(\omega) = \sum_{n=-\infty}^{\infty} \sigma_{x_1 x_2}(n)\,\exp(-i\omega n) \tag{6.47} $$
Note that if x₁ = x₂, the CSD reduces to the PSD. Now, the cross-covariance of a signal is given by
$$ \sigma_{x_1 x_2}(n) = \sum_{l=-\infty}^{\infty} x_1(l)\,x_2(l-n) \tag{6.48} $$
Substituting this into equation 6.47 gives
$$ P_{12}(\omega) = \sum_{n=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} x_1(l)\,x_2(l-n)\,\exp(-i\omega n) \tag{6.49} $$
By noting that
$$ \exp(-i\omega n) = \exp(-i\omega l)\,\exp(i\omega k) \tag{6.50} $$
where k = l - n, we can see that the CSD splits into the product of two sums,
$$ P_{12}(\omega) = X_1(\omega)\,X_2(-\omega) \tag{6.51} $$
where
$$ X_1(\omega) = \sum_{l=-\infty}^{\infty} x_1(l)\,\exp(-i\omega l), \qquad X_2(-\omega) = \sum_{k=-\infty}^{\infty} x_2(k)\,\exp(+i\omega k) \tag{6.52} $$
For real signals X₂(-ω) = X₂*(ω), where * denotes the complex conjugate. Hence, the cross spectral density is given by
$$ P_{12}(\omega) = X_1(\omega)\,X_2^{*}(\omega) \tag{6.53} $$
This means that the CSD can be evaluated in one of two ways (i) by first
estimating the cross-covariance and Fourier transforming or (ii) by taking the
Fourier transforms of each signal and multiplying (after taking the conjugate of
one of them). A number of algorithms exist which enhance the spectral estimation
ability of each method.
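A minimal numerical check of the two routes (here using circular lags so that the identity is exact for the DFT; the signals are illustrative):

```python
import numpy as np

N = 256
x1 = np.random.randn(N)
x2 = np.roll(x1, 5) + 0.1 * np.random.randn(N)

# Route (ii): product of Fourier transforms.
P12 = np.fft.fft(x1) * np.conj(np.fft.fft(x2))

# Route (i): Fourier transform of the (circular) cross-covariance
# sigma(n) = sum_l x1(l) x2(l - n).
sigma = np.array([np.sum(x1 * np.roll(x2, n)) for n in range(N)])
assert np.allclose(np.fft.fft(sigma), P12)
```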
The frequency domain characteristics of a multivariate time series may be summarised by the power spectral density matrix. For d time series,
$$ P(\omega) = \begin{bmatrix} P_{11}(\omega) & P_{12}(\omega) & \cdots & P_{1d}(\omega) \\ \vdots & \vdots & \ddots & \vdots \\ P_{d1}(\omega) & P_{d2}(\omega) & \cdots & P_{dd}(\omega) \end{bmatrix} $$
where the diagonal elements contain the spectra of individual channels and the off-diagonal elements contain the cross-spectra. The matrix is Hermitian: the elements are complex and P_ij(ω) = P_ji*(ω).
6.4.2 Coherence
The Magnitude Squared Coherence (MSC) between two signals x and y is defined in terms of their power spectra and cross spectrum as
$$ \mathrm{MSC}(\omega) = \frac{|P_{xy}(\omega)|^2}{P_{xx}(\omega)\,P_{yy}(\omega)} \tag{6.56} $$
The MSC measures the linear correlation between two time series at each frequency and is directly analogous to the squared correlation coefficient in linear regression. As such, the MSC is intimately related to linear filtering, where one signal is viewed as a filtered version of the other. This can be interpreted as a linear regression at each frequency. The optimal regression coefficient, or linear filter, is given by
$$ H(\omega) = \frac{P_{xy}(\omega)}{P_{xx}(\omega)} \tag{6.58} $$
109
Pxx ()
Py y ()
(6.59)
Algorithms based on Welch's method (such as the cohere function in the MATLAB system identification toolbox) are widely used. The signal is split up into a number of segments, N, each of length T, and the segments may be overlapping. The coherence is then estimated from the segment spectra as
$$ \widehat{\mathrm{MSC}}(\omega) = \frac{\left|\sum_n X_n(\omega)\,Y_n^{*}(\omega)\right|^2}{\sum_n |X_n(\omega)|^2\,\sum_n |Y_n(\omega)|^2} \tag{6.60} $$
where n sums over the data segments. This equation is exactly the same form as for estimating correlation coefficients. Note that if we have only N = 1 data
as for estimating correlation coefficients. Note that if we have only N = 1 data
segment then the estimate of coherence will be 1 regardless of what the true
value is (this would be like regression with a single data point). Therefore, we
need a number of segments.
Note that this only applies to Welch-type algorithms which compute the CSD
from a product of Fourier transforms. We can trade-off good spectral resolution
(requiring large T ) with low-variance estimates of coherence (requiring large N
and therefore small T ). To an extent, by increasing the overlap between segments
(and therefore the amount of computation, ie. number of FFTs computed) we
can have the best of both worlds.
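A minimal sketch using scipy's Welch-based coherence estimator, with the same segment settings as the example of section 6.5 (5 s of data at 128 Hz, 128-sample segments, 50% overlap):

```python
import numpy as np
from scipy.signal import coherence

fs = 128.0
t = np.arange(0.0, 5.0, 1.0 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.3 * np.random.randn(t.size)
y = x + 0.3 * np.random.randn(t.size)          # a noisier copy of x

freqs, Cxy = coherence(x, y, fs=fs, nperseg=128, noverlap=64)
```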
6.4.4 Spectral estimation with MAR models
Just as the PSD can be calculated from AR coefficients, so the PSDs and CSDs can be calculated from MAR coefficients. First we compute
$$ A(\omega) = I + \sum_{k=1}^{p} a_k\,\exp(-ik\omega T) \tag{6.61} $$
where I is the identity matrix, ω is the frequency of interest and T is the sampling period. A(ω) will be complex. This is analogous to the denominator in the equivalent AR expression, $(1 + \sum_{k=1}^{p} a_k \exp(-ik\omega t))$. Then we calculate the PSD matrix as follows:
$$ P_{MAR}(\omega) = T\,[A(\omega)]^{-1}\,C\,\left([A(\omega)]^{-1}\right)^{*T} \tag{6.62} $$
where C is the residual covariance matrix. Once the PSD matrix has been calculated, we can calculate the coherences of interest using equation 6.56.
6.5 Example
Figure 6.4: Coherence estimates from (a) Fourier transform method and (b) Multivariate Autoregressive
model.
0.3, and the second, y, being equal to the first but with more additive noise of the same standard deviation. Five seconds of data were generated at a sample rate of 128Hz. We then calculated the coherence using (a) Welch's modified periodogram method, with N = 128 samples per segment, a 50% overlap between segments and smoothing via a Hanning window, and (b) an MAR(8) model.
Consider a set of p digital samples f[n - p + 1] : f[n]. Order the set from smallest to largest, and let this ordered sample set be z[1] : z[p]. Order statistic filters then perform linear filtering on this ordered set, rather than on the original.
What use might this be? Consider a signal corrupted by spiking noise, as in Fig 7.1.
[Figure 7.1: a sine wave corrupted by impulsive (spiking) noise (top) and the median-filtered result (bottom).]
We see that the signal (a sine wave) is corrupted by impulsive noise. The effect of this noise is that the maximal and minimal values we observe in any small window on the data are likely to be caused by the noise process, not by the signal.
Suppose we apply a filter mask to the ordered set z[k] such that the output of the filter is just the middle value in the set (assuming that p is odd). This value is the median of the values and the filter is referred to as the median filter. How well it does at removing impulsive noise is shown in the lower plot of the figure.
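A minimal sketch of a 5-point median filter removing spikes (scipy provides the order-statistic machinery directly):

```python
import numpy as np
from scipy.signal import medfilt

t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * 3 * t)
spikes = (np.random.rand(t.size) < 0.02) * 3.0    # occasional large spikes
filtered = medfilt(x + spikes, kernel_size=5)     # p = 5 (odd), middle value
```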
7.2 Mathematical morphology
In general, such order statistic filters can be generalised to a set of filter operations
which respond to the shape of the data. Mathematical Morphological filters
allow both 1-d and 2-d (images in grey-scale and binary images) filtering using
this idea.
Algebraic system of operators acting on complex shapes
Decompose into meaningful parts
Separate from extraneous parts
Dilation
Erosion
7.3.1 Dilation
Dilation is commutative, A ⊕ B = B ⊕ A, and may be written as a union of translates of A:
$$ A \oplus B = \bigcup_{b \in B} A_b $$
Figure 7.4: Dilation. Empty (blank) squares are now coloured dark, and binary 1 squares are white. (left)
Original data, (middle) Structure element with origin at middle left and (right) resultant dilation.
Figure 7.5: Dilation. (left) data, (middle) structure element with origin at centre and (right) dilated data.
Dilation is translation invariant, A ⊕ B_t = (A ⊕ B)_t, it distributes over union, (B ∪ C) ⊕ A = (B ⊕ A) ∪ (C ⊕ A), and it is increasing: if A ⊆ B then A ⊕ K ⊆ B ⊕ K.
Figure 7.6: Noise removal by intersection with dilated image. (top left) data, (top right) structure element,
origin in centre, (bottom left) dilation and (bottom right) output of intersection.
7.4 Binary erosion
Figure 7.7: Binary erosion. (left) data, (middle) structure element, origin in centre, (right) eroded output.
Figure 7.8: Binary erosion. (left) data, (middle) structure element, origin in centre, (right) eroded output.
Erosion and dilation are dual operations,
$$ (A \ominus B)^c = A^c \oplus B^R \qquad \text{and} \qquad (A \oplus B)^c = A^c \ominus B^R $$
where Bᴿ denotes the reflection of B. Erosion is not associative, i.e.
$$ (A \ominus B) \ominus C \neq A \ominus (B \ominus C) $$
- Any large erosion can be accomplished using a series of smaller ones
- Duality does not imply cancellation:
if A = B ⊕ C then A ⊖ C = (B ⊕ C) ⊖ C ≠ B in general
7.5 Uses
7.6 Opening
The opening of image A with structure element K is defined, in terms of erosion/dilation, as:
$$ A \circ K = (A \ominus K) \oplus K $$
If we let B = A ⊖ K then the above becomes
$$ A \circ K = B \oplus K $$
Remembering that the set-element definition of dilation is the union of translates, i.e.
$$ B \oplus K = \bigcup_{b \in B} K_b $$
hence
$$ A \circ K = \bigcup_{y \in A \ominus K} K_y = \bigcup_{K_y \subseteq A} K_y $$
This means that we can picture opening as sweeping K over A from the inside, never allowing any part of K to go outside the boundary of A.
[Figure: a shape A, a structure element K, and the opening A ∘ K.]
7.6.1 Shape extraction
[Figure 7.10: Shape extraction using opening: a shape A with features of widths w and r, the opening A ∘ K, and the residue A - (A ∘ K).]
7.7 Closing
Opening sweeps over the inside of a binary object. Consider now the case of complementing the image, A ↦ Aᶜ:
- Open this shape with element Kᴿ (the reflection of K)
- This sweeps over the inside of Aᶜ
- Any small gaps between structures in A are small links in Aᶜ and are removed
- Complement the resultant image, giving
$$ (A^c \circ K^R)^c $$
This is defined as the closing of A by K, written A • K. We may think of this as sweeping the outside of A using a structure element Kᴿ.
A duality exists (this can be proved from the dilation-erosion dualities also) between opening and closing:
$$ (A \circ K)^c = A^c \bullet K^R \qquad \text{and} \qquad (A \bullet K)^c = A^c \circ K^R $$
[Figure: a shape A and its closing A • K.]
7.7.1 The top-hat transform
Figure 7.12: Original, binary image, opened image, THT image. Structure element is a 5 pixel square.
7.7.2 Removing noise
There are many types of noise in grey-scale images; if we concentrate on binary images then the canonical noise is salt-and-pepper (a.k.a. drop-out) noise, where pixels are randomly complemented. One strategy for removing such noise is to open then close with a disk element, D, where the radius of D is small compared to the local boundary curvature of components in the image, but is larger than the local noise (which is pixel-by-pixel). A good choice may be r = 3 pixels, say. A sketch of this approach follows the list below.
- Perform (I ∘ D) • D
- Opening removes impulsive noise in the background
- Closing removes drop-out noise in objects
- There is a trade-off between noise removal and boundary changing
- At high noise levels, noise may by chance clump together into clusters which are not removed by a small disk. But a larger disk smooths the object boundaries...
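A minimal sketch of the open-then-close strategy on a noisy binary image, using a small disk-like structure element (the image is illustrative):

```python
import numpy as np
from scipy.ndimage import binary_opening, binary_closing

D = np.array([[0, 1, 0],
              [1, 1, 1],
              [0, 1, 0]], dtype=bool)        # small disk-like element

I = np.zeros((64, 64), dtype=bool)
I[16:48, 16:48] = True                       # a single square object
noise = np.random.rand(64, 64) < 0.05        # salt-and-pepper locations
I_noisy = np.logical_xor(I, noise)           # randomly complement pixels

I_clean = binary_closing(binary_opening(I_noisy, D), D)   # (I o D) . D
```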
7.8 Grey-scale Morphology
Consider a grey-scale scan line (a row of an image, say) as shown in the figure. The umbra (shadow) is defined by the grey-scale space lying below the scan line, f_c, and is denoted as U[f_c]; each point on the line f_c generates a set of umbra points, {u_{i,c}}.
The top of an umbra set is the set of maximum values of {u_{i,c}} for each column co-ordinate, i.e.
$$ T_c = \max_i \{u_{i,c}\} $$
so top and umbra are inverse operators,
$$ T\{U[f]\} = f $$
Figure 7.13: The top and umbra operations.
7.9
7.10 Grey-scale opening and closing
The grey-scale opening of a signal f by an element k may be defined via the umbrae, and as T{U[f]} = f, so
$$ f \circ k = T\{(U[f] \ominus U[k]) \oplus U[k]\} $$
The operation on the umbrae in the above is equivalent to a binary opening between U[f] and U[k], hence
$$ f \circ k = T\{U[f] \circ U[k]\} $$
Remember that binary opening was thought of as the sweeping of an element round the inside boundary of an image structure. We may thus interpret the G-S opening as the top surface of what remains of U[f] after sweeping U[k] along the underside of f. By application of the duality between opening and closing, we may interpret G-S closing as sweeping the reflection of U[k], U[k]ᴿ, over the topside of f.
- Opening smooths from the underside
- Closing smooths from the topside
- The concepts of being open or closed with respect to an element hold:
$$ (f \circ k) \circ k = f \circ k \qquad (f \bullet k) \bullet k = f \bullet k $$
Figure 7.14: Grey-scale opening
Figure 7.15: Grey-scale closing
7.10.1 Signals
Applying the previous theory to signals is straightforward, given that we can consider an N point signal as just a 1 × N grey-scale image. We may apply all the same operations - the examples below show how the shape of the signal is changed and how easily noise components may be removed.
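A minimal sketch, treating the signal as a 1 × N grey-scale image and using scipy's grey-scale morphology with a flat element of width 9 (sizes and signal are illustrative):

```python
import numpy as np
from scipy.ndimage import grey_opening, grey_closing

t = np.linspace(0.0, 1.0, 1000)
f = np.sin(2 * np.pi * 3 * t) + (np.random.rand(t.size) < 0.02) * 2.0

opened = grey_opening(f, size=9)    # smooths from the underside, removes spikes
closed = grey_closing(f, size=9)    # smooths from the topside
```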
Figure 7.19: (a) Dilation, (b) Erosion, (c) Closing and (d) Opening of signal.
Figure 7.20: (a) Closing and (b) Opening of noisy sine signal.
Lossy compression can achieve much better compression ratios, at the loss of some detail. Achieved rates can be 10:1 to 50:1 and still have no perceivable difference! For example...
8.1 Lossless compression
[Figure: a binary image and the pixel values along a single scan line.]
Hence
[0], [218, +1], [223, -1], [0]
tells us all we need to know about this image.
Consider the signal:
[Figure: a signal and its sample-to-sample differences, with their histograms.]
It is clear that if we only describe changes to the data then we need less storage.
8.2 Lossy compression
We can save by using short codes for the intensity values that are very frequently used, such as around 200, and long codes for the ones that are hardly used, such as around 150 or close to 0. Optimally it turns out that the code length satisfies
$$ \text{codelength} \propto -\log p(\text{intensity}) $$
[Figure: the intensity PDF and the corresponding optimal code length in bits.]
$$ f(x) = \sum_i w_i\,\phi_i(x) $$
is a general linear basis. We can see that the Discrete Fourier Transform is of this form:
$$ F(u,v) = \frac{1}{M}\,\frac{1}{N}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\,\exp[-i2\pi(ux/M + vy/N)] $$
where
$$ c(a) = \frac{1}{\sqrt{2}} \;\text{ for } a = 0, \qquad c(a) = 1 \;\text{ for } a = 1, 2, \ldots $$
The inverse DCT is defined as
$$ f(x,y) = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1} c(u)\,c(v)\,F(u,v)\,\cos\left[\frac{\pi u(x+1/2)}{M}\right]\cos\left[\frac{\pi v(y+1/2)}{N}\right] $$
[Figure: compression by transform truncation. DFT → truncate → IDFT gives 14:1 compression; DCT → truncate → IDCT gives 20:1 compression.]
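A minimal sketch of compression by transform truncation, here with the DCT and keeping only the largest 5% of coefficients (the image and threshold are illustrative):

```python
import numpy as np
from scipy.fft import dctn, idctn

img = np.random.rand(128, 128)               # stand-in image
J = dctn(img, norm='ortho')

keep = np.abs(J) >= np.percentile(np.abs(J), 95)    # top 5% of coefficients
recon = idctn(np.where(keep, J, 0.0), norm='ortho') # ~20:1 coefficient ratio
```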
8.2.2 JPEG
The Joint Photographic Experts Group (JPEG) compression scheme uses both lossy and lossless compression. The JPEG method:
- Is the world standard for compression of general images
- Allows progressive display of images (i.e. from low quality to high)
Rather than apply a transform to the entire image in one go, it uses the fact that information is local and so divides the image into 8 × 8 blocks on which the DCT is performed.
[Figure: the JPEG pipeline. Encoding: divide the image into 8 × 8 blocks → DCT → quantise the DCT coefficients (the lossy step) → losslessly code the coefficients. Decoding: decode the coefficients → reconstruct each block from its DCT coefficients → recompose the image.]
[Figure: log(|J| + 1) of the DCT coefficients, truncated at J = 100, giving 60:1 compression.]
The human eye does not have the same sensitivity to all frequencies. Therefore, a coarse quantisation of the DCT coefficients corresponding to high frequencies is less annoying to a human observer than the same quantisation applied to low frequencies. Hence, to obtain minimal perceptual distortion, each coefficient should be individually weighted. This is achieved by the use of a weighting matrix, Z(u, v), which is sent as side information:
$$ \mathrm{out}(u,v) = \mathrm{round}[F(u,v)/Z(u,v)] $$
Differential coding of DC coefficients
The DC DCT coefficient of each block can become quite large. Therefore only the difference between the DC coefficient of the current and the previous block is coded and transmitted.
8.3 Wavelets
The WT is an operation that forms a signal representation that, unlike the Fourier transform for example, is local in both the time and frequency domains. The WT relies upon smoothing the time-domain signal at different scales; thus if ψ_s(x) represents a wavelet at scale s, then the WT of a function f(x) ∈ L²(ℝ) is defined as a convolution given by:
$$ Wf(s,x) = f * \psi_s(x) \tag{8.63} $$
where the wavelet at scale s is a dilated version of a mother wavelet,
$$ \psi_s(x) = \frac{1}{s}\,\psi\!\left(\frac{x}{s}\right) \tag{8.64} $$
A common choice of mother wavelet is the derivative of a Gaussian smoothing function,
$$ \psi(x) = \frac{dG(x)}{dx} \tag{8.65} $$
[Figure: a derivative-of-Gaussian wavelet.]
We can allow the scale to vary - it turns out that we can just retain decimation by a constant factor, normally 2: s_n = 2s_{n-1}, hence s_n = 2ⁿ.
8.3.1 JPEG-2000
This new version of JPEG uses a similar approach to the original JPEG, but wavelets are used in place of DCT coding. This gives a big improvement in quality at the same compression levels.
[Figure: JPEG-2000 example at 158:1 compression.]