DIGITAL SIGNAL
PROCESSING:
principles, devices
and applications
Edited by
N. B. Jones and
J. D. McK. Watson
Peregrinus Ltd. on behalf of the Institution of Electrical Engineers
IEE CONTROL ENGINEERING SERIES 42
Series Editors: Professor P. J. Antsaklis
Professor D. P. Atherton
Professor K. Warwick
Other volumes in this series:
Volume 1 Multivariable control theory J. M. Layton
Volume 2 Elevator traffic analysis, design and control G. C. Barney and S. M.
dos Santos
Volume 3 Transducers in digital systems G. A. Woolvet
Volume 4 Supervisory remote control systems R. E. Young
Volume 5 Structure of interconnected systems H. Nicholson
Volume 6 Power system control M. J. H. Sterling
Volume 7 Feedback and multivariable systems D. H. Owens
Volume 8 A history of control engineering, 1800-1930 S. Bennett
Volume 9 Modern approaches to control system design N. Munro (Editor)
Volume 10 Control of time delay systems J. E. Marshall
Volume 11 Biological systems, modelling and control D. A. Linkens
Volume 12 Modelling of dynamical systems—1 H. Nicholson (Editor)
Volume 13 Modelling of dynamical systems—2 H. Nicholson (Editor)
Volume 14 Optimal relay and saturating control system synthesis
E. P. Ryan
Volume 15 Self-tuning and adaptive control: theory and application
C. J. Harris and S. A. Billings (Editors)
Volume 16 Systems modelling and optimisation P. Nash
Volume 17 Control in hazardous environments R. E. Young
Volume 18 Applied control theory J. R. Leigh
Volume 19 Stepping motors: a guide to modern theory and practice
P. P. Acarnley
Volume 20 Design of modern control systems D. J. Bell, P. A. Cook and N.
Munro (Editors)
Volume 21 Computer control of industrial processes S. Bennett and
D. A. Linkens (Editors)
Volume 22 Digital signal processing N. B. Jones (Editor)
Volume 23 Robotic technology A. Pugh (Editor)
Volume 24 Real-time computer control S. Bennett and D. A. Linkens (Editors)
Volume 25 Nonlinear system design S. A. Billings, J. O. Gray and
D. H. Owens (Editors)
Volume 26 Measurement and instrumentation for control M. G. Mylroi and G.
Calvert (Editors)
Volume 27 Process dynamics estimation and control A. Johnson
Volume 28 Robots and automated manufacture J. Billingsley (Editor)
Volume 29 Industrial digital control systems K. Warwick and D. Rees (Editors)
Volume 30 Electromagnetic suspension—dynamics and control P. K. Sinha
Volume 31 Modelling and control of fermentation processes J. R. Leigh
(Editor)
Volume 32 Multivariable control for industrial applications J. O'Reilly (Editor)
Volume 33 Temperature measurement and control J. R. Leigh
Volume 34 Singular perturbation methodology in control systems
D. S. Naidu
Volume 35 Implementation of self-tuning controllers K. Warwick (Editor)
Volume 36 Robot control K. Warwick and A. Pugh (Editors)
Volume 37 Industrial digital control systems (revised edition) K. Warwick and
D. Rees (Editors)
Volume 38 Parallel processing in control P. J. Fleming (Editor)
Volume 39 Continuous time controller design R. Balasubramanian
Volume 40 Deterministic non-linear control of uncertain systems
A. S. I. Zinober (Editor)
Volume 41 Computer control of real-time processes
S. Bennett and G. S. Virk (Editors)
While the author and the publishers believe that the information and
guidance given in this work is correct, all parties must rely upon their own
skill and judgment when making use of it. Neither the author nor the
publishers assume any liability to anyone for any loss or damage caused
by any error or omission in the work, whether such error or omission is
the result of negligence or any other cause. Any and all such liability is
disclaimed.
Contents

1. Introduction—Engineering and Signal Processing — N. B. Jones
2. Devices overview — R. J. Chance
4. Z-transforms — A. S. Sehmi
14. Comparison of DSP devices in control systems applications — P. A. Witting
20. Systolic arrays for high performance digital signal processing — J. V. McCanny, R. F. Woods and M. Yan
22. A case study of a DSP chip for on-line monitoring and control in anaesthesia — S. Kabay and N. B. Jones
Index
Chapter 1
Introduction—Engineering and Signal Processing
N. B. Jones
assimilate and process an enormous amount of data which arrives in the form of signals from some form of sensor, either local or remote. It is the successful and efficient processing of these signals which is the key to progress in this new field of endeavour. For reasons of simplicity and flexibility associated with the binary nature of the electronics, processing of signals is most conveniently done digitally, and it is this major area of electronics, information technology and control engineering, known as digital signal processing, which is the subject of this book.
The first part of the book contains the essential fundamentals of the theoretical bases of
DSP. However most of the book is devoted to the practicalities of designing hardware
and software systems for implementing the desired signal processing tasks and to
demonstrating the properties of the new families of DSP chips which have enhanced
progress so much recently. A book of this nature, which has a significant technological
content, will inevitably tend to go out of date as time goes on. However the material has
been written so that the fundamentals of the design and processing methods involved
can be seen independently of the technology and thereby retain their value indefinitely.
Consideration is given to the use of DSP methods in control, communications, speech
processing and biomedical monitoring. Much of the material is illustrated by case studies
and it is hoped that, in this way, many of the practical points associated with DSP
engineering can be highlighted.
Chapter 2
Devices overview
R. J. Chance
[Block diagrams of three early DSP device architectures: one with analogue inputs and outputs (A-to-D and D-to-A converters), a 40-word RAM, a 25-bit ALU with shifter, and a 192-word program memory with 4-word PC stack; one with a 512-word data ROM, dual accumulators (ACC A, ACC B), ALU/shifter, data pointers, and serial and parallel I/O; and FIG 2.6, a device with 144-word data RAM, address pointers AR0 and AR1, ALU with shifters, a multiplier with product register feeding a 32-bit accumulator, a 4-word PC stack, and 16-bit external address and data buses.]
Chapter 3
Discrete signals and systems
It is worth noting in passing that an important field of engineering is concerned with the
processing of non-digital discrete signals. Hardware associated with this has some of
the advantages of flexibility of digital systems and yet can be used to process
essentially analogue signals directly (3). These devices are not considered in this book
except as ancillary devices to purely digital processors.
As with signals, processing systems can be classified into continuous, discrete and
digital; again with digital systems being special cases of discrete systems. The
classification of systems is entirely in line with that of the signals being processed and
where more than one type is involved the system is regarded as hybrid.
There is also an important classification of systems for present purposes into linear and
non-linear and into time invariant and time varying. Linear systems are those which
obey the principle of superposition (4), and time invariant (shift invariant) systems are those whose responses are not affected by the absolute value of the time origin of the
signals being processed (5). This book is almost exclusively devoted to digital, linear,
time invariant signal processing.
Much of the design of digital signal processing algorithms and devices can be conducted
by treating digital signals as discrete signals, that is without reference to amplitude
quantisation. This property is then introduced as a secondary factor affecting the
accuracy and resolution of the computations.
3.2.1 Sampling
The ideal sampling process applied to the continuous signal x(t) is conveniently modelled by multiplying x(t) by $d(t) = \sum_{k=-\infty}^{\infty} \delta(t-kT)$ to give the sampled signal $x_s(t)$; d(t) is known as a Dirac comb, since $\delta(t-kT)$ is a Dirac delta function at the point kT, where k is an integer and T is the sampling interval.
This yields

$$x_s(t) = \sum_{k=-\infty}^{\infty} x_k\, \delta(t-kT) \qquad [3.3]$$

where $x_k = x(kT)$.
It seems reasonable to conclude that since $x_s(t)$ is only known at the sample points t = kT, some of the information in the original signal x(t) is lost. In particular it would not be expected that the value of x(t) between the sample points could be deduced from the set $x_k$ alone. This is generally true. However, for one important class of signals it is possible to reconstruct the signal x(t) from the samples $x_k$, at least in theory. If x(t) has no components with frequencies greater than some limiting value F Hz then x(t) is given at all values of t from the samples $x_k$ by Whittaker's interpolation formula (7):

$$x(t) = \sum_{k=-\infty}^{\infty} x_k\, \frac{\sin \pi f_s (t-kT)}{\pi f_s (t-kT)}$$

where $f_s$ (= 1/T) is the sampling frequency in Hz. This formula only applies provided that $f_s > 2F$. The limiting sampling frequency $f_s = 2F$ is known as the Nyquist rate.
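Whittaker's formula can be checked numerically. The sketch below is an editor's illustration, not part of the original text: it reconstructs a band-limited sine wave at an arbitrary instant between sample points using a truncated version of the interpolation sum (numpy's np.sinc(u) computes sin(πu)/(πu)).

```python
import numpy as np

fs = 100.0           # sampling frequency in Hz (fs > 2F, satisfying the Nyquist condition)
T = 1.0 / fs         # sampling interval
F = 10.0             # highest frequency present in the signal

k = np.arange(-500, 500)                  # sample indices
xk = np.sin(2 * np.pi * F * k * T)        # the samples x_k

def whittaker(t):
    """Approximate x(t) by a truncated Whittaker interpolation sum."""
    return np.sum(xk * np.sinc(fs * (t - k * T)))

t = 0.0037                                # an instant between sample points
print(abs(whittaker(t) - np.sin(2 * np.pi * F * t)))   # small reconstruction error
```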
3.2.2 Quantisation
Once the sampling process is complete the next step is quantisation. This process
allocates binary numbers to samples. In practice sampling and quantising usually take
place together but the processes are essentially separate. In order to quantise the signal
the gain of the converter is adjusted so that the signal occupies most of the range R but
Discrete signals and systems 21
never exceeds it. In this situation an n-bit converter will allocate one of N=2 n binary
numbers to each sample by comparing the size of the sample to the set of voltage levels
mR/N (m=0,N) where the difference between successive voltage levels is:
d = - B - volts
N-l
It can be shown (8) that the quantisation process results in the introduction of extra noise into the signal with RMS value

$$\frac{d}{2\sqrt{3}} \;\left(= \frac{d}{\sqrt{12}}\right)$$
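The quoted RMS value is easy to confirm by simulation. A minimal sketch (mine, not the book's): quantise a signal spanning the range R with an n-bit uniform quantiser and compare the measured RMS error with the prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
R, n = 2.0, 10                     # converter range (volts) and resolution (bits)
N = 2 ** n
d = R / (N - 1)                    # step between successive quantiser levels

x = rng.uniform(0.0, R, 1_000_000)            # signal occupying the full range
xq = np.round(x / d) * d                      # uniform quantisation
print(np.sqrt(np.mean((x - xq) ** 2)))        # measured RMS quantisation noise
print(d / np.sqrt(12.0))                      # predicted d/(2*sqrt(3)); agrees closely
```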
3.2.3 Other Conversion Errors
Aliasing errors and quantisation noise are intrinsic to the conversion process and exist
in the conditions described above even if the converter is perfectly engineered.
Other errors due to imperfections in the converter may also need to be accounted for.
These include aperture uncertainty errors, arising from jitter in the sampling process, and
non-linearity errors due to uneven quantisation of the amplitude range R. No attempt is
made to discuss these here and the reader is referred to reference (9) for details.
$$X(z) = \sum_{k=0}^{\infty} x_k\, z^{-k} \qquad [3.6]$$

where z = exp(sT) and s is the Laplace complex variable. Equation [3.6] is the Laplace Transform of equation [3.3] assuming x(t) = 0 for t < 0.
This transform has the property that the z-transform of $x_{k-1}$ is $z^{-1}X(z)$ (provided $x_k = 0$ for k < 0). This property, which is easily proved by shifting the summation count by one in the definition of the transform of $x_{k-1}$, is, in some ways, equivalent to the statement that the Laplace transform of the derivative of a continuous signal is the Laplace transform of the signal itself multiplied by s.
The major consequence of the use of this transform is that [3.5] can be transformed into
an algebraic equation allowing Y(z) to be computed from X(z) and a knowledge of the
parameters $a_r$ and $b_r$ of the processor.
Details of how to evaluate z-transforms and their inverses are given in the next chapter.
If the impulse response of a discrete system is the sequence $h_k$ (k = 0, ∞) then a unit impulse at time t = nT elicits as a response the sequence $h_{k-n}$ starting at t = nT. Similarly a single sample $x_n$ at time nT elicits a response sequence $x_n h_{k-n}$ starting at t = nT. Given that any input is a sequence of impulses of weight $x_n$ occurring at points t = nT (n = 0, ∞) then, after consideration of equation [3.3], it can be seen that the response to the sequence $x_k$ ($x_k$ = 0 for k < 0) is:

$$y_k = \sum_{n=0}^{\infty} x_n\, h_{k-n} \qquad [3.8]$$
Taking the z-transform of this convolution sum:

$$Y(z) = \sum_{m=0}^{\infty} \left[ \sum_{n=0}^{m} x_n\, h_{m-n} \right] z^{-m}$$

in which the lower limit of the inner sum is set to zero since in general the impulse response cannot exist before the impulse arrives at the input.

Thus, from the definition of the z-transform [3.6], the above can be written

$$Y(z) = H(z)\,X(z) \qquad [3.9]$$

where H(z) is the z-transform of the impulse response sequence $h_k$.
Similar results can be derived for sequences, including impulse response sequences, which are not zero for k < 0. While an analogous situation normally implies that the system is not physically realisable in continuous processing, for discrete processing it merely implies a time delay and the ideas are still useful in practical design situations. Readers are referred to Oppenheim & Schafer (10) for discussion of double-sided z-transforms, which are required in this situation.
Equation [3.7] provides a means of computing the z-transform, Y(z), of the output if the
z-transform, X(z), of the input and the coefficients of the difference equation are known.
This equation can be rewritten as:

$$\frac{Y(z)}{X(z)} = \frac{\sum_{n=0}^{N} a_n z^{-n}}{1 + \sum_{m=1}^{M} b_m z^{-m}} \qquad [3.10]$$
Equation [3.8] provides a means of relating the input and output z-transforms via the z-
transform, H(z), of the impulse response. This equation can be rewritten as
$$H(z) = \frac{Y(z)}{X(z)} \qquad [3.11]$$
By comparing [3.10] and [3.11] it can be seen
$$H(z) = \frac{\sum_{n=0}^{N} a_n z^{-n}}{1 + \sum_{m=1}^{M} b_m z^{-m}} \qquad [3.12]$$
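The equivalence of the convolution sum [3.8] and the transfer function description [3.12] can be verified numerically. The sketch below is an illustration of mine, not from the text (the coefficient values are arbitrary): it filters a sequence through the difference equation and compares the result with direct convolution against the impulse response.

```python
import numpy as np
from scipy.signal import lfilter

num = [1.0, 0.5]            # numerator coefficients a_n of [3.12]
den = [1.0, -0.6, 0.08]     # denominator 1 + b_1 z^-1 + b_2 z^-2 (stable: poles at 0.4 and 0.2)

x = np.random.default_rng(1).normal(size=64)
y = lfilter(num, den, x)                   # output via the difference equation

impulse = np.zeros(64); impulse[0] = 1.0
h = lfilter(num, den, impulse)             # impulse response h_k
y_conv = np.convolve(x, h)[:64]            # output via the convolution sum [3.8]

print(np.allclose(y, y_conv))              # True: the two descriptions agree
```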
The frequency response of a system with transfer function G(s) is easily shown to be G(jω) (11). The spectral content of a continuous signal can similarly be described by its Fourier transform

$$X(j\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \qquad [3.13]$$

Problems exist in that [3.13] often does not converge, and other descriptions of spectral content, such as the power-density spectrum (12), need to be invoked. For the sampled signal of equation [3.3],
$$X_s(j\omega) = \int_{-\infty}^{\infty} \left[ \sum_{k=-\infty}^{\infty} x_k\, \delta(t-kT) \right] e^{-j\omega t}\, dt = \sum_{k=-\infty}^{\infty} x_k\, e^{-j\omega kT} \qquad [3.14]$$
If x(t) = 0 for t < 0 then [3.13] is the Laplace transform of x(t) with s replaced by jω. Similarly, for $x_k$ = 0 when k < 0, the lower limit of [3.14] is zero and the equation is the z-transform of the sequence $x_k$ with z replaced by exp(jωT). Equation [3.14] is a Fourier series and so the spectrum of the sampled signal $X_s(j\omega)$ is periodic, with period 2π/T rad/s.
In order to represent $X_s(j\omega)$ in a computer this function must also be turned into a discrete set of samples, say $X_n$, and by considering the inverse Fourier transform in terms of the samples $X_n$ it can be seen, by an analogous argument to that above, that the time signal must now be periodic as well. Thus the process of quantising time and frequency, to allow the signal and its Fourier transform both to be represented in discrete form, means that both the time sequence and its spectrum will be periodic.

This interesting constraint on the nature of the discrete signal $x_k$ and its Fourier transform $X_n$ implies that there is only a finite number of different terms in the Fourier series defining $x_k$ and also in that defining $X_n$. Thus the upper limit of summation in equation [3.14] and of its inverse must be finite or the sums will not converge.

The distance between the spectral lines defined by the sequence $X_n$ is the inverse of the repetition period of $x_k$; that is:
$$\delta f = \frac{1}{NT} \text{ Hz} \qquad [3.15]$$

where N is the number of samples in one period of the time sequence.
Similarly the time interval between samples $x_k$ is the inverse of the period of the sequence $X_n$; that is

$$T = \frac{1}{M\,\delta f} \qquad [3.16]$$

where M is the number of samples to be included in one period of the spectral sequence. Comparing [3.15] and [3.16] shows that N = M; that is, the number of samples in the time sequence $x_k$ equals the number of samples in the spectral sequence $X_n$.
Hence the N spectral lines $X_n$ are spaced out by 2π/NT rad/s and equation [3.14] and its inverse can be written:

$$X_n = \sum_{k=0}^{N-1} x_k \exp\left(-\frac{2\pi j n k}{N}\right) \qquad [3.17]$$

$$x_k = \frac{1}{N} \sum_{n=0}^{N-1} X_n \exp\left(\frac{2\pi j n k}{N}\right) \qquad [3.18]$$

where [3.18] can be established from [3.17] by using the orthogonality condition

$$\sum_{m=0}^{N-1} \exp\left(\frac{2\pi j m (n-k)}{N}\right) = 0 \quad \text{for } n \neq k \qquad [3.19]$$
Equations [3.17] and [3.18] are a discrete Fourier transform (DFT) pair. More will be
said about these equations and their implementation in Chapter 6.
Consideration of equations [3.12] and [3.14] shows that the frequency response function of a discrete filter is periodic, since the frequency response is derived by putting z = exp(jωT), which is itself periodic in ω with period 2π/T. In this way the frequency response function of a discrete system is radically different from that of a continuous system, which leads to design complications which are considered in Chapters 7 to 10.
3.4 Summary
In order to make progress in the design of DSP systems it is important to understand the
basic features of the analogue to digital conversion process and the fundamental
principles of linear processing of discrete signals.
The key issues in signal conversion are the sampling rate and the number of bits of
resolution to be used in the digital version of the signal. Some fundamental criteria for
choice of these parameters have been presented here, but it must be remembered that other issues, such as cost and the characteristics of the later processing system, may also be important.
The essential basis for understanding of design in DSP systems is that of linear
analysis, the central feature of which is the z-transform. This concept provides a means
of reducing difference equations to algebraic equations and of "unscrambling" convolution
sums. A powerful concept arising from this is the z-transfer function which defines the
relationship between the z-transforms of the input and the output sequences in a DSP
system.
A useful method for describing linear systems is by means of the frequency response
function. The frequency response function of a discrete system can be derived from the
transfer function by the substitution z = exp(jωT) and is periodic. The associated
transform, the discrete Fourier transform (DFT), is a convenient way of describing the
spectral content of discrete signals and is a very important algorithm in digital signal
processing.
3.5 References
1). Roberts, J.B., "Spectral analysis of signals sampled at random times", in: Digital Signal Processing, ed. N.B. Jones, Peter Peregrinus, 1982, 194-212.
2). Lago, P.J.A. & Jones, N.B., "Turning points spectral analysis of the interference myoelectric activity", Med. & Biol. Eng. & Comput., 1983, 21, 333-342.
3). Regan, T., "Introducing the MF10 - a versatile monolithic active filter", National Semiconductor Application Note 307, 1982, 217-227.
4). Oppenheim, A.V. & Willsky, A.S., "Signals & Systems", Prentice-Hall, 1983, p.74.
5). Oppenheim, A.V. & Willsky, A.S., "Signals & Systems", Prentice-Hall, 1983, p.75.
Chapter 4
Z-transforms
A. S. Sehmi
4.1 INTRODUCTION
Many signals are represented by a function of a continuous-time variable. Typical
examples of these signals are radar, sonar, thermal noise, and various bioelectric
potentials. In many cases these signals can be manipulated by analogue processing
techniques incorporating amplifiers, filters, envelope and peak detectors, clippers and so
on. The advent of digital computers and their inherent flexibility make it more convenient to use a signal representation scheme that is a function of a discrete-time variable. The discrete signal representation used is a set of numbers at equally spaced sampling intervals T, with amplitudes corresponding to those of the continuous signal.
The chapter begins with a discussion of types of signals encountered in digital
systems. It will be shown that in the discrete-time domain most digital systems can be
represented by a linear difference equation and that the system response is facilitated by
transforming the difference equation into the z-domain. This leads to the definition of
the system transfer function and finally the frequency behaviour of the system.
There is a correspondence between the properties of signals and the analytical
methods used in the continuous analogue world (convolution integral, differential
equations, complex s-domain and Laplace transforms, transfer functions and frequency
response) and those in the discrete digital world. Those familiar with the fundamentals of continuous-time systems and their analysis will find this chapter straightforward.
[Fig.[4.1] A system (signal processor) transforming an input signal into an output signal.]
4.2.2 Sequences
The input and output signals are sequences of values defined at integer multiples of the sampling interval T. In a discrete system the magnitudes of the members of the sequence are continuous. The usual mathematical representation of a sequence is {x(n)}, where the index n takes integer values.
In a digital system the input and output sequence members assume values quantised
in both time and magnitude. The magnitude is encoded digitally in some binary manner
using standard methods of analogue to digital conversion.
To simplify our discussions we will restrict ourselves to discrete linear shift invariant systems. The definition of a linear system can be expressed as follows: if inputs $x_1(n)$ and $x_2(n)$ produce outputs $y_1(n)$ and $y_2(n)$ respectively, then the input $a\,x_1(n) + b\,x_2(n)$ produces the output $a\,y_1(n) + b\,y_2(n)$ for any constants a and b.

One way of visualising linearity is that a graph of the input vs. output for a system must always be a straight line through the origin.

Shift invariance means that the response of a system to a given input does not vary with time. Mathematically: if the input x(n) produces the output y(n), then the shifted input x(n − k) produces the shifted output y(n − k).
A consequence of linearity and shift invariance is that the output of a digital (or
discrete) system can be expressed as the convolution sum of the input and the unit
impulse response of the system. Mathematically the output is given by:
$$y(n) = \sum_{k=-\infty}^{\infty} x(k)\, h(n-k) \qquad (4.1)$$
where {h(n)} is the unit impulse response - i.e. the response of the system to a sequence
that only consists of a unit sample at n = 0 and is zero for all other values of n. The
convolution sum in equation (4.1) is often written in the form (also see chapter three, equation (3.8)):

$$y(n) = x(n) * h(n) \qquad (4.2)$$
It is therefore seen that the output from any discrete or digital system can be represented by the following difference equation (also see chapter three, section 3.3.1):

$$y(n) = \sum_{r=0}^{M} a_r\, x(n-r) - \sum_{k=1}^{N} b_k\, y(n-k) \qquad (4.3)$$
where x(n−r) represents the present and past values of the input sequence, y(n) is the present value of the output sequence and y(n−k) signifies past values of the output sequence. In digital filter design we will always assume that $a_r$ and $b_k$ are constants that do not vary with time. In adaptive signal processing systems both $a_r$ and $b_k$ are time varying.
It should be noted that the difference equation shown in equation (4.3) both
represents the system in mathematical terms and can be used for system implementation.
It shows what data is to be stored and how that data must be manipulated to compute the
present value of the output.
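The storage and manipulation implied by equation (4.3) can be written out directly. A minimal sketch (mine, not the book's; the coefficient values are arbitrary examples):

```python
def filter_step(x_hist, y_hist, a, b):
    """One step of y(n) = sum_r a[r] x(n-r) - sum_k b[k] y(n-k).

    x_hist holds [x(n), x(n-1), ...]; y_hist holds [y(n-1), y(n-2), ...].
    """
    acc = sum(ar * xr for ar, xr in zip(a, x_hist))
    acc -= sum(bk * yk for bk, yk in zip(b, y_hist))
    return acc

a = [0.5, 0.5]            # feedforward coefficients a_0, a_1
b = [-0.25]               # feedback coefficient b_1
x_hist, y_hist = [0.0] * len(a), [0.0] * len(b)
for xn in [1.0, 0.0, 0.0, 0.0]:            # a unit impulse input
    x_hist = [xn] + x_hist[:-1]            # store the new input sample
    yn = filter_step(x_hist, y_hist, a, b)
    y_hist = [yn] + y_hist[:-1]            # store the new output sample
    print(yn)                              # 0.5, 0.625, 0.15625, ...: the response never quite dies away
```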
A system which can be represented by equation (4.3) is referred to as a recursive or infinite impulse response (IIR) system. The word recursive is used because the present output is obtained recursively from past values of the output, and if a unit impulse is applied to the system then the feedback terms keep the output circulating, so that the impulse response is, in general, of infinite duration.
4.4 Z-TRANSFORMATION
The z-transformation of a sequence is a way of converting a time series into a function of a complex variable z. It has the property of enabling the solutions to linear constant coefficient difference equations to be obtained via algebraic manipulation. It also enables designers to get a visual appreciation of the time and frequency behaviour of a discrete system from pole-zero diagrams, which can thereby circumvent a lot of mathematical analysis.
The z-transform of a sequence x(n) is defined as

$$X(z) = \sum_{n=-\infty}^{\infty} x(n)\, z^{-n} \qquad (4.5)$$

For most practical sequences x(n) = 0 for n < 0 and the two sided z-transform becomes the one sided z-transform defined as

$$X(z) = \sum_{n=0}^{\infty} x(n)\, z^{-n} \qquad (4.6)$$
We will assume from now on that we are only dealing with the one sided z-transform.
Example: For the delayed unit sample, x(n) = 1 at n = 1 and zero elsewhere, $X(z) = z^{-1}$.

This very simple example illustrates a very useful property of the z-transform. A physical interpretation of $z^{-1}$ is a time delay of one time period in a sequence.
Example: For the geometric sequence $x(n) = g^n$ (n ≥ 0),

$$X(z) = \sum_{n=0}^{\infty} g^n z^{-n} = \frac{1}{1 - g z^{-1}} = \frac{z}{z-g}$$

Note that X(z) only converges for |z| > |g|. The region in the complex z-plane over which the z-transform of a sequence converges is known as the region of convergence and the region over which the transform diverges is known as the region of divergence.

From the z-transform of the geometric sequence it is possible to obtain the z-transform of some other useful sequences. It is straightforward to see that the z-transform of the sequence $x(n) = e^{jn\omega}$ is given by:

$$X(z) = \frac{z}{z - e^{j\omega}}$$
Example: For $x(n) = \cos n\omega = \frac{1}{2}\left(e^{jn\omega} + e^{-jn\omega}\right)$,

$$X(z) = \frac{z(z - \cos\omega)}{z^2 - 2z\cos\omega + 1}$$

and for $x(n) = \sin n\omega$,

$$X(z) = \frac{z\sin\omega}{z^2 - 2z\cos\omega + 1}$$
A table of some of the common sequences and their z transform can be found in [1,2].
The z-transform is linear: if $y(n) = A\,x_1(n) + B\,x_2(n)$ then

$$Y(z) = A \sum_{n=0}^{\infty} x_1(n)\, z^{-n} + B \sum_{n=0}^{\infty} x_2(n)\, z^{-n} = A\,X_1(z) + B\,X_2(z)$$

Assuming now that
y(n) = x(n - k)
i.e. the sequence y(n) is the sequence x(n) delayed by k time periods.
$$Y(z) = \sum_{n=0}^{\infty} x(n-k)\, z^{-n} = x(-k) + x(1-k)z^{-1} + x(2-k)z^{-2} + \dots + x(-1)z^{-(k-1)} + x(0)z^{-k} + \dots$$

Since x(n) = 0 for n < 0, all terms before $x(0)z^{-k}$ vanish and

$$Y(z) = X(z)\, z^{-k} \qquad (4.8)$$
We know that the output sequence of a linear shift invariant system is given by the convolution sum:

$$y(n) = \sum_{k=0}^{\infty} x(k)\, h(n-k) \qquad (4.9)$$

Taking z-transforms of both sides "unscrambles" the convolution into a product, Y(z) = H(z)X(z), so that

$$H(z) = \frac{Y(z)}{X(z)} \qquad (4.10)$$

H(z) is the output of a system divided by the input of the system and is usually called the transfer function of the system.
These are the basic properties of the z-transform that we require to understand digital
theory at this stage. A more detailed discussion on the properties of the z-transform can
be found in [1, 2].
The inverse z-transform, which recovers the sequence x(n) from X(z), can be evaluated by:
• Long division
• Partial fractions
• Contour integration
We will concentrate on the first two methods and briefly mention the third.
Example: Find the sequence corresponding to

$$X(z) = \frac{2z^2 - 0.75z}{z^2 - 0.75z + 0.125}$$

By long division:

                      2 + 0.75z^-1 + 0.3125z^-2 + ...
  z^2 - 0.75z + 0.125 | 2z^2 - 0.75z
                        2z^2 - 1.5z   + 0.25
                               0.75z  - 0.25
                               0.75z  - 0.5625 + 0.09375z^-1
                                        0.3125 - 0.09375z^-1  ...

and the first three terms in the discrete time series are

$$x(n) = \{2,\ 0.75,\ 0.3125,\ \dots\}$$
Example: The same transform can be inverted by partial fractions:

$$X(z) = \frac{2z^2 - 0.75z}{z^2 - 0.75z + 0.125} = \frac{2z^2 - 0.75z}{(z-0.5)(z-0.25)} = \frac{Az}{z-0.5} + \frac{Bz}{z-0.25}$$

Evaluating the constants gives A = 1 and B = 1, so that

$$X(z) = \frac{z}{z-0.5} + \frac{z}{z-0.25}$$

and, since the geometric sequence $g^n$ has transform z/(z − g), this gives

$$x(n) = (0.5)^n + (0.25)^n$$

Notice that this expression for x(n) reproduces the first three terms evaluated by the long division method.
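Both inversion methods can be confirmed with a few lines of numpy/scipy. In the sketch below (mine, not the book's) the power series of X(z) is generated by applying a unit impulse to the corresponding filter, and compared with the closed form obtained by partial fractions.

```python
import numpy as np
from scipy.signal import lfilter

# X(z) = (2z^2 - 0.75z)/(z^2 - 0.75z + 0.125)
#      = (2 - 0.75 z^-1)/(1 - 0.75 z^-1 + 0.125 z^-2)
num, den = [2.0, -0.75], [1.0, -0.75, 0.125]

impulse = np.zeros(8); impulse[0] = 1.0
series = lfilter(num, den, impulse)       # coefficients of the long-division series

n = np.arange(8)
closed = 0.5 ** n + 0.25 ** n             # partial-fraction result

print(series[:3])                         # [2.     0.75   0.3125]
print(np.allclose(series, closed))        # True: the two methods agree
```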
Using the Cauchy integral theorem from complex variable theory it can be shown that [2]

$$x(n) = \frac{1}{2\pi j} \oint_c X(z)\, z^{n-1}\, dz$$

where c is a closed contour within the region of convergence of X(z) and encircling the origin in the z-plane. The integral can be evaluated using the residue theorem, and hence:

$$x(n) = \sum \left[ \text{residues of } X(z)\,z^{n-1} \text{ at the poles inside } c \right]$$

In most cases the first two methods will be adequate to evaluate the inverse z-transform.
The general difference equation (4.3) is

$$y(n) = \sum_{r=0}^{M} a_r\, x(n-r) - \sum_{k=1}^{N} b_k\, y(n-k) \qquad (4.11)$$

Taking z-transforms of both sides and using the shift property (4.8):

$$Y(z) = X(z) \sum_{r=0}^{M} a_r z^{-r} - Y(z) \sum_{k=1}^{N} b_k z^{-k}$$

so that

$$Y(z)\left[ 1 + \sum_{k=1}^{N} b_k z^{-k} \right] = X(z) \sum_{r=0}^{M} a_r z^{-r}$$

The transfer function of a system is by definition the ratio of the output of the system over the input of the system, i.e.

$$H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{r=0}^{M} a_r z^{-r}}{1 + \sum_{k=1}^{N} b_k z^{-k}} \qquad (4.12)$$
The transfer function H(z) in equation (4.12) is the ratio of two polynomials in z. The values of the complex number z which satisfy the equation

$$\sum_{r=0}^{M} a_r z^{-r} = 0$$

are called the zeros of H(z), and the values of z which satisfy the equation

$$1 + \sum_{k=1}^{N} b_k z^{-k} = 0$$

are called the poles of H(z). Poles and zeros of a transfer function are an important concept in both analogue and
digital system design since the position of these singularities in either the s-domain
(analogue) or the z-domain (digital) provides the system designer with much useful
information.
Equation (4.12) is the transfer function of an IIR filter. The transfer function of an FIR filter is simply

$$H(z) = \sum_{r=0}^{M} a_r z^{-r} \qquad (4.13)$$

The z-plane is related to the s-plane of continuous system analysis by the mapping

$$z = e^{sT} \qquad (4.14)$$
[Fig.[4.2] Transformation of s-plane to z-plane]
So a point $\sigma_1 + j\omega_1$ in the s-plane transforms to a point $|e^{\sigma_1 T}|\ \angle\,\omega_1 T$ in the complex z-domain. This is illustrated in Fig.[4.2].
We can now consider the mapping between the s-domain and the z-domain for several important regions in the s-domain. When a transfer function in the s-domain is evaluated along the jω axis the frequency response of the system can be determined. Therefore it is important to relate the jω axis in the s-plane to the z-plane. When σ = 0 equation (4.14) becomes

$$z = e^{j\omega T}, \quad |z| = 1$$

The first important point is that the jω axis in the s-domain maps onto the unit circle in the z-domain. However $e^{j\omega T}$ is a periodic function, and the frequencies along the jω axis between 0 ≤ ω < 2π/T map onto the complete circle. Similarly the jω axis frequencies between 2π/T ≤ ω < 4π/T also map onto the complete unit circle in the z-domain. We therefore have the following important results:
In a continuous system the poles of any transfer function must be restricted to the left half of the s-plane if the system is to be stable, i.e. σ < 0 for all poles. What is the implication of this in the z-domain?
$$z = |e^{\sigma T}|\ \angle\,\omega T$$

Therefore each strip of width 2π/T in the left half of the s-plane (σ < 0, |z| < 1) maps inside the unit circle in the z-domain, and a digital system will be stable if the poles of the transfer function lie inside the unit circle. The right half of the s-plane (σ > 0, |z| > 1) maps outside the unit circle in the z-plane. The imaginary axis in the s-plane (σ = 0, |z| = 1) transforms to the circumference of the unit circle in the z-plane. The complete mapping between the s-plane and the z-plane is shown in Fig.[4.3].
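The stability test is immediate in software: find the poles as the roots of the denominator and check their magnitudes. A small sketch (mine, not the book's; illustrative coefficients):

```python
import numpy as np

def is_stable(den):
    """den = [1, b1, ..., bN]; the poles are the roots of z^N + b1 z^(N-1) + ... + bN."""
    poles = np.roots(den)
    return bool(np.all(np.abs(poles) < 1.0)), poles

print(is_stable([1.0, -1.2, 0.52])[0])    # True: complex pole pair with |z| ~ 0.72
print(is_stable([1.0, -2.1, 1.1])[0])     # False: a pole at z = 1.1, outside the unit circle
```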
[Fig.[4.3] Correspondence between regions in s-plane and z-plane]
Consider again the general transfer function

$$H(z) = \frac{\sum_{r=0}^{M} a_r z^{-r}}{1 + \sum_{k=1}^{N} b_k z^{-k}} \qquad (4.15)$$

The frequency response is determined by evaluating the transfer function along the unit circle and is therefore given by

$$H(e^{j\omega T}) = \frac{\sum_{r=0}^{M} a_r\, e^{-j\omega rT}}{1 + \sum_{k=1}^{N} b_k\, e^{-j\omega kT}} \qquad (4.16)$$
However it is also possible to express the transfer function in terms of its poles and zeros:

$$H(z) = G\, \frac{\prod_{r=1}^{M} (z - \alpha_r)}{\prod_{k=1}^{N} (z - \beta_k)} \qquad (4.17)$$
where:
G is a gain constant
$\alpha_r$ is the r-th zero of H(z)
$\beta_k$ is the k-th pole of H(z)
Once again, to obtain the frequency response we must set $z = e^{j\omega T}$ in equation (4.17):

$$H(e^{j\omega T}) = G\, \frac{\prod_{r=1}^{M} \left(e^{j\omega T} - \alpha_r\right)}{\prod_{k=1}^{N} \left(e^{j\omega T} - \beta_k\right)} \qquad (4.18)$$

At any particular frequency $\omega_1$, the terms $(e^{j\omega_1 T} - \alpha_r)$ in the numerator of equation (4.18) represent vectors from the r-th zero to the point $\omega_1 T$ on the unit circle, and similarly the terms $(e^{j\omega_1 T} - \beta_k)$ in the denominator are vectors from the k-th pole to the same point $\omega_1 T$ on the unit circle. This is illustrated in Fig.[4.4] for M = 2, N = 2.
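The geometric interpretation gives a direct numerical recipe: at each frequency, multiply the vectors from the zeros to the point on the unit circle and divide by the product of the vectors from the poles. The sketch below (mine, not the book's; an arbitrary pole-zero set with M = 2, N = 2) confirms that this agrees with evaluating the polynomial ratio of equation (4.16).

```python
import numpy as np

G = 1.0
zeros = np.array([1.0, -1.0])                                         # alpha_r
poles = np.array([0.8 * np.exp(1j * 0.5), 0.8 * np.exp(-1j * 0.5)])   # beta_k

w = np.linspace(0.01, np.pi, 512)          # omega*T over (0, pi]
z = np.exp(1j * w)                         # points on the unit circle

# Equation (4.18): products of vectors from the zeros and from the poles
H_geo = G * np.prod(z[:, None] - zeros, axis=1) / np.prod(z[:, None] - poles, axis=1)

# Equation (4.16): expand the factors and evaluate the polynomial ratio
H_poly = G * np.polyval(np.poly(zeros), z) / np.polyval(np.poly(poles), z)

print(np.allclose(H_geo, H_poly))          # True: the two forms are identical
```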
4.9 CONCLUSION
In this chapter the z-transform representation of discrete signals and systems has
been developed. The basic concepts have been reviewed, and the use of the transform in
evaluating the frequency response has been shown. It is seen that the z-transform
method of solving difference equations is analogous to the Laplace transform method;
i.e. both techniques first transform the equations to polynomial representations in the
corresponding complex variable and then invert them to obtain a closed-form solution.
In the next chapter the relationship between the z-transform and the discrete Fourier
transform will be established.
4.10 REFERENCES
1. J. V. Candy, Signal Processing: The Modern Approach, McGraw-Hill, 1988.
2. L. Rabiner and B. Gold, Theory and Application of Digital Signal Processing,
Prentice-Hall, New Jersey, 1975.
Chapter 5
The discrete Fourier transform
5.1 INTRODUCTION
This chapter introduces the real and complex Fourier series. The Fourier integral is
developed as a method for extracting spectral information from signals in the continuous
time domain. The Fourier transform is developed and leads to the discrete Fourier
transform. Important properties of the discrete Fourier transform and issues in its
practical usage are explained. The discrete Fourier transform is related to the
z-transform (chapter four) and other uses for it are highlighted.
A periodic signal x(t) of fundamental period T = 1/f can be expressed as the Fourier series

$$x(t) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos 2\pi n f t + b_n \sin 2\pi n f t \right) \qquad (5.1)$$
• During the fundamental period T the average of the product of two different harmonics is zero. Writing the product as a sum of two cosines, the average of the first term over period $[(m-n)f]^{-1}$ is zero and the average of the second term over period $[(m+n)f]^{-1}$ is also zero. The average of the expression must therefore be zero over $T = f^{-1}$, the period of repetition:

$$\frac{1}{T} \int_{-T/2}^{T/2} \left[ A_1 \cos(2\pi m f t) \right]\left[ A_2 \cos(2\pi n f t + \phi) \right] dt = 0, \quad m \neq n \qquad (5.2)$$
• During the fundamental period T the average of the product of two harmonics of the same frequency is half the product of their peak amplitudes multiplied by the cosine of their phase difference:

$$\frac{1}{T} \int_{-T/2}^{T/2} \left[ A_1 \cos(2\pi n f t) \right]\left[ A_2 \cos(2\pi n f t + \phi) \right] dt = \frac{1}{2} A_1 A_2 \cos\phi \qquad (5.3)$$
From these two basic rules of harmonic analysis, it can be seen that if x(t) in equation (5.1) is averaged over the interval T, all sine and cosine terms will equate to zero, i.e.:

$$\frac{1}{T} \int_{-T/2}^{T/2} x(t)\, dt = a_0$$

which gives the value of the zero frequency (dc) term in the Fourier series.
In order to find the value of $a_1$, x(t) is multiplied by cos 2πft and the average of the resulting product is taken. By equation (5.3) the average of all other terms in the product x(t) cos 2πft is zero, therefore

$$\frac{1}{T} \int_{-T/2}^{T/2} x(t) \cos 2\pi f t\, dt = \frac{a_1}{2}$$

and it follows that, in general,

$$a_n = \frac{2}{T} \int_{-T/2}^{T/2} x(t) \cos 2\pi n f t\, dt, \qquad b_n = \frac{2}{T} \int_{-T/2}^{T/2} x(t) \sin 2\pi n f t\, dt \qquad (5.4)$$
The Fourier series of x(t) contains an infinite number of sines and cosines. This can be reduced to a more compact form by use of trigonometric identities: each pair $a_n \cos 2\pi n f t + b_n \sin 2\pi n f t$ equals $(a_n^2 + b_n^2)^{1/2}\left[ \cos 2\pi n f t \cos\phi_n - \sin 2\pi n f t \sin\phi_n \right]$, and by letting $C_n = (a_n^2 + b_n^2)^{1/2}$ and $\phi_n = \tan^{-1}(-b_n/a_n)$,

$$x(t) = a_0 + \sum_{n=1}^{\infty} C_n \cos(2\pi n f t + \phi_n) \qquad (5.8)$$
Example: The voltage across a 1 ohm resistor is given by $V(t) = a_n \cos 2\pi n f t + b_n \sin 2\pi n f t$; the average power dissipated is

$$\frac{a_n^2 + b_n^2}{2} = \frac{C_n^2}{2}$$

The square of the amplitude spectrum is thus a measure of the power dissipated in a resistor of 1 ohm at the different frequencies n = 0, 1, 2, 3, etc.
Example: Find the Fourier series of a rectangular pulse train of period T, amplitude A and pulse width $t_1$ (Fig.[5.1]). Using equation (5.4),

$$a_n = \frac{2}{T} \int_{-t_1/2}^{t_1/2} A \cos 2\pi n f t\, dt = \frac{A}{\pi n f T}\left[ \sin(\pi n f t_1) - \sin(-\pi n f t_1) \right] = \frac{2A \sin(\pi n f t_1)}{\pi n f T}$$

$$b_n = \frac{A}{\pi n f T}\left[ \cos(\pi n f t_1) - \cos(-\pi n f t_1) \right] = 0$$

Here $C_n = (a_n^2 + b_n^2)^{1/2} = a_n$ and $\phi_n = \tan^{-1} 0 = 0$. The amplitude spectrum is:

$$|C_n| = \left| \frac{2A t_1}{T} \cdot \frac{\sin(\pi n f t_1)}{\pi n f t_1} \right| \qquad (5.9)$$
[Figure: the amplitude spectrum $|C_n|$ of the pulse train — lines spaced 1/T apart under a $\sin x / x$ envelope with zeros at multiples of $1/t_1$.]
The term $\sin x / x$ is known as the sinc function. This function is unity at x = 0 and has zeros at x = mπ. Equation (5.9) therefore has zeros where $f = m/t_1$. The amplitude of the dc component is $a_0 = A t_1 / T$ and the amplitudes of the harmonics follow the envelope of equation (5.9).
If $t_1 = T/2$ the pulse becomes a square wave and its amplitude spectrum is given by

$$C_n = \frac{2A \sin(\pi n f T/2)}{\pi n f T}, \quad \text{but } T = 1/f, \text{ so}$$

$$C_n = A \left| \frac{\sin(n\pi/2)}{n\pi/2} \right| \qquad (5.10)$$

so that the even harmonics vanish and the odd harmonics have amplitude $2A/n\pi$.
Sines and cosines can be written in complex exponential form:

$$\cos n\omega t = \frac{e^{jn\omega t} + e^{-jn\omega t}}{2}, \qquad \sin n\omega t = \frac{e^{jn\omega t} - e^{-jn\omega t}}{2j}$$
Substituting these into the Fourier series (5.1) gives

$$x(t) = a_0 + \sum_{n=1}^{\infty} \left[ \frac{a_n - j b_n}{2}\, e^{j2\pi n f t} + \frac{a_n + j b_n}{2}\, e^{-j2\pi n f t} \right] \qquad (5.11)$$

If we define $C_n = \frac{T}{2}(a_n - j b_n) = \int_{-T/2}^{T/2} x(t)\, e^{-j2\pi n f t}\, dt$, its complex conjugate is $C_n^* = \frac{T}{2}(a_n + j b_n)$, and we may write $C_{-n} = C_n^*$. Since

$$\sum_{n=1}^{\infty} C_n^*\, e^{-j2\pi n f t} = \sum_{n=-\infty}^{-1} C_n\, e^{j2\pi n f t},$$

and $C_0 = \int_{-T/2}^{T/2} x(t)\, e^{-j0}\, dt = a_0 T$, it follows that

$$x(t) = \frac{1}{T} \sum_{n=-\infty}^{\infty} C_n\, e^{j2\pi n f t} \qquad (5.12)$$

This is the complex form of the Fourier Series. The negative frequencies are fictitious; they arise from the use of the complex exponential form for sines and cosines. The real form of the Fourier series can be recovered from equation (5.12) by using $C_{-n} = C_n^*$:

$$x(t) = \frac{C_0}{T} + \frac{2}{T} \sum_{n=1}^{\infty} |C_n| \cos(2\pi n f t + \phi_n) \qquad (5.13)$$

Equations (5.12) and (5.13) are equivalent. In the complex case the amplitude of a harmonic at both positive and negative frequencies is given by $|C_n|/T$. In the real form the amplitude of each harmonic is $(2/T)|C_n|$. The corresponding harmonic amplitudes in the complex case are half those in the real case. This is to be expected, as only the positive frequency harmonics exist in reality; the positive and negative components summate to give the true value.
An even function has a Fourier series which contains cosine terms only, the cosine being an even function itself. We know that by definition x(t) = x(−t), and this can only be true if all $b_n = 0$, so:

$$x(t) = a_0 + \sum_{n=1}^{\infty} a_n \cos 2\pi n f t \qquad (5.15)$$

An odd function has a Fourier series which contains sine terms only, the sine being an odd function itself. We know that by definition x(t) = −x(−t), and this can only be true if all $a_n = 0$, so:

$$x(t) = \sum_{n=1}^{\infty} b_n \sin 2\pi n f t \qquad (5.16)$$
Many functions which are non-symmetric about t=0 can be made either odd or even
by addition or subtraction of a suitable delay. The Fourier Series can thus be greatly
simplified by addition of a time delay. The changing of the time axis will not affect the
relative amplitudes of the frequency components but it will affect the phase angles.
Shifting of the time axis is illustrated in Fig.[5.5].
In the first case the waveform is symmetric about t = 0 and the coefficients $C_n$ are real. In the second case the waveform is skew-symmetric about t = 0 and the values of $C_n$ will be complex, each component having a phase shift $\phi_n$. If a delay of $-t_2$ seconds is added to the second pulse train it will be identical with the first; this delay is equivalent to a phase shift in each component.
Consider now the complex Fourier coefficient

$$C_n = \int_{-T/2}^{T/2} x(t)\, e^{-j2\pi n f t}\, dt, \quad \text{where } f = 1/T$$

As T → ∞, Δf → 0, the discrete harmonics in the Fourier series merge and a continuous spectrum results, i.e. the amplitude spectrum will contain all possible frequencies. $C_n$ is now defined for all frequencies and becomes a continuous function of f, written X(f). Since there are now an infinite number of harmonics, the summation sign in (5.12) can be replaced by an integral; nf becomes a continuous frequency variable taking all possible values between ±∞, and the integrals are simply written as

$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j2\pi f t}\, dt \qquad (5.17)$$

$$x(t) = \int_{-\infty}^{\infty} X(f)\, e^{j2\pi f t}\, df \qquad (5.18)$$
The integrals in equations (5.17) and (5.18) are known as Fourier Transform pairs.
Example: Find the Fourier transform of a pulse with amplitude A and duration $t_1$ seconds.

$$X(f) = \int_{-t_1/2}^{t_1/2} A\, e^{-j2\pi f t}\, dt = A t_1\, \frac{\sin(\pi f t_1)}{\pi f t_1}$$

This is the sinc function (cf. equation (5.9)) and its spectrum is plotted in Fig.[5.6]. It should be noted that the spectral envelope of a single pulse of width $t_1$ is the same as that of a periodic pulse train of the same pulse width.
For a periodic signal, applying the transform (5.17) to the complex series (5.12) gives

$$X(f) = \sum_{n=-\infty}^{\infty} \frac{C_n}{T}\, \delta\!\left(f - \frac{n}{T}\right)$$
The Fourier transform of a periodic signal therefore consists of a set of impulses located
at harmonic frequencies of the signal. The weight (area) of each impulse is equal to the
corresponding coefficient amplitude of the complex Fourier series.
[Fig.[5.7] A continuous signal x(t) and the periodic sampling pulse train $x_s(t)$.]
Referring to Fig. [5.7] we may regard the sampling process as a multiplication of the
continuous signal x(t) by a periodic pulse train xs(t). The pulse train, being periodic,
may be expanded in a Fourier series thus
$$x_s(t) = a_0 + 2a_1 \cos(\omega_s t) + 2a_2 \cos(2\omega_s t) + \dots, \quad \text{where } \omega_s = 2\pi f_s$$

Letting $x(t) = \cos\omega t$, the sampled version $x_m(t) = x(t)\,x_s(t)$ is

$$x_m(t) = a_0 \cos\omega t + a_1\left[ \cos(\omega_s - \omega)t + \cos(\omega_s + \omega)t \right] + \dots$$
The spectrum of the sampled signal contains sum and difference components
between the spectrum of x(t) and harmonics of the sampling frequency, as shown in
Fig.[5.8].
[Fig.[5.8] Spectrum of the sampled signal: the signal spectrum repeated about harmonics of the sampling frequency.]
The diagram shows that in order to avoid distortion (aliasing) the sampling frequency $f_s$ must be at least twice the highest expected frequency f in the signal. In practice x(t) will be represented at the sampling instants by single numerical values. This is equivalent to sampling x(t) by impulses spaced uniformly at intervals of Δt. The spectrum of such a signal has components of equal amplitude spaced at frequency intervals of 1/Δt.
[Fig.[5.9] Amplitude spectrum C(n) of an impulse-sampled signal, with components at $\pm f_x$ repeated about multiples of $f_s$.]
Since x(t) has no frequency components above $f_x$, the sampling theorem states that x(t) can be completely defined by samples taken at intervals of $\Delta t = 1/(2f_x)$ (strictly speaking the signal should be sampled at $\Delta t < 1/(2f_x)$, since sampling at exactly this rate can produce aliasing between $f_x$ and $(f_s - f_x)$). In fact $f_s$ can have any value > $2f_x$, though it is convenient to let $f_s = 2f_x$.
The effect of sampling is to make the spectrum of x(t) periodic with period 2fx. This
is shown in Fig.[5.10] where the sampled version of x(t) is denoted x(kAt) and the
corresponding amplitude spectrum is denoted by C(nAf).
[Fig.[5.10] The sampled signal x(kΔt) and its periodic amplitude spectrum C(nΔf).]
$$x(k\Delta t) = \sum_n C(n)\, e^{j2\pi n k \Delta t / T}$$

(the constant 1/T has been omitted). Over the interval $-(f_x T - 1) \le n \le (f_x T - 1)$, C(n) is identical to C(nΔf), i.e.

$$x(k\Delta t) = \sum_{n=-(f_x T - 1)}^{f_x T - 1} C(n\Delta f)\, e^{j2\pi n k \Delta t / T} \qquad (5.19)$$

$$k = 0, \pm 1, \pm 2, \dots, \pm N/2 - 1$$

If N samples are taken during the interval T then T = NΔt. But $\Delta t = 1/(2f_x)$, so $N = 2 f_x T$. Equation (5.19) can therefore be rewritten.
Both C(nΔf) and $e^{j2\pi nk/N}$ are periodic, hence the range of the summation may be changed and the last equation is usually written

$$x(k) = \frac{1}{N} \sum_{n=0}^{N-1} C(n)\, e^{j2\pi n k / N} \qquad (5.20)$$

$$k = 0, 1, 2, \dots, N-1$$
Equation (5.20) contains no explicit frequency or time scale - the variables k, n and N simply take on numerical values. One difficulty arises in that should the number of samples N double, so will the number of frequency components in the spectrum. The average power in the signal x(t) is independent of N, so in order to obey Parseval's theorem equation (5.20) is shown, intentionally, preceded by the factor 1/N. In practice this factor can be omitted as the ratios of the various values of C(n) are generally of greater significance than their actual values.
The complementary transform is

$$C(n) = \sum_{k=0}^{N-1} x(k)\, e^{-j2\pi n k / N} \qquad (5.21)$$

$$n = 0, 1, 2, \dots, N-1$$

That (5.20) and (5.21) form a consistent pair can be verified by substituting (5.20) into (5.21):

$$\sum_{k=0}^{N-1} \left[ \frac{1}{N} \sum_{r=0}^{N-1} C(r)\, e^{j2\pi r k / N} \right] e^{-j2\pi n k / N} = C(n)$$

the inner sum over k vanishing except when r = n.
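The DFT pair (5.20)-(5.21) translates directly into numpy, and numpy's built-in FFT uses the same sign and 1/N conventions. A minimal sketch (mine, not the book's):

```python
import numpy as np

N = 32
k = np.arange(N)
x = np.cos(2 * np.pi * 4 * k / N)          # four complete cycles in the record

n = k.reshape(-1, 1)
W = np.exp(-2j * np.pi * n * k / N)        # matrix of e^{-j 2 pi n k / N}

C = W @ x                                  # forward DFT, equation (5.21)
x_back = (W.conj() @ C) / N                # inverse DFT, equation (5.20)

print(np.allclose(C, np.fft.fft(x)))       # True
print(np.allclose(x_back, x))              # True: the pair inverts exactly
print(np.flatnonzero(np.abs(C) > 1))       # [ 4 28]: the component and its mirror about N/2
```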
The results obtained for this type of waveform depend strongly on the duration of
the sampled waveform and on the position of the samples. Assume initially that the N
samples cover a whole number of periods of x(t). This is the particular case for which
the DFT was developed since no discontinuities are produced when the N samples are
made periodic. Under these circumstances the DFT produces the exact replica of the
continuous transform. A waveform of this type, and its spectrum is shown in Fig.[5.12].
In Fig.[5.12] there were four complete cycles of a cosine wave in the truncation interval T. This gives rise to two components in the DFT corresponding to the fourth harmonic (when written in the form of equation (5.21) the DFT produces a spectrum mirrored about N/2). The mirror image about N/2 must be taken into account when the IDFT is to be calculated from a sampled frequency function.
[Fig.[5.12] A sampled cosine wave x(kΔt), k = 0 to 31, containing four complete cycles, and its DFT magnitude C(nΔf) with components at n = 4 and n = 28, mirrored about N/2.]
[Fig.[5.13] A truncated cosine wave and the rectangular truncation waveform.]

The truncation waveform has a spectral density of the form $\sin(\pi f T)/(\pi f T)$.
The amplitude spectrum of the truncated signal is given by the convolution of the original spectrum (in this case a pair of delta functions at $\pm f_0$) with the sinc function. This causes a spreading of the original spectrum, due to the infinite duration of the sinc function, and is called leakage.
Truncation removes any band-limitation that may have been performed (because of discontinuities introduced at either end of the truncated interval), and hence it violates the sampling theorem. A distortion called aliasing is introduced into the spectrum because the repeated spectra of the sampled waveform overlap.
Aliasing can be reduced either by decreasing leakage (i.e. by increasing the
truncation interval) or by increasing the sampling frequency ([1] has a good discussion
of these DFT pitfalls).
This type of signal is not band-limited, a typical example being a square wave.
When sampled such waveforms produce aliasing in the frequency domain. There are no
general rules for choice of sampling frequency and some experimentation is necessary.
As a guide it is usual to make the sampling frequency at least ten times the 3dB bandwidth of the signal to be processed.
Special care is required if a discontinuity exists within the truncation interval. If the
discontinuity does not coincide with a sampling point the exact position of the
discontinuity (and thus the pulse width in the case of a square wave) will be undefined.
This will result in spectral errors.
If possible, when a discontinuity coincides with a sampling instant the sample should
be given the mean value at the discontinuity. The use of the mean value is justified in
terms of the inverse Fourier transform.
Such waveforms are not periodic and are usually not band-limited either. It is
necessary to first truncate the waveform and to make the truncated waveform periodic.
This can produce considerable leakage and the spectrum obtained via the DFT requires
careful interpretation.
This is a function of the truncation interval T and need not necessarily be related to
the spectral components of the sampled waveform. The spectral resolution can be
increased at the expense of processing-time by augmenting the data points with zeros.
This does not of course affect leakage which is dependent on truncation of the true
signal sequence. It should be noted that adding zeros to the signal x(t) will not resolve hidden spectral components despite the increase in resolution. This is because zero padding only enhances previously visible components by an interpolation of spectral values using discrete sinusoidal functions [1]. True resolution increases are only achieved by a reduction of the sampling interval.
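The effect of zero padding is easy to demonstrate. In the sketch below (mine, not the book's; fwhm is an ad-hoc helper) a 32-sample record is padded to 256 points: the spectral samples become denser, but the width of the spectral peak, which is fixed by the true record length, does not change; only a genuinely longer record narrows it.

```python
import numpy as np

tone = lambda k: np.cos(2 * np.pi * 0.25 * k)     # tone at 0.25 cycles/sample

def fwhm(spec, npts):
    """Half-maximum width of the dominant peak, in cycles/sample."""
    return (spec >= spec.max() / 2).sum() / npts

padded = np.abs(np.fft.rfft(tone(np.arange(32)), n=256))   # 32 samples, zero-padded
longer = np.abs(np.fft.rfft(tone(np.arange(256))))         # 256 genuine samples

print(fwhm(padded, 256))   # ~0.04: set by the 32-sample record; padding does not narrow it
print(fwhm(longer, 256))   # ~0.004: narrower because the record is genuinely longer
```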
Leakage in the frequency domain results from the convolution of the signal spectrum and the spectrum of the truncating waveform. The spectrum of a rectangular truncating waveform has a sinc function envelope with zeros at intervals of 1/T.
Clearly if T (which is not necessarily equal to NΔt) is increased, the width of the sinc function is reduced, thereby reducing leakage. It is not always feasible to increase T as this requires a pro rata increase in the number of samples N.
One alternative is the use of a window function [2] which has an amplitude spectrum which decays faster than the sinc function. A widely used truncating waveform is the raised cosine pulse, also known as the Hanning window function. The pulse waveform is given by

$$w(t) = \frac{1}{2}\left( 1 + \cos\frac{2\pi t}{T} \right), \quad |t| \le T/2$$

and its spectrum by

$$W(f) = \frac{T}{2} \cdot \frac{\sin \pi f T}{\pi f T\, (1 - f^2 T^2)}$$
The effect of the window is to remove the discontinuities at the edges of the
truncation interval. Other window functions also exist, e.g. the triangular window
function, Kaiser and the 50% raised cosine function.
The significant feature of the these window functions is that the side lobes decrease
faster than the rectangular case. The penalty paid for this reduction, however, is the
increase in the width of the main lobe. In general the smaller the amplitude of the side
lobes the greater the width of the main lobe.
The effect of increasing the width of the main lobe is to reduce the effective resolution of the DFT, i.e. the spectral peak becomes broader and less well defined. There is a trade-off, therefore, between leakage reduction and spectral resolution.
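The trade-off is visible in a few lines. This sketch (mine, not the book's) applies a rectangular and a Hanning window to a tone lying midway between bin centres, the worst case for leakage, and prints the level of the spectrum far from the peak.

```python
import numpy as np

N = 64
k = np.arange(N)
x = np.cos(2 * np.pi * 10.5 * k / N)            # tone midway between bins 10 and 11

hann = 0.5 * (1 - np.cos(2 * np.pi * k / N))    # raised cosine (Hanning) window

for name, w in [("rectangular", np.ones(N)), ("hanning", hann)]:
    spec = np.abs(np.fft.rfft(x * w))
    spec_db = 20 * np.log10(spec / spec.max())
    print(name, round(float(spec_db[20:].max()), 1), "dB leakage far from the peak")
    # rectangular: roughly -25 dB; Hanning: far lower, at the cost of a wider main lobe
```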
The DFT of equation (5.21),

$$C(n) = \sum_{k=0}^{N-1} x(k)\, e^{-j2\pi n k / N},$$

is equivalent to the output of a non-recursive digital filter (see chapter eight) in which the tap weights are given by

$$A(k) = e^{-j2\pi n k / N}$$

The z-transform of this filter is

$$A(z) = \sum_{k=0}^{N-1} e^{-j2\pi n k / N}\, z^{-k}$$

and, noting that $\sum_{k=0}^{N-1} g^k z^{-k} = \dfrac{1 - g^N z^{-N}}{1 - g z^{-1}}$ with $g = e^{-j2\pi n / N}$ and $g^N = 1$, this gives

$$A(z) = \frac{1 - z^{-N}}{1 - z^{-1}\, e^{-j2\pi n / N}}$$

Setting $z = e^{j\omega T}$ gives the frequency response

$$A(\omega) = e^{-j\omega T (N-1)/2}\; e^{j\pi n / N}\; \frac{\sin(N\omega T/2)}{\sin(\omega T/2 + \pi n / N)}$$
This function essentially describes the frequency response of N digital filters. The input to each filter is the wideband original signal. The response of each of these filters is similar in shape to a sinc function. The amplitude at any one filter output will be the sum of all components falling within the filter passband. This clearly demonstrates the effect of performing a DFT on a non band-limited signal.
The z-transform of a finite sequence x(n) of length N is

$$X(z) = \sum_{n=0}^{N-1} x(n)\, z^{-n}$$

Evaluation of this equation at the point $z = e^{j2\pi k/N}$, i.e. at a point on the unit circle with angle 2πk/N, gives

$$X(e^{j2\pi k/N}) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi n k / N} \qquad (5.22)$$
Comparison of the DFT equation (5.21) with equation (5.22) shows equivalence. Thus the DFT coefficients of a finite duration sequence are the values of the z-transform of that same sequence at N evenly spaced points around the unit circle. More importantly, the DFT coefficients of a finite duration sequence are a unique representation of that
sequence because the IDFT (equation (5.20)) can be used to reconstruct the desired
sequence exactly from the DFT coefficients. So the DFT transform pair can be used to
represent finite duration sequences.
Before we leave the subject of the DFT, it must be stated that the DFT is calculated in the computer by a very efficient algorithm called the fast Fourier transform (FFT, see chapter six). The FFT takes advantage of the symmetry of calculation on the unit circle. It reduces the $N^2$ calculations required by the DFT to about $N\log_2 N$ calculations. Since the FFT provides a very efficient method of computing the DFT and IDFT, it becomes attractive to use the DFT in computing other functions.
Power Spectrum

$$S(n) = |C(n)|^2$$

Autocorrelation

$$R(k) = \frac{1}{N} \sum_{n=0}^{N-1} S(n)\, e^{j2\pi n k / N}$$

Convolution

$$F(m) = \sum_{k=0}^{N-1} x_1(k)\, x_2(m-k), \quad m = 0, 1, 2, \dots, M-1$$

the maximum lag being N−1. The FFT algorithm may be employed to advantage by computing the DFTs of $x_1$ and $x_2$, multiplying them together and taking the inverse transform of the product to give F(m), i.e. convolution in the time domain = multiplication in the frequency domain.
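A minimal demonstration of fast convolution (mine, not the book's). The sequences are padded to the length of the linear convolution so that the circular convolution implied by the DFT coincides with the ordinary one.

```python
import numpy as np

rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=100), rng.normal(size=40)

L = len(x1) + len(x2) - 1                          # length of the linear convolution
F = np.fft.rfft(x1, n=L) * np.fft.rfft(x2, n=L)    # multiplication in the frequency domain
fast = np.fft.irfft(F, n=L)                        # inverse transform of the product

print(np.allclose(fast, np.convolve(x1, x2)))      # True: identical to direct convolution
```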
5.12 CONCLUSION
It has been shown that under certain conditions the DFT produces an acceptable approximation to the continuous Fourier transform of a signal x(t). The properties of the DFT, and several of its pitfalls in practical use, have been discussed and methods of coping with these problems have been put forward. The FFT algorithm enables the DFT to become a very powerful tool for the digital signal processor.
5.13 REFERENCES
1. J. V. Candy, Signal Processing: The Modern Approach, McGraw-Hill, 1988.
2. L. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Prentice-Hall, New Jersey, 1975.
3. E. Brigham, The Fast Fourier Transform, Prentice-Hall, New Jersey, 1974.
4. J. V. Candy, Signal Processing: The Model-Based Approach, McGraw-Hill, 1986.
Chapter 6
The FFT and other methods of discrete
Fourier analysis
S. Kabay
6.1 INTRODUCTION
The Fourier transform, or spectrum, of a continuous signal x(t) is defined as:

$$X(j\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \qquad (6.1)$$
Both x(t) and X(jω) may be complex functions of a real variable. The basic property of the Fourier transform is its ability to distinguish waves of different frequencies that have been additively combined. For instance, a sum of sine waves overlapping in time transforms into a weighted sum of impulses which, by definition, are non-overlapping. In signal processing terminology, the Fourier transform is said to represent a signal in the frequency domain, and ω, the argument of the Fourier transform, is referred to as the angular frequency.
The discrete Fourier transform (DFT) is used to approximate the Fourier transform of a discrete signal. We take the continuous function x(t) and represent it as a sequence of N samples x(nT), where 0 ≤ n ≤ N−1 and T is the inter-sample time interval. The time function given by

$$x_s(t) = \sum_{n=0}^{N-1} x(nT)\, \delta(t - nT) \qquad (6.2)$$

where δ(t) is the Dirac delta function, represents an impulse train $x_s(t)$ whose amplitude is modulated by x(t) at intervals of T (Oppenheim and Schafer, p82, 1989).
Similarly, let the spectrum X(jω) be represented by the samples X(kω), 0 ≤ k ≤ N−1, where ω is the chosen increment between samples in the frequency domain. The DFT becomes:

$$X_s(k\omega) = \sum_{n=0}^{N-1} x(nT)\, e^{-j\omega T n k} \qquad (6.3)$$

where ω = 2π/NT. The latter formula yields a periodic sequence of numbers with period N.
Alternatively, the discrete Fourier transform may be thought of as an evaluation of the z-transform of the finite sequence x(nT) at N points in the z-plane, all equally spaced along the unit circle at angles of kωT radians (Rabiner and Gold, p390, 1975). In digital signal processing we consider only discrete samples of both the time function and the spectrum, and only a finite number of samples of each.
Direct calculation of the DFT requires $N^2$ complex multiplications. In the past, such intensive calculation was limited to large computer systems. However, major developments in a class of efficient algorithms known as the fast Fourier transform (FFT) for computing the DFT have dramatically reduced the computational load imposed by the method. First reported by Cooley and Tukey (1965), the FFT algorithm now has many implementations with various levels of performance and efficiency.
The FFT is based on the concept of sub-dividing a large computational problem into a
large number of sub-problems which can be solved more easily. This approach has
allowed for computation of the DFT on relatively inexpensive computers. The
implementation of a particular FFT algorithm will depend on both the available
technology and the intended application. In general, the speed advantage of the FFT is
such that, in many cases, it is more efficient to implement a time-domain calculation by
transforming the analysis into the frequency-domain, and inverse transforming the result
back into the time-domain.
This chapter will consider the radix-2 FFT algorithm, where the total number of data samples is an integer power of two, and discusses its application to spectral analysis of signals.
The direct evaluation of the DFT equation (6.3) requires $N^2$ complex multiplications and additions, and for large N (>1024 points) this direct evaluation can be too time-consuming. The FFT reduces the number of computations to the order of $N\log_2(N)$, where N is an integer power of two. For example, with a sample length of N = 1024, a computational saving of 99% is achieved. Efficient application of the algorithms requires that N be highly composite, but for most problems it is possible to impose this restriction on the data to be transformed so that FFT algorithms may be used.
In FFT derivations the DFT is usually written

$$X_k = \sum_{n=0}^{N-1} x_n\, W^{nk} \qquad (6.4)$$

where $W = e^{-j2\pi/N}$.
Fast algorithms can be developed for any value of N that can be expressed as a product of prime factors; however, the algorithms suffer from the limitation of being specific to that particular value of N. One variation on the FFT is that due to Winograd (Burrus and Parks, 1985), which uses more additions but fewer multiplications than the FFT. However, this method offers no real advantage, since DSP devices are now as efficient at multiplication as they are at addition. For cases where N is not a power of two, it is more efficient to pad the sequence with zeros up to the nearest power of two.
In-place FFT algorithms make optimal usage of memory since they only require enough storage to hold the DFT input sequence. As the calculation proceeds between stages, the results are overwritten to the original input buffer. Two classes of in-place algorithm exist. In the decimation in time (DIT) method, the transforms of shorter sequences, each composed of every r-th sample, are computed and then combined into one big transform. In the decimation in frequency (DIF) method, short pieces of the sequence are combined to form r short sequences, whose separate transforms taken together constitute the complete transform.
6.3.1 Decimation in Time

The input sequence $x_n$ is divided into two shorter sequences composed of its even- and odd-numbered samples:

$$g_l = x_{2l} \qquad (6.5)$$

$$h_l = x_{2l+1} \qquad (6.6)$$

for l = 0, 1, ..., (N/2)−1.
The DFTs of these sequences may be regarded as periodic sequences with period N/2.
These DFTs may be written:
$$G_k = \sum_{l=0}^{N/2-1} g_l\, W^{2lk} \qquad (6.7)$$

$$H_k = \sum_{l=0}^{N/2-1} h_l\, W^{2lk} \qquad (6.8)$$
We are interested in the DFT of the whole sequence, which can be written in terms of $g_l$ and $h_l$ as:

$$X_k = \sum_{l=0}^{N/2-1} \left( g_l W^{2lk} + h_l W^{(2l+1)k} \right) \qquad (6.9)$$

$$\Rightarrow X_k = \sum_{l=0}^{N/2-1} \left( g_l W^{2lk} + W^k h_l W^{2lk} \right)$$

$$\Rightarrow X_k = G_k + W^k H_k \qquad (6.10)$$
The last relationship has important computational implications. The direct method of computing $G_k$ and $H_k$ would require $(N/2)^2$ operations each, and an extra N operations are required to form $X_k$, for a total of $N + 2(N/2)^2$ operations. Direct computation of $X_k$ would require $N^2$, and so, for large N, the last relationship reduces the computation by a factor of 2.

The index k runs from 0 to N−1. However, $G_k$ and $H_k$ have a period N/2 and need to be computed only for the range 0 to N/2−1. Hence, a full description of $X_k$ is given by:

$$X_k = G_k + W^k H_k \qquad (0 \le k \le N/2-1) \qquad (6.11)$$

$$X_k = G_{k-N/2} + W^k H_{k-N/2} \qquad (N/2 \le k \le N-1) \qquad (6.12)$$
The real power of the above relationship is realised when it is used recursively. Where N is an integer power of 2, it follows that $G_k$ and $H_k$ can in turn be reduced in the same way as the computation of $X_k$. Thus, $G_k$ and $H_k$ are computed from the N/4-point transforms of their even and odd numbered sub-sequences. In this way, the computation of $X_k$ is successively reduced until we reach the stage where N/2 DFT computations of two-sample sequences are required. Fig. 6.1(a) shows the data flow diagram for an 8-point DIT FFT calculation.
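The recursion described by equations (6.7)-(6.12) can be written out almost verbatim. The sketch below is an editor's illustration, not the book's code: a compact (rather than fast) recursive radix-2 DIT FFT, checked against numpy's implementation.

```python
import numpy as np

def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return x
    G = fft_dit(x[0::2])       # DFT of even-numbered samples, equation (6.7)
    H = fft_dit(x[1::2])       # DFT of odd-numbered samples, equation (6.8)
    Wk = np.exp(-2j * np.pi * np.arange(N // 2) / N)     # twiddle factors W^k
    return np.concatenate([G + Wk * H,      # X_k for 0 <= k < N/2, equation (6.11)
                           G - Wk * H])     # equation (6.12), since W^(k+N/2) = -W^k

x = np.random.default_rng(3).normal(size=64).astype(complex)
print(np.allclose(fft_dit(x), np.fft.fft(x)))            # True
```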
6.3.2 Decimation in Frequency
The DFT of a sequence $x_l$ of length N points is considered as the DFTs of two sequences of length N/2 points, $g_l$ and $h_l$, where $g_l = x_l$ and $h_l = x_{l+N/2}$ are the first and second halves of the input sequence. Then:

$$X_k = \sum_{l=0}^{N/2-1} g_l\, W^{lk} + \sum_{l=0}^{N/2-1} h_l\, W^{(l+N/2)k} \qquad (6.13)$$

$$\Rightarrow X_k = \sum_{l=0}^{N/2-1} \left( g_l + (-1)^k h_l \right) W^{lk} \qquad (6.14)$$

since $W^{Nk/2} = (-1)^k$. Taking the even and odd frequency components of $X_k$ separately, replacing k by 2k gives:

$$X_{2k} = \sum_{l=0}^{N/2-1} (g_l + h_l)\, (W^2)^{lk} \qquad (6.15)$$

and replacing k by 2k + 1:

$$X_{2k+1} = \sum_{l=0}^{N/2-1} (g_l - h_l)\, W^l\, (W^2)^{lk} \qquad (6.16)$$
Equations (6.15) and (6.16) are the results of N/2-point DFTs performed on the sum and difference of the first and second halves of the original input sequence. Fig. 6.1(b) shows the data flow diagram for an 8-point DIF FFT calculation.
[Fig. 6.2: the DIT and DIF butterfly operations, each forming the pair of outputs $A + W^k B$ and $A - W^k B$ from inputs A and B.]
There are two features worth noting from Fig. 6.1. First, the DFT calculation makes
repeated use of the butterfly operation shown in Fig. 6.2. The butterfly is fundamental to
the FFT and its execution speed can be used to benchmark the performance of the FFT
algorithm. Secondly, the data sequence is scrambled during the course of the calculation.
In the DIT algorithm the input is bit-reversed and the output is in correct order. Also, the
DIT butterfly is twiddled before the DFT. The reverse is true for the DIF algorithm. Fig.
6.2 shows the DIT and DIF butterfly representations.
6.4 BIT-REVERSAL
For in-place computation of the radix-2 FFT, the data sequence must be scrambled to
correct for the repeated movement of odd-numbered members of the input data to the
end of the sequence during the development of the algorithm. The data scrambling is
known as bit-reversal because the sample whose time index is given by the binary number a₂a₁a₀ must be exchanged with the sample at location a₀a₁a₂. Many digital
signal processor devices incorporate bit-reversed addressing to overcome the
compromise between speed, memory and versatility associated with bit-reversal in
software.
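As an illustrative sketch of the software alternative (not taken from the original text), the following C routine performs the bit-reversed reordering for a power-of-two length n:

/* Swap each sample with the one at its bit-reversed index.
   n must be a power of two; re[]/im[] hold the complex sequence. */
void bit_reverse(double re[], double im[], int n)
{
    for (int i = 0, j = 0; i < n; i++) {
        if (i < j) {                       /* swap each pair only once */
            double t;
            t = re[i]; re[i] = re[j]; re[j] = t;
            t = im[i]; im[i] = im[j]; im[j] = t;
        }
        /* advance j as a bit-reversed counter */
        int bit = n >> 1;
        while (j & bit) { j ^= bit; bit >>= 1; }
        j |= bit;
    }
}

A device with bit-reversed addressing performs the equivalent reordering in hardware, at no instruction cost.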
Other variations to the radix-2 FFT algorithm, which use a more complex data flow
arrangement, eliminate bit-reversal altogether (Stockham, 1966). Although both the
input data and the resulting spectrum are in natural order, extra memory is required to
duplicate the input data sequence.
Table 6.1 shows the execution speed for a 1024-point complex FFT on various computer systems (all machines fitted with a floating-point accelerator where appropriate). This is intended to provide an informal comparison of the order of performance across architectures.
Since the use of the FFT requires finite-length signals, windowing must be applied in advance of the analysis. The spectral resolution of this method increases with the window length. Truncation or windowing of a signal inherently produces sharp discontinuities in the time domain or, equivalently, convolves the spectrum with a sin(x)/x function with characteristic side-lobes in the frequency domain (Oppenheim and Schafer, p702, 1989). The side-lobes are responsible for the distortion which tends to leak energy across spectral components. Spectral leakage can be significantly reduced by selecting a window function with minimal side-lobe characteristics. However, the more one reduces leakage, the lower the spectral resolution, due to smearing of the spectrum.
The periodogram uses the FFT to estimate the power spectrum of a stationary random
signal (Oppenheim and Schafer, p730, 1989). This is calculated from the squared
magnitude of the FFT of a segment of the signal. For large data sequences, the accuracy
of the periodogram estimate can be improved by dividing the signal into shorter
segments and averaging the associated periodograms. Alternatively, the FFT can be
used to estimate the autocorrelation function. Applying a window to the autocorrelation estimates, followed by the FFT, results in the smoothed periodogram, which is a good spectral estimate.
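As a minimal sketch of segment averaging (an illustration, not code from this chapter), the following C routine forms the averaged periodogram of a long record; a direct DFT is used here for clarity, though in practice each segment transform would use the FFT.

#include <math.h>

/* Averaged periodogram: split x[] (length total) into segments of
   length nseg and average their squared-magnitude spectra into p[]. */
void avg_periodogram(const double x[], int total, int nseg, double p[])
{
    int nsegs = total / nseg;
    for (int k = 0; k < nseg; k++) p[k] = 0.0;
    for (int s = 0; s < nsegs; s++) {
        const double *seg = x + s * nseg;
        for (int k = 0; k < nseg; k++) {          /* direct DFT, bin k   */
            double re = 0.0, im = 0.0;
            for (int n = 0; n < nseg; n++) {
                double a = -2.0 * M_PI * k * n / nseg;
                re += seg[n] * cos(a);
                im += seg[n] * sin(a);
            }
            p[k] += (re * re + im * im) / nseg;   /* segment periodogram */
        }
    }
    for (int k = 0; k < nseg; k++) p[k] /= nsegs; /* average the segments */
}

Averaging over more, shorter segments reduces the variance of the estimate at the expense of spectral resolution.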
A further discussion of the FFT and its application to identifying respiratory system
dynamics of patients receiving anaesthesia can be found in Chapter 22 of this book.
[Fig. 6.3: model of the dynamic system — input x, transfer function B(z)/A(z), output y]
From the model of the dynamic system shown in Fig. 6.3, we assume that y is a stationary stochastic process given by

$y_i = \hat{y}_i + v_i$  (6.18)

where $\hat{y}_i$ is a deterministic process (predictable from its past values) and $v_i$ is a non-deterministic moving average (MA) process given by:

$v_i = \sum_{k=0}^{q} b_k x_{i-k}$  (6.19)

Both $\hat{y}_i$ and $v_i$ are stationary and mutually uncorrelated signals. Equation (6.19) describes a time series model with a finite set of q parameters. Applying the Z-transform to equation (6.19) we get an expression for the transfer function B(z), where:

$V(z) = \sum_{k=0}^{q} b_k z^{-k} X(z) = B(z) X(z)$  (6.20)
Then:

$X(z) = (1 + a_1 z^{-1} + a_2 z^{-2} + \ldots + a_p z^{-p})\, Y(z) = A(z)\, Y(z)$  (6.21)

$\Rightarrow y_i = -\sum_{k=1}^{p} a_k y_{i-k} + x_i$  (6.22)
From equation (6.22), we see that the present $y_i$ is a linear combination of the past outputs $y_{i-k}$ and the present input $x_i$. This can be considered as a linear prediction equation for $y_i$, with $x_i$ as a prediction error. Thus, A(z) is a pre-whitening filter, transforming a signal y into white noise x. The inverse power spectral density of y is then given by the squared frequency response of A(z). Combining the MA and AR descriptions gives the general ARMA model:

$y_i = -\sum_{k=1}^{p} a_k y_{i-k} + \sum_{k=0}^{q} b_k x_{i-k}$  (6.23)
Parametric spectral analysis results in a mathematical model which can be used for
classification, identification or other purposes. An analytical description of the process
can lead to some insight about the structure of the system under analysis.
The parametric method achieves a relatively high data reduction. The result is often a
smoothed, high resolution spectral estimate derived from a limited data set. However,
the recursive nature of parametric analysis results in a computational effort that is
comparable with Fourier methods.
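To make the pre-whitening interpretation concrete, here is a small illustrative C function (a sketch under the stated assumptions, not from the chapter) evaluating the all-pole spectral estimate implied by equation (6.22), P(ω) = σ²T/|A(e^{jωT})|², from given coefficients a_k and prediction-error variance σ²:

#include <math.h>

/* AR (all-pole) spectral estimate at radian frequency w.
   a[1..p] are the coefficients of A(z) = 1 + a1 z^-1 + ... + ap z^-p,
   var is the prediction-error variance, T the sample period. */
double ar_psd(const double a[], int p, double var, double T, double w)
{
    double re = 1.0, im = 0.0;            /* A(e^{jwT}), starting from 1 */
    for (int k = 1; k <= p; k++) {
        re += a[k] * cos(k * w * T);
        im -= a[k] * sin(k * w * T);      /* e^{-jkwT} contribution */
    }
    return var * T / (re * re + im * im); /* sigma^2 T / |A|^2 */
}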
6.8 BIBLIOGRAPHY
Brigham, E.O. (1988): The fast Fourier transform and its applications, Prentice Hall,
Englewood Cliffs, NJ.
Burg, J.P. (1975): "Maximum entropy spectral analysis". PhD Thesis, Stanford University.
Burrus, C.S. and Parks, T.W. (1985): DFT/FFT and convolution algorithms: theory and implementation, John Wiley and Sons, New York.
Cooley, J.W., Lewis, P., and Welch, P. (1967): "The fast Fourier transform and its
applications". IBM Res. Paper, RC-1743.
Cooley, J.W. and Tukey, J.W. (1965): "An algorithm for the machine computation of
complex Fourier series". Math. Comput., 19: 297-301.
Danielson, G.C. and Lanczos, C. (1942): "Some improvements in practical Fourier analysis, and their application to x-ray scattering from liquids". J. Franklin Inst., 233: 365-380.
DSP Committee (1976): Selected papers in digital signal processing, IEEE ASSP, IEEE
Press, New York.
DSP Committee (1979): Programs for digital signal processing, IEEE ASSP, IEEE
Press, New York.
Gersch, W. and Liu, R.S.-Z. (1976): "Time series methods for the synthesis of random
vibration systems". Trans. ASME, J. Appl. Mech., 43(1): 159-165.
Hildebrand, F.B. (1956): Introduction to numerical analysis, McGraw-Hill, New York.
Methods of discrete Fourier analysis 77
Jones, N.B. (Ed.) (1982): Digital signal processing, IEE Control Engineering Series 22,
Peter Peregrinus, England.
Linkens, D.A. (1982): "Non-Fourier methods of spectral analysis", In: Digital signal
processing, (Ed.) N.B. Jones, IEE Control Engineering Series 22, 213-254.
Oppenheim, A.V. and Schafer, R.W. (1989): Discrete-time signal processing, Prentice
Hall, Englewood Cliffs, NJ.
Papamichalis, P. and So, J. (1986): "Implementation of fast Fourier transform algorithms
with the TMS32020". DSP Applications with the TMS320 Family, Texas
Instruments, p32.
Rabiner, L.R. and Gold, B. (1975): Theory and application of digital signal processing,
Prentice Hall, Englewood Cliffs, NJ.
Stockham, T.G. (1966): "High speed convolution and correlation". AFIPS Proc, 28: 229-
233.
Van den Bos, A. (1971): "Alternative representation of maximum entropy spectral
analysis". IEEE Trans., IT-17(4): 493-494.
Wold, H. (1938): A study in the analysis of stationary time series, Almqvist and Wiksell, Uppsala.
Chapter 7

Introduction to digital filters

7.1 INTRODUCTION
[Figure 7.1: digital filter system — the continuous input x(t) passes through an anti-aliasing filter and an A/D converter to give the sampled signal x*(t); an MPU with ROM/RAM and external peripheral chips computes the output samples y*(t), which a D/A converter and reconstruction filter turn back into the continuous output signal y(t)]
It has been shown that digital filters process sampled signals, and that consequently the existence of the complementary spectra (Figure 7.2) dictates that the input signal must be bandlimited to a maximum value of ω_s/2 radians per second. This is normally achieved using a suitable analogue lowpass filter (e.g. Butterworth or Chebyshev), which is the anti-aliasing filter.
[Figure 7.2: frequency spectrum of the sampled signal x*(t) — the baseband spectrum of x(t) together with complementary spectra repeated about multiples of the sampling frequency]
The value of x(t) is only known for t = kT, therefore we may write

$\mathcal{L}[x^*(t)] = \sum_{k=0}^{\infty} x(kT) \exp(-ksT)$  (7.1)

Substituting $z = \exp(sT)$:

$X(z) = \sum_{k=0}^{\infty} x(kT)\, z^{-k}$  (7.2)

$= \sum_{k=0}^{\infty} x_k z^{-k}$  (7.3)
The input and output signals of a digital filter are related through the convolution-summation process, which is defined as

$y_k = \sum_{i=0}^{\infty} g_i x_{k-i}$  (7.4)

where $y_k$ is the filter output sequence and $g_i$ is the filter impulse response sequence.

The ratio Y(z)/X(z), equal to G(z), is the transfer function of the digital filter.

It follows that

$G(z) = \frac{Y(z)}{X(z)} = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_p z^{-p}}{1 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_q z^{-q}}$  (7.5)

i.e. $a_0 X(z) + a_1 X(z) z^{-1} + \cdots + a_p X(z) z^{-p} = Y(z) + b_1 Y(z) z^{-1} + \cdots + b_q Y(z) z^{-q}$  (7.6)

but $z^{-k}$ corresponds to a delay equal to k sampling periods, consequently equation (7.6) may be written in a linear difference equation form:

$y_k = a_0 x_k + a_1 x_{k-1} + \cdots + a_p x_{k-p} - b_1 y_{k-1} - b_2 y_{k-2} - \cdots - b_q y_{k-q}$  (7.7)

[Figure 7.4: Block diagram of an IIR digital filter — note the FEEDBACK paths through the delayed outputs $y_{k-1}, \ldots, y_{k-q}$]

For the second-order case

$G(z) = \frac{Y(z)}{X(z)} = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2}}{1 + b_1 z^{-1} + b_2 z^{-2}}$  (7.8)

and it follows that

$y_k = a_0 x_k + a_1 x_{k-1} + a_2 x_{k-2} - b_1 y_{k-1} - b_2 y_{k-2}$  (7.9)
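Equation (7.7) maps directly into code. The following C sketch is illustrative only; it keeps the chapter's convention of a-coefficients acting on the input and b-coefficients on past outputs, and the delay-line handling is one of several equivalent arrangements.

#define P 2   /* numerator order  (a0..aP act on the input)        */
#define Q 2   /* denominator order (b1..bQ act on past outputs)    */

/* One step of y_k = sum a_i x_{k-i} - sum b_j y_{k-j}.  b[0] is unused.
   xh[] and yh[] are delay lines holding past inputs/outputs. */
double filter_step(double xk, const double a[P + 1], const double b[Q + 1],
                   double xh[P + 1], double yh[Q + 1])
{
    double yk = 0.0;
    for (int i = P; i > 0; i--) xh[i] = xh[i - 1];   /* shift input history  */
    xh[0] = xk;
    for (int i = 0; i <= P; i++) yk += a[i] * xh[i]; /* feedforward terms    */
    for (int j = 1; j <= Q; j++) yk -= b[j] * yh[j]; /* feedback terms       */
    for (int j = Q; j > 1; j--) yh[j] = yh[j - 1];   /* shift output history */
    yh[1] = yk;
    return yk;
}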
7.3 REFERENCES
Chapter 8

FIR filter design methods

8.1 INTRODUCTION
Compared with an IIR filter, the FIR counterpart will generally use
more memory and arithmetic for its implementation. The technique
of frequency denormalisation (Chapter 9) is not generally suitable for
FIR filter design because a recursive filter will normally be produced.
The one exception is the lowpass to highpass transformation, in
which the linear phase characteristic is preserved.
We will assume that the FIR filter coefficients are all equal to unity, and correspondingly the impulse response sequence of the N-term moving averager is therefore

$g_k = \{1, 1, \ldots, 1\}$  (N terms)

$G(z) = \sum_{k=0}^{N-1} z^{-k} = \frac{1 - z^{-N}}{1 - z^{-1}}$

$G(\exp(j\omega T)) = \frac{\sin(N\omega T/2)}{\sin(\omega T/2)}\, \exp\!\left(-j\frac{(N-1)\omega T}{2}\right)$
ILLUSTRATIVE EXAMPLE

[Figure: magnitude/frequency characteristics |G(exp(jωT))| of the moving averager, with the worked solution based on the null frequency f = 1/(NT) Hz]
$G(z) = \sum_{i=0}^{N-1} g_i z^{-i}$

Using the inverse discrete Fourier transform (IDFT) relationship we obtain

$G(z) = \sum_{i=0}^{N-1} \left[ \frac{1}{N} \sum_{n=0}^{N-1} G_n \exp(j2\pi n i/N) \right] z^{-i}$

$\Rightarrow G(z) = \frac{1 - z^{-N}}{N} \sum_{n=0}^{N-1} \frac{G_n}{1 - \exp(j2\pi n/N)\, z^{-1}}$  (8.2)

where the frequency samples are taken at $\omega_n = 2\pi n/(NT)$.
ILLUSTRATIVE EXAMPLE

Specification:

[Figure 8.2: Desired frequency response |G_d(exp(jωT))| — unity at the frequency samples n = 0, 1 and 7, zero at n = 2 to 6]
Solution:

In this case N = 8, and substituting this value in equation (8.2) gives

$G(z) = \frac{1 - z^{-8}}{8} \sum_{n=0}^{7} \frac{G_n}{1 - \exp(j\pi n/4)\, z^{-1}}$

but $G_0 = G_1 = G_7 = 1$ and the remaining samples are zero, so

$G(z) = \frac{1 - z^{-8}}{8} \left[ \frac{1}{1 - z^{-1}} + \frac{1}{1 - \exp(j\pi/4) z^{-1}} + \frac{1}{1 - \exp(-j\pi/4) z^{-1}} \right]$

This reduces to

$G(z) = \frac{1 - z^{-8}}{8} \left[ \frac{1}{1 - z^{-1}} + \frac{2 - \sqrt{2}\, z^{-1}}{1 - \sqrt{2}\, z^{-1} + z^{-2}} \right]$
ILLUSTRATIVE EXAMPLE

[Figure 8.3: frequency samples G₀ to G₁₅ of the desired response, plotted from ω = 0 through ω = ω_s/2 to ω = ω_s]
The ideal frequency response is sampled at intervals of 1/(NT) Hz, and the impulse response sequence is related to the frequency samples by the inverse discrete Fourier transform (IDFT), which is

$g_i = \frac{1}{N} \sum_{n=0}^{N-1} G_n \exp(j2\pi n i/N)$  (8.3)

For the frequency samples

$\{1, 1, 1, 1, 0.5, 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 1, 1\}$

the resulting impulse response coefficients are

$\{0.5, 0.314, 0, -0.094, 0, \ldots\}$
Applying the Hamming window

$w_n = 0.54 + 0.46 \cos(n\pi/4)$

gives the windowed coefficients

$\{0.5, 0.27, 0, -0.02, 0\}$

and hence the full, symmetric impulse response

$\{0, -0.02, 0, 0.27, 0.5, 0.27, 0, -0.02, 0\}$

$G(z) = -0.02 + 0.27 z^{-2} + 0.5 z^{-3} + 0.27 z^{-4} - 0.02 z^{-6}$
[Figure 8.4: Magnitude/frequency (0 to −50 dB) and phase/frequency (+180° to −180°) characteristics for the design example, comparing the actual response with the desired response]
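The windowing step is easily mechanised. The following C fragment is a small illustrative sketch: the coefficient values are those of the example above, and the window form w_n = 0.54 + 0.46 cos(nπ/4) is as inferred there.

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* rectangularly truncated coefficients from the IDFT, n = 0..4 */
    double g[5] = { 0.5, 0.314, 0.0, -0.094, 0.0 };
    double h[5];
    for (int n = 0; n < 5; n++) {
        double w = 0.54 + 0.46 * cos(n * M_PI / 4.0);  /* Hamming window */
        h[n] = g[n] * w;
        printf("h[%d] = %.3f\n", n, h[n]);  /* 0.500, 0.270, 0, -0.020, 0 */
    }
    return 0;
}

The full filter is then obtained by reflecting h[1..4] about h[0], giving the symmetric, linear-phase impulse response listed above.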
A number of comments can be made concerning the design of FIR
filters using window functions.
Other methods of FIR filter design are sometimes used, such as the equiripple approximation method [5] or the analytical technique [6], but their design procedures are relatively more difficult and complex.
8.5 REFERENCES
Chapter 9

IIR filter design methods

9.1 INTRODUCTION
Step (1) involves the design of the prototype analogue filter, and
since Butterworth and Chebyshev analogue filters are commonly
used for this purpose, then the design of both types will be
summarised herein.
$|G(j\omega)| = \frac{1}{\sqrt{1 + (\omega/\omega_c)^{2n}}}$  (9.1)

where ω_c is the radian cut-off frequency.
[Table 9.1: Butterworth pole positions for n = 1 to 6]
However, only the poles in the left-hand half of the s-plane are used in deriving G(s) (to preserve stability). Having determined the pole locations, the transfer function, G(s), may be written down in the form

$G(s) = \frac{1}{(s - s_1)(s - s_2) \cdots (s - s_n)}$

$\therefore G(s) = \frac{1}{s^n + a_{n-1} s^{n-1} + \cdots + a_1 s + 1}$  (9.2)

Table 9.2 Coefficients of the normalised Butterworth denominator polynomials

n = 1:  1.0000000
n = 2:  1.4142136  1.0000000
(and similarly for higher orders)
Table 9.3 Frequency transformations of the normalised lowpass prototype

Lowpass:   s → s/ω_c
Highpass:  s → ω_c/s
Bandpass:  s → [s² + (ω_cu × ω_cl)] / [s(ω_cu − ω_cl)]
Bandstop:  s → [s(ω_cu − ω_cl)] / [s² + (ω_cu × ω_cl)]
Specification:

Solution:

$15 = 10 \log_{10}\left[1 + (\omega/\omega_c)^{2n}\right]$

$\therefore$ use n = 3

Now using Table 9.2 the transfer function of the required 3rd-order normalised Butterworth lowpass filter is determined as:

$G(s) = \frac{1}{s^3 + 2s^2 + 2s + 1}$
This transfer function is now denormalised (denoted by subscript DN) using the transform given in Table 9.3, i.e. s → s/ω_c, as follows:

$G(s)_{DN} = \frac{\omega_c^3}{s^3 + 2\omega_c s^2 + 2\omega_c^2 s + \omega_c^3}$

but $\omega_c = 2\pi f_c = 2\pi \times 2 \times 10^3 = 4\pi \times 10^3$ rad/s

$\therefore G(s)_{DN} = \frac{64\pi^3 \times 10^9}{s^3 + 8\pi \times 10^3 s^2 + 32\pi^2 \times 10^6 s + 64\pi^3 \times 10^9}$

Substituting s = jω:

$G(j\omega)_{DN} = \frac{64\pi^3 \times 10^9}{(64\pi^3 \times 10^9 - 8\pi \times 10^3 \omega^2) + j(32\pi^2 \times 10^6 \omega - \omega^3)}$

and, with the frequency expressed as f in kHz,

$|G(jf)_{DN}| = \frac{8}{\sqrt{f^6 + 64}}$

[Figure: magnitude/frequency characteristic of the denormalised filter against f (kHz)]
For the Chebyshev approximation

$|G(j\omega)| = \frac{1}{\sqrt{1 + \varepsilon^2 C_n^2(\omega)}}$

Taking ω₀ = 1 (the usual value for the normalised frequency scale), the Chebyshev cosine polynomial is defined as

$C_n(\omega) = \begin{cases} \cos(n \cos^{-1}\omega), & |\omega| \le 1 \\ \cosh(n \cosh^{-1}\omega), & \omega > 1 \end{cases}$

For large ω, $C_n(\omega) \approx 2^{n-1}\omega^n$, so the stopband attenuation is

$X_{dB} \approx 20\log_{10}\varepsilon + 20\log_{10}(2^{n-1}\omega^n)$

$= 20\log_{10}\varepsilon + (n-1)\, 20\log_{10}2 + 20n\log_{10}\omega$

i.e. $X_{dB} \approx 20\log_{10}\varepsilon + 6(n-1) + 20n\log_{10}\omega$  (9.3)
[Figure 9.2: Translation of Butterworth pole position (P1) to Chebyshev pole position (P3) — the Butterworth poles on the s-plane unit circle move onto an ellipse with real semi-axis sinh A and imaginary semi-axis cosh A]
ILLUSTRATIVE DESIGN EXAMPLE

Specification:

Solution:

At ω = 1, the 0.5 dB passband ripple gives

$-0.5 = 20\log_{10}1 - 20\log_{10}\sqrt{1 + \varepsilon^2}$

$\therefore \varepsilon = \sqrt{10^{0.05} - 1} = 0.3493$

and the stopband requirement, using equation (9.3) at ω = 5, gives

$20 = 20\log_{10}0.3493 + 6(n-1) + 20n\log_{10}5$
[Figure 9.3: Butterworth pole positions for n = 2 on the s-plane unit circle]

The Butterworth poles for n = 2 are located at $s = -\frac{1}{\sqrt{2}} \pm j\frac{1}{\sqrt{2}}$. Translating to the Chebyshev ellipse:

$\mathrm{Re[Chebyshev]} = -0.71, \qquad \mathrm{Im[Chebyshev]} = \pm\frac{1}{\sqrt{2}} \times 1.42 = \pm j1.0$
$\therefore G(s) = \frac{k}{s^2 + 1.42 s + 1.5}$

Denormalising s → s/ω_c with $\omega_c = 4\pi \times 10^3$ rad/s ($\omega_c^2 = 16\pi^2 \times 10^6$):

$G(s)_{DN} = \frac{16\pi^2 \times 10^6\, k}{s^2 + 4\sqrt{2}\,\pi \times 10^3 s + 24\pi^2 \times 10^6}$

Substituting s = jω:

$G(j\omega)_{DN} = \frac{16\pi^2 \times 10^6\, k}{(24\pi^2 \times 10^6 - \omega^2) + j\, 4\sqrt{2}\,\pi \times 10^3 \omega}$

With ω = 2π×10³f (f in kHz) this reduces to

$|G(jf)_{DN}| = \frac{4k}{\sqrt{f^4 - 4f^2 + 36}}$

At f = 0 the passband level must equal $1/\sqrt{1 + \varepsilon^2} = 0.944$

i.e. $\frac{4k}{6} = 0.944$, so k = 1.416

$\therefore |G(jf)_{DN}| = \frac{4 \times 1.416}{\sqrt{f^4 - 4f^2 + 36}}$
Other forms of prototype filter are available, and in some cases the Cauer filter [1] or the Bessel filter [1] could be used as the basis of an appropriate denormalised transfer function, G(s)_DN.
[Figure 9.4: Magnitude/Frequency characteristic — response in dB (0 to −30) against f (kHz), 0 to 10]
The impulse-invariant method gives

$G(z) = T \sum_{k} \frac{A_k}{1 - \exp(p_k T)\, z^{-1}}$  (9.9)

where $A_k$ and $p_k$ are the residues and poles of the partial-fraction expansion of G(s).
ILLUSTRATIVE EXAMPLE

A standard impulse-invariant transform pair is

$\frac{\beta}{(s+\alpha)^2 + \beta^2} \;\to\; \frac{z \exp(-\alpha T)\sin\beta T}{z^2 - 2z\exp(-\alpha T)\cos\beta T + \exp(-2\alpha T)}$  (9.12)
ILLUSTRATIVE EXAMPLE

$G(s) = \frac{\omega_{ca}^2}{s^2 + \sqrt{2}\,\omega_{ca} s + \omega_{ca}^2}$

Solution:

$f_{cd} = 100$ Hz (given), $\therefore \omega_{cd} = 2\pi \times 100$ rad/s

$\omega_{ca} = \frac{2}{T} \tan\left(\frac{\omega_{cd} T}{2}\right)$, but $T = \frac{1}{625} = 1.6$ ms

$\therefore \omega_{ca} = \frac{2}{1.6 \times 10^{-3}} \tan(100\pi \times 1.6 \times 10^{-3}) = 687.2$ rad/s

$\therefore G(s) = \frac{(687.2)^2}{s^2 + \sqrt{2} \times 687.2\, s + (687.2)^2}$

Substituting $s = \frac{2}{T}\left(\frac{z-1}{z+1}\right) = 1250\left(\frac{z-1}{z+1}\right)$:

$G(z) = \frac{(687.2)^2 (z+1)^2}{1250^2(z-1)^2 + \sqrt{2} \times 687.2 \times 1250\,(z^2-1) + (687.2)^2(z+1)^2}$

$= \frac{(z+1)^2}{6.88 z^2 - 4.62 z + 1.74}$
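The substitution is purely mechanical, so it is worth automating. The short C program below is an illustrative sketch that reproduces this example's numbers, forming the z-domain denominator for a second-order section under s = c(z−1)/(z+1), c = 2/T:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double T  = 1.6e-3;               /* sample period                    */
    double c  = 2.0 / T;              /* bilinear constant, 1250          */
    double wc = c * tan(2.0 * M_PI * 100.0 * T / 2.0);  /* prewarped 687.2 */

    /* s^2 + sqrt(2)*wc*s + wc^2 maps to d2*z^2 + d1*z + d0
       (the numerator becomes wc^2*(z+1)^2)                    */
    double d2 =  c * c + sqrt(2.0) * wc * c + wc * wc;
    double d1 = -2.0 * c * c + 2.0 * wc * wc;
    double d0 =  c * c - sqrt(2.0) * wc * c + wc * wc;

    /* normalise so the numerator is just (z+1)^2 */
    printf("wc = %.1f rad/s\n", wc);
    printf("G(z) = (z+1)^2 / (%.2f z^2 %+.2f z %+.2f)\n",
           d2 / (wc * wc), d1 / (wc * wc), d0 / (wc * wc));
    return 0;   /* prints 6.88 z^2 -4.62 z +1.74, as in the text */
}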
ILLUSTRATIVE EXAMPLE
$G(z) = 4 \times 10^3 \times \cdots$
In the direct z-plane method the squared magnitude of the desired lowpass response is specified as

$|G[\exp(j\omega T)]|^2 = \frac{1}{1 + \left[\dfrac{\sin^2(\omega T/2)}{\sin^2(\omega_c T/2)}\right]^n}$

We require suitable trigonometric functions for $|G[\exp(j\omega T)]|^2$ which, for the substitution z = exp(jωT), produce a mirror image polynomial in the z-plane. The above relationship satisfies the lowpass filter requirement, since

$\sin^2(\omega T/2) = -\frac{(z-1)^2}{4z}$

where

$q = \sin^2(\omega_c T/2)$ and $p = -(z-1)^2/4z$

The poles of $|G(z)|^2$ occur at

$p_k = q \exp(j\theta_k), \qquad k = 0, 1, 2, \ldots, (n-1)$

where

$\theta_k = \frac{(2k+1)\pi}{n}$ for n even, $\qquad \theta_k = \frac{2k\pi}{n}$ for n odd  (9.13)
ILLUSTRATIVE EXAMPLE

Specification

Solution

$q = \sin^2(\omega_c T/2) = \sin^2(\pi \times 10^3/10^4) = 0.096$

$p = \sin^2(\omega T/2) = \sin^2(\pi \times 3 \times 10^3/10^4) = 0.655$

$|G(z)|^2 = \frac{1}{1 + (p/q)^n} = \frac{1}{1 + (6.823)^n}$

The attenuation requirement fixes n, and the poles then follow from equation (9.13) and are mapped back to the z-plane. Now selecting the two poles that lie inside the unit-circle we obtain

$G(z) = \frac{0.25}{[z - (0.58 - j0.27)][z - (0.58 + j0.27)]}$

$G(z) = \frac{0.25}{z^2 - 1.16z + 0.41}$
Table 9.4 Digital-to-digital frequency transformations of a lowpass prototype with cut-off β

Lowpass:  $z^{-1} \to \dfrac{z^{-1} - a}{1 - a z^{-1}}$,
  $a = \dfrac{\sin[(\beta - \omega_c)T/2]}{\sin[(\beta + \omega_c)T/2]}$

Highpass:  $z^{-1} \to -\dfrac{z^{-1} + a}{1 + a z^{-1}}$,
  $a = -\dfrac{\cos[(\beta + \omega_c)T/2]}{\cos[(\beta - \omega_c)T/2]}$

Bandpass:  $z^{-1} \to -\dfrac{z^{-2} - \dfrac{2ab}{b+1} z^{-1} + \dfrac{b-1}{b+1}}{\dfrac{b-1}{b+1} z^{-2} - \dfrac{2ab}{b+1} z^{-1} + 1}$,
  $a = \dfrac{\cos[(\omega_2 + \omega_1)T/2]}{\cos[(\omega_2 - \omega_1)T/2]}, \qquad b = \cot[(\omega_2 - \omega_1)T/2]\,\tan(\beta T/2)$

Bandstop:  $z^{-1} \to \dfrac{z^{-2} - \dfrac{2a}{b+1} z^{-1} + \dfrac{1-b}{1+b}}{\dfrac{1-b}{1+b} z^{-2} - \dfrac{2a}{b+1} z^{-1} + 1}$,
  $a = \dfrac{\cos[(\omega_2 + \omega_1)T/2]}{\cos[(\omega_2 - \omega_1)T/2]}, \qquad b = \tan[(\omega_2 - \omega_1)T/2]\,\tan(\beta T/2)$
The methods presented in this chapter are suitable for the design of
recursive digital filters, and they readily implement signal filtering
processes in a variety of practical applications. However, other design
methods exist, each generally having merits for particular situations.
Further examples and details can be found in References [8-11].
9.10 REFERENCES
[11] F.J. Taylor, Digital Filter Design Handbook, (Marcel Dekker, New York, 1983).
Chapter 10

Quantisation in digital filters

10.1 INTRODUCTION
[Figure: ideal quantiser characteristic — output codes 001, 010, 011 correspond to levels Q, 2Q, 3Q, where Q is the quantisation step size]

Assuming rounding, the quantisation error $e_k$ lies between ±Q/2, and its variance is

$\sigma_e^2 = E\{e_k^2\} = \frac{Q^2}{12}$  (10.1)
where 0 < F < 1. In this case, and assuming that the quantisation errors are due to rounding, the signal-to-noise ratio (SNR) is

$SNR = 10 \log_{10}\left(\frac{F^2 \sigma_x^2}{\sigma_e^2}\right)$ dB

where $\sigma_x^2$ is the variance of the input $x_k$ and $\sigma_e^2$ is the variance of the quantisation noise. As a rule-of-thumb, for negligible clipping, F is set so that the signal occupies well under the full converter range, and for this case it follows that

$SNR = 10 \log_{10}\left(\frac{12}{25} \times 2^{2b}\right) \approx (6b - 3.2)\ \mathrm{dB}$  (10.2)

for a b-bit converter. The commonly used 12-bit A/D converter therefore has an SNR of 68.8 dB.
ILLUSTRATIVE EXAMPLE

For a first-order recursive section with its pole at z = a, the round-off noise appearing at the output has variance $\sigma_Q^2 = \sigma_e^2/(1 - a^2)$, which grows sharply as the pole approaches the unit circle. Thus it is seen that the pole location (filter structure) has a significant influence on the value of $\sigma_Q^2$. Indeed, in this respect, this form of round-off error analysis in digital filter implementations may be undertaken to enable the optimum form of realisation structure to be selected [2-5].
If one or more poles of the digital filter are located close to the circumference of the unit-circle in the z-plane, G(z) becomes highly sensitive to coefficient quantisation, and instability may result.
The pole positions of the ideal digital filter may be computed from its characteristic equation, i.e. from

$1 + \sum_{j=1}^{q} b_j z^{-j} = 0$  (10.4)

If one coefficient $b_h$ is perturbed by an amount $\Delta b_h$, the characteristic equation becomes

$1 + \sum_{j=1}^{q} b_j z^{-j} + \Delta b_h z^{-h} = 0$  (10.5)

and a pole reaches the unit circle at z = 1 when the perturbation attains the magnitude

$|\Delta b_h| = 1 + \sum_{j=1}^{q} b_j$  (10.6)
ILLUSTRATIVE EXAMPLE

Specification:

Solution:

The coefficient rounding error, at most $2^{-(n+1)}$, must be smaller than the permissible perturbation $\Delta b_h$. Thus the minimum word length, n, to maintain stability is n = 6. Note, however, that if coefficients are truncated rather than rounded the calculated value is increased by one, to n = 7 in this example.
[ r 5V/v 1 N
n = smallest integer exceeding] - log A/ 0\ Yl
2 /V +
{ L2 ^J/c = (10.7)
ILLUSTRATIVE EXAMPLE

Specification:

Solution:

The z-plane poles are located at z = 0.85 ± j0.15, and G(z) has a second-order denominator, therefore N = 2. Since the poles are a complex conjugate pair, consider the filter

$G(z) = \frac{1 - z^{-2}}{1 + 1.86 z^{-1} + 0.98 z^{-2}}$

which has two zeros at z = ±1 and a pair of complex conjugate poles at z = −0.9302957 ± j0.3385999. The poles are thus located quite close to the circumference of the unit-circle, and any pole movement due to coefficient quantisation, plus cumulative arithmetic errors, may place the poles on or outside the unit-circle, leading to instability.
When the input signal to a digital filter is constant or zero, the fixed-
point finite word length arithmetic rounding errors cannot be
assumed to be uncorrelated random variables. In fact, in these
circumstances, the performance of the filter may exhibit unwanted
limit cycle oscillations [8-10]. This implies that the round-off noise is
significantly dependent on the input signal, and its effect is easily
seen by considering a simple example of a first-order IIR filter.
satisfying $|y| \le \dfrac{Q/2}{1 - |C|}$, will produce the same resulting steady output value.

Let us now consider a change of sign for the coefficient in the above digital filter, i.e. C = −0.93. Starting from y₀ = 12 and rounding each product to the nearest integer:

y₁ = −0.93 × 12 = −11.16 → −11
y₂ = −0.93 × (−11) = 10.23 → 10
y₃ = −0.93 × 10 = −9.3 → −9
y₄ = −0.93 × (−9) = 8.37 → 8
y₅ = −0.93 × 8 = −7.44 → −7
y₆ = −0.93 × (−7) = 6.51 → 7
y₇ = −0.93 × 7 = −6.51 → −7
y₈ = −0.93 × (−7) = 6.51 → 7

The output is now locked in a limit cycle, oscillating between +7 and −7.
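This behaviour is easy to reproduce. The following C sketch (illustrative, using the example's coefficient and starting value) iterates y_k = round(−0.93 y_{k−1}) and shows the output settling into the ±7 limit cycle:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double c = -0.93;       /* feedback coefficient            */
    int y = 12;             /* stored (quantised) filter state */
    for (int k = 1; k <= 12; k++) {
        double exact = c * y;            /* ideal product      */
        y = (int)round(exact);           /* rounding quantiser */
        printf("y%-2d = %6.2f -> %d\n", k, exact, y);
    }
    return 0;   /* output alternates +7/-7 from k = 6 onward */
}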
10.7 REFERENCES
[10] Parker, S.R. and Hess, S.F.: 'Limit cycle oscillations in digital filters', IEEE Trans. Circuit Theory, CT-18 (1971), pp. 687-697.
Chapter 11

State space control

11.1 INTRODUCTION
systems. The aim of this chapter is to introduce the state space or modern approach to control design, which can readily be extended to cover systems with several inputs and outputs.
state space models including the key ideas of controllability and observability. The
implementation of state feedback control laws assumes that all the state variables are
known. In practice, the state vector is estimated from the measurements or plant
The chapter finishes by examining linear quadratic optimal control. Here the
state feedback gains are chosen to minimise a scalar quadratic cost or index of
performance which is chosen by the designer. This approach removes the need to
define the closed-loop pole positions which can be difficult, especially for high
the form

$\dot{x} = A_c x + B_c u$  (1)

$y = C x$  (2)

The solution of equation (1), analogous with the scalar case, is given by

$x(t) = e^{A_c(t - t_0)} x(t_0) + \int_{t_0}^{t} e^{A_c(t - \tau)} B_c\, u(\tau)\, d\tau$  (3)

Equation (3) expresses the familiar idea that the state response x(t) has two components, one due to the initial conditions x(t₀) and the other from the input u(t) via a convolution integral. The matrix exponential $e^{A_c t}$ is defined as

$e^{A_c t} = I + A_c t + \frac{1}{2!} A_c^2 t^2 + \ldots$

a series which converges for all finite square matrices.
difference equations. Thus consider Figure 1 which illustrates the sampling of the
input and output vectors for the plant at intervals of T. If a zero-order hold is
employed, the control input u(t) is replaced by the staircase function given in
Figure 2.

[Figure 2: staircase control input — u(t) held at u(kT) over each sampling interval]

Integrating equation (3) over one sampling interval gives

$x(t_{k+1}) = e^{A_c T} x(t_k) + \int_{t_k}^{t_{k+1}} e^{A_c(t_{k+1} - \tau)} B_c\, d\tau\; u(t_k)$  (4)

Equation (4) can be expressed more simply as

$x_{k+1} = A x_k + B u_k$  (5)

where

$A = e^{A_c T}; \qquad B = \int_0^T e^{A_c \tau} B_c\, d\tau$

The corresponding discrete measurement equation is

$y_k = C x_k$  (6)
Definition: The system in equation (5) is controllable provided that there exists a sequence of inputs u₀, u₁, u₂, ..., u_{N−1} that will translate the system from an arbitrary initial state to any desired state, with N finite.

It can be shown that the system is controllable if the matrix in equation (7) below has rank n:

$[B \;\; AB \;\; A^2B \;\; \cdots \;\; A^{n-1}B]$  (7)

Definition: The system in equations (5) and (6) is observable provided that any arbitrary initial state x₀ can be calculated from the N measurements y₀, y₁, ..., y_{N−1}, with N finite.

It can be shown that the system is observable if the following matrix has rank n i.e.

$\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{bmatrix}$
The diagram in Figure 3 illustrates the principle of state feedback, where the control signals are formed from linear combinations of the state variables.

[Figure 3: state feedback — the plant states are fed back through the gain matrix −K to form the control input]
Although the state vector is generally not available for measurement, state feedback
control is useful to study for the following reasons, (i) It is instructive to see what
can be done in shaping the desired closed-loop response when full state information
is available, (ii) Techniques exist for estimating the states from measurements of
the plant inputs and outputs, as will be shown later, (iii) Optimal control laws often
take a state feedback form so that it is useful to understand the effects of this
strategy.
The open-loop system is defined in equations (5) and (6) and the control law is
given by
$u_k = -K x_k$  (8)

Substituting in equation (5), the closed-loop system is

$x_{k+1} = (A - BK) x_k$  (9)

$y_k = C x_k$

Now, the stability of a discrete linear system is determined by the roots of the characteristic equation (or the eigenvalues of the system matrix). The open- and closed-loop characteristic polynomials are

$\alpha_o(z) = |zI - A|$

$\alpha_c(z) = |zI - A + BK|$

The basic idea behind state feedback is to position the closed-loop poles by choice of K. Suppose it is decided to locate the closed-loop poles at λ₁, λ₂, ..., λ_n. Then

$\alpha_c(z) = (z - \lambda_1)(z - \lambda_2) \cdots (z - \lambda_n) = |zI - A + BK|$  (10)
Equation (10) can then be solved for the unknown gains by equating coefficients.
system in controllable canonical form, where the plant and input matrices have the companion structure. Now, if the desired characteristic equation for the closed-loop system is expressed as

$\alpha_c(z) = z^n + \alpha_1 z^{n-1} + \cdots + \alpha_n$  (13)
where the α's are of course all known, then the unknown gains can be determined by inspection. In practice most single-input state models are not in canonical form. However controllable systems can be transformed to canonical form. The state feedback gains are then calculated using equation (14) and finally transformed for application to the original state model. Ackermann's formula is based on this principle. The matrix polynomial α_c(A) is first formed using the coefficients of the desired characteristic equation (13).

Example: Design a state feedback controller for the system below to produce the desired closed-loop response.

Hence,

K = [4.52  1.12]
An observer or state estimator is used to reconstruct the full state vector from measurements of the plant inputs and outputs. It is a dynamical system, driven by the inputs and outputs of the plant, whose state vector converges to an estimate of the plant state.

[Figure 4: observer structure — the plant (A, B) and a model (A, B) are driven by the same input u_k, and the output error corrects the model through the observer gain L]
Given a current state estimate $\hat{x}_k$ and the control input $u_k$, a reasonable prediction of the state vector at the next sample instant can be produced from the system model as

$\hat{x}_{k+1} = A\hat{x}_k + B u_k$

The quality of this open-loop estimate can then be checked by comparing the observer output $\hat{y}_k$ with the actual plant output $y_k$. The difference between the two is used to correct the estimate, giving the observer

$\hat{x}_{k+1} = A\hat{x}_k + B u_k + L(y_k - C\hat{x}_k)$  (16)

Defining the estimation error as

$e_k \triangleq \hat{x}_k - x_k$  (17)
follow from equations (5) and (16). Thus
ek+i = (A-LC)e k (18)
In practice the observer gain matrix L is chosen to stabilise the observer and hence
produce satisfactory error convergence.
To select L the same approach is adopted as in designing the state feedback
control law. The stability is determined by specifying the desired estimator root
locations in the z-plane, typically to make the observer 2 to 4 times faster than the
closed-loop control system. Then, provided y^ is scalar and the system is
observable, L is uniquely determined. Two alternatives are available for computing
L. The first is to match coefficients in like powers of z on the two sides of the
equation below
$\alpha_e(z) = |zI - A + LC| = (z - \beta_1)(z - \beta_2) \cdots (z - \beta_n)$  (19)

In equation (19), the β's are the desired estimator root locations. Ackermann's estimator formula can be employed instead. This is given by

$L = \alpha_e(A) \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$
It is interesting to note that state feedback design to find K for a single-input system and state estimator design to find L for a single-output system are dual problems, i.e. the control design procedure can be applied to the "fictitious" transposed system $A^T - C^T L^T$ to find L, just as it is applied to $A - BK$ to find K.
11.5 COMBINED CONTROL AND ESTIMATION
The third section of the chapter dealt with the design of a state feedback control
law assuming full state knowledge while the last section described estimation of the
state vector from the system outputs using an observer. The complete control
system is implemented by using this estimated state vector in the control law of
equation (8), as shown in Figure 5.
[Figure 5: combined control and estimation — the observer reconstructs $\hat{x}_k$ from the plant input and output, and the controller applies $u_k = -K\hat{x}_k$]

How are the closed-loop poles altered by using the estimated rather than the true state? Substituting $u_k = -K\hat{x}_k = -K(x_k + e_k)$ into equation (5):

$x_{k+1} = A x_k - BK(x_k + e_k)$

When this is combined with the estimator error equation (18), the complete system is

$\begin{bmatrix} e_{k+1} \\ x_{k+1} \end{bmatrix} = \begin{bmatrix} A - LC & 0 \\ -BK & A - BK \end{bmatrix} \begin{bmatrix} e_k \\ x_k \end{bmatrix}$
The poles of this composite system follow from the characteristic equation (20)

$\det \begin{bmatrix} zI - A + LC & 0 \\ BK & zI - A + BK \end{bmatrix} = 0$  (20)

This can be written as

$|zI - A + LC| \times |zI - A + BK| = \alpha_e(z)\,\alpha_c(z) = 0$

Then the combined control-estimator system includes the same poles as those of the control system above, together with those of the estimator. This illustrates the Separation Principle, by which control and estimation can be designed independently.
So far control design by pole placement has been discussed, requiring only the specification of the feedback gains K and L. An alternative way of designing the feedback is to minimise a cost function, chosen to give generally desirable properties to the system and also to be mathematically tractable.

The most general problem which has been completely solved is that for time-dependent linear systems with quadratic cost function, and the treatment will be restricted to that case here. Given the system of equation (5) and the initial state vector x₀, determine the control

$u_k = f_k(x_k)$

that minimises the cost function

$J_N = \frac{1}{2} \sum_{k=0}^{N} \left[ x_k^T Q_k x_k + u_k^T R_k u_k \right]$  (22)
The motivation behind this particular choice of cost function is that, in some
sense, the states will be driven towards zero whilst at the same time keeping the
For the linear-quadratic problem with finite time-horizon, N, it turns out that the
where the feedback gains, K k , are time-varying even if the plant and cost function
condition XQ.
Taking the former course, the above constrained problem can be rewritten as an unconstrained minimisation with the state equations adjoined through Lagrange multiplier (adjoint) vectors λ_k. Setting ∂J_N/∂u_k = 0 gives the control

$u_k = -R^{-1} B^T \lambda_{k+1}$  (23)

together with the state equations (24) and the adjoint equations (25)

$\lambda_k = Q x_k + A^T \lambda_{k+1}$  (25)

Using (23) to substitute for u_k in (24), these equations provide a set of coupled difference equations which can be solved for the optimal values of x_k and λ_k, and hence u_k through (23) again. The initial x₀ is presumed known (in fact it is not required), but not λ₀. However, for minimum J_N the end condition is

$\lambda_N = Q_N x_N$  (26)

Thus, finally, it is seen that the problem has been transformed into the solution of two difference equations with the initial x₀ given and the end condition on λ_N
given by equation (26). This two-point boundary value problem can be solved by assuming

$\lambda_k = P_k x_k$  (27)

This assumption arises naturally in the Dynamic Programming method of solution to the original design problem, which also gives a physical meaning to the matrix P_k:

$J_{N-k} = \frac{1}{2} x_k^T P_k x_k = \frac{1}{2} \sum_{i=k}^{N} \left[ x_i^T Q x_i + u_i^T R u_i \right]$
Using (27) to eliminate λ from the adjoint equation and combining the result with the state equation gives

$\left[ P_k - A^T P_{k+1} A + A^T P_{k+1} B S^{-1} B^T P_{k+1} A - Q \right] x_k = 0, \qquad S = R + B^T P_{k+1} B$

which must be true for arbitrary x_k, hence the bracketed matrix must be zero. This gives a backward-in-time equation in P_k,

$P_k = A^T P_{k+1} A - A^T P_{k+1} B (R + B^T P_{k+1} B)^{-1} B^T P_{k+1} A + Q$

Or,

$P_k = A^T M_{k+1} A + Q$

where

$M_{k+1} = P_{k+1} - P_{k+1} B (R + B^T P_{k+1} B)^{-1} B^T P_{k+1}$

The initial condition for this backward recursion is given by (26) and (27), namely

$P_N = Q_N$
The computational algorithm is then:

1. Let P_N ← Q_N
2. Let k ← N
3. Let M_k ← P_k − P_k B (R + BᵀP_k B)⁻¹ BᵀP_k
4. Let K_{k−1} ← (R + BᵀP_k B)⁻¹ BᵀP_k A
5. Store K_{k−1}
6. Let P_{k−1} ← AᵀM_k A + Q
7. Let k ← k − 1
8. Go to step 3.
For any given initial condition the optimal control is then

$u_k = -K_k x_k$

An example of the time-varying nature of the optimal gains over a finite horizon, N, even when the system and cost matrices (A, B, Q, R) are constant, is provided by the system with transfer function 1/s² and Q = diag[1, 0], R = 1, K = [k₁ k₂]:
[Figure 6: Example of time-varying feedback gains — k₁ and k₂ hold near-constant values far from the horizon and fall away as k approaches N = 50]
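The backward recursion of steps 1-8 is easily programmed. The C sketch below is illustrative: the sample period T = 1 and horizon N = 50 are assumed, and B = [T²/2, T]ᵀ is the usual zero-order-hold discretisation of 1/s². It prints the time-varying gains k₁, k₂ of Figure 6:

#include <stdio.h>

int main(void)
{
    const double T = 1.0;                 /* assumed sample period        */
    const int N = 50;                     /* assumed horizon              */
    double b1 = T * T / 2.0, b2 = T;      /* B for the discretised 1/s^2  */
    double p11 = 1.0, p12 = 0.0, p22 = 0.0;   /* P_N = Q_N = diag(1,0)    */

    for (int k = N; k >= 1; k--) {
        double u1 = p11 * b1 + p12 * b2;  /* components of P B            */
        double u2 = p12 * b1 + p22 * b2;
        double S  = 1.0 + b1 * u1 + b2 * u2;    /* R + B'PB, with R = 1   */
        double k1 = u1 / S;                     /* K = S^-1 B'P A         */
        double k2 = (u1 * T + u2) / S;
        /* M = P - (PB)(B'P)/S */
        double m11 = p11 - u1 * u1 / S;
        double m12 = p12 - u1 * u2 / S;
        double m22 = p22 - u2 * u2 / S;
        /* P_{k-1} = A' M A + Q, with A = [1 T; 0 1], Q = diag(1,0) */
        p11 = m11 + 1.0;
        p12 = m11 * T + m12;
        p22 = m11 * T * T + 2.0 * m12 * T + m22;
        printf("k = %2d  K = [%6.3f  %6.3f]\n", k - 1, k1, k2);
    }
    return 0;
}

Running the recursion shows the gains converging to constant values away from the horizon, illustrating the steady-state behaviour discussed next.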
State space control 143
K_k approaches a constant value at a distant horizon (i.e. k << N) provided that the system and cost are time independent. This leads to the concept of 'steady-state optimal gains', which can be obtained from the algebraic Riccati equation (30). Various methods exist for solving this equation for P and hence obtaining K. The simplest is to run the time-dependent algorithm and calculate the values of the P_k matrix until the matrix elements become constant. This will give the solution to (30) and the corresponding steady-state value of K. This provides a practical design method for obtaining constant feedback gains for multi-input, multi-output systems. The resulting closed-loop system is asymptotically stable, and this is ensured if the matrices A and Q are such that J_N > 0 even though x₀ ≠ 0.
assumed that the output, y k , can be measured exactly and, once the initial state is
known, then the state at any later time is pre-ordained. However, in practice the
output can rarely be measured exactly and the identification of the system
$x_{k+1} = A x_k + B u_k + v_k$  (31)

$y_k = C x_k + w_k$
where v_k, w_k are 'disturbance' vectors which are only known in some statistical sense.

The stochastic optimal control problem is then the control problem in which the system is described by equations (31) and the control u_k, as a function of {y_{k−1}, y_{k−2}, ..., u_{k−1}, u_{k−2}, ...}, is to be chosen to minimise the cost function

$J_N = E\left\{ x_N^T Q_N x_N + \sum_{k=0}^{N-1} \left( x_k^T Q_k x_k + u_k^T R_k u_k \right) \right\}$
Programming. It turns out that the Separation Principle applies again which means
that the design procedure can be separated into two parts; namely, the design of a
state estimator which gives the best estimates of the states from the observed
outputs, for this purpose a Kalman filter (see elsewhere) is used, and the choice of a
linear-feedback law using the estimated states. The optimal feedback controller is
the same as that used when there are no disturbances acting on the system.
BIBLIOGRAPHY
Chapter 12

Kalman filters

P. Hargrave

12.1 INTRODUCTION
Consider estimating a constant from noisy measurements y_i by the sample mean. On receipt of the (n+1)th measurement the new estimate is

$\hat{x}_{n+1} = \frac{1}{n+1} \sum_{i=1}^{n+1} y_i$

One can express this new estimate in terms of the previous one, to obtain the following update equation

$\hat{x}_{n+1} = \hat{x}_n + \frac{1}{n+1}\left( y_{n+1} - \hat{x}_n \right)$
In the absence of noise the measurement would be

$y_{n+1} = H_{n+1} x_{n+1}$

With additive measurement noise $v_{n+1}$,

$y_{n+1} = H_{n+1} x_{n+1} + v_{n+1}, \qquad R_{n+1} = E\{v_{n+1}^2\}$
We seek the Kalman Gain Kn+1 which minimizes the variance of xn+1 (+)
which we will denote by Pn+1 (+), and hence gives rise to an updated
estimate with the minimum possible uncertainty.
The estimation errors before and after the update obey

$\Delta\hat{x}_{n+1}(+) = (1 - K_{n+1} H_{n+1})\, \Delta\hat{x}_{n+1}(-) + K_{n+1}\, \Delta y_{n+1}$

so that, with uncorrelated errors,

$P_{n+1}(+) = (1 - K_{n+1} H_{n+1})^2 P_{n+1}(-) + K_{n+1}^2 R_{n+1}$

We find

$K_{n+1} = \frac{H_{n+1} P_{n+1}(-)}{H_{n+1}^2 P_{n+1}(-) + R_{n+1}}$

The state is extrapolated between measurements as

$\hat{x}_{n+1}(-) = \Phi_n\, \hat{x}_n(+)$

Neglecting for the moment the effects of process noise we then have that

$\Delta\hat{x}_{n+1}(-)\, \Delta\hat{x}_{n+1}^T(-) = \Phi_n\, \Delta\hat{x}_n(+)\, \Delta\hat{x}_n^T(+)\, \Phi_n^T$
In the vector case the updated estimate is

$\hat{x}_{n+1}(+) = \hat{x}_{n+1}(-) + K_{n+1}\left[ y_{n+1} - H_{n+1}\, \hat{x}_{n+1}(-) \right]$

Here $K_{n+1}$ is the Kalman gain matrix, and $y_{n+1}$ the vector of actual measurements. The Kalman gain is chosen to minimise the sum of the diagonal elements of $P_{n+1}(+)$, otherwise known as the trace of this matrix and denoted by tr[$P_{n+1}(+)$].
Assuming that the errors associated with the components of $\hat{x}(-)$ are independent of the measurement errors associated with y, we therefore have that

$P(+) = (I - KH)\, P(-)\, (I - KH)^T + K R K^T$

We seek the value of K such that $\Delta\, \mathrm{tr}[P(+)] = 0$ for all ΔK. That is, the value of K such that tr[P(+)] is minimized. We thus require K to be the solution to

$R K^T - H P(-) + H P(-) H^T K^T = 0$

Or

$K = P(-) H^T \left[ R + H P(-) H^T \right]^{-1}$

The updated covariance is then

$P(+) = \left[ I - K H \right] P(-)$
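As a minimal scalar sketch of these equations (illustrative values throughout; Φ = 1 and H = 1 are assumptions appropriate to estimating a constant), the following C program applies the gain, update and covariance equations above:

#include <stdio.h>

int main(void)
{
    double x = 0.0, P = 100.0;     /* initial estimate and its variance    */
    double R = 4.0;                /* measurement noise variance (assumed) */
    double y[5] = { 10.2, 9.7, 10.4, 10.0, 9.9 };   /* example data */

    for (int n = 0; n < 5; n++) {
        /* extrapolation with Phi = 1: x(-) = x, P(-) = P */
        double K = P / (P + R);    /* K = P(-)H'/(HP(-)H' + R), H = 1 */
        x = x + K * (y[n] - x);    /* state update                    */
        P = (1.0 - K) * P;         /* covariance update               */
        printf("n=%d  K=%.3f  x=%.3f  P=%.3f\n", n + 1, K, x, P);
    }
    return 0;
}

The gain K and the variance P both shrink as measurements accumulate, recovering the 1/(n+1) running-mean behaviour of the introduction when R dominates.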
The smallest number of elements that can be used for the state
vector, x, of such a filter is eight. These are the three user's
position co-ordinates (x,y,z), three user's velocity components (ẋ,ẏ,ż),
The complete filter cycle is then

$\hat{x}_{n+1}(-) = \Phi_n\, \hat{x}_n(+)$

$K_{n+1} = P_{n+1}(-)\, H_{n+1}^T \left[ R_{n+1} + H_{n+1} P_{n+1}(-) H_{n+1}^T \right]^{-1}$

$P_{n+1}(+) = \left[ I - K_{n+1} H_{n+1} \right] P_{n+1}(-)$
[Figure 3: state transition matrix for an elapsed time t — an identity matrix augmented so that each position state is advanced by t times the corresponding velocity and t²/2 times the corresponding acceleration, and the clock offset by t times the frequency bias, with the velocity, acceleration and bias states carried over unchanged]
the user's clock offset (τ) and the fractional frequency bias of the master oscillator (Δf/f). The next level of complexity involves an eleven state filter, the three additional states being the user's acceleration components (ẍ,ÿ,z̈). The state vector may therefore be
written
$x = (x\;\; y\;\; z\;\; \tau\;\; \dot{x}\;\; \dot{y}\;\; \dot{z}\;\; \Delta f/f \;|\; \ddot{x}\;\; \ddot{y}\;\; \ddot{z})^T,$
where the difference between the eight and eleven state model is
indicated by the vertical line. The simplest model that can be used
for the variation of x with time is that of linear motion, together with
a constant frequency bias for the master oscillator. The transition
matrix for an elapsed time t is then as shown in Fig. 3. In this model
the user's acceleration components (jerk components in the case of the
eleven state filter) and the rate of change of frequency bias have been
neglected. Allowance may be made for these unmodelled parameters by an
appropriate choice of the process noise covariance matrix Q.
When the measurements are non-linear functions of the states, so that

$y_{n+1} = h(x_{n+1})$

the filter is linearised by replacing $H_{n+1}$ with the Jacobian $\partial h/\partial x$ evaluated at the predicted state $\hat{x}_{n+1}(-)$.
12.7 ACKNOWLEDGEMENT
The author thanks the Directors of STC Technology Ltd (STL) for
permission to publish this tutorial paper. Work at STL on GPS user
equipment has been supported and sponsored by the Procurement Executive,
Ministry of Defence (Royal Aerospace Establishment).
12.8 BIBLIOGRAPHY
Chapter 13

Digital control algorithms

13.1 INTRODUCTION
The application of conventional 8 and 16 bit
microcomputers to control systems is now well established.
Such processors have general purpose (usually Von Neumann)
architectures which make them applicable to a wide range of
tasks, though not remarkably efficient in any. In control
applications such devices may pose problems such as
inadequate speed, difficulties with numerical manipulation
and relatively high cost for the completed system, the latter being due to both the programming effort and the cost of the peripheral hardware. These problems may be overcome
by the design of specially tailored architectures, provided
that there is a sufficient volume of production to carry the
overheads inherent in this approach. Special purpose I/O
processors and signal processors are examples of
applications where dedicated design has been successfully
applied.
Digital Signal processors are designed to implement an algorithm of the form

$G(z) = \prod_{m=1}^{M} \frac{b_{0m} + b_{1m} z^{-1} + b_{2m} z^{-2}}{1 + a_{1m} z^{-1} + a_{2m} z^{-2}}$  (13.1)
Single loop digital controllers may also be written in
this form although they are usually of a much lower order
than algorithms used for signal processing. The use of
digital signal processors (DSP) for controller realisation
overcomes some limitations associated with controller design
using microprocessors. Programmable DSPs use architectures
and dedicated arithmetic circuits which provide high
resolution and high speed arithmetic, making them ideally
suited for use as controllers.
Certain signal processors may have particular advantages
or limitations. For instance certain devices can only
implement non-recursive structures while some will provide
[Figure 13.1: analogue controller G_c(s) in cascade with the plant G(s), and its digital replacement — G_c(z) followed by a zero-order hold driving G(s)]

Consider the analogue lag/lead controller

$\frac{V}{E}(s) = K\, \frac{1 + s\gamma}{1 + s\alpha\gamma}$, γ = time constant  (13.2)
Cross-multiplying and taking inverse Laplace transforms:

$v(t) + \alpha\gamma \frac{dv(t)}{dt} = K e(t) + K\gamma \frac{de(t)}{dt}$  (13.3)

The derivative is approximated by a backward difference:

$\left.\frac{de(t)}{dt}\right|_{t=nT} \approx \frac{e_n - e_{n-1}}{T}$  (13.4)

Substituting (13.4) into (13.3):

$v_n + \frac{\alpha\gamma}{T}(v_n - v_{n-1}) = K e_n + \frac{K\gamma}{T}(e_n - e_{n-1})$

i.e.

$v_n = K\left(\frac{T + \gamma}{T + \alpha\gamma}\right) e_n - K\left(\frac{\gamma}{T + \alpha\gamma}\right) e_{n-1} + \left(\frac{\alpha\gamma}{T + \alpha\gamma}\right) v_{n-1}$  (13.5)
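Equation (13.5) maps directly into code. A small illustrative C sketch follows (the structure and member names are choices of this sketch, not from the text):

/* One step of the backward-difference lag/lead controller, eq. 13.5. */
typedef struct { double K, gamma, alpha, T;  double e1, v1; } LagLead;

double laglead_step(LagLead *c, double e)
{
    double d = c->T + c->alpha * c->gamma;   /* common denominator T + a*g */
    double v = c->K * (c->T + c->gamma) / d * e
             - c->K * c->gamma / d * c->e1
             + c->alpha * c->gamma / d * c->v1;
    c->e1 = e;                               /* save e_{n-1} */
    c->v1 = v;                               /* save v_{n-1} */
    return v;
}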
Where a second derivative is required, the difference of successive first differences is used:

$\left.\frac{d^2 e(t)}{dt^2}\right|_{t=nT} \approx \frac{1}{T}\left[\frac{e_n - e_{n-1}}{T} - \frac{e_{n-1} - e_{n-2}}{T}\right] = \frac{e_n - 2e_{n-1} + e_{n-2}}{T^2}$  (13.6)
An alternative approach uses the series approximation

$\ln(z) \approx 2\left(\frac{1 - z^{-1}}{1 + z^{-1}}\right)$  (13.7)

hence

$s \approx \frac{2}{T}\left(\frac{1 - z^{-1}}{1 + z^{-1}}\right)$  (13.8)

Substituting (13.8) into the lag/lead controller of equation (13.2):

$\frac{V}{E}(z) = K\, \frac{1 + \dfrac{2\gamma}{T}\left(\dfrac{1 - z^{-1}}{1 + z^{-1}}\right)}{1 + \dfrac{2\alpha\gamma}{T}\left(\dfrac{1 - z^{-1}}{1 + z^{-1}}\right)}$  (13.10)

i.e.

$V(z)\left[\left(1 + \frac{2\alpha\gamma}{T}\right) + \left(1 - \frac{2\alpha\gamma}{T}\right) z^{-1}\right] = K E(z)\left[\left(1 + \frac{2\gamma}{T}\right) + \left(1 - \frac{2\gamma}{T}\right) z^{-1}\right]$  (13.11)

or

$v_n\left(1 + \frac{2\alpha\gamma}{T}\right) = K e_n\left(1 + \frac{2\gamma}{T}\right) + K e_{n-1}\left(1 - \frac{2\gamma}{T}\right) - v_{n-1}\left(1 - \frac{2\alpha\gamma}{T}\right)$  (13.12)

$\Rightarrow v_n = K\left(\frac{T + 2\gamma}{T + 2\alpha\gamma}\right) e_n + K\left(\frac{T - 2\gamma}{T + 2\alpha\gamma}\right) e_{n-1} - \left(\frac{T - 2\alpha\gamma}{T + 2\alpha\gamma}\right) v_{n-1}$  (13.13)
The bilinear transformation warps the frequency axis; an analogue frequency ω_a corresponds to the digital frequency ω according to

$\omega_a = \frac{2}{T}\tan\left(\frac{\omega T}{2}\right)$  (13.14)
[Figure 13.2: discretisation of the continuous controller G_c(s) by a hold equivalent]

Ideal reconstructors do not exist, therefore a realisable hold G_H(s) is used and

$G(z) = Z\left[ G_H(s)\, G_c(s) \right]$  (13.15)
Table 13.1 lists the transfer functions of common reconstructors (holds).

In practice the zero order hold is used most frequently and gives rise to the following transfer function for a lag/lead controller, using equation 13.15 with equation 13.2:

$G(z) = (1 - z^{-1})\, Z\left[\frac{K}{s} \cdot \frac{1 + s\gamma}{1 + s\alpha\gamma}\right] = (1 - z^{-1})\, K\, Z\left[\frac{1}{s} + \frac{(1-\alpha)/\alpha}{s + 1/\alpha\gamma}\right]$

$\Rightarrow G(z) = \frac{K}{\alpha} \cdot \frac{1 - \left[1 - \alpha\left(1 - \exp(-T/\alpha\gamma)\right)\right] z^{-1}}{1 - \exp(-T/\alpha\gamma)\, z^{-1}}$  (13.17)
In the pole-zero mapping approach, the desired dominant s-plane poles

$s_1, s_2 = -\zeta\omega_n \pm j\omega_d, \qquad \omega_d = \omega_n\sqrt{1 - \zeta^2}$  (13.21)

are chosen using the usual continuous domain criteria, and then mapped into the z domain via

$z = \exp(sT)$  (13.22)
Consider the plant

$G(s) = \frac{\exp(-sT_1)}{1 + sT_2}, \qquad T_1 = 1, \quad T_2 = 2$

A digital controller D(z) is to be designed to achieve the following specification:

(i) overshoot < 30%
(ii) 5% settling time < 9 seconds
(iii) zero steady state error for step inputs

With a zero-order hold and T = 1 s, the unit transport delay contributes z⁻¹, giving

$G(z) = (1 - z^{-1})\, Z\left[\frac{G(s)}{s}\right] = \frac{0.4}{z(z - 0.6)}$  (13.26)
From the specification, (i) and (ii) give ζ ≥ 0.38 and ω_n ≥ 0.88.

(iii) For zero steady state error to step inputs one pole is required at z = 1:

$D(z) = \frac{N(z)}{z - 1}$  (13.29)

A simple choice cancels the plant pole:

$D(z) = K\, \frac{z - a}{z - 1}$  (13.30)

$D(z) = K\, \frac{z - 0.6}{z - 1}$  (13.31)

Matching the desired dominant pole locations gives K = 1.3, hence

$D(z) = 1.3\, \frac{z - 0.6}{z - 1}$
[Figure: closed-loop unit step response of the design, plotted over 0 to 10T]
In the deadbeat (minimal prototype) approach the desired closed-loop transfer function K(z) is chosen directly, subject to realisability constraints:

$1 - K(z) = (1 - z^{-1})\, A(z)$, where A(z) must contain the poles of G(z) outside the unit circle, and

$K(z) = z^{-n}\, B(z)$, where B(z) must contain the zeros of G(z) outside the unit circle  (13.35)

(i) The polynomials for K(z) and 1 − K(z) are of the same order.

Thus, for the plant

$G(z) = \frac{0.4 z^{-2}}{1 - 0.6 z^{-1}}$

we require

$K(z) = z^{-2}(b_0 + b_1 z^{-1} + \cdots)$

and writing $1 - K(z) = (1 - z^{-1})(a_0 + a_1 z^{-1})$ and equating coefficients:

$1 - a_0 = 0, \qquad a_0 - a_1 = 0, \qquad a_1 = b_0$

or

$a_0 = 1, \qquad a_1 = 1, \qquad b_0 = 1$

Thus K(z) = z⁻², and the controller is

$D(z) = \frac{1}{G(z)} \cdot \frac{K(z)}{1 - K(z)} = \frac{z^{-2}(1 - 0.6 z^{-1})}{0.4\, z^{-2}\, (1 - z^{-1})(1 + z^{-1})}$

$= 2.5\, \frac{1 - 0.6 z^{-1}}{(1 - z^{-1})(1 + z^{-1})} = 2.5\, \frac{z(z - 0.6)}{(z - 1)(z + 1)}$
In Dahlin's method the desired closed-loop response is that of a first-order lag with time constant T₃ plus the plant dead time T₁ = NT:

$K(z) = z^{-(N+1)}\, \frac{1 - \exp(-T/T_3)}{1 - \exp(-T/T_3)\, z^{-1}}$  (13.42)

for the plant

$G(s) = \frac{K \exp(-sT_1)}{1 + sT_2}$  (13.43)

since T₁ = NT. The controller is then

$D(z) = \frac{1}{G(z)} \cdot \frac{K(z)}{1 - K(z)} = \frac{\left(1 - \exp(-T/T_3)\right)\left(1 - \exp(-T/T_2)\, z^{-1}\right)}{K\left(1 - \exp(-T/T_2)\right)\left[1 - \exp(-T/T_3)\, z^{-1} - \left(1 - \exp(-T/T_3)\right) z^{-(N+1)}\right]}$
[Figures: Smith predictor — the digital controller D(z) and zero-order hold drive the plant G₁(s)exp(−sT₁); a model of the delay-free plant G₁(s) and a model of the delay exp(−sT₁) form a minor feedback loop around D(z)]

The delay-free plant model is discretised as

$G'(z) = Z\left[\frac{1 - \exp(-sT)}{s}\, G'(s)\right]$  (13.46)

and the controller transfer function becomes

$\frac{V}{E}(z) = \frac{D(z)}{1 + D(z)\, G'(z)\,(1 - z^{-N})}$  (13.47)
13.5.1 Linearisation
13.5.2 Adaption
computed derivative = true derivative + 2e/T

Clearly the error term, 2e/T, gets larger as the sampling period shortens.
Rule (vi) may be illustrated by reference to the
difference equation of the low pass filter given by equation
13.50.
$\frac{e'}{e}(s) = \frac{1}{1 + s\gamma}$  (13.50)

Using the backward difference, with T the sample time and γ the filter time constant, and assuming T/γ << 1,

$e'_n \approx \frac{T}{\gamma}\, e_n + \left(1 - \frac{T}{\gamma}\right) e'_{n-1}$

Clearly we have to represent the coefficients of relative size one and T/γ. Assuming that we require at least 5% precision on the representation of the relative sizes of the coefficients (which, of course, govern the filter action), this implies a dynamic range in the number system of

$\frac{20\gamma}{T} : 1$

Hence a careless choice of T ≈ γ/100 leads to a dynamic range of 2000 : 1, requiring 12 bit accuracy (including sign
bit) . In general rapid sampling implies high arithmetic
accuracy, and this matter has been given a more extensive
treatment in the literature than is possible in this
chapter. [4 - 10]
For a filter with transfer function

$F(z) = \frac{N(z)}{\prod_{p=1}^{n}\left(1 - z_p z^{-1}\right)}$

the sensitivity of the pole $z_i$ to an error in a denominator coefficient may be shown to be [13]

$\frac{\partial z_i}{\partial a_k} = \frac{z_i^{n-k}}{\prod_{p=1,\, p \ne i}^{n} (z_i - z_p)}$  (13.51)

and from this the number of coefficient bits needed to hold each pole within tolerance follows (13.52). The steady-state gain of the filter, obtained by evaluating the transfer function at z = 1, is

$\frac{\sum_{k=0}^{m} b_k}{1 + \sum_{l=1}^{n} a_l}$  (13.53)
REFERENCES
Plessey MS2014
Inmos A100
Intel 2920
Texas TMS 320
A typical second-order section has the transfer function

$G(z) = \frac{1 + A z^{-1} + B z^{-2}}{1 - a z^{-1} - b z^{-2}}$  (14.1)
[Figure: Plessey MS2014 serial digital filter — serial coefficient input, serial-in/parallel-out register and DAC around the signal processor, with timing and control signals]
fs = 14.2
32N
[Figure: INMOS A100-based filter system — control logic (Reset, Go, Clock, Error and Busy signals), 36-bit results with a 24-of-36 output selector, sample generator, timing and control, fast ADC, cascaded INMOS A100 devices, adder and EPROM program storage (192 × 24)]
[Figure: Intel 2920 architecture — four analogue inputs SIGIN(0)-SIGIN(3) through a multiplexer and sample-and-hold to the A/D circuitry, program storage selected by RUN/PROG, clock logic and program counter, and eight demultiplexed, sample-and-held analogue outputs SIGOUT(0)-SIGOUT(7)]
The arithmetic operations available are of the form:

A′ + B → B
B − A′ → B
A′ → B
A′ ⊕ B → B
A′ · B → B
|A′| + B → B
|A′| → B
With a program of 4N instructions executed once per sample, the sample period is

$T = \frac{4N}{f}$  (14.3)

where f is the clock frequency.
14.5 CONCLUSION
REFERENCES
Chapter 15

Digital communication systems

15.1 INTRODUCTION
Modern communication systems increasingly employ digital
techniques for the interchange of information. The growth in the use of
digital communications can be attributed to several factors including:-
1) The ability to provide virtually error free communication through
the use of digital data encoding techniques.
2) The increasing need for inter-computer data transfer in near real-
time.
3) Modern integrated solid state electronics technology allows for
the implementation of complex data handling systems at low cost.
4) Digital techniques allow for the handling of data from various
sources in a uniform and flexible manner.
5) The availability of wide band channels provided by geostationary
satellites, optical fibres and coaxial cables.
6) The use of computer systems as tools for communications.
7) The ability to implement powerful encryption algorithms for
security sensitive applications.
Whilst all of these factors have played an important role in the trend
towards digital communication systems, it is the economic benefits
which have provided the main justification for the development of
digital switching and signal processing. This is perhaps most apparent
in the rapid adoption of digital transmission for telephony.
Communication systems allow the transmission of information from
one point to another. The main elements of such a system are
illustrated in Figure 15.1. Information from the data source is encoded
before being transmitted over a communication channel to some
remote location where it is received, decoded and the original data
regenerated. The channel may take many forms (eg. electrical
conductors, optical fibres or radio links), however the signal is almost
certain to be subject to some degree to distortion, interference and
noise. The effect of these may be minimised through the application of
a range of appropriate techniques.
[Figure 15.1: Main elements of a communication system — data source, encoder, communications channel, decoder, output device]
Variable length codes suffer from the major disadvantage that it may
be difficult to regain synchronism between sender and receiver if some
of the data bits become corrupt. This occurs because the receiver cannot determine, merely from the timing, the start and end of each character. This is a particularly difficult problem when complete
computer control of the link is envisaged.
The entropy of a source with K symbols of probability $p_k$ is

$H = -\sum_{k=0}^{K-1} p_k \log_2 p_k$  (15.3)

and the average code word length, with $l_k$ bits assigned to symbol k, is

$\bar{L} = \sum_{k=0}^{K-1} p_k l_k$  (15.4)
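As a small illustrative sketch (the symbol probabilities and code lengths are made-up values), the following C program evaluates equations (15.3) and (15.4) for a four-symbol source:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double p[4] = { 0.5, 0.25, 0.125, 0.125 };  /* symbol probabilities  */
    int    l[4] = { 1, 2, 3, 3 };               /* assigned code lengths */
    double H = 0.0, L = 0.0;
    for (int k = 0; k < 4; k++) {
        H -= p[k] * log2(p[k]);   /* entropy, eq. 15.3         */
        L += p[k] * l[k];         /* average length, eq. 15.4  */
    }
    printf("H = %.3f bits, average length = %.3f bits\n", H, L);
    return 0;   /* here H = L = 1.75 bits, so this code is optimal */
}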
number for odd parity. This scheme is often applied to 7 bit ASCII
codes to make a total word width, with parity, of 8 bits. The addition of
this parity bit allows each received character representation to be
checked and any single bit error to be detected. This technique can
provide an error free channel provided a reverse channel is available to
allow the receiver to request the re-transmission of a data word
containing an error.
[Figure: two-dimensional (block) parity — each row of data bits carries a row parity bit and each column a column parity bit, with a final check on the parity bits themselves; a single erroneous bit is located by the intersection of the row and column showing parity violations]
Many codes have been developed to provide EDC but they are all
derived from the simple parity checking technique. They are broadly
divided into two classes:-
$C_0 = I_0 \oplus I_1 \oplus I_3$
$C_1 = I_0 \oplus I_2 \oplus I_3$  (15.5)
$C_2 = I_1 \oplus I_2 \oplus I_3$
The information bits and the check bits are transmitted together, and the receiver calculates its own version of the check bits by means of the same algorithm from the received information bits. Each of the received check bits is compared with those calculated at the receiver. If we assign them the value 0 if they are the same, and 1 if they differ, then we can readily establish the bit that was received in error, as illustrated in Table 15.1 [2].
Table 15.1

C₀ C₁ C₂   Digit in error
0  0  0    None
1  0  0    C₀
0  1  0    C₁
0  0  1    C₂
1  1  0    I₀
1  0  1    I₁
0  1  1    I₂
1  1  1    I₃
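The code is compact enough to sketch directly. This illustrative C program (not from the chapter) encodes four information bits with the check equations (15.5), injects a single-bit error, and locates it from the comparison pattern of Table 15.1:

#include <stdio.h>

/* compute the three check bits of eq. 15.5 from information bits i[0..3] */
static void checks(const int i[4], int c[3])
{
    c[0] = i[0] ^ i[1] ^ i[3];
    c[1] = i[0] ^ i[2] ^ i[3];
    c[2] = i[1] ^ i[2] ^ i[3];
}

int main(void)
{
    int info[4] = { 1, 0, 1, 1 };
    int tx[3], rx[3];
    checks(info, tx);            /* check bits as transmitted             */

    info[2] ^= 1;                /* channel corrupts information bit I2   */
    checks(info, rx);            /* check bits recomputed at the receiver */

    /* comparison pattern: 1 where received and recomputed bits differ */
    int s0 = tx[0] ^ rx[0], s1 = tx[1] ^ rx[1], s2 = tx[2] ^ rx[2];
    printf("pattern C0C1C2 = %d%d%d\n", s0, s1, s2); /* 011 -> I2 in error */
    return 0;
}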
The usefulness of the Hamming code lies in this ability to locate the bit in error even if it is one of the check bits. Simple Hamming codes can only correct single bit errors. However, there is a method which will permit Hamming codes to correct burst errors [3], as illustrated by the sketch after this paragraph. A sequence of k consecutive code words is arranged as a matrix, one code word per row. Normally the data would be transmitted one code word at a time, from left to right. To correct burst errors, the data are transmitted one column at a time, starting at the leftmost column. When all k bits have been sent the second column is sent, and so on. When the message arrives at the receiver, the matrix is reconstructed, one column at a time. If a burst error of length k occurs, one bit in each of the k code words will have been affected, but the Hamming code can correct one error per code word, so the entire block can be restored. In this method kr check bits make blocks of km data bits immune to a single burst error of length k or less, where m is the number of message bits and r is the number of check bits per code word. This general concept can be expanded to include much larger blocks of data and for the detection and correction of multiple errors (eg. Golay codes [4]).
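A minimal sketch of this column-wise transmission (an illustration with arbitrary block sizes) is:

#include <stdio.h>

#define K 4   /* code words per block (burst length protected) */
#define N 7   /* bits per code word                            */

/* send the block to the line column by column instead of row by row */
void interleave(const int block[K][N], int line[K * N])
{
    int t = 0;
    for (int col = 0; col < N; col++)
        for (int row = 0; row < K; row++)
            line[t++] = block[row][col];
}

/* the receiver reverses the process to rebuild the code words; any
   burst of up to K consecutive line bits hits each row at most once */
void deinterleave(const int line[K * N], int block[K][N])
{
    int t = 0;
    for (int col = 0; col < N; col++)
        for (int row = 0; row < K; row++)
            block[row][col] = line[t++];
}

int main(void)
{
    int b[K][N] = {{0}}, l[K * N], r[K][N];
    b[0][0] = 1; b[3][6] = 1;
    interleave(b, l);
    deinterleave(l, r);
    printf("%d %d\n", r[0][0], r[3][6]);   /* 1 1: block recovered */
    return 0;
}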
It is clear that the choice of the most suitable form of EDC depends
upon the performance of the communication channel over which it
is to be implemented. The ability to detect and possibly correct errors
relies on the addition of enough redundant information to overcome
the limitations of the channel. This process decreases the effective
bandwidth of the channel in exchange for the reliable transfer of
information.
If the sampling rate is insufficient, the replicas of X(co) will overlap and
it will not be possible to recover the original signal x(t) by filtering.
Digital communication systems 205
[Figure: sampling — the signal x(t) is multiplied by the pulse train p(t) to give the sampled signal y(t), whose spectrum repeats at multiples of the sampling frequency]
[Figure: quantisation — each sample is mapped onto the nearest of 2^p levels, each assigned a binary code]

For a full-scale range V quantised with p bits, the step size is

$\Delta = \frac{V}{2^p}$  (15.9)
The maximum difference between the signal x(t) and its quantised equivalent q(t) is 0.5Δ. Since there will usually be a random distribution of error voltage (e) between −0.5Δ and 0.5Δ, the mean square quantisation error is Δ²/12, and the signal-to-quantisation-noise ratio is

$SNR = 10\log_{10}\left(\frac{\sigma_x^2}{\Delta^2/12}\right) \approx 6p + \alpha\ \mathrm{dB}$  (15.11)

where

$\alpha = 10\log_{10}\left(\frac{12\,\sigma_x^2}{V^2}\right)$
15.4.1 Synchronisation
When the binary data stream is received some form of timing signal
is necessary to indicate when each data bit is present. This timing
signal may be derived in one of two ways:
15.4.1.1 Synchronous data transmission. In this case, a suitable clock
signal is transmitted with the data. Synchronous transmissions start
with a unique pattern of 8 or 16 bits to establish the initial
synchronisation of the clock within the receiver. Subsequently the
clock is locked to the timing information derived from the data stream
until the message is complete. Synchronous transmission is normally
confined to higher data rates where the additional complexity can be
justified.
15.4.1.2 Asynchronous data transmission. In this case, data are
segmented into groups of 7 or 8 bits, together with 2 (or 3) additional
bits to provide timing information. These are known as start and stop
208 Digital communication systems
[Figure: asynchronous character framing — a start bit, eight data bits (bit 1 to bit 8), and a stop bit]
15.6 REFERENCES
Chapter 16

Simulation of DSP devices and systems

16.1 INTRODUCTION
[Figure: DSP development route — source code is assembled to object code, linked, and downloaded to the target hardware, with an in-circuit emulator available for debugging]

[Figure: a typical DSP application — a digital controller with position feedback]
A typical simulation program has the following structure:

CONSTANT ...                   $ "Constant definitions"

INITIAL
    ...                        $ "Initial conditions and parameter calculations"
END                            $ "of initial"

DERIVATIVE
    ...                        $ "Continuous plant model"
    TERMT (...)                $ "the terminate condition"
END                            $ "of derivative"

DISCRETE
    INTERVAL TSAMP = ...       $ "The sample period"
    ...                        $ "Discrete controller"
END
Control Technology Inc and MATLAB from The Math Works Inc.
CYPROS is an integrated package for control systems design and
data analysis, which incorporates data acquisition, signal processing,
statistical analysis, mathematical functions including matrix
operations and curve fitting, simulators for complex dynamic systems
and linear systems and a variety of tool boxes for advanced control
functions such as parameter estimation, extended Kalman filtering
and adaptive control.
ACET (Advanced Control Engineering Techniques) is an interactive
control engineering tool which integrates the different stages of the
design cycle including model building, control system design, system
analysis and simulation.
Model-C is a block diagram-driven modelling and simulation
program which models continuous, discrete time or mixed systems.
Hierarchical block diagram model structures are supported, and
Fortran simulation code is automatically generated. Ctrl-C is an
interactive language for classical and modern control system design
and analysis, signal processing and system identification.
MATLAB is an interactive system for control system design
applications. Features include control systems design tools, matrix
computations, one and two dimensional signal processing, system
identification and multidimensional graphics. Optional toolboxes
provide facilities for expressing systems as either transfer functions or
in state-space form, with support for both continuous and discrete
time systems.
When using any simulation language to investigate and evaluate the
performance of control systems and algorithms, it is essential to
ensure that the plant model being used is sufficiently accurate. Any
inaccuracies in the model description or invalid assumptions used in
the modelling process will be reflected in the simulated control
action. This will cause discrepancies between the simulated results
and the actual response of the real plant. To achieve the best possible
correlation between simulated results and the actual system response,
the simulation process must be performed in a number of stages.
Firstly, the continuous plant model must be verified by applying a
known stimulus or set of stimuli to both the model and the real plant
and comparing the responses. By iteratively improving the model and
repeating the tests, a stage will be reached when the match between
the simulated response and the actual response has the required
accuracy. At this stage the design of the control algorithm can
proceed, with confidence that the simulated response will be
accurately reproduced by the real plant.
[Figure: block-diagram-driven simulation and code generation — a block diagram editor feeds automatic simulation code generation; each graphical function block is backed by a dedicated simulator function module and by pre-coded assembly language modules for the TMS320C1x, 68000 and MCS 96]
Chapter 17

Review of architectures

R. J. Chance

17.1 INTRODUCTION
There are several approaches to the solution, in real
time, of numerically intensive computations of the type
used in digital signal processing. The main groups of these
are shown in fig 17.1. Those in the largest group can be
described as monolithic digital microcomputers, using a
sequentially addressed stored program. This group in-
cludes most of the best known DSP devices and will be
referred to as the general purpose monolithic digital signal
processor group (GPMDSP). The operation of these devices
can broadly be compared to the familiar von Neumann all
purpose microprocessor (vNAPP) in that a program counter
addresses a series of instructions from memory, which are
decoded and executed. The GPMDSP has evolved into
several different subgroups directed, for example, at high
performance or high volume production, low cost applica-
tions. However there are other quite different approaches
to digital signal processing and in some fields, bit slice
designs, systolic arrays and array processors have become
relevant to practical circuit designers as well as research-
ers. We shall now look at all these categories in greater
detail.
[FIG 17.1: the main groups of real-time DSP solutions]
promised in the near future. Not only are these times short
compared to those of the vNAPP, but the GPMDSP often
carries out several operations in one cycle which would be
performed sequentially by a vNAPP. The single cycle multi-
plier is at the heart of any GPMDSP. The delivery of data
to and from the multiplier has affected the whole architec-
ture of these devices.
Some questions that might usefully be posed by a de-
signer attempting to choose a GPMDSP are:
17.3 PARALLELISM
Speed is normally the overwhelmingly dominant require-
ment in digital signal processing work. Therefore the iden-
tification and implementation of operations that may be
performed concurrently has been a major influence on
GPMDSP design. These can be either within one chip or
through interprocessor collaboration.
17.3.1 Intra-processor parallelism
Most (but not all) GPMDSPs use multiple memory spaces,
supported by multiple address and data busses. These bus-
ses can be used to simultaneously carry an instruction and
perhaps several data words. Until quite recently, this has
meant that the memory areas to be accessed in this way had
to be on the chip, due to a restricted pinout capability. Fig
17.2 shows the memory configurations of some current and
announced processors. On-chip memory varies between 0
and 100 percent and the number of address spaces may be
one, two or three.
Most DSPs use a Harvard type of structure where the
memory used for program storage is separate from that
used for data.
FIG 17.2 Memory configurations of some current and announced processors (word length × address space, on-chip memory in parentheses; ! marks memory selectable as program or data)

UPD77230:   program 32 × 4k (2k); data 32 × 4k (512 + 512 + 1k); second data space 32 × 4k
TMS32010:   program 16 × 4k (0); data 16 × 144 (144)
TMS320C50:  program 16 × 64k (!2k + !8k + !256); data 16 × 4k (!8k + !256 + 288)
DSP96002:   program 32 × 4G (1k); data 32 × 4G (512 + 1k); second data space 32 × 4G (512 + 1k)
[FIG 17.4: DSP32C multiply/accumulate operation — each instruction is divided into quarter-cycle phases (fetch, multiply, accumulate, write), and successive operations are overlapped so that fetch 2 proceeds while multiply/accumulate 1 completes]

For full
efficiency, data has to be ordered so that pairs of operands
occupy separate data memory spaces.
17.3.2 The co-processor approach
The closely coupled interfacing of a digital signal proc-
essor to a general purpose computer is commonly practised.
By doing this, the high level language and software support
of the general purpose host machine can be
[Fig 17.5: pipelined multiprocessing — INPUT → DSP 1 → DSP 2 → DSP 3 → OUTPUT]

exploited, with part of the host memory serving as storage
for the DSP program memory. The host can then download
a number of different programs. There is a tendency for
most modern general purpose DSPs to support stand alone
and co-processor systems equally well. Many modern DSPs
can be connected directly to the data busses of a host
computer, without intervening buffers. An economical
multiprocessor system can often be designed where only on-
chip DSP memory need be used; the host processor can
download programs to on-chip DSP program memory. Thus
many GPMDSP devices can form co-processors to a host
computer, with very little in the way of additional hardware.
Usually, the main problem, in these co-processor systems,
is the rapid transfer of data between the host processor and
the co-processor. In the past, the sharing of a common
memory space has been the usual method. However,
processors such as the Texas Instruments TMS320C30,
Motorola 96002 and AT&T DSP32 support direct memory
access (DMA) controllers on-chip. An important point is
that this allows efficient data transfer to and from on-chip
as well as off-chip DSP memory.
17.3.3 Multiple processors
Many DSP algorithms can be distributed between several
processors, without too much intellectual effort. This may
often be done by the connection of processors in series and
using a pipeline technique to distribute the computing, as
shown in fig 17.5. Most GPMDSP devices support a serial
port, primarily for the connection of the CODEC analogue/
digital converters used in the telecommunications industry.
Such serial ports can normally be used to support quite fast
(about 5 Mbaud) serial communication between processors
with a low hardware overhead. This can often provide
sufficiently fast communication in series connected processors
to handle data such as sampled audio.
Where multiple processor systems require fast bidirectional
communication, memory to memory transfers between
processors may be used. Dual-port read/write memory can
be used with most devices. The TMS320C25 [3] has
hardware support on the chip to allow external data
memory to be accessed by more than one processor.
17.3.4 On-chip testing
The in circuit emulator (ICE) is the traditional method
for developing microprocessor systems. The ICE normally
plugs into the socket intended for the processor and allows
the target system program to be executed under the control
of a monitor system. In this way, the program developer can
set 'break' points to stop program execution under particular
conditions and examine the processor status. The very fast
clock speeds of modern GPMDSP devices and the large
number of pins make the construction of an ICE increasingly
difficult. The extra loading on input and output pins can
affect performance. Even at slower speeds, it is not
[Block diagram: multiplier, REAL and IMAGINARY RAM, instruction
interface and control, external address and data bus]
The Zoran ZR34325 vector signal processor
FIG 17.6
that occurs in signal processing. The Zoran vector signal
processors (VSP) are produced in integer (ZR34161) and
floating point (ZR34325) forms. These are designed to be
used as co-processors, specialising in signal processing
algorithms. The integer VSP has a vocabulary of only 23
instructions but they operate at a much higher level than
those of a normal assembler language. An example is the
ZR34161 instruction 'FFT' which can perform a fast Fourier
transform on an array of 128 complex integers. The
instruction takes 1856 clock cycles. They appear to the
user more like subroutines, but are in fact constructed
from specialised hardware and microcoded software. These
devices do not use anything approaching the maximum
possible degree of concurrency, as a true array processor would.
All the same, there are several processors operating in
parallel. The architecture of the floating point processor is
shown in fig 17.6. The dual port on-chip RAM may be used
as an array of 64 complex or 128 real words of 32 bits and
there are corresponding real and imaginary accumulators.
The separate multiplier, adder and subtracter are able to
operate not only concurrently, but on different data.
Separate processing units allow the data transfer between
internal and external memory to be carried out at the same
time as instruction execution. These devices look rather
like the SPS-41 of the nineteen seventies [13]. They both
use parallel arithmetic processors and predefined high-
level signal processing instructions. This assists in the
programming of a structure with such complex parallelism.
The latest floating point Zoran processor has an extended
set of 52 instructions, compared to the earlier integer
version. One interesting point is that not only are there
more signal processing functions, but there are also nine
general purpose program control instructions (CALL, PUSH,
POP etc.). This follows the trend in GPMDSP devices
towards increased programming versatility as well as
increased speed.
17.5 THE INMOS A100 SYSTOLIC MULTIPLIER ARRAY
The transversal filter was one of the first digital signal
processing algorithms to be implemented on a stored
program computer. The algorithm is based on a series of
multiply/ accumulate operations between a series of filter
coefficients and a series of data samples. On a GPMDSP,
multiply/accumulate operations are carried out sequentially
as shown in fig 17.7. It can be seen that, because the
multipliers and multiplicands are known before the sequence
starts, the operations could be performed in parallel if more
than one multiplier and adder were available. The Inmos A100 is
such a system and is used here to illustrate the very high
performance that can be achieved by a much more
specialised architecture than the digital signal processors
mentioned so far.
The architecture of the A100 [14] is shown in fig 17.8.
It has an array of 32 multiplier/ accumulator stages.
Coefficients are stored in RAM. Two sets of coefficients are
stored in order that the inoperative set may be reprogrammed
and switched in synchronously. The whole device is
designed to be
A = A + X(1) * Y(1)
A = A + X(2) * Y(2)
A = A + X(3) * Y(3)
...
A = A + X(N) * Y(N)
Transversal filter algorithm
FIG 17.7
interfaced with a general purpose computer.
[FIG 17.8: architecture of the Inmos A100 - an array of 32
multiplier/accumulator stages with coefficient RAM and cascade input]
REFERENCES
[1] WE DSP32C digital signal processor, AT&T, December 1988.
[2] Motorola semiconductor technical data: DSP96002, 96 bit general
purpose IEEE floating point DSP, 1989.
[3] TMS320C25 user's guide, Texas Instruments, 1988.
[4] TMS320C30: the third generation of the TMS320 family of digital
signal processors, rev. 2.1.0, 12 Feb. 1988.
[5] Texas Instruments TMS320C50 preview bulletin, 1989.
[6] Chance, R.J. and Jones, B.S., 'A combined software/hardware
development tool for the TMS32020 digital signal processor', Journal
of Microcomputer Applications, Vol. 10, 1988, pp. 179-197.
[7] Chance, R.J., 'Simulation of multiple digital signal processor
systems', Journal of Microcomputer Applications, Vol. 11, 1988,
pp. 1-19.
[8] Chance, R.J., 'A system for the verification of DSP simulation by
comparison with the hardware', Microprocessors and Microsystems,
Vol. 12, No. 9, October 1988, pp. 497-503.
[9] Motorola SIM56000 digital signal processors user's manual, 1986.
[10] Bier, J.C., 'Frigg: a simulation environment for multiprocessor
DSP system development', Master's Thesis, University of California,
Berkeley, 1989.
[11] Advanced Micro Devices, Am2900 family data book: bipolar
microprocessor logic and interface, 1983.
[12] Texas Instruments, '32-bit chip set and bit-slice family' product
overview, 1986.
[13] Allen, J., 'Computer architecture for signal processing', Proc.
IEEE, 1975, Vol. 63, No. 4, pp. 624-633.
[14] The digital signal processing databook, Inmos/SGS-Thomson, 1st
edition, 1989.
Chapter 18
DSP chips - a comparison

[Architecture block diagrams: the TMS320C25 (program/data RAM blocks of
256 and 288 words, ALU with 0-7 and 0-16 bit shifters, 8-word stack,
16-bit external program/data busses); the DSP56000 (X and Y data
memories with R:N:M address pointer registers, bootstrap ROM, 15-word
stack, 24 x 24 multiplier, 56-bit accumulators A and B, ports X0/X1 and
Y0/Y1, DMA, 24-bit external busses); and the DSP32C (32 x 32 floating
point arithmetic unit, instruction registers IR1-IR4, PC, registers
R1-R19, accumulators A0-A3, DMA, 32-bit external busses, interfaces to
peripherals, coprocessor and host computer)]
1011000011110101   (-20235 represents -0.61752 in Q15 in memory;
                    bit 15 is the sign bit)

load to the accumulator with a left shift of 12 and sign extension:

accumulator = 1111101100001111 0101000000000000
              (bit 31 downwards: sign bits, then the shifted value)

0100000000000000   (16384 represents +1.0 in Q14 in memory)

add +1.0, adjusting scaling with a left shift of 13;
+1.0 (Q14) in the accumulator:
              0000100000000000 0000000000000000

perform addition:
              0000001100001111 0101000000000000

store with a left shift of 3 in Q14 format:
0001100001111010   (6266 represents 0.3824 in Q14 in memory)

0.3824 (x 2^14) = -0.6175 (x 2^15) + 1.0 (x 2^14)

Using the shift and sign extension for fractional/integer
arithmetic in the TMS320C25
FIG 18.4
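The arithmetic of Fig 18.4 can be checked with the following minimal C
sketch; the variable names are illustrative and a 32-bit integer stands
in for the accumulator:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int16_t x_q15   = -20235;                  /* -0.61752 in Q15 */
    int16_t one_q14 =  16384;                  /* +1.0 in Q14     */

    int32_t acc = (int32_t)x_q15 * (1 << 12);  /* load with left shift of
                                                  12, sign extended       */
    acc += (int32_t)one_q14 * (1 << 13);       /* align the Q14 operand
                                                  (left shift of 13), add */

    int16_t y_q14 = (int16_t)((acc << 3) >> 16);  /* store high word with
                                                     a left shift of 3    */

    printf("%d = %f (Q14)\n", y_q14, y_q14 / 16384.0);  /* 6266 = 0.382446 */
    return 0;
}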
i := 1;
a := k[i] * x[i];
x[i] := x[i + 1];
i := i + 1;
repeat
  a := a + k[i] * x[i];
  x[i] := x[i + 1];
  i := i + 1;
until i = nooftaps;
a := a + k[i] * x[i];
x[i] := datasample;
A FIR multiply/accumulate algorithm
FIG 18.5
18.6.1 The TMS320 FIR filter
In the TMS320, the two multiplier input operands are
normally held in a dedicated register (T) and data memory.
The method used to perform a single cycle multiply/
accumulate within this architecture is to use program
memory as one of the multiplier inputs. Fig 18.6 shows the
steps in performing a FIR filter on the TMS320. Coefficients
are in program memory and data samples in data memory.
A loop counter is used to allow a single instruction to
1 zero accumulator & product registers
2 load data pointer (highest address)
3 load coefficients pointer (lowest address)
4 repeat
accumulate previous product
load data to T register
load data to data address + 1
multiply by coefficient
decrement data pointer
increment coefficient pointer
5 accumulate final product
FIR multiply/ accumulate on the TMS320C25
FIG 18.6
be repeated in hardware. This means that the FIR multiply/
accumulate operation is carried out as a specialised
instruction on the TMS320. The 'repeat' operation means
that the instruction fetch only needs to be performed once,
when the multiply/accumulate instruction is first encountered;
the multiply and accumulation operations themselves are
pipelined. The disadvantage of this approach is that the
instruction is extremely specialised, aimed at only this one
function. If variable filter coefficients are to be used, e.g.
for adaptive filters, program memory must be RAM. A block of
on-chip memory may be switched into either program or data
memory space for this purpose. Note the two data memory
accesses per cycle used to move data. This can only be done
using on-chip data memory.
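The effect of the repeated instruction of Fig 18.6 can be modelled in C
as below. This is an illustrative sketch only (the function, the
explicit product variable and the array layout, with x[0] holding the
newest sample and coeff[] stored in reverse order, are assumptions made
for exposition), not TMS320 code:

/* C model of the repeated multiply/accumulate of Fig 18.6 (a sketch,
 * not TMS320 code). x[0] holds the newest sample, x[ntaps-1] the
 * oldest; coeff[c] holds a[ntaps-1-c], so the ascending coefficient
 * pointer pairs with the descending data pointer. */
long fir_repeat_model(int *x, const int *coeff, int ntaps, int newsample)
{
    long acc = 0;                     /* step 1: zero accumulator        */
    long p   = 0;                     /*         and product register    */
    int  d   = ntaps - 1;             /* step 2: data pointer, highest   */

    for (int c = 0; c < ntaps; c++) { /* step 4: hardware repeat         */
        acc += p;                     /* accumulate previous product     */
        p = (long)x[d] * coeff[c];    /* multiply by coefficient         */
        if (d + 1 < ntaps)
            x[d + 1] = x[d];          /* data move: age the sample       */
        d--;                          /* decrement data pointer          */
    }
    acc += p;                         /* step 5: accumulate final product */
    x[0] = newsample;                 /* newest sample enters delay line  */
    return acc;
}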
18.6.2 Multiply/accumulate on the DSP56000
The 56000 uses two registers to hold the inputs to the
multiplier. This processor relies for its efficiency on the
pre-loading of these registers in the cycle before the data
are needed. The register load operations may be carried out
concurrently with a previous arithmetic operation such as
multiply/ accumulate. In addition, the 56000 can carry out
the multiplication and the accumulation in a single cycle,
without pipelining. Fig 18.7 shows the FIR multiply/
accumulate operation on the 56000. Note that the data and
coefficients must be held in the two data memory areas,
separate from program memory; they may then be loaded
simultaneously.
1 load data pointer with address in X memory
2 load coefficient pointer with address in Y memory
3 load multiplier input registers A & B with data & coefficient
4 repeat
    multiply A by B & accumulate
    load register A from X memory
    increment data pointer (modulo N)
    load register B from Y memory
    increment coefficient pointer (modulo N)
FIR multiply/accumulate on the DSP56000
FIG 18.7
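The modulo-N pointer updates of Fig 18.7 amount to circular buffering,
which can be sketched in C as follows; the structure, names and fixed
maximum length are illustrative assumptions, not 56000 registers:

#define MAXTAPS 64                    /* illustrative maximum            */

typedef struct {
    float x[MAXTAPS];                 /* data delay line ('X memory')    */
    float k[MAXTAPS];                 /* coefficients ('Y memory')       */
    int   n;                          /* number of taps N                */
    int   pos;                        /* index of the newest sample      */
} fir_t;

float fir_step(fir_t *f, float sample)
{
    f->x[f->pos] = sample;            /* newest sample replaces oldest   */
    float a = 0.0f;
    int d = f->pos;
    for (int c = 0; c < f->n; c++) {
        a += f->x[d] * f->k[c];       /* multiply & accumulate           */
        d = (d + 1) % f->n;           /* increment data pointer modulo N */
    }
    f->pos = (f->pos + f->n - 1) % f->n;  /* step back for next sample   */
    return a;
}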
Chapter 19

Microcontrollers
J. D. M. Watson
19.1. INTRODUCTION
Over the last 14 years microcontrollers have provided the engineer
with an attractive means of realising a broad span of increasingly
complex and flexible instruments. Microcontrollers originated from
the concept of the microprocessor system, typified by separate
chip-level processor, memory and I/O devices. Such configurations
had been used in systems for half a decade before technological
advances and an appreciation of market requirements caused
manufacturers to embark on the development of integrated solutions.
The resulting devices combine many typical microprocessor system
needs within a single device. Figure 19.1 shows the hardware
configuration of an archetypal microcontroller. Key features include
program execution from on-chip ROM rather than external RAM and a
von Neumann architecture, i.e. shared data and instruction paths
(even though data and program memory may be partitioned).
Hardware and instruction set tend to be less 'general' in structure
than those of a microprocessor. This biases devices to particular
classes of application which require similar hardware resources; many
industrial and commercial applications share common requirements
and fall within these classes.
Developed before digital signal processors (DSPs),
microcontrollers were aimed at I/O intensive, yet algorithmically
simple, real time applications. However, the increasing functional
complexity required of many contemporary products has motivated
designers to look beyond microcontrollers and to consider using
DSPs. Unfortunately DSPs are not usually provided with sophisticated
I/O facilities (although on-chip memory is typically present) nor
appropriate instruction sets, as they were conceived to implement the
fast repetitive arithmetic sequences characteristic of discrete-time
signal processing rather than I/O aspects of real time control.
[FIG 19.1: archetypal microcontroller hardware configuration - RxD/TxD
serial port, high speed I/O, shared data/instruction bus, register
bank, multiple I/O lines and bus synchronisation]
low cost and simple I²C ('I squared C') on-card serial bus orientated the
device towards the consumer market; Philips Components (1989).
Siemens enhanced the MCS-51 to high levels of capability with the
80537/517 devices which, complete with analogue I/O and multiple
ports can address sophisticated instrument requirements; Siemens
(1988). The Intel MCS-96, using 16-bit internal busses and representing
a significant development from the 8-bit 48 and 51 families, was
designed to meet the requirements of the automotive industry,
including fast arithmetic and high resolution timing for injectors and
ignition. In satisfying these requirements it also suited itself to a
variety of demanding industrial tasks such as those associated with
velocity and position control.
Many other manufacturers produce high performance
microcontroller families; Hitachi, Motorola and Zilog are among
important contributors. The choice of Intel devices for inspection in
the following section does not imply that these are especially
significant in performance; they, in the author's opinion, represent a
historical strand in the evolution of the technology. Two examples of
recent microcontrollers with latest-generation features will be
described in a later section.
19.2.1. MCS-48
The MCS-48 family has separate data and program stores which
share a common 8-bit data bus; Intel Corp (1982). The device requires
fifteen clock periods per machine cycle. Principal hardware features
include an ALU with accumulator, a register bank, a 1K byte program
store ROM, a timer/event counter and multifunction bus and parallel
I/O port controllers.
The instruction set was innovative when the device was
introduced, having, for example, logical instructions capable of
operating on ports directly and allowing branching on the state of
selected accumulator bits. By present standards it seems restricted;
the accumulator-centred operations, the lack of bit addressing and the
restricted arithmetic capability all limit the MCS-48 family
throughput.
19.2.2. MCS-51
As in MCS-48 devices, the MCS-51 uses an 8-bit architecture with
separate program and data stores; Intel Corp (1982), Intel Corp
(1985). Design refinements permitted machine cycles of twelve clock
periods duration and clock frequencies above 12MHz, yielding
instruction times of between 1µs and 4µs. Throughput is significantly
increased over the 8048 family. The cycle time improvement is
augmented by the inclusion of significant hardware and instruction
enhancements including an expanded arithmetic capability with
subtraction, multiplication and (short) division. An ALU with
accumulator, register bank, program store ROM, two 16-bit
timer/event counters, developed interrupt controller, USART and
O(t) = K_p I(t) + K_i T Σ_k I(k) + (K_d/T) [I(t) - I(t-1)]
19.1
O(t) and I(t) are outputs and inputs at timestep t. T is the loop
sample interval.
The backward difference approach is widely used, although other
mappings such as the Bilinear or Tustin give a better approximation to
the continuous case for lower sampling frequencies. Katz (1981)
provides some useful comparisons between the discrete-time
mappings for s and shows that the backward difference method or
'mapping of differentials' has good low frequency gain fidelity, but
poor phase and high frequency gain performance, even when sampled
at 15 times the crossover frequency. The bilinear approach on the
other hand preserves both magnitude and phase performance for
sampling ratios as low as 8 times the crossover frequency.
In practice, a PID controller is considerably more complex than
described; the D term is normally rolled off at 3 - 10 times its zero
corner frequency, and measures are taken to avoid 'integral wind-up',
which occurs when the output variable is constrained (as it must be in
a practical system). Various non-linearities are often also built in to
allow optimum large and small signal responses. Astrom (1984) deals
with some of these refinements.
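Purely by way of illustration, a discretised PID step incorporating a
rolled-off D term and a simple anti-windup measure might be sketched in
C as follows; the gain names, the first-order derivative filter and the
clamping scheme are assumptions made here, not forms prescribed by the
text:

typedef struct {
    double kp, ki, kd;        /* PID gains                            */
    double T;                 /* loop sample interval                 */
    double alpha;             /* D-term roll-off factor, 0..1         */
    double omin, omax;        /* output constraints                   */
    double integ, dfilt, iprev;
} pidctl_t;

double pid_step(pidctl_t *c, double I)          /* I(t): loop input    */
{
    double deriv = (I - c->iprev) / c->T;       /* backward difference */
    c->dfilt += c->alpha * (deriv - c->dfilt);  /* roll off the D term */

    double O = c->kp * I + c->ki * c->integ + c->kd * c->dfilt;

    if (O > c->omax)      O = c->omax;          /* constrain the output */
    else if (O < c->omin) O = c->omin;
    else c->integ += c->T * I;  /* integrate only while unsaturated:
                                   a simple guard against wind-up      */
    c->iprev = I;
    return O;                                   /* O(t): loop output   */
}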
19.4.3. Digital filtering
A comprehensive treatment of this wide ranging topic is beyond
the limited space available here and other chapters will deal with it in
more detail. It is however important to recognise that filtering forms
part of many microcontroller applications. Low pass and notch
filtering is frequently applied to the time series resulting from
analogue data acquisition, and it is noted that control system
compensators are nothing more than special-purpose filters.
Two general approaches for implementation are available, the
infinite impulse response (IIR type) with rational pulse transfer
function of the form:

G(z) = (a_0 + a_1 z^{-1} + ... + a_m z^{-m}) / (1 + b_1 z^{-1} + ... + b_n z^{-n})
19.3
and the finite duration impulse response (FIR) form, defined by:

G(z) = a_0 + a_1 z^{-1} + ... + a_n z^{-n}
19.4
Filters are characterised by more than their impulse response; the
following are key properties:
• Amplitude response
• Phase response
• Group delay
• Magnitude response
Given a complete description, the designer can consider how the
filter may be realised. A variety of discretisation methods are available,
and each has strengths and weaknesses with respect to preserving
particular properties of a continuous-time prototype. Katz (1981) and
Rabiner (1975) explore these considerations in depth.
The FIR topology or 'all zeros' filter has the advantage of being able
to offer a linear phase response (constant group delay) and guaranteed
stability, but must be of a higher order than an IIR filter of
equivalent characteristics.
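As an example of equation 19.3 in code, a second order section follows
directly from the difference equation; the following C sketch (direct
form I, n = m = 2; names illustrative) shows the structure:

typedef struct {
    double a0, a1, a2;        /* numerator coefficients of eq 19.3   */
    double b1, b2;            /* denominator coefficients            */
    double x1, x2, y1, y2;    /* input and output histories          */
} biquad_t;

double biquad_step(biquad_t *f, double x)
{
    double y = f->a0 * x + f->a1 * f->x1 + f->a2 * f->x2
             - f->b1 * f->y1 - f->b2 * f->y2;
    f->x2 = f->x1; f->x1 = x;      /* shift input history  */
    f->y2 = f->y1; f->y1 = y;      /* shift output history */
    return y;
}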
provided to reset the counter, which may have a time-out period of
10-20 ms.
19.7. INSTRUCTION SETS
As discussed under specific example microcontroller headings,
instruction sets are orientated towards simplifying the types of
operations commonly encountered. Early devices were constrained by
architecture and silicon real-estate and this limited opcode
functionality. Current examples with complexity similar to 'top-end'
microprocessors do not suffer from these restrictions, and offer a
range of sophisticated and highly composite instructions.
19.7.1. Structure & efficiency
A key aspect of the structure of an instruction set is the uniformity
of application of each operation type across the available repertoire of
operand reference modes. These might include:
• Immediate
• Direct
• Indirect
• Indexed
It is advantageous to have the facility to perform arithmetic or
logical operations with all these types. The hardware feature of a RALU
(a register file coupled directly to the ALU) also enhances efficiency
by allowing direct-to-direct operations without the intermediate
bottleneck of an accumulator. Efficiency is
improved through the provision of composite instructions, for
example those which perform an arithmetic operation on an
indirectly referenced variable and which also increment the pointer
to that variable. In sophisticated microcontrollers like the Motorola
68332 this type of facility has been enhanced to the point of providing
an instruction which can produce a linearly interpolated value from a
look-up table.
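A C equivalent of such an interpolating table look-up is sketched
below; the 257-entry table and the 8.8 fixed-point index are
assumptions made for illustration, not the 68332's actual operand
format:

#include <stdint.h>

/* Linear interpolation into a 257-entry table, indexed by an 8.8
 * fixed-point value: the integer part selects the segment and the
 * fractional part interpolates within it. Illustrative only. */
int16_t table_interp(const int16_t tab[257], uint16_t x)
{
    uint8_t  i = x >> 8;                 /* integer part: segment index */
    uint16_t f = x & 0xFFu;              /* fractional part, 0..255     */
    int32_t  d = tab[i + 1] - tab[i];    /* segment slope               */
    return (int16_t)(tab[i] + ((d * (int32_t)f) >> 8));
}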
Clearly the provision of arithmetic instructions for each of a variety
of data types (categorised by word length) can save a number of
operand-shortening and result-combining operations. Multiply and
divide instructions further reduce both code lines and execution
times.
It has become increasingly fashionable to write applications code
in high-level programming languages (for example 'C'). Instruction
sets of recent microcontrollers have been designed with at least some
recognition of the needs of compiler writers.
19.8. ATTRIBUTES
Microcontroller attributes are (by design) well suited to small and
medium-scale instrumentation applications. Their strengths lie in:
and a parallel I/O port. Bank selection is used to map these functions
into the limited I/O space of the TMS320C1X family.
The event manager provides capture and compare subsystems
using two of the hardware timers. Time-stamping of input transitions
(programmable edge sensitivity) on four channels is available in the
former mode with time values buffered in FIFO stacks. Interrupt
sources are associated with the FIFOs to signal their loading. Compare
facilities permit the contents of compare registers to be matched
against timer values. On a match, various actions can be programmed,
including set, reset or toggling of an output pin, and the activation of
one or two interrupts. A high precision PWM mode capable of 14-bit
resolution is also provided by the event manager.
The serial port is a full-duplex USART capable of operating in
asynchronous and synchronous modes at up to 400K bits/second and
6.4M bits/second, respectively. Double buffering, address detect and
match, and a local baud rate generator are provided. Separate
interrupts are sourced for transmit and receive events.
19.10.2. Applications
19.11. CONCLUSIONS
It is likely that the future will see further convergence of digital
signal controllers and microcontrollers. Next generation DSCs are
likely to offer peripheral handling, bit manipulation and interrupt
implementation comparable to state-of-the-art microcontrollers.
Microcontrollers will continue to be available across a broad span of
capability and cost, as many applications will not need fast signal
processing capability. On the other hand, as low cost high
performance processing power becomes available in DSC-class
devices, new applications will be possible and families of industrial
and consumer products not presently envisaged will appear.
19.12. REFERENCES
Astrom, K. J., Wittenmark, B. (1984) Computer Controlled Systems.
Englewood Cliffs, N.J. 07632; Prentice-Hall International ISBN
0-13-164302-9
Hitachi (1988) H8/330 Microcontroller Overview.
Intel Corp (1976) Data Catalog 1976.
Intel Corp (1982) Microcontroller User's Manual, order number
210359-001.
Intel Corp (1985) Microcontroller Handbook, order number
210918-003.
Katz, P. (1981) Digital Control using Microprocessors. London:
Prentice-Hall International ISBN 0-13-212191-3
Chapter 20

INTRODUCTION
In the past few years considerable effort has been devoted to the design of VLSI
Systems for real time digital signal processing applications. In particular, a number
of manufacturers have developed programmable DSP microprocessor chips which
incorporate facilities such as on-board multipliers [1,2]. Whilst such devices are now
enjoying widespread use, their real time capability is limited mainly to low
bandwidth applications such as speech and sonar. In many other areas (eg
image processing and radar signal processing) the computation rates demanded
significantly exceed those offered by commercial DSP microprocessors. Considerable
attention has therefore been focused on systems which utilise parallel processing
techniques. An important contribution to this field has been the development of
systolic array systems. These types of systems are well suited to a wide range of
signal processing requirements in that they are both highly parallel and highly
pipelined. They also exhibit a number of properties, such as regularity and nearest
neighbour connections, which make them attractive from a VLSI design point of
view.
The concept of a systolic array was first proposed by H T Kung and C E Leiserson
[3]. In their seminal paper they showed how a number of important matrix
computations such as matrix x vector multiplication, matrix x matrix multiplication
and matrix LU decomposition could be computed using arrays of inner product step
processors interconnected in the form of a regular lattice. In a typical systolic array,
all the cells are identical except for a number of "special" cells which are required on
the boundary in some applications. On each cycle of a system clock, every cell in
the array receives data from its neighbouring cells and performs a specific
processing operation on it. The resulting data is stored within the cell and then
passed on to a neighbouring cell on the next clock cycle. As a result, each item of
data is passed from cell to cell across the array in a specific direction; the
term 'systolic' derives from the analogy between this rhythmic pumping of data
through the array and the pumping of blood by the heart.
Figure 1 shows the original Kung/Leiserson systolic array for banded matrix x
matrix multiplication, which serves to illustrate the operation of such a system. This
consists of a hexagonal array of processing elements whose function is as illustrated.
On each cycle of a system clock the processor takes as its inputs values a, b and c. It
then computes the inner product step function c <- c + a x b before making the
input values a and b, together with the new value of c, available on its output lines.
Figure 1: Systolic array for banded matrix x matrix multiplication (after Kung and
Leiserson [3])
The resulting outputs are then latched into neighbouring processors on the next
clock cycle as illustrated.
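The behaviour of one cell over a clock cycle can be sketched in C; the
struct and port names here are illustrative:

typedef struct { double a, b, c; } cell_io_t;

/* One clock cycle of the inner product step processor of Figure 1:
 * compute c <- c + a x b and pass a and b on unchanged. The returned
 * values are latched by the neighbouring cells on the next cycle.   */
cell_io_t ips_cell(cell_io_t in)
{
    cell_io_t out;
    out.c = in.c + in.a * in.b;   /* inner product step      */
    out.a = in.a;                 /* a and b pass through    */
    out.b = in.b;
    return out;
}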
The circuit in Figure 1 operates by allowing each element in a given diagonal in the
A and B matrices to move along a line of processors from the NW to the SE and
from the NE to the SW respectively, as illustrated. As the elements in the A and B
matrices meet on processors within the array, computations are performed with the
results being propagated in the upward (or northward) direction to form the
elements in the result matrix C.
The original work on Systolic Arrays acted as a major stimulus for research on VLSI
architectures for real time matrix algebraic computations. In the past ten years or
so considerable effort has been devoted to the design and optimisation of systolic
systems for problems which range from convolution to Kalman filtering. An
overview of these developments is given in references [4-9]. The general consensus
emerging from this work seems to be that structures such as that shown in Figure 1
should be regarded as "systolic algorithms" rather than specific hardware designs.
These can then be implemented in practice by mapping the algorithms onto a
more general purpose system designed for such applications. A good example of
this is the WARP machine developed by H T Kung and his colleagues at CMU in
collaboration with General Electric [10]. This machine is implemented as a linear
array of ten systolic processing nodes with each node being implemented on a single
printed circuit board. The resulting system has a peak performance of 10 MFLOPs.
An integrated version of the WARP processing node - the IWARP chip - is
currently under development at Intel with first silicon being available later this year.
The IWARP chip incorporates a 20 MFLOP floating point processor and has
unidirectional byte wide ports. These run at 40 MHz giving a data bandwidth of 320
Mbytes per second.
The development of a programmable node chip also forms the basis of a major
programme at STL in the UK[11]. This chip offers 24-bit floating point arithmetic
at a processing rate of 60 MFLOPs. It also incorporates parallel I/O data ports
allowing a total I/O data rate of 90 million floating point words per second. A
major application of this chip is an adaptive digital beam forming system developed
jointly by STL and RSRE [12].
A major attraction of the original systolic array concept was that these architectures
seemed well suited for VLSI design. However, as discussed above, the basic
processing element required in the designs proposed by Kung and Leiserson [3] is
of at least single chip complexity. Subsequent research by McCanny and
McWhirter [13] showed that many important front end digital signal processing
(DSP) chips such as correlators and digital filters could be designed as systolic
arrays in their own right if the basic processing element is defined at the single-bit
level rather than at the word level. The function of the basic processing
element is then reduced to that of a gated full adder - the bit level
equivalent of the inner product step function. The result is that many of
these processors (typically several
thousand in current technology) can be integrated on a single VLSI chip, with all the
inherent advantages of regularity and nearest neighbour interconnections. These
features, along with bit level pipelining, can be used to ease design and produce
chips capable of operating at very high sample rates. These ideas have since been
applied to a wide range of important DSP functions and joint programmes
established with a number of companies to see these ideas implemented as
commercial devices.
FIR FILTERING
FIR filters are probably the most common form of digital filters and have the twin
advantages of being unconditionally stable and offering a linear phase characteristic.
Mathematically an N point FIR filter can be written as:
y_j = Σ_{i=0}^{N-1} a_i x_{j-i}        (1)
where a_0, a_1, ..., a_{N-1} represent a fixed set of coefficients which
define the filter's frequency response and x_i (i = 0, 1, 2, ...)
represents a sequence of input signal values. If a_i and x_i are both
n-bit binary numbers then the pth bits of the output words y_j may be
expressed in the form:

y_j^p = Σ_{i=0}^{N-1} Σ_{q=0}^{n-1} a_i^q x_{j-i}^{p-q} + carries        (2)
This computation is usually carried out by performing the inner summation for each
value of i (ie performing the multiplication explicitly) and accumulating the results.
Most conventional DSP systems based on ripple-through multipliers operate in this
manner. This approach is particularly easy to understand and can be pipelined right
down to the bit level [13].
Equation (2) can be computed in other ways and an alternative is, for a
fixed value of q, to perform the summation:

Σ_{i=0}^{N-1} a_i^q x_{j-i}^{p-q}        (3)

The pth bit of the result y_j can then be formed by carrying out the
final sum over q.
This approach has been used in the design of a bit level systolic FIR filter with
multi-bit input data and coefficients [14]. The architecture of this circuit is shown in
Figure 2 and is based on a bit serial data input organisation. As illustrated, the
circuit comprises an orthogonal array of simple bit level processing cells
whose main logic function is an AND gate plus a full adder. The circuit has
been designed with the coefficient bits remaining on fixed sites, with each
cell in the rth row storing one bit of the coefficient a_r. The circuit also
incorporates
unidirectional data flow which is important since it permits the use of extra delays
when driving signals from chip to chip or bypassing a faulty row of cells if a fault
tolerant design is required.
[Figure 2: bit level systolic FIR filter array. Cell function (a gated
full adder): y' = y XOR (a.x) XOR c ; c' = y.c + y.(a.x) + c.(a.x)]
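In C, the recovered cell function is simply a full adder gated by the
AND of a coefficient bit and a data bit; the following sketch uses
illustrative names:

typedef struct { unsigned y, c; } gfa_out_t;

/* The Figure 2 cell: a gated full adder. The AND gate forms the
 * partial product bit a.x, which is added to the incoming result
 * bit y and carry bit c. */
gfa_out_t gated_full_adder(unsigned a, unsigned x, unsigned y, unsigned c)
{
    unsigned p = a & x;                   /* partial product bit           */
    gfa_out_t o;
    o.y = y ^ p ^ c;                      /* sum bit:   y' = y XOR p XOR c */
    o.c = (y & p) | (y & c) | (p & c);    /* carry bit: majority of y,p,c  */
    return o;
}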
The operation of the circuit in Figure 2 is as follows. Data words enter the array
through the top right-hand cell (least significant bit first) and are clocked from right
to left. Once a given bit has traversed any row it is delayed for several cycles (in this
case three) before being input to the row below. As they move through the array,
these data bits interact with the coefficient bits on each cell to form
partial products of the form a_i^q x_{j-i}^{p-q}. Each partial product is
passed to the cell below on the next clock cycle. The net result is that
the sum over i of all partial products in equation
(3) is formed within a parallelogram shaped interaction region that moves down
through the array. The final result is formed by summing over q (ie
accumulating all partial products of the same significance) using two
extra rows of cells at the bottom of the main array (not shown in
Figure 2).
A commercial FIR filter has been designed by Marconi Electronic Devices Ltd
(MEDL)[15] which is based on the architecture shown in Figure 2. A block diagram
of the chip is shown in Figure 3. This chip has been designed in 3 micron
CMOS/SOS and consists of two separate 16-stage FIR filters with 8-bit wide
coefficients. This allows it to be configured internally to operate as two
separate 16-stage filters with 8-bit data and coefficients (eg for use with complex
signals). Alternatively these can be cascaded internally to form one 32-stage device.
The chip can also be configured to form one 16-stage device with 16-bit coefficients.
In each mode of operation the input data length is programmable to one of four
different values and the device has been designed so that it can accept either two's
complement or unsigned magnitude data. Additional circuitry has also been
provided so that chips can be cascaded to form filters with up to 256 stages. This
chip also allows coefficients to be updated during normal circuit operation, a feature
which is useful in adaptive filtering applications. The device has been designed to
operate at clock rates of more than 20 MHz with a power consumption of 0.5 W.
Since the circuit operates in a bit serial manner the output rate depends on input
word length (plus word growth), a typical example being one megaword per second
for 16-bit input data.
[Figure 3: block diagram of the MEDL FIR filter chip - configuration
logic (length select, config select, coefficient update), left and
right 8-bit coefficient registers, two 16-stage filter arrays with
stage connect logic, left and right accumulators, cascade adder and
cascade control]
The above circuit operates in a bit serial manner. It is also possible to design FIR
filter systems with bit parallel data organisation. One way to do this is based on a
word level convolution architecture of the type shown in Figure 4. In this system
each coefficient is associated with a word level processing element with the data
words and result words being propagated in a uni-directional manner through the
array. The word level operation of this circuit depends on the values of the x words
being delayed by one cycle more than the y words as these propagate through the
array. In accordance with the cut theorem an arbitrary number of delays can be
incorporated into the data lines of Figure 4 provided the same number is introduced
Systolic arrays 285
into the result lines. The processing elements in Figure 4 can therefore be
replaced by multiply/accumulate structures based on systolic multiplier
circuits (such as those described in references 13 and 16) which are
pipelined at the bit level.
[Figure 4: word level systolic convolution architecture, with registers
on the unidirectional x and y data paths]
An alternative is to employ a bit sliced approach and this forms the basis
of the architecture used in a commercial matched filter/correlator chip also
developed by MEDL [17]. In this approach a single bit of each of the N coefficient
words is stored on a single chip with bit parallel input data being propagated across
these. A full correlation/convolution operation is then performed by cascading n of
these bit slice chips together.
VECTOR QUANTISATION
In vector quantisation, blocks of input signal samples are matched against
a set of stored codevectors - the codebook. The codevector which produces
the best match is chosen and its
index transmitted to the receiver. The signal is then reconstructed at the receiver
using a simple table look-up procedure. One of the problems with vector
quantisation is that the encoding process is often computationally intensive
(particularly in image coding applications) and in many instances is well beyond the
capability of programmable DSP chips.
An important aspect of many VQ systems is that it is only the index of the matching
codevector that is required for transmission and not the actual distortion value nor
the codevector itself. As discussed in detail elsewhere [18], the main pattern
matching computation required for most of the common distortion measures (least
squares, weighted least squares and Itakura/Saito distortion measures) is an inner
product computation between the input data vector and the appropriate vectors in
the codebook. The index of the codevector which produces the maximum inner
product value is then chosen.
[Figure 5: VQ pattern matching system - a bank of inner product arrays,
each feeding an accumulator, connected to a chain of comparators]
The function of each systolic array is to compute the inner product of the input
vector x with each of the codevectors yj. These results are then passed through an
accumulator circuit at the bottom of the array, to a comparator. The function of this
circuit is to compare the output from a given inner product array with the result
passed from the previous comparator. If it is bigger, this result, along
with its associated index j, is passed on. If not, the previous result
along with its index is passed on.
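The overall computation performed by Figure 5 - one inner product per
codevector followed by a running comparison - can be summarised by the
following C sketch; the dimension, names and data types are
illustrative:

#define DIM 16    /* vector dimension (illustrative) */

int vq_best_index(const float x[DIM], const float codebook[][DIM], int ncode)
{
    int   best_j  = 0;
    float best_ip = -1.0e30f;
    for (int j = 0; j < ncode; j++) {
        float ip = 0.0f;                  /* inner product array         */
        for (int i = 0; i < DIM; i++)
            ip += x[i] * codebook[j][i];
        if (ip > best_ip) {               /* comparator: keep the bigger */
            best_ip = ip;
            best_j  = j;
        }
    }
    return best_j;                        /* only the index is passed on */
}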
A bit level systolic array circuit for computing the inner product is illustrated in
Figure 6. This is similar in structure to the array in Figure 2 in that the bits of each
element in a given codevector y_j remain on fixed sites. The array differs from that
in Figure 2 in that all the elements in the data vector are input in parallel to the
array with no links between successive rows as in Figure 2. Apart from this the
basic operation of the circuit is similar. The data bits propagate across the array in
a bit serial manner with partial products of the same significance being accumulated
at the bottom of the array. The circuit shown in Figure 5 has been designed for one-
dimensional vectors (eg speech signals) but the concepts can readily be extended to
two-dimensional data blocks of the type required in image coding applications [21].
The circuit described has been designed for a VQ system in which a full codebook
search is assumed. However, quite a number of other VQ techniques have been
developed in which codebook size is significantly reduced. Good examples are
Gain/Shape Vector Quantisation (GSVQ) and methods in which the codebook is
organised in a tree structure. As is discussed in detail elsewhere [19-21], the inner
product array in Figure 6 can be used to construct circuits for these applications.
Whilst the structure shown in Figure 5 is well suited to high bandwidth systems the
hardware requirements may be excessive for lower bandwidth applications such as
[Figure 6: parallelogram of partial products d^(p,q) required to form
the inner product d, with a 'guard band' of sign extended input data]
speech coding. An alternative circuit can be designed which requires only a single
inner product array. This can be achieved by interchanging the data vector and
codevector bits in Figure 6, ie the data vector bits should be held on fixed sites with
successive codevector bits being propagated across these. With this system the
chosen codeword is available once the entire set of codevectors has propagated
across the data vector. The stored data vector can be then updated and the process
repeated.
THE DISCRETE COSINE TRANSFORM

The Discrete Cosine Transform (DCT) has widespread applications in both speech
and image coding systems. Applications include mobile radio communications,
high quality video telephone and high definition television. A number of highly
regular bit level systolic arrays for computing both the 1D-DCT and the
2D-DCT and its inverse can be derived from the Winograd algorithm
[19,23,25]. This allows the 1D-DCT y = (y_0, y_1, ..., y_{N-1}) of a 1D
input sequence x = (x_0, x_1, ..., x_{N-1}) to be written as:
y = CDAx (4)
A bit level systolic array system for computing the 1D-DCT is illustrated
schematically in Figure 7(a). As is described in detail elsewhere [22,19] this circuit
also operates in a bit serial word parallel manner, with the elements of both the A
and C matrices being stored on fixed sites on the array [Figure 7(b)]. The results of
the first matrix x vector multiplication are input to a bank of serial/parallel
multipliers before being passed to the second matrix x vector multiplication array
(the C array), as shown.
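Functionally the circuit evaluates equation (4) as two matrix x vector
products separated by an elementwise scaling by the diagonal matrix D;
a C sketch of this factorisation (N and the names are illustrative) is:

#define N 8                               /* transform length (illustrative) */

void dct_cda(const float C[N][N], const float D[N],
             const float A[N][N], const float x[N], float y[N])
{
    float u[N], v[N];
    for (int i = 0; i < N; i++) {         /* u = A x  (the A array)          */
        u[i] = 0.0f;
        for (int j = 0; j < N; j++)
            u[i] += A[i][j] * x[j];
    }
    for (int i = 0; i < N; i++)           /* v = D u  (multiplier bank)      */
        v[i] = D[i] * u[i];
    for (int i = 0; i < N; i++) {         /* y = C v  (the C array)          */
        y[i] = 0.0f;
        for (int j = 0; j < N; j++)
            y[i] += C[i][j] * v[j];
    }
}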
The extension of the above ideas to the 2D-DCT follows that described by
Blahut [24]. This allows the 2D-DCT matrix block Y of a two dimensional
input data block to be computed by the arrangement illustrated
schematically in Figure 8.
[Figure 7: (a) the A array (N x N cells) with its result parallelogram,
feeding a bank of bit-serial serial/parallel multipliers and then the
C array (N x N cells); a 'guard band' of sign extended input data
precedes the inputs. (b) the coefficient array]
Figure 7: Systolic 1D-DCT circuit
In this circuit the A and C matrix blocks have the same function as in Figure 7, as
have the serial/parallel multipliers.
[Figure 8: Systolic 2D-DCT circuit, with cascaded A/A and C/C matrix
block arrays and serial/parallel multipliers]
IIR FILTERING
Whilst the bit level systolic technique has been applied fairly widely to the design of
non-recursive components such as FIR filters, correlators etc, it has, until recently,
had limited application in the design of devices such as IIR filters which involve
recursive computations. Bit level systolic arrays for this type of application are
much more difficult to design because the effect of introducing M pipelined stages
into such a system is to introduce an M cycle delay into the feedback loop. In a bit
parallel system the input word rate is equivalent to the clock rate and so the pipeline
latency corresponds to M word delays. The potential speed of the device is
therefore reduced by a factor M.
y_n = b_1 y_{n-1} + u_n        (7)

where

u_n = a_0 x_n + a_1 x_{n-1}        (8)

x_n is a continuously sampled data stream and a_0, a_1 and b_1 are
coefficients which determine the filter frequency response. As
discussed above, the non-recursive part
of this computation can be implemented using conventional bit parallel pipelined
multiplier/accumulators. The difficulty arises with the recursive part (ie the
computation of b_1 y_{n-1}) and this is illustrated in Figure 9. This shows how this
computation may be implemented using a pipelined shift and add multiplier
/accumulator array in which the output is fed back to the input and in which it has
been assumed that the input data is fractional. From this circuit it will be seen that
it takes a total of five cycles (in general p + 1 cycles, where p is the
wordlength) before the bits which must be fed back into the array for the
next iteration - the most significant bits - are available at the array
output.
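The arithmetic of equations (7) and (8) is itself trivial, as the
following C sketch shows (names illustrative); the difficulty discussed
above lies entirely in pipelining the recursive term in hardware:

/* First order IIR section of equations (7) and (8). The recursion
 * y_n = b1*y_{n-1} + u_n is the part that resists pipelining, since
 * each output must be fed back before the next can be computed. */
float iir_first_order(float x, float *x1, float *y1,
                      float a0, float a1, float b1)
{
    float u = a0 * x + a1 * (*x1);   /* non-recursive part, eq (8) */
    float y = b1 * (*y1) + u;        /* recursive part,     eq (7) */
    *x1 = x;                         /* update the delays          */
    *y1 = y;
    return y;
}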
In practice, the problem with implementing the circuit shown in Figure 10 is that
one cannot use conventional binary arithmetic since this would necessitate the
Figure 10: Conceptual msb array with feedback
propagation of carries from the lsb to the msb. However this problem can be
avoided if redundant arithmetic schemes are employed which allow arithmetic to be
carried out most significant bit first. A good example of such redundant number
systems are the Signed Digit Number Representations (SDNRs) which were
originally introduced by Avizienis [31] to reduce carry propagation in parallel
numerical operations such as add, subtract, multiply and divide. A number of
systolic circuits have been designed based on both radix-4 and radix-2 SDNRs which
embody the principles illustrated in Figure 10 [28-30]. Space does not allow a
detailed description of these arrays. However, the ability to limit carry propagation
so that a computation carried out at a given level of significance only affects results
a limited number of levels of significance higher (typically between one and two
places) means that the latency within the pipelined loop can be reduced to a small
number of cycles - typically two to three cycles. Moreover, the latency and hence
the throughput rate is independent of the word length of the input data or
coefficient words.
A prototype bit parallel systolic IIR filter chip based on these ideas has been
designed at Queen's University Belfast. This is illustrated in Figure 11. The chip in
question operates on 12-bit parallel two's complement input data with 13-bit parallel
two's complement coefficients to produce a 15-bit two's complement output. The
design was undertaken using VLSI Technology's Design Express Integrated Circuit
Design system with the chip being fabricated in their 1.5 micron double layer metal
CMOS technology.
The internal architecture of the circuit is based on radix-2 SDNR arithmetic and
input/output interface circuitry is incorporated on board which allows conversion
between two's complement and radix-2 SDNR. As discussed in detail in reference
[32] this interface circuitry represents a small overhead in terms of chip area and
does nothing to reduce data throughput rate. The chip in question can operate up
to 15 Megasamples per second.
An important conclusion which can be drawn from this design is that typical cell
sizes required in a system employing redundant arithmetic are only about 60%
larger than those of a conventional bit level cell. Moreover, Knowles and
McWhirter [33]
have recently shown that conventional carry save arithmetic can be used to
implement the main multiplier array with signed digit number representations only
being required along the periphery. This means that systolic IIR filters with bit
parallel input data can be implemented in the same amount of hardware as
structures built from conventional pipelined multiplier arrays.
DISCUSSION
It has only been possible in this paper to describe a few of the DSP operations which
may be implemented as bit level systolic arrays. The list of applications is extensive
and includes for example the computation of Fourier transforms, Walsh Hadamard
transforms and front-end image processing operations such as rank order filtering,
two dimensional convolution and correlation.
The devices fabricated to date provide hard evidence that the bit level systolic array
architecture has a number of important advantages for high-performance VLSI chip
design. Since the arrays are completely regular the task of designing an entire VLSI
Systolic arrays 297
chip is essentially reduced to optimizing the design and layout of one or two very
simple cells.
It is also very easy to carry out a functional validation of an array using a hardware
description language such as ELLA or VHDL, less than 100 lines of ELLA code
being sufficient, for example, to describe the convolution circuit described earlier.
As a result, the typical design time for a bit level systolic array is very much less than
that required for a random logic circuit with the same number of transistors. It is
important to realise that bit level systolic arrays are not just regular in the geometric
sense; they also exhibit complete electrical regularity by virtue of the fact that each
cell is connected only to its nearest neighbours. The corresponding absence of long
interconnects also enhances the circuit performance by reducing parasitic
capacitance (and hence RC time delays) and leads to extremely high circuit packing
densities.
Since the circuits described above have been pipelined down to the bit level, their
throughput rates depend only on the propagation delay through a single cell
(typically of order three to four gate delays). As a result, we have been able to
achieve high performance data rates at relatively low levels of power consumption.
In fact it was possible with some of the earlier systolic arrays (eg the MEDL
correlator and convolver chips) to obtain functional throughput rates of
5 x 10^11 to 10^12 gate Hz/cm^2 with a 3 micron process, a figure which
exceeded the then targets specified for phase 1 VHSIC 1.25 micron devices.
Although most of the emphasis in
this and previous sections has been on pipelining circuits right down at the single bit
level, it should be noted that for some circuit technologies where the propagation
delay and physical size of a latch is comparable with that of the processing element,
it may be better to pipeline the circuits at a higher level. In such cases the basic cell
298 Systolic arrays
can be designed as the logical equivalent of a 2 x 1, 2 x 2, or larger array of single bit
cells but without any internal latches. It is possible in this way to retain the VLSI
design advantages of bit level systolic arrays without compromising the cell design in
terms of the optimum area, speed, and power consumption for a given technology.
Although the systolic array approach to chip design eliminates most of the global
connections within a circuit, it has been argued that many of the problems
associated with the interconnect lines have simply been transferred to the clock.
This may be true to some extent. However, since the problem has been centralized,
the circuit designer can concentrate his attention on the problems of clock
distribution, confident that the function of the array is otherwise determined by that
of a single cell. With a more conventional architecture it is essential to minimize
clock skew over the entire chip. In a fully systolic design, where processors
communicate only with their neighbours, it is only the incremental skew between
processors that must be minimized, and this can be kept small through careful
design and by running all the clock lines in metal. In most of the chips designed to
date, the master clock is bussed from an input pad along the bottom or top of the
chip, the clock signal to each row or column being driven by its own clock buffer.
Using this type of scheme, clock skews of less than 1 ns have been achieved across
entire rows or columns of the chips described above.
In this paper attention has been focused mainly on research undertaken by the
author and his colleagues. Considerable effort has also been devoted in many other
laboratories world-wide to research on VLSI array processor architectures which
are pipelined at the bit level and suitable for high performance DSP chip design.
Quite a number of these architectures have been used as the basis of chip designs.
Further information on these designs is available from a number of sources.
Recent research on systematic design techniques such as those based on the use of
dependence graphs [4,34-38] has shown that many specific bit level array processor
designs can be derived by applying various mappings, transformations and "cuts" to
dependence graphs. In the simple case of binary multiplication, for example, a
whole range of well known ripple-through, systolic and serial/parallel multipliers
can be derived from a single graph which illustrates the dependencies between the
various bit level computations required for this operation[34]. Similar insights are
obtained using methods such as those based on functional programming
languages[39,40]. These tools provide a simple and intuitive mechanism which
enables the VLSI chip designer to examine the relative trade-offs between levels of
pipelining, chip area, power consumption, latency and throughput for a given
application. They also provide a means whereby a non-specialist can rapidly map
his desired function onto an efficient bit level processor array.
REFERENCES
8. "Systolic Arrays", 1987 Special Issue, Computer July 1987, Computer Society of
the IEEE.
9. McWhirter, J.G. and McCanny, J.V., 1987, "Systolic and Wavefront Arrays",
Chapter 8, VLSI Technology and Design, Academic Press, eds. McCanny, J.V. and
White J.G., pp. 253-299.
10. Menzilcioglu, O., Kung, H.T. and Song, S.W., 1989, "A Highly Configurable
Architecture for Systolic Arrays of Powerful Processors" in Systolic Array Processors,
eds. McCanny, J.V., McWhirter, J.G. and Swartzlander, E.S., Prentice Hall
International, pp. 156-165.
11. Ward, C.R., Hazon, S.C. Massey D.R., Urqhart, A.J., Woodward, 1989,
"Practical Realisations of Parallel Adaptive Beamforming Systems" in Systolic Array
Processors, eds. McCanny, J.V., McWhirter, J.G. and Swartzlander, E.S., Prentice
Hall International, pp. 3-12.
12. McCanny, J.V. and McWhirter, J.G., 1987, "Some Systolic Array Developments
in the United Kingdom". IEEE Computer. July, pp. 51-63.
13. McCanny, J.V. and McWhirter, J.G., 1982, "Implementation of Signal
Processing functions using 1-Bit Systolic Arrays", Electron. Letts.. Vol. 18, no. 6, pp.
241-243.
14. McCanny, J.V., Evans, R.A. and McWhirter, J.G., 1986, "Use of Unidirectional
Dataflow in Bit Level Systolic Arrays". Electron. Letts.. Vol. 22, no. 10, pp. 540-541.
15. Boyd, K.J., 1989, "A Bit Level CMOS/SOS Convolver", in Systolic Array
Processors, eds. McCanny, J.V., McWhirter, J.G., Swartzlander, E.S., Prentice Hall
International, pp. 431-438.
16. Hoekstra, J., 1985, "Systolic Multiplier". Electron. Letts.. Vol. 20, pp. 995-996.
17. White, J.G., McCanny, J.V., McCabe, A.P.H., McWhirter, J.G. and Evans, R.A.,
1986,"A High Speed CMOS/SOS Implementation of a Bit Level Systolic Correlator"
Proc. IEEE Int. Conf. on Acoustics Speech and Signal Processing. Tokyo, pp. 1161-
1164.
18. Gray, R.M., 1984, "Vector Quantisation", IEEE ASSP Magazine. Vol. 1, pp. 4-
29.
19. Yan, M., McCanny, J.V. and Kaouri, H.A., 1988, "Systolic Array System for
Vector Quantisation using transformed Sub-band Coding", IEEE Proc. of Int. Conf.
on Systolic Arrays. San Diego, May 1988, pp. 675-685.
20. Yan, M. and McCanny, J.V., 1989, "A Bit Level Systolic Architecture for
implementing a VQ Tree Search", Journal of VLSI Signal Processing, to be
published.
21. Yan, M., 1989, "VLSI Architectures for Speech and Image Coding Application",
Ph.D. Thesis. The Queen's University of Belfast.
Systolic arrays 301
22. Ward, J.S, McCanny, J.V. and McWhirter, J.G., 1985, "Bit Level Systolic Array
Implementation of the Winograd Fourier transform Algorithm", IEE Proc. Pt. F,
Vol. 132, pp. 473-479.
23. Ward, J.S. and Stannier, B.S., 1983, "Fast Discrete Cosine transform for Systolic
Arrays". Electron. Letts.. Vol. 19, pp. 58-60.
24. Blahut, R.E., 1985, Fast Algorithms for Digital Signal Processing, Addison Wesley.
25. Yan, M. and McCanny, J.V., 1989, "VLSI Architectures for Computing the 2D-
DCT" in Systolic Array Processors, eds. McCanny, J.V., McWhirter, J.G. and
Swartzlander, E.S. Prentice Hall International, pp. 411-420.
26. Parhi, K.K. and Messerschmitt, D.G., 1987, "Concurrent Cellular VLSI
Adaptive Filter Architectures", IEEE Trans, on Circuits and Systems. Vol. CAS-34,
no. 10, pp. 1141-1151.
27. Parhi, K.K. and Messerschmitt, D.G., 1988, "Pipelined VLSI Recursive Filter
Architectures using Scattered Look-Ahead and Decomposition", Proc. IEEE Int.
Conf. on Acoustics. Speech and Signal Processing. New York, pp. 2120-2123.
28. Woods, R.F, Knowles, S.C, McCanny, J.V. and McWhirter, J.G., 1988, "Systolic
IIR filters with Bit Level Pipelining", Proc. Int. Conf. on Acoustics. Speech and
Signal Processing. New York, pp. 2120-2123.
29. Knowles, S.C, Woods, R.F., McWhirter, J.G. and McCanny, J.V, 1989, "Bit
Level Systolic Architectures for High Performance IIR filtering", Journal of VLSI
Signal Processing. Vol. 1, no. 1, pp. 1-16.
30. Woods, R.F, McCanny, J.V, Knowles, S.C. and McWhirter, J.G, "Systolic
Building Blocks for High Performance Recursive Filtering", Proc. IEEE Int. Conf.
on Circuits and Systems. Helsinki, June 1988, pp. 2761-2764.
31. Avizienis, A, 1961, "Signed Digit Number Representations for Fast Parallel
Arithmetic". IRE Trans, on Electronic Computers. Vol. EC-10, pp. 389-400.
32. McCanny, J.V, Woods, R.F. and Knowles, S.C, 1989, "The Design of a High
Performance IIR filter chip", Systolic Array Processors, eds. McCanny, J.V,
McWhirter, J.G. and Swartzlander, E.S, Prentice Hall International, pp. 535-544.
33. Knowles, S.C. and McWhirter, J.G, 1989, "An Improved Parallel Bit Level
Systolic Architecture for IIR filtering", Systolic Array Processors, eds. McCanny, J.V,
McWhirter, J.G. and Swartzlander, E.S, Prentice Hall International, pp. 205-214.
34. McCanny, J.V, McWhirter, J.G. and Kung, S.Y, 1989,"The use of Data
Dependence Graphs in the Design of Bit Level Systolic Arrays", IEEE Trans, on
Acoustics. Speech and Signal Processing, to be published.
35. Quinton, P , 1984, "Automatic Synthesis of Systolic Arrays from Uniform
Recurrent Equations", IEEE 11th Int. Symp. on Computer Architectures, pp. 208-
214.
36. Moldovan, D. and Forbes, J.A.B, 1986, "Partitioning and Mapping Algorithms
into Fixed Size Systolic Arrays". IEEE Trans, on Computers C-35(l), pp. 1-12.
302 Systolic arrays
37. Rao, S.K., 1985, "Regular Iterative Algorithms and their implementation on
Processor Arrays". Ph.D. Thesis. Stanford University, USA.
38. Van Dongen, V., 1989, "Quasi-regular Arrays: Definition and Design
Methodology", Systolic Array Processors, eds. McCanny, J.V., McWhirter, J.G. and
Swartzlander, E.S., Prentice Hall International, pp. 126-135.
39. Sheeran, M., 1985, "Designing Regular Array Architectures using Higher Order
Functions" in Functional Programming Languages and its Applications, ed.
Jovannaud, J.P., Springer-Verlag.
40. Luk, W., Jones, G. and Sheeran, M., 1989, "Computer-based tools for Regular
Array Design", Systolic Array Processors, eds. McCanny, J.V., McWhirter, J.G. and
Swartzlander, E.S., Prentice Hall International.
Chapter 21
21.1 BACKGROUND
* Accurate recognition
* Speaker independence
* 16 word vocabulary
* Robust operation
* Low cost
* Accurate recognition
* Speaker independence
* 16 word vocabulary
* Robust operation
* Moderate cost
[Figure: word models]

p(\mathbf{b} \mid \boldsymbol{\theta}) = \exp\left( -\sum_{i=1}^{N} \frac{(b_i - \theta_i)^2}{2\sigma_i^2} \right)        (4)

\ln p(\mathbf{b} \mid \boldsymbol{\theta}) = -\sum_{i=1}^{N} \frac{(b_i - \theta_i)^2}{2\sigma_i^2}        (5)
21.4.5 Optimisation
MICROVAX II 1200
DSP32 625
POINTERS IN FILTER BANK 413
POINTERS IN PATTERN MATCHING 288
FBA IN ASSEMBLER 113
PATTERN MATCHING IN ASSEMBLER 52
21.5 CONCLUSIONS
An example has been described of the coding of a speech
recognition algorithm onto a floating point digital signal
processor by means of a high level language compiler. It
has been shown that this can considerably reduce the
engineering effort required, compared with that for a fixed
point DSP using assembly language. It must be concluded
however, that it is necessary to use assembly coding for
computationally intensive routines if real-time performance
is to be achieved.
When devices are used in production volumes, the extra
cost of floating point over fixed point devices will not be
Chapter 22
22.1 INTRODUCTION
High Frequency Jet Ventilation (HFJV) is a recognised form of mechanical ventilatory
support that is used in both anaesthesia and critical care medicine. The technique differs
from conventional modes of ventilatory support in both its relative tidal volume and
respiratory rate. Several studies have shown that HFJV is capable of maintaining
adequate gas exchange in cases where conventional methods have either failed or
proved to be impractical. The main advantages of HFJV include lower peak and mean
airway pressures (Heijman et al., 1972), a reduction in pulmonary barotrauma (Keszler
et al., 1982) and less disturbance to cardiovascular function (Sjostrand, 1980). A more
detailed consideration of the subject may be found in several recent reviews (Smith,
1982; Drazen et al., 1984; Kolton, 1984).
General acceptance of HFJV has been inhibited by lack of understanding of the
mechanism by which HFJV maintains gas exchange, and also by a lack of practical
guidelines for clinicians on when to apply HFJV and what ventilator parameters to set
for optimum gas exchange. One approach to the better understanding of gas exchange
during HFJV treats the patient's respiratory system as an acoustic resonator whose
characteristics vary markedly over the range of HFJV frequencies (Lin and Smith, 1987).
Preliminary results based on an animal model have supported the hypothesis that the
respiratory system behaves as an acoustic resonator (Smith and Lin, 1989).
This chapter presents a case study of computer-aided instrumentation for on-line
measurement and control in the area of critical care medicine. The system has been
initially developed to test the hypothesis that acoustic resonance of the respiratory
airways represents an optimal state for alveolar gas exchange. Real-time signal
processing algorithms have been implemented around a user-friendly shell to aid in the
identification of transfer function relationships between respiratory data. This analysis is
a first step towards the development of a self-tuning algorithm for automatic control and
management of ventilator parameters to optimise gas exchange in patients. The
instrumentation can be used for extremely high resolution identification of respiratory
dynamics in a fraction of the time taken by previous workers with only minimal changes
to existing jet ventilation procedures.
[Fig. 22.1: schematic of the jet ventilator — air/O2 input (40-75 PSI) passes through the inlet cutoff valve and drive pressure regulator (2-60 PSI) to an accumulator; the jet drive valve (JDV) and a parallel proportional controller valve (PCV) feed the jet line.]
Gas from a high pressure (40-75 PSI) source is delivered into the ventilator via the inlet
cutoff valve (CV) and drive pressure regulator (DR) to an accumulator (ACC). An
electronic time-base controller energises the jet drive valve (JDV) so that gas flows
from the ACC to the jet line and patient periodically depending on the ventilation rate
(variable over the range 40-200 BPM) and inspiration time control. The
inspiratory:expiratory (I:E) ratio can be varied over the range 10-60% of the respiratory
cycle. The inspiratory flow is determined by the drive pressure regulator and the
respiratory airways impedance of the ventilator/patient system.
A proportional controller valve (PCV), connected in parallel with the jet drive valve (JDV),
introduces small low amplitude pressure oscillations which are hydraulically summed with
the binary output of the jet ventilator. The PCV controller accepts an analogue signal
over the range 0-10 V and produces a continuous valve output which tracks the
reference input. A white-noise circuit (Horowitz and Hill, 1983) is used to drive the PCV
with a pseudo-random analogue noise signal with bandwidth selectable over a range of
frequencies. This allows for white-noise testing of respiratory airways.
The jet ventilator can operate in either a) binary, b) proportional, or c) binary plus
proportional mode. The various ventilator modes provide maximum flexibility, enabling
respiratory dynamics testing for different stimuli.
If the relationship between stimulus and response is y(t) = H[x(t)], where x(t) is the stimulus and y(t) is the response, the identification task consists of
estimating the system functional H.
A white-noise stimulus with a flat power spectrum over the frequency range of interest
is equivalent to applying a range of sinusoids simultaneously. This technique inherently
provides a method for rapidly determining respiratory dynamics over an expanded
frequency range with a frequency resolution greater than is practicable with other
techniques. This is far superior to using swept sine-wave stimuli, where representation
is over a limited and pre-determined region of the function space.
To use the white-noise approach the ventilator must be operated in either mode b or c.
Mode b is extremely useful for:
• determining the frequency response characteristics of respiratory pressure, flow
and displacement transducers
• performing respiratory impedance measurements on patients, significantly
simplifying existing methods of measurement.
Mode c is intended for routine clinical use since it enables on-line identification of
respiratory dynamics during normal high frequency jet ventilation. Once preliminary
decisions are made, the white-noise method can be used for characterising the
respiratory airways over a short period of time.
Finally, the instrumentation is intended for use by nonexperts. Hence the man-machine
interface should be an environment which presents the functional power of the system in
a clear and concise manner. A friendly menu-driven system based on the type of system
architecture described by Hailstone et al. (1986) is desirable.
While new samples are being acquired into block B0, the DSP is processing data
currently stored in blocks B1-B2-B3-B4 (Fig. 22.3a). In this case, the age of the data
decreases from B1 to B4. During the next data acquisition cycle (Fig. 22.3b), samples
will be stored into B1 whilst blocks B2-B3-B4-B0 are processed.
[Fig. 22.3: (a) acquisition into B0 while B1-B4 are processed; (b) acquisition into B1 while B2-B3-B4-B0 are processed.]
The DSP timing for these operations is critical. For example, the DSP must have
performed all processing on the current set of data and transferred the results to the host
before the next frame of 256 samples is transferred to the main calculation buffer. With
this configuration, data acquisition memory is optimised to be 256 samples per channel.
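The five-block rotation can be sketched in a few lines (an illustrative Python model, not the authors' TMS32010 implementation; names and block handling are assumptions following Fig. 22.3):

    import numpy as np

    BLOCKS, BLOCK_LEN = 5, 256            # B0-B4, 256 samples per channel
    buffer = np.zeros((BLOCKS, BLOCK_LEN))
    acq = 0                               # block currently being filled

    def acquisition_cycle(new_samples):
        """Store one frame of 256 samples and return the four-block
        window (oldest first) processed during this cycle."""
        global acq
        buffer[acq] = new_samples         # e.g. fill B0 while B1-B4 are used
        window = [(acq + 1 + i) % BLOCKS for i in range(BLOCKS - 1)]
        acq = (acq + 1) % BLOCKS          # next cycle fills the next block
        return buffer[window]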
The overall bandwidth of this data acquisition scheme can be calculated from the
following timing relationship:
t_DSP + t_XFR + t_PC < t_256        (22.2)

where:
• t_DSP = time taken to perform signal processing routines ≈ 60 ms,
• t_XFR = time taken to transfer data over to the PC ≈ 3 ms,
• t_PC = time taken by the PC to handle and display the results,
• t_256 = time available to acquire the next frame of 256 samples.
The maximum signal bandwidth that can be captured is approximately 320 Hz. The main
limitation is due to t_PC, which is imposed by the graphics system. For example, if a
graphics coprocessor were used for handling display information (t_PC would then be
negligible), the overall bandwidth would be f_max ≈ 5 kHz per channel.
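The timing budget of equation (22.2) can be checked numerically; in this sketch the t_PC value is an assumption, chosen to reproduce the quoted 320 Hz figure:

    # Timing budget of equation (22.2): t_DSP + t_XFR + t_PC < t_256.
    t_dsp = 0.060          # signal processing time, ~60 ms (from the text)
    t_xfr = 0.003          # DSP-to-PC transfer time, ~3 ms (from the text)
    t_pc  = 0.337          # assumed display time implied by the 320 Hz limit

    t_256  = t_dsp + t_xfr + t_pc      # minimum time per 256-sample frame
    fs_max = 256 / t_256               # highest usable sampling rate
    print(fs_max / 2)                  # Nyquist bandwidth, ~320 Hz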
The requirement was initially specified for real-time operation up to a spectral bandwidth
of 50 Hz. This can be easily achieved with the existing system. The bandwidth is user
selectable over the range 1-50 Hz, providing spectral resolutions of the order 0.001-
0.050 Hz. This includes plotting a signal onto the screen at intervals determined by the
sampling rate.
where W_N^k = e^{-j2\pi k/N}, \alpha = N/2^m, k' = k - 2^{m-1}, 1 \le m \le n, and X_{m,i}
represents the decimated samples after stage m of the FFT algorithm, 0 \le i \le (N/2^m) - 1.
The DIT redundancy reduction is achieved by dividing the input sequence into odd and even
sample sequences. Thus, at stage 1, an N-point sequence is transformed by combining
the DFTs of these two N/2-point sequences. The index i represents the number of
the butterfly within each stage.

X_{1,i}(k) = X_{0,2i}(k') + W_N^{\alpha k'} X_{0,2i+1}(k'),   k = 0, k' = 0
X_{1,i}(k) = X_{0,2i}(k') - W_N^{\alpha k'} X_{0,2i+1}(k'),   k = 1, k' = 0        (22.5)

X_{2,i}(k) = X_{1,2i}(k') + W_N^{\alpha k'} X_{1,2i+1}(k'),   0 \le k \le 1, k' = k
X_{2,i}(k) = X_{1,2i}(k') - W_N^{\alpha k'} X_{1,2i+1}(k'),   2 \le k \le 3, k' = k - 2        (22.6)
[Fig. 22.2: signal flow graph of the multiplier-free butterfly — inputs A, B, C, D; outputs include A+B, A-B and C+jD.]
Thus the first two stages of an N-point FFT can be performed simultaneously with a
special radix-4 butterfly which avoids multiplication operations to enhance execution
speed (Fig. 22.2). For N=1024 the first radix-4 butterfly is given by substituting i=0 in
equations (22.5) and (22.6):
X_{2,0}(0) = [X_{0,0}(0) + X_{0,1}(0)] + [X_{0,2}(0) + X_{0,3}(0)]
X_{2,0}(1) = [X_{0,0}(0) - X_{0,1}(0)] - j[X_{0,2}(0) - X_{0,3}(0)]
X_{2,0}(2) = [X_{0,0}(0) + X_{0,1}(0)] - [X_{0,2}(0) + X_{0,3}(0)]
X_{2,0}(3) = [X_{0,0}(0) - X_{0,1}(0)] + j[X_{0,2}(0) - X_{0,3}(0)]        (22.7)
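Equations (22.7) can be verified against a direct DFT; a minimal numpy sketch (the even/odd ordering of the inputs is the decimation-in-time convention assumed here):

    import numpy as np

    x = np.random.randn(4) + 1j * np.random.randn(4)
    # Decimation in time: X00, X01 hold the even samples, X02, X03 the odd.
    X00, X01, X02, X03 = x[0], x[2], x[1], x[3]

    X2 = np.array([(X00 + X01) + (X02 + X03),        # X_{2,0}(0)
                   (X00 - X01) - 1j * (X02 - X03),   # X_{2,0}(1)
                   (X00 + X01) - (X02 + X03),        # X_{2,0}(2)
                   (X00 - X01) + 1j * (X02 - X03)])  # X_{2,0}(3)

    assert np.allclose(X2, np.fft.fft(x))  # multiplier-free radix-4 butterfly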
[Figure: 256 special radix-4 butterflies followed by 256-point FFT kernels, each containing 128 radix-2 butterflies.]
Fig. 22.4 1024-point decimation-in-time radix-2 FFT kernel.
Fig. 22.4 shows the flow diagram for a 1024-point FFT based on an in-place decimation-
in-time algorithm. The computation loop between stages 3-8 is processed using a
smaller 256-point complex FFT which is executed utilising the on-chip memory of the
DSP. This scheme provides a marked increase in speed of performance since on-chip
memory data accesses are single cycle instructions. Data is transferred between
external and internal memory at high speed via the pipeline capability of the TMS320
device.
The time required to compute a 1024-point real FFT and its squared magnitude terms
using the developed algorithm is 8 ms. This is a four-fold increase in performance
compared with the in-line FFT algorithm reported by Papamichalis and So (1986), and
an eight-fold increase over that of Burrus and Parks (1985). The latter workers have
implemented a similar algorithm in Fortran on a PDP 11 and achieved a timing
performance of 1400 ms. A more general discussion of the FFT may be found in Chapter
6 of this book.
where j = \sqrt{-1}, and X_r and X_i are the real and imaginary parts of S_x at frequency ω = 2πf, and
contain magnitude and phase information of this frequency component of x(t).
Consider a single-input x(t), single-output y(t) linear system h(t); the transfer function
H(jω) is:

H(jω) = S_y / S_x        (22.9)
If x(t) is random white noise, H(jω) can be found at all frequencies in x(t). However, due
to synchronisation problems and statistical considerations related to the randomness of
the input x(t), the computation of H(jω) by this method is not practical. These problems
can be overcome by using the concept of power spectra.
The input power spectrum is:
G_xx = S_x S_x* = X_r^2 + X_i^2        (22.10)

The output power spectrum is:

G_yy = S_y S_y* = Y_r^2 + Y_i^2        (22.11)

and the cross-power spectrum is:

G_yx = S_y S_x*        (22.12)

The transfer function can then be estimated as:

H(jω) = G_yx / G_xx        (22.13)
The coherence function can also be derived from these calculations to provide a measure
of the extent to which the response is due to the stimulus and not to extraneous sources
of noise. The coherence function γ^2 is given by:

\gamma^2 = \frac{|G_{yx}|^2}{G_{xx} G_{yy}}        (22.16)

The coherence γ^2 is a real term which, for any frequency ω, gives the fraction of power at
the response that is due to the input. Whenever γ^2 is less than unity, either the system
is non-linear or the signals do not have a causal relationship, or both. The deviation of
γ^2 from unity is a quantitative measure of these conditions. Since physiological systems
are both nonlinear and have high noise content, the coherence function should provide a
valuable measure of confidence in spectral estimates.
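The estimates of equations (22.13) and (22.16) correspond to standard Welch-type computations; a sketch with an assumed resonant test system standing in for the respiratory airways (all values illustrative):

    import numpy as np
    from scipy import signal

    fs = 640.0                              # sampling rate (illustrative)
    n = int(60 * fs)
    x = np.random.randn(n)                  # white-noise stimulus
    # Hypothetical resonant response plus sensor noise.
    b, a = signal.iirpeak(25.0, Q=5.0, fs=fs)
    y = signal.lfilter(b, a, x) + 0.1 * np.random.randn(n)

    f, Gxx = signal.csd(x, x, fs=fs, nperseg=1024)  # input power spectrum
    f, Gyx = signal.csd(x, y, fs=fs, nperseg=1024)  # cross-power spectrum
    H = Gyx / Gxx                                   # equation (22.13)
    f, coh = signal.coherence(x, y, fs=fs, nperseg=1024)  # equation (22.16)

Near the assumed resonance the coherence approaches unity; away from it the added sensor noise pulls the estimate down, as the text describes.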
The phase angle θ of a complex number of the form z = (x + jy) can be calculated using
tan(θ) = y/x. Using the Q15 number notation defined in Rabiner and Gold (1975), we can
represent -180° ≤ θ ≤ +180° as an integer in the range -32767 ≤ θ ≤ 32767.
The Argand diagram of Fig. 22.5 shows a method for computing the phase θ of the vector
x + jy depending on the quadrant it occupies. Hence, by defining 0° ≤ θ ≤ +89° we can
calculate θ for any value in the range -180° ≤ θ ≤ +180°. Using a look-up table of 2048
θ-values over the range 0° ≤ θ ≤ +89°, we can achieve a maximum phase resolution of
0.044°, since 0 ≤ tan θ ≤ 57.29 over the range 0° ≤ θ ≤ +89° and 0 ≤ n ≤ 2047.
\theta_n = \frac{32767}{180^\circ}\tan^{-1}\left(\frac{y}{x}\right) = 182.04 \tan^{-1}\left(\frac{y}{x}\right)        (22.18)

Hence, the correct phase angle for a given ratio y/x is found by calculating the index
and convergent provided certain criteria are satisfied (Astrom and Wittenmark, 1989).
[Figure: block diagram of the model-reference scheme, with model-following error signal e(t).]

The mathematical description of each model is in terms of the input pressures P_i and P̂_i
(derived from the ventilator) and the output alveolar pressures P_a and P̂_a. The state
variables must be indicative of respiratory dynamics since they form the basis of the
control law which drives the ventilator.
The computation power of the DSP can be conveniently exploited to implement a model-
reference adaptive control scheme in real-time. The mathematical routines required for
this purpose are well-suited to DSP architecture which is optimised to implement digital
filters. Where state estimation is necessary, subroutines may be written to perform
matrix manipulations.
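As an indication of the kind of computation involved, the following is a minimal sketch of model-reference adaptation using the MIT rule on an assumed first-order loop — the plant, model and gain values are illustrative, not the respiratory models of the text:

    # First-order model-reference adaptation (MIT rule), all values assumed.
    a_p, b_p = 0.90, 0.10      # plant:  y(k+1)  = a_p*y(k)  + b_p*u(k)
    a_m, b_m = 0.80, 0.20      # model:  ym(k+1) = a_m*ym(k) + b_m*r(k)
    gamma = 0.05               # adaptation gain

    theta = y = ym = 0.0
    for k in range(4000):
        r = 1.0 if (k // 500) % 2 == 0 else -1.0   # square-wave reference
        u = theta * r                              # adjustable controller
        y = a_p * y + b_p * u                      # plant update
        ym = a_m * ym + b_m * r                    # reference model update
        e = y - ym                                 # model-following error
        theta -= gamma * e * r                     # MIT-rule gradient step
    # theta converges towards the gain that makes the plant follow the model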
22.7 CONCLUSIONS
The use of HFJV as a form of respiratory support is known to have certain advantages
over conventional methods of ventilation. However, a general acceptance of the
technique has been inhibited by lack of understanding of the underlying fluid dynamics
and a lack of practical guidelines for clinicians on when to apply HFJV and the ventilator
settings that should be used for optimal gas exchange.
A computer orientated approach offers an attractive solution to these problems, as it can
cope with the volume of information that needs to be considered in such an application,
and with the level of complexity that the HFJV technique entails. A real-time
instrumentation system has been developed based around a PC and DSP environment.
The white-noise method has been incorporated into the system to allow for a systematic
procedure of characterising a patient's respiratory system. The method is simple to
apply and gives good results over a short period of time.
A ventilator control system based around a model-reference adaptive scheme is
currently undergoing simulation tests. Based on the results of the simulation study and
the identification data collected from patients receiving jet ventilation as part of their
therapeutic regimen, a control scheme will be embedded into the existing instrumentation.
22.8 REFERENCES
Astrom, K.J. and Wittenmark, B. (1989): Adaptive Control, Addison-Wesley Publishing
Company, Reading, MA.
Bendat, J.S. & Piersol, A.G. (1971): Random data: analysis and measurement
procedures, John Wiley & Sons, NY.
Burrus, C.S. and Parks, T.W. (1985): DFT/FFT and convolution algorithms, John Wiley
& Sons, NY.
Drazen, J.M., Kamm, R.D., and Slutsky, A.S. (1984): "High frequency ventilation".
Physiol. Rev., v64, 505-.
Hailstone, J.G., Jones, N.B., Parekh, A., Sehmi, A.S., Watson, J.D. and Kabay, S. (1986):
"Smart instrument for flexible digital signal processing". Med. & Biol. Eng. &
Comput., v24, 301-304.
Heijman, K., Heijman, L., Jonzon, A., Sedin, G., Sjostrand, U., and Widman, B. (1972):
"High frequency positive pressure ventilation during anaesthesia and routine surgery
in man". Acta Anaesthesiologica Scandinavica, v16, 176-187.
Horowitz, P. and Hill, W. (1983): The art of electronics, Cambridge University Press,
444-445.
Kabay, S., Jones, N.B. and Smith, G. (1989): "A system for real-time measurement and
control in high frequency jet ventilation". IFAC-BME Decision Support for Patient
Management: Measurement, Modelling and Control, England, 151-160.
Kolton, M. (1984): "A review of high frequency oscillation". Can. Anaesth. Soc. J., v31,
416-.
Lin, E.S. and Smith, B.E. (1987): "An acoustic model of the patient undergoing
ventilation". Br. J. Anaesth., v59, 256-264.
Papamichalis, P. and So, J. (1986): "Implementation of fast Fourier transform algorithms
with the TMS32020". Digital Signal Processing Applications with the TMS320
Family, Texas Instruments, 92-.
Rabiner, L.R., and Gold, B. (1975): Theory and application of digital signal processing,
Prentice-Hall, Englewood Cliffs, 386-388.
Sjostrand, U.H. (1980): "High frequency positive pressure ventilation (HFPPV)".
Critical Care Medicine, v8, 345-364.
Smith, B.E. (1985): "The Penlon Bromsgrove high frequency jet ventilator for adult and
paediatric use". Anaesthesia, v40, 700-796.
Smith, B.E. and Lin, E.S. (1989): "Resonance in the mechanical response of the
respiratory system to HFJV". Acta Anaesthesiologica Scandinavica, v33, 65-69.
Smith, R.B. (1982): "Ventilation at high frequencies". Anaesthesia, v37, 1011-.
Chapter 23
23.1 INTRODUCTION
[Figure 3: sky-wave propagation via the ionosphere, with multiple reflection points per mode.]
Under these conditions, many reflection points may exist for each mode
(see Figure 3).
Since the motion of the ionosphere is not necessarily well correlated
between the various reflection points, the spectrum of the received signal
will usually contain several components. Typically, over mid-latitude
paths, the frequency spreads are of the order of 0.1 Hz, but over high-
latitude paths may be of the order of 10 Hz or more (eg. see Figure 4).
[Figure 4: spectrum of a received signal — normalised amplitude against frequency (-12 to +12 Hz), showing the frequency dispersion.]
[Figure: received amplitude against time (ms) for Mode 1 and Mode 2 signals, showing the bit overlap periods caused by multipath.]
a) Signal frequency.
b) Signal phase.
c) Ease of synchronisation.
d) Noise and/or interference level.
e) Channel impulse response.
f) Received signal-to-noise or signal-to-interference ratio.
g) Baseband spectrum.
h) Received data error rate.
i) Telegraph distortion.
j) Rate of repeat requests in an ARQ system.
The RTCE techniques fall into three categories which are considered
separately in the following sections.
23.3.1 Passive monitoring
[Figure: passive channel monitoring — received spectrum around 3.31-3.36 MHz and channel impulse responses over 0-20 ms.]
If error-correcting codes are transmitted over the link, then a measure of signal
quality can be based on the rate at which errors are detected.
QF = \frac{\sum A_n}{\sum A_n + \sum A_s}        (23.1)
Figure 10. Measured FSK quality factor and bit error rate for a
transmission from Clyde River, Canadian NWT to Leicester, UK.
general trend, this does not closely match the measured value obtained
from the channel assessment process.
[Figure: measured channel parameters plotted against time of day (0-24 UT), two panels.]
In some systems, the channel assessment includes not only the signal
strength and bit error rate (as described above), but the spectral content
of the noise and interference on the channel. This information can allow
the system to adapt the modulation to make best use of the channel. One
such example is described by Hague, Jowett and Darnell [8] in which the
[Figure: mark (m) and space (s) tones, 85 Hz apart, within a 2.5 kHz channel.]
Figure 12. A popular dual diversity signal format for low speed
data transmission. The mark and space tones are indicated as
m and s respectively.
[Figure: a multi-tone variant of the format — six m/s tone pairs, 85 Hz spacing, within 1 kHz.]
For example, for a path length of 1000 km, the differences in arrival
times between the signals for each mode are typically 1 ms. To overcome
the effects of multi-path propagation in this case, the carrier signal must,
therefore, change frequency at least every 1 ms.
For the example given above, if the frequency is swept at 100 kHz s^{-1},
the signals corresponding to each mode will be separated in frequency by
100 Hz.
23.7 REFERENCES
1. K. Davies. Ionospheric Radio Propagation. Dover Publications Inc.,
New York, 1966.
2. M.P.M. Hall and L.W. Barclay (editors). Radiowave Propagation.
Peter Peregrinus Ltd., London, 1989.
3. C. Goutelard, J. Caratori and A. Nehme. Optimisation of H.F. digital
radio systems at high latitudes. AGARD conference proceedings
number 382, 1985, 3.5.1-3.5.18.
24.1 INTRODUCTION
The purpose of this case study is to present a development of the
step-invariant approach to highpass digital filter design [1] - a topic
only briefly reported in the literature [2-5]. It will be shown that the
transfer function,G(z) , obtained via the step invariant design
sif
method, may be made to approximate closely to the transfer
function, G(z) obtained via the bilinear z-transform method
(Chapter 9). The analysis presented justifies this step-invariant
design method, and shows that it is possible to convert a time
domain (step-invariant) filter, G(z) , to one that satisfies a
si
frequency domain specification,G(z) . This is achieved by
sif
observing certain conditions and by employing a suitable gain term.
The validity of the method is demonstrated using a practical example
of a simple highpass filter and a digital phase-advance network.
It was shown in Chapter 9 that the impulse-invariant design
method for bandlimited filters is based on the application of
standard z-transforms, whereby an analogue filter transfer
function, G(s), is transformed to an equivalent digital filter transfer
function, G(z), that is

G(s) = \sum_{i=1}^{m} \frac{k_i}{s + p_i} \quad\rightarrow\quad G(z) = \sum_{i=1}^{m} \frac{k_i}{1 - \exp(-p_i T) z^{-1}}

For the analogue filter, the impulse response, g(t), is defined as
L^{-1}[G(s)]. Similarly, for a digital filter, the impulse response, g_k, is
defined as Z^{-1}[G(z)]. To be impulse-invariant, g_k = g(t) for t = 0, T,
2T, ..., where T is the sampling period. Furthermore, the frequency
response of the digital filter, G(exp(jωT)), will approximate to the
frequency response of the analogue filter.
For the first-order highpass filter G(s) = s/(s + a), the bilinear z-transform method gives

G_{bl}(z) = \frac{1}{1 + aT/2}\cdot\frac{z - 1}{z + \dfrac{aT/2 - 1}{aT/2 + 1}}        (24.1)
Now consider G(s) = s/(s + a) = Y(s)/X(s), where Y(s) is the Laplace
transform of the filter response and X(s) is the Laplace transform of
the filter input signal. For the step-invariant design method the step
input signal is assumed to have an amplitude of A for t ≥ 0, i.e.
X(s) = A/s, therefore

Y(s) = \frac{A}{s + a}        (24.2)

L^{-1}[Y(s)] = A\exp(-at)        (24.3)

Y(z) = \frac{Az}{z - \exp(-aT)}        (24.4)
Dividing by the z-transform of the step input gives the step-invariant transfer function

G_{si}(z) = \frac{z - 1}{z - \exp(-aT)}        (24.5)

For a cut-off frequency of 100 Hz and a sampling period T = 10^{-3} s, the
pre-warping relationship

\omega_a = \frac{2}{T}\tan\left(\frac{\omega_c T}{2}\right)        (24.7)

yields the corresponding value ω_a = 650 rad/s, which is the value of a in equation (24.1).
Therefore aT/2 = 650 × 10^{-3}/2 = 0.325, and thus

G_{bl}(z) = \frac{1}{1 + 0.325}\left[\frac{z - 1}{z + \dfrac{0.325 - 1}{0.325 + 1}}\right] = 0.755\left[\frac{z - 1}{z - 0.509}\right]

G_{bl}(\exp(j\omega T)) = 0.755\left[\frac{\exp(j\omega T) - 1}{\exp(j\omega T) - 0.509}\right]        (24.8)
Similarly, for equation (24.5), aT = 2π × 100 × 10^{-3} = 0.628 and
the corresponding frequency response is

G_{si}(\exp(j\omega T)) = \frac{\exp(j\omega T) - 1}{\exp(j\omega T) - 0.534}        (24.9)

At the cut-off frequency, ωT = ω_c T = 2π × 100 × 10^{-3} = 0.628 and

\exp(j\omega_c T) = \cos\omega_c T + j\sin\omega_c T = 0.809 + j0.588
G_{bl}(\exp(j\omega_c T)) = 0.755\left[\frac{0.809 + j0.588 - 1}{0.809 + j0.588 - 0.509}\right], \qquad |G_{bl}(\exp(j\omega_c T))| = 0.707        (24.11)
Matching the step-invariant form to the bilinear form rests on the approximation

\exp(-\beta T) \approx \frac{1 - \beta T/2}{1 + \beta T/2}        (24.12)

which is valid when (βT/2)^2 ≪ 1. On this basis the bilinear design of the
phase-advance network G(s) = (s + α)/(s + β) has:

(a) a gain term (1 + αT/2)/(1 + βT/2)
(b) a zero at z = (1 - αT/2)/(1 + αT/2)        (24.14)
(c) a pole at z = (1 - βT/2)/(1 + βT/2)        (24.15)
For the step-invariant design, the response of the phase-advance network to a step of
amplitude A is

Y(z) = A\left[\frac{\alpha}{\beta}\cdot\frac{z}{z - 1} + \frac{\beta - \alpha}{\beta}\cdot\frac{z}{z - \exp(-\beta T)}\right]        (24.16)

so that

G_{si}(z) = \frac{(z - 1)Y(z)}{Az} = \frac{z - \left[\dfrac{\alpha}{\beta}\exp(-\beta T) + \dfrac{\beta - \alpha}{\beta}\right]}{z - \exp(-\beta T)}        (24.17)

which has:

(a) a gain term of unity
(b) a zero at z = \frac{\alpha}{\beta}\exp(-\beta T) + \frac{\beta - \alpha}{\beta}
(c) a pole at z = \exp(-\beta T)        (24.18)

Applying the gain term of the bilinear form gives the frequency-matched design

G_{sif}(z) = \frac{1 + \alpha T/2}{1 + \beta T/2}\cdot\frac{z - \left[\dfrac{\alpha}{\beta}\exp(-\beta T) + \dfrac{\beta - \alpha}{\beta}\right]}{z - \exp(-\beta T)}        (24.19)
For the example, αT = 26.9 × 10^{-3} and βT = 94.9 × 10^{-3}, so that
(βT)^2 = 0.009 (which is ≪ 1) and the approximation of equation (24.12) holds. Hence

G_{sif}(z) = \frac{1 + 26.9\times10^{-3}/2}{1 + 94.9\times10^{-3}/2}\cdot\frac{z - 0.9743}{z - \exp(-94.9\times10^{-3})} = 0.9675\left[\frac{z - 0.9743}{z - 0.9095}\right]
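The first-order highpass example can be checked numerically; a sketch using the values above (the responses are evaluated directly on the unit circle):

    import numpy as np

    T = 1e-3
    wc = 2 * np.pi * 100.0                 # 100 Hz cut-off
    wa = (2 / T) * np.tan(wc * T / 2)      # pre-warped value ~650 rad/s (24.7)
    z = np.exp(1j * wc * T)

    G_bl = (1/(1 + wa*T/2)) * (z - 1)/(z + (wa*T/2 - 1)/(wa*T/2 + 1))
    G_si = (z - 1)/(z - np.exp(-wc*T))     # step-invariant, unit gain term
    print(abs(G_bl), abs(G_si))            # ~0.707 and ~0.95 at cut-off

The bilinear design sits at the half-power point; the raw step-invariant design does not, which is why the suitable gain term discussed in this section is needed.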
24.4 REFERENCES
25.1 INTRODUCTION
This case study illustrates the now classical Linear
Predictive Coding (LPC) method of speech compression used to
reduce the bit rate in digital speech transmission systems.
The chapter begins with a non-rigorous introduction to
the theory of linear predictive coding. This is followed by
an explanation of the real-time algorithms used to calculate
the parameters required to synthesise individual pitches of
voiced speech. The study ends with a description of how
these algorithms are implemented on the TMS32010, the
problems incurred and results obtained.
Toll quality telephonic speech covers the frequency range
300 Hz to 3.4 kHz, which for an 8-bit pulse code modulation
system sampled at 8 kHz produces a bit rate of 64 kbits/second.
To economise on bandwidth and hence increase
channel capacity it is desirable to reduce this bit rate to
a lower figure; this is possible because of the significant
redundancy present in the English language.
Linear predictive coding is one method used to exploit
the redundancy in speech by assuming that the speech
waveform can be modelled as the response to a linear filter
as shown in figure 25.1. This system works by separating
the speech waveform into two broad categories, these are
'voiced' and 'unvoiced' sounds. An 'unvoiced' sound is
produced by the fast passage of air through teeth and lips
such as the "ss" in speech, to reproduce these sounds the
model uses a noise source as the input to the filter. A
'voiced' sound is produced when periodic pulses from the
glottis excite resonances in the vocal tract and head
cavities to produce vowel sounds such as the 'ee' in speech.
A typical section of voiced speech is shown in figure 25.2,
[Fig 25.1: the LPC speech production model — periodic pulses (voiced) or random noise (unvoiced) selected as excitation, scaled by gain G, driving a linear filter. Fig 25.2: a section of voiced speech showing the pitch period.]
\hat{S}(n) = \sum_{k=1}^{p} a_k\, S(n-k)
[Fig 25.3: the inverse (whitening) filter — input signal in, error signal out.]

E_n = \sum_{n=0}^{M} e(n)^2
When the a-parameters are calculated using this method
they are up-dated after every sample value until after M
samples they reach their final form which are the
coefficients of an inverse filter. What has been
manufactured is a whitening filter such that if the original
pitch were passed through it a small error signal with a
flat frequency spectrum appears at its output.
[Fig 25.4: the predictor — output signal and predicted signal ŝ(n).]
E_n = \sum_{n=0}^{M} \left[ S(n) - \sum_{i=1}^{p} a_i S(n-i) \right]^2
\frac{\partial E_n}{\partial a_k} = 0, \qquad k = 1, 2, \ldots, p
This gives p simultaneous equations with p unknowns which
after expansion becomes
\sum_{n=0}^{M} 2\left[ S(n) - \sum_{i=1}^{p} a_i S(n-i) \right]\left[-S(n-k)\right] = 0, \qquad k = 1, 2, \ldots, p
\sum_{n=0}^{M} S(n) S(n-k) = \sum_{i=1}^{p} a_i \sum_{n=0}^{M} S(n-k) S(n-i), \qquad k = 1, 2, \ldots, p
Defining

\sum_{n=0}^{M} S(n) S(n-k) = R(k)

and

\sum_{n=0}^{M} S(n-k) S(n-i) = R(i-k)
gives

R(k) = \sum_{i=1}^{p} a_i R(i-k), \qquad k = 1, 2, \ldots, p

or, in matrix form, R a = r, where R is the p × p matrix of autocorrelation
values R(i-k) and r the vector of values R(1), ..., R(p). The transfer function
of the synthesis filter is then

H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}
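The a-parameters can be obtained by solving this system directly; a sketch (a direct solve in numpy, rather than the real-time recursion described below; the test signal is illustrative):

    import numpy as np

    def lpc(s, p):
        """Solve the normal equations R(k) = sum_i a_i R(i-k) for the
        a-parameters of a p-th order predictor."""
        M = len(s)
        R = np.array([np.dot(s[:M - k], s[k:]) for k in range(p + 1)])
        # Toeplitz system: rows k = 1..p, columns i = 1..p of R(|i-k|).
        A = np.array([[R[abs(i - k)] for i in range(1, p + 1)]
                      for k in range(1, p + 1)])
        return np.linalg.solve(A, R[1:p + 1])

    s = np.sin(0.3 * np.arange(200)) + 0.01 * np.random.randn(200)
    a = lpc(s, 12)         # twelve a-parameters for one pitch of "speech"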
For voiced speech this function is excited by the
weighted impulse function G.u(n) once every pitch period.
Using only this single pulse as input the complete pitch is
reconstituted sample by sample from the transfer function
shown above.
[Fig 25.5: lattice synthesis filter — input G.u(n), intermediate residuals e_p(n), e_{p-1}(n), ..., e(n) at the output.]
i.e. k_{i+1} = \frac{R(i+1) - \sum_{j=1}^{i} a_j R(i+1-j)}{E_i}
[Fig 25.6: flow diagram for real-time autocorrelation (P = 13, p = P-1).]
[Fig 25.7: flow diagram for the recursive (Levinson-Durbin) calculation of the reflection coefficients and the prediction error energy, initialised with E_1 = R_0.]
G^2 = R(0) - \sum_{k=1}^{p} a_k R(k)

G = \sqrt{\prod_{n=1}^{p} (1 - k_n^2)} = \sqrt{N}
N                Initial guess for G
0                0
1 - 4            256    (= √2)
5 - 16           512    (= √8)
17 - 64          1024   (= √32)
65 - 256         2048   (= √128)
257 - 1024       4096   (= √512)
1025 - 4096      8192   (= √2048)
4097 - 16384     16384  (= √8192)
N = 0.00042725 = 14 as an integer. Initial guess for G = 512. This guess is passed
through Newton's equation twice, i.e.

G_{new} = \frac{1}{2}\left( G + \frac{N}{G} \right)
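In code, assuming the Q15 scaling implied by N = 0.00042725 → 14 (the 32768 scale factor is that assumption):

    # Integer Newton-Raphson square root for the gain term (Python sketch).
    N_q15 = 14                  # 0.00042725 in Q15
    guess = 512                 # from the look-up table above
    for _ in range(2):          # "passed through Newton's equation twice"
        guess = (guess + (N_q15 * 32768) // guess) // 2
    print(guess)                # ~677, i.e. sqrt(N) in Q15 (true value 677.3)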
[Fig 25.8: flow diagram for the sample-by-sample synthesis of one pitch period, starting from Y = G.u(n) and iterating through the reflection coefficients K_i, i = 13 down to 1.]
[Figure: original and synthesised voiced speech — time waveforms (8-10 ms) and spectra (0-5 kHz, 20 dBm to -30 dBm).]
REFERENCES
1 INTRODUCTION
The continuing boom in semiconductor technology, and associated
improvements in computing power have been widely acclaimed in control
literature. In this field however there seems to be an ever widening void
between the expectations of new technology and reality. Advances in control
theory have far outstripped technological capability. The advantages of
modern digital control techniques however promise great improvements in
system performance, and implementation of such schemes must be achieved.
The problems imposed by endeavouring to apply complex algorithms in the real
world however are many.
In recent years there has been a continual growth in power system networks
and interconnections, with a corresponding economic demand to install larger
generating sets. Improvements in machine design have led to generators with
lower inertias, and higher reactances. Load centres which are placed large
distances from generation plant add to problems of instability. Over the past
years, work at The Queen's University has concentrated on power system
control. From early work involving on-line modelling of full size generator
units [1], has developed a programme to design and implement self-tuning and
adaptive control strategies. The results of this research have been extremely
encouraging, and significant improvements in controller performance have
already been shown on a physical generator model [2]. Many implementation
problems have already been solved using heuristic programming techniques
which are essential for supervision of the control algorithms. As a consequence
the number of instructions to be executed every sample interval has risen
dramatically, forcing a reduction in sample speed. The lack of computing
power offered by conventional microprocessors has severely restricted the
advancement of this work.
2) Control algorithm.
[Figure 1: laboratory system — alternator and turbine model, analog and digital interface for the phase values, TMS32010-based measurement unit with 16-bit control signal path, and the controlling and supervisory computers.]
For several years, a PDP 11/73 was used to carry out all computing tasks.
These not only involved measurement and control, but machine compensation
for simulation inadequacies, and transmission line fault control. This
arrangement has been used with some success, and self-tuning controllers have
been implemented. The minimum sample interval which can be achieved
however is four times the standard set by industry.
Rotor angle and machine speed are essential quantities for the assessment of
Automatic Voltage Regulator (AVR) performance. These signals are directly
measurable from the optical transducer.
acquisition interfaces, and other support chips. Great advances have also been
made in software development facilities, with C compilers, simulators and
assemblers being available. The cost of these devices has reduced dramatically.
In this application, a TMS32010 was used. This has 144 words of on-board
data memory, and 4K words of program memory residing off-chip. This is
enough capacity for embedded applications. Due to the high access speeds
required, the RAM program memory was loaded from ROM before program
execution. Unlike strict Harvard devices, the TMS has the ability to transfer
information between program and data memory, enabling storage of
coefficients within the program address space.
The device has a 16 bit multiplier on-board, giving a 32 bit result within
200ns. Special registers are included to greatly improve the performance of the
device.
[Table: interrupt source and identification vector assignments.]
Interface to the TMS is through the data bus. Three multiplexed address
lines are used to select appropriate devices through a 3 to 8 line selector.
Dedicated mnemonics are used to read and write data. A maximum of 128
input and 128 output lines are available. As only 16 lines are available at any
instant, handshaking control requires that data be stored in intermediate
latches.
5 HARDWARE IMPLEMENTATION
The measurement system consists of two sub-systems. Acquisition of signals
used to derive the terminal quantities is primarily an analog function, while
rotor angle and speed are measured digitally. These two functions work
asynchronously, and are interrupt driven. As the TMS has only one interrupt
line, device identification and priority are handled externally (Figure 2). An
Intel 8214 Priority Interrupt Control Unit was used to arbitrate between
devices. The priority of the requesting device is compared to that currently
executing. Should the interrupt be successful, a device identification vector is
read in the first statements of the service routine. The memory required to
store machine status restricts the system to a maximum of four levels of
interrupt. External logic ensures that stack overflow does not occur. Valid
pending interrupt addresses are stored in memory and executed on completion
of the current ISR.
[Figure 2: interrupt arbitration logic — BIO and INT lines driven from the machine terminals.]
5.3 Analog acquisition
Six phase quantities are taken from current and voltage transformers via
differential amplifiers. Several features are included specifically for
implementation of the chosen measurement algorithms (Figure 3).
Instantaneous sampling of all phase quantities is a pre-requisite for accurate
measurement. Sample-and-hold devices perform this task. The signals are
connected sequentially to the A/D converter through an analog switch. As some
of the measurement techniques require a precise number of samples per
machine cycle, a sample interval control clock is included so that the sample
interval can be adjusted on-line by writing a value to a counter preset register.
This clock rate is adjusted in response to the measured machine speed.
6 SOFTWARE
6.1
The TMS has a rich instruction set which operates in three addressing
modes. The majority of these execute in one machine cycle. Arithmetic, logical
and program control instructions are augmented by special operations for
multiply, input/output, and memory manipulation. Data shifting can be
implemented as an integral part of an instruction, allowing scaling of integer
data to reduce quantisation errors. In this application, the ability to efficiently
update data within the sample window is particularly useful. The TMS was
[Figure 4: measurement system structure — FIFO memory, device identification and control logic, pending-request handling, sample clock adjust, analog and digital inputs.]
used to check results from TMS code, and produce simulation data. All
non-target dependent code was developed using this stub concept. On
completion of the hardware, a XDS320 emulator was used to debug embedded
code.
7 PERFORMANCE
The accuracy of the measurement system was difficult to assess as the only
benchmark was the existing PDP computer. The TMS based system was
capable of generating and transferring data (6 values) at rates in excess of
1 kHz. Figure 5 shows a comparison of responses generated by the two
measurement systems under three phase fault conditions. This shows the
range of values the measurement system must be capable of dealing with.
Tests such as these were performed over the operating range of the generator,
proving the integrity of the new system. Noise was artificially introduced to the
measured phase values. It was found that the new techniques produced
better results even under conditions where the PDP system failed. The
measurement system was also used to interrupt the controlling computer when
a major fault was detected. This allowed time for preparatory code within the
controlling computer to compensate for imminent transient conditions.
[Figure 5: responses from the PDP and TMS measurement systems under three phase fault conditions (per-unit values against time).]
8 CONCLUSIONS
The system described has shown that the use of a DSP in the role of
measurement for control gives advantages over conventional systems.
Increased speed, and the capability to use more complex algorithms leads to
improved controller performance. Although designed specifically for a
laboratory system, the necessity for improved measurement systems in
industry is growing. Manufacturers are now starting to implement digital
controllers, the performance of which will soon be limited by a lack of accurate
information at high transfer rates. Parameters which were once deemed
unmeasurable can now be constructed on-line and in real-time. This opens up
greater possibilities for control system designers.
9 REFERENCES
27.1 INTRODUCTION
Over the past 30 years the application of digital control to industrial processes has
grown dramatically. This has been brought about in the main by the rapid advances in
VLSI technology, with the provision of increased processor complexity at a reduced cost.
The application of microprocessors to control requires real time operation, and normally
only fixed point two's complement arithmetic is used. In addition to this, the designer has
to put in a substantial amount of effort to produce a practical implementation of the
control algorithms taking account of such issues as finite word length, selection of
sampling rate, efficient programming of the algorithms, scaling of all the variables and
coefficients and the provision of a suitable input/output interface. The advent of digital
signal processors, with their specially tailored architectures, provides an ideal solution to
overcome some of the limitations inherent in the use of microprocessors for control.
It has recently been shown (1,2) that such processors can be readily used to implement
discrete algorithms for control system applications. The use of digital signal processors
(DSP) for controller realisation overcomes some limitations associated with their design
using microprocessors, as they have architectures and dedicated arithmetic circuits which
provide high resolution and high speed arithmetic, making them ideally suited for use as
controllers. Initial work was done using the Intel 2920, and material has already been
published by the author on its use for controller implementations (2,3). The 2920 is one of
the earliest of DSP's and consequently is not the most powerful in terms of speed and
instruction set. It was initially decided however, to use the 2920 as a controller as this was
the only processor available which had A/D and D/A converters on chip. For the
particular applications considered the limited speed and size of instruction set proved no
real restriction, whilst having a single chip controller requiring no external circuitry apart
from a clock and power supply, was a decided advantage. The applications covered
implementations of PID, Smith Predictor and Dahlin algorithms which were successfully
applied to control a heating process.
The need for powerful digital signal processing devices in an expanding range of
application areas has provided the catalyst for major advances in DSP architecture which
has completely overtaken the 2920, and rendered it obsolete. There are now available to
the design engineer a wide range of devices with considerably more powerful architec-
tures of which the TMS320 family of DSP's are representative. The need for such
powerful processors in control depends very much on the level of sophistication included
within the control scheme - whether real-time adaptation is required, and how fast the
dynamics of the process being controlled are. The initial impetus for the work described in
this chapter was to assess the ease with which discrete controllers could be implemented
on one of the more recent DSP's. The particular processor chosen was the TMS 32010,
the first member of the TMS family of VLSI digital signal processors.
Digital controllers 383
The processor also uses hardware to implement functions which previously had been
achieved using software. As a result multiplication takes 200ns, ie. one instruction cycle,
to execute. It also has a hardware barrel shifter for left-shifting data to 15 places before it
is loaded into the ALU. Extra hardware has also been included so that two auxiliary
registers which provide indirect data RAM addresses, can be configured in an auto
increment/decrement mode for single cycle manipulation of data values. This gives the
design engineer the type of power previously unavailable on a single chip (6,7).
Data memory consists of the 144x16 words of RAM present on chip. All non-immediate
operands reside within this RAM. If more data is required, then it is possible to use
external RAM and then read the data into the on-chip RAM as they are required. This
facility was not required in this project since the on-chip RAM was found adequate for all
the programs implemented.
The TMS32010 was applied to control the temperature of a process where air drawn
through a variable orifice by a centrifugal blower, is driven past a heater grid and through
a length of tubing to the atmosphere again. The process consists of heating the air flowing
in the tube to some desired temperature. The detecting element consists of a bead
thermistor fitted to the end of a probe inserted into the airstream 0.28 m from the heater
(Fig. 27.2).
[Fig. 27.2: the temperature process — a blower draws air through a variable orifice and past the heater grid; a thermistor and bridge supply the measured value, with set value and measured value indicators and a set value step input.]
On this process, a step response test was made resulting in a second order model of the
form:
The corresponding pulse transfer functions for the continuous model for different
sampling periods are shown in Table 27.1. Recall that this is the z transform of the product
of the zero-order hold and process transfer function.
T = 0.1     G(z) = \frac{0.0896(1 + 0.709z^{-1})z^{-4}}{(1 - 0.767z^{-1})(1 - 0.469z^{-1})}

T = 0.15    G(z) = \frac{0.173(1 + 0.6z^{-1})z^{-3}}{(1 - 0.677z^{-1})(1 - 0.321z^{-1})}

T = 0.3     G(z) = \frac{0.448(1 + 0.362z^{-1})z^{-2}}{(1 - 0.451z^{-1})(1 - 0.103z^{-1})}
where e_n is the sampled value of the system error, u_n the manipulated variable, and
k_p, k_i = Tk_p/T_i and k_d = T_d k_p/T respectively the proportional, integral and
derivative control gains. The sample time is T.
This equation can be rearranged to give the following pulse transfer function
(27.3)
where
= kd/kp
Using the procedure suggested by Roberts and Dallard (8) which reduces the number of
separate tuning parameters to the one gain term by selecting the sampling period to
T=0.1Tu, equation 27.3 reduces to
(27.4)
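The PID law described above maps directly to code. A minimal sketch, assuming the textbook positional form (the gain values below are illustrative, not tuned for the rig):

    # Positional PID: u_n = kp*e_n + ki*sum(e) + kd*(e_n - e_{n-1}),
    # with ki = T*kp/Ti and kd = Td*kp/T as defined above.
    def make_pid(kp, Ti, Td, T):
        ki, kd = T * kp / Ti, Td * kp / T
        state = {"sum": 0.0, "e_prev": 0.0}
        def pid(e):
            state["sum"] += e
            u = kp * e + ki * state["sum"] + kd * (e - state["e_prev"])
            state["e_prev"] = e
            return u                       # manipulated variable u_n
        return pid

    controller = make_pid(kp=0.5, Ti=2.0, Td=0.5, T=0.1)
    u = controller(1.0)                    # one sample of system error e_n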
Dahlin's method (9) is a particular case of the general synthesis approach to design
where digital controllers are designed directly in the discrete domain (Chapter 13). The
process under control, is modelled by a first order or second order transfer function and
the desired closed loop transfer function K(z) is arranged to be a first order lag of the form
K(z) = \frac{(1 - \exp(-\lambda T))z^{-1}}{1 - \exp(-\lambda T)z^{-1}}        (27.6)

The reciprocal of the time constant, λ, is used as a tuning parameter with larger values
giving increasingly tight control.
Fig. 27.3 Block diagram of sampled data system for Dahlin design
D(z) = \frac{1}{G(z)}\cdot\frac{K(z)}{1 - K(z)}        (27.7)

where p = \exp(-\lambda T)        (27.8)
Using the second order model the Dahlin controller equation for a closed loop response
with a time constant of 0.15s becomes
D(z) = 3.17\,\frac{(1 - 0.767z^{-1})(1 - 0.469z^{-1})}{1 - 0.513z^{-1} - 0.486z^{-4}}        (27.10)
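In difference-equation form (using the coefficients of equation (27.10) as reconstructed above, with the delay term read as z^-4, consistent with the process dead time):

    from collections import deque

    # u_n = 0.513 u_{n-1} + 0.486 u_{n-4}
    #       + 3.17 (e_n - 1.236 e_{n-1} + 0.360 e_{n-2})
    e_hist = deque([0.0, 0.0], maxlen=2)     # e_{n-1}, e_{n-2}
    u_hist = deque([0.0] * 4, maxlen=4)      # u_{n-1} ... u_{n-4}

    def dahlin(e):
        u = (0.513 * u_hist[0] + 0.486 * u_hist[3]
             + 3.17 * (e - 1.236 * e_hist[0] + 0.360 * e_hist[1]))
        u_hist.appendleft(u)
        e_hist.appendleft(e)
        return u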
27.4.3 Kalman Design
The Dahlin algorithm is based on the specification of the output in response to a setpoint
change without any constraints placed on the manipulated variable. An alternative
approach originally proposed by Kalman (11) is to design a digital controller with
restrictions placed on both the manipulated and controller variables. For example, the
specifications for a step change in set point might be for the response to settle at the final
value within a given number of sampling instants, with the actuation signal assuming
only a specific number of values before reaching its final value.
This approach allows the designer to take account of load changes and to specify that
the error sequence should be zero after a specified number of sample instants, or that the
error should reduce in a specified manner. It can be shown that in the case of a
second-order system a minimum of two values of manipulated variables are required
before the setpoint can be reached. Similarly for a third order system, 3 values etc. Taking
account of these restrictions we can write expressions for the controlled and manipulated
signal as follows:-
C(z) = \sum_{n=0}^{\infty} c_n z^{-n}        (27.12)

U(z) = \sum_{n=0}^{\infty} u_n z^{-n}        (27.13)
Given the specification that the response should settle to the final value within two
sampling periods, it can be shown that (10) for a second order pulse transform model of
the form
G(z) = \frac{y_1 z^{-1} + y_2 z^{-2}}{1 + x_1 z^{-1} + x_2 z^{-2}}
A sampling period of 0.3s was selected for the design as a higher sampling frequency
resulted in continuous oscillations of the actuation signal. Using equation 27.17,

D(z) = 1.637\,\frac{1 - 0.553z^{-1} + 0.0464z^{-2}}{(1 - z^{-1})(1 + z^{-1} + 0.266z^{-2})}        (27.18)
C(z) = \frac{B(z)}{A(z)}\, z^{-k}\, U(z)        (27.22)
which can be written in terms of the predicted system output at time (n+k) to give
(27.23)
It is obvious from equation 27.23 that since the terms C(n+k-1), C(n+k-2) are not yet
available at time "n", the equation in its present form is not immediately solvable.
To avoid a "k-step" computation loop, it can be shown that

which gives a prediction of C(n+k) based on values of C up to time "n" and previous
actuations extending a further (k-1) samples back. The factors F(z) and J(z) are given by
expanding 1/A(z) by long division up to z^{-k}.
The design implemented (Fig. 27.4) was based on the model following predictor, where
the output response was based on a first order lag with time delay k, which can be written
as

\frac{R(z)}{W(z)} = \frac{1 - p}{1 - p z^{-1}}        (27.26)
It proved necessary to increase the sampling period to 0.15s due to the unacceptable
actuation signal oscillations that occurred at 0.1s. This resulted in the following equations
\frac{C}{U} = \frac{0.173(1 + 0.6z^{-1})z^{-3}}{(1 - 0.67z^{-1})(1 - 0.32z^{-1})}        (27.27)

(27.28)

F(z) = 0.55(1 - 0.3z^{-1})        (27.29)

Therefore, the difference equations for the controller are:

r_n = 0.632 w_n + 0.368 r_{n-1}
u_n = 5.77 r_n - 1.6 r_{n-1} - 1.37 u_{n-2} - 0.46 u_{n-3}        (27.30)
With this implementation an offset occurs in the presence of a disturbance; this can be
overcome by using the incremental form of the predictive controller.
A particular form of the general predictor is the one proposed by Smith, which has been
covered in Chapter 13. The Smith predictor was implemented with the PID controller
specified in equation 27.4.
The pole placement design can be viewed as an extension of the classical root-locus
method, where a design is undertaken to position the dominant closed-loop poles of the
system. In this section, the pole placement design is based on the general linear regulator
shown in Fig 27.5, which is of the form

(27.31)

where

H(z) = h_0 + h_1 z^{-1} + h_2 z^{-2} + \ldots + h_l z^{-l}

and the process model is

A(z) C(z) = z^{-k} B(z) U(z)        (27.32)

By combining equations 27.31 and 27.32 the closed loop transfer function is

\frac{C(z)}{W(z)} = \frac{z^{-k} B(z) H(z)}{D(z)}        (27.33)

The desired characteristic polynomial is specified as

T(z) = 1 + t_1 z^{-1} + t_2 z^{-2} + \ldots + t_c z^{-c}        (27.36)
so that f = m - 1 and p = k + r - 1.
The controller parameters can now be obtained by solving a set of linear equations
which can be represented in vector-matrix form, using the Sylvester type matrix as follows:
The first columns of S contain the coefficients (1, a_1, \ldots, a_m) of A(z), each
column shifted down one row from its neighbour; the remaining columns contain the
coefficients (b_1, \ldots, b_r) of B(z), entered after k leading zero rows and
likewise shifted:

S = \begin{pmatrix}
1      & 0      & \cdots & 0      & 0      & \cdots & 0      \\
a_1    & 1      &        & 0      & 0      &        & 0      \\
a_2    & a_1    &        & \vdots & b_1    &        & 0      \\
\vdots & a_2    &        & 1      & b_2    &        & \vdots \\
a_m    & \vdots &        & a_1    & \vdots &        & b_1    \\
0      & a_m    &        & \vdots & b_r    &        & \vdots \\
0      & 0      & \cdots & a_m    & 0      & \cdots & b_r
\end{pmatrix}        (27.38)
The number of rows is given by k + r + m - 1
S\theta = R        (27.39)

\theta = S^{-1} R        (27.40)
The solution so far gives the coefficients of the controller to meet the desired pole
requirements; we still have to choose H in order to meet the steady state requirements of
the system. The simplest option is to select H to be a scaling factor in order that in steady
state c(t) = w(t), so that

(27.41)
In general, the pole placement design can give a poor steady state performance in the
face of load disturbance. This can be overcome by including an integrator in the design
and incorporating it as part of the process.
For the temperature process, cascaded with an integrator, the pulse transform becomes
\frac{C}{U} = \frac{(0.173z^{-1} + 0.104z^{-2})z^{-2}}{1 - 1.992z^{-1} + 1.207z^{-2} - 0.215z^{-3}}        (27.43)
and given that m=3, r=2 and k=2, then f=2 and p=3, so that the Sylvester matrix is
S = \begin{pmatrix}
 1      &  0      &  0      & 0     & 0     & 0     \\
-1.992  &  1      &  0      & 0     & 0     & 0     \\
 1.207  & -1.992  &  1      & 0.173 & 0     & 0     \\
-0.215  &  1.207  & -1.992  & 0.104 & 0.173 & 0     \\
 0      & -0.215  &  1.207  & 0     & 0.104 & 0.173 \\
 0      &  0      & -0.215  & 0     & 0     & 0.104
\end{pmatrix}
T(z) = 1 - 1.2z^{-1} + 0.48z^{-2}        (27.44)
so that
R = (0.792,\; -0.727,\; 0,\; 0,\; 0,\; 0)^T        (27.45)
S^{-1} = \begin{pmatrix}
 1       & 0       & 0       & 0       & 0       & 0       \\
 1.992   & 1       & 0       & 0       & 0       & 0       \\
 0.8779  & 0.5316  & 0.1157  & -0.1925 & 0.3202  & -0.5326 \\
 10.8852 & 8.4416  & 5.116   & 1.1125  & -1.8506 & 3.0784  \\
 -9.0900 & -5.9305 & -1.7406 & 2.8955  & 4.7988  & -7.9827 \\
 1.8149  & 1.0990  & 0.2392  & -0.3979 & 0.6618  & 8.5144
\end{pmatrix}        (27.46)
Giving

\theta = (0.792,\; 0.8507,\; 0.3088,\; 2.4840,\; -2.8878,\; 0.6385)^T        (27.47)
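The solution of equations (27.39)-(27.40) can be reproduced numerically; a sketch using the matrices of the example, with numpy's solver standing in for the S^{-1} computation:

    import numpy as np

    S = np.array([[ 1.0,    0.0,    0.0,   0.0,   0.0,   0.0  ],
                  [-1.992,  1.0,    0.0,   0.0,   0.0,   0.0  ],
                  [ 1.207, -1.992,  1.0,   0.173, 0.0,   0.0  ],
                  [-0.215,  1.207, -1.992, 0.104, 0.173, 0.0  ],
                  [ 0.0,   -0.215,  1.207, 0.0,   0.104, 0.173],
                  [ 0.0,    0.0,   -0.215, 0.0,   0.0,   0.104]])
    R = np.array([0.792, -0.727, 0.0, 0.0, 0.0, 0.0])

    theta = np.linalg.solve(S, R)   # controller coefficients, eq. (27.40)
    # theta ~ (0.792, 0.8507, 0.3088, 2.4840, -2.8878, 0.6385), eq. (27.47)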
and
Incorporating the integrator in the forward path part of the controller, yields
27.5 IMPLEMENTATION
One of the first critical decisions in the implementation was the choice of number
representation. The TMS32010 uses a Q-notation system which is based upon fixed-point
two's complement representation of numbers. Out of the 16 binary places possible the
MSB is used to represent whether the number is a positive or negative number. This then
leaves the programmer to decide where along the remaining 15 binary places he wishes to
place the decimal point. Thus if a number has i integer bits, then it also has (15-i)
fractional bits and is regarded as a (15-i) bit number. The choice of where to place the
decimal point was critical to the performance of the controller, as it proved easy to
generate an overflow. Unless this was monitored by checking the overflow register then
swings between very large and very small numbers occurred which in turn caused erratic
system behaviour. The programmer must either allow enough integer bits in his result to
accommodate bit growth due to arithmetic operations or he must be prepared to handle
overflows. In this implementation it proved necessary to do both. It was found that by
representing all input and output values by Q15 numbers and all coefficients by Q11
numbers then this reduced the possibility of overflow, although it still proved necessary to
test for this condition and take appropriate action.
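The Q15/Q11 scheme described can be sketched as follows (a Python illustration of the scaling; saturating on overflow is one of the two remedies the text mentions):

    # Multiply a Q15 signal value by a Q11 coefficient.
    def q_mul(x_q15, c_q11):
        """The 32-bit product is in Q26; shifting right 11 places
        returns it to Q15, mirroring the TMS32010 scaling."""
        y = (x_q15 * c_q11) >> 11            # back to Q15
        # Saturate instead of wrapping: unchecked overflow caused the
        # erratic behaviour described above.
        return max(-32768, min(32767, y))

    half = 1 << 14                           # 0.5 in Q15
    gain = int(1.75 * (1 << 11))             # 1.75 in Q11
    y = q_mul(half, gain)                    # 0.875 in Q15 (= 28672)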
The instruction set of the TMS32010 and the manner in which structure can be imposed
through the use of macros is one of the TMS32010's greatest assets. In the majority of
cases a single instruction will not only allow its own specific function to be completed,
but it will also allow the programmer to set up the conditions necessary to execute the
next instructions. An example of this would be the LTD instruction which loads the T
register with the contents of an address ready for multiplication. The instruction also shifts
the data from its present location to the next highest memory location, as well as adding
any previous result obtained in the accumulator. Thus the instruction LTD effectively
performs the z^{-1} operation and, if combined in a loop with the MPY (multiply)
instruction, it evaluates the expression

C_n = \sum_{k} b_k\, e_{n-k}
Due to the similarity of the structure of the algorithms implemented, the use of macros
proved invaluable and aided the efficient development of the software. This required
some standardisation of the memory locations used for both the data and parameters, so
that the macros developed could be used with all the algorithms.
The development of the software was aided by the use of three software packages
available on the VAX 11/785 system.
The packages were:
(i) The XDS/320 Macro Assembler which translates TMS32010 assembly language
(ii) The XDS/320 Linker which apart from producing a final down-loadable
object code, also allows a program to be designed and implemented in
separate modules which can then be linked to form the complete program.
[Fig. 27.6: relative disturbance-response performance of the Dahlin, predictive incremental, Smith predictor and pole placement controllers (best Dahlin response = 100%).]
It was found that the best results were obtained using the Dahlin Controller. This was
true of all the performance indices measured. The indices did not take into account the
actuation signal, and so failed to penalise the ringing. The relative merits of the controllers
in the face of a load disturbance are best illustrated in Fig. 27.6, where the best Dahlin
response is selected as 100% and the other results shown relative to it. The disturbance
response for the different controllers are shown in Figures 27.7-27.9. It should be
observed that since (1 - z^{-1}) is a factor of the denominator polynomial for the Dahlin
algorithm, the controller has integral action which ensures that there is zero offset to
a disturbance input.
[Fig. 27.7 panels: (a) PID Roberts (k=0.5), (b) Dahlin, (c) Dahlin with one ringing pole removed, (d) Kalman — temperature (°C) against time.]
Fig. 27.7 Disturbance response with PID, Dahlin and Kalman Controllers
(opening vent)
The slight degradation of responses produced by removing the ringing poles can be
seen for both Dahlin and Kalman designs. The Smith predictor performs well as does the
PID Roberts when the gain is adjusted correctly. The designs of the predictive model
following and pole placement controller which have no integrator in the forward path
perform very badly. With the integrator included there is a dramatic improvement, but
they still fare worse than the Dahlin implementations.
27.7 CONCLUSIONS
The chapter has shown that the signal processing capability of the TMS320 can be
readily harnessed to implement a range of digital controllers. It has demonstrated that the
powerful architecture of this DSP lends itself to implement efficiently the kind of
algorithms met in discrete controllers and that a great deal of structure is possible with the
system. The accuracy obtained depends very much on the number structure used and this
required knowledge about the algorithms being implemented. It was not felt that the
processor power was in any way stretched. Producing coding in assembly that was robust
and error free was time consuming, and any future development will use a high level C
compiler.
[Fig. 27.8: disturbance responses (°C against time) with (a) model following predictor, (b) model following incremental predictor, (c) Smith predictor.]

[Fig. 27.9 panels: (a) without integrator, (b) with integrator — °C against time.]
Fig. 27.9 Disturbance response with pole-placement controller (closing vent)
REFERENCES
8. Roberts and Dallard: "Discrete PID Controller with a single tuning parameter",
Measurement and Control, Vol. 7, pp. T97-T101, Dec 1974.
28.1 INTRODUCTION
Signal processing is concerned with the acquisition, abstraction
and analysis of information, and is involved with the development of
techniques, algorithms and architectures for implementing these
functions. As is evident from the earlier chapters, applications of
signal processing are vast, and ever increasing.
The seventies were largely concerned with the development of
algorithms, and dedicated hardware for signal processing. With the
development of DSP chips there has been a phenomenal growth in
signal processing activities, and the focus of attention has turned to
real time systems with the ready availability of cheap and reliable
devices from Texas Instruments, Fujitsu, NEC, Motorola, amongst
others. These devices, with their support tools offer an easy
environment for prototyping, and eventual production.
In this review a brief summary is given of the current trends and
new directions in the development of algorithms, architectures and
devices for signal processing.
28.2 ALGORITHMS
The thrust of present interest is moving from fixed format,
application dependent algorithms, such as the FFT, or digital filters,
where the issues of accuracy and stability such as word length effects,
or round-off noise were of concern, to more flexible, data dependent
algorithms, such as model based spectral analysis methods employing
Autoregressive (AR), Moving Average (MA) or ARMA methods,
maximum entropy techniques, and adaptive filters, where
convergence rates, residual errors, and computing effort are of
interest in studying algorithm performance.
Most signal processing algorithms have their origins in linear
systems theory and approximation theory, where, in the main, a
minimum mean square criterion of performance is used to develop
stable, iterative (or block processing) algorithms. For instance, the
DFT of a data sequence is a least mean squares time-domain solution
which fits a series of complex sinusoidal harmonics to the data. Most
digital filters are minimum mean square error solutions in the
frequency domain, to the frequency response of ideal systems.
Similarly, adaptive filters arise as iterative solutions to a quadratic
error criterion between a desired output and a known output.
More recently attention has turned towards the use of non-linear
optimisation criteria, as a number of problems are more suited to
this framework, eg finding the maximum likelihood estimates of a set
of signals, and new algorithms such as simulated annealing [1], are
much in favour.
Simulated annealing maps the cost function associated with an
optimisation problem onto the energy of the states of a system, and
through the use of an annealing algorithm seeks to find the (global)
minimum energy state, ie the configuration that minimises the cost
function. This is particularly important for cost functions with
multiple minima, where the minimum of a given cost function of many
variables must be found without the search becoming trapped in a
local minimum.
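As a minimal sketch of the idea (the helper names and the cooling factor below are our own choices, not taken from [1]), the heart of simulated annealing in C is an acceptance rule that always takes downhill moves but accepts an uphill move of size delta with probability exp(-delta/T), with the temperature T lowered gradually:

  #include <math.h>
  #include <stdlib.h>

  /* Uniform deviate in [0,1) - illustrative helper, not a library call. */
  static double rand01(void)
  {
      return (double)rand() / ((double)RAND_MAX + 1.0);
  }

  /* Metropolis acceptance test: accept a proposed move that changes
     the cost by delta_cost, at the current temperature. */
  static int accept_move(double delta_cost, double temperature)
  {
      if (delta_cost <= 0.0)
          return 1;                 /* downhill moves are always taken */
      return rand01() < exp(-delta_cost / temperature);
  }

A typical schedule then proposes random perturbations of the current configuration, applies accept_move(), and cools the temperature (eg T *= 0.95 per sweep), so that uphill escapes from local minima become progressively rarer as the system settles towards the global minimum of the cost.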
28.3 ARCHITECTURE
Architectures for signal processing systems have evolved from
conventional structures such as recursive or non-recursive tapped
delay line with fixed or adaptive (time varying) weights to lattice filters
[2], Lattice configurations form an important class of architectures for
signal processing. They possess regularity of form comprising
identical stages (sections), which have orthogonal properties, and
involve bounded coefficients. Lattices are thus inherently stable and
possess good numerical round-off characteristics. These properties
make them particularly attractive for adaptive processing. Lattices
arise in the autoregressive modelling of input data, and provide
outputs which are the residuals of the models. Figure 1 illustrates an
M stage lattice. It consists of two channels corresponding to the
outputs {fm(k)} and {gm(k)}, which are referred to as the forward and
backward residuals at the various stages {m}. Each stage of the lattice
comprises a pair of adders and (for modelling second-order stationary
data) a reflection coefficient multiplier {Km} and a delay element.
The governing equations for each stage of the lattice are

  fm(k) = fm-1(k) + Km gm-1(k-1)
  gm(k) = gm-1(k-1) + Km fm-1(k)

with f0(k) = g0(k) = x(k), where {x(k)} is the input data sequence, and
{fm(k)} and {gm(k)} are the forward and backward residuals at the mth
lattice stage, for the kth time instant. Km is the reflection coefficient
for the mth stage.
The forward residual represents the prediction error between
the output of the mth stage lattice and the input data, for an mth
order autoregressive model of the data; the backward residual is the
corresponding error in predicting the data sample m steps in the past
from the m more recent samples.
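A minimal sketch of this structure in C (the names are ours; K[m-1] holds the reflection coefficient Km, and g_delay[m-1] holds the delayed backward residual gm-1(k-1)):

  /* One time step of an M-stage analysis lattice.  Returns the
     Mth-order forward residual fM(k) for the new input sample x(k). */
  double lattice_step(double x, const double *K, double *g_delay, int M)
  {
      double f = x;                        /* f0(k) = x(k) */
      double g = x;                        /* g0(k) = x(k) */
      for (int m = 1; m <= M; m++) {
          double g_prev = g_delay[m - 1];  /* gm-1(k-1) from the last call */
          double f_new  = f + K[m - 1] * g_prev;   /* fm(k) */
          double g_new  = g_prev + K[m - 1] * f;   /* gm(k) */
          g_delay[m - 1] = g;              /* store gm-1(k) for time k+1 */
          f = f_new;
          g = g_new;
      }
      return f;    /* the Mth-order prediction error (forward residual) */
  }

Initialising g_delay to zero and calling lattice_step() once per sample yields the whole ladder of residuals, with the bounded reflection coefficients giving the stability and round-off behaviour noted above.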
Details of projects which exploit both fine grain and coarse grain
parallelism are available from [7].
Another novel class of architectures attracting increasing attention
is that of Artificial Neural Networks (ANNs). These are particularly
useful in the areas of speech processing, signal classification, sensor
processing and pattern recognition [8,9].
ANNs are networks of interconnected simple processing
units, which can adapt their behaviour (response) according to inputs
received during a training phase. The processing unit, or node, is a
non-linear element which sums N weighted inputs and passes the
result through a non-linearity. The node is defined in terms of an
internal threshold or offset θi and by the type of non-linearity.
Typically the output of a simple node is

  output = f( Σ wi xi − θi )

where the sum runs over the N inputs xi with weights wi, and f is the
chosen non-linearity, for example a hard limiter or a sigmoid function.
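For illustration, a sketch of one node in C (naming is our own, with a sigmoid chosen as the non-linearity f):

  #include <math.h>

  /* Output of a single node: N weighted inputs are summed, offset by
     the internal threshold theta, and passed through a sigmoid. */
  double node_output(const double *w, const double *x, int N, double theta)
  {
      double u = -theta;
      for (int i = 0; i < N; i++)
          u += w[i] * x[i];              /* weighted sum of the inputs */
      return 1.0 / (1.0 + exp(-u));      /* f(u) = 1/(1 + e^-u) */
  }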
[Figure: a multi-layer perceptron (MLP), showing an input layer, layers of processing elements and an output layer]
Every MLP has an input layer, one or more hidden layers and an
output layer. The processing is performed in the hidden and output
layers. In operation, training signals are applied to the inputs and
propagated through the network to produce output values. These are
compared with the desired outputs, and the error is used to modify
(train) the weights via a popular algorithm called the Back
Propagation Algorithm [12].
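As a sketch of the output-layer step only (the names and the learning rate eta are our own, not taken from [12]): for a sigmoid node with output y, target d and inputs xi, the gradient-descent update is

  /* Back-propagation update for the weights of one sigmoid output
     node: y is the node output, d the desired output, eta the
     learning rate. */
  void update_output_weights(double *w, const double *x, int N,
                             double y, double d, double eta)
  {
      double delta = (d - y) * y * (1.0 - y); /* error times f'(u) = y(1-y) */
      for (int i = 0; i < N; i++)
          w[i] += eta * delta * x[i];         /* move each weight downhill */
  }

Hidden-layer nodes receive their delta terms by propagating the output deltas backwards through the weights, which is where the algorithm takes its name.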
Once the network has been trained, it
can be used to extract signals from noise presented at the input, or as
a pattern classifier in which different images (patterns) are applied to
the input. It is now accepted that neural networks will play a
significant role in the design and implementation of novel signal
processing systems. The current disadvantage of long learning times,
28.4 DEVICES
The explosive growth in dedicated signal processing devices
augurs important new developments in signal processing, aided with
the ready availability of 1-micron and sub micron technology DSP
chips, with clock rates in excess of 30 MHz. Most DSP devices are
based on the RISC architecture, with on-chip memory and fast parallel
multiply accumulate units with fast I/O links. In this context the
INMOS transputer with four serial I/O links, and the follow-up INMOS
A100 device, and the more recent Motorola DSP 96000 series which
includes two fast serial links, and the Texas Instruments TMS 320C30
which also incorporates serial communications links, all contribute to
the growing armoury of DSP devices.
The recently announced TMS 320C50 is a complete DSP system
on a chip [13], offering a 35 ns instruction cycle with two to four times
the performance of earlier fixed point DSPs, an enhanced instruction
set, and source code compatibility with previous generations of the
TMS 320 family. It has a four-wire serial test bus, allowing easy
interfacing to peripherals.
A further innovation which is of direct relevance to real time
signal processing is the development of mixed signal systems on a
chip. This is a relatively new phenomenon which provides a novel
means of combining analogue and digital functions on a chip [14].
Examples are the Brooktree RAMDACs for graphics applications, self
calibrating convertors, and data acquisition components from Crystal
Semiconductor Corporation. National Semiconductor plans to release
[Figure: block diagram of a mixed signal device - a digitally controlled circuit and microcontroller on an internal bus, with inputs for the desired settings (set point), an output for the measured variable, and a one-line display]
28.5 CONCLUSIONS
The above has been a brief journey through some of the new
directions in algorithms, architectures and devices for signal
processing. This, of necessity, is a personal view point, nevertheless,
with significant new developments over the horizon, the subject area
of signal processing is set to grow and grow. As a final statement, the
integration of fast algorithms, parallel architectures, and high
performance multiprocessors, the field of parallel signal processing
and its applications is one which will lead to rich rewards.
REFERENCES
Digital signal processing: principles, devices and applications
Recent progress in the design and production of digital signal processing (DSP) devices has provided
significant new opportunities to workers in the already extensive field of signal processing. It is now
possible to contemplate the use of DSP techniques in cost-sensitive wide bandwidth applications,
thereby making more effective use of the large body of available signal processing knowledge.
Digital signal processing, long the province of telecommunications, is, in both research and
applications contexts, of growing importance in the fields of medical signal analysis, industrial
control (particularly robotics), the analysis and synthesis of speech, and both audio and video
entertainment systems. The growing demand for engineering skills in these areas has led to the
writing of this book and to the presentation of its material at an IEE-sponsored Vacation School at
the University of Leicester.
This book is different from others in the field in that it not only presents the fundamentals of DSP,
ranging from data conversion to z-transforms and spectral analysis, and extends these into the
areas of digital filtering and control, but also gives significant detail of the new devices themselves
and of how to use them. In addition to presenting the basic theory and describing the devices and
how to design with them, the material is consolidated by extensive use of real examples in specific
case studies.
The book is directed at readers with first degree level training in engineering, the physical sciences
or mathematics, and with some understanding of electronics. It is appropriate for design engineers
in industry, for users of DSP devices in scientific research, and for those working in all technical
development areas associated with the processing of signals for display, storage, transmission or
control.