For additional information on these topics, see References at the end of this
chapter.
5 Digital Filters
5.5.1 Decimation
Decimation is equivalent to sampling a discrete-time signal. Continuous-
time (analog) signal sampling and discrete-time (digital) signal sampling
are analogous.
$$s(t) = \sum_{n=-\infty}^{+\infty} \delta(t - nT)$$
The frequency that is one-half the sampling frequency (Fs/2) is called the
Nyquist frequency. The analog signal xc(t) must be bandlimited before
sampling to at most the Nyquist frequency. If xc (t) is not bandlimited, the
images created by the sampling process overlap each other, mirroring the
spectral energy around nFs/2, and thus corrupting the signal
representation. This phenomenon is called aliasing. The input xc(t) must
pass through an analog anti-alias filter to eliminate any frequency
component above the Nyquist frequency.
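As a rough illustration of aliasing, the following Python sketch (not part of the original ADSP-2100 listings) computes the frequency at which a real sinusoid appears after sampling. The function name `alias_frequency` and the 8 kHz example rate are assumptions chosen only for illustration.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency (0..fs/2) of a real tone at f_hz sampled at fs_hz."""
    f = f_hz % fs_hz                 # spectral images repeat every fs
    # Components above the Nyquist frequency fold back around fs/2.
    return fs_hz - f if f > fs_hz / 2 else f


# Hypothetical example: an 8 kHz sample rate gives a 4 kHz Nyquist frequency.
print(alias_frequency(3000, 8000))   # 3000 (below Nyquist, unchanged)
print(alias_frequency(5000, 8000))   # 3000 (aliased)
print(alias_frequency(9000, 8000))   # 1000 (aliased)
```

A 5 kHz component is indistinguishable from a 3 kHz component once sampled at 8 kHz, which is why the anti-alias filter must act before sampling.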
[Figure: time-domain sampling. Panels: the analog signal xc(t); the impulse train s(t) at multiples of T; the product s(t) xc(t); the resulting sequence x(n).]
[Figure: frequency-domain view of sampling. Panels: Xc(w); S(w) with components at multiples of 2πFs; the convolution X(w)*S(w).]
[Figure: discrete-time sampling. Panels: x(n); s(n); x(n)s(n) versus n; the decimated sequence y(m) versus m.]
The decimation and anti-alias functions are usually grouped together into
one function called a decimation filter. Figure 5.9 shows a block diagram
of a decimation filter. The input samples x(n) are put through the digital
anti-alias filter, h(k). The box labeled with a down-arrow and M is the
sample rate compressor, which discards M–1 out of every M samples.
Compressing the filtered input w(n) results in y(m), which is equal to
w(Mm).
Data acquisition systems such as the digital audio tape recorder can take
advantage of decimation filters to avoid using expensive high-
[Figure: decimation spectra. Panels: X(w); H(w); W(w); S(w); W(w)*S(w); X(w').]
[Figure 5.9: decimation filter block diagram. x(n) → h(k) → w(n) → ↓M → y(m).]
$$w(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k)$$
Figure 5.10a, on the next page, shows the signal flowgraph of an FIR
decimation filter. The N most recent input samples are stored in a delay
line; $z^{-1}$ is a unit sample delay. N samples from the delay line are
multiplied by N coefficients and the resulting products are summed to
form a single output sample w(n). Then w(n) is down-sampled by M
using the rate compressor.
[Figure 5.10: FIR decimation filter signal flowgraphs. (a) Delay line with taps h(0) through h(N–1) summed, followed by the rate compressor ↓M. (b) Rate compressor ↓M moved in front of each tap multiply.]
It is not necessary to calculate the samples of w(n) that are discarded by
the rate compressor. Accordingly, the rate compressor can be moved in
front of the multiply/accumulate paths, as shown in Figure 5.10b. This
change reduces the required computation by a factor of M. This filter
structure can be implemented by updating the delay line with M inputs
before each output sample is calculated.
$$y(m) = \sum_{k=0}^{N-1} h(k)\, x(Mm-k)$$
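The saving can be checked numerically. The Python sketch below is an illustration only, with hypothetical function names and toy coefficients; it assumes x(n) = 0 for n < 0. It computes the decimated output both ways: filtering at the full input rate and discarding samples, and evaluating only the retained outputs.

```python
def fir_full_rate(h, x):
    """w(n) = sum over k of h(k) x(n-k), taking x(n) = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def decimate_direct(h, x, M):
    """Filter at the input rate, then discard M-1 of every M samples."""
    return fir_full_rate(h, x)[::M]


def decimate_efficient(h, x, M):
    """y(m) = sum over k of h(k) x(Mm-k); only retained outputs are computed."""
    n_out = (len(x) + M - 1) // M
    return [sum(h[k] * x[M * m - k] for k in range(len(h)) if M * m - k >= 0)
            for m in range(n_out)]


h = [1, 2, 3]                        # toy coefficients, not a designed filter
x = list(range(1, 11))               # ten input samples
print(decimate_direct(h, x, 4))      # [1, 22, 46]
print(decimate_efficient(h, x, 4))   # [1, 22, 46]
```

Both routes give the same samples, but the second performs roughly 1/M of the multiply/accumulate operations.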
External hardware causes an interrupt at the input sample rate Fs, which
triggers the program to fetch an input data sample and store it in the data
circular buffer. The index register that points into this data buffer is then
incremented by one, so that the next consecutive input sample is written
to the next address in the buffer. The counter is then decremented by one
and compared to zero. If the counter is not yet zero, the algorithm waits
for another input sample. If the counter has decremented to zero, the
algorithm calculates an output sample, then resets the counter to M so that
the next output will be calculated after the next M inputs.
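The counter-driven scheme described above can be modeled in a few lines of Python. This is a simulation sketch, not the ADSP-2100 routine: the class name, the method `on_sample`, and the use of a Python list as the circular buffer are assumptions made for illustration. The counter starts at 1 so that the first input produces an output.

```python
class CounterDecimator:
    """Model of an interrupt-driven decimator with a circular delay line."""

    def __init__(self, h, M):
        self.h = list(h)
        self.M = M
        self.data = [0] * len(h)   # circular data buffer (delay line)
        self.index = 0             # next write position in the buffer
        self.counter = 1           # first input triggers an output

    def on_sample(self, x):
        """Called once per input sample; returns an output or None."""
        self.data[self.index] = x
        self.index = (self.index + 1) % len(self.data)
        self.counter -= 1
        if self.counter != 0:
            return None            # wait for more inputs
        self.counter = self.M      # next output after M more inputs
        n_taps = len(self.h)
        # The newest sample sits just behind the write index and pairs with h(0).
        return sum(self.h[k] * self.data[(self.index - 1 - k) % n_taps]
                   for k in range(n_taps))


dec = CounterDecimator([1, 2, 3], 4)
outputs = [y for y in (dec.on_sample(v) for v in range(1, 11)) if y is not None]
print(outputs)   # [1, 22, 46]
```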
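[Figure: decimation filter flow chart. Initialization: counter=1, initialize AG. On each input: dm(^data,1)=input, counter=counter–1. Test counter=0; if yes, an output sample is calculated.]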
The ADSP-2100 program for the decimation filter is shown in Listing 5.8.
Inputs to this filter routine come from the memory-mapped port adc, and
outputs go to the memory-mapped port dac. The filter’s I/O interfacing
hardware is described in more detail later in this chapter.
{DECIMATE.dsp
Real time Direct Form FIR Filter, N taps, decimates by M for a
decrease of 1/M times the input sample rate.
INPUT: adc
OUTPUT: dac
}
.MODULE/RAM/ABS=0 decimate;
.CONST N=300;
.CONST M=4; {decimate by factor of M}
.VAR/PM/RAM/CIRC coef[N];
.VAR/DM/RAM/CIRC data[N];
.VAR/DM/RAM counter;
.PORT adc;
.PORT dac;
.INIT coef:<coef.dat>;
RTI; {interrupt 0}
RTI; {interrupt 1}
RTI; {interrupt 2}
JUMP sample; {interrupt 3= input sample rate}
The FIR filter equation starts the convolution with the most recent data
sample and accesses the oldest data sample last. Delay lines implemented
with circular buffers, however, access data in the opposite order. The
oldest data sample is fetched first from the buffer and the newest data
sample is fetched last. Therefore, to keep the data/coefficient pairs
together, the coefficients must be stored in memory in reverse order.
The relationship between the address and the contents of the two circular
buffers (after N inputs have occurred) is shown in the table below. The
data buffer is located in data memory and contains the last N data samples
input to the filter. Each pass of the filter accesses the locations of both
buffers sequentially (the pointer is modified by one), but the first address
accessed is not always the first location in the buffer, because the
decimation filter inputs M samples into the delay line before starting each
filter pass. For each pass, the first fetch from the data buffer is from an
address M greater than for the previous pass. The data delay line moves
forward M samples for every output calculated.
Data                                Coefficient
DM(0)   = x(n–(N–1))    oldest      PM(0)   = h(N–1)
DM(1)   = x(n–(N–2))                PM(1)   = h(N–2)
DM(2)   = x(n–(N–3))                PM(2)   = h(N–3)
   •                                   •
   •                                   •
   •                                   •
DM(N–3) = x(n–2)                    PM(N–3) = h(2)
DM(N–2) = x(n–1)                    PM(N–2) = h(1)
DM(N–1) = x(n–0)        newest      PM(N–1) = h(0)
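The pairing in the table can be verified with a short Python sketch. The function name and sample values are hypothetical; the point is only that reading the data oldest-first against coefficients stored in reverse order gives the same sum as the FIR equation.

```python
def fir_pass_oldest_first(data_oldest_first, h):
    """One filter pass with data read oldest-first and coefficients reversed."""
    coef_reversed = h[::-1]          # PM(0) = h(N-1), ..., PM(N-1) = h(0)
    acc = 0
    for sample, coef in zip(data_oldest_first, coef_reversed):
        acc += sample * coef
    return acc


h = [1, 2, 3]                        # h(0), h(1), h(2): toy values
data = [10, 20, 30]                  # x(n-2), x(n-1), x(n): oldest first
print(fir_pass_oldest_first(data, h))            # 100
print(h[0] * 30 + h[1] * 20 + h[2] * 10)         # 100, the FIR equation directly
```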
The number of cycles required for the decimation filter routine is shown
below. The ADSP-2100 takes one cycle to calculate each tap (multiply and
accumulate), so only 18+N cycles are necessary to calculate one output
sample of an N-tap decimator. The 18 cycles of overhead for each pass is
just six cycles greater than the overhead of a non-multirate FIR filter.
$$N = \frac{1}{F_s\, t_{CLK}} - 18$$
In this example, a circular buffer input_buf stores the M input samples. The
code for loading input_buf is placed in an interrupt routine to allow the
input of data and the FIR filter calculations to occur simultaneously.
A loop waits until the input buffer is filled with M samples before the
filter output is calculated. Instead of counting input samples, this program
determines that M samples have been input when the input buffer’s index
register I0 is modified back to the buffer’s starting address. This strategy
saves a few cycles in the interrupt routine.
After M samples have been input, a second loop transfers the data from
input_buf to the data buffer. An output sample is calculated. Then the
program checks that at least one sample has been placed in input_buf. This
check prevents a false output if the output calculation occurs in less than
one sample interval. Then the program jumps back to wait until the next
M samples have been input.
This more efficient decimation filter spreads the calculations over the
output sample interval 1/Fs’ instead of the input interval 1/Fs. The
number of taps that can be calculated in real time is:
$$N = \frac{M}{F_s\, t_{CLK}} - 20 - 2M - 6(M-1)$$
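To compare the two tap limits, the Python sketch below evaluates both formulas using integer arithmetic. The function names are hypothetical, and the 8 kHz sample rate and 125 ns cycle time are illustrative values only; results are rounded down because the tap count must be a whole number.

```python
def max_taps_simple(fs_hz, t_clk_ns):
    """N = 1/(Fs tCLK) - 18, for the counter-based decimator."""
    cycles_per_input = 10**9 // (fs_hz * t_clk_ns)
    return cycles_per_input - 18


def max_taps_buffered(fs_hz, t_clk_ns, M):
    """N = M/(Fs tCLK) - 20 - 2M - 6(M-1), for the input-buffered decimator."""
    cycles_per_output = M * 10**9 // (fs_hz * t_clk_ns)
    return cycles_per_output - 20 - 2 * M - 6 * (M - 1)


# Illustrative values: Fs = 8 kHz, tCLK = 125 ns, M = 4.
print(max_taps_simple(8000, 125))        # 982
print(max_taps_buffered(8000, 125, 4))   # 3954
```

With these assumed values the buffered version allows roughly M times as many taps, because the calculation is spread over the whole output period.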
{DEC_EFF.dsp
Real time Direct Form FIR Filter, N taps, decimates by M for a decrease of 1/M times
the input sample rate. This version uses an input buffer to allow the filter
computations to occur in parallel with inputs. This allows larger order filter for a
given input sample rate. To save time, an index register is used for the input buffer
as well as for a decimation counter.
INPUT: adc
OUTPUT: dac
}
.MODULE/RAM/ABS=0 eff_decimate;
.CONST N=300;
.CONST M=4; {decimate by factor of M}
.VAR/PM/RAM/CIRC coef[N];
.VAR/DM/RAM/CIRC data[N];
.VAR/DM/RAM/CIRC input_buf[M];
.PORT adc;
.PORT dac;
.INIT coef:<coef.dat>;
RTI; {interrupt 0}
RTI; {interrupt 1}
RTI; {interrupt 2}
JUMP sample; {interrupt 3= input sample rate}
fir: CNTR=N - 1;
MR=0, MX0=DM(I0,M0), MY0=PM(I4,M4);
DO taploop UNTIL CE; {N-1 taps of FIR}
taploop: MR=MR+MX0*MY0(SS), MX0=DM(I0,M0), MY0=PM(I4,M4);
MR=MR+MX0*MY0(RND); {last tap with round}
IF MV SAT MR; {saturate result if overflow}
DM(dac)=MR1; {output data sample}
wait_again: AX0=I1;
AY0=^input_buf;
AR=AX0-AY0; {test and wait if i1 still}
IF EQ JUMP wait_again; {points to start of input_buf}
JUMP wait_M;
[Figure: decimation filter I/O hardware. Blocks: Track/Hold, A/D, Tri-State Buffer, ADSP-2100 (IRQ, Decode), Latch, Latch, D/A, Track/Hold, Interval Timer, ÷M; input x(t), output y(t').]
5.5.3 Interpolation
[Figure: time-domain reconstruction. Panels: the sequence x(n); y(t) versus t at multiples of T; the analog signal xc(t).]
[Figure: frequency-domain view. Panels: Y(w)*S(w) with components at πFs, 2πFs, 4πFs; Xc(w).]
[Figure: interpolation sequences x(n), w(m), and y(m).]
Digital audio systems such as compact disk and digital audio tape players
frequently use interpolation (oversampling) techniques to avoid using
costly high performance analog reconstruction filters. The anti-imaging
function in the digital interpolator allows these systems to use inexpensive
low-order analog filters on the D/A outputs.
[Figure: interpolation spectra. Panels: X(w); W(w); H(w); Y(w).]
[Figure 5.17(b): interpolation filter signal flowgraph. x(n) → ↑L → delay line of z^-1 elements with taps h(0) through h(N–1) → y(m).]
The low-pass filter of the interpolator uses an FIR filter structure for the
same reasons that an FIR filter is used in the decimator, notably
computational efficiency. The convolution equation for this filter is
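$$y(m) = \sum_{k=0}^{N-1} h(k)\, w(m-k)$$
where N is the number of filter coefficients (taps) in h(k), w(m–k) is the rate
expanded version of the input x(n), and w(m) is related to x(n) by
$$w(m) = \begin{cases} x(m/L), & m = 0, \pm L, \pm 2L, \ldots \\ 0, & \text{otherwise} \end{cases}$$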
The signal flow graph that represents the interpolation filter is shown in
Figure 5.17b, on previous page. A delay line of length N is loaded with an
input sample followed by L–1 zeros, then the next input sample and L–1
zeros, and so on. The output is the sum of the N products of each sample
from the delay line and its corresponding filter coefficient. The filter
calculates an output for every sample, zero or data, loaded into the delay
line.
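The direct form just described can be sketched in Python as follows. The function names and toy coefficients are assumptions for illustration, and samples before the start of the input are taken as zero.

```python
def expand(x, L):
    """Rate expander: each input sample followed by L-1 zeros."""
    w = []
    for sample in x:
        w.append(sample)
        w.extend([0] * (L - 1))
    return w


def interpolate_direct(h, x, L):
    """y(m) = sum over k of h(k) w(m-k), one output per delay-line entry."""
    w = expand(x, L)
    return [sum(h[k] * w[m - k] for k in range(len(h)) if m - k >= 0)
            for m in range(len(w))]


h = [1, 2, 3, 4]                       # toy coefficients, N = 4
print(expand([1, 2], 2))               # [1, 0, 2, 0]
print(interpolate_direct(h, [1, 2], 2))  # [1, 2, 5, 8]
```

Every output costs N multiplies here, although L–1 out of every L delay-line entries are zero and contribute nothing to the sum.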
[Figure: delay-line contents on successive output samples, showing input samples x(2) and x(3) separated by L–1 zeros as they move past taps h(2) through h(5).]
Crochiere and Rabiner (see References at the end of this chapter) refer to
this efficient interpolation filtering method as polyphase filtering, because a
different phase of the filter function h(k) (equivalent to a set of interleaved
coefficients) is used to calculate each output sample.
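A minimal Python sketch of the polyphase idea is given below, assuming N/L is an integer and that samples before the start of the input are zero. The function name is hypothetical. Each output uses only every Lth coefficient, starting from a phase offset that cycles through 0 to L–1.

```python
def interpolate_polyphase(h, x, L):
    """Interpolate by L using N/L multiplies per output sample."""
    assert len(h) % L == 0, "N/L must be an integer"
    y = []
    for n in range(len(x)):
        for phase in range(L):
            # Phase p uses the interleaved set h(p), h(p+L), h(p+2L), ...
            acc = 0
            for j, coef in enumerate(h[phase::L]):
                if n - j >= 0:
                    acc += coef * x[n - j]
            y.append(acc)
    return y


h = [1, 2, 3, 4]                          # toy coefficients, N = 4, L = 2
print(interpolate_polyphase(h, [1, 2], 2))   # [1, 2, 5, 8]
```

The result matches filtering the zero-stuffed sequence [1, 0, 2, 0] with all four taps, but the zero-valued products are never computed and no zeros are stored in the delay line.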
Figure 5.19 shows a flow chart of the interpolation algorithm. The
processor waits in a loop and is interrupted at the output sample rate
(L times the input sample rate). In the interrupt routine, the coefficient
address pointer is decremented by one location so that a new set of
interleaved coefficients will be accessed in the next filter pass. A counter
tracks the occurrence of every Lth output; on the Lth output, an input
sample is taken and the coefficient address pointer is set forward L
locations, back to the first set of interleaved coefficients. The output is then
calculated with the coefficient address pointer incremented by L locations
to fetch every Lth coefficient. One restriction in this algorithm is that the
number of filter taps must be an integral multiple of the interpolation
factor; N/L must be an integer.
[Figure 5.19: interpolation flow chart. Initialization: counter=1, initialize AG. Wait for interrupt at L × (input sample rate). On interrupt: modify(^coef,–1), counter=counter–1. Test counter=0; if yes: dm(^data,1)=input, modify(^coef,L), counter=L.]
Listing 5.10 is an ADSP-2100 program that implements this interpolation
algorithm. The ADSP-2100 is capable of calculating each filter pass in
((N/L)+17) processor instruction cycles. Each pass must be calculated
within the period between output samples, equal to 1/(L Fs). Thus the
maximum number of taps that can be calculated in real time is:
$$N = \frac{1}{F_s\, t_{CLK}} - 17L$$
where tCLK is the processor cycle time and Fs is the input sampling rate.
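As a worked example of this limit, the Python sketch below evaluates the formula and rounds the result down to a multiple of L, since N/L must be an integer. The function name is hypothetical, and the 8 kHz input rate and 125 ns cycle time are illustrative values only.

```python
def max_interp_taps(fs_hz, t_clk_ns, L):
    """Largest N with N <= 1/(Fs tCLK) - 17L and N/L an integer."""
    limit = 10**9 // (fs_hz * t_clk_ns) - 17 * L
    return (limit // L) * L


# Illustrative values: Fs = 8 kHz input rate, tCLK = 125 ns.
print(max_interp_taps(8000, 125, 4))   # 932
print(max_interp_taps(8000, 125, 3))   # 948
```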
{INTERPOLATE.dsp
Real time Direct Form FIR Filter, N taps, uses an efficient algorithm
to interpolate by L for an increase of L times the input sample rate. A
restriction on the number of taps is that N/L be an integer.
INPUT: adc
OUTPUT: dac
}
.MODULE/RAM/ABS=0 interpolate;
.CONST N=300;
.CONST L=4; {interpolate by factor of L}
.CONST NoverL=75;
.VAR/PM/RAM/CIRC coef[N];
.VAR/DM/RAM/CIRC data[NoverL];
.VAR/DM/RAM counter;
.PORT adc;
.PORT dac;
.INIT coef:<coef.dat>;
RTI; {interrupt 0}
RTI; {interrupt 1}
RTI; {interrupt 2}
JUMP sample; {interrupt 3 at (L*input rate)}
{______________________Interpolate__________________________________}
5.11 performs the 16-by-32 bit multiplication needed for this gain
correction. This code replaces the saturation instruction in the
interpolation filter program in Listing 5.10. The MY1 register should be
initialized to L at the start of the routine, and the last multiply/accumulate
of the filter should be performed with (SS) format, not the rounding
option. This code multiplies a filter output sample in 1.31 format by the
gain L, in 16.0 format, and produces a corrected output in 1.15 format in
the SR0 register.
MX1= MR1;
MR= MR0*MY1 (UU);
MR0= MR1;
MR1= MR2;
MR= MR+MX1*MY1 (SU);
SR= LSHIFT MR0 BY -1 (LO);
SR= SR OR ASHIFT MR1 BY -1 (HI);
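[Figure: interpolation filter I/O hardware. Blocks: Track/Hold, A/D, Tri-State Buffer, ADSP-2100 (IRQ, Decode), Latch, Latch, D/A, Track/Hold, Interval Timer, ÷L; input x(t), output y(t').]
[Figure: rational rate change block diagram. An interpolator stage followed by a decimator stage; intermediate signals x(k) and w(k).]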
Figure 5.22, on the next page, shows the frequency representation for a
sample rate increase of 3/2. The input sample frequency of 4kHz is first
increased to 12kHz which is the least common multiple of 4 and 6kHz.
This intermediate signal X(k) is then filtered to eliminate the images
caused by the rate expansion and to prevent any aliasing that the rate
compression could cause. The filtered signal W(k) is then rate compressed
by a factor of 2 to result in the output Y(k) at a sample rate of 6kHz. Figure
5.23, also on the next page, shows a similar example that changes the
sample rate by a factor of 2/3 (a decrease).
Figure 5.24 shows the relationship between the sample periods used in the
3/2 and 2/3 rate changes. The intermediate sample period is one-Lth of
the input period, 1/Fs, and one-Mth of the output period.
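The cascade of rate expansion, filtering, and rate compression can be written out directly. The Python sketch below is the straightforward form, not an efficient one; the function name and the three-tap "hold" filter are assumptions for illustration.

```python
def rate_change_direct(h, x, L, M):
    """Change the sample rate by L/M: expand by L, filter, keep every Mth."""
    # Rate expander: each input sample followed by L-1 zeros.
    expanded = []
    for sample in x:
        expanded.append(sample)
        expanded.extend([0] * (L - 1))
    # Single low-pass filter running at the intermediate rate.
    filtered = [sum(h[k] * expanded[i - k] for k in range(len(h)) if i - k >= 0)
                for i in range(len(expanded))]
    # Rate compressor: discard M-1 of every M samples.
    return filtered[::M]


# Toy 3/2 rate increase: two inputs produce three outputs.
print(rate_change_direct([1, 1, 1], [1, 2], 3, 2))   # [1, 1, 2]
```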
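[Figure 5.22: frequency representation of the 3/2 sample rate increase. Panels X(n), W(k), Y(m); frequency axis in kHz up to 12 kHz; output sample rate F"s = 6 kHz.]
[Figure 5.23: frequency representation of the 2/3 sample rate change (L/M = 2/3). Panels W(k), Y(m); frequency axis in kHz up to 12 kHz; output sample rate F"s = 4 kHz.]
[Figure 5.24: relationship of the input sample period 1/Fs, the intermediate-rate period, and the output sample period.]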
The flow charts in Figures 5.25 and 5.26 show two implementations of the
rate change algorithm. The first one uses software counters to derive the
input and output sample rates from the common sample rate. In this
algorithm, the main routine is interrupted at the common sample rate.
Depending on whether one or both of the counters has decremented to
zero, the interrupt routine reads a new data sample into the input buffer
and/or sets a flag that causes the main routine to calculate a new output
sample.
For some applications, the integer factors M and L are so large that the
overhead for dividing the common sample rate leaves too little time for
the filter calculations. The second implementation of the rate change
algorithm solves this problem by using two external hardware dividers,
÷L and ÷M, to generate the input and output sample rates from the
common rate. The ÷L hardware generates an interrupt that causes the
processor to input a data sample. The ÷M hardware generates an interrupt
(with a lower priority) that starts the calculation of an output sample.
[Figure 5.25: L/M rate change flow chart using software counters. Main routine boxes: countin=1, countout=1, out_flag=0, coef_update=0, initialize AG; test out_flag=1; filter N/L–1 coeffs; output sample. Interrupt at L × input rate: countin=countin–1; test countin=0; load input_buffer with input sample; test countout=0; out_flag=1; countout=M; return to main.]
To implement the calculation-saving techniques, the programs must
update the coefficient pointer with two different modification values.
First, the algorithm must update the coefficient pointer by L each time an
input sample is read. This modification moves the coefficient pointer back
to the first set of the polyphase coefficients. The variable coef_update tracks
these updates. The second modification to the coefficient pointer is to set it
back by one location for each interpolated output, even the outputs that
are not calculated because they are discarded in the decimator. The
modification constant is –M because M–1 outputs are discarded by the
rate compressor. The total value that updates the coefficient pointer is –M
+ coef_update.
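The combined effect of the two pointer modifications can be modeled with index arithmetic. The Python sketch below is a simplified model, not a transcription of the ADSP-2100 pointer handling: it tracks the position at the intermediate rate, advances it by M per output, and derives from it both the newest input sample and the polyphase coefficient set to use. The function name is hypothetical, and N/L is assumed to be an integer.

```python
def rate_change_efficient(h, x, L, M):
    """L/M rate change computing only the outputs that are kept."""
    assert len(h) % L == 0, "N/L must be an integer"
    y = []
    for i in range(0, len(x) * L, M):    # i counts intermediate-rate samples
        n, phase = divmod(i, L)          # newest input index, coefficient set
        acc = 0
        for j, coef in enumerate(h[phase::L]):
            if n - j >= 0:
                acc += coef * x[n - j]
        y.append(acc)
    return y


print(rate_change_efficient([1, 1, 1], [1, 2], 3, 2))                  # [1, 1, 2]
print(rate_change_efficient([1, 2, 3, 4, 5, 6], [1, 2, 3], 3, 2))      # [1, 3, 9, 11, 21]
```

Each output costs N/L multiplies, and the M–1 intermediate outputs that the rate compressor would discard are skipped entirely.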
[Figure 5.26: L/M rate change flow chart using external interrupt hardware. Recoverable boxes: load input_buffer with input sample; modify coef_update by L; return to main.]
The rate change program in Listing 5.12 can calculate the following
number of filter taps within one output period:
where tCLK is the ADSP-2100 instruction cycle time, and the notation ⌈u⌉
means the smallest integer greater than or equal to u. The program in
Listing 3.6 can execute
$$N = \frac{M}{F_s\, t_{CLK}} - 22L - 9L \lceil M/L \rceil$$
taps in one output period. These equations determine the upper limit on
N, the order of the low-pass filter.
{RATIO_BUF.dsp
Real time Direct Form FIR Filter, N taps. Efficient algorithm
to interpolate by L and decimate by M for a L/M change in the input sample rate. Uses
an input buffer so that the filter computations occur in parallel with inputting and
outputting data. This allows a larger number of filter taps for a given input sample
rate.
INPUT: adc
OUTPUT: dac
}
.MODULE/RAM/ABS=0 Ratio_eff;
.CONST N=300; {N taps, N coefficients}
.CONST L=3; {interpolate by factor of L}
.CONST NoverL=100; {NoverL must be an integer}
.CONST M=2; {decimate by factor of M}
.CONST intMoverL=2; {smallest integer GE M/L}
.VAR/PM/RAM/CIRC coef[N]; {coefficient circular buffer}
.VAR/DM/RAM/CIRC data[NoverL]; {data delay line}
.VAR/DM/RAM input_buf[intMoverL]; {input buffer is not circular}
.VAR/DM/RAM countin;
.VAR/DM/RAM countout;
.VAR/DM/RAM out_flag; {true when time to calc. output}
.PORT adc;
.PORT dac;
.INIT coef:<coef.dat>;
RTI; {interrupt 0}
JUMP interrupt; {interrupt 1= L * input rate}
RTI; {interrupt 2}
RTI; {interrupt 3}
fir: CNTR=NoverL-1;
MR=0, MX0=DM(I0,M0), MY0=PM(I4,M4);
DO taploop UNTIL CE; {N/L-1 taps of FIR}
taploop: MR=MR+MX0*MY0(SS), MX0=DM(I0,M0), MY0=PM(I4,M4);
MR=MR+MX0*MY0(RND); {last tap with round}
IF MV SAT MR; {saturate result if overflow}
DM(dac)=MR1; {output data sample}
JUMP wait_out;
{RATIO_2INT.dsp
Real time Direct Form FIR Filter, N taps. Efficient algorithm to interpolate by L and
decimate by M for a L/M change in the input sample rate. Uses an input buffer so that
the filter computations occur in parallel with inputting and outputting data. This
allows a larger number of filter taps for a given input sample rate. This version uses
two interrupts and external divide by L and divide by M to eliminate excessive overhead
for large values of M and L.
INPUT: adc
OUTPUT: dac
}
.MODULE/RAM/ABS=0 Ratio_2int;
.CONST N=300; {N taps, N coefficients}
.CONST L=3; {interpolate by factor of L}
.CONST NoverL=100; {NoverL must be an integer}
.CONST M=2; {decimate by factor of M}
.CONST intMoverL=2; {smallest integer GE M/L}
.VAR/PM/RAM/CIRC coef[N]; {coefficient circular buffer}
.VAR/DM/RAM/CIRC data[NoverL]; {data delay line}
.VAR/DM/RAM input_buf[intMoverL]; {input buffer is not circular}
.PORT adc;
.PORT dac;
.INIT coef:<coef.dat>;
AY0=^input_buf;
AR=AX0-AY0;
IF EQ JUMP modify_coef; {skip do loop if buffer empty}
CNTR=AR; {dump in buffer into delay line}
I1=^input_buf;
DO load_data UNTIL CE;
AR=DM(I1,M0);
load_data: DM(I0,M0)=AR;
I1=^input_buf; {fix pointer to input_buf}
modify_coef: MODIFY(I5,M5); {modify coef update by -M}
M6=I5;
MODIFY(I4,M6); {modify ^coef by coef update}
I5=0; {reset coef update}
fir: CNTR=NoverL-1;
MR=0, MX0=DM(I0,M0), MY0=PM(I4,M4);
DO taploop UNTIL CE; {N/L-1 taps of FIR}
taploop: MR=MR+MX0*MY0(ss), MX0=DM(I0,M0), MY0=PM(I4,M4);
MR=MR+MX0*MY0(RND); {last tap with round}
IF MV SAT MR; {saturate result if overflow}
DM(dac)=MR1; {output data sample}
RTI;
5.5.5 Multistage Implementations
The previous examples of sample rate conversion in this chapter use a
single low-pass filter to prevent the aliasing or imaging associated with
the rate compression and expansion. One method for further improving
the efficiency of rate conversion is to divide this filter into two or more
cascaded stages of decimation or interpolation. Each successive stage
reduces the sample rate until the final sample rate is equal to twice the
bandwidth of the desired data. The product of all the stages’ rate change
factors should equal the total desired rate change, M or L.
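A two-stage decimator is simply two decimation filters in cascade. The Python sketch below illustrates the structure with hypothetical function names and toy two-tap filters; real stage filters would be designed for the transition bands discussed in this section.

```python
def decimate(h, x, M):
    """y(m) = sum over k of h(k) x(Mm-k), taking x(n) = 0 for n < 0."""
    n_out = (len(x) + M - 1) // M
    return [sum(h[k] * x[M * m - k] for k in range(len(h)) if M * m - k >= 0)
            for m in range(n_out)]


def decimate_two_stage(h1, h2, x, M1, M2):
    """Decimate by M1 with h1, then by M2 with h2; total factor is M1*M2."""
    return decimate(h2, decimate(h1, x, M1), M2)


x = list(range(1, 9))                              # eight input samples
print(decimate([1, 1], x, 2))                      # [1, 5, 9, 13]
print(decimate_two_stage([1, 1], [1, 1], x, 2, 2)) # [1, 14]
```

The second stage runs at 1/M1 of the input rate, so its taps are evaluated far less often than those of an equivalent single-stage filter.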
Crochiere and Rabiner (see References at the end of this chapter) show that
the total number of computations in a multi-stage design can be made
substantially less than that for a single-stage design because each stage has
a wider transition band than that of the single-stage filter. The sample rate
at which the first stage is calculated may be large, but because the
transition band is wide, only a small number of filter taps (N) is required.
In the last stage, the transition band may be small, but because the sample
rate is small also, fewer taps and thus a reduced number of computations
are needed. In addition to computational efficiency, multistage filters have
the advantage of a lower round-off noise due to the reduced number of
taps.
Figure 5.27, on the following page, shows the frequency spectra for an
example decimation filter implemented in two stages. The bandwidth and
transition band of the desired filter is shown in Figure 5.27a and the
frequency response of the analog anti-alias filter required is shown in
Figure 5.27b. The shaded lines indicate the frequencies that alias into the
interstage transition bands. These frequencies are sufficiently attenuated
so as not to disturb the final pass or transition bands. The frequency
responses of the first and final stage filters are shown in Figure 5.27c and
d. The example in Figure 5.28 is the same except that aliasing is allowed in
the final transition band. This aliasing is tolerable, for instance, when the
resulting signal is used for spectral analysis and the aliased band can be
ignored. All the transition bands are wide and therefore the filter stages
require fewer taps than a single-stage filter.
Crochiere and Rabiner (see References at the end of this chapter) contains
some design curves that help to determine optimal rate change factors for
the intermediate stages. A two- or three-stage design provides a
substantial reduction in filter computations; further reductions in
computations come from adding more stages. Because the filters are
cascaded, each stage must have a passband ripple equal to final passband
ripple divided by the number of stages used. The stopband ripple does
not have to be less than the final stopband ripple because each successive
stage attenuates the stopband ripple of the previous stage.
[Figure 5.27: two-stage decimation spectra. (a) Required data bandwidth. (b) Analog anti-alias response, axis marked fs/2 and fs. (c) Stage 1 response, axis marked f's/2 and f's. (d) Stage 2 response, axis marked f"s/2 and f"s.]
Two buffers are needed because only a portion (M–1 samples) of the input
buffer can be dumped into the first stage delay line at once. The single-
stage decimation algorithm used only one buffer because it could dump
all input data into the buffer at once. The two ping-ponged buffers are
implemented as one double-length circular buffer, 2M locations long,
indexed by two pointers offset from each other by M. Because the pointers
follow each other and wrap around the buffer, the two halves switch
between reading and writing (ping-pong) after every M inputs.
[Figure 5.28 Two-Stage Decimation with Transition Band Aliasing. (a) Required data bandwidth. (b) Analog anti-alias response, axis marked fs/2 and fs. (c) Stage 1 response, axis marked f's/2 and f's. (d) Stage 2 response, axis marked f"s/2 and f"s.]
{DEC2STAGEBUF.dsp
Real time Direct Form Finite Impulse Response Filter, N1,N2 taps, uses two cascaded stages to
decimate by M1 and M2 for a decrease of 1/(M1*M2) times the input sample rate. Uses an
input buffer to increase efficiency.
INPUT: adc
OUTPUT: dac
}
.MODULE/RAM/ABS=0 dec2stagebuf;
.CONST N1=32;
.CONST N2=148;
.CONST M_1=2; {decimate by factor of M}
.CONST M_2=2; {decimate by factor of M}
.CONST M1xM2=4; {(M_1 * M_2)}
.CONST M1xM2x2=8; {(M_1 * M_2 * 2)}
.VAR/PM/RAM/CIRC coef1[N1];
.VAR/PM/RAM/CIRC coef2[N2];
.VAR/DM/RAM/CIRC data1[N1];
.VAR/DM/RAM/CIRC data2[N2];
.VAR/DM/RAM/CIRC input_buf[m1xm2x2];
.VAR/DM/RAM input_count;
.PORT adc;
.PORT dac;
.INIT coef1:<coef1.dat>;
.INIT coef2:<coef2.dat>;
RTI; {interrupt 0}
RTI; {interrupt 1}
RTI; {interrupt 2}
JUMP sample; {interrupt 3 at input rate}
I0=^data1; {setup a circular buffer in DM}
L0=%data1;
M0=1; {modifier for data is 1}
I1=^data2; {setup a circular buffer in DM}
L1=%data2;
I2=^input_buf; {setup input buffer in DM}
L2=%input_buf;
I3=^input_buf; {setup second in buffer in DM}
L3=%input_buf;
IMASK=b#1000; {enable interrupt 3}
CNTR=N2 - 1;
MR=0, MX0=DM(I1,M0), MY0=PM(I5,M4);
DO taploop2 UNTIL CE; {N-1 taps of FIR}
taploop2: MR=MR+MX0*MY0(SS), MX0=DM(I1,M0), MY0=PM(I5,M4);
MR=MR+MX0*MY0(RND); {last tap with round}
IF MV SAT MR; {saturate result if overflow}
DM(dac)=MR1; {output data sample}
JUMP wait_full;
{INT2STAGE.dsp
Two stage cascaded real time Direct Form Finite Impulse Response Filter, N1, N2 taps, uses an
efficient algorithm to interpolate by L1*L2 for an increase of L1*L2 times the input
sample rate. A restriction on the number of taps is that N divided by L be an integer.
INPUT: adc
OUTPUT: dac
}
.MODULE/RAM/ABS=0 interpolate_2stage;
.CONST N1=32;
.CONST N2=148;
.CONST L_1=2; {stage one factor is L1}
.CONST L_2=2; {stage two factor is L2}
.CONST N1overL1=16;
.CONST N2overL2=74;
.VAR/PM/RAM/CIRC coef1[N1];
.VAR/PM/RAM/CIRC coef2[N2];
.VAR/DM/RAM/CIRC data1[N1overL1];
.VAR/DM/RAM/CIRC data2[N2overL2];
.VAR/DM/RAM counter1;
.VAR/DM/RAM counter2;
.PORT adc;
.PORT dac;
.INIT coef1:<coef1.dat>;
.INIT coef2:<coef2.dat>;
RTI; {interrupt 0}
RTI; {interrupt 1}
RTI; {interrupt 2}
JUMP sample; {interrupt 3= L1*L2 output rate}
{________________________Interpolate_________________________________}
sample: MODIFY(I5,M6); {shifts coef pointer back by -1}
AY0=DM(counter2);
AR=AY0-1; {decrement and update counter}
DM(counter2)=AR;
IF NE JUMP do_fir2; {test, do stage 1 if L_2 times}
5.5.6 Narrow-Band Spectral Analysis
The computation of the spectrum of a signal is a fundamental DSP
operation that has a wide range of applications. The spectrum can be
calculated efficiently with the fast Fourier transform (FFT) algorithm. An
N-point FFT results in N/2 bins of spectral information spanning zero to
the Nyquist frequency. The frequency resolution of this spectrum is Fs/N
Hz per bin, where Fs is the sample rate of the data. The number of
computations required is on the order of N log2 N. Often, applications such
as sonar, radar, and vibration analysis need to determine only a narrow
band of the entire spectrum of a signal. The FFT would require calculating
the entire spectrum of the signal and discarding the unwanted frequency
components.
[Figure 5.29 Integer Band Decimator for High Resolution Spectral Analysis. (a) Block diagram: x(n) → bandpass filter → w(n) → ↓M → y(m) → FFT. (b) Spectrum X(w) divided into integer bands K=0 through K=9 between 0 and π. (c) Frequency axis marked at π/M and 2π/M. (d) Spectrum after decimation on the w' axis, from 0 to 2π.]
odd integer band is chosen for the band-pass filter, e.g., K=9, then the
translated signal is inverted in frequency. This situation can be corrected
by multiplying each output sample y(m) by $(-1)^m$, i.e., inverting the sign of
all odd samples.
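The correction amounts to negating every odd-indexed output. A minimal Python sketch, with a hypothetical function name:

```python
def correct_inversion(y):
    """Multiply y(m) by (-1)^m: negate the odd-indexed samples."""
    return [-value if m % 2 else value for m, value in enumerate(y)]


print(correct_inversion([1, 2, 3, 4]))   # [1, -2, 3, -4]
```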
The entire narrow band filtering process can be accomplished using the
single- or multi-stage decimation program listed in this chapter. A listing
of an ADSP-2100 FFT implementation can be found in Chapter 6.
5.6 ADAPTIVE FILTERS
The stochastic gradient (SG) adaptive filtering algorithm, developed in its
present form by Widrow and Hoff (Widrow and Hoff, 1960), provides a
powerful and computationally efficient means of realizing adaptive filters.
It is used to accomplish a variety of applications, including
The estimation error of the joint process estimator is thus formed by the
difference between the signal it is desired to estimate, d(T), and a
weighted linear combination of the current and past input values y(T).
The weights, cj(T), are the transversal filter coefficients at time T. The
adaptation of the jth coefficient, cj(T), is performed according to the
following equation:
In this equation, y(T–j+1) represents the past value of the input signal
“contained” in the jth tap of the transversal filter. For example, y(T), the
present value of the input signal, corresponds to the first tap and y(T–42)
corresponds to the forty-third filter tap. The step size β controls the “gain”
of the adaptation and allows a tradeoff between the convergence rate of
the algorithm and the amount of random fluctuation of the coefficients
after convergence.
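One adaptation step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the ADSP-2100 routine: the function name is hypothetical, the tap list holds y(T), y(T–1), ... newest first, and each coefficient is assumed to be updated by adding β times the estimation error times the input value in its tap, consistent with the update value described in the following sections.

```python
def sg_step(c, y_taps, d, beta):
    """One stochastic gradient update of a transversal filter.

    c       coefficients c_1 ... c_n
    y_taps  input values y(T), y(T-1), ..., y(T-n+1)
    d       desired signal value d(T)
    beta    step size
    """
    estimate = sum(cj * yj for cj, yj in zip(c, y_taps))
    error = d - estimate                        # estimation error
    updated = [cj + beta * error * yj for cj, yj in zip(c, y_taps)]
    return updated, error


coeffs, err = sg_step([0.0, 0.0], [1.0, 2.0], 5.0, 0.5)
print(err)      # 5.0
print(coeffs)   # [2.5, 5.0]
```

A larger β moves the coefficients further on each step, which speeds convergence but increases their fluctuation once converged.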
5.6.1 Single-Precision Stochastic Gradient
The transversal filter subroutine that performs the sum-of-products
operation to calculate ec(T), the estimation error, is given in FIR Filters,
earlier in this chapter. The subroutine that updates the filter coefficients is
shown in Listing 5.16. This subroutine is based on the assumption that all
n data values used to calculate the coefficient are real.
The first instruction multiplies ec(T) (the estimation error at time T, stored
in MX0) by β (the step size, stored in MY1) and loads the product into the
MF register. In parallel with this multiplication, the data memory read
which transfers y(T–n+1) (pointed to by I2) to MX0 is performed. The nth
coefficient update value, βec(T) y(T–n+1), is computed by the next
instruction in parallel with the program memory read which transfers the
nth coefficient (pointed to by I6) to the ALU input register AY0. The adapt
loop is then executed n times in 2n+2 cycles to update all n coefficients.
The first time through the loop, the nth coefficient is updated in parallel
with a dual fetch of y(T–n+2) and the (n–1)th coefficient. The updated nth
coefficient value is then written back to program memory in parallel with
the computation of the (n–1)th coefficient update value, βec(T) y(T–n+2).
This continues until all n coefficients have been updated and execution
falls out of the loop. If desired, the saturation mode of the ALU may be
enabled prior to entering the routine so as to automatically implement a
saturation capability on the coefficient updates.
The maximum allowable filter order when using the stochastic gradient
algorithm for an adaptive filtering application is determined primarily by
the processor cycle time, the sampling rate, and the number of other
computations required. The transversal filter subroutine requires a total of
n+7 cycles for a filter of length n, while the gradient update subroutine
requires 2n+9 cycles to update all n coefficients. At an 8-kHz sampling
rate and an instruction cycle time of 125 nanoseconds, the ADSP-2100 can
implement an SG transversal filter of approximately 300 taps. This
implementation would also have 84 instruction cycles for miscellaneous
operations.
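The cycle budget quoted above can be reproduced with a few lines of Python. The function name is hypothetical; the cycle counts n+7 and 2n+9 and the 8 kHz, 125 ns figures are taken from the text.

```python
def sg_spare_cycles(fs_hz, t_clk_ns, n_taps):
    """Cycles left per sample after the filter and coefficient update."""
    cycles_per_sample = 10**9 // (fs_hz * t_clk_ns)
    filter_cycles = n_taps + 7          # transversal filter subroutine
    update_cycles = 2 * n_taps + 9      # gradient update subroutine
    return cycles_per_sample - filter_cycles - update_cycles


print(10**9 // (8000 * 125))            # 1000 cycles per sample period
print(sg_spare_cycles(8000, 125, 300))  # 84
```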
.MODULE rsg_sub;
{
    Real SG Coefficient Adaptation Subroutine

    Calling Parameters
        MX0 = Error
        MY1 = Beta
        I2 --> Oldest input data value in delay line
        L2 = Filter length
        I6 --> Beginning of filter coefficient table
        L6 = Filter length
        M1,M5 = 1
        M6 = 2
        M3,M7 = -1
        CNTR = Filter length

    Return Values
        Coefficients updated
        I2 --> Oldest input data value in delay line
        I6 --> Beginning of filter coefficient table

    Altered Registers
        AY0,AR,MX0,MF,MR

    Computation Time
        (2 × Filter length) + 6 + 3 cycles
}
.ENTRY rsg;
5.6.2 Double-Precision Stochastic Gradient
In some adaptive filtering applications, such as the local echo cancellation
required in high-speed data transmission systems, the precision afforded
by 16-bit filter coefficients is not adequate. In such applications it is
desirable to perform the coefficient adaptation (and generally the filtering
operation as well) using a higher-precision representation for the
coefficient values. The subroutine in Listing 5.17 implements a stochastic
gradient adaptation algorithm that is again based on the equations in the
previous section but performs the coefficient adaptation in double
precision. Data values, of course, are still maintained in single precision.
The 32-bit coefficients are stored in program memory LSW first, so that
the LSWs of all coefficients are stored at even addresses, and the MSWs at
odd addresses. The coefficients thus require a circular buffer length
(specified by L6) that is twice the length of the filter. As in the single-
precision SG program in the previous section, the first instruction is used
to compute the product of ec(T), the estimation error, and β, the step size,
in parallel with the data memory read that transfers the oldest input data
value in the delay line, y(T–n+1), to MX0. Upon entering the adaptd loop,
y(T–n+1) is multiplied by βec(T) to yield the nth coefficient update value.
This is performed in parallel with the fetch of the LSW of the nth
coefficient (to AY0). The next instruction computes the sum of the update
value LSW (in MR0) and the LSW of the nth coefficient, while performing
a dual fetch of the MSW of the nth coefficient (again to AY0) and the next
data value in the delay line (to MX0). The LSW of the updated nth
coefficient is then written back to program memory in parallel with the
update of the MSW of the nth coefficient, and the final instruction of the
loop writes this updated MSW to program memory. The adaptd loop
continues execution in this manner until all n double-precision coefficients
have been updated. If you want saturation capability on the coefficient
update, you must enable and disable the saturation mode of the ALU
within the update loop. The updates of the LSWs of the coefficients should
be performed with the saturation mode disabled; the update of the MSWs
of the coefficients should be performed with the saturation mode enabled.
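The double-precision update can likewise be sketched in Python to show the LSW-first layout, the carry from the LSW add into the MSW add, and saturation applied to the MSW only. This is a sketch under stated assumptions, not the routine itself: the function names are hypothetical, the table is modeled as a flat list of unsigned 16-bit words, the update value is assumed to be the full 1.31 product of the data value and the rounded βec(T), and round-half-up is assumed for that rounding.

```python
def sat16(v):
    """Clamp to the signed 16-bit range (saturation mode enabled)."""
    return max(-32768, min(32767, v))

def signed16(w):
    """Reinterpret an unsigned 16-bit word as signed."""
    return w - 0x10000 if w & 0x8000 else w

def q15_mul_rnd(a, b):
    """1.15 x 1.15 -> 1.15 product; round-half-up is an assumption."""
    return sat16((a * b + (1 << 14)) >> 15)

def sg_update_double(table, delay, error, beta):
    """Update 32-bit coefficients stored as [lsw0, msw0, lsw1, msw1, ...]."""
    scaled_err = q15_mul_rnd(error, beta)
    for k, y in enumerate(delay):
        # 32-bit (1.31) update value, clamped for the -1 x -1 corner case.
        upd = min((y * scaled_err) << 1, 0x7FFFFFFF) & 0xFFFFFFFF
        upd_lsw, upd_msw = upd & 0xFFFF, upd >> 16
        # LSW add: saturation disabled, keep the carry out of bit 15.
        lsw_sum = table[2 * k] + upd_lsw
        table[2 * k] = lsw_sum & 0xFFFF
        carry = lsw_sum >> 16
        # MSW add with carry: saturation enabled.
        msw_sum = signed16(table[2 * k + 1]) + signed16(upd_msw) + carry
        table[2 * k + 1] = sat16(msw_sum) & 0xFFFF
    return table

def join32(lsw, msw):
    """Combine the two words into one signed 32-bit value."""
    return signed16(msw) * 65536 + lsw

table = sg_update_double([0xFFFF, 0x0001], [1], error=16384, beta=8192)
print([hex(w) for w in table], join32(*table))
```

Saturating the LSW add would corrupt the carry into the upper word, which is why only the MSW add is clamped in this sketch. Note that when the MSW saturates, the LSW is left as computed.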
{
    Double-Precision SG Coefficient Adaptation Subroutine

    Calling Parameters
        MX0 = Error
        MY1 = Beta
        I2 --> Oldest input data value in delay line
        L2 = Filter length
        I6 --> Beginning of filter coefficient table
        L6 = 2 × Filter length
        M1,M5 = 1
        M3,M7 = -1
        CNTR = Filter length

    Return Values
        Coefficients updated
        I2 --> Oldest input data value in delay line
        I6 --> Beginning of filter coefficient table

    Altered Registers
        AY0,AR,MX0,MF,MR

    Computation Time
        (4 × Filter length) + 5 + 4 cycles
}
.ENTRY drsg;
5.7 REFERENCES
Bellanger, M. 1984. Digital Processing of Signals: Theory and Practice. New
York: John Wiley and Sons.
Bloom, P.J. October 1985. High Quality Digital Audio in the Entertainment
Industry: An Overview of Achievements and Changes. IEEE ASSP Magazine,
Vol. 2, No. 4, pp. 13-14.
Liu, Bede and Abraham Peled. 1976. Theory Design and Implementation of
Digital Signal Processing, pp. 77-88. John Wiley and Sons.
Liu, Bede and Fred Mintzer. December 1978. Calculation of Narrow Band
Spectra by Direct Decimation. IEEE Trans. Acoust. Speech Signal Process.,
Vol. ASSP-26, No. 6, pp. 529-534.
Otnes, R.K. and L.E. Enochson. 1978. Applied Time Series Analysis, pp. 202-
212. Wiley-Interscience.
Widrow, B. and M. Hoff, Jr. 1960. Adaptive Switching Circuits. IRE
WESCON Convention Record, Pt. 4, pp. 96-104.