
Non-linear Least Squares

In general we are not lucky enough to have a linear problem. In this case:

First, we check whether it is equivalent to a linear problem.

Second, if we don't need to do it often, we plug it into a general-purpose minimizer.

Third, especially if we need to do it many times (e.g. signal tracking), it may be a good approximation to linearize the problem via a Taylor expansion about some starting value for the parameters. The process is iterated until convergence is (hopefully) attained; a minimal sketch is given below.

The problem may also be separable: linear in some parameters and nonlinear in others.
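As an illustration of the third approach, here is a minimal Gauss-Newton-style sketch of the iterated linearization. It is not from the original slides: the data y, the model function f, its Jacobian J, and the starting value theta0 are hypothetical placeholders.

% Minimal Gauss-Newton sketch for minimizing sum((y - f(theta)).^2):
theta = theta0;                    % starting value for the parameters
for iter = 1:50
    r = y - f(theta);              % residual at the current estimate
    Jt = J(theta);                 % Jacobian = local Taylor linearization
    d = Jt \ r;                    % LS solution of the linearized problem
    theta = theta + d;             % update the estimate
    if norm(d) < 1e-8, break; end  % iterate until (hopefully) convergence
end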

Non-linear Least Squares

We consider only a case where we can transform the problem into an equivalent linear problem.

Example: LS Filter Design

Consider the design of an IIR digital filter whose frequency response closely matches a given frequency response specification $H_d(e^{j\omega})$.

The design method is called the Prony Least Squares method, and is implemented in Matlab (see prony)¹.

The transfer function of any IIR filter is given by

$$H(z) = \frac{B(z)}{A(z)} = \frac{b[0] + b[1]z^{-1} + \cdots + b[q]z^{-q}}{1 + a[1]z^{-1} + \cdots + a[p]z^{-p}}$$

¹ Historically, G. R. de Prony invented the method as early as 1795. The goal was then to decompose a signal into a finite set of damped sinusoids. The idea is very close to that of the Fourier transform, which was published in 1822.

Example: LS Filter Design

If the desired frequency response is $H_d(e^{j2\pi f})$, then its inverse Discrete-Time Fourier transform is the desired impulse response:

$$h_d[n] = \mathcal{F}^{-1}\{H_d(e^{j2\pi f})\}$$

In comparison, the impulse response of a candidate IIR filter is given by

$$h[n] = \begin{cases} -\sum_{k=1}^{p} a[k]\, h[n-k] + \sum_{k=0}^{q} b[k]\, \delta[n-k], & n \ge 0 \\ 0, & n < 0 \end{cases}$$
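This recursion is just the impulse response of $B(z)/A(z)$, so in Matlab it can be generated with the standard filter function, e.g. (a sketch, assuming b and a hold $b[0],\ldots,b[q]$ and $a[1],\ldots,a[p]$ as above, and N samples are wanted):

% Impulse response h[n], n = 0,...,N-1, of H(z) = B(z)/A(z):
delta = [1; zeros(N-1,1)];        % unit impulse delta[n]
h = filter(b, [1; a(:)], delta);  % runs the recursion above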

Example: LS Filter Design

This leads us to consider the filter design problem as a Least Squares problem:

$$\min_{a[k],\, b[k]} \sum_{n=0}^{N-1} \left( h_d[n] - h[n] \right)^2$$

However, this becomes a highly nonlinear problem even for simple $H(z)$.

Example: LS Filter Design

We can get rid of the nonlinearity by minimizing the filtered error, i.e., by filtering both sequences with $A(z)$:

$$J_f = \sum_{n=0}^{N-1} \left( h_{df}[n] - b[n] \right)^2,$$

where

$$h_{df}[n] = \sum_{k=0}^{p} a[k]\, h_d[n-k]$$

with $a[0] = 1$, and $b[n] = 0$ for $n > q$.

Example: LS Filter Design

In all, the function to be minimized becomes:

$$J_f = \sum_{n=0}^{N-1} \left( \sum_{k=0}^{p} a[k]\, h_d[n-k] - b[n] \right)^2$$

Some algebra reveals that for the feedforward coefficients $\mathbf{b} = (b[0], b[1], \ldots, b[q])^T$ the minimum is obtained by

$$\hat{\mathbf{b}} = \mathbf{h} + \mathbf{H}_0 \hat{\mathbf{a}}$$

with

Example: LS Filter Design

$$\mathbf{h} = \begin{bmatrix} h_d[0] \\ h_d[1] \\ \vdots \\ h_d[q] \end{bmatrix} \qquad \text{and} \qquad \mathbf{H}_0 = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ h_d[0] & 0 & \cdots & 0 \\ h_d[1] & h_d[0] & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h_d[q-1] & h_d[q-2] & \cdots & h_d[q-p] \end{bmatrix}$$

Example: LS Filter Design

Moreover, when $\hat{\mathbf{b}} = \mathbf{h} + \mathbf{H}_0 \mathbf{a}$, the cost function simplifies to

$$J_f(\mathbf{a}, \hat{\mathbf{b}}) = \sum_{n=q+1}^{N-1} \left( h_d[n] + \sum_{k=1}^{p} a[k]\, h_d[n-k] \right)^2 = (\mathbf{x} - \mathbf{H}\mathbf{a})^T (\mathbf{x} - \mathbf{H}\mathbf{a}),$$

where

Example: LS Filter Design

$$\mathbf{x} = \begin{bmatrix} h_d[q+1] \\ h_d[q+2] \\ \vdots \\ h_d[N-1] \end{bmatrix} \qquad \text{and} \qquad \mathbf{H} = -\begin{bmatrix} h_d[q] & h_d[q-1] & \cdots & h_d[q-p+1] \\ h_d[q+1] & h_d[q] & \cdots & h_d[q-p+2] \\ h_d[q+2] & h_d[q+1] & \cdots & h_d[q-p+3] \\ \vdots & \vdots & \ddots & \vdots \\ h_d[N-2] & h_d[N-3] & \cdots & h_d[N-1-p] \end{bmatrix}$$

The LS solution is of course given by the normal equations:

$$\hat{\mathbf{a}} = (\mathbf{H}^T \mathbf{H})^{-1} \mathbf{H}^T \mathbf{x}$$

In summary:

$$\hat{\mathbf{a}} = (\mathbf{H}^T \mathbf{H})^{-1} \mathbf{H}^T \mathbf{x}, \qquad \hat{\mathbf{b}} = \mathbf{h} + \mathbf{H}_0 \hat{\mathbf{a}}$$

Example: LS Filter Design

As an example, consider the lowpass filter

$$H_d(e^{j2\pi f}) = \begin{cases} 1, & \text{if } |f| < f_c \\ 0, & \text{if } |f| > f_c \end{cases}$$

The corresponding ideal impulse response is

$$h_d[n] = 2 f_c\, \mathrm{sinc}(2 f_c n), \qquad n = 0, 1, \ldots, N-1$$

Since an IIR filter has to be causal, let's delay the impulse response by $n_0$ samples:

$$h_d[n] = 2 f_c\, \mathrm{sinc}(2 f_c (n - n_0)), \qquad n = 0, 1, \ldots, N-1$$

Example: LS Filter Design

Example Matlab code of the implementation is shown on the next slide. Download at:

http://www.cs.tut.fi/courses/SGN-2607/LSProny.m

Pay attention to how the LS solution is calculated: instead of $(\mathbf{H}^T \mathbf{H})^{-1} \mathbf{H}^T \mathbf{x}$ we use Matlab's H\x, which uses QR decomposition to produce a more stable solution. See help mldivide.

Example: LS Filter Design

% Design parameters (see the specifications on the next slide):
fc = 0.1; N = 51; n0 = floor(N/2); p = 10; q = 10;
n = (0:N-1)';
% Define desired FIR impulse response:
hd = 2*fc*sinc(2*fc*(n-n0));
% Calculate LS feedback coefficients (Kay, p. 264):
x = hd(q+2:end);
H = - toeplitz(hd(q+1:end-1), hd(q+1:-1:q-p+2));
a = H\x; % Note: inv(H'*H)*H'*x may be numerically unstable.
% Calculate feedforward coefficients as well (Kay, p. 263):
h = hd(1:q+1);
H0 = toeplitz([0;hd(1:q)], zeros(1,p));
b = h + H0*a;
a = [1; a];
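To inspect the design, one might plot the impulse and frequency responses with standard Matlab functions (a sketch, not part of LSProny.m):

% Compare the designed IIR filter with the FIR template:
figure; stem(n, hd);       % ideal (delayed) impulse response
figure; impz(b, a, 100);   % first 100 samples of the IIR impulse response
figure; freqz(b, a, 512);  % magnitude and phase responses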

Example: LS Filter Design

The results of designing a filter with specifications $f_c = 0.1$, $N = 51$, $n_0 = \lfloor N/2 \rfloor$ and $p = q = 10$ are shown in the figures below.

The FIR template impulse response is shown below.

[Figure: Ideal impulse response $h_d[n]$; amplitude vs. $n$ (samples), $n = 0, \ldots, 50$.]

Example: LS Filter Design

The designed LS IIR filter minimizes the squared difference between the impulse responses. The resulting impulse response is shown below. Note that the first $N = 51$ samples are almost equal to the FIR counterpart: the largest deviation is at the center and is approximately 0.005.

[Figure: First 100 samples of the Prony LS filter impulse response; amplitude vs. $n$ (samples).]

Example: LS Filter Design

The frequency response of the FIR filter is below.

[Figure: FIR filter frequency response; magnitude (dB) and phase (degrees) vs. normalized frequency (×π rad/sample).]

Example: LS Filter Design

And the IIR filter frequency response is below.

[Figure: IIR filter frequency response; magnitude (dB) and phase (degrees) vs. normalized frequency (×π rad/sample).]

Method of Moments

The Method of Moments is another estimation technique; it produces practical estimators whose optimality cannot be shown.

As the results are often close to the optimum, they are often used as the initial guess in numerical ML estimation.

The basic idea is to derive a relation between the theoretical moments of a random variable and the parameter to be estimated. After this, the theoretical moments are substituted by the corresponding sample moments.

The method is a special case of a more general principle, where any theoretical quantity of the distribution can be used in place of the moments.

Method of Moments

Assume that the $k$-th moment $\mu_k = E(x^k[n])$ depends upon the unknown parameter $\theta$ according to

$$\mu_k = h(\theta).$$

We solve for $\theta$ as

$$\theta = h^{-1}(\mu_k).$$

We replace the theoretical moment by its sample estimator

$$\hat{\mu}_k = \frac{1}{N} \sum_{n=0}^{N-1} x^k[n]$$

Method of Moments

We obtain

$$\hat{\theta} = h^{-1}\!\left( \frac{1}{N} \sum_{n=0}^{N-1} x^k[n] \right).$$

Example: DC level in WGN

The simplest possible case is again the DC level in WGN problem.

Suppose $x[n] = A + w[n]$, for $n = 0, 1, \ldots, N-1$, where $w[n] \sim N(0, \sigma^2)$ and $A$ is to be estimated.

The method of moments uses the observation that the unknown parameter is directly the mean of the distribution of $x[n]$: $A = E(x[n]) = \mu_1$.

Example: DC level in WGN

According to the rule of the previous slide, we can replace the theoretical moment by its natural estimator, the sample mean. Thus, we arrive at the well-known result

$$\hat{A} = \frac{1}{N} \sum_{n=0}^{N-1} x[n]$$
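As a quick numerical check (a sketch; the values of A, sigma, and N are arbitrary illustrative choices):

% MoM estimate of a DC level in WGN:
N = 1000; A = 2; sigma = 3;   % illustrative values
x = A + sigma*randn(N,1);     % simulated data
Ahat = mean(x);               % sample mean = first sample moment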

Example: Mixture Gaussian Model

A second example is estimating the mixture parameter of a Gaussian mixture model:

$$x[n] = \begin{cases} w_1[n], & \text{with probability } 1 - \epsilon \\ w_2[n], & \text{with probability } \epsilon \end{cases}$$

where $w_1[n] \sim N(0, \sigma_1^2)$ and $w_2[n] \sim N(0, \sigma_2^2)$ for $n = 0, 1, \ldots, N-1$. The parameter $\epsilon$ is to be estimated.

What we need now is the relation between $\epsilon$ and the theoretical moments of the distribution.

The second moment is the natural thing to look at, because that is the thing differentiating the two distributions.

Example: Mixture Gaussian Model

If we denote the two Gaussians by $\phi_1(x)$ and $\phi_2(x)$, the second moment can be written as:

$$
\begin{aligned}
E(x^2[n]) &= \int x^2[n] \left[ (1-\epsilon)\, \phi_1(x[n]) + \epsilon\, \phi_2(x[n]) \right] dx[n] \\
&= (1-\epsilon) \int x^2[n]\, \phi_1(x[n])\, dx[n] + \epsilon \int x^2[n]\, \phi_2(x[n])\, dx[n] \\
&= (1-\epsilon)\, \sigma_1^2 + \epsilon\, \sigma_2^2
\end{aligned}
$$

Example: Mixture Gaussian Model

We can now solve the unknown parameter from the above equation:

$$
\begin{aligned}
E(x^2[n]) &= (1-\epsilon)\, \sigma_1^2 + \epsilon\, \sigma_2^2 \\
E(x^2[n]) &= \sigma_1^2 - \epsilon\, \sigma_1^2 + \epsilon\, \sigma_2^2 \\
E(x^2[n]) &= \sigma_1^2 + \epsilon\, (\sigma_2^2 - \sigma_1^2) \\
E(x^2[n]) - \sigma_1^2 &= \epsilon\, (\sigma_2^2 - \sigma_1^2) \\
\epsilon &= \frac{E(x^2[n]) - \sigma_1^2}{\sigma_2^2 - \sigma_1^2}
\end{aligned}
$$

Example: Mixture Gaussian Model

The last step is to substitute the theoretical moment $E(x^2[n])$ by the sample moment $\frac{1}{N} \sum_{n=0}^{N-1} x^2[n]$:

$$\hat{\epsilon} = \frac{\frac{1}{N} \sum_{n=0}^{N-1} x^2[n] - \sigma_1^2}{\sigma_2^2 - \sigma_1^2}$$
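A minimal sketch of this estimator on simulated data (the true values eps0, sigma1, and sigma2 are arbitrary illustrative choices):

% MoM estimate of the mixture parameter (variances known):
N = 1000; eps0 = 0.3; sigma1 = 1; sigma2 = 5;   % illustrative values
sel = rand(N,1) < eps0;                         % pick a component per sample
x = (~sel).*(sigma1*randn(N,1)) + sel.*(sigma2*randn(N,1));
mu2hat = mean(x.^2);                            % sample second moment
epshat = (mu2hat - sigma1^2)/(sigma2^2 - sigma1^2);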

Method of moments - vector parameter

Let $\boldsymbol{\theta}$ be the $p \times 1$ vector of parameters to be estimated. Then $p$ theoretical moment equations are needed:

$$
\begin{aligned}
\mu_1 &= h_1(\theta_1, \theta_2, \ldots, \theta_p) \\
\mu_2 &= h_2(\theta_1, \theta_2, \ldots, \theta_p) \\
&\vdots \\
\mu_p &= h_p(\theta_1, \theta_2, \ldots, \theta_p)
\end{aligned}
$$

Method of moments - vector parameter

In vector form $\boldsymbol{\mu} = \mathbf{h}(\boldsymbol{\theta})$, and the estimator is $\hat{\boldsymbol{\theta}} = \mathbf{h}^{-1}(\hat{\boldsymbol{\mu}})$, where

$$\hat{\boldsymbol{\mu}} = \begin{bmatrix} \frac{1}{N} \sum_{n=0}^{N-1} x[n] \\ \frac{1}{N} \sum_{n=0}^{N-1} x^2[n] \\ \vdots \\ \frac{1}{N} \sum_{n=0}^{N-1} x^p[n] \end{bmatrix}$$

Also any other set of $p$ moments can be used. The general rule is to use orders as low as possible.

Example

Consider the Gaussian mixture problem, but with the variances unknown as well. We now have to estimate $\epsilon$, $\sigma_1^2$ and $\sigma_2^2$.

Because the PDF is an even function, all odd moments are zero. Thus, let's use the first three even moments, which can be shown to be

$$
\begin{aligned}
\mu_2 &= E(x^2[n]) = (1-\epsilon)\, \sigma_1^2 + \epsilon\, \sigma_2^2 \\
\mu_4 &= E(x^4[n]) = 3(1-\epsilon)\, \sigma_1^4 + 3\epsilon\, \sigma_2^4 \\
\mu_6 &= E(x^6[n]) = 15(1-\epsilon)\, \sigma_1^6 + 15\epsilon\, \sigma_2^6
\end{aligned}
$$

Example

The three parameters can be solved from the three equations:

$$
\begin{aligned}
\sigma_1^2 &= \frac{u + \sqrt{u^2 - 4v}}{2} \\
\sigma_2^2 &= \frac{v}{\sigma_1^2} \\
\epsilon &= \frac{\mu_2 - \sigma_1^2}{\sigma_2^2 - \sigma_1^2}
\end{aligned}
$$

Example

where

$$u = \frac{\mu_6 - 5 \mu_4 \mu_2}{5 \mu_4 - 15 \mu_2^2}, \qquad v = \mu_2 u - \frac{\mu_4}{3}$$

The actual estimators are obtained by replacing the three theoretical moments with the sample moments.
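A sketch of how these formulas might be implemented (x is assumed to be a vector of mixture samples as above; this is an illustration, not the course's Mixture.m):

% MoM estimates of (sigma1^2, sigma2^2, eps) from sample moments:
mu2 = mean(x.^2); mu4 = mean(x.^4); mu6 = mean(x.^6);
u = (mu6 - 5*mu4*mu2)/(5*mu4 - 15*mu2^2);
v = mu2*u - mu4/3;
s1 = (u + sqrt(u^2 - 4*v))/2;   % sigma1^2 estimate (larger root)
s2 = v/s1;                      % sigma2^2 estimate
ep = (mu2 - s1)/(s2 - s1);      % eps estimate; may fail (out of bounds
                                % or complex), as discussed below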

Example

The derived estimators were tested with simulated data generated according to the model: 10000 realizations of length N = 1000 were generated, and the parameters were estimated from each realization.

Example

The results are shown below.

[Figure: Histograms of the estimates over the 10000 realizations.
$\sigma_1^2$: mean = 28.8234, true value = 29.9011, variance = 50.4606. Failed: 0.59%.
$\sigma_2^2$: mean = 2.5121, true value = 3.0489, variance = 1.8517.
$\epsilon$: mean = 0.71497, true value = 0.75648, variance = 0.0070266.]

Example

Note that 0.59% of the cases failed to produce a realistic result. A failure means that the result was either out of bounds (such as $\hat{\epsilon} > 1$ or $\hat{\epsilon} < 0$) or imaginary (due to the square root).

This is the nature of MoM estimators: they do not really use the information in the distribution, and can therefore produce out-of-bounds results.

Example

In more difficult cases (such as a small or large $\epsilon$), the deviations from the true value can become exceptionally large, and the number of failures may grow significantly; see the example below.

Example

[Figure: Histograms of the estimates in a more difficult case.
$\sigma_1^2$: mean = 11.6936, true value = 11.8698, variance = 2.7965. Failed: 65.21%.
$\sigma_2^2$: mean = 25.0345, true value = 4.1351, variance = 8210.0663.
$\epsilon$: mean = 0.15934, true value = 0.042838, variance = 0.020393.]

Example

The Matlab code can be downloaded at

http://www.cs.tut.fi/courses/SGN-2607/Mixture.m.

Example: MoM Frequency Estimator

As a second example, we derive a frequency estimator for a noisy sinusoid. The result will be easier to calculate than the periodogram-based ML estimator derived earlier.

The model is

$$x[n] = A \cos(2\pi f_0 n + \phi) + w[n], \qquad n = 0, 1, \ldots, N-1$$

where $w[n]$ is zero-mean white noise with variance $\sigma^2$. The frequency $f_0$ is to be estimated.

$A$ and $\phi$ are assumed known, but it turns out that only $A$ is required for estimation of $f_0$.

Example: MoM Frequency Estimator

The problem here is that the parameter is in the deterministic part of the model. Thus, moments or other stochastic quantities don't carry the information required for estimation of $f_0$.

Instead, let us view the phase as a random variable with $\phi \sim U(0, 2\pi)$. This way $s[n] = A \cos(2\pi f_0 n + \phi)$ can be viewed as a realization of a random process.

Now the quantity that will help us is not a moment, but the autocorrelation function (ACF).

Example: MoM Frequency Estimator

We have to find a connection between $f_0$ and the ACF. By definition,

$$
\begin{aligned}
r_{ss}[k] &= E\left[ s[n]\, s[n+k] \right] \\
&= E\left[ A \cos(2\pi f_0 n + \phi) \cdot A \cos(2\pi f_0 (n+k) + \phi) \right] \\
&= A^2\, E\left[ \cos(2\pi f_0 n + \phi) \cos(2\pi f_0 (n+k) + \phi) \right]
\end{aligned}
$$

This can be simplified using the cosine product formula²:

$$
\begin{aligned}
r_{ss}[k] &= A^2\, E\left[ \tfrac{1}{2} \cos(2\pi f_0 k) + \tfrac{1}{2} \cos(4\pi f_0 n + 2\pi f_0 k + 2\phi) \right] \\
&= \frac{A^2}{2}\, E\left[ \cos(2\pi f_0 k) \right] + \frac{A^2}{2}\, E\left[ \cos(4\pi f_0 n + 2\pi f_0 k + 2\phi) \right]
\end{aligned}
$$

² $\cos\alpha\, \cos\beta = \frac{1}{2} \cos(\alpha - \beta) + \frac{1}{2} \cos(\alpha + \beta)$

Example: MoM Frequency Estimator

The latter expectation vanishes:

$$
\begin{aligned}
E\left[ \cos(4\pi f_0 n + 2\pi f_0 k + 2\phi) \right]
&= \frac{1}{2\pi} \int_0^{2\pi} \cos(4\pi f_0 n + 2\pi f_0 k + 2\phi)\, d\phi \\
&= \frac{1}{4\pi} \Big[ \sin(4\pi f_0 n + 2\pi f_0 k + 2\phi) \Big]_0^{2\pi} \\
&= \frac{1}{4\pi} \left[ \sin(4\pi f_0 n + 2\pi f_0 k + 4\pi) - \sin(4\pi f_0 n + 2\pi f_0 k) \right] \\
&= 0,
\end{aligned}
$$

because sine is $2\pi$-periodic.

Example: MoM Frequency Estimator

Thus, the ACF becomes

$$r_{ss}[k] = \frac{A^2}{2}\, E[\cos(2\pi f_0 k)] = \frac{A^2}{2} \cos(2\pi f_0 k)$$

Because $w[n]$ was assumed white and independent of $\phi$, the ACF of the observed signal $x[n] = s[n] + w[n]$ is obtained as:

$$r_{xx}[k] = r_{ss}[k] + r_{ww}[k] = \frac{A^2}{2} \cos(2\pi f_0 k) + \sigma^2 \delta[k]$$

Example: MoM Frequency Estimator

It is safest to use moments of a low order, so let us select $k = 1$, from which we obtain the equation

$$r_{xx}[1] = \frac{A^2}{2} \cos(2\pi f_0),$$

or

$$f_0 = \frac{1}{2\pi} \arccos\left( \frac{2\, r_{xx}[1]}{A^2} \right)$$

The final step is to substitute $r_{xx}[1]$ by its sample estimator,

$$\hat{r}_{xx}[1] = \frac{1}{N-1} \sum_{n=0}^{N-2} x[n]\, x[n+1]$$

Example: MoM Frequency Estimator

This way we get

$$\hat{f}_0 = \frac{1}{2\pi} \arccos\left( \frac{2 \sum_{n=0}^{N-2} x[n]\, x[n+1]}{(N-1)\, A^2} \right)$$
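A minimal sketch of this estimator on simulated data (f0, phi, sigma, and N are arbitrary illustrative values; A is assumed known):

% MoM frequency estimate from the lag-1 sample autocorrelation:
N = 1000; A = 1; f0 = 0.2; phi = 0.7; sigma = 0.5;  % illustrative values
n = (0:N-1)';
x = A*cos(2*pi*f0*n + phi) + sigma*randn(N,1);
r1 = sum(x(1:end-1).*x(2:end))/(N-1);  % sample estimate of rxx[1]
f0hat = acos(2*r1/A^2)/(2*pi);         % note: 2*r1/A^2 can leave [-1,1]
                                       % in noisy cases (estimator fails)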

Example: MoM Frequency Estimator

An example of the estimator performance is illustrated below. The box plot³ shows that the estimates converge towards the true value.

³ A box plot shows the smallest observation, lower quartile (Q1), median, upper quartile (Q3), and largest observation; Matlab: boxplot.

Example: MoM Frequency Estimator

[Figure: Box plots of the estimates of $f_0$ versus data record length $N = 100, 200, \ldots, 1500$; the level 0.1619 is marked on the vertical axis.]

Example: MoM Frequency Estimator

In the usual case, $A$ is not known, and we have to estimate it. In a similar manner we can derive a MoM estimator for $A^2$ as well. It turns out to be

$$\hat{A}^2 = \frac{2}{N} \sum_{n=0}^{N-1} x^2[n] - 2\sigma^2$$

Example: MoM Frequency Estimator

If there's no information on $\sigma^2$ except that the SNR is high, we can simply discard $\sigma^2$ to end up with the result of Problem 9.12 of Kay's book:

$$\hat{f}_0 = \frac{1}{2\pi} \arccos\left[ \frac{N \sum_{n=0}^{N-2} x[n]\, x[n+1]}{(N-1) \sum_{n=0}^{N-1} x^2[n]} \right]$$
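Continuing the sketch above, the high-SNR version needs neither $A$ nor $\sigma^2$:

% High-SNR MoM frequency estimate (Kay, Problem 9.12):
num = N*sum(x(1:end-1).*x(2:end));
den = (N-1)*sum(x.^2);
f0hat = acos(num/den)/(2*pi);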
