Nonlinear Least Squares Theory

CHUNG-MING KUAN
Department of Finance & CRETA

March 9, 2010

Lecture Outline

1 Nonlinear Specifications

2 The NLS Method
      The NLS Estimator
      Nonlinear Optimization Algorithms

3 Asymptotic Properties of the NLS Estimator
      Digression: Uniform Law of Large Numbers
      Consistency
      Asymptotic Normality
      Wald Tests

Nonlinear Specifications

Given the dependent variable y, consider the nonlinear specification:

    y = f(x; β) + e(β),

where x is ℓ × 1, β is k × 1, and f is a given function. There are many
choices of f. A flexible specification transforms one (or several) of the
explanatory variables by the Box-Cox transform of x:

    (x^γ − 1)/γ,

which yields x − 1 when γ = 1, 1 − 1/x when γ = −1, and a value close
to ln x when γ → 0.
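A quick numerical sketch of the transform and its limiting behavior (the helper name box_cox and the tolerance are my own choices):

```python
import numpy as np

def box_cox(x, gamma, tol=1e-8):
    """Box-Cox transform (x**gamma - 1) / gamma, with the ln(x) limit as gamma -> 0."""
    x = np.asarray(x, dtype=float)
    if abs(gamma) < tol:
        return np.log(x)                 # limiting case gamma -> 0
    return (x**gamma - 1.0) / gamma

x = np.array([0.5, 1.0, 2.0, 4.0])
print(box_cox(x, 1.0))       # equals x - 1
print(box_cox(x, -1.0))      # equals 1 - 1/x
print(box_cox(x, 0.001))     # approximately ln(x)
```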

The CES (constant elasticity of substitution) production function:

    y = α [δ L^{−γ} + (1 − δ) K^{−γ}]^{−λ/γ},

where α > 0, 0 < δ < 1 and γ ≥ −1, which yields:

    ln y = ln α − (λ/γ) ln[δ L^{−γ} + (1 − δ) K^{−γ}].

The translog (transcendental logarithmic) production function:

    ln y = β₁ + β₂ ln L + β₃ ln K + β₄ (ln L)(ln K) + β₅ (ln L)² + β₆ (ln K)²,

which is linear in parameters; in this case, the OLS method suffices.

Nonlinear Time Series Models

An exponential autoregressive (EXPAR) model:

    y_t = Σ_{j=1}^{p} [α_j + β_j exp(−γ y_{t−1}²)] y_{t−j} + e_t.

A self-exciting threshold autoregressive (SETAR) model:

    y_t = a₀ + a₁ y_{t−1} + · · · + a_p y_{t−p} + e_t,   if y_{t−d} ∈ (−∞, c],
    y_t = b₀ + b₁ y_{t−1} + · · · + b_p y_{t−p} + e_t,   if y_{t−d} ∈ (c, ∞),

where 1 ≤ d ≤ p is the delay parameter and c is the threshold
parameter. Alternatively,

    y_t = a₀ + Σ_{j=1}^{p} a_j y_{t−j} + (δ₀ + Σ_{j=1}^{p} δ_j y_{t−j}) 1{y_{t−d} > c} + e_t,

with a_j + δ_j = b_j.
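To make the regime-switching mechanism concrete, here is a minimal simulation sketch for the two-regime case with p = d = 1; the coefficient values are arbitrary illustrations, not taken from the notes:

```python
import numpy as np

def simulate_setar(T=500, a=(0.2, 0.7), b=(-0.5, -0.3), c=0.0, sigma=1.0, seed=0):
    """Simulate a two-regime SETAR model with p = 1, delay d = 1 and threshold c."""
    rng = np.random.default_rng(seed)
    y = np.zeros(T)
    for t in range(1, T):
        if y[t - 1] <= c:                 # lower regime: coefficients a
            y[t] = a[0] + a[1] * y[t - 1] + sigma * rng.standard_normal()
        else:                             # upper regime: coefficients b
            y[t] = b[0] + b[1] * y[t - 1] + sigma * rng.standard_normal()
    return y

print(simulate_setar()[:5])
```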
Replacing the indicator function in the SETAR model with a “smooth”
function h, we obtain the smooth threshold autoregressive (STAR)
model:

    y_t = a₀ + Σ_{j=1}^{p} a_j y_{t−j} + (δ₀ + Σ_{j=1}^{p} δ_j y_{t−j}) h(y_{t−d}; c, s) + e_t,

where h is a distribution function, e.g.,

    h(y_{t−d}; c, s) = 1 / (1 + exp[−(y_{t−d} − c)/s]),

with c the threshold value and s a scale parameter. The STAR model
admits smooth transition between different regimes, and it behaves
like a SETAR model when |y_{t−d} − c|/s is large.

Artificial Neural Networks

A 3-layer neural network can be expressed as

    f(x₁, . . . , x_p; β) = g(α₀ + Σ_{i=1}^{q} α_i h(γ_{i0} + Σ_{j=1}^{p} γ_{ij} x_j)),

which contains p input units, q hidden units, and one output unit. The
functions h and g are known as activation functions, and the parameters
in these functions are connection weights.

h is typically an S-shaped function; two leading choices are the logistic
function h(x) = 1/(1 + e^{−x}) and the hyperbolic tangent function

    h(x) = (e^{x} − e^{−x}) / (e^{x} + e^{−x}).

The function g may be the identity function or the same as h.
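A minimal NumPy sketch of this specification, assuming logistic hidden units and an identity output g; the parameter layout is my own choice for illustration:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def neural_net(x, alpha0, alpha, gamma0, gamma, g=lambda z: z):
    """3-layer network: q logistic hidden units, identity output by default.

    x      : (p,) input vector
    alpha0 : scalar output bias
    alpha  : (q,) hidden-to-output weights
    gamma0 : (q,) hidden-unit biases
    gamma  : (q, p) input-to-hidden weights
    """
    hidden = logistic(gamma0 + gamma @ x)     # q hidden-unit activations
    return g(alpha0 + alpha @ hidden)         # scalar output

# tiny example with p = 2 inputs and q = 3 hidden units
rng = np.random.default_rng(0)
print(neural_net(np.array([0.5, -1.0]), 0.1, rng.standard_normal(3),
                 rng.standard_normal(3), rng.standard_normal((3, 2))))
```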
Artificial neural networks are designed to mimic the behavior of biological
neural systems and have the following properties.

Universal approximation: A neural network is capable of approximating
any Borel-measurable function to any degree of accuracy, provided
that q is sufficiently large. In this sense, a neural network can be
understood as a series expansion, with the hidden-unit functions
serving as the basis functions.

Parsimonious model: To achieve a given degree of approximation
accuracy, neural networks are simpler than polynomial and
trigonometric expansions, in the sense that the number of hidden
units q can grow at a much slower rate.

The NLS Estimator

The NLS criterion function:

    Q_T(β) = (1/T) [y − f(x₁, . . . , x_T; β)]′ [y − f(x₁, . . . , x_T; β)]
           = (1/T) Σ_{t=1}^{T} [y_t − f(x_t; β)]².

The first order condition contains k nonlinear equations with k
unknowns:

    ∇_β Q_T(β) = −(2/T) ∇_β f(x₁, . . . , x_T; β) [y − f(x₁, . . . , x_T; β)] = 0,

where ∇_β f(x₁, . . . , x_T; β) is a k × T matrix. A solution to the first
order condition is the NLS estimator β̂_T.
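In practice the first order condition is solved numerically (see the algorithms below). As a preview, here is a hedged sketch using scipy.optimize.least_squares on an illustrative mean function f(x; β) = β₁(1 − exp(−β₂ x)) of my own choosing:

```python
import numpy as np
from scipy.optimize import least_squares

def f(x, beta):
    """Illustrative nonlinear mean function f(x; beta) = beta1 * (1 - exp(-beta2 * x))."""
    return beta[0] * (1.0 - np.exp(-beta[1] * x))

# simulate data from the model with beta = (2.0, 0.5)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = f(x, [2.0, 0.5]) + 0.1 * rng.standard_normal(200)

# minimizing Q_T(beta) is equivalent to minimizing the sum of squared residuals,
# so the solver is handed the residual vector y_t - f(x_t; beta)
fit = least_squares(lambda beta: y - f(x, beta), x0=[1.0, 1.0])
print(fit.x)   # NLS estimate of beta
```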

[ID-2] f(x; ·) is twice continuously differentiable in the second argument
on Θ₁, such that for given data (y_t, x_t), t = 1, . . . , T, ∇²_β Q_T(β̂_T) is
positive definite.

While [ID-2] ensures that β̂_T is a minimum of Q_T(β), it does not
guarantee the uniqueness of this solution. For a given data set, there
may exist multiple local minima of Q_T(β).

For linear regressions, f(β) = Xβ so that ∇_β f(β) = X′ and
∇²_β f(β) = 0. It follows that ∇²_β Q_T(β) = 2(X′X)/T, which is
positive definite if, and only if, X has full column rank. Note that in
linear regression, the identification condition does not depend on β.

Nonlinear Optimization Algorithms

An NLS estimate is usually computed using a numerical method. In
particular, an iterative algorithm starts from some initial value of the
parameter and then repeatedly calculates the next value according to
a particular rule until an optimum is reached approximately.

A generic iterative algorithm is

    β^{(i+1)} = β^{(i)} + s^{(i)} d^{(i)}.

That is, the (i + 1)-th iterated value β^{(i+1)} is obtained from β^{(i)} with an
adjustment term s^{(i)} d^{(i)}, where d^{(i)} characterizes the direction of change
in the parameter space and s^{(i)} controls the amount of change. Note that
an iterative algorithm can only locate a local optimum.

Gradient Method

The first-order Taylor expansion of Q_T(β) about β† is

    Q_T(β) ≈ Q_T(β†) + [∇_β Q_T(β†)]′ (β − β†).

Replacing β with β^{(i+1)} and β† with β^{(i)},

    Q_T(β^{(i+1)}) ≈ Q_T(β^{(i)}) + [∇_β Q_T(β^{(i)})]′ s^{(i)} d^{(i)}.

Setting d^{(i)} = −g^{(i)}, where g^{(i)} is ∇_β Q_T(β) evaluated at β^{(i)}, we have

    Q_T(β^{(i+1)}) ≈ Q_T(β^{(i)}) − s^{(i)} g^{(i)′} g^{(i)},

where g^{(i)′} g^{(i)} ≥ 0. This leads to:

    β^{(i+1)} = β^{(i)} − s^{(i)} g^{(i)}.

Steepest Descent Algorithm

To determine the optimal step length, note that

    ∂Q_T(β^{(i+1)})/∂s^{(i)} = [∇_β Q_T(β^{(i+1)})]′ ∂β^{(i+1)}/∂s^{(i)} = −g^{(i+1)′} g^{(i)} = 0.

Let H^{(i)} = ∇²_β Q_T(β) evaluated at β = β^{(i)}. By Taylor's expansion of g, we have

    g^{(i+1)} ≈ g^{(i)} + H^{(i)} (β^{(i+1)} − β^{(i)}) = g^{(i)} − H^{(i)} s^{(i)} g^{(i)}.

Thus, 0 = g^{(i+1)′} g^{(i)} ≈ g^{(i)′} g^{(i)} − s^{(i)} g^{(i)′} H^{(i)} g^{(i)}, or equivalently,

    s^{(i)} = g^{(i)′} g^{(i)} / (g^{(i)′} H^{(i)} g^{(i)}) ≥ 0,

when H^{(i)} is p.d. We obtain the steepest descent algorithm:

    β^{(i+1)} = β^{(i)} − [g^{(i)′} g^{(i)} / (g^{(i)′} H^{(i)} g^{(i)})] g^{(i)}.
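A minimal sketch of this update rule; the quadratic criterion used in the demonstration is my own stand-in, not an NLS objective:

```python
import numpy as np

def steepest_descent_step(beta, grad, hess):
    """One update beta - [g'g / (g'Hg)] g, using the optimal step length."""
    g = grad(beta)
    H = hess(beta)
    s = (g @ g) / (g @ H @ g)      # step length, nonnegative when H is p.d.
    return beta - s * g

# demonstration on a simple quadratic criterion Q(b) = 0.5 b'Ab - c'b
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
c = np.array([1.0, -2.0])
grad = lambda b: A @ b - c
hess = lambda b: A

b = np.zeros(2)
for _ in range(50):
    b = steepest_descent_step(b, grad, hess)
print(b, np.linalg.solve(A, c))    # the two should nearly coincide
```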
Newton Method

The Newton method takes into account the second order derivatives.
Consider the second-order Taylor expansion of Q_T(β) around some β†:

    Q_T(β) ≈ Q_T(β†) + g†′ (β − β†) + (1/2) (β − β†)′ H† (β − β†).

The first order condition of Q_T(β) is g† + H† (β − β†) ≈ 0, so that

    β ≈ β† − (H†)^{−1} g†.

This suggests the following Newton-Raphson algorithm:

    β^{(i+1)} = β^{(i)} − (H^{(i)})^{−1} g^{(i)},

with the step length 1 and the direction vector −(H^{(i)})^{−1} g^{(i)}.

From Taylor's expansion it is easy to see that

    Q_T(β^{(i+1)}) − Q_T(β^{(i)}) ≈ −(1/2) g^{(i)′} (H^{(i)})^{−1} g^{(i)} ≤ 0,

provided that H^{(i)} is p.s.d. Thus, the Newton-Raphson algorithm usually
results in a decrease of Q_T.

When Q_T is (locally) quadratic, the second-order expansion is exact, so
that β = β† − (H†)^{−1} g† must be a minimum of Q_T(β). This immediately
suggests that the Newton-Raphson algorithm can reach the minimum in a
single step. Yet, there are two drawbacks.

    The Hessian matrix need not be positive definite.
    The Hessian matrix must be inverted at each iteration step (in practice,
    by solving a linear system; see the sketch below).
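A sketch of the Newton-Raphson update under that implementation choice, with a quadratic test criterion of my own to illustrate the single-step property:

```python
import numpy as np

def newton_raphson_step(beta, grad, hess):
    """Newton-Raphson update beta - H^{-1} g, computed via a linear solve."""
    return beta - np.linalg.solve(hess(beta), grad(beta))

# on an exactly quadratic criterion, one step reaches the minimum
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
c = np.array([1.0, -2.0])
grad = lambda b: A @ b - c          # gradient of 0.5 b'Ab - c'b
hess = lambda b: A
print(newton_raphson_step(np.zeros(2), grad, hess))   # equals solve(A, c)
```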

Gauss-Newton Algorithm

Letting Ξ(β) = ∇_β f(β), we have

    H(β) = −(2/T) ∇²_β f(β)[y − f(β)] + (2/T) Ξ(β)′ Ξ(β).

Ignoring the first term, an approximation to H(β) is 2 Ξ(β)′ Ξ(β)/T,
which requires only the first order derivatives and is guaranteed to be
p.s.d. The Gauss-Newton algorithm utilizes this approximation as

    β^{(i+1)} = β^{(i)} + [Ξ(β^{(i)})′ Ξ(β^{(i)})]^{−1} Ξ(β^{(i)})′ [y − f(β^{(i)})].

Note that the adjustment term can be obtained as the OLS estimate of
regressing y − f(β^{(i)}) on Ξ(β^{(i)}); this is known as the Gauss-Newton
regression.
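A minimal sketch of the Gauss-Newton iteration for the illustrative mean function introduced earlier, with the Gauss-Newton regression carried out by a least-squares solve; the analytic Jacobian and starting values are my own choices:

```python
import numpy as np

def f(x, beta):
    return beta[0] * (1.0 - np.exp(-beta[1] * x))

def jacobian(x, beta):
    """T x k matrix Xi(beta) whose t-th row is the gradient of f(x_t; beta)."""
    return np.column_stack([1.0 - np.exp(-beta[1] * x),
                            beta[0] * x * np.exp(-beta[1] * x)])

def gauss_newton(x, y, beta0, max_iter=50, tol=1e-8):
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        resid = y - f(x, beta)
        Xi = jacobian(x, beta)
        # Gauss-Newton regression: regress current residuals on the Jacobian
        delta = np.linalg.lstsq(Xi, resid, rcond=None)[0]
        beta = beta + delta
        if np.linalg.norm(delta) < tol:
            break
    return beta

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 200)
y = f(x, [2.0, 0.5]) + 0.1 * rng.standard_normal(200)
print(gauss_newton(x, y, beta0=[1.0, 1.0]))
```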

Other Modifications

To maintain a correct search direction, H^{(i)} needs to be p.d.

    Correct H^{(i)} by H_c^{(i)} = H^{(i)} + c^{(i)} I, where c^{(i)} > 0 is chosen to
    “force” H_c^{(i)} to be p.d. (a sketch follows below).
    For H̃^{(i)} = (H^{(i)})^{−1}, one may compute H̃_c^{(i)} = H̃^{(i)} + cI. Such a correction
    is used in the Marquardt-Levenberg algorithm.
    The quasi-Newton method corrects H̃^{(i)} by a symmetric matrix:

        H̃^{(i+1)} = H̃^{(i)} + C^{(i)}.

    This is used by the Davidon-Fletcher-Powell (DFP) algorithm and the
    Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm.
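One simple way to implement the H^{(i)} + c^{(i)} I correction is to increase c until a Cholesky factorization succeeds; the sketch below is my own illustration of this idea, not the exact Marquardt-Levenberg rule:

```python
import numpy as np

def correct_hessian(H, c0=1e-4, factor=10.0, max_tries=20):
    """Add c*I to H, increasing c until the corrected matrix is positive definite."""
    c = c0
    for _ in range(max_tries):
        Hc = H + c * np.eye(H.shape[0])
        try:
            np.linalg.cholesky(Hc)       # succeeds only if Hc is p.d.
            return Hc
        except np.linalg.LinAlgError:
            c *= factor
    raise ValueError("could not make H positive definite")

print(correct_hessian(np.array([[1.0, 0.0], [0.0, -2.0]])))
```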

Initial Values and Convergence Criteria

Initial values: Specified by the researcher or obtained using a random
number generator. Prior information, if available, should also be
taken into account.

Convergence criteria (see the sketch below):

    ‖β^{(i+1)} − β^{(i)}‖ < c, where ‖ · ‖ denotes the Euclidean norm,
    ‖g(β^{(i)})‖ < c, or
    |Q_T(β^{(i+1)}) − Q_T(β^{(i)})| < c.

For the Gauss-Newton algorithm, one may stop the algorithm when
TR² is “close” to zero, where R² is the coefficient of determination of
the Gauss-Newton regression.
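These stopping rules can be combined in a small helper; the tolerance and the "any criterion met" logic are my own choices:

```python
import numpy as np

def converged(beta_new, beta_old, grad_new, q_new, q_old, tol=1e-6):
    """True when any one of the three stopping criteria above is satisfied."""
    return (np.linalg.norm(beta_new - beta_old) < tol
            or np.linalg.norm(grad_new) < tol
            or abs(q_new - q_old) < tol)

print(converged(np.array([1.0, 2.0]), np.array([1.0, 2.0 + 1e-9]),
                np.array([1e-8, 0.0]), 0.123456, 0.123456))   # True
```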

Digression: Uniform Law of Large Numbers

Consider the function q(z_t(ω); θ). It is a r.v. for a given θ and a function
of θ for a given ω. Suppose {q(z_t; θ)} obeys a SLLN for each θ ∈ Θ:

    Q_T(ω; θ) = (1/T) Σ_{t=1}^{T} q(z_t(ω); θ) → Q(θ)  a.s.,

where Q(θ) is non-stochastic. Note that Ω_0^c(θ) = {ω : Q_T(ω; θ) ↛ Q(θ)}
varies with θ.

    Although IP(Ω_0^c(θ)) = 0, ∪_{θ∈Θ} Ω_0^c(θ) is an uncountable union of
    non-convergence sets and may not have probability zero.
    ∩_{θ∈Θ} Ω_0(θ) may occur with probability less than one.

When θ also depends on T (e.g., when θ is replaced by an estimator θ̃_T),
there may not exist a finite T* such that Q_T(ω; θ̃_T) is arbitrarily close to
Q(θ̃_T) for all T > T*. Thus, we need a notion of convergence that is
uniform on the parameter space Θ.

We say that Q_T(ω; θ) converges to Q(θ) uniformly in θ almost surely (in
probability) if

    sup_{θ∈Θ} |Q_T(θ) − Q(θ)| → 0,  a.s. (in probability).

We also say that q(z_t(ω); θ) obeys a strong (or weak) uniform law of large
numbers (SULLN or WULLN).

Example: Let z_t be i.i.d. with zero mean and

    q_T(z_t(ω); θ) = z_t(ω) +  Tθ,        0 ≤ θ ≤ 1/(2T),
                               1 − Tθ,    1/(2T) < θ ≤ 1/T,
                               0,         1/T < θ < ∞.

Observe that for θ ≥ 1/T and θ = 0,

    Q_T(ω; θ) = (1/T) Σ_{t=1}^{T} q_T(z_t; θ) = (1/T) Σ_{t=1}^{T} z_t → 0  a.s.,

by Kolmogorov's SLLN. For a given θ, we can choose T large enough such
that Q_T(ω; θ) → 0 a.s., where 0 is the pointwise limit. Yet for Θ = [0, ∞),

    sup_{θ∈Θ} |Q_T(ω; θ)| = |z̄_T + 1/2| → 1/2  a.s.,

so that the uniform limit is different from the pointwise limit.
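The gap between the pointwise and uniform limits can be checked numerically; the sketch below (standard normal z_t and a θ grid chosen to contain the peak at 1/(2T)) is my own illustration:

```python
import numpy as np

def Q_T(z, thetas):
    """Q_T(omega; theta) = z_bar plus the triangular bump evaluated at each theta."""
    T = len(z)
    bump = np.where(thetas <= 1.0 / (2 * T), T * thetas,
                    np.where(thetas <= 1.0 / T, 1.0 - T * thetas, 0.0))
    return z.mean() + bump

rng = np.random.default_rng(0)
thetas = np.linspace(0.0, 2.0, 200001)        # grid step 1e-5 contains 1/(2T) below
for T in (10, 100, 1000, 10000):
    z = rng.standard_normal(T)
    print(T, np.abs(Q_T(z, thetas)).max())    # approaches 1/2, not 0
```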


What is the extra condition needed to ensure a SULLN if we already have,
for each θ ∈ Θ,

    Q_T(θ) = (1/T) Σ_{t=1}^{T} [q_{Tt}(z_t; θ) − IE(q_{Tt}(z_t; θ))] → 0  a.s.?

Suppose Q_T(θ) satisfies a Lipschitz-type condition: for θ and θ† in Θ,

    |Q_T(θ) − Q_T(θ†)| ≤ C_T ‖θ − θ†‖  a.s.,

where |C_T| ≤ ∆ a.s. and ∆ does not depend on θ. Then,

    sup_{θ∈Θ} |Q_T(θ)| ≤ sup_{θ∈Θ} |Q_T(θ) − Q_T(θ†)| + |Q_T(θ†)|.

Given ε > 0, we can choose θ† such that ‖θ − θ†‖ < ε/(2∆). Then,

    sup_{θ∈Θ} |Q_T(θ) − Q_T(θ†)| ≤ C_T ε/(2∆) ≤ ε/2,

uniformly in T. Also, by pointwise convergence of Q_T, |Q_T(θ†)| < ε/2 for
large T. Consequently, for all T sufficiently large,

    sup_{θ∈Θ} |Q_T(θ)| ≤ ε.

This shows that pointwise convergence and a Lipschitz condition on Q_T
together suffice for a SULLN or WULLN.

Consistency

The NLS criterion function is Q_T(β) = (1/T) Σ_{t=1}^{T} [y_t − f(x_t; β)]², and its
minimizer is the NLS estimator β̂_T. Suppose IE[Q_T(β)] is continuous on
Θ₁ such that β_o is its unique, global minimum. If Q_T(β) is close to
IE[Q_T(β)], we would expect β̂_T to be close to β_o.

To see this, assume that Q_T obeys a SULLN:

    sup_{β∈Θ₁} |Q_T(β) − IE[Q_T(β)]| → 0,

for all ω ∈ Ω₀ with IP(Ω₀) = 1. Set

    ε = inf_{β∈B^c∩Θ₁} { IE[Q_T(β)] − IE[Q_T(β_o)] },

for an open neighborhood B of β_o.

For ω ∈ Ω₀, we have for large T, IE[Q_T(β̂_T)] − Q_T(β̂_T) < ε/2, and

    Q_T(β̂_T) − IE[Q_T(β_o)] ≤ Q_T(β_o) − IE[Q_T(β_o)] < ε/2,

because the NLS estimator β̂_T minimizes Q_T(β). It follows that

    IE[Q_T(β̂_T)] − IE[Q_T(β_o)]
        ≤ IE[Q_T(β̂_T)] − Q_T(β̂_T) + Q_T(β̂_T) − IE[Q_T(β_o)] < ε,

for all T sufficiently large. As β̂_T is such that IE[Q_T(β̂_T)] is within ε of
IE[Q_T(β_o)] with probability one, it cannot be outside the neighborhood B
of β_o. As B is arbitrary, β̂_T must converge to β_o almost surely.

Q: How do we ensure a SULLN or WULLN?

If Θ₁ is compact and convex, we have from the mean-value theorem and
the Cauchy-Schwarz inequality that

    |Q_T(β) − Q_T(β†)| ≤ ‖∇_β Q_T(β‡)‖ ‖β − β†‖  a.s.,

where β‡ is the mean value of β and β†, in the sense that
‖β‡ − β†‖ < ‖β − β†‖. Hence, the Lipschitz-type condition would hold for

    C_T = sup_{β∈Θ₁} ‖∇_β Q_T(β)‖,

with ∇_β Q_T(β) = −(2/T) Σ_{t=1}^{T} ∇_β f(x_t; β)[y_t − f(x_t; β)]. Note that
∇_β Q_T(β) may be bounded in probability, but it may not be bounded in
an almost sure sense. (Why?)

We impose the following conditions.

[C1] {(y_t, w_t′)′} is a sequence of random vectors, and x_t is a vector
containing some elements of Y^{t−1} and W^{t}.
  (i) The sequences {y_t²}, {y_t f(x_t; β)} and {f(x_t; β)²} all obey a WLLN for
      each β in Θ₁, where Θ₁ is compact and convex.
  (ii) y_t, f(x_t; β) and ∇_β f(x_t; β) all have bounded second moments
      uniformly in β.
[C2] There exists a unique parameter vector β_o such that
IE(y_t | Y^{t−1}, W^{t}) = f(x_t; β_o).

Theorem 8.1
Given the nonlinear specification y = f(x; β) + e(β), suppose that [C1]
and [C2] hold. Then, β̂_T → β_o in probability.

Remark: Theorem 8.1 is not satisfactory because it only deals with
convergence to the global minimum. Yet, an iterative algorithm is not
guaranteed to find a global minimum of the NLS objective function.
Hence, it is more reasonable to expect the NLS estimator to converge to
some local minimum of IE[Q_T(β)]. Therefore, we shall, in what follows,
assert only that the NLS estimator converges in probability to a local
minimum β* of IE[Q_T(β)]. In this case, f(x; β*) is, at most, an
approximation to the conditional mean function.

Asymptotic Normality

By the mean-value expansion of ∇_β Q_T(β̂_T) about β*,

    0 = ∇_β Q_T(β̂_T) = ∇_β Q_T(β*) + ∇²_β Q_T(β†_T)(β̂_T − β*),

where β†_T is a mean value of β̂_T and β*. Thus, when ∇²_β Q_T(β†_T) is
invertible, we have

    √T (β̂_T − β*) = −[∇²_β Q_T(β†_T)]^{−1} √T ∇_β Q_T(β*)
                  = −H_T(β*)^{−1} √T ∇_β Q_T(β*) + o_IP(1),

where H_T(β) = IE[∇²_β Q_T(β)]. That is, √T (β̂_T − β*) and
−H_T(β*)^{−1} √T ∇_β Q_T(β*) are asymptotically equivalent.

Under suitable conditions,

    √T ∇_β Q_T(β*) = −(2/√T) Σ_{t=1}^{T} ∇_β f(x_t; β*)[y_t − f(x_t; β*)]

obeys a CLT, i.e., (V*_T)^{−1/2} √T ∇_β Q_T(β*) → N(0, I_k) in distribution, where

    V*_T = var( (2/√T) Σ_{t=1}^{T} ∇_β f(x_t; β*)[y_t − f(x_t; β*)] ).

Then for D*_T = H_T(β*)^{−1} V*_T H_T(β*)^{−1},

    (D*_T)^{−1/2} H_T(β*)^{−1} √T ∇_β Q_T(β*) → N(0, I_k)  in distribution.

By asymptotic equivalence,

    (D*_T)^{−1/2} √T (β̂_T − β*) → N(0, I_k)  in distribution.

When D*_T is replaced by a consistent estimator D̂_T,

    (D̂_T)^{−1/2} √T (β̂_T − β*) → N(0, I_k)  in distribution.

Note that

    H_T(β*) = (2/T) Σ_{t=1}^{T} IE[∇_β f(x_t; β*) ∇_β f(x_t; β*)′]
            − (2/T) Σ_{t=1}^{T} IE[∇²_β f(x_t; β*)(y_t − f(x_t; β*))],

which can be consistently estimated by its sample counterpart:

    Ĥ_T = (2/T) Σ_{t=1}^{T} ∇_β f(x_t; β̂_T) ∇_β f(x_t; β̂_T)′
        − (2/T) Σ_{t=1}^{T} ∇²_β f(x_t; β̂_T) ê_t.

When ε_t = y_t − f(x_t; β*) are uncorrelated with ∇²_β f(x_t; β*), H_T(β*)
depends only on the expectation of the outer product of ∇_β f(x_t; β*), so
that Ĥ_T may be simplified as

    Ĥ_T = (2/T) Σ_{t=1}^{T} ∇_β f(x_t; β̂_T) ∇_β f(x_t; β̂_T)′.

This is analogous to estimating M_xx by Σ_{t=1}^{T} x_t x_t′/T in linear regressions.

If {ε_t} is not a martingale difference sequence with respect to Y^{t−1} and
W^{t}, V*_T can be consistently estimated using a Newey-West type
estimator. This is more likely in practice as the NLS estimator typically
converges to a local optimum β*.

Wald Tests

Hypothesis: Rβ* = r, where R is a q × k selection matrix and r is a
q × 1 vector of pre-specified constants.

By the asymptotic normality result, we have under the null that

    Γ̂_T^{−1/2} √T R(β̂_T − β*) = Γ̂_T^{−1/2} √T (Rβ̂_T − r) → N(0, I_q)  in distribution,

where Γ̂_T = R D̂_T R′, and D̂_T is a consistent estimator for D*_T.

The Wald statistic is

    W_T = T (Rβ̂_T − r)′ Γ̂_T^{−1} (Rβ̂_T − r) → χ²(q)  in distribution.

For nonlinear restrictions r(β*) = 0, the Wald test is not invariant
with respect to the form of r(β) = 0.
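A sketch of how these pieces might be assembled for the illustrative model used earlier, with the simplified (outer-product) Ĥ_T and a robust V̂_T that presumes the martingale-difference case; all helper names and numerical values are my own:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import chi2

def f(x, beta):
    return beta[0] * (1.0 - np.exp(-beta[1] * x))

def jacobian(x, beta):
    return np.column_stack([1.0 - np.exp(-beta[1] * x),
                            beta[0] * x * np.exp(-beta[1] * x)])

def wald_test(x, y, beta_hat, R, r):
    """Wald statistic for R beta = r with a heteroskedasticity-robust covariance."""
    T = len(y)
    Xi = jacobian(x, beta_hat)                         # T x k
    e = y - f(x, beta_hat)                             # NLS residuals
    H_hat = 2.0 * (Xi.T @ Xi) / T                      # simplified H_T estimate
    V_hat = 4.0 * (Xi * e[:, None]).T @ (Xi * e[:, None]) / T
    D_hat = np.linalg.inv(H_hat) @ V_hat @ np.linalg.inv(H_hat)
    Gamma_hat = R @ D_hat @ R.T
    diff = R @ beta_hat - r
    W = T * diff @ np.linalg.solve(Gamma_hat, diff)
    return W, chi2.sf(W, df=len(r))                    # statistic and p-value

# H0: beta_2 = 0.5 in the illustrative model
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 500)
y = f(x, [2.0, 0.5]) + 0.1 * rng.standard_normal(500)
beta_hat = least_squares(lambda b: y - f(x, b), x0=[1.0, 1.0]).x
print(wald_test(x, y, beta_hat, R=np.array([[0.0, 1.0]]), r=np.array([0.5])))
```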
