Modeling Directional (Circular) Time Series
25 July 2019
Circular observations pose special problems for time series modeling. This article shows how the
score-driven approach, developed primarily in econometrics, provides a natural solution to the
difficulties and leads to a coherent and unified methodology for estimation, model selection and
testing. The new methods are illustrated with hourly data on wind direction.
Modeling directional (circular) time series
Andrew Harvey, Stan Hurn and Stephen Thiele
Faculty of Economics, Cambridge University and QUT, Brisbane
Corresponding author.
Andrew Harvey [email protected].
We would like to thank participants at the INET workshop on Score-driven and Non-
linear Time Series Models held at Trinity College, Cambridge, in March 2019 for helpful
comments. We are particularly grateful to Richard Davis and Neil Shephard for their suggestions. We are also grateful to Howell Tong and others for comments made at a seminar at the University of Bologna.
1 Introduction
Directional variables are circular. If the starting point is due south then moving through 180 degrees ends up at due north. The same point is reached by moving 180 degrees in the opposite direction. In terms of radians the points π and −π meet up, and this poses a challenge for directional time series modelling.
A number of ways for modelling circular time series have been proposed
in the literature. The most widely used is based on transformations, the aim
of which is to try to put the data in a form that lends itself to conventional
autoregressive (AR) or autoregressive moving average (ARMA) modeling.
A second method uses transformations but to formulate an autoregressive
model in which the conditional distribution of the next observation is circu-
lar. By contrast, the approach proposed here is also based on a circular conditional
distribution, but the dynamics are formulated so as to be consistent with the
circularity of the data. It draws on recently developed procedures for dealing
with non-Gaussian conditional distributions in a wide variety of situations,
primarily in economics and finance. The defining feature of the new class of
circular time series models, which turns out to be crucial for performance as
well as theoretical properties, is that the dynamics of the time-varying pa-
rameter are driven by the score of the conditional distribution. Score-driven
models are known as Dynamic Conditional Score (DCS) or Generalized Au-
toregressive Score (GAS) models; see, for example, Harvey (2013) and Creal,
Koopman and Lucas (2013).
Harvey and Luati (2014) show how the score-driven model may be used
for modeling changing location when the conditional distribution is Student’s
t. The score automatically handles observations that would be classed as
outliers for a normal distribution by making them less influential. The same
methodology applied to directional data deals with the problem of circular-
ity. The asymptotic distribution of the maximum likelihood (ML) estimator
can be derived for a first-order model and general principles can be used for
testing and model selection. A nonstationary model is also feasible. There-
fore although circular data have special features that need to be explicitly
recognized, their overall treatment follows from a well-developed time series
approach. A score-driven autoregressive model is also proposed. A model of
this kind has not yet appeared in the dynamic score literature but it turns
out to be particularly attractive here.
Many scientific fields have applications in which directions are collected
and statistically analysed. In particular, modeling wind direction is becom-
ing increasingly important as energy generation by means of wind power
increases. The score driven models developed in this paper are applied to
wind direction data from the Black Mountain in Canberra. This is a rela-
tively short time series and is chosen mainly for comparative purposes with
previous studies. Our models will, however, generalize to handle many of the
issues raised by longer time series.
The plan of the paper is as follows. Sections 2 and 3 review the von
Mises distribution and existing methods for modeling circular time series.
Score-driven models are described in Section 4. The small sample properties
of these models are investigated by Monte Carlo experiments in Section 5.
Model selection methods are discussed in Section 6, while Section 7 applies
the new models to data on wind direction and highlights their advantages
over existing methods. The last section concludes and points to future de-
velopments.
2 Circular data and the von Mises distribution
A (continuous) circular probability density function (PDF), which depends on a vector of parameters θ and is denoted f(y; θ), must satisfy the following conditions:

(i) f(y; θ) ≥ 0;

(ii) ∫_{−π}^{π} f(y; θ) dy = 1.

The von Mises (vM) distribution has PDF

f(y; μ, ν) = (1/(2πI₀(ν))) exp{ν cos(y − μ)},   −π ≤ y, μ < π,   ν ≥ 0,   (1)

where I₀(ν) is the modified Bessel function of order zero, μ is the directional mean and ν is the concentration parameter.
Note that

A(ν) = E cos(y − μ) = I₁(ν)/I₀(ν),   (2)

where I₁(ν) is the modified Bessel function of order one. The score with respect to μ is

∂ ln f/∂μ = ν sin(y − μ),   (3)

with variance νA(ν). The score with respect to the concentration parameter is

∂ ln f/∂ν = cos(y − μ) − A(ν).   (4)
Fitting such models can be seen as a missing data problem because the unwrapped observations can be decomposed as

x_t = y_t + 2πk_t,   t = 1, ..., T,   (5)

where k_t is an integer winding number.
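The decomposition in (5) can be checked numerically. The sketch below (the helper name `wrap` is ours, not from the paper) wraps an unwrapped series onto [−π, π) and recovers the integer k_t:

```python
import numpy as np

def wrap(x):
    """Wrap angles onto [-pi, pi)."""
    return (x + np.pi) % (2.0 * np.pi) - np.pi

# An unwrapped series x_t and its wrapped counterpart y_t.
x = np.array([0.5, 3.5, 6.5, -4.0])
y = wrap(x)
k = np.round((x - y) / (2.0 * np.pi)).astype(int)  # integer winding numbers

assert np.allclose(x, y + 2.0 * np.pi * k)         # x_t = y_t + 2*pi*k_t
assert np.all((y >= -np.pi) & (y < np.pi))
```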
An alternative approach is to transform a circular variable y, where −π ≤ y − μ < π, to a variable x in the range −∞ < x < ∞ by means of a link function, x = g⁻¹(y − μ). There are then two ways to proceed. The first is to fit a linear time series model to x and then transform back to y. The model for x is sometimes called a direct or linked linear process. When the time series model is an ARMA(p, q) process it is called LARMA(p, q) or, more commonly, CARMA(p, q), where the C denotes ‘circular’ as opposed to the L for ‘linked’. In the second class of models, which are nonlinear, the inverse form is an autoregression, denoted IAR(p), whereby the conditional mean, μ_{t|t−1}, of a circular distribution, such as vM, is specified as

μ_{t|t−1} = μ + g{φ₁g⁻¹(y_{t−1} − μ) + ... + φ_p g⁻¹(y_{t−p} − μ)},   −π ≤ y − μ < π.   (6)

The IAR(1) model with φ close to one can be approximated without the transformation, so

μ_{t|t−1} ≈ μ + φ(y_{t−1} − μ).   (7)
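As a concrete illustration, here is a minimal sketch of the IAR(1) recursion (6) with the tan(y/2) link used later in the paper; the function name and the numerical values are illustrative, not from the paper:

```python
import numpy as np

def iar1_mean(y_prev, mu, phi):
    """One-step conditional mean of an IAR(1) with the tan(y/2) link, eq. (6)."""
    x = np.tan((y_prev - mu) / 2.0)          # map the circle to the real line
    return mu + 2.0 * np.arctan(phi * x)     # g{.} maps back to the circle

# With phi close to one the link is nearly the identity, as in eq. (7):
mu, phi, y_prev = 0.0, 0.95, 0.3
exact = iar1_mean(y_prev, mu, phi)
approx = mu + phi * (y_prev - mu)
```

For small deviations from μ the two conditional means are almost indistinguishable, which is the content of the approximation (7).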
4 Score-driven models
When the conditional distribution is continuous and circular, letting the score drive a dynamic equation for the conditional mean, μ_{t|t−1}, solves the circularity problem because the score function is also circular and continuous. Unlike the CARMA and IAR approaches, which set up a barrier for y_t − μ at π and −π and thus ignore the proximity of observations on either side, in the proposed approach a value of y_t slightly greater than −π is treated in much the same way as a value slightly smaller than π.
The conditional score, u_t, enters into a dynamic equation such as

μ_{t+1|t} = μ(1 − φ) + φμ_{t|t−1} + κu_t,   (8)

where the conditional distribution of a variable x_t, defined over the real line, has location μ_{t|t−1}, so that

x_t = μ_{t|t−1} + ε_t,   (9)

in which ε_t has location zero. An observation falling outside the range μ ± π can be wrapped, as in (5), so that y_t lies in the range [−π, π). The score-driven data generating process is invariant to this wrapping of the observations. Hence the question of how to estimate k_t, defined below (5), does not arise. Note that there is no need to wrap μ_{t|t−1}, but if it is reset neither the data generation process nor estimation is affected.
The conditional score is a martingale difference sequence with mean zero. In the case of the von Mises distribution, that is ε_t ~ vM(0, ν) in (9), dividing the score, (3), by its information quantity gives sin(y_t − μ_{t|t−1})/A(ν), but there is a good case for dropping A(ν) because then the filter is not dependent on ν. Hence the forcing variable in an equation such as (8) is

u_t = sin(y_t − μ_{t|t−1}),   (10)

[Figure 1: The forcing variable u_t = sin(y_t − μ_{t|t−1}) plotted against y_t − μ_{t|t−1}.]
reflecting the fact that, like the score for a t-distribution, it is a redescending function. However, for small deviations from the mean, the score is approximately linear; the Maclaurin expansion shows that sin(y_t − μ_{t|t−1}) ≈ y_t − μ_{t|t−1}. If the concentration is large, so that a Gaussian conditional distribution is a reasonable approximation, then y_t − μ_{t|t−1} is the score and the
model corresponds to the steady-state innovations form of the Kalman fil-
ter from a Gaussian unobserved components model made up of a first-order
autoregressive process and white noise.
The dots in Figure 1 illustrate the critical role played by the score when μ_{t|t−1} is close to π. Suppose μ_{t|t−1} = π − a, where a is small and positive. (In the figure it is 2.) Suppose the next observation is negative at −π + b, where b is small and positive. The distance between μ_{t|t−1} and y_t is only a + b, but y_t − μ_{t|t−1} = −2π + b + a. However, sin(y_t − μ_{t|t−1}) = sin(−2π + b + a) = sin(b + a). Thus the impact of the negative observation on μ_{t|t−1} is positive.
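The first-order recursion (8) with the sine score (10) can be sketched in a few lines; initializing with μ_{1|0} = μ is one simple choice, and the numerical values below are illustrative:

```python
import numpy as np

def dcs1_filter(y, mu, phi, kappa):
    """First-order DCS filter, eq. (8), with score u_t = sin(y_t - mu_{t|t-1})."""
    m = np.empty(len(y) + 1)
    m[0] = mu                                  # one simple choice of mu_{1|0}
    for t, yt in enumerate(y):
        u = np.sin(yt - m[t])                  # circular, bounded forcing variable
        m[t + 1] = mu * (1 - phi) + phi * m[t] + kappa * u
    return m

# An observation just past -pi pulls a mean close to pi in the right
# direction, mirroring the Figure 1 discussion: the score is sin(a + b) > 0.
a, b = 0.2, 0.1
m_prev = np.pi - a                             # conditional mean close to pi
y_next = -np.pi + b                            # observation just past -pi
u = np.sin(y_next - m_prev)                    # = sin(a + b) > 0
```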
8
4.1 Maximum likelihood estimation
The log-likelihood function when the observations have a von Mises conditional distribution is

ln L(ψ, ν) = −T ln(2πI₀(ν)) + ν Σ_{t=1}^{T} cos(y_t − μ_{t|t−1}),

where ψ = (κ, φ, μ)′. The information matrix, (11), depends on

a = φ − κA(ν)   (12)

and

b = φ² − 2φκA(ν) + κ²(1 − A(ν)/ν) < 1.   (13)
The information matrix for a vM distribution with parameters μ and ν is given in Mardia and Jupp (2000, pp. 86, 350). The derivation of D(ψ) is in Appendix A.

Proposition 2 Provided b < 1 and κ ≠ 0, the ML estimator, (ψ̃′, ν̃)′, is consistent and asymptotically normal with mean (ψ₀′, ν₀)′ and covariance matrix given by the inverse of (11).
The proof follows from Lemma 1 in Jensen and Rahbek (2004) and Harvey (2013). The conditions b < 1 and κ ≠ 0 are needed for the information matrix to be positive definite. Third derivatives associated with the mean are bounded because they depend only on sines and cosines. Derivatives with respect to concentration are also bounded because

∂A(ν)/∂ν = 1 − A(ν)² − A(ν)/ν.
The ML estimates of the dynamic parameters satisfy

Σ_{t=1}^{T} sin(y_t − μ_{t|t−1}) = 0,

and, defining

S(ψ, μ) = Σ_{t=1}^{T} cos(y_t − μ_{t|t−1}),

once the estimates of the remaining parameters have been computed, the ML estimate of ν may be obtained by solving

A(ν̃) = S(ψ̃, μ̃)/T.
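Assuming A(ν) = I₁(ν)/I₀(ν), the estimating equation A(ν̃) = S(ψ̃, μ̃)/T can be solved numerically with a one-dimensional root finder; the sketch below uses SciPy's Bessel functions and the function names are ours:

```python
import numpy as np
from scipy.special import i0, i1
from scipy.optimize import brentq

def A(nu):
    """A(nu) = I1(nu)/I0(nu) for the von Mises distribution."""
    return i1(nu) / i0(nu)

def ml_concentration(y, mu_filter, lo=1e-6, hi=500.0):
    """Solve A(nu) = S/T, where S = sum of cos(y_t - mu_{t|t-1})."""
    Sbar = np.mean(np.cos(y - mu_filter))
    return brentq(lambda nu: A(nu) - Sbar, lo, hi)
```

Residuals tightly concentrated around the filtered mean give a mean cosine close to one and hence a large estimated concentration.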
when both κ and φ are positive and, in addition, κ ≤ φ, it follows that z_t ≥ 0 and so |z_t| = z_t. Since E(z_t) = φ − κA(ν), Jensen's inequality shows that E ln z_t < 0.
4.2 Non-stationarity
The non-stationary first-order DCS model is

μ_{t+1|t} = ω + μ_{t|t−1} + κu_t,   (14)

where ω is a drift term and μ_{1|0} is fixed. The conditional mean can, in
principle, travel all the way round the circle; see the footnote below (5).
Such situations can arise in practice. For example, Fisher (1993, p 249) gives
a data set of weekly observations at a location in England where the wind
direction moves round the full circle every quarter.
Because var(μ_{t|t−1}) → ∞ as t → ∞ in (14), we have the following property.

We cannot initialize μ_{1|0} with the directional mean because it does not exist. The best option is to start off the recursion in (14) with μ_{2|1} = y₁ and compute estimates of the other parameters. These provide starting values for full ML estimation with μ_{1|0} treated as a fixed parameter. The transformation μ_{1|0} = 2 arctan(λ), where λ is unconstrained, may be employed to ensure −π ≤ μ_{1|0} < π.
The asymptotics still hold for (14), as in Harvey (2013, pp. 45-6), so for a conditional vM distribution, κ̃ is asymptotically normal with mean κ and

avar(κ̃) = (2κA(ν) − κ²(1 − A(ν)/ν))/(TA(ν)²).   (15)

Estimating the initial value, μ_{1|0}, makes no difference to avar(κ̃) because the asymptotic variance is O(1). The equation for b, that is (13), now implies κ > 0 and κ < 2νA(ν)/(ν − A(ν)). Note that the upper bound → 0 as ν → 0, whereas for ν → ∞, 0 < κ < 2; for ν = 2, κ < 2.16. The result in Remark 4 suggests that invertibility is guaranteed by κ ≤ 1.
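The asymptotic variance expressions are easy to evaluate directly. A sketch, assuming A(ν) = I₁(ν)/I₀(ν), which produces values of the same order as the bracketed asymptotic entries in Table 2 (the function names are ours):

```python
from scipy.special import i0, i1

def A(nu):
    return i1(nu) / i0(nu)

def avar_kappa(kappa, nu, T):
    """Asymptotic variance of the ML estimator of kappa in the
    nonstationary model (14), eq. (15)."""
    one_minus_b = 2 * kappa * A(nu) - kappa**2 * (1 - A(nu) / nu)
    return one_minus_b / (T * A(nu) ** 2)

def avar_nu(nu, T):
    """Asymptotic variance of the concentration estimate: the reciprocal
    of the information quantity 1 - A(nu)^2 - A(nu)/nu, divided by T."""
    return 1.0 / (T * (1 - A(nu) ** 2 - A(nu) / nu))
```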
normal with mean 0 and covariance matrix (1/(νA(ν)))Q⁻¹, where the ij-th element of Q is the circular autocovariance of order |i − j|, as defined in the numerator of (21). Given that sin(y_t − μ) is stationary, the matrix Q will be positive definite. The ML estimators of μ and ν are similarly asymptotically normal, being distributed independently of φ̃ and of each other. The limiting distribution of √T(μ̃ − μ) is normal with mean 0 and variance 1/[νA(ν)(1 − Σ_k φ_k E cos(y_t − μ))²]. The asymptotic distribution of ν̃ is as implied by (11).
Corollary 1 The large-sample covariance matrix of φ̃ can be estimated as

avar(φ̃) = [ν̃A(ν̃) Σ_{t=p+1}^{T} s_t s_t′]⁻¹,   (17)

where s_t = (sin(y_{t−1} − μ̃), ..., sin(y_{t−p} − μ̃))′, and

avar(μ̃) = 1/[ν̃A(ν̃) Σ_{t=p+1}^{T} (1 − Σ_k φ̃_k cos(y_{t−k} − μ̃))²]   (18)
        ≈ 1/[ν̃A(ν̃)(T − p){1 − (Σ_k φ̃_k)(T − p)⁻¹ Σ_{t=p+1}^{T} cos(y_t − μ̃)}²].
by adding lagged scores as defined in (10). Thus

μ_{t|t−1} = μ + φ₁ sin(y_{t−1} − μ) + ... + φ_p sin(y_{t−p} − μ) + θ₁u_{t−1} + ... + θ_q u_{t−q}.   (19)
even for T = 2000. Given the proximity to the unit root this may not be surprising. Overall, the sample MSEs for the other parameters are much closer to the asymptotic MSEs. Finally, note that, in accordance with the theory, a lower ν means a higher MSE.
Table 2 shows the results of Monte Carlo experiments, again based on
10,000 replications, for the nonstationary model (14) with no intercept. The
MSEs are close to the values given by the asymptotic theory.
The squared sample mean resultant length is

R̄² = (T⁻¹ Σ_{t=1}^{T} cos y_t)² + (T⁻¹ Σ_{t=1}^{T} sin y_t)².
Table 1:
Mean square errors of the maximum likelihood estimates of the score driven model for circular data based on the von Mises distribution. Asymptotic standard errors are shown in brackets.

 φ     κ    ν    T      μ̃               φ̃               κ̃               ν̃
 0.9   0.5  2    250    0.916           0.033           0.062           0.264
                 500    0.309 (0.272)   0.011 (0.008)   0.031 (0.029)   0.125 (0.122)
                 1000   0.139           0.004           0.015           0.062
                 2000   0.068 (0.068)   0.002 (0.002)   0.008 (0.007)   0.031 (0.030)
 0.98  0.5  2    250    14.83           0.012           0.071           0.330
                 500    11.45 (4.543)   0.004 (0.001)   0.036 (0.024)   0.170 (0.122)
                 1000   7.752           0.002           0.018           0.083
                 2000   3.390 (1.136)   0.001 (0.0002)  0.007 (0.006)   0.039 (0.030)
 0.7   0.5  2    250    0.127           0.132           0.077           0.260
                 500    0.064 (0.064)   0.056 (0.045)   0.038 (0.037)   0.128 (0.122)
                 1000   0.032           0.024           0.018           0.062
                 2000   0.016 (0.016)   0.012 (0.011)   0.009 (0.009)   0.030 (0.030)
 0.9   1    2    250    3.432           0.017           0.075           0.278
                 500    1.134 (0.756)   0.007 (0.004)   0.035 (0.033)   0.131 (0.122)
                 1000   0.477           0.003           0.017           0.064
                 2000   0.222 (0.189)   0.001 (0.001)   0.008 (0.008)   0.032 (0.030)
 0.9   0.2  2    250    0.196           0.211           0.048           0.262
                 500    0.082 (0.081)   0.051 (0.019)   0.023 (0.022)   0.130 (0.122)
                 1000   0.040           0.014           0.011           0.062
                 2000   0.020 (0.020)   0.006 (0.005)   0.006 (0.005)   0.031 (0.030)
 0.9   0.5  0.5  250    1.470           0.328           0.374           0.094
                 500    0.656 (0.574)   0.068 (0.016)   0.158 (0.123)   0.046 (0.044)
                 1000   0.306           0.014           0.070           0.022
                 2000   0.151 (0.143)   0.005 (0.004)   0.033 (0.031)   0.011 (0.011)
 0.9   0.5  4    250    0.603           0.026           0.045           1.137
                 500    0.204 (0.162)   0.009 (0.007)   0.022 (0.022)   0.541 (0.520)
                 1000   0.084           0.004           0.011           0.265
                 2000   0.041 (0.040)   0.002 (0.002)   0.005 (0.005)   0.131 (0.130)
Table 2:
Scaled mean square errors of the maximum likelihood estimates of the score driven model for the nonstationary model (14). Asymptotic standard errors are shown in brackets.

 κ    ν    T      κ̃               ν̃
 0.5  2    250    0.052 (0.044)   0.260 (0.244)
           500    0.026 (0.022)   0.128 (0.122)
           1000   0.017 (0.011)   0.062 (0.062)
           2000   0.010 (0.005)   0.032 (0.030)
 1    2    250    0.066 (0.061)   0.255 (0.244)
           500    0.032 (0.031)   0.127 (0.122)
           1000   0.017 (0.015)   0.062 (0.062)
           2000   0.009 (0.008)   0.030 (0.030)
The circular ACF proposed by Fisher and Lee is

ρ_c(τ) = (σ^{CC}_τ σ^{SS}_τ − σ^{CS}_τ σ^{SC}_τ)/(σ^{CC}_0 σ^{SS}_0 − (σ^{SC}_0)²),   τ = 1, 2, ...,   (20)

where σ^{CC}_τ = E[cos y_t cos y_{t−τ}] and similarly for σ^{SS}_τ, σ^{CS}_τ and σ^{SC}_τ. Both sin y_t and cos y_t have zero means because of a uniformity assumption; see Holzmann, Munk, Suster and Zucchini (2006). An alternative form is in (6.36) of Fisher (1993, p. 151). Fisher and Lee (1994, p. 333) write down the corresponding correlogram.

When the distribution is not uniform, the directional mean needs to be subtracted; see Fisher (1993, pp. 151-2). The circular correlation coefficient proposed by Jammalamadaka and SenGupta (2001, pp. 176-9) is formulated somewhat differently and it implies a circular ACF given by

ρ_c(τ) = σ^{SS}_τ / σ^{SS}_0,   τ = 0, 1, 2, ...,   (21)

where σ^{SS}_τ = E[sin(y_t − μ) sin(y_{t−τ} − μ)], τ = 0, 1, 2, ...
The sample⁵ circular ACF corresponding to (21) is

r_c(τ) = Σ_{t=τ+1}^{T} sin(y_t − ȳ_d) sin(y_{t−τ} − ȳ_d) / Σ_{t=1}^{T} sin²(y_t − ȳ_d),   τ = 1, 2, ...,   (22)

where ȳ_d is the sample directional mean. The limiting distribution when the observations are independent and identically distributed (IID) is standard normal, that is, √T r_c(τ) → N(0, 1); see Brockwell and Davis (1991, Theorem 7.7.2).
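A minimal implementation of the sample circular ACF (22), with the directional mean computed via arctan2 (the helper name is ours):

```python
import numpy as np

def circular_acf(y, max_lag):
    """Sample circular ACF, eq. (22), using the directional mean of y."""
    y_bar = np.arctan2(np.sin(y).sum(), np.cos(y).sum())  # directional mean
    s = np.sin(y - y_bar)
    denom = np.sum(s ** 2)
    return np.array([np.sum(s[tau:] * s[:-tau]) / denom
                     for tau in range(1, max_lag + 1)])
```

For IID data the values scaled by √T are approximately standard normal, which gives the usual ±1.96/√T bands for a correlogram.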
The Lagrange multiplier (LM) test against serial correlation in location
is based on the portmanteau or Box-Ljung statistic constructed from the
autocorrelations of the scores; see Harvey (2013, p 52-4) and Harvey and
Thiele (2016). For a vM distribution with ν > 0, the scores are proportional
to the sines of the angular observations measured as deviations from their
directional mean, so the autocorrelations are the circular autocorrelations
as defined in (21). The derivation can be based on the SCAR or SCMA
models, the latter being a special case of (19) with no lagged unconditional
scores. When the Q-statistic in the portmanteau test is based on the first P
sample autocorrelations, it is asymptotically distributed as χ²_P under the null
hypothesis of serial independence. Once a dynamic model has been fitted, a
formal test requires that the degrees of freedom be adjusted by subtracting
the number of estimated dynamic parameters from P . This is the Box-Pierce
test. An alternative is to carry out an LM test; see the discussion in Harvey
and Thiele (2016).
For the purposes of initial model identification it is helpful to know
something about the behaviour of the CACF in (21) for wrapped models.
From Jammalamadaka and SenGupta (2001, p 180), the circular ACF for a
wrapped Gaussian model, constructed as in (5), is
ρ_c(τ) = sinh(γ_x(τ))/sinh(γ_x(0)),   τ = 0, 1, 2, ...,   (23)

where γ_x(τ) denotes the autocovariance function of the unwrapped series.
⁵There appears to be a typographical error in the sample correlation given in (8.2.5) of Jammalamadaka and SenGupta (2001, p. 178) because it is not consistent with the theoretical definition.
The wrapping diminishes the autocorrelations, the more so the bigger is the variance of the unwrapped series, γ_x(0). On the other hand, as γ_x(0) → 0, ρ_c(τ) → ρ_x(τ), that is, the circular ACF is close to the ACF of x_t. Thus whereas the ACF of unwrapped observations, were they available, could be interpreted in the usual way for linear data, this is no longer true for the wrapped observations unless the variance is small. For a score-driven model, the issues are somewhat different because sin(y_t − μ_d) = sin(x_t − μ) and so, since μ_d = μ, the dynamic properties of the wrapped and unwrapped series are the same. The challenge is therefore to determine the properties of the unwrapped series.
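The damping implied by (23) is easy to quantify for a wrapped Gaussian AR(1), taking γ_x(τ) = γ_x(0)ρ^τ; the function name and numerical values below are illustrative:

```python
import numpy as np

def wrapped_acf(gamma0, rho_x, tau):
    """Circular ACF, eq. (23), of a wrapped Gaussian AR(1) with unwrapped
    variance gamma0 and unwrapped lag-tau autocorrelation rho_x**tau."""
    return np.sinh(gamma0 * rho_x ** tau) / np.sinh(gamma0)

# Damping: the larger the unwrapped variance, the smaller the circular ACF.
small = wrapped_acf(0.1, 0.8, 1)   # close to the linear ACF value 0.8
large = wrapped_acf(4.0, 0.8, 1)   # strongly damped, roughly 0.45
```

With a small unwrapped variance the circular ACF is essentially the linear one; with a large variance the first-order circular autocorrelation is pulled well below 0.8.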
There is no closed form CDF for the vM distribution, but probability inte-
gral transforms (PITs) can be computed by approximations as detailed in
Mardia and Jupp (2000, p 41). An LM test against a class of exponential
distributions, given in Mardia and Jupp (2000, p 142-3), can be carried out.
A rejection of the vM distribution may lead one to consider a more general
class of distributions, such as the one proposed by Jones and Pewsey (2005).
When a model has been fitted, the most informative diagnostic plot is one where y_t is adjusted, by adding or subtracting 2π, so as to be in the range μ_{t|t−1} ± π. In this way observations close to ±π no longer appear at both the top and bottom of the graph.
Goodness of fit may be assessed by the dispersion (circular variance)

D = 1 − Σ_{t=1}^{T} cos(y_t − μ_{t|t−1})/T   (25)
or the circular standard deviation, s = √(−2 ln(1 − D)), a measure whose square is most comparable to the prediction error variance; see Mardia and Jupp (2000, pp. 18-19, 30). In time series forecasting the random walk often provides a useful benchmark. The equivalent benchmark for dispersion is D* = 1 − Σ_{t=2}^{T} cos(y_t − y_{t−1})/(T − 1). Hence goodness of fit for a particular model might be characterized by A = 1 − D/D*; with a perfect fit A = 1. Alternatively we could use B = 1 − s²/s*², where s*² = −2 ln(1 − D*). A negative value for A or B indicates that the model is worse than a forecast given by the last observation.
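The measures D, s², A and B can be computed together. A minimal sketch (the function name and the dictionary layout are ours):

```python
import numpy as np

def fit_measures(y, mu_filter):
    """Dispersion D, eq. (25), plus the A and B measures against the
    last-observation (random walk) benchmark D*."""
    D = 1.0 - np.mean(np.cos(y - mu_filter))
    Dstar = 1.0 - np.mean(np.cos(np.diff(y)))        # benchmark dispersion
    s2 = -2.0 * np.log(1.0 - D)                      # squared circular s.d.
    s2star = -2.0 * np.log(1.0 - Dstar)
    return {"D": D, "s2": s2,
            "A": 1.0 - D / Dstar, "B": 1.0 - s2 / s2star}
```

Note that cos(y_t − y_{t−1}) is unaffected by any 2π jumps in the wrapped series, so the benchmark needs no special handling of the cut.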
6.4 Forecasts
When a forecast of the next observation, that is, μ_{t+1|t}, falls outside the range ±π it can be reset, as in (5), to give ỹ_{t+1|t} in the range [−π, π). The conditional distribution for y_{t+1} is vM(ỹ_{t+1|t}, ν).
The conditional distribution of y_{t+ℓ} may be obtained by simulation with the accuracy measured by D(ℓ) = 1 − Σ_{j=1}^{ℓ} cos(y_{t+j} − μ_{t+j|t})/ℓ.
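Multi-step forecasting by simulation can be sketched as follows, drawing each y_{t+j} from vM(μ_{t+j|t}, ν) with NumPy's von Mises generator and updating the mean via the recursion (8); all numerical settings below are illustrative:

```python
import numpy as np

def forecast_paths(mu_last, mu_param, phi, kappa, nu, steps, n_paths, seed=0):
    """Simulate the DCS(1) recursion (8) forward to approximate the
    distribution of y_{t+l}; returns an (n_paths, steps) array of draws."""
    rng = np.random.default_rng(seed)
    m = np.full(n_paths, mu_last)              # mu_{t+1|t} for each path
    out = np.empty((n_paths, steps))
    for j in range(steps):
        y = rng.vonmises(m, nu)                # y_{t+j} ~ vM(mu_{t+j|t}, nu)
        out[:, j] = y
        u = np.sin(y - m)
        m = mu_param * (1 - phi) + phi * m + kappa * u
    return out
```

Quantiles of the simulated draws at each horizon approximate the forecast distribution, and averaging cos(y_{t+j} − μ_{t+j|t}) over paths estimates the dispersion measure D(ℓ).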
[Figure 2: Circular ACF (ACF-sin) of the Black Mountain wind direction series, WindBMd, lags 0 to 12.]
the structure is more indicative of ARMA(1, 1). However, given the damping effect on circular autocorrelations highlighted by (23), an AR(1) model may not be unreasonable. The ambiguity shows that circular correlograms need to be interpreted with care.
The histogram of the circular observations, which is shown in Figure 3,
suggests that a transformation may be neither necessary nor desirable. A
probit would produce a normal distribution if the original distribution were
uniform - which it clearly is not. As it is, the excess kurtosis for a probit
transformation is 3.02. The tan(y/2) transformation is even more extreme
in this respect with excess kurtosis of 13.51; it is perhaps not surprising that the estimate of φ in the CAR(1) model is only 0.35.
Table 3 shows the results of fitting various models. The first model ignores circularity by assuming a conditional Gaussian distribution. The other models take it to be von Mises. The benchmark given by the circular variance for first differences is D* = 0.258, implying s*² = 0.597. Standard errors are
[Figure 3: Histogram of the Black Mountain wind direction data, WindBMd, with a fitted normal density, N(s = 0.934).]
shown for the DCS(1) model - the first-order filter of (8) - and the SCAR(1)
model. For DCS(1) these were obtained from (11).
The score-driven models give the best fit. Furthermore the circular cor-
relogram of the best-fitting model, SCAR(1), shows very little evidence of
residual serial correlation; see Figure 4. Note, however, that the superior fit
of the SCAR model over the DCS is due to the first 18 observations which
lie mainly below the others. If these observations are dropped, DCS(1) is the
better model.
Figure 5 shows the filtered conditional mean for the DCS model and
compares it with the IAR(1) filter given by the tan(y/2) transformation.
The DCS filter is much less variable; the SCAR filter behaves in a similar
way. A plot for the probit IAR(1) lies between the IAR tan(y/2) and the DCS. If the data are not centred by subtracting the directional mean, the IAR filters behave differently whereas the score filters are basically unaffected.
[Figure 4: Circular ACF (ACF-sinResAR1) of the SCAR(1) residuals, lags 0 to 12.]
[Figure 5: Filtered conditional means from the DCS filter and the IAR(1) filter based on the tan(y/2) transformation.]
Table 3:
Estimates and goodness of fit measures for Black Mountain data. Standard errors are shown in parentheses.

 Model            μ       φ       κ       ν       D      s²     A
 Gaussian AR(1)   0.02    0.52    -       -       0.243  0.557  0.057
                 (0.10)  (0.14)
 IAR(1) Probit    0.03    0.68    -       2.46    0.239  0.547  0.071
                 (0.26)  (0.14)          (0.35)
 IAR(1) tan       0.08    0.67    -       2.44    0.242  0.555  0.060
                 (0.27)  (0.15)          (0.35)
 SCAR(1)          0.60    1.24    -       3.00    0.190  0.421  0.263
                 (0.13)  (0.13)          (0.43)
 DCS(1)           0.11    0.66    0.64    2.54    0.231  0.526  0.103
                 (0.20)  (0.16)  (0.15)  (0.36)
Fisher and Lee (1994) also estimated a CAR(1) model with parameter φ̃ = 0.52 after a probit transformation. The CAR models do not give one-step ahead forecasts with a vM distribution and so are difficult to compare directly with IAR models. In any case they are a much less attractive option.
8 Conclusions
This article shows how the score-driven approach provides a natural solution to the difficulties posed by circular data and leads to a coherent and unified methodology for estimation, model selection and testing. The data generating process is unaffected by any wrapping of the observations and the models estimated by maximum likelihood are unaffected by the way the data is cut.
Two classes of models are introduced, one based on a filtered component and
the other taking an autoregressive form. An asymptotic theory is developed
and Monte Carlo experiments examine small sample performance. Diagnostic checks for serial correlation follow straightforwardly. The new models are fitted to hourly data on wind direction and are shown to provide a better fit than existing methods.
The score-driven approach may be extended in a number of directions. Firstly, conditional distributions other than von Mises are easily accommodated. Secondly, heteroscedasticity can be modeled with dynamic equations driven by the score with respect to concentration. Thirdly, dynamic seasonal and diurnal effects (for hourly data) can be handled and, finally, the approach can be used to formulate models for circular-linear data. These issues will be addressed in later work.
Appendix A: Information matrix for the DCS model

The information matrix for the ML estimator of ψ = (κ, φ, μ)′ is, from Harvey (2013, p. 37),

D(ψ) = D((κ, φ, μ)′) = (1/(1 − b)) ×
    [ A  D  E
      D  B  F
      E  F  C ]   (26)

with

A = σ²_u = A(ν)/ν,   B = κ²σ²_u(1 + aφ)/((1 − φ²)(1 − aφ)),   C = (1 − φ)²(1 + a)/(1 − a),

D = aκσ²_u/(1 − aφ),   E = κc(1 − φ)/(1 − a)   and   F = aκc(1 − φ)/((1 − a)(1 − aφ)).

Now

a = φ − κA(ν)
b = φ² − 2φκA(ν) + κ²(1 − A(ν)/ν)

because E(∂u_t/∂μ)² = E cos²(y_t − μ_{t|t−1}) = 1 − A(ν)/ν, and we know from the information quantity for ν that E[(cos(y_t − μ_{t|t−1}) − A(ν))²] = 1 − A(ν)² − A(ν)/ν. Finally, c = E[sin(y_t − μ_{t|t−1}) cos(y_t − μ_{t|t−1})] = E sin{2(y_t − μ_{t|t−1})}/2 = 0. Thus E = F = 0.
There are no extra terms because the off-diagonals in the information matrix, (11), are zero and u_t = sin(y_t − μ_{t|t−1}) does not depend on ν; see http://www.econ.cam.ac.uk/DCS/docs/Lemma10.pdf for further details on the issues involved.
Note that the normal equations for φ are

∂ ln L/∂φ_j = ν Σ_{t=p+1}^{T} sin(y_t − μ_{t|t−1}) sin(y_{t−j} − μ) = 0,   j = 1, ..., p.   (27)

Furthermore,

∂² ln L/∂φ_j∂φ_k = −ν Σ_{t=p+1}^{T} cos(y_t − μ_{t|t−1}) sin(y_{t−j} − μ) sin(y_{t−k} − μ),   j, k = 1, ..., p,

so that

E_{t−1}(−∂² ln f/∂φ_j∂φ_k) = νA(ν) sin(y_{t−j} − μ) sin(y_{t−k} − μ),   j, k = 1, ..., p,   (28)

at the true parameter values. The unconditional expectation gives the circular autocovariances. Furthermore,

∂² ln L/∂φ_j∂μ = −ν Σ_{t=p+1}^{T} cos(y_t − μ_{t|t−1})[1 − Σ_k φ_k cos(y_{t−k} − μ)] sin(y_{t−j} − μ)
              − ν Σ_{t=p+1}^{T} sin(y_t − μ_{t|t−1}) cos(y_{t−j} − μ),   j = 1, ..., p,

so

E_{t−1}(−∂² ln L/∂φ_j∂μ) = νA(ν) Σ_{t=p+1}^{T} [1 − Σ_k φ_k cos(y_{t−k} − μ)] sin(y_{t−j} − μ).

The unconditional expectation is zero because sine is odd and cosine is even, and so cos(y_{t−k} − μ) sin(y_{t−j} − μ) is odd and its (unconditional) expectation is zero. As regards ν, taking conditional expectations shows that E(∂² ln L/∂φ_j∂ν) = 0, j = 1, ..., p, and E(∂² ln L/∂μ∂ν) = 0.

The result for μ follows because

∂ ln L/∂μ = ν Σ_{t=p+1}^{T} sin(y_t − μ_{t|t−1})[1 − Σ_k φ_k cos(y_{t−k} − μ)]

and

∂² ln L/∂μ² = −ν Σ_{t=p+1}^{T} cos(y_t − μ_{t|t−1})[1 − Σ_k φ_k cos(y_{t−k} − μ)]²
            − ν Σ_{t=p+1}^{T} sin(y_t − μ_{t|t−1}) Σ_k φ_k sin(y_{t−k} − μ).
References

Blasques, F., Gorgi, P., Koopman, S.J. and O. Wintenberger (2018). Feasible invertibility conditions and maximum likelihood estimation for observation-driven models. Electronic Journal of Statistics, 12, 1019-1052.

Fisher, N.I. and A.J. Lee (1992). Regression models for an angular response. Biometrics, 48, 665-677.

Fisher, N.I. and A.J. Lee (1994). Time series analysis of circular data. Journal of the Royal Statistical Society, B, 70, 327-332.

Harvey, A.C. (2013). Dynamic Models for Volatility and Heavy Tails: with Applications to Financial and Economic Time Series. Econometric Society Monograph, New York: Cambridge University Press.
Harvey, A.C. and A. Luati (2014). Filtering with heavy tails. Journal of the American Statistical Association, 109, 1112-1122.