
An Introduction to Animal Movement Modeling with Hidden Markov Models using Stan for Bayesian Inference

Vianey Leos-Barajas (1) & Théo Michelot (2)

(1) Iowa State University/Bielefeld University - [email protected]
(2) University of Sheffield - [email protected]

arXiv:1806.10639v1 [q-bio.QM] 27 Jun 2018

Introduction
Hidden Markov models (HMMs) are popular time series models in many fields including ecology, economics and genetics. HMMs can be defined over discrete or continuous time, though here we only cover the former. In the field of movement ecology in particular, HMMs have become a popular tool for the analysis of movement data because of their ability to connect observed movement data to an underlying latent process, generally interpreted as the animal's unobserved behavior. Further, the model captures the tendency to persist in a given behavior over time.
Those already familiar with Michael Betancourt's case study "Identifying Bayesian Mixture Models" will see a natural extension from the independent mixture models discussed therein to an HMM, which can also be referred to as a dependent mixture model. Notation presented here will generally follow the format of Zucchini et al. (2016) and cover HMMs applied in an unsupervised setting to animal movement data, specifically positional data. We provide Stan code to analyze movement data of the wild haggis, as first presented in Michelot et al. (2016). Implementing HMMs in Stan has also been covered by Luis Damiano here: https://github.com/luisdamiano/gsoc17-hhmm. For a thorough overview of HMMs, see Zucchini et al. (2016).

Hidden Markov Models


An HMM is a doubly stochastic time series with an observed process $(Y_t)$ that depends on an underlying state process $(S_t)$. The observations $\{Y_t\}_{t=1}^T$ are taken to be conditionally independent given the states $\{S_t\}_{t=1}^T$ and are generated by so-called state-dependent distributions, $\{f_n\}_{n=1}^N$. In this case we assume that $S_t$ can take on a finite number $N \geq 1$ of states, such that we can also refer to this as an $N$-state HMM. The evolution of states over time is governed by a first-order Markov chain, i.e. $\Pr(S_t \mid S_{t-1}, \ldots, S_1) = \Pr(S_t \mid S_{t-1})$, with transition probability matrix $\Gamma^{(t)} = (\gamma_{i,j}^{(t)})$, where $\gamma_{i,j}^{(t)} = \Pr(S_t = j \mid S_{t-1} = i)$ for $i, j = 1, \ldots, N$. Assuming a time-homogeneous process, we have $\Gamma^{(t)} = \Gamma$. A consequence of this formulation is that the amount of time $D_n$ spent in a given state $n$ (before switching to another state) is a random variable that follows a geometric distribution with parameter $1 - \gamma_{n,n}$, formally $D_n \sim \text{Geom}(1 - \gamma_{n,n})$ with $D_n \in \mathbb{N}$. Lastly, it is necessary to define the initial state distribution $\delta^{(1)}$ for the state process at time $t = 1$, with entries $\delta_n^{(1)} = \Pr(S_1 = n)$, for $n = 1, \ldots, N$.
All together, an HMM is completely defined by specification of three components:
• State-dependent distributions, $\{f_n\}_{n=1}^N$
• Transition probability matrix, $\Gamma^{(t)} = (\gamma_{i,j}^{(t)})$, for $i, j = 1, \ldots, N$
• Initial state distribution, $\delta^{(1)}$
  – Stationary distribution, satisfying $\delta = \delta\Gamma$
  – Estimated initial state distribution, e.g. $\delta^{(1)} \sim \text{Dirichlet}(\nu)$
For a time-homogeneous process we can use the stationary distribution as the initial state distribution; otherwise we can estimate the initial distribution directly.
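To make the stationary distribution concrete, $\delta$ can be obtained from $\Gamma$ by solving the linear system $\delta(I_N - \Gamma + U) = \mathbf{1}$, where $U$ is an $N \times N$ matrix of ones (Zucchini et al., 2016). Below is a minimal R sketch with an illustrative 2 x 2 transition probability matrix; the same trick reappears in the simulation and Stan code later in this document.

# Stationary distribution of a Markov chain with t.p.m. Gamma, obtained by
# solving delta %*% (I - Gamma + U) = 1' (illustrative values only)
Gamma <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE)
N <- nrow(Gamma)
delta <- solve(t(diag(N) - Gamma + 1), rep(1, N))
delta                                              # (2/3, 1/3)
all.equal(as.numeric(delta %*% Gamma), delta)      # check: delta = delta %*% Gamma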

Likelihood

There are two functions referred to as the "likelihood" in the HMM literature: the complete-data likelihood, i.e. the joint distribution of the observations and states, and the marginal likelihood, i.e. the distribution of the observations only. The complete-data likelihood is written as follows,

$$f(\mathbf{y}, \mathbf{s}) = L_c = \delta_{s_1}^{(1)} \prod_{t=2}^{T} \gamma_{s_{t-1}, s_t} \prod_{t=1}^{T} f_{s_t}(y_t) \qquad (1)$$

The simplicity of the complete-data likelihood formulation may be one reason why many conduct inference for parameters and states jointly, typically through a Gibbs sampler that alternates between estimation of states and parameters. In contrast, evaluation of the marginal likelihood requires summation over all possible state sequences,

$$L_m = \sum_{s_1=1}^{N} \cdots \sum_{s_T=1}^{N} \delta_{s_1}^{(1)} \prod_{t=2}^{T} \gamma_{s_{t-1}, s_t} \prod_{t=1}^{T} f_{s_t}(y_t) \qquad (2)$$

However, evaluation of the marginal likelihood is necessary for implementation in Stan, as the states are discrete random variables and must therefore be marginalized out. Zucchini et al. (2016) show that the marginal likelihood can be written explicitly as a matrix product,

$$L_m = \delta^{(1)} P(y_1)\, \Gamma P(y_2) \cdots \Gamma P(y_T)\, \mathbf{1}^\top \qquad (3)$$

for an $N \times N$ matrix $P(y_t) = \text{diag}(f_1(y_t), \ldots, f_N(y_t))$ and a vector of 1s of length $N$, $\mathbf{1} = (1, \ldots, 1)$. For observations missing at random, we simply have $P(y_t) = I_{N \times N}$. The marginal likelihood can be calculated efficiently with the forward algorithm, which computes the likelihood recursively. We define the forward variables $\alpha_t$, beginning at time $t = 1$, as follows:

$$\alpha_1 = \delta^{(1)} P(y_1), \qquad \alpha_t = \alpha_{t-1} \Gamma P(y_t), \quad t = 2, \ldots, T. \qquad (4)$$

Then, the marginal likelihood is obtained by summing over the entries of $\alpha_T$,

$$L_m = f(y_1, \ldots, y_T) = \sum_{i=1}^{N} \alpha_T(i) = \alpha_T \mathbf{1}^\top. \qquad (5)$$

Notably, the computational effort involved in evaluating $L_m$ is only linear in $T$, the number of observations, for a given number of states $N$. Direct evaluation of the likelihood can result in numerical underflow. However, we can also use the forward algorithm to evaluate the log marginal likelihood, $\log(L_m)$, and avoid underflow when calculating each forward variable, as demonstrated in the Stan implementation given below.
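To make the recursion concrete, here is a minimal R sketch of the log-scale forward algorithm for an N-state HMM with Gaussian state-dependent distributions (the standard deviation is fixed at 2 purely for illustration, matching the simulation example below); the Stan implementations later in this document follow the same pattern.

# Log-scale forward algorithm for an N-state Gaussian HMM; returns log(Lm).
# delta: initial distribution, Gamma: t.p.m., mu: state-dependent means.
log_forward <- function(y, delta, Gamma, mu, sd = 2) {
  N <- length(mu)
  # log forward variables at t = 1: log(delta_n) + log f_n(y_1)
  lalpha <- log(delta) + dnorm(y[1], mu, sd, log = TRUE)
  for (t in 2:length(y)) {
    # log(alpha_t,j) = log-sum-exp_i {log(alpha_t-1,i) + log(gamma_ij)} + log f_j(y_t)
    lalpha <- sapply(1:N, function(j) {
      m <- max(lalpha + log(Gamma[, j]))
      m + log(sum(exp(lalpha + log(Gamma[, j]) - m)))
    }) + dnorm(y[t], mu, sd, log = TRUE)
  }
  m <- max(lalpha)
  m + log(sum(exp(lalpha - m)))   # log(Lm) = log sum_n alpha_T(n)
}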
For the HMM details we provide here, we assume the following:
• The state-dependent distributions are distinct, $f_1 \neq \cdots \neq f_N$;
• The t.p.m. $\Gamma$ has full rank and is ergodic.
These two conditions are sufficient for an HMM to be identifiable. The first point is important when applying an HMM to animal movement data because the states are assumed to reflect different behaviors. The second point means that the animal is able to transition between behaviors over time.

Priors

An HMM has two main sets of parameters that require specification of prior distributions: the parameters corresponding to i) the state-dependent distributions and ii) the transition probabilities, with a possible third set if the initial distribution is estimated as well.
However, because an HMM lies within the class of mixture models, the lack of identifiability due to label switching (i.e. a reordering of the state indices can lead to the same joint distribution) should be taken into account.

State-dependent distributions

As in the independent mixture models discussed in Betancourt (2017), identification of, and inference on, the state-dependent distributions of an HMM can be problematic. Issues related to label switching can make it difficult for the MCMC chains to explore the parameter space efficiently. In practice, HMMs are also notorious for their multimodality. As such, some additional restrictions and information, such as an ordering of a subset of the parameters of interest and/or informative priors, can aid inference. For example, we can impose an ordering on the means, $\mu_1 < \mu_2 < \cdots < \mu_N$, of the state-dependent distributions (if possible), which is easily done in Stan:
parameters {
  positive_ordered[N] mu;
}
Other parametrizations can also be used to order the means. For example, given $\mu_1 \in \mathbb{R}$ and a vector $\eta \in \mathbb{R}_+^{N-1}$ of length $N - 1$, set $\mu_n = \mu_{n-1} + \eta_{n-1}$ for $n \in \{2, \ldots, N\}$:

parameters {
  real mu;
  vector<lower=0>[N-1] etas;
}

transformed parameters {
  vector[N] ord_mus;

  ord_mus[1] = mu;
  for(n in 2:N)
    ord_mus[n] = ord_mus[n-1] + etas[n-1];
}
As the state-dependent distributions reflect characteristics of the observed data, priors for the parameters of
interest should not place the bulk of the probability on values that are unrealistic. Also, note that because of
potential label-switching, some type of ordering will likely be needed so that the priors correspond to the
appropriate distributions (if not exchangeable). See Betancourt (2017) for similar issues in mixture models.

Transition Probability Matrix

It is typically easier to form some intuition about the parameters of the state-dependent distributions than about the entries of the t.p.m. However, in animal movement data there is generally persistence in the estimated states that we would like to capture (hence the reason for using HMMs). In the model, this behavior corresponds to large diagonal entries, $\gamma_{n,n}$ for $n \in \{1, \ldots, N\}$, typically greater than 0.8 in our own experience, though this could of course vary depending on the temporal resolution of the data and the question of interest.
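If one wishes to encode this belief in a prior, one option (not used in the Stan models below, which leave the rows of Γ with implicit flat priors) is a Dirichlet prior on each row of Γ with more concentration on the diagonal. A rough R sketch of the prior implied by hypothetical concentration parameters (8, 1), simulating Dirichlet draws via normalised gamma variates:

# Prior draws for one row of a 2-state t.p.m. under a Dirichlet(8, 1) prior
# (hypothetical concentration parameters): the first, "diagonal" entry is
# pushed towards 1, encoding persistence in that state.
set.seed(1)
conc <- c(8, 1)                                   # concentration parameters
g <- matrix(rgamma(2000 * 2, shape = conc), ncol = 2, byrow = TRUE)
rows <- g / rowSums(g)                            # 2000 Dirichlet(8, 1) draws
mean(rows[, 1] > 0.8)                             # prior mass on gamma_nn > 0.8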

Capturing Important Features of the Data and Model Evaluation
There are two features of the movement data that we aim to capture: a) the marginal distribution of $Y_t$, and b) the temporal dependence of the observed data (e.g. autocorrelation).

Marginal Distribution of Yt and Temporal Dependence

The marginal distribution of $y_t$ under an HMM is the distribution of an observation at time $t$, unconditional on the states, $f(y_t \mid \theta)$, with $\theta$ denoting the state-dependent parameters requiring estimation. For a time-homogeneous process with stationary distribution $\delta$, the marginal distribution is a mixture of the state-dependent densities weighted by the entries of $\delta$,

$$f(y_t \mid \theta) = \delta_1 \cdot f_1(y_t) + \cdots + \delta_N \cdot f_N(y_t).$$

In the analysis of animal movement data, the stationary distribution gives the ecologist an estimate of the overall proportion of time that the animal spends in each state (and related behavior). However, it is important not to report only this result of the HMM, because there are infinitely many HMM formulations that lead to the same marginal distribution for $y_t$. For example,

                        HMM1                               HMM2
State-dependent dist.   f1 ∼ N(0, 4); f2 ∼ N(5, 5)         f1 ∼ N(0, 4); f2 ∼ N(5, 5)
TPM                     Γ = (0.7, 0.3; 0.3, 0.7)           Γ = (0.95, 0.05; 0.05, 0.95)
Stationary dist.        δ = (0.5, 0.5)                     δ = (0.5, 0.5)
Marginal dist.          0.5 · f1 + 0.5 · f2                0.5 · f1 + 0.5 · f2

This result highlights a key difference between independent mixture models and HMMs. An HMM is identifiable, even given the above result, because there is dependence over time that we take into account via the transition probability matrix, $\Gamma$. The marginal distribution does not completely convey the manner in which the data were generated. In particular, taking the temporal dependence into account, as an HMM does, allows for identification of state-dependent distributions that may overlap substantially, and of other flexible forms; see Alexandrovich et al. (2016) and Langrock et al. (2015).
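As an illustration, the sketch below simulates from two 2-state Gaussian HMMs in the spirit of the table above (means 0 and 5 with a common standard deviation of 2 are assumed here purely for illustration), one with diagonal transition probabilities of 0.7 and one with 0.95: the marginal distributions are essentially indistinguishable, while the sample autocorrelation functions differ markedly.

# Two HMMs with the same marginal distribution but different persistence:
# only the dependence structure (e.g. the ACF) tells them apart.
sim_hmm <- function(n, Gamma, mu, sd = 2) {
  N <- nrow(Gamma)
  delta <- solve(t(diag(N) - Gamma + 1), rep(1, N))   # stationary distribution
  s <- y <- numeric(n)
  s[1] <- sample(1:N, 1, prob = delta)
  y[1] <- rnorm(1, mu[s[1]], sd)
  for (t in 2:n) {
    s[t] <- sample(1:N, 1, prob = Gamma[s[t-1], ])
    y[t] <- rnorm(1, mu[s[t]], sd)
  }
  y
}
set.seed(1)
y1 <- sim_hmm(5000, matrix(c(0.70, 0.30, 0.30, 0.70), 2, 2), mu = c(0, 5))
y2 <- sim_hmm(5000, matrix(c(0.95, 0.05, 0.05, 0.95), 2, 2), mu = c(0, 5))
c(acf(y1, plot = FALSE)$acf[2], acf(y2, plot = FALSE)$acf[2])   # lag-1 ACF differs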
Aside from capturing the marginal distribution of $y_t$, we also aim to capture the temporal dependence present in the data. In particular, the autocorrelation structure of data produced by the fitted HMM should be comparable to that of the data itself. As a result, this can be a key characteristic with which to perform posterior predictive checking (Morales et al., 2004).

Assessing Model Adequacy Using Forecast (Pseudo-)Residuals and Posterior Predictive Checks

Forecast (Pseudo-)Residuals

One way to assess the fitted HMM is through evaluation of the (pseudo-)residuals. The pseudo-residuals are computed in two steps. First, for continuous observations, the uniform pseudo-residuals $u_t$ are defined as

$$u_t = \Pr(Y_t \leq y_t \mid \mathbf{Y}^{(t-1)} = \mathbf{y}^{(t-1)}), \quad t \in \{1, \ldots, T\},$$

where $\mathbf{y}^{(t-1)} = (y_1, \ldots, y_{t-1})$ denotes the observations up to time $t - 1$. Then, the normal pseudo-residuals are obtained as

$$r_t = \Phi^{-1}(u_t), \quad t \in \{1, \ldots, T\},$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution. If the fitted HMM is the true data-generating process, the $r_t$ follow a standard normal distribution. In practice, a Q-Q plot can be used to compare the distribution of the pseudo-residuals to the standard normal and assess the fit. Further, the (pseudo-)residuals of a fitted HMM should show no autocorrelation if the temporal dependence is adequately captured.
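As a small worked example, suppose a 2-state Gaussian HMM with state-dependent standard deviation 2 and that, at time t, the one-step-ahead predictive state probabilities are (0.7, 0.3) (hypothetical values). Then $u_t$ is a weighted mixture of normal cdfs and $r_t$ follows by applying the standard normal quantile function; a full R implementation is given with the fitted models below.

# One-observation sketch of a forecast pseudo-residual (hypothetical
# predictive state probabilities and state-dependent parameters)
w   <- c(0.7, 0.3)              # Pr(S_t = n | y^(t-1)), n = 1, 2
mu  <- c(1, 5); sd <- 2         # state-dependent means and sd
y_t <- 2.4                      # observed value at time t
u_t <- sum(w * pnorm(y_t, mean = mu, sd = sd))   # uniform pseudo-residual
r_t <- qnorm(u_t)                                # normal pseudo-residual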

Posterior Predictive Checks

Posterior predictive checks allow one to assess the adequacy of the fitted model by generating $M$ replicate data sets from the posterior predictive distribution $f(\mathbf{y}^* \mid \mathbf{y}) = \int f(\mathbf{y}^* \mid \theta)\, f(\theta \mid \mathbf{y})\, d\theta$. In particular, we use these checks here to assess whether the fitted model can plausibly be interpreted as the data-generating mechanism. The main idea is that the model should be able to produce data that are similar to the observed data with respect to key features defined a priori. Given $M$ posterior draws $\theta^{*1}, \ldots, \theta^{*M}$, we generate $M$ data sets from the distribution $f(\mathbf{y}^* \mid \theta^*)$. We then compare key features of the replicate data sets to the corresponding observed features. See Betancourt (2018) for more details. We demonstrate a few graphical posterior predictive checks in the HMM examples.

State Estimation
In animal movement modeling (the focus presented here), estimation of the underlying state sequence is not
the primary focus of the analysis but rather a convenient byproduct of the HMM framework. It is most
important that the estimated state-dependent distributions can be connected to biologically meaningful
processes, though state estimation can help one visualize the results of the fitted models.
There are two approaches to state estimation:
• Local State Decoding: $\Pr(S_t \mid y_1, \ldots, y_T, \theta)$, or
• Global State Decoding: $\Pr(S_1, \ldots, S_T \mid y_1, \ldots, y_T, \theta)$
The first considers the distribution of the state at time t, St , given the observations and estimated parameters
θ. These distributions can be obtained through implementation of the forward-backward algorithm.
The aim of the second approach is to obtain the most likely state sequence given all of the observations. For this, we use the Viterbi algorithm, which returns the most likely state sequence given the observations and estimated parameters. Both approaches are already covered by Luis Damiano: https://github.com/luisdamiano/gsoc17-hhmm. In general, both will return similar (if not equal) results when it comes to state decoding (assigning an observation to one of N states).
Going beyond assignment of observations to states and obtaining the state probabilities at each point in time
can also be highly informative. In particular, two models may result in similar state decodings yet correspond
to different estimates of the parameters of interest. While this is not a problem per se, it can be difficult to
connect the estimated states to key biological processes when the observations have large probabilities of
being associated with more than one state.

Example: Fitting a 2-state HMM

# Initialisation
library(rstan)
library(bayesplot)
library(ggplot2)
library(coda)
library(circular)

library(moveHMM)

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
pal <- c("firebrick","seagreen","navy") # colour palette
set.seed(1)

Before getting into the details of how HMMs are applied to animal movement data, we present how to fit a basic HMM in Stan (Stan Development Team, 2018). We consider a 2-state HMM with Gaussian state-dependent distributions for the observation process $Y_t$. That is, at each time step $t = 1, 2, \ldots$, we have

$$Y_t \mid S_t = j \sim N(\mu_j, \sigma^2),$$

for $j \in \{1, 2\}$.


The following code simulates from the model. Note that here we take the initial state distribution to be the
stationary distribution.
# Number of states
N <- 2
# transition probabilities
Gamma <- matrix(c(0.9,0.1,0.1,0.9),2,2)
# initial distribution set to the stationary distribution
delta <- solve(t(diag(N)-Gamma +1), rep(1, N))
# state-dependent Gaussian means
mu <- c(1,5)

nobs <- 1000


S <- rep(NA,nobs)
y <- rep(NA,nobs)

# initialise state and observation


S[1] <- sample(1:N, size=1, prob=delta)
y[1] <- rnorm(1, mu[S[1]], 2)

# simulate state and observation processes forward


for(t in 2:nobs) {
S[t] <- sample(1:N, size=1, prob=Gamma[S[t-1],])
y[t] <- rnorm(1, mu[S[t]], 2)
}

plot(y, col=pal[S], type="h")

The likelihood of the model can be written with the forward algorithm, given in Equation 4, with

$$P(y_t) = \begin{pmatrix} \phi(y_t \mid \mu_1, \sigma^2) & 0 \\ 0 & \phi(y_t \mid \mu_2, \sigma^2) \end{pmatrix},$$

where $\phi$ is the Gaussian pdf.


The following code provides the complete implementation of the N -state HMM with Gaussian state-dependent
distributions in Stan, based on the forward algorithm.
First, we define the known quantities in the data block:

Figure 1: Simulated observations from a 2-state HMM with Gaussian state-dependent distributions.

data {
int<lower=0> N; // number of states
int<lower=1> T; // length of data set
real y[T]; // observations
}
There are two sets of parameters that require estimation, $\Gamma$ and $\mu$, which we define in the parameters block:
parameters {
simplex[N] theta[N]; // N x N tpm
ordered[N] mu; // state-dependent parameters
}
We assume stationarity of the underlying Markov chain and initialize the process with the stationary
distribution, δ. As δ is a function of Γ, we compute it in the transformed parameters block:
transformed parameters {

  matrix[N, N] ta; // t.p.m. copied into a matrix
  simplex[N] statdist; // stationary distribution

  for(j in 1:N){
    for(i in 1:N){
      ta[i,j] = theta[i,j];
    }
  }

  // solve statdist * (I - ta + U) = 1', with U a matrix of ones
  statdist = to_vector((to_row_vector(rep_vector(1.0, N)) /
    (diag_matrix(rep_vector(1.0, N)) - ta + rep_matrix(1, N, N)))) ;
}
Given the information in the data block, and having defined all parameters of interest, we now define the rest
of the model in the model block:

model {

vector[N] log_theta_tr[N];
vector[N] lp;
vector[N] lp_p1;

// prior for mu
mu ~ student_t(3, 0, 1);

// transpose the tpm and take natural log of entries


for (n_from in 1:N)
for (n in 1:N)
log_theta_tr[n, n_from] = log(theta[n_from, n]);

// forward algorithm implementation

for(n in 1:N) // first observation


lp[n] = log(statdist[n]) + normal_lpdf(y[1] | mu[n], 2);

for (t in 2:T) { // looping over observations


for (n in 1:N) // looping over states
lp_p1[n] = log_sum_exp(log_theta_tr[n] + lp) +
normal_lpdf(y[t] | mu[n], 2);

lp = lp_p1;
}

target += log_sum_exp(lp);
}
We run 2000 iterations for each of the 4 chains, with the first 1000 iterations used as warm-up, and verify that the posterior draws capture the true parameters.
stan.data <- list(y=y, T=nobs, N=2)
fit <- stan(file="HMM1.stan", data=stan.data, refresh=2000)

mus <- extract(fit, pars=c("mu"))

hist(mus[[1]][,1],main="",xlab=expression(mu[1]))
abline(v=1, col=pal[1], lwd=2)
hist(mus[[1]][,2],main="",xlab=expression(mu[2]))
abline(v=5, col=pal[2], lwd=2)

From the fitted model, we extract the parameters of interest and generate 4000 data sets in order to perform
a few graphical posterior predictive checks.
## extract posterior draws
psam <- extract(fit, pars = c("theta", "mu"))

## generate new data sets

n.sims <- dim(psam[[1]])[1]


n <- length(y)

# state sequences

Figure 2: Histograms of the posterior draws for the state-dependent means. The vertical lines show the true values used in the simulation.

ppstates <- matrix(NA, nrow = n.sims, ncol = n)


# observations
ppobs <- matrix(NA, nrow = n.sims, ncol = n)

for (j in 1:n.sims) {
theta <- psam[[1]][j, , ]
statdist <- solve(t(diag(N) - theta + 1), rep(1, N))

ppstates[j, 1] <- sample(1:N, size = 1, prob = statdist)


ppobs[j, 1] <- rnorm(1, mean = psam[[2]][j, ppstates[j, 1]], sd = 2)

for (i in 2:length(y)) {
ppstates[j, i] <- sample(1:N, size = 1, prob = theta[ppstates[j, i -
1], ])
ppobs[j, i] <- rnorm(1, mean = psam[[2]][j, ppstates[j, i]], sd = 2)
}
}

First, we check that the densities of the replicated data sets are similar to the observed data set. For this we
use the R package bayesplot.
ppc_dens_overlay(y, ppobs[1:100,])


We also plot the autocorrelation function of the observed data and compare to 90% credible intervals for the
ACF of the replicated data sets.
nlags <- 61
oac = acf(y[2:(n - 1)], lag.max = (nlags - 1), plot = FALSE)$acf # observed acf

ppac = matrix(NA, n.sims, nlags)


for (i in 1:n.sims) {
ppac[i, ] = acf(ppobs[i, ], lag.max = (nlags - 1), plot = FALSE)$acf
}

hpd.acf <- HPDinterval(as.mcmc(ppac), prob = 0.95)


dat <- data.frame(x = 1:61, acf = as.numeric(oac), lb = hpd.acf[, 1], ub = hpd.acf[,
2])

ggplot(dat, aes(x, acf)) + geom_ribbon(aes(x = x, ymin = lb, ymax = ub), fill = "grey70",
alpha = 0.5) + geom_point(col = "purple", size = 1) + geom_line() + coord_cartesian(xlim = c(2,
60), ylim = c(-0.1, 0.5)) + xlab("Lag") + ylab("ACF") + ggtitle("Observed Autocorrelation
Function with 90% CI for ACF of Predicted Quantities")


Finally, we use the posterior expected values of the variables of interest to construct the forecast (pseudo-
)residuals.
##### R code for Forecast (Pseudo-)Residuals

## Calculating forward variables


HMM.lalpha <- function(allprobs, gamma, delta, n, N, mu) {
lalpha <- matrix(NA, N, n)

lscale <- 0
foo <- delta * allprobs[1, ]
lscale <- 0
lalpha[, 1] <- log(foo) + lscale
sumfoo <- sum(foo)
for (i in 2:n) {
foo <- foo %*% gamma * allprobs[i, ]
sumfoo <- sum(foo)
lscale <- lscale + log(sumfoo)
foo <- foo/sumfoo # scaling
lalpha[, i] <- log(foo) + lscale
}
lalpha
}

## Calculating forecast (pseudo-)residuals


HMM.psres <- function(x, allprobs, gamma, n, N, mu) {

delta <- solve(t(diag(N) - gamma + 1), rep(1, N))

la <- HMM.lalpha(allprobs, gamma, delta, n, N, mu)

pstepmat <- matrix(NA, n, N)


fres <- rep(NA, n)

ind.step <- which(!is.na(x))

for (j in 1:length(ind.step)) {
pstepmat[ind.step[j], 1] <- pnorm(x[ind.step[j]], mean = mu[1], sd = 2)
pstepmat[ind.step[j], 2] <- pnorm(x[ind.step[j]], mean = mu[2], sd = 2)
}

if (!is.na(x[1]))
fres[1] <- qnorm(rbind(c(1, 0)) %*% pstepmat[1, ])
for (i in 2:n) {

c <- max(la[, i - 1])


a <- exp(la[, i - 1] - c)
if (!is.na(x[i]))
fres[i] <- qnorm(t(a) %*% (gamma/sum(a)) %*% pstepmat[i, ])
}
return(list(fres = fres))
}

means <- colMeans(mus[[1]])

allprobs <- matrix(1, nrow = n, ncol = N)


for (j in 1:N) allprobs[which(!is.na(y)), j] <- dnorm(y, mean = means[j], sd = 2)

gamma <- matrix(c(mean(psam[[1]][, 1, 1]), 1 - mean(psam[[1]][, 1, 1]), 1 -


mean(psam[[1]][, 2, 2]), mean(psam[[1]][, 2, 2])), nrow = 2, byrow = T)

fres <- HMM.psres(x = y, allprobs = allprobs, gamma = gamma, n = n, N = N, mu = means)

Plotting the residuals in a Q-Q plot:


ggplot(data=data.frame(x=fres$fres), aes(sample = x)) + stat_qq() +
stat_qq_line(color="purple", size=1) +
ggtitle("Q-Q Plot") + theme_classic()


Note that it is also possible to construct the distribution of forecast residuals at each time t.

Covariates
In HMMs applied to animal movement, covariates are typically incorporated at the level of the hidden
states. For the general case of time-varying covariates, we define the corresponding time-dependent transition probability matrix $\Gamma^{(t)} = (\gamma_{ij}^{(t)})$, where $\gamma_{ij}^{(t)} = \Pr(S_{t+1} = j \mid S_t = i)$. The transition probabilities at time $t$, $\gamma_{ij}^{(t)}$, can then be related to a vector of environmental (or other) covariates $(\omega_1^{(t)}, \ldots, \omega_p^{(t)})$ via the multinomial logit link:

$$\gamma_{ij}^{(t)} = \frac{\exp(\eta_{ij})}{\sum_{k=1}^{N} \exp(\eta_{ik})}, \quad \text{where } \eta_{ij} = \begin{cases} \beta_0^{(ij)} + \sum_{l=1}^{p} \beta_l^{(ij)} \omega_l^{(t)} & \text{if } i \neq j; \\ 0 & \text{otherwise.} \end{cases}$$

Essentially there is one multinomial logit link specification for each row of the matrix Γ(t) , and the entries on
the diagonal of the matrix serve as reference categories.
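A minimal R sketch of this construction for N = 2 states and a single covariate, with hypothetical coefficient values; this mirrors what the Stan code below (and moveHMM's internal trMatrix_rcpp function, used later for plotting) computes.

# Build a 2-state t.p.m. at time t from regression coefficients via the
# multinomial logit link. Rows of beta correspond to the off-diagonal
# entries (1->2, then 2->1); columns to (intercept, covariate).
beta <- matrix(c(-2.0,  0.5,    # eta_12 = -2.0 + 0.5 * omega_t
                 -1.0, -0.3),   # eta_21 = -1.0 - 0.3 * omega_t
               nrow = 2, byrow = TRUE)
omega_t <- 1.2                            # covariate value at time t
eta <- beta %*% c(1, omega_t)
Gamma_t <- diag(2)                        # diagonal entries: exp(0) = 1
Gamma_t[1, 2] <- exp(eta[1])
Gamma_t[2, 1] <- exp(eta[2])
Gamma_t <- Gamma_t / rowSums(Gamma_t)     # normalise each row
Gamma_t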

Modeling Animal Movement with HMMs

Motivation

We consider the application of HMMs to the analysis of animal movement tracks. Movement data typically
consist of a bivariate time series of longitude-latitude positions, collected at regular time intervals over the
study period (e.g. hourly locations). HMMs are widely used in movement ecology to describe such data as
arising from several distinct movement patterns, modelled by the underlying Markov chain St . In particular,
these movement patterns serve as proxies for general behaviors of interest. At each time step, we consider that the animal is in one of N (behavioural) states (e.g. "exploratory", "foraging", ...), on which some metrics of movement depend. Note that there is generally no one-to-one mapping from states to behaviors of interest, but more on this later.

In this context, the most common HMM formulation is based on the step lengths and turning angles, which can be derived from the location data. The step length $L_t$ is the distance between the two successive locations $X_t$ and $X_{t+1}$, and the turning angle $\varphi_t$ is the angle between the two successive directions $(X_{t-1}, X_t)$ and $(X_t, X_{t+1})$.
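For locations recorded in planar coordinates (e.g. UTM), these quantities can be computed directly from consecutive positions; a rough R sketch is given below. In the analysis that follows we instead use moveHMM::prepData, which also handles multiple tracks and missing values.

# Step lengths and turning angles from planar coordinates (x, y):
# L_t is the distance between locations t and t+1, and phi_t is the change
# in direction between consecutive steps, wrapped to [-pi, pi).
steps_and_angles <- function(x, y) {
  dx <- diff(x); dy <- diff(y)
  step <- sqrt(dx^2 + dy^2)                  # L_1, ..., L_{T-1}
  bearing <- atan2(dy, dx)                   # direction of each step
  angle <- diff(bearing)
  angle <- ((angle + pi) %% (2 * pi)) - pi   # wrap to [-pi, pi)
  list(step = step, angle = c(NA, angle))    # first turning angle undefined
}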

Wild Haggis

We present a simulation study based on the (simulated) wild haggis tracking data from Michelot et al. (2016).
The data set comprises 15 tracks, with slope and temperature covariates.

Figure 3: Andrea Langrock’s impression of the elusive wild haggis

We use the function prepData in the package moveHMM to derive step lengths and turning angles from the
location data.
rawhaggis <- read.csv("data/haggis.csv")
# derive step lengths and turning angles from locations
data <- prepData(rawhaggis, type="UTM")

hist(data$step, main="", xlab="Step length")


hist(data$angle, breaks=seq(-pi,pi,length=15), main="", xlab="Turning angle")

Figure 4: Histograms of the step lengths (left) and turning angles (right) in the wild haggis data.

Following Michelot et al. (2016), we consider a 2-state HMM with gamma and von Mises state-dependent distributions. That is, for $j \in \{1, 2\}$,

$$L_t \mid S_t = j \sim \text{gamma}(\alpha_j, \beta_j), \qquad \varphi_t \mid S_t = j \sim \text{von Mises}(\mu_j, \kappa_j),$$

where αj is the shape and βj the rate of the gamma distribution, and µj is the mean and κj the concentration
of the von Mises distribution. The larger the concentration, the smaller the variance of the turning angles
around their mean.
We find it more convenient to parametrise the gamma distribution in terms of its mean and standard deviation, rather than its shape and rate parameters (the default in R and Stan). We use the following transformation to obtain one set of parameters from the other:

$$\text{shape} = \frac{\text{mean}^2}{\text{SD}^2}, \qquad \text{rate} = \frac{\text{mean}}{\text{SD}^2}.$$
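In R this amounts to a one-line helper (not part of any package, shown here for illustration):

# Convert a gamma mean/SD pair to the shape/rate parametrisation used by
# dgamma() in R and gamma_lpdf() in Stan
gamma_mean_sd_to_shape_rate <- function(mean, sd) {
  list(shape = mean^2 / sd^2, rate = mean / sd^2)
}
gamma_mean_sd_to_shape_rate(mean = 2, sd = 1)   # shape = 4, rate = 2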
The mean parameter of the von Mises distribution is constrained between $-\pi$ and $\pi$. This can cause estimation issues if the sampler gets stuck around either bound. To address this problem, we consider the alternative parametrisation: for each state $j$,

$$x_j^\varphi = \kappa_j \cos(\mu_j), \qquad y_j^\varphi = \kappa_j \sin(\mu_j). \qquad (6)$$

The point $(x_j^\varphi, y_j^\varphi)$ is unconstrained in $\mathbb{R}^2$.

The following code implements an $N$-state HMM with gamma and von Mises state-dependent distributions, with the possibility of including covariates in the state process. We describe each block separately.
data {
int<lower=0> T; // length of the time series
int ID[T]; // track identifier
vector[T] steps; // step lengths
vector[T] angles; // turning angles
int<lower=1> N; // number of states
int nCovs; // number of covariates
matrix[T,nCovs+1] covs; // covariates
}
In the ‘data’ block, we include the vector of step lengths, the vector of turning angles, and the (design) matrix of covariate values. The design matrix has one column of 1s, corresponding to the intercept, and one column for each covariate. We also need to specify the length of the time series (i.e. the number of locations), the number of states (two in the analysis), and the number of covariates (three in the analysis: temperature, slope, and slope²).
parameters {
positive_ordered[N] mu; // mean of gamma - ordered
vector<lower=0>[N] sigma; // SD of gamma
// unconstrained angle parameters
vector[N] xangle;
vector[N] yangle;
// regression coefficients for transition probabilities
matrix[N*(N-1),nCovs+1] beta;
}
We define the state-dependent movement parameters: the mean and standard deviation of the gamma distribution (step lengths), and the transformed unconstrained parameters of the turning angle distribution defined in Equation 6. The vector of mean step lengths is defined to be ordered, to avoid label switching. We also introduce the matrix of regression coefficients for the transition probabilities, with one row for each non-diagonal entry of the transition probability matrix, and one column for each covariate (plus one for the intercept).
transformed parameters {
vector<lower=0>[N] shape;
vector<lower=0>[N] rate;
vector<lower=-pi(),upper=pi()>[N] loc;
vector<lower=0>[N] kappa;

// derive turning angle mean and concentration


for(n in 1:N) {
loc[n] = atan2(yangle[n], xangle[n]);
kappa[n] = sqrt(xangle[n]*xangle[n] + yangle[n]*yangle[n]);
}

// transform mean and SD to shape and rate


for(n in 1:N)
shape[n] = mu[n]*mu[n]/(sigma[n]*sigma[n]);

for(n in 1:N)
rate[n] = mu[n]/(sigma[n]*sigma[n]);
}
In the ‘transformed parameters’, we calculate the parameters expected by the state-dependent pdfs, i.e. the
shape and rate of the gamma distribution, and the location (mean) and concentration of the von Mises
distribution.
model {
vector[N] logp;
vector[N] logptemp;
matrix[N,N] gamma[T];
matrix[N,N] log_gamma[T];
matrix[N,N] log_gamma_tr[T];

// priors
mu ~ normal(0, 5);
sigma ~ student_t(3, 0, 1);
xangle[1] ~ normal(-0.5, 1); // equiv to concentration when yangle = 0
xangle[2] ~ normal(2, 2);
yangle ~ normal(0, 0.5); // zero if mean angle is 0 or pi

// derive array of (log-)transition probabilities


for(t in 1:T) {
int betarow = 1;
for(i in 1:N) {
for(j in 1:N) {
if(i==j) {
gamma[t,i,j] = 1;
} else {
gamma[t,i,j] = exp(beta[betarow] * to_vector(covs[t]));
betarow = betarow + 1;
}

}
}

// each row must sum to 1


for(i in 1:N)
log_gamma[t][i] = log(gamma[t][i]/sum(gamma[t][i]));
}

// transpose
for(t in 1:T)
for(i in 1:N)
for(j in 1:N)
log_gamma_tr[t,j,i] = log_gamma[t,i,j];

// likelihood computation
for (t in 1:T) {
// initialise forward variable if first obs of track
if(t==1 || ID[t]!=ID[t-1])
logp = rep_vector(-log(N), N);

for (n in 1:N) {
logptemp[n] = log_sum_exp(to_vector(log_gamma_tr[t,n]) + logp);
if(steps[t]>=0)
logptemp[n] = logptemp[n] + gamma_lpdf(steps[t] | shape[n], rate[n]);
if(angles[t]>=(-pi()))
logptemp[n] = logptemp[n] + von_mises_lpdf(angles[t] | loc[n], kappa[n]);
}
logp = logptemp;

// add log forward variable to target at the end of each track


if(t==T || ID[t+1]!=ID[t])
target += log_sum_exp(logp);
}
}
We derive the transition probability matrix Γ(t) , at each time point, from the regression coefficients and the
covariates values provided. We store the log transition probabilities, which we use in the forward algorithm,
in the array log_gamma_tr. Note that each matrix (each layer of the array) is transposed, so that each row
corresponds to the probabilities of transitioning into a state, rather than out of a state.
We choose priors on the movement parameters based on previous biological knowledge of the movements of
the wild haggis.
The loop over the observations corresponds to the forward algorithm, on the log scale, to obtain the log-likelihood and circumvent numerical problems. At time $t$, the $j$-th element of the log forward variable can be written as

$$\log(\alpha_{t,j}) = \log f_j(y_t) + \log\left(\sum_{i=1}^{N} \gamma_{ij}\, \alpha_{t-1,i}\right) = \log f_j(y_t) + \log\left(\sum_{i=1}^{N} \exp\big(\log(\gamma_{ij}) + \log(\alpha_{t-1,i})\big)\right),$$

where the $\{\log(\gamma_{ij})\}_{i=1}^{N}$ are given by the $j$-th row of the (transposed) matrix of log transition probabilities log_gamma_tr, and the $\log(\alpha_{t-1,i})$ are obtained iteratively.
We fit the model to the haggis data.
# set NAs to out-of-range values
data$step[is.na(data$step)] <- -10
data$angle[is.na(data$angle)] <- -10
data$ID <- as.numeric(data$ID)

stan.data <- list(T=nrow(data), ID=data$ID, steps=data$step, angles=data$angle, N=2, nCovs=3,


covs=cbind(1, scale(data$temp), scale(data$slope), scale(data$slope)^2))

inits <- list(list(mu=c(1,5), sigma=c(1,5), xangle=c(-1,3),


yangle=c(0,0), beta=matrix(c(-2,-2,0,0,0,0,0,0),nrow=2)),
list(mu=c(1,5), sigma=c(1,5), xangle=c(-1,3),
yangle=c(0,0), beta=matrix(c(-2,-2,0,0,0,0,0,0),nrow=2)))

fit <- stan(file="HMMmovement.stan", data=stan.data, iter=1000, init=inits,


control=list(adapt_delta=0.9), chains=2)

We can obtain summaries and diagnostics from the fitted model object:
get_elapsed_time(fit)

## warmup sample
## chain:1 963.890 2480.15
## chain:2 945.301 2494.70
summary(fit, pars = c("shape", "rate", "loc", "kappa"), probs = c(0.05, 0.95))$summary

## mean se_mean sd 5% 95%


## shape[1] 4.1182991 0.0037457471 0.118450922 3.9278760 4.3177401
## shape[2] 2.7859819 0.0020680500 0.065397484 2.6807733 2.8941504
## rate[1] 4.1433578 0.0040786320 0.128977668 3.9304840 4.3526516
## rate[2] 0.5576037 0.0004364508 0.013801786 0.5349910 0.5814677
## loc[1] -2.4851165 0.0596924707 1.851187485 -3.1345873 3.1300015
## loc[2] -0.3091046 0.0001920197 0.006072196 -0.3191760 -0.2989181
## kappa[1] 1.0116248 0.0011559750 0.036555138 0.9522811 1.0730602
## kappa[2] 8.0071543 0.0063207519 0.199879724 7.6993283 8.3305498
## n_eff Rhat
## shape[1] 1000.0000 1.0029196
## shape[2] 1000.0000 0.9989344
## rate[1] 1000.0000 1.0029198
## rate[2] 1000.0000 0.9991912
## loc[1] 961.7489 0.9984171
## loc[2] 1000.0000 0.9989916
## kappa[1] 1000.0000 0.9988296
## kappa[2] 1000.0000 0.9988326
We plot the estimated step length and turning angle densities for each state.
# restore NAs
data$step[data$step < 0] <- NA
data$angle[data$angle < (-pi)] <- NA

# unpack posterior draws


shape <- extract(fit, pars = "shape")$shape
rate <- extract(fit, pars = "rate")$rate


Figure 5: Histograms of the observed step lengths (left) and turning angles (right), with the estimated
state-dependent densities weighted by the stationary distributions.

loc <- extract(fit, pars = "loc")$loc


kappa <- extract(fit, pars = "kappa")$kappa

# indices of posterior draws to plot (thinned for visualisation purposes)


ind <- seq(1, nrow(shape), by = 5)

# plot step length densities


stepgrid <- seq(min(data$step, na.rm = TRUE), max(data$step, na.rm = TRUE),
length = 100)
plot(NA, xlim = c(0, 20), ylim = c(0, 1.1), xlab = "step length", ylab = "density")
for (i in ind) {
# plot density for each state
points(stepgrid, dgamma(stepgrid, shape = shape[i, 1], rate = rate[i, 1]),
type = "l", lwd = 0.2, col = adjustcolor(pal[1], alpha.f = 0.1))
points(stepgrid, dgamma(stepgrid, shape = shape[i, 2], rate = rate[i, 2]),
type = "l", lwd = 0.2, col = adjustcolor(pal[2], alpha.f = 0.1))
}

# plot turning angle densities


anglegrid <- seq(-pi, pi, length = 100)
plot(NA, xlim = c(-pi, pi), ylim = c(0, 1.2), xlab = "turning angle", ylab = "density")
for (i in ind[-1]) {
# plot density for each state
points(anglegrid, dvm(anglegrid, mu = loc[i, 1], kappa = kappa[i, 1]), type = "l",
lwd = 0.2, col = adjustcolor(pal[1], alpha.f = 0.1))
points(anglegrid, dvm(anglegrid, mu = loc[i, 2], kappa = kappa[i, 2]), type = "l",
lwd = 0.2, col = adjustcolor(pal[2], alpha.f = 0.1))
}

We can also plot the transition probabilities as functions of the covariates. For example, we use the following
code to visualise the effect of the slope on the transition probabilities when temperature is equal to 10.

# extract parameters of the t.p.m
samp <- as.matrix(fit)
beta <- samp[, grep("beta", colnames(samp))]

# build a design matrix


gridslope <- seq(min(data$slope), max(data$slope), length = 100)
gridslopesc <- seq(min(scale(data$slope)), max(scale(data$slope)), length = 100)
fixedtemp <- 10
DM <- cbind(1, fixedtemp, gridslopesc, gridslopesc^2)

# indices of posterior draws to plot (thinned for visualisation purposes)


ind <- seq(1, nrow(samp), by = 5)

# plot the transition probabilities


par(mfrow = c(2, 2))
for (i in 1:2) {
for (j in 1:2) {
tpm <- moveHMM:::trMatrix_rcpp(nbStates = 2, beta = t(matrix(beta[ind[1],
], ncol = ncol(DM))), covs = DM)
plot(gridslope, tpm[i, j, ], type = "l", ylim = c(0, 1), col = rgb(1,
0, 0, 0.3), lwd = 0.5, xlab = "slope", ylab = paste0("Pr(", i, " -> ",
j, ")"))
for (k in ind[-1]) {
tpm <- moveHMM:::trMatrix_rcpp(nbStates = 2, beta = t(matrix(beta[k,
], ncol = ncol(DM))), covs = DM)
points(gridslope, tpm[i, j, ], type = "l", col = rgb(0, 0, 0, 0.2),
lwd = 0.5)
}
}
}
par(mfrow = c(1, 1))

We perform the same graphical posterior predictive checks from before. First, we simulate data using draws
from the posterior distribution of the parameters:
## generate new data sets

n.sims <- dim(kappa)[1]


n <- length(data$step)

# state sequences
ppstates <- matrix(NA, nrow = n.sims, ncol = n)
# observations
ppsteps <- matrix(NA, nrow = n.sims, ncol = n)
ppangs <- matrix(NA, nrow = n.sims, ncol = n)

DM <- cbind(1, scale(data$temp), scale(data$slope), scale(data$slope)^2)


for (j in 1:n.sims) {
tpm <- moveHMM:::trMatrix_rcpp(nbStates = 2, beta = t(matrix(beta[j, ],
ncol = ncol(DM))), covs = DM)
initdist <- rep(1/N, N)

ppstates[j, 1] <- sample(1:N, size = 1, prob = initdist)

Figure 6: Posterior transition probabilities as functions of the slope covariate.
ppsteps[j, 1] <- rgamma(1, shape = shape[j, ppstates[j, 1]], rate = rate[j,
ppstates[j, 1]])
ppangs[j, 1] <- rvm(1, mean = loc[j, ppstates[j, 1]], k = kappa[j, ppstates[j,
1]])

for (i in 2:n) {
ppstates[j, i] <- sample(1:N, size = 1, prob = tpm[ppstates[j, i - 1],
, i])
ppsteps[j, i] <- rgamma(1, shape = shape[j, ppstates[j, i]], rate = rate[j,
ppstates[j, i]])
ppangs[j, i] <- rvm(1, mean = loc[j, ppstates[j, i]], k = kappa[j, ppstates[j,
i]])
}
}

for (j in 1:n.sims) ppangs[j, ] <- as.numeric(minusPiPlusPi(ppangs[j, ]))

We check that the densities of the replicated data sets are similar to the observed data set, for both step
lengths and turning angles.
ppc_dens_overlay(data$step[which(!is.na(data$step))],
ppsteps[1:100,which(!is.na(data$step))])


ppc_dens_overlay(data$angle[which(!is.na(data$angle))],
ppangs[1:100,which(!is.na(data$angle))])


We compare the observed autocorrelation with the autocorrelation of the simulated data sets.

nlags <- 61
# observed acf
oac = acf(data$step[2:(n - 1)], lag.max = (nlags - 1), plot = FALSE, na.action = na.pass)$acf
ppac = matrix(NA, n.sims, nlags)
for (i in 1:n.sims) {
ppac[i, ] = acf(ppsteps[i, ], lag.max = (nlags - 1), plot = FALSE)$acf
}

hpd.acf <- HPDinterval(as.mcmc(ppac), prob = 0.95)


dat <- data.frame(y = 1:61, acf = as.numeric(oac), lb = hpd.acf[, 1], ub = hpd.acf[,
2])

ggplot(dat, aes(y, acf)) + geom_ribbon(aes(x = y, ymin = lb, ymax = ub), fill = "grey70",
alpha = 0.5) + geom_point(col = "purple", size = 1) + geom_line() + coord_cartesian(xlim = c(2,
60), ylim = c(-0.1, 0.5)) + xlab("Lag") + ylab("ACF") + ggtitle("Observed Autocorrelation Function
with 90% CI for ACF of Predicted Quantities")


We also compute the forecast (pseudo-)residuals for the step lengths. We make an adjustment to the previous code because the t.p.m. is no longer constant over time.
##### R code for Forecast (Pseudo-)Residuals

## Calculating forward variables


HMM.lalpha <- function(allprobs, gamma, delta, n, N) {
lalpha <- matrix(NA, N, n)

lscale <- 0
foo <- delta * allprobs[1, ]
lscale <- 0
lalpha[, 1] <- log(foo) + lscale
sumfoo <- sum(foo)
for (i in 2:n) {
foo <- foo %*% gamma[, , i] * allprobs[i, ]
# scaling
sumfoo <- sum(foo)

lscale <- lscale + log(sumfoo)
foo <- foo/sumfoo
lalpha[, i] <- log(foo) + lscale
}
lalpha
}

## Calculating forecast (pseudo-)residuals


HMM.psres <- function(x, allprobs, gamma, n, N, shape, rate) {

delta <- rep(1/N, N)

la <- HMM.lalpha(allprobs, gamma, delta, n, N)

pstepmat <- matrix(NA, n, N)


fres <- rep(NA, n)
ind.step <- which(!is.na(x))

for (j in 1:length(ind.step)) {
pstepmat[ind.step[j], 1] <- pgamma(x[ind.step[j]], shape = shape[1],
rate = rate[1])
pstepmat[ind.step[j], 2] <- pgamma(x[ind.step[j]], shape = shape[2],
rate = rate[2])
}

if (!is.na(x[1]))
fres[1] <- qnorm(rbind(c(1, 0)) %*% pstepmat[1, ])
for (i in 2:n) {

c <- max(la[, i - 1])


a <- exp(la[, i - 1] - c)
if (!is.na(x[i]))
fres[i] <- qnorm(t(a) %*% (gamma[, , i]/sum(a)) %*% pstepmat[i,
])
}
return(list(fres = fres))
}
shape.est <- colMeans(shape)
rate.est <- colMeans(rate)
beta.est <- colMeans(beta)

allprobs <- matrix(1, nrow = n, ncol = N)


for (j in 1:N) allprobs[which(!is.na(data$step)), j] <- dgamma(data[which(!is.na(data$step)),
    "step"], shape = shape.est[j], rate = rate.est[j])

gamma <- moveHMM:::trMatrix_rcpp(nbStates = 2, beta = t(matrix(beta.est, ncol = ncol(DM))),


covs = DM)

fres <- HMM.psres(x = data$step, allprobs = allprobs, gamma = gamma, n = n,


N = N, shape = shape.est, rate = rate.est)

Plotting the residuals in a Q-Q plot:
ggplot(data=data.frame(x=fres$fres), aes(sample = x)) + stat_qq() +
stat_qq_line(color="purple", size=1) +
ggtitle("Q-Q Plot") + theme_classic()


Interpreting the results - Proceed with caution

The application of an HMM to positional data is meant to provide a model of the data-generating mechanism. Ideally, the estimated states serve as proxies for biologically meaningful behaviors, but...

Important Lesson:

The adequacy of the HMM is in no way validated by how well the states
correspond to general behaviors of interest.

Keeping this in mind, is an HMM useful? Yes, of course! The usefulness lies in combining biological expertise
with a modeling framework that is intuitive (from a biological standpoint) to use as the data generating
mechanism for the observed movement data. But an HMM is not magic, nor will any other unsupervised
technique magically identify general behaviors of interest without some understanding of the biological
mechanism.
For various animals, the movement patterns that manifest themselves in positional data do tend to follow a general pattern: directed movements tend to coincide with large step lengths, and larger turning angles are generally associated with shorter step lengths. In terrestrial animals, these patterns can broadly serve as proxies for areas in which the animal forages or travels through. In marine animals, like sharks, we have interpreted the states as corresponding to area-restricted search and traveling behavior. Nonetheless, the HMM is useful for clustering movement patterns into these general behavioral states. From there, we can incorporate covariates to understand what may drive an animal to remain in a certain area (chum in the water, habitat quality, etc.).

In the end, an HMM can be quite a useful tool for the analysis of animal movement data, blending important ecological knowledge with sophisticated modeling techniques. And, importantly, inference can be carried out in the Stan programming language.

Acknowledgements
We thank Juan M. Morales and Roland Langrock for feedback on an earlier version.

References
Alexandrovich, G., Holzmann, H. & Leister, A. (2016). Nonparametric identification and maximum likelihood estimation for hidden Markov models. Biometrika, 103, 423–434.
Betancourt, M. (2017). Identifying Bayesian Mixture Models. Retrieved from https://betanalpha.github.io/assets/case_studies/identifying_mixture_models.html
Betancourt, M. (2018). A Principled Bayesian Workflow. Retrieved from https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html
Gabry, J. & Mahr, T. (2018). bayesplot: Plotting for Bayesian Models. R package version 1.5.0. https://CRAN.R-project.org/package=bayesplot
Langrock, R. & Zucchini, W. (2011). Hidden Markov models with arbitrary state dwell-time distributions. Computational Statistics and Data Analysis, 55, 715–724.
Langrock, R., Kneib, T., Sohn, A. & DeRuiter, S. L. (2015). Nonparametric inference in hidden Markov models using P-splines. Biometrics, 71(2), 520–528.
Michelot, T., Langrock, R. & Patterson, T. A. (2016). moveHMM: an R package for the statistical modelling of animal movement data using hidden Markov models. Methods in Ecology and Evolution, 7, 1308–1315.
Morales, J. M., Haydon, D. T., Frair, J., Holsinger, K. E. & Fryxell, J. M. (2004). Extracting more out of relocation data: building movement models as mixtures of random walks. Ecology, 85(9), 2436–2445.
Plummer, M., Best, N., Cowles, K. & Vines, K. (2006). CODA: Convergence Diagnosis and Output Analysis for MCMC. R News, 6, 7–11.
Stan Development Team (2018). RStan: the R interface to Stan. R package version 2.17.3. http://mc-stan.org/
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
Zucchini, W., MacDonald, I. L. & Langrock, R. (2016). Hidden Markov Models for Time Series: An Introduction using R, 2nd Edition. Chapman & Hall/CRC, Boca Raton, FL.

Extras

State decoding

We can obtain inferences on the hidden state process using either ‘global decoding’ (Viterbi algorithm) or ‘local decoding’ (forward-backward algorithm). For a set of estimated parameters, the Viterbi algorithm computes the sequence of states most likely to have given rise to the observed data. The forward-backward algorithm is used to derive state probabilities, i.e. probabilities of being in each state at each time step.
In Stan, we can obtain the most likely state sequence (and/or state probabilities) for each posterior draw.
We include the Viterbi and forward-backward algorithms in the ‘generated quantities’ block, as shown below.
generated quantities {
int<lower=1,upper=N> viterbi[T];
real stateProbs[T,N];
vector[N] lp;
vector[N] lp_p1;

// Viterbi algorithm (most likely state sequence)


{
real max_logp;
int back_ptr[T, N];
real best_logp[T, N];

for (t in 1:T) {
if(t==1 || ID[t]!=ID[t-1]) {
for(n in 1:N)
best_logp[t, n] = gamma_lpdf(steps[t] | shape[n], rate[n]);
} else {
for (n in 1:N) {
best_logp[t, n] = negative_infinity();
for (j in 1:N) {
real logp;
logp = best_logp[t-1, j] + log_theta[t,j,n];
if(steps[t]>0)
logp = logp + gamma_lpdf(steps[t] | shape[n], rate[n]);
if(angles[t]>(-pi()))
logp = logp + von_mises_lpdf(angles[t] | loc[n], kappa[n]);

if (logp > best_logp[t, n]) {


back_ptr[t, n] = j;
best_logp[t, n] = logp;
}
}
}
}
}

for(t0 in 1:T) {
int t = T - t0 + 1;
if(t==T || ID[t+1]!=ID[t]) {
max_logp = max(best_logp[t]);

for (n in 1:N)
if (best_logp[t, n] == max_logp)
viterbi[t] = n;
} else {
viterbi[t] = back_ptr[t+1, viterbi[t+1]];
}
}
}

// forward-backward algorithm (state probabilities)


{
real logalpha[T,N];
real logbeta[T,N];
real llk;

// log alpha probabilities


for(t in 1:T) {
if(t==1 || ID[t]!=ID[t-1]) {
for(n in 1:N)
lp[n] = -log(N);
}

for (n in 1:N) {
lp_p1[n] = log_sum_exp(to_vector(log_theta_tr[t,n]) + lp);
if(steps[t]>=0)
lp_p1[n] = lp_p1[n] + gamma_lpdf(steps[t] | shape[n], rate[n]);
if(angles[t]>=(-pi())) {
lp_p1[n] = lp_p1[n] + von_mises_lpdf(angles[t] | loc[n], kappa[n]);
}
logalpha[t,n] = lp_p1[n];
}
lp = lp_p1;
}

// log beta probabilities


for(t0 in 1:T) {
int t = T - t0 + 1;

if(t==T || ID[t+1]!=ID[t]) {
for(n in 1:N)
lp_p1[n] = 0;
} else {
for(n in 1:N) {
lp_p1[n] = log_sum_exp(to_vector(log_theta_tr[t+1,n]) + lp);
if(steps[t+1]>=0)
lp_p1[n] = lp_p1[n] + gamma_lpdf(steps[t+1] | shape[n], rate[n]);
if(angles[t+1]>=(-pi()))
lp_p1[n] = lp_p1[n] + von_mises_lpdf(angles[t+1] | loc[n], kappa[n]);
}
}
lp = lp_p1;
for(n in 1:N)
logbeta[t,n] = lp[n];
}

// state probabilities
for(t0 in 1:T) {
int t = T - t0 + 1;
if(t==T || ID[t+1]!=ID[t])
llk = log_sum_exp(logalpha[t]);
for(n in 1:N)
stateProbs[t,n] = exp(logalpha[t,n] + logbeta[t,n] - llk);
}
}
}
