An Introduction To Animal Movement Modeling With Hidden Markov Models Using Stan For Bayesian Inference
University of Sheffield - [email protected]
Introduction
Hidden Markov models (HMMs) are popular time series models in many fields including ecology, economics
and genetics. HMMs can be defined over discrete or continuous time, though here we only cover the former. In
the field of movement ecology in particular, HMMs have become a popular tool for the analysis of movement
data because of their ability to connect observed movement data to an underlying latent process, generally
interpreted as the animal's unobserved behavior. Further, HMMs model the tendency to persist in a given
behavior over time.
Those already familiar with Michael Betancourt’s case study “Identifying Bayesian Mixture Models” will
see a natural extension from the independent mixture models that are discussed therein to an HMM, which
can also be referred to as a dependent mixture model. Notation presented here will generally follow the
format of Zucchini et al. (2016) and cover HMMs applied in an unsupervised case to animal movement data,
specifically positional data. We provide Stan code to analyze movement data of the wild haggis as presented
first in Michelot et al. (2016). Implementing HMMs in Stan has also been covered by Luis Damiano at
https://github.com/luisdamiano/gsoc17-hhmm. For a thorough overview of HMMs, see Zucchini et al. (2016).
For a time-homogeneous process, we can use the stationary distribution as the initial state distribution;
otherwise, we can estimate the initial distribution.
Likelihood
There are two functions referred to as the "likelihood" in the HMM literature: the complete-data likelihood,
i.e. the joint distribution of the observations and states, and the marginal likelihood, i.e. the joint distribution
of the observations only. The complete-data likelihood is written as follows,
\[
f(\mathbf{y}, \mathbf{s}) = L_c = \delta^{(1)}_{s_1} \prod_{t=2}^{T} \gamma_{s_{t-1}, s_t} \prod_{t=1}^{T} f_{s_t}(y_t). \qquad (1)
\]
The simplicity of the complete-data likelihood formulation may be one reason why many conduct inference
for parameters and states jointly, typically through a Gibbs sampler, alternating between estimation of states
and parameters. In contrast, evaluation of the marginal likelihood requires summation over all possible state
sequences,
\[
L_m = \sum_{s_1=1}^{N} \cdots \sum_{s_T=1}^{N} \delta^{(1)}_{s_1} \prod_{t=2}^{T} \gamma_{s_{t-1}, s_t} \prod_{t=1}^{T} f_{s_t}(y_t). \qquad (2)
\]
However, evaluation of the marginal likelihood is necessary for implementation in Stan because the states are
discrete random variables. Zucchini et al. (2016) show that the marginal likelihood can be written explicitly
as a matrix product,
\[
L_m = \delta^{(1)} P(y_1)\, \Gamma P(y_2)\, \Gamma P(y_3) \cdots \Gamma P(y_T)\, 1^\top, \qquad (3)
\]
for an N × N matrix P(y_t) = diag(f_1(y_t), . . . , f_N(y_t)) and a vector of 1s of length N, 1 = (1, . . . , 1). For
observations missing at random, we simply have P(y_t) = I_{N×N}. The marginal likelihood can be calculated
efficiently with the forward algorithm, which calculates the likelihood recursively. We define the forward
variables α_t, beginning at time t = 1, as follows:
\[
\alpha_1 = \delta^{(1)} P(y_1), \qquad \alpha_t = \alpha_{t-1} \Gamma P(y_t) \quad \text{for } t = 2, \ldots, T, \qquad (4)
\]
so that
\[
L_m = f(y_1, \ldots, y_T) = \sum_{i=1}^{N} \alpha_T(i) = \alpha_T 1^\top. \qquad (5)
\]
Notably, the computational effort involved in evaluating Lm is only linear in T , the number of observations, for
a given number of states, N . Direct evaluation of the likelihood can result in numerical underflow. However,
we can also use the forward algorithm to evaluate the log marginal likelihood, log(Lm ), and avoid underflow
when calculating each forward variable, as demonstrated in the Stan implementation given below.
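To make the recursion concrete, here is a minimal R sketch of the log-space forward algorithm; the inputs delta, Gamma and log_dens (a T × N matrix of state-dependent log-densities) are hypothetical names, not objects used elsewhere in this case study.

# minimal sketch: log marginal likelihood via the forward algorithm in log space
# delta: initial distribution (length N); Gamma: N x N t.p.m.;
# log_dens: T x N matrix with log f_n(y_t) in row t, column n (hypothetical inputs)
log_forward_lik <- function(delta, Gamma, log_dens) {
    Tlen <- nrow(log_dens)
    lalpha <- log(delta) + log_dens[1, ]                  # log alpha_1
    for (t in 2:Tlen) {
        lalpha <- apply(lalpha + log(Gamma), 2, function(v) {
            m <- max(v); m + log(sum(exp(v - m)))         # log-sum-exp over previous states
        }) + log_dens[t, ]
    }
    m <- max(lalpha)
    m + log(sum(exp(lalpha - m)))                         # log L_m
}

In the Stan implementations below, the same recursion is written with the built-in log_sum_exp function.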
For the HMM details we provide here, we assume the following:
• The state-dependent distributions are distinct, f_1 ≠ · · · ≠ f_N;
• The t.p.m. Γ has full rank and is ergodic.
These two points are sufficient for an HMM to be identifiable. The first point is important when applying an
HMM to animal movement data because the states are assumed to reflect different behaviors. The second
point reflects the assumption that the animal can transition between behaviors over time.
Priors
An HMM has two main sets of parameters that require specification of prior distributions: the parameters
corresponding to i) the state-dependent distributions and ii) the transition probabilities, with a possible third set
if the initial distribution is estimated as well.
However, because an HMM lies within the class of mixture models, the lack of identifiability due to label
switching (i.e. a reordering of the state indices can lead to the same joint distribution) should be taken into account.
State-dependent distributions
As in the independent mixture models discussed in Betancourt (2017), identification and inferences of the
state-dependent distributions of an HMM can be problematic. Issues related to label-switching can make
it difficult for the MCMC chains to efficiently explore the parameter space. In practice, HMMs are also
notorious for their multi-modality. As such, some additional restrictions and information, such as ordering
of a subset of the parameters of interest and/or informative priors can aid inference. For example, we can
impose an ordering on the means, µ1 < µ2 < · · · < µN , of the state-dependent distributions (if possible),
which is easily done in Stan:
parameters {
  positive_ordered[N] mu;
}
Other parametrizations can also be used to order the means. For example, given µ_1 ∈ R and a vector
η ∈ R_+^{N−1} of length N − 1, set µ_n = µ_{n−1} + η_{n−1} for n ∈ {2, . . . , N}:
parameters {
  real mu;
  vector<lower=0>[N-1] etas;
}
transformed parameters {
  vector[N] ord_mus;
  ord_mus[1] = mu;
  for (n in 2:N)
    ord_mus[n] = ord_mus[n-1] + etas[n-1];
}
As the state-dependent distributions reflect characteristics of the observed data, priors for the parameters of
interest should not place the bulk of the probability on values that are unrealistic. Also, note that because of
potential label-switching, some type of ordering will likely be needed so that the priors correspond to the
appropriate distributions (if not exchangeable). See Betancourt (2017) for similar issues in mixture models.
It is typically easier to form some intuition about the parameters of the state-dependent distributions than
about the entries of the t.p.m. However, in animal movement data there is generally persistence in the estimated states
that we would like to capture (hence the reason for using HMMs). For the model, this behavior corresponds
to large diagonal entries, γ_{n,n} for n ∈ {1, . . . , N}, typically > 0.8 in our own experience, though this could of
course vary depending on the temporal resolution of the data and the question of interest.
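For intuition, a diagonal entry γ_{n,n} implies a geometric dwell time in state n with mean 1/(1 − γ_{n,n}) time steps; a quick R illustration with made-up values:

# mean dwell times implied by the diagonal of a t.p.m. (illustrative values)
Gamma <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE)
1 / (1 - diag(Gamma))  # expected 10 time steps in state 1, 5 in state 2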
Capturing Important Features of the Data and Model Evaluation
There are two features of the movement data that we aim to capture, a) the marginal distribution of Yt and
b) the temporal dependence of the observed data (e.g. autocorrelation).
The marginal distribution of y_t under an HMM is the distribution of a given observation at time t, unconditional
on the states, f(y_t | θ), with θ reflecting the state-dependent parameters requiring estimation. For a time-homogeneous
process with stationary distribution δ, the marginal distribution is a mixture of the
state-dependent densities weighted by the entries of δ,
\[
f(y_t \mid \theta) = \sum_{n=1}^{N} \delta_n f_n(y_t).
\]
In the analysis of animal movement data, the stationary distribution can give the ecologist an estimate of
the proportion of time that the animal exhibits the states (and related behaviors) overall. However, it is
important not to report only this result of the HMM because there are infinitely many HMM formulations
that lead to the same marginal distribution for y_t. For example:
                          HMM1                              HMM2
State-dependent dist.     f1 ∼ N(0, 4); f2 ∼ N(5, 5)        f1 ∼ N(0, 4); f2 ∼ N(5, 5)
t.p.m.                    Γ = (0.70 0.30; 0.30 0.70)        Γ = (0.95 0.05; 0.05 0.95)
Stationary dist.          δ = (0.5, 0.5)                    δ = (0.5, 0.5)
Marginal dist.            0.5 · f1 + 0.5 · f2               0.5 · f1 + 0.5 · f2
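As a quick numerical check of the example above, the stationary distribution of each t.p.m. can be obtained in R by solving δ(I_N − Γ + U) = 1, with U a matrix of ones (the same computation used later in this case study); both t.p.m.s yield δ = (0.5, 0.5):

# stationary distribution: solve delta (I - Gamma + U) = 1, with U a matrix of ones
stat_dist <- function(Gamma) {
    N <- nrow(Gamma)
    solve(t(diag(N) - Gamma + 1), rep(1, N))
}
Gamma1 <- matrix(c(0.70, 0.30, 0.30, 0.70), 2, 2, byrow = TRUE)
Gamma2 <- matrix(c(0.95, 0.05, 0.05, 0.95), 2, 2, byrow = TRUE)
stat_dist(Gamma1)  # 0.5 0.5
stat_dist(Gamma2)  # 0.5 0.5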
This result is a key difference between independent mixture models and HMMs. An HMM is identifiable, even
given the above result, because there is dependence over time that we take into account via the transition
probability matrix, Γ. The marginal distribution does not completely relay all of the information about the
manner in which the data were generated. In particular, taking into account the temporal dependence, as an
HMM does, allows for identification of state-dependent distributions that may significantly overlap and other
flexible forms, see Alexandrovich et al. (2016) and Langrock et al. (2015).
Aside from capturing the marginal distribution of y_t, we also aim to capture the temporal dependence present
in the data. In particular, the autocorrelation structure of data produced by the fitted HMM should be
comparable to that of the data itself. As a result, this can be a key characteristic with which to do posterior
predictive checking (Morales et al., 2004).
Forecast (Pseudo-)Residuals
One manner in which the fitted HMM can be assessed is through evaluation of the (pseudo-)residuals. The
pseudo-residuals are computed in two steps. First, for continuous observations, the uniform pseudo-residuals
u_t are defined as
\[
u_t = \Pr(Y_t \le y_t \mid y_1, \ldots, y_{t-1}).
\]
Second, the normal pseudo-residuals are obtained as
\[
r_t = \Phi^{-1}(u_t),
\]
where Φ is the cumulative distribution function of the standard normal distribution. If the fitted HMM is
the true data-generating process, the rt have a standard normal distribution. In practice, a qq-plot can be
used to compare the distribution of pseudo-residuals to the standard normal, and assess the fit. Further,
the (pseudo-)residuals of a fitted HMM should not be autocorrelated, indicating that the dependence is
adequately captured.
Posterior predictive checks allow one to assess the adequacy of the fitted model by generating M replicate
data sets from the distribution f(y*|y) = ∫ f(y*|θ) f(θ|y) dθ. In particular here, we use these checks to assess
the fitted model's ability to be interpreted as the data-generating mechanism. The main idea is that the
model should be able to produce data that are similar to the observed data in the key features defined a priori.
Given M posterior draws θ*_1, . . . , θ*_M, we generate M data sets from the distribution f(y*|θ*). We then
compare key features of the replicate data sets to the observed features. See Betancourt (2018) for more details.
We demonstrate a few graphical posterior predictive checks in the HMM examples.
State Estimation
In animal movement modeling (the focus presented here), estimation of the underlying state sequence is not
the primary focus of the analysis but rather a convenient byproduct of the HMM framework. It is most
important that the estimated state-dependent distributions can be connected to biologically meaningful
processes, though state estimation can help one visualize the results of the fitted models.
There are two approaches to state estimation:
• Local State Decoding: Pr(S_t | y_1, . . . , y_T, θ); or
• Global State Decoding: Pr(S_1, . . . , S_T | y_1, . . . , y_T, θ).
The first considers the distribution of the state at time t, St , given the observations and estimated parameters
θ. These distributions can be obtained through implementation of the forward-backward algorithm.
The aim of the second approach is to obtain the most likely state sequence given all of the observations.
For this, we use the Viterbi algorithm which returns the most likely state sequence given the observations
and estimated parameters. Both approaches are already covered by Luis Damiano: https://github.com/
luisdamiano/gsoc17-hhmm. In general, both will return similar (if not equal) results when it comes to state
decoding (assigning an observation to one of N states).
Going beyond assignment of observations to states and obtaining the state probabilities at each point in time
can also be highly informative. In particular, two models may result in similar state decodings yet correspond
to different estimates of the parameters of interest. While this is not a problem per se, it can be difficult to
connect the estimated states to key biological processes when the observations have large probabilities of
being associated with more than one state.
# Initialisation
library(rstan)
library(bayesplot)
library(ggplot2)
library(coda)
library(circular)
library(moveHMM)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
pal <- c("firebrick","seagreen","navy") # colour palette
set.seed(1)
Before getting into details about how HMMs are applied to animal movement data, we present how to fit a
basic HMM in Stan (Stan Development Team, 2018). We consider a 2-state HMM with Gaussian state-dependent distributions
for the observation process. That is, at each time step t = 1, 2, . . ., we have
Y_t | S_t = j ∼ N(µ_j, σ²).
In the simulation shown in Figure 1 we take µ_1 = 1, µ_2 = 5 and σ = 2; in the model below, σ is fixed at its true value, so that only Γ and µ are estimated.
The likelihood of the model can be written with the forward algorithm, given in Equation 4, with
\[
P(y_t) = \begin{pmatrix} \phi(y_t \mid \mu_1, \sigma^2) & 0 \\ 0 & \phi(y_t \mid \mu_2, \sigma^2) \end{pmatrix},
\]
where φ denotes the normal density function.
Figure 1: Simulated observations from a 2-state HMM with Gaussian state-dependent distributions.
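The simulation code itself is not shown above; a minimal sketch that produces data of this form is given below, where the t.p.m. and the series length are illustrative choices and the state-dependent means and standard deviation match the true values referenced later (µ_1 = 1, µ_2 = 5, σ = 2).

# illustrative simulation of a 2-state Gaussian HMM (the t.p.m. and length are
# arbitrary choices; means 1 and 5 and sd 2 match the true values used below)
nobs <- 1000
mu_true <- c(1, 5); sigma_true <- 2
Gamma_true <- matrix(c(0.9, 0.1,
                       0.1, 0.9), 2, 2, byrow = TRUE)
s <- numeric(nobs); y <- numeric(nobs)
s[1] <- sample(1:2, 1)
y[1] <- rnorm(1, mu_true[s[1]], sigma_true)
for (t in 2:nobs) {
    s[t] <- sample(1:2, 1, prob = Gamma_true[s[t - 1], ])
    y[t] <- rnorm(1, mu_true[s[t]], sigma_true)
}

The Stan model below takes the vector y and its length as data.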
data {
  int<lower=0> N; // number of states
  int<lower=1> T; // length of data set
  real y[T]; // observations
}
There are two sets of parameters that require estimation, Γ and µ, which we define in the parameters block:
parameters {
  simplex[N] theta[N]; // N x N tpm
  ordered[N] mu; // state-dependent parameters
}
We assume stationarity of the underlying Markov chain and initialize the process with the stationary
distribution, δ. As δ is a function of Γ, we compute it in the transformed parameters block:
transformed parameters {
  matrix[N, N] ta; // t.p.m. as a matrix
  simplex[N] statdist; // stationary distribution
  for(j in 1:N){
    for(i in 1:N){
      ta[i,j] = theta[i,j];
    }
  }
  // solve statdist * (I - ta + U) = 1 for the stationary distribution (U: matrix of ones)
  statdist = to_vector((to_row_vector(rep_vector(1.0, N)) /
                        (diag_matrix(rep_vector(1.0, N)) - ta + rep_matrix(1.0, N, N))));
}
model {
  vector[N] log_theta_tr[N];
  vector[N] lp;
  vector[N] lp_p1;
  // prior for mu
  mu ~ student_t(3, 0, 1);
  // transpose the t.p.m. and take the log of each entry (for the forward algorithm)
  for (i in 1:N)
    for (j in 1:N)
      log_theta_tr[j, i] = log(theta[i, j]);
  // forward algorithm, initialised with the stationary distribution
  // (middle of this listing reconstructed; the state-dependent sd is fixed at 2)
  for (n in 1:N)
    lp[n] = log(statdist[n]) + normal_lpdf(y[1] | mu[n], 2);
  for (t in 2:T) {
    for (n in 1:N)
      lp_p1[n] = log_sum_exp(log_theta_tr[n] + lp) + normal_lpdf(y[t] | mu[n], 2);
    lp = lp_p1;
  }
  target += log_sum_exp(lp);
}
We first run 2000 iterations for each of the 4 chains, with the first 1000 iterations used as warm-up,
and verify that the posterior draws capture the true parameters.
stan.data <- list(y=y, T=nobs, N=2)
fit <- stan(file="HMM1.stan", data=stan.data, refresh=2000)
mus <- extract(fit, pars = "mu")  # posterior draws of the state-dependent means (assumed step)
hist(mus[[1]][,1], main = "", xlab = expression(mu[1]))
abline(v = 1, col = pal[1], lwd = 2)
hist(mus[[1]][,2], main = "", xlab = expression(mu[2]))
abline(v = 5, col = pal[2], lwd = 2)
From the fitted model, we extract the parameters of interest and generate 4000 data sets in order to perform
a few graphical posterior predictive checks.
## extract posterior draws
psam <- extract(fit, pars = c("theta", "mu"))
n.sims <- nrow(psam[[1]])
# state sequences and observations (allocation reconstructed)
ppstates <- matrix(NA, nrow = n.sims, ncol = length(y))
ppobs <- matrix(NA, nrow = n.sims, ncol = length(y))
Figure 2: Histograms of the posterior draws for the state-dependent means. The vertical lines show the
true values used in the simulation.
for (j in 1:n.sims) {
    theta <- psam[[1]][j, , ]
    statdist <- solve(t(diag(N) - theta + 1), rep(1, N))
    # initial state and observation (reconstructed, drawn from the stationary distribution)
    ppstates[j, 1] <- sample(1:N, size = 1, prob = statdist)
    ppobs[j, 1] <- rnorm(1, mean = psam[[2]][j, ppstates[j, 1]], sd = 2)
    for (i in 2:length(y)) {
        ppstates[j, i] <- sample(1:N, size = 1, prob = theta[ppstates[j, i - 1], ])
        ppobs[j, i] <- rnorm(1, mean = psam[[2]][j, ppstates[j, i]], sd = 2)
    }
}
First, we check that the densities of the replicated data sets are similar to the observed data set. For this we
use the R package bayesplot.
ppc_dens_overlay(y, ppobs[1:100,])
We also plot the autocorrelation function of the observed data and compare to 90% credible intervals for the
ACF of the replicated data sets.
nlags <- 61
oac = acf(y[2:(n - 1)], lag.max = (nlags - 1), plot = FALSE)$acf # observed acf
# replicate ACFs and 90% intervals (construction of 'dat' reconstructed)
ppac = matrix(NA, n.sims, nlags)
for (i in 1:n.sims) ppac[i, ] = acf(ppobs[i, ], lag.max = (nlags - 1), plot = FALSE)$acf
dat = data.frame(x = 1:nlags, acf = as.numeric(oac), lb = apply(ppac, 2, quantile, 0.05),
    ub = apply(ppac, 2, quantile, 0.95))
ggplot(dat, aes(x, acf)) + geom_ribbon(aes(x = x, ymin = lb, ymax = ub), fill = "grey70",
    alpha = 0.5) + geom_point(col = "purple", size = 1) + geom_line() + coord_cartesian(xlim = c(2,
    60), ylim = c(-0.1, 0.5)) + xlab("Lag") + ylab("ACF") +
    ggtitle("Observed Autocorrelation Function with 90% CI for ACF of Predicted Quantities")
[Plot: observed autocorrelation function with 90% CI for the ACF of predicted quantities]
Finally, we use the posterior expected values of the variables of interest to construct the forecast (pseudo-
)residuals.
##### R code for Forecast (Pseudo-)Residuals
# scaled forward log-probabilities (log-alpha); the function wrapper and the
# initial scaling step are reconstructed. delta, gamma and allprobs are the
# initial distribution, t.p.m. and matrix of state-dependent densities.
lalpha_fun <- function(delta, gamma, allprobs, n, N) {
    lalpha <- matrix(NA, N, n)
    foo <- delta * allprobs[1, ]
    sumfoo <- sum(foo)
    lscale <- log(sumfoo)
    foo <- foo/sumfoo # scaling
    lalpha[, 1] <- log(foo) + lscale
    for (i in 2:n) {
        foo <- foo %*% gamma * allprobs[i, ]
        sumfoo <- sum(foo)
        lscale <- lscale + log(sumfoo)
        foo <- foo/sumfoo # scaling
        lalpha[, i] <- log(foo) + lscale
    }
    lalpha
}
ind.step <- which(!is.na(x))
for (j in 1:length(ind.step)) {
    pstepmat[ind.step[j], 1] <- pnorm(x[ind.step[j]], mean = mu[1], sd = 2)
    pstepmat[ind.step[j], 2] <- pnorm(x[ind.step[j]], mean = mu[2], sd = 2)
}
if (!is.na(x[1]))
    fres[1] <- qnorm(rbind(c(1, 0)) %*% pstepmat[1, ])
for (i in 2:n) {
    # loop body reconstructed following Zucchini et al. (2016): weight the
    # state-wise CDF values by the normalised forward probabilities
    a <- exp(lalpha[, i - 1] - max(lalpha[, i - 1]))
    fres[i] <- qnorm(t(a) %*% gamma %*% pstepmat[i, ] / sum(a))
}
[Q-Q plot of the forecast pseudo-residuals against the standard normal]
Note that it is also possible to construct the distribution of forecast residuals at each time t.
Covariates
In HMMs applied to animal movement, covariates are typically incorporated at the level of the hidden
states. For the general case of time-varying covariates, we define the corresponding time-dependent transition
probability matrix Γ^{(t)} = (γ_{ij}^{(t)}), where γ_{ij}^{(t)} = Pr(S_{t+1} = j | S_t = i). The transition probabilities at time t, γ_{ij}^{(t)},
can then be related to a vector of environmental (or other) covariates, ω_1^{(t)}, . . . , ω_p^{(t)}, via the multinomial
logit link:
\[
\gamma_{ij}^{(t)} = \frac{\exp(\eta_{ij})}{\sum_{k=1}^{N}\exp(\eta_{ik})},
\qquad \text{where } \eta_{ij} =
\begin{cases}
\beta_0^{(ij)} + \sum_{l=1}^{p} \beta_l^{(ij)} \omega_l^{(t)} & \text{if } i \neq j;\\[4pt]
0 & \text{otherwise.}
\end{cases}
\]
Essentially there is one multinomial logit link specification for each row of the matrix Γ(t) , and the entries on
the diagonal of the matrix serve as reference categories.
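To make the link concrete, here is a small R sketch (not code from the original analysis) that builds Γ(t) from a coefficient matrix and a covariate vector; the row-wise ordering of the off-diagonal entries in beta and the leading 1 for the intercept are assumptions.

# hypothetical helper: t.p.m. at time t from regression coefficients
# beta: N*(N-1) x (p+1) matrix, one row per off-diagonal entry (filled row-wise)
# covs_t: covariate vector at time t, including a leading 1 for the intercept
tpm_t <- function(beta, covs_t, N) {
    eta <- matrix(0, N, N)           # diagonal entries are the reference (eta = 0)
    k <- 1
    for (i in 1:N) {
        for (j in 1:N) {
            if (i != j) {
                eta[i, j] <- sum(beta[k, ] * covs_t)
                k <- k + 1
            }
        }
    }
    exp(eta) / rowSums(exp(eta))     # row-wise multinomial logit (softmax)
}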
Motivation
We consider the application of HMMs to the analysis of animal movement tracks. Movement data typically
consist of a bivariate time series of longitude-latitude positions, collected at regular time intervals over the
study period (e.g. hourly locations). HMMs are widely used in movement ecology to describe such data as
arising from several distinct movement patterns, modelled by the underlying Markov chain St . In particular,
these movement patterns serve as proxies for general behaviors of interest. At each time step, we consider
that an animal is in one of N (behavioural) states (e.g. “exploratory”, “foraging”. . . ), on which depend some
metrics of movement. Note that there is generally no 1-1 mapping from state to behavior of interest, but
more on this later.
In this context, the most common HMM formulation is based on the step lengths and turning angles, which
can be derived from the location data. The step length Lt is the distance between the two successive locations
Xt and Xt+1 , and the turning angle ϕt is the angle between the two successive directions (Xt−1 , Xt ) and
(Xt , Xt+1 ).
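For concreteness, given planar coordinate vectors x and y (hypothetical names), the step lengths and turning angles can be computed as follows; in the analysis below we let prepData from moveHMM do this for us.

# step lengths and turning angles from planar coordinates x, y (hypothetical vectors)
dx <- diff(x); dy <- diff(y)
steps <- sqrt(dx^2 + dy^2)                  # L_t: distance between successive locations
headings <- atan2(dy, dx)                   # direction of each displacement
angles <- diff(headings)                    # phi_t: change in direction
angles <- ((angles + pi) %% (2 * pi)) - pi  # wrap to [-pi, pi)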
Wild Haggis
We present a simulation study based on the (simulated) wild haggis tracking data from Michelot et al. (2016).
The data set comprises 15 tracks, with slope and temperature covariates.
We use the function prepData in the package moveHMM to derive step lengths and turning angles from the
location data.
rawhaggis <- read.csv("data/haggis.csv")
# derive step lengths and turning angles from locations
data <- prepData(rawhaggis, type="UTM")
Following Michelot et al. (2016), we consider a 2-state HMM with gamma and von Mises state-dependent
distributions. That is, for j ∈ {1, 2}
Figure 4: Histograms of the step lengths (left) and turning angles (right) in the wild haggis data.
Lt |St = j ∼ gamma(αj , βj )
ϕt |St = j ∼ von Mises(µj , κj ),
where αj is the shape and βj the rate of the gamma distribution, and µj is the mean and κj the concentration
of the von Mises distribution. The larger the concentration, the smaller the variance of the turning angles
around their mean.
We find it more convenient to parametrise the gamma distribution in terms of its mean and standard deviation,
rather than its shape and rate parameters (the default in R and Stan). We use the following transformation to
obtain one set of parameters from the other:
\[
\text{shape} = \frac{\text{mean}^2}{\text{SD}^2}, \qquad \text{rate} = \frac{\text{mean}}{\text{SD}^2}.
\]
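For example, in R (values purely illustrative):

# convert the mean and SD of a gamma distribution to shape and rate (illustrative values)
mean_step <- 3; sd_step <- 1.5
shape <- mean_step^2 / sd_step^2
rate <- mean_step / sd_step^2
shape / rate        # recovers the mean
sqrt(shape) / rate  # recovers the SD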
The mean parameter of the von Mises distribution is constrained between −π and π. This can cause estimation
issues if the sampler gets stuck around either bound. To address this problem, we consider the alternative
parametrisation: for each state j,
\[
\begin{cases}
x_j^{\varphi} = \kappa_j \cos(\mu_j) \\[2pt]
y_j^{\varphi} = \kappa_j \sin(\mu_j).
\end{cases}
\qquad (6)
\]
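The mapping is easily inverted to recover the mean angle and concentration from the unconstrained pair, e.g. in R:

# recover the mean angle and concentration from the unconstrained pair (illustrative values)
x_phi <- -0.3; y_phi <- 0.4
mu_angle <- atan2(y_phi, x_phi)   # mean turning angle in (-pi, pi]
kappa <- sqrt(x_phi^2 + y_phi^2)  # concentration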
The following code implements a N -state HMM with gamma and von Mises state-dependent distributions,
with the possibility to include covariates in the state process. We describe each block separately.
data {
  int<lower=0> T; // length of the time series
  int ID[T]; // track identifier
  vector[T] steps; // step lengths
  vector[T] angles; // turning angles
  int<lower=1> N; // number of states
  int nCovs; // number of covariates
  matrix[T,nCovs+1] covs; // covariates
}
In the ‘data’ block, we include the vector of step lengths, the vector of turning angles, and the (design) matrix
of covariate values. The design matrix has one column of 1s, corresponding to the intercept, and one column
for each covariate. We also need to specify the length of the time series (i.e. number of locations), the number
of states (two in the analysis), and the number of covariates (three in the analysis: temperature, slope, and
slope²).
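For reference, the data list might be assembled along the following lines; the covariate column names (temp, slope) and their ordering in the design matrix are assumptions, since the original data-preparation code is not shown here.

# hypothetical assembly of the design matrix and Stan data list (column names assumed)
covs <- cbind(1, data$temp, data$slope, data$slope^2)  # intercept, temp, slope, slope^2
stan.data <- list(T = nrow(data), ID = as.numeric(data$ID),
                  steps = data$step, angles = data$angle,
                  N = 2, nCovs = 3, covs = covs)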
parameters {
  positive_ordered[N] mu; // mean of gamma - ordered
  vector<lower=0>[N] sigma; // SD of gamma
  // unconstrained angle parameters
  vector[N] xangle;
  vector[N] yangle;
  // regression coefficients for transition probabilities
  matrix[N*(N-1),nCovs+1] beta;
}
We define the state-dependent movement parameters: the mean and standard deviation of the gamma
distribution (step lengths), and the transformed unconstrained parameters of the turning angle distribution
defined in Equation 6. The vector of mean step lengths is defined to be ordered, to avoid label switching.
We also introduce the matrix of regression coefficients for the transition probabilities, with one row for each
off-diagonal entry of the transition probability matrix, and one column for each covariate (plus one for the
intercept).
transformed parameters {
  vector<lower=0>[N] shape;
  vector<lower=0>[N] rate;
  vector<lower=-pi(),upper=pi()>[N] loc;
  vector<lower=0>[N] kappa;
  for(n in 1:N) {
    // gamma: convert (mean, SD) to (shape, rate) -- see the transformation above
    shape[n] = mu[n]*mu[n]/(sigma[n]*sigma[n]);
    rate[n] = mu[n]/(sigma[n]*sigma[n]);
    // von Mises: recover mean angle and concentration from (xangle, yangle), Equation 6
    loc[n] = atan2(yangle[n], xangle[n]);
    kappa[n] = sqrt(xangle[n]*xangle[n] + yangle[n]*yangle[n]);
  }
}
In the ‘transformed parameters’, we calculate the parameters expected by the state-dependent pdfs, i.e. the
shape and rate of the gamma distribution, and the location (mean) and concentration of the von Mises
distribution.
model {
  vector[N] logp;
  vector[N] logptemp;
  matrix[N,N] gamma[T];
  matrix[N,N] log_gamma[T];
  matrix[N,N] log_gamma_tr[T];
  // priors
  mu ~ normal(0, 5);
  sigma ~ student_t(3, 0, 1);
  xangle[1] ~ normal(-0.5, 1); // equiv to concentration when yangle = 0
  xangle[2] ~ normal(2, 2);
  yangle ~ normal(0, 0.5); // zero if mean angle is 0 or pi
  // t.p.m. at each time step via the multinomial logit link
  // (reconstructed; the diagonal entries serve as the reference categories)
  for (t in 1:T) {
    int betarow = 1;
    for (i in 1:N) {
      for (j in 1:N) {
        if (i == j) {
          gamma[t, i, j] = 1;
        } else {
          gamma[t, i, j] = exp(beta[betarow] * to_vector(covs[t]));
          betarow = betarow + 1;
        }
      }
    }
    // normalise each row and store the log t.p.m.
    for (i in 1:N)
      log_gamma[t][i] = log(gamma[t][i] / sum(gamma[t][i]));
  }
  // transpose
  for(t in 1:T)
    for(i in 1:N)
      for(j in 1:N)
        log_gamma_tr[t,j,i] = log_gamma[t,i,j];
  // likelihood computation
  for (t in 1:T) {
    // initialise forward variable if first obs of track
    if(t==1 || ID[t]!=ID[t-1])
      logp = rep_vector(-log(N), N);
    for (n in 1:N) {
      logptemp[n] = log_sum_exp(to_vector(log_gamma_tr[t,n]) + logp);
      if(steps[t]>=0)
        logptemp[n] = logptemp[n] + gamma_lpdf(steps[t] | shape[n], rate[n]);
      if(angles[t]>=(-pi()))
        logptemp[n] = logptemp[n] + von_mises_lpdf(angles[t] | loc[n], kappa[n]);
    }
    logp = logptemp;
    // add the log forward variable to the target at the end of each track
    // (accumulation step reconstructed)
    if(t==T || ID[t+1]!=ID[t])
      target += log_sum_exp(logp);
  }
}
Each forward variable is computed on the log scale,
\[
\log(\alpha_{t,j}) = \log\left(\sum_{i=1}^{N} \gamma_{ij}\,\alpha_{t-1,i}\right)
= \log\left(\sum_{i=1}^{N} \exp\big\{\log(\gamma_{ij}) + \log(\alpha_{t-1,i})\big\}\right),
\]
where the log(γ_{ij}) correspond to the entries of log_gamma_tr, and the log(α_{t−1,i}) are obtained iteratively.
We fit the model to the haggis data.
# set NAs to out-of-range values
data$step[is.na(data$step)] <- -10
data$angle[is.na(data$angle)] <- -10
data$ID <- as.numeric(data$ID)
We can obtain summaries and diagnostics from the fitted model object:
get_elapsed_time(fit)
## warmup sample
## chain:1 963.890 2480.15
## chain:2 945.301 2494.70
summary(fit, pars = c("shape", "rate", "loc", "kappa"), probs = c(0.05, 0.95))$summary
Figure 5: Histograms of the observed step lengths (left) and turning angles (right), with the estimated
state-dependent densities weighted by the stationary distributions.
We can also plot the transition probabilities as functions of the covariates. For example, we use the following
code to visualise the effect of the slope on the transition probabilities when temperature is equal to 10.
# extract parameters of the t.p.m
samp <- as.matrix(fit)
beta <- samp[, grep("beta", colnames(samp))]
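The plotting code is not reproduced here; as a hedged sketch, one can evaluate the link over a grid of slope values with temperature fixed at 10, using for instance the posterior means of the coefficients and the tpm_t() helper sketched earlier (the reshaping below assumes the flattened draws are in rstan's column-major order and that the design matrix columns are intercept, temperature, slope, slope²).

# hedged sketch: Pr(1 -> 2) as a function of slope at temperature 10
N <- 2; nCovs <- 3
beta_mean <- matrix(colMeans(beta), nrow = N * (N - 1), ncol = nCovs + 1)
slope_grid <- seq(0, 40, length.out = 100)
p12 <- sapply(slope_grid, function(s) tpm_t(beta_mean, c(1, 10, s, s^2), N)[1, 2])
plot(slope_grid, p12, type = "l", xlab = "slope", ylab = "Pr(1 -> 2)")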
We perform the same graphical posterior predictive checks from before. First, we simulate data using draws
from the posterior distribution of the parameters:
## generate new data sets
# state sequences
ppstates <- matrix(NA, nrow = n.sims, ncol = n)
# observations
ppsteps <- matrix(NA, nrow = n.sims, ncol = n)
ppangs <- matrix(NA, nrow = n.sims, ncol = n)
[Figure: transition probabilities Pr(1 → 1), Pr(1 → 2), Pr(2 → 1) and Pr(2 → 2) as functions of slope]
    ppsteps[j, 1] <- rgamma(1, shape = shape[j, ppstates[j, 1]], rate = rate[j, ppstates[j, 1]])
    ppangs[j, 1] <- rvm(1, mean = loc[j, ppstates[j, 1]], k = kappa[j, ppstates[j, 1]])
    for (i in 2:n) {
        ppstates[j, i] <- sample(1:N, size = 1, prob = tpm[ppstates[j, i - 1], , i])
        ppsteps[j, i] <- rgamma(1, shape = shape[j, ppstates[j, i]], rate = rate[j, ppstates[j, i]])
        ppangs[j, i] <- rvm(1, mean = loc[j, ppstates[j, i]], k = kappa[j, ppstates[j, i]])
    }
}
We check that the densities of the replicated data sets are similar to the observed data set, for both step
lengths and turning angles.
ppc_dens_overlay(data$step[which(!is.na(data$step))],
ppsteps[1:100,which(!is.na(data$step))])
ppc_dens_overlay(data$angle[which(!is.na(data$angle))],
ppangs[1:100,which(!is.na(data$angle))])
We compare the observed autocorrelation with the autocorrelation of the simulated data sets.
nlags <- 61
# observed acf
oac = acf(data$step[2:(n - 1)], lag.max = (nlags - 1), plot = FALSE, na.action = na.pass)$acf
ppac = matrix(NA, n.sims, nlags)
for (i in 1:n.sims) {
    ppac[i, ] = acf(ppsteps[i, ], lag.max = (nlags - 1), plot = FALSE)$acf
}
# 90% intervals for the replicate ACFs (construction of 'dat' reconstructed)
dat = data.frame(y = 1:nlags, acf = as.numeric(oac), lb = apply(ppac, 2, quantile, 0.05),
    ub = apply(ppac, 2, quantile, 0.95))
ggplot(dat, aes(y, acf)) + geom_ribbon(aes(x = y, ymin = lb, ymax = ub), fill = "grey70",
    alpha = 0.5) + geom_point(col = "purple", size = 1) + geom_line() + coord_cartesian(xlim = c(2,
    60), ylim = c(-0.1, 0.5)) + xlab("Lag") + ylab("ACF") +
    ggtitle("Observed Autocorrelation Function with 90% CI for ACF of Predicted Quantities")
We also compute the forecast (pseudo-)residuals for the step lengths. We make an adjustment to the previous
code because the t.p.m. is no longer constant over time.
##### R code for Forecast (Pseudo-)Residuals
# scaled forward log-probabilities with a time-varying t.p.m.
# (function wrapper and initial scaling reconstructed, as before)
lalpha_fun_tv <- function(delta, gamma, allprobs, n, N) {
    lalpha <- matrix(NA, N, n)
    foo <- delta * allprobs[1, ]
    sumfoo <- sum(foo)
    lscale <- log(sumfoo)
    foo <- foo/sumfoo # scaling
    lalpha[, 1] <- log(foo) + lscale
    for (i in 2:n) {
        foo <- foo %*% gamma[, , i] * allprobs[i, ]
        sumfoo <- sum(foo)
        lscale <- lscale + log(sumfoo)
        foo <- foo/sumfoo # scaling
        lalpha[, i] <- log(foo) + lscale
    }
    lalpha
}
for (j in 1:length(ind.step)) {
    pstepmat[ind.step[j], 1] <- pgamma(x[ind.step[j]], shape = shape[1], rate = rate[1])
    pstepmat[ind.step[j], 2] <- pgamma(x[ind.step[j]], shape = shape[2], rate = rate[2])
}
if (!is.na(x[1]))
    fres[1] <- qnorm(rbind(c(1, 0)) %*% pstepmat[1, ])
for (i in 2:n) {
    # loop body reconstructed: weight the state-wise CDF values by the
    # normalised forward probabilities and the time-varying t.p.m.
    a <- exp(lalpha[, i - 1] - max(lalpha[, i - 1]))
    fres[i] <- qnorm(t(a) %*% gamma[, , i] %*% pstepmat[i, ] / sum(a))
}
Plotting the residuals in a Q-Q plot:
ggplot(data=data.frame(x=fres$fres), aes(sample = x)) + stat_qq() +
stat_qq_line(color="purple", size=1) +
ggtitle("Q-Q Plot") + theme_classic()
[Q-Q plot of the step-length forecast pseudo-residuals against the standard normal]
The HMM applied to positional data is meant to serve as a model of the data-generating mechanism. Ideally,
the estimated states serve as proxies for biologically meaningful behaviors, but. . .
Important Lesson:
The adequacy of the HMM is in no way validated by how well the states
correspond to general behaviors of interest.
Keeping this in mind, is an HMM useful? Yes, of course! The usefulness lies in combining biological expertise
with a modeling framework that is intuitive (from a biological standpoint) to use as the data generating
mechanism for the observed movement data. But an HMM is not magic, nor will any other unsupervised
technique magically identify general behaviors of interest without some understanding of the biological
mechanism.
For various animals, the movement patterns that manifest themselves in positional data do tend to follow a
general pattern: directed movements tend to correlate with long step lengths, while larger turning angles are
generally associated with shorter step lengths. In terrestrial animals, these patterns can broadly serve as proxies for
areas in which the animal forages or travels through. In marine animals, like sharks, we have interpreted
the states to correspond to area-restricted search and traveling behavior. Nonetheless, the HMM is useful for
clustering movement patterns into these general behavioral states. From there, we can incorporate covariates
to understand what may drive an animal to remain in a certain area (chum in the water, habitat quality,
etc.).
In the end, an HMM can be quite a useful tool for the analysis of animal movement data, blending important
ecological knowledge with sophisticated modeling techniques. And importantly, inference can be carried out in
the Stan programming language.
Acknowledgements
We thank Juan M. Morales and Roland Langrock for feedback on an earlier version.
References
Alexandrovich, G., Holzmann, H. & Leister, A. (2016) Nonparametric identification and maximum likelihood
estimation for hidden Markov models. Biometrika, 103, 423–434.
Betancourt, M. (2017). Identifying Bayesian Mixture Models. Retrieved from https://betanalpha.github.io/
assets/case_studies/identifying_mixture_models.html
Betancourt, M. (2018). A Principled Bayesian Workflow. Retrieved from https://betanalpha.github.io/
assets/case_studies/principled_bayesian_workflow.html
Gabry, J. and Mahr, T. (2018). bayesplot: Plotting for Bayesian Models. R package version 1.5.0. https:
//CRAN.R-project.org/package=bayesplot
Langrock, R. & Zucchini, W. (2011) Hidden Markov models with arbitrary state dwell-time distributions.
Computational Statistics and Data Analysis, 55, 715–724.
Langrock, R., Kneib, T., Sohn, A., & DeRuiter, S. L. (2015) Nonparametric inference in hidden Markov
models using P-splines. Biometrics, 71(2), 520–528.
Michelot, T., Langrock, R. & Patterson, T.A. (2016) moveHMM: an R package for the statistical modelling
of animal movement data using hidden Markov models. Methods in Ecology and Evolution, 7, 1308–1315.
Morales, J.M., Haydon, D.T., Frair, J., Holsinger, K.E., & Fryxell, J.M. (2004). Extracting more out of
relocation data: building movement models as mixtures of random walks. Ecology, 85(9), 2436–2445.
Plummer, M., Best, N., Cowles, K. and Vines, K. (2006). CODA: Convergence Diagnosis and Output Analysis
for MCMC. R News, 6, 7–11.
Stan Development Team (2018). RStan: the R interface to Stan. R package version 2.17.3. http://mc-stan.
org/.
Wickham, H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
Zucchini, W., MacDonald, I.L. & Langrock, R. (2016) Hidden Markov Models for Time Series: An Introduction
Using R, 2nd Edition. Chapman & Hall/CRC, Boca Raton, FL.
Extras
State decoding
We can obtain inferences into the hidden state process, using either ‘global decoding’ (Viterbi
algorithm) or ‘local decoding’ (forward-backward algorithm). For a set of estimated parameters, the
Viterbi algorithm computes the sequence of states most likely to have given rise to the observed data. The
forward-backward algorithm is used to derive state probabilities, i.e. probabilities of being in each state at
each time step.
In Stan, we can obtain the most likely state sequence (and/or state probabilities) for each posterior draw.
We include the Viterbi and forward-backward algorithms in the ‘generated quantities’ block, as shown below.
generated quantities {
int<lower=1,upper=N> viterbi[T];
real stateProbs[T,N];
vector[N] lp;
vector[N] lp_p1;
for (t in 1:T) {
if(t==1 || ID[t]!=ID[t-1]) {
for(n in 1:N)
best_logp[t, n] = gamma_lpdf(steps[t] | shape[n], rate[n]);
} else {
for (n in 1:N) {
best_logp[t, n] = negative_infinity();
for (j in 1:N) {
real logp;
logp = best_logp[t-1, j] + log_theta[t,j,n];
if(steps[t]>0)
logp = logp + gamma_lpdf(steps[t] | shape[n], rate[n]);
if(angles[t]>(-pi()))
logp = logp + von_mises_lpdf(angles[t] | loc[n], kappa[n]);
for(t0 in 1:T) {
int t = T - t0 + 1;
if(t==T || ID[t+1]!=ID[t]) {
max_logp = max(best_logp[t]);
for (n in 1:N)
if (best_logp[t, n] == max_logp)
viterbi[t] = n;
} else {
viterbi[t] = back_ptr[t+1, viterbi[t+1]];
}
}
}
for (n in 1:N) {
lp_p1[n] = log_sum_exp(to_vector(log_theta_tr[t,n]) + lp);
if(steps[t]>=0)
lp_p1[n] = lp_p1[n] + gamma_lpdf(steps[t] | shape[n], rate[n]);
if(angles[t]>=(-pi())) {
lp_p1[n] = lp_p1[n] + von_mises_lpdf(angles[t] | loc[n], kappa[n]);
}
logalpha[t,n] = lp_p1[n];
}
lp = lp_p1;
}
if(t==T || ID[t+1]!=ID[t]) {
for(n in 1:N)
lp_p1[n] = 0;
} else {
for(n in 1:N) {
lp_p1[n] = log_sum_exp(to_vector(log_theta_tr[t+1,n]) + lp);
if(steps[t+1]>=0)
lp_p1[n] = lp_p1[n] + gamma_lpdf(steps[t+1] | shape[n], rate[n]);
if(angles[t+1]>=(-pi()))
lp_p1[n] = lp_p1[n] + von_mises_lpdf(angles[t+1] | loc[n], kappa[n]);
}
}
lp = lp_p1;
for(n in 1:N)
logbeta[t,n] = lp[n];
}
// state probabilities
for(t0 in 1:T) {
int t = T - t0 + 1;
if(t==T || ID[t+1]!=ID[t])
llk = log_sum_exp(logalpha[t]);
for(n in 1:N)
stateProbs[t,n] = exp(logalpha[t,n] + logbeta[t,n] - llk);
}
}
}