18 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 36, NO. 1, JANUARY 2014

Anomaly Detection and Localization in Crowded Scenes

Weixin Li, Student Member, IEEE, Vijay Mahadevan, Member, IEEE, and
Nuno Vasconcelos, Senior Member, IEEE

Abstract—The detection and localization of anomalous behaviors in crowded scenes is considered, and a joint detector of temporal and spatial anomalies is proposed. The proposed detector is based on a video representation that accounts for both appearance and dynamics, using a set of mixture-of-dynamic-textures models. These models are used to implement 1) a center-surround discriminant saliency detector that produces spatial saliency scores, and 2) a model of normal behavior that is learned from training data and produces temporal saliency scores. Spatial and temporal anomaly maps are then defined at multiple spatial scales, by considering the scores of these operators at progressively larger regions of support. The multiscale scores act as potentials of a conditional random field that guarantees global consistency of the anomaly judgments. A data set of densely crowded pedestrian walkways is introduced and used to evaluate the proposed anomaly detector. Experiments on this and other data sets show that the proposed detector achieves state-of-the-art anomaly detection results.

Index Terms—Video analysis, surveillance, anomaly detection, crowded scene, dynamic texture, center-surround saliency

• W. Li and N. Vasconcelos are with the Electrical and Computer Engineering Department, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093. E-mail: {wel017, nvasconcelos}@ucsd.edu.
• V. Mahadevan is with Yahoo! Labs, Embassy Golf Links Business Park, Bengaluru 560071, India. E-mail: [email protected].

Manuscript received 15 Apr. 2012; revised 26 Feb. 2013; accepted 14 May 2013; published online 12 June 2013. Recommended for acceptance by G. Mori. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-2012-04-0294. Digital Object Identifier no. 10.1109/TPAMI.2013.111.

0162-8828/14/$31.00 © 2014 IEEE Published by the IEEE Computer Society

1 INTRODUCTION

Surveillance video is extremely tedious to monitor when events that require follow-up have very low probability. For crowded scenes, this difficulty is compounded by the complexity of normal crowd behaviors. This has motivated a surge of interest in anomaly detection in computer vision [1], [2], [3], [4], [5], [6], [7], [8], [9]. However, this effort is hampered by general difficulties of the anomaly detection problem [10]. One fundamental limitation is the lack of a universal definition of anomaly. For crowds, it is also infeasible to enumerate the set of anomalies that are possible in a given surveillance scenario. This is compounded by the sparseness, rarity, and discontinuity of anomalous events, which limit the number of examples available to train an anomaly detection system.

One common solution to these problems is to define anomalies as events of low probability with respect to a probabilistic model of normal behavior. This enables a statistical treatment of anomaly detection, which conforms with the intuition of anomalies as events that deviate from the expected [10]. However, it introduces a number of challenges. First, it makes anomalies dependent on the scale at which normalcy is defined. A normal behavior at a fine visual scale may be perceived as highly anomalous when a larger scale is considered, or vice versa. Hence, normalcy models must be defined at multiple scales. Second, different tasks may require different models of normalcy. For instance, a detector of freeway speed limit violations will rely on normalcy models based on speed features. On the other hand, appearance is more important for the detection of carpool lane violators, i.e., single-passenger vehicles in carpool lanes. Third, crowded scenes require normalcy models robust to complex scene dynamics, involving many independently moving objects that occlude each other in complex ways and can have low resolution.

As a result, anomaly detection can be extremely challenging. While this has motivated a great diversity of solutions, it is usually quite difficult to objectively compare different methods. Typically, these combine different representations of motion and appearance with different graphical models of normalcy, which are usually tailored to specific scene domains. Abnormalities are themselves defined in a somewhat subjective form, sometimes according to what the algorithms can detect. In some cases, different authors even define different anomalies on common data sets. Finally, experimental results can be presented on data sets of very different characteristics (e.g., traffic intersection versus subway entrance), frequently proprietary, and with widely varying levels of crowd density.

In this work, we propose an integrated solution to all these problems. We start by introducing normalcy models that jointly account for the appearance and dynamics of complex crowd scenes. This is done by resorting to a video representation based on dynamic textures (DTs) [11]. This representation is then used to design models of normalcy over both space and time. Temporal normalcy is modeled with a mixture of DTs [12] (MDT) and enables the detection of behaviors that deviate from those observed in the past. Spatial normalcy is measured with a discriminant saliency detector [13] based on MDTs, enabling the detection of behaviors that deviate from those of the surrounding crowd. The integration of spatial and temporal normalcy
LI ET AL.: ANOMALY DETECTION AND LOCALIZATION IN CROWDED SCENES 19

with respect to either appearance or dynamics leads to a flexible model of normalcy, applicable to the detection of anomalies of relevance to various surveillance tasks.

To address the scale problem, MDTs are learned at multiple spatial scales. This is done with an efficient hierarchical model, where layers of MDTs with successively larger regions of video support are learned recursively. The local measures of spatial and temporal abnormality are then integrated into a globally coherent anomaly map by probabilistic inference. This is implemented with a conditional random field (CRF), whose single-node potentials are classifiers of local measures of spatial and temporal abnormality, collected over a range of spatial scales. They are complemented by a novel set of interaction potentials, which account for spatial and temporal context and integrate anomaly information across the visual field.

Finally, to address the difficulties of empirical evaluation of anomaly detectors on crowded scenes, we introduce a data set of video from walkways in the campus of University of California, San Diego (UCSD), depicting crowds of varying densities. The data set contains 98 video sequences and five well-defined abnormal categories. These are not “synthetic,” or “staged,” but abnormal events that occur naturally, for example, bicycle riders that cross pedestrian walkways. Ground truth is provided for abnormal events, as well as a protocol to evaluate detection performance.

The remainder of the paper is organized as follows: Section 2 reviews previous work on anomaly detection in computer vision. The problems of temporal and spatial anomaly detection in crowded scenes are discussed in Section 3. This is followed by the mathematical characterization of multiscale anomaly maps in Section 4, and the proposed CRF for integration of spatial and temporal anomalies across different spatial scales in Section 5. Finally, an extensive experimental evaluation is discussed in Section 6 and some conclusions are presented in Section 7.

2 PRIOR WORK

Recent advances in anomaly detection address event representation and globally consistent statistical inference. Contributions of the first type define features and models for the discrimination of normal and anomalous patterns. Models of normal and abnormal behavior are then learned from training data, and anomalies are detected with a minimum probability of error decision rule. Although there are some exceptions [5], the distribution of abnormal patterns is usually assumed uniform, and abnormal events are formulated as events of low probability under the model of normalcy.

One intuitive representation for event modeling is based on object trajectories. It consists of either explicitly or implicitly segmenting and tracking each object in the scene, and fitting models to the resulting object tracks [14], [15], [16], [6], [17], [18]. While capable of identifying abnormal behaviors of high-level semantics (e.g., unusual long-term trajectories), these procedures are both difficult and computationally expensive for crowded or cluttered scenes.

A number of promising alternatives, which avoid processing individual objects, have been recently proposed. These include the modeling of motion patterns with histograms of pixel change [5], histograms of optical flow [19], [8], [20], or optical flow measures [3], [4], [17], [1]. Among these, [3] models local optical flow with a mixture of probabilistic principal component analysis (PCA) models, [4] and [17] draw inspiration from classical studies of crowd behavior [21] to characterize flow with interaction features (e.g., social force model), and [1] learns the representative flow of groups by clustering optical flow-based particle trajectories. These approaches emphasize dynamics, ignoring anomalies of object appearance and, thus, anomalous behavior without outlying motion. Optical flow, pixel change histograms, or other classical background subtraction features are also difficult to extract from crowded scenes, where the background is by definition dynamic, clutter is abundant, and occlusions are frequent. More complete representations account for both appearance and motion. For example, [2] models temporal sequences of spatiotemporal gradients to detect anomalies in densely crowded scenes, [22] declares as abnormal spatiotemporal patches that cannot be reconstructed from previous frames, and [23] pools appearance and motion features over spatial neighborhoods, using a distance to the nearest spatially colocated feature vector among all training video clips, to quantify abnormality. Object-based representations, based on location, blob shape, and motion [7] or optical flow magnitude, gradients, location, and scale [9], have also been proposed. Other representations include a bag-of-words over a set of manually annotated event classes [24]. Various methods have also been used to produce anomaly scores. While simple spatial filtering suffices for some applications [19], crowded scenes require more sophisticated graphical models and inference. For example, [6] and [1] adopt Gaussian mixture models (GMM) to represent trajectories of normal behavior. Cong et al. [8] and Zhao et al. [20] learn a sparse basis and define unusual events as those that can only be reconstructed with either large error or the combination of a large number of basis vectors.

Contributions of the second type address the integration of local anomaly scores, which can be noisy, into a globally consistent anomaly map. The authors of [2], [25], and [7] guarantee temporally consistent inference by modeling normal temporal sequences with hidden Markov models (HMMs). While this enforces consistency along the temporal dimension, there have also been efforts to produce spatially consistent anomaly maps. For example, latent Dirichlet allocation (LDA) has been applied to force flow features, in the model of spatial crowd interactions of [4]. On the other hand, [5] and [3] rely on Markov random fields (MRF) to enforce global spatial consistency. In the realm of sparse representations, [20] guarantees consistency of reconstruction coefficients over space and time by inclusion of smoothness terms in the underlying optimization problem. Finally, [9] models object relationships, using Bayesian networks to implement occlusion reasoning.

It should be noted that most of these methods have not been tested on the densely crowded scenes considered in this work. It is unclear whether many of them could deal with the complex motion and object interactions prevalent in such scenes. Furthermore, while most methods include some mechanism to encourage spatial and temporal consistency of anomaly judgments (MRF, LDA, etc.), the underlying decision rule tends to be either predominantly temporal (e.g., trajectories, GMMs, HMMs, or sparse representations learned over time) or spatial
(e.g., interaction models) but is rarely discriminant with respect to both space and time. This makes it difficult to infer whether spatial or temporal modeling is critically important by itself, or what benefits are gained from their joint modeling. Furthermore, the role of scale is rarely considered. These issues motivate the contributions of the following sections.

3 ANOMALY DETECTION

We start by proposing an anomaly detector that accounts for scene appearance and dynamics, spatial and temporal context, and multiple spatial scales.

3.1 Mathematical Formulation

A classical formulation of anomaly detection, which we adopt in this work, equates anomalies to outliers. A statistical model $p_X(\mathbf{x})$ is postulated for the distribution of a measurement $X$ under normal conditions. Abnormalities are defined as measurements whose probability is below a threshold under this model. This is equivalent to a statistical test of hypotheses:

• $H_0$: $\mathbf{x}$ is drawn from $p_X(\mathbf{x})$;
• $H_1$: $\mathbf{x}$ is drawn from an uninformative distribution, whose density is constant ($\propto 1$).

The minimum probability of error rule for this test is to reject the null hypothesis $H_0$ if $p_X(\mathbf{x})$ falls below the normalization constant of the uninformative distribution.

As usual in the literature, we consider the problem of anomaly detection from localized video measurements $\mathbf{x}$, where $\mathbf{x}$ is a spatiotemporal patch of small dimensions.

3.2 Spatial versus Temporal Anomalies

The normalcy model $p_X(\mathbf{x})$ can have both a temporal and a spatial component. Temporal normalcy reflects the intuition that normal events are recurrent over time, i.e., previous observations establish a contextual reference for normalcy judgments. Consider a highway lane where cars move with a certain orientation and speed. Bicycles or cars heading in the opposite direction are easily identified as abnormal because they give rise to observations $\mathbf{x}$ substantially different from those collected in the past. In this sense, temporal normalcy detection is similar to background subtraction [26]. A model of normal behavior is learned over time, and measurements that it cannot explain are denoted temporal anomalies.

Spatial normalcy reflects the intuition that some events that would not be abnormal per se are abnormal within a crowd. Since the crowd places physical or psychological constraints on individual behavior, behaviors feasible in isolation can have low probability in a crowd context. For example, while there is nothing abnormal about an ambulance that rides at 50 mph in a stretch of highway, the same observation within a highly congested highway is abnormal. Note that the only indication of abnormality is the difference between the crowd and the object at the time of the observation, not that the ambulance moves at 50 mph. Since the detection of such abnormalities is mostly based on spatial context, they are denoted spatial anomalies. Their detection does not depend on memory. Instead, it is based on a continuously evolving, instantaneously adaptive definition of normalcy. In this sense, the detection of spatial anomalies can be equated to saliency detection [27].

3.3 Roles of Crowds and Scale

Most available background subtraction and saliency detection solutions are not applicable to crowded scenes, where backgrounds can be highly dynamic. In this case, it is not sufficient to detect variations of image intensity, or even optical flow, to detect anomalous events. Instead, normalcy models must rely on sophisticated joint representations of appearance and dynamics. In fact, even such models can be ineffective. Since crowds frequently contain distinct subentities, for example, vehicles or groups of people moving in different directions, anomaly detection requires modeling multiple video components of different appearance and dynamics. A model that has been shown successful in this context is the mixture of DTs [12]. This is the representation adopted in this work.

Another challenging aspect of anomaly detection within crowds is scale. Spatial anomalies are usually detected at the scale of the smallest scene entities, typically people. However, a normal event at this scale may be anomalous at a larger scale, and vice versa. For example, while a child that rides a bicycle appears normal within a group of bicycle-riding children, the group is itself anomalous in a crowded pedestrian sidewalk. Local anomaly detectors, with small regions of interest, cannot detect such anomalies.

To address this, we represent crowded scenes with a hierarchy of MDTs that cover successively larger regions. This is done with a computationally efficient hierarchical model, where MDT layers are estimated recursively.

A similar challenge holds for temporal anomalies. While their detection is usually based on a small number of video frames, certain anomalies can only be detected over long time spans. For example, while it is normal for two pedestrian trajectories to converge or diverge at any point in time, a cyclical convergence and divergence is probably abnormal. Anomaly detection across time scales is, however, more complex than across spatial scales, due to constraints of instantaneous detection and implementation complexity. Since video has to be buffered before anomalies can be detected, large temporal windows imply long detection delays and storage of many video frames. For this reason, we do not consider multiple temporal scales in this work. A single scale is chosen, using acceptable values of delay and storage complexity, and used throughout our experiments. Note that, like their spatial counterparts, temporal anomaly maps are computed at multiple spatial scales. Hence, in what follows, the term “scale” refers to the spatial support of anomaly detection, for both spatial and temporal anomalies.

4 NORMALCY AND ANOMALY MODELING

In this section, we review the MDT model, discuss the design of temporal and spatial models of normalcy, and formulate the computation of anomaly maps.

4.1 Mixture of Dynamic Textures

The MDT models a sequence of $\tau$ video frames $\mathbf{x}_{1:\tau} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_\tau]$ as a sample from one of $K$ dynamic textures [11]:
$$p(\mathbf{x}_{1:\tau}) = \sum_{i=1}^{K} \pi_i\, p(\mathbf{x}_{1:\tau} \mid z = i). \qquad (1)$$

The mixture components $p(\mathbf{x}_{1:\tau} \mid z = i)$ are linear dynamic systems (LDS) defined by

$$\mathbf{s}_{t+1} = A_z \mathbf{s}_t + \mathbf{n}_t, \qquad (2a)$$
$$\mathbf{x}_t = C_z \mathbf{s}_t + \mathbf{m}_t, \qquad (2b)$$

where $z$ is a multinomial random variable of parameters $\boldsymbol{\pi}$ ($\pi_i \geq 0$, $\sum_i \pi_i = 1$), which indexes the mixture component from which $\mathbf{x}_t$ is drawn. $\mathbf{s}_t$ is a hidden state variable that encodes scene dynamics, and $\mathbf{x}_t$ is the vector of pixels in video frame $t$. $A_z$ and $C_z$ are the transition and observation matrices of component $z$, whose initial condition is $\mathbf{s}_1 \sim \mathcal{N}(\boldsymbol{\mu}_z, S_z)$, and the noise processes are defined by $\mathbf{n}_t \sim \mathcal{N}(\mathbf{0}, Q_z)$ and $\mathbf{m}_t \sim \mathcal{N}(\mathbf{0}, R_z)$. The model parameters are learned by maximum-likelihood estimation (MLE) from a collection of video patches, with the expectation-maximization (EM) algorithm of [12], which is reviewed in Appendix A.1, available in the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.111.

4.2 Temporal Anomaly Detection

Temporal anomaly detection is inspired by the popular background subtraction method of [26]. This uses a GMM per image location to model the distribution of image intensities. Observations of low probability under these GMMs are declared foreground. For anomaly detection in crowds, the GMM is replaced by an MDT, and the pixel grid is replaced by a grid of preset displacement. Grid locations define the center of video cells, from which video patches are extracted. The patches extracted from a subregion (group of cells) are used to learn an MDT during a training phase, as illustrated in Fig. 1. After this phase, subregion patches of low probability under the associated MDT are considered anomalies. Given patch $\mathbf{x}_{1:\tau}$, the distribution of the hidden state sequence $\mathbf{s}_{1:\tau}$ under the $i$th DT component, $p_{S|X}(\mathbf{s}_{1:\tau} \mid \mathbf{x}_{1:\tau}, z = i)$, is estimated with a Kalman filter and smoother [28], [29], as discussed in Appendix A.2, available in the online supplemental material. The value of the temporal anomaly map at location $l$ is the negative-log probability of the most-likely state sequence for the patch at $l$:

$$\mathcal{T}(l) = -\log\left[\sum_{i=1}^{K} \pi_i\, p\big(\mathbf{s}^{\{i\}}_{1:\tau}(l) \mid z = i\big)\right], \qquad (3)$$

where $\mathbf{s}^{\{i\}}_{1:\tau}(l) = \arg\max_{\mathbf{s}_{1:\tau}} p\big(\mathbf{s}_{1:\tau} \mid \mathbf{x}_{1:\tau}(l), z = i\big)$. We note that this generalizes the mixture of PCA models of optical flow [3]. The matrix $C_z$ of (2b) is a PCA basis for patches drawn from mixture component $z$, but the PCA decomposition refers to patch appearance, not optical flow. Patch dynamics are captured by the hidden state sequence $\mathbf{s}_{1:\tau}$, which is a trajectory in PCA space. Hence, unlike mixtures of optical flow, the representation is temporally smooth. The joint representation of appearance and dynamics makes the MDT a better representation for crowd video than the mixture of PCA.

Fig. 1. Temporal anomaly detection. An MDT is learned per scene subregion, at training time. A temporal anomaly map is produced by measuring the negative log probability of each video patch under the MDT of the corresponding region.

4.3 Spatial Anomaly Detection

Spatial anomaly detection is inspired by previous work in saliency detection [27], [13]. Saliency is defined in a center-surround manner. Given a set of features, salient locations are those of substantial feature contrast with their immediate surround. Spatial anomalies are then defined as locations whose saliency is above some threshold. In this work, we rely on the discriminant saliency criterion of [13].

4.3.1 Discriminant Saliency

Discriminant saliency formulates the saliency problem as a hypothesis test between two classes: a class of salient stimuli, and a background class of stimuli that are not salient. Two windows are defined at each scene location $l$: a center window $W^1_l$, with label $C(l) = 1$, containing the location, and a surrounding annular window $W^0_l$, with label $C(l) = 0$, containing background. A set of feature responses $X$ is computed for each of the windows $W^c_l$, $c \in \{0, 1\}$, and the saliency $\mathcal{S}(l)$ of location $l$ is defined as the extent to which these responses discriminate between the two classes. This is quantified by the mutual information (MI) between feature responses and class label [13]:

$$\mathcal{S}(l) = \sum_{c=0}^{1} p_{C(l)}(c)\, \mathrm{KL}\big[p_{X|C(l)}(\mathbf{x} \mid c) \,\|\, p_X(\mathbf{x})\big], \qquad (4)$$

where $p_{X|C(l)}(\mathbf{x} \mid c)$ are class-conditional densities and $\mathrm{KL}(p \| q) = \int_{\mathcal{X}} p_X(\mathbf{x}) \log \frac{p_X(\mathbf{x})}{q_X(\mathbf{x})}\, d\mathbf{x}$ is the Kullback-Leibler (KL) divergence between $p_X(\mathbf{x})$ and $q_X(\mathbf{x})$ [30].

Locations of maximal saliency are those where the discrimination between center and surround can be made with highest confidence, i.e., where (4) is maximal.

The discriminant saliency principle can be applied to many features [31]. When $X$ consists of optical flow, it generalizes the force flow model of [4], where saliency is defined as the difference between the optical flow at $l$ and the average flow in its neighborhood (see [4, (8)]). This is a simplified form of discriminant saliency, which replaces the MI of (4) by a difference to the mean background response.

4.3.2 Center-Surround Saliency with MDTs

Optical flow methods provide a coarse representation of dynamics and ignore appearance. For background subtraction, this problem has been addressed with the combination of DTs and discriminant saliency [32]. While using a more
powerful representation than force flow, this method learns a single DT from both center and surround windows. This assumes a homogeneity of appearance and dynamics within the two windows that does not hold for crowds, where foregrounds and backgrounds can be quite diverse.

In this work, we adopt the MDT as the probability distribution $p_{X|C(l)}(\mathbf{x}_{1:\tau} \mid c)$ from which spatiotemporal patches $\mathbf{x}_{1:\tau}$ are drawn. We note that, under assumptions of Gaussian initial conditions and noise, patches $\mathbf{x}_{1:\tau}$ drawn from a DT have a Gaussian probability distribution [33],

$$\mathbf{x}_{1:\tau} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \qquad (5)$$

whose parameters follow from those of the LDS (2). When the class-conditional distributions of the center and surround classes, $c \in \{0, 1\}$, at location $l$ are mixtures of $K_c$ DTs, it follows that

$$p_{X|C(l)}(\mathbf{x}_{1:\tau} \mid c) = \sum_{i=1}^{K_c} \pi^c_i\, \mathcal{N}\big(\mathbf{x}_{1:\tau}; \boldsymbol{\mu}^c_i, \boldsymbol{\Sigma}^c_i\big) = \sum_{i=1}^{K_c} \pi^c_i\, p^i_{X|C(l)}(\mathbf{x}_{1:\tau} \mid c), \qquad (6)$$

for $c \in \{0, 1\}$. The marginal distribution is then

$$\begin{aligned} p_X(\mathbf{x}_{1:\tau}) &= \sum_{c=0}^{1} p_{C(l)}(c)\, p_{X|C(l)}(\mathbf{x}_{1:\tau} \mid c) \\ &= \sum_{c=0}^{1} p_{C(l)}(c) \sum_{i=1}^{K_c} \pi^c_i\, \mathcal{N}\big(\mathbf{x}_{1:\tau}; \boldsymbol{\mu}^c_i, \boldsymbol{\Sigma}^c_i\big) \\ &= \sum_{i=1}^{K_0+K_1} \omega_i\, \mathcal{N}\big(\mathbf{x}_{1:\tau}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i\big) = \sum_{i=1}^{K_0+K_1} \omega_i\, p^i_X(\mathbf{x}_{1:\tau}), \end{aligned} \qquad (7)$$

and the saliency measure of (4) requires the KL divergence between (6) and (7). This is problematic because there is no closed-form solution for the KL divergence between two MDTs. However, because the MDT components are Gaussian, it is possible to rely on popular approximations to the KL divergence between Gaussian mixtures. We adopt the variational approximation of [34]:

$$\mathrm{KL}\big(p_{X|C} \,\|\, p_X\big) \approx \sum_{i=1}^{K_C} \pi^C_i \log \frac{\sum_{j=1}^{K_C} \pi^C_j \exp\big(-\mathrm{KL}(p^i_{X|C} \,\|\, p^j_{X|C})\big)}{\sum_{j=1}^{K_0+K_1} \omega_j \exp\big(-\mathrm{KL}(p^i_{X|C} \,\|\, p^j_X)\big)}. \qquad (8)$$

Each term of (8) contains a KL divergence between DTs, which can be computed in closed form [35]. For example, for the terms in the denominator,

$$\mathrm{KL}\big(p^i_{X|C} \,\|\, p^j_X\big) = \frac{1}{2}\left[\log\frac{|\boldsymbol{\Sigma}_j|}{|\boldsymbol{\Sigma}^C_i|} + \mathrm{Tr}\big(\boldsymbol{\Sigma}_j^{-1}\boldsymbol{\Sigma}^C_i\big) + \big\|\boldsymbol{\mu}^C_i - \boldsymbol{\mu}_j\big\|^2_{\boldsymbol{\Sigma}_j} - m\tau\right], \qquad (9)$$

where $m$ is the number of pixels per frame, and $\|\mathbf{z}\|^2_{\boldsymbol{\Sigma}} = \mathbf{z}^T \boldsymbol{\Sigma}^{-1} \mathbf{z}$. Numerator terms are computed similarly. All computations can be performed recursively [35].

Fig. 2. Spatial anomaly detection using center-surround saliency with MDT models.

4.3.3 Spatial Anomaly Map

The spatial anomaly map is a map of the saliency $\mathcal{S}(l)$ at locations $l$. Given a location, this requires 1) learning MDTs from center and surround windows, and 2) computing a weighted average of these mixtures to obtain (7). Since learning MDTs per location is computationally prohibitive, we resort to the following approximation. A dense collection of overlapping spatiotemporal patches is first extracted from $V(t)$, a 3D video volume temporally centered at the current frame. A single MDT with $K_g$ mixture components (the superscript $g$ denotes this global model) is learned from this patch collection. Each patch is then assigned to the mixture component of largest posterior probability. This segments the volume into superpixels, as shown in Fig. 2.

At location $l$, the MDTs of (6) and (7) are derived from the global mixture model. The DT components are assumed equal to those of the latter, and only the mixing proportions are recomputed, using the ratio of pixels assigned to each component in the respective windows:

$$p_{X|C(l)}(\mathbf{x}_{1:\tau} \mid c) = \sum_{i=1}^{K_g} \frac{\sum_{l \in W^c_l} M_{il}}{\sum_{l \in W^c_l} 1}\, \mathcal{N}\big(\mathbf{x}_{1:\tau}; \boldsymbol{\mu}^g_i, \boldsymbol{\Sigma}^g_i\big), \qquad (10)$$

for $c \in \{0, 1\}$. $M_{il} = 1$ if $l$ is assigned to mixture component $i$ and $0$ otherwise. The prior probabilities for center and surround, $p_C(c)$, are proportional to the ratio of volumes of center and surround windows. $\mathcal{S}(l)$ is computed with (4), using (8) and (9). Note that the KL divergence terms in (8) only require the computation of $K_g^2$ KL divergences between the $K_g$ mixture components, and these are computed only once per frame because all mixture components are shared (i.e., the terms $\exp(-\mathrm{KL}(p \| q))$ in (8) are fixed per frame). This procedure is repeated for every frame in the test video, as illustrated in Fig. 2.

4.4 Multiscale Anomaly Maps

To account for anomalies at multiple spatial scales, we rely on a hierarchical mixture of dynamic textures (H-MDT). This is a model with various MDT layers, learned from regions of different spatial support. At the finest scale, a video sequence is divided into $n_L$ subregions (e.g., 5 × 8
subregions). $n_L$ MDT models $\{\mathcal{M}_i\}_{i=1}^{n_L}$ are then learned from patches extracted from each of the subregions. At the coarsest scale, the whole visual field is represented with a global MDT. This results in a hierarchy of MDT models $\{\{\mathcal{M}^1_i\}_{i=1}^{n_1}, \ldots, \mathcal{M}^L_1\}$, where $\mathcal{M}^s_j$, the $j$th model at scale $s$, is learned from subregion $R^s_j$. The hierarchy of support windows $\{\{R^1_i\}_{i=1}^{n_1}, \ldots, R^L\}$ resembles the spatial pyramid structure of [36]. H-MDT models can be learned efficiently with the hierarchical expectation-maximization (H-EM) algorithm of [37]. Rather than collecting patches anew from larger regions, it estimates the models at a given layer directly from the parameters of the MDT models at the layer of immediately higher resolution.

For anomaly detection, each model is applied to the corresponding window. This produces $L$ anomaly maps, $\{\mathcal{T}^1, \ldots, \mathcal{T}^L\}$, as illustrated in Fig. 3. A hierarchy of spatial anomaly maps, $\{\mathcal{S}^1, \ldots, \mathcal{S}^L\}$, is also computed. For all $s$, the computation of $\mathcal{S}^s$ relies on a global mixture model $\mathcal{M}$. The mixing proportions of (10) are computed using surround windows of size identical to $\{R^s_i\}$ and center windows of constant size, as summarized in Algorithm 1 (see Appendix B for all algorithms, available in the online supplemental material).

Fig. 3. Computation of temporal anomaly maps with multiscale spatial supports using the H-MDT. MDTs of increasingly larger spatial support are estimated recursively, with the H-EM algorithm. Their application to a query video produces temporal anomaly maps based on supports of various spatial scales.

5 GLOBALLY CONSISTENT ANOMALY MAPS

In this section, we introduce a layer of statistical inference to fuse anomaly information across time, space, and scale in a globally consistent manner.

5.1 Discriminative Model

The anomaly maps of the previous section span space, time, and spatial scale. Being derived from local measurements, they can be noisy. A principled framework is required to 1) integrate anomaly scores from the individual maps, 2) eliminate noise, and 3) guarantee spatiotemporal consistency of anomaly judgments throughout the visual field. For this, we rely on a conditional random field [38] inspired by the discriminative random field (DRF) of [39]. An anomaly label $y_i \in \{-1, 1\}$ is defined at each location $i$ in a set $S$ of observation sites. Given a video clip $\mathbf{x}$, the conditional likelihood of observing a configuration of anomaly labels $\mathbf{y} = \{y_i \mid i \in S\}$ is

$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z}\exp\left\{\sum_{i \in S} A(y_i, \mathbf{x}) + \sum_{i \in S}\left[\frac{1}{|N_i|}\sum_{j \in N_i} I(y_i, y_j, \mathbf{x}, i, j)\right]\right\}, \qquad (11)$$

where $Z$ is a partition function and $N_i$ the neighborhood of site $i$. The single-site and interaction potentials of (11),

$$A(y_i, \mathbf{x}) = \log \sigma\big(y_i \mathbf{w}^T \mathbf{f}_i\big), \qquad (12)$$

where $\sigma(x) = (1 + e^{-x})^{-1}$ is the sigmoid function, and

$$I(y_i, y_j, \mathbf{x}, i, j) = y_i y_j\, \mathbf{v}^T \psi(\mathbf{f}_i, \mathbf{f}_j, i, j), \qquad (13)$$

are based on a feature vector $\mathbf{f}_i$ that concatenates the spatial and temporal anomaly scores of site $i$ at the $L$ spatial scales, plus a bias term (set to 1):

$$\mathbf{f}_i = \big[1, \mathcal{T}^1(i), \ldots, \mathcal{T}^L(i), \mathcal{S}^1(i), \ldots, \mathcal{S}^L(i)\big]^T. \qquad (14)$$

$\mathbf{w}$ and $\mathbf{v}$ are parameter vectors and $\psi$ is a compound feature:

$$\psi(\mathbf{f}_i, \mathbf{f}_j, i, j) = e^{-|i-j|}\exp(-\mathbf{h}_{i,j}), \qquad (15)$$

where $|i-j|$ is the Euclidean distance between sites $i$ and $j$, and $\exp(-\mathbf{h}_{i,j})$ is the entry-wise exponential of $-\mathbf{h}_{i,j}$. The vector $\mathbf{h}_{i,j}$ contains the diagonal entries of $(\mathbf{f}_i - \mathbf{f}_j)(\mathbf{f}_i - \mathbf{f}_j)^T$.

The single-site potential of (12) reflects the anomaly belief at site $i$. Using it alone, i.e., without (13), (11) is a logistic regression model. In this case, the detection of each anomaly is based on information from site $i$ exclusively. The addition of the interaction potential of (13) enables the model to take into account information from the neighborhood $N_i$ of site $i$. This smooths the single-site prediction, encouraging consistency of neighboring labels. The interaction potential can be interpreted as a classifier that predicts whether two neighboring sites have the same label. Note that because $\mathbf{f}$ contains anomaly scores at different spatial scales, $\mathbf{h}_{i,j}$ (or $\psi_{i,j}$) accounts for the
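similarity between the two observations in anomaly spaces of different scale (i.e., under different spatial normalcy contexts). The interaction potentials adaptively modulate the intensity of intersite smoothing according to these similarity measures (and how they are weighted by $\mathbf{v}$). The parameters $\mathbf{w}$ and $\mathbf{v}$ encode the relative importance of different features.

5.2 Online CRF Filter

The model of (11) requires inference over the entire video sequence, which is not suitable for online applications. An online version can be implemented by conditioning the anomaly label $\mathbf{y}^{(\tau)}$ at time $\tau$ on 1) observations for $t \le \tau$, and 2) anomaly labels for $t < \tau$, leading to

$$p\big(\mathbf{y}^{(\tau)} \mid \{\mathbf{y}^{(t)}\}_{t=1}^{\tau-1}, \{\mathbf{x}^{(t)}\}_{t=1}^{\tau}; \Theta\big) = \frac{1}{Z} \exp\Bigg\{ \sum_{i \in \mathcal{S}^{\tau}} \Bigg[ A\big(y_i^{(\tau)}, \mathbf{x}^{(\tau)}\big) + \frac{1}{|\mathcal{N}_i^{SS}|} \sum_{j \in \mathcal{N}_i^{SS}} I^{SS}\big(y_i^{(\tau)}, y_j^{(\tau)}, \mathbf{x}^{(\tau)}, i, j\big) + \frac{1}{|\mathcal{N}_i^{TT}|} \sum_{k \in \mathcal{N}_i^{TT}} I^{TT}\big(y_i^{(\tau)}, y_k, \mathbf{x}, i, k\big) \Bigg] \Bigg\}, \tag{16}$$

where $\mathcal{S}^{\tau}$ is the set of observation sites at time $\tau$ (pixels of the current frame). Two neighborhoods are defined per location $i$: spatial, $\mathcal{N}_i^{SS}$ ($\mathcal{N}_i^{SS} \subset \mathcal{S}^{\tau}$), and temporal, $\mathcal{N}_i^{TT}$ ($\mathcal{N}_i^{TT} \subset \{\mathcal{S}^{t}\}_{t=1}^{\tau-1}$). The graphical model is shown at the top of Fig. 4, and these neighborhoods at the bottom. The parameters $\Theta = \{\mathbf{w}, \mathbf{v}^{TT}, \mathbf{v}^{SS}, \alpha^{TT}, \alpha^{SS}\}$ are estimated during training.

Fig. 4. CRF filter. Top: Graphical model. Bottom: Spatial and temporal neighborhoods.

5.2.1 Learning

Both (11) and (16) can be learned with standard optimization techniques, such as gradient descent or the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. To improve generalization, the model is regularized with a Gaussian prior of standard deviation $\sigma_0$ for all parameters. Given $N$ independent training samples $\{\mathbf{x}^{(n)}, \mathbf{y}^{(n)}\}_{n=1}^{N}$, the gradients of the regularized log-likelihood with respect to $\mathbf{w}$, $\mathbf{v}$, and $\alpha$ are

$$\frac{\partial}{\partial \mathbf{w}} \log p\big(\{\mathbf{y}^{(n)}\}_{n=1}^{N} \mid \{\mathbf{x}^{(n)}\}_{n=1}^{N}\big) = \sum_{n=1}^{N} \Bigg\{ \sum_{i \in \mathcal{S}} \sigma\big(-y_i^{(n)} \mathbf{w}^T \mathbf{f}_i^{(n)}\big)\, y_i^{(n)} \mathbf{f}_i^{(n)} - \mathbb{E}\Bigg[ \sum_{i \in \mathcal{S}} \sigma\big(-y_i \mathbf{w}^T \mathbf{f}_i^{(n)}\big)\, y_i \mathbf{f}_i^{(n)} \Bigg] \Bigg\} - \frac{\mathbf{w}}{\sigma_0^2}, \tag{17}$$

$$\frac{\partial}{\partial \mathbf{v}} \log p\big(\{\mathbf{y}^{(n)}\}_{n=1}^{N} \mid \{\mathbf{x}^{(n)}\}_{n=1}^{N}\big) = \sum_{n=1}^{N} \Bigg\{ \sum_{i \in \mathcal{S}} \frac{1}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} e^{-\alpha\|i-j\|}\, y_i^{(n)} y_j^{(n)} \exp\big(-\mathbf{h}_{i,j}^{(n)}\big) - \mathbb{E}\Bigg[ \sum_{i \in \mathcal{S}} \frac{1}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} e^{-\alpha\|i-j\|}\, y_i y_j \exp\big(-\mathbf{h}_{i,j}^{(n)}\big) \Bigg] \Bigg\} - \frac{\mathbf{v}}{\sigma_0^2}, \tag{18}$$

and

$$\frac{\partial}{\partial \alpha} \log p\big(\{\mathbf{y}^{(n)}\}_{n=1}^{N} \mid \{\mathbf{x}^{(n)}\}_{n=1}^{N}\big) = \sum_{n=1}^{N} \Bigg\{ -\sum_{i \in \mathcal{S}} \frac{1}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} I\big(y_i^{(n)}, y_j^{(n)}, \mathbf{x}^{(n)}, i, j\big) \|i-j\| + \mathbb{E}\Bigg[ \sum_{i \in \mathcal{S}} \frac{1}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} I\big(y_i, y_j, \mathbf{x}^{(n)}, i, j\big) \|i-j\| \Bigg] \Bigg\} - \frac{\alpha}{\sigma_0^2}, \tag{19}$$

where the expectations are evaluated with the distribution $p(Y \mid X; \Theta)$. The conditional expectations of (17)-(19) require evaluation of the partition function $Z$, a problem known to be NP-hard. As is common in the literature, this difficulty is avoided by estimating expectations through sampling. Although sampling methods such as Markov chain Monte Carlo (MCMC) can converge to the true distribution, this usually requires many iterations. Since the procedure must be repeated at every gradient ascent step, these methods are impractical. On the other hand, approximations such as contrastive divergence minimization (which runs MCMC a limited number of times with specific starting points) have been shown to be successful for vision applications [40], [41]. We adopt these approximations for CRF learning. This leverages the fact that, denoting any of the parameters $\mathbf{w}, \mathbf{v}^{TT}, \mathbf{v}^{SS}, \alpha^{TT}, \alpha^{SS}$ by $\theta$, the partial gradients of (17)-(19) are

$$\frac{\partial}{\partial \theta} \log p\big(\{\mathbf{y}^{(n)}\}_{n=1}^{N} \mid \{\mathbf{x}^{(n)}\}_{n=1}^{N}; \Theta\big) = \sum_{n=1}^{N} \Big[ F_{\partial\theta}\big(\mathbf{y}^{(n)}, \mathbf{x}^{(n)}\big) - \mathbb{E}_{(Y|X;\Theta)}\big[F_{\partial\theta}\big(\mathbf{y}, \mathbf{x}^{(n)}\big)\big] \Big] - \frac{\theta}{\sigma_0^2}, \tag{20}$$

where $F_{\partial\theta}(\mathbf{y}, \mathbf{x})$ is the sum of the terms in the summations of (17), (18), or (19) that depend on $\theta$. Contrastive divergence approximates the intractable conditional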
expectation $\mathbb{E}_{(Y|X;\Theta)}\big[F_{\partial\theta}(\mathbf{y}, \mathbf{x}^{(n)})\big]$ by $F_{\partial\theta}(\hat{\mathbf{y}}, \mathbf{x}^{(n)})$, where $\hat{\mathbf{y}}$ is the “evil twin” of the ground-truth label field $\mathbf{y}^{(n)}$ [41]. $\hat{\mathbf{y}}$ is drawn by MCMC, using the inference procedure discussed in Section 5.2.2, the current parameter estimates, and the ground-truth labels $\mathbf{y}^{(n)}$ as a starting point.

Given the estimate of the partial gradients, the gradient ascent rule for parameter updates reduces to

$$\theta \leftarrow \theta + \eta \Bigg[ \sum_{n=1}^{N} \Big( F_{\partial\theta}\big(\mathbf{y}^{(n)}, \mathbf{x}^{(n)}\big) - F_{\partial\theta}\big(\hat{\mathbf{y}}^{(n)}, \mathbf{x}^{(n)}\big) \Big) - \frac{\theta}{\sigma_0^2} \Bigg], \tag{21}$$

where $\eta$ is a learning rate. In our implementation, this rule is initialized with $\mathbf{v}^{TT} = \mathbf{v}^{SS} = \mathbf{1}$ and $\alpha^{TT} = \alpha^{SS} = 0$. The initial value of $\mathbf{w}$ is learned assuming a logistic regression model ($\mathbf{v}^{TT} = \mathbf{v}^{SS} = \mathbf{0}$ in (16)), with the procedure of [43].

Fig. 5. Exemplar normal/abnormal frames in Ped1 (top) and Ped2 (bottom). Anomalies (red boxes) include bikes, skaters, carts, and wheelchairs.

5.2.2 Inference

The inference problem is to determine the most likely anomaly prediction $\mathbf{y}^{\star}$ for a query frame $\mathbf{x}^{(\tau)}$, given previous predictions $\{\mathbf{y}^{(t)}\}_{t=1}^{\tau-1}$ and observations $\{\mathbf{x}^{(t)}\}_{t=1}^{\tau}$:

$$\mathbf{y}^{\star} = \arg\max_{\mathbf{y}} \log p\big(\mathbf{y} \mid \{\mathbf{y}^{(t)}\}_{t=1}^{\tau-1}, \{\mathbf{x}^{(t)}\}_{t=1}^{\tau}; \Theta\big) = \arg\max_{\mathbf{y}} \sum_{i \in \mathcal{S}^{\tau}} \Bigg[ A\big(y_i, \mathbf{x}^{(\tau)}\big) + \frac{1}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} I(y_i, y_j, \mathbf{x}, i, j) \Bigg]. \tag{22}$$

Again, exact inference is intractable. We rely on Gibbs sampling to approximate the optimal prediction. This consists of drawing labels from the conditional distribution

$$p\big(y_i \mid \{\mathbf{x}^{(t)}\}_{t=1}^{\tau}, \{\mathbf{y}^{(t)}\}_{t=1}^{\tau-1}, \mathbf{y}_{-i}; \Theta\big) = \frac{p\big(y_i, \mathbf{y}_{-i} \mid \{\mathbf{x}^{(t)}\}_{t=1}^{\tau}, \{\mathbf{y}^{(t)}\}_{t=1}^{\tau-1}; \Theta\big)}{p\big(\mathbf{y}_{-i} \mid \{\mathbf{x}^{(t)}\}_{t=1}^{\tau}, \{\mathbf{y}^{(t)}\}_{t=1}^{\tau-1}; \Theta\big)} = \frac{1}{Z_i} \exp\Big[ F_i\big(\{\mathbf{x}^{(t)}\}_{t=1}^{\tau}, \{\mathbf{y}^{(t)}\}_{t=1}^{\tau-1}, \mathbf{y}_{-i}, y_i; \Theta\big) \Big], \tag{23}$$

where $F_i(\{\mathbf{x}^{(t)}\}_{t=1}^{\tau}, \{\mathbf{y}^{(t)}\}_{t=1}^{\tau-1}, \mathbf{y}_{-i}, y_i; \Theta)$ is the sum of the potential functions that depend on site $i$ (i.e., its “Markov blanket”):

$$F_i\big(\{\mathbf{x}^{(t)}\}_{t=1}^{\tau}, \{\mathbf{y}^{(t)}\}_{t=1}^{\tau-1}, \mathbf{y}_{-i}, y_i; \Theta\big) = A\big(y_i, \mathbf{x}^{(\tau)}\big) + \frac{1}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} I\big(y_i, y_j, \{\mathbf{x}^{(t)}\}_{t=1}^{\tau}, i, j\big) + \sum_{j : i \in \mathcal{N}_j} \frac{1}{|\mathcal{N}_j|} I\big(y_j, y_i, \{\mathbf{x}^{(t)}\}_{t=1}^{\tau}, j, i\big), \tag{24}$$

and $Z_i$ the corresponding partition function:

$$Z_i = \sum_{y_i'} \exp\Big[ F_i\big(\{\mathbf{x}^{(t)}\}_{t=1}^{\tau}, \{\mathbf{y}^{(t)}\}_{t=1}^{\tau-1}, \mathbf{y}_{-i}, y_i'; \Theta\big) \Big]. \tag{25}$$

The procedure is detailed in Algorithms 2 and 3, available in the online supplemental material, where we present the online CRF filter used to estimate the label field. During learning, the filter is initialized with the ground-truth labels ($\mathbf{y}^{0} = \mathbf{y}^{(\tau)}$). During testing, this initialization relies on the predictions of the single-site classifiers ($\mathbf{v}^{TT} = \mathbf{v}^{SS} = \mathbf{0}$). In our implementation, the filter is run for $N_s = 10$ iterations. Again, the complete anomaly detection procedure is summarized in Algorithm 4, available in the online supplemental material.

6 EXPERIMENTS

In this section, we introduce a new data set and an experimental protocol for evaluation of anomaly detection in crowded environments and use it to evaluate the proposed anomaly detector.

6.1 UCSD Pedestrian Anomaly Data Set

In the literature, anomaly detection has frequently been evaluated by visual inspection [19], [7], [3], or with coarse ground truth, for example, frame-level annotation of abnormal events [4], [1]. This does not completely address the anomaly detection problem, where it is usually desired to localize anomalies in both space and time. To enable this, we introduce a data set¹ of crowd scenes with precisely localized anomalies and metrics for the evaluation of their detection. The data set consists of video clips recorded with a stationary camera mounted at an elevation, overlooking pedestrian walkways on the UCSD campus. The crowd density in the walkways is variable, ranging from sparse to very crowded. In the normal setting, the video contains only pedestrians. Abnormal events are due to either 1) the circulation of nonpedestrian entities in the walkways, or 2) anomalous pedestrian motion patterns. Commonly occurring anomalies include bikers, skaters, small carts, and people walking across a walkway or in the surrounding grass. A few instances of wheelchairs are also recorded. All abnormalities occur naturally, i.e., they were not staged or synthesized for data set collection.

The data set is organized into two subsets, corresponding to the two scenes of Fig. 5. The first, denoted “Ped1,” contains clips of 158 × 238 pixels, which depict groups of people walking toward and away from the camera, with some amount of perspective distortion. The second, denoted “Ped2,” has a spatial resolution of 240 × 360 pixels and depicts a scene where most pedestrians move horizontally. The video footage of each scene is sliced into clips of 120-200 frames. A number of these (34 in Ped1 and 16 in Ped2) are used as the training set for the condition of

1. Available from http://www.svcl.ucsd.edu/projects/anomaly/dataset.html.
normalcy. The test set contains clips (36 for Ped1 and 12 for Ped2) with both normal (around 5,500) and abnormal (around 3,400) frames. The abnormalities of each set are summarized in Table 1.

TABLE 1
Composition of UCSD Anomaly Data Set
a. Number of clips/number of anomaly instances. b. Some clips contain more than one type of anomaly.

Frame-level ground-truth annotation, indicating whether anomalies occur within each frame, and manually collected pixel-level binary anomaly masks, which identify the pixels containing anomalies, are available per test clip. We note that this includes ground truth on Ped1 contributed by Antic and Ommer [9], and supersedes the ground truth available on an earlier version of this work [43]. We denote the current ground truth by “full annotation” and the previous one by “partial annotation.” Unless otherwise noted, the results of the subsequent sections correspond to the full annotation.

6.2 Evaluation Methodology

Two criteria are used to evaluate anomaly detection accuracy: a frame-level criterion and a pixel-level criterion. Both are based on true-positive rates (TPRs) and false-positive rates (FPRs), denoting “an anomalous event” as “positive” and “the absence of anomalous events” as “negative.” A frame containing anomalies is denoted a positive, otherwise a negative. The true and false positives under the two criteria are:

• Frame-level criterion. An algorithm predicts which frames contain anomalous events. This is compared to the clip's frame-level ground-truth anomaly annotations to determine the number of true- and false-positive frames.
• Pixel-level criterion. An algorithm predicts which pixels are related to anomalous events. This is compared to the pixel-level ground-truth anomaly annotation to determine the number of true-positive and false-positive frames. A frame is a true positive if 1) it is positive and 2) at least 40 percent of its anomalous pixels are identified; a frame is a false positive if it is negative and any of its pixels are predicted as anomalous.

The two measures are combined into a receiver operating characteristic (ROC) curve of TPR versus FPR:

$$\mathrm{TPR} = \frac{\#\,\text{true-positive frames}}{\#\,\text{positive frames}}, \qquad \mathrm{FPR} = \frac{\#\,\text{false-positive frames}}{\#\,\text{negative frames}}.$$

Performance is also summarized by the equal error rate (EER), the ratio of misclassified frames at which $\mathrm{FPR} = 1 - \mathrm{TPR}$, for the frame-level criterion, or by the rate of detection (RD), i.e., $1 - \mathrm{EER}$, for the pixel-level criterion.

Note that, although widely used in the literature, the frame-level criterion only measures temporal localization accuracy. This admits errors due to “lucky co-occurrences” of prediction errors and true abnormalities. For example, it assigns a perfect score to an algorithm that identifies a single anomaly at a random location of a frame with anomalies. The pixel-level criterion is much stricter and more rigorous. By evaluating both the temporal and spatial accuracy of the anomaly predictions, it rules out these “lucky co-occurrences.” We believe that the pixel-level criterion should be the predominant criterion for evaluation of anomaly detection algorithms.

6.3 Experimental Setup

Unless otherwise noted, observation sites are a video sublattice with a spatial interval of four pixels and a temporal interval of five frames. Temporal anomaly maps rely on patches of 13 × 13 × 15 pixels. The temporal extent of 15 frames provides a reasonable compromise between the ability to detect anomalies and the delay (1.5 s) and storage (15 video frames) required for anomaly detection. To minimize computation, patches of variance smaller than 500 are discarded.² Temporal H-MDT models are learned from fine to coarse scale. At the finest scale, there are 6 × 10 windows $R_i^1$ on Ped1 (8 × 11 for Ped2), each covering a 41 × 41 pixel area and overlapping by 25 percent with each of its four neighbors. An MDT of five components is learned per window. At coarser spatial scales, an MDT is estimated from the MDTs of the four regions that it covers at the immediately finer resolution. Each estimated MDT has one more component than its ancestor MDTs. Overall, there are 10 scales in Ped1 and 11 in Ped2. Spatial anomaly maps use a 31 × 31 center window and surround windows of size equivalent to $R_i^s$. For segmentation, 7 × 7 × 10 patches are extracted from the 40 frames surrounding the frame under analysis. There are five DT components at all levels of the spatial hierarchy. Both temporal and spatial MDTs have an eight-dimensional state space. The sensitivity of the proposed detector to some of these parameters is discussed in Appendix C.2, available in the online supplemental material.

6.4 Descriptor Comparison

The first experiment evaluated the benefits of MDT-based over optical flow descriptors. The optical flow descriptors considered were the local motion histogram (LMH) of [19], the force flow descriptor of [4], and the mixture of optical flow models (MPPCA) of [3]. LMH uses statistics of local motion and is representative of traditional background subtraction representations; force flow is a descriptor for spatial anomaly detection; and MPPCA is a temporal anomaly detector. For the MDT, only the anomaly maps of finest temporal and coarsest spatial scale were considered here. Since the goal was to compare descriptors, the high-level components of the models in which they were proposed, for example, the LDA of [4], the MRF of [3], and the proposed CRF, were not used. Instead, anomaly predictions were smoothed with a simple

2. This variance threshold is quite conservative, only eliminating regions of very little motion. For the data sets used in our experiments, this has not led to the elimination of any objects from further consideration. In other contexts, for example, scenes where objects are static for periods of time, this could happen. In this case, the threshold should be set to zero.
20 × 20 × 10 Gaussian filter. Anomaly predictions were generated by thresholding the filtered anomaly maps, and ROC curves by varying the threshold.

The performance of the different descriptors, under both the frame-level (EER) and pixel-level (RD) criteria (using both full and partial annotation in Ped1), is summarized in Table 2. The corresponding ROC curves are presented in Appendix C.1 (Fig. 13), available in the online supplemental material. Examples of detected anomalies are shown in Fig. 6. Under the frame-level criterion, temporal MDT has the best performance in both scenes. Spatial MDT performs worse than the others in Ped1 but ranks second in Ped2. However, under the more precise pixel-level criterion, spatial MDT is the top or second-best performer. In this case, both MDTs significantly outperform all optical flow descriptors. The gap between corresponding competitors (e.g., temporal MDT versus MPPCA or LMH, spatial MDT versus force flow) is at least 10 percent RD. These results show that there is a definite benefit to the joint representation of appearance and dynamics of the MDT.

This is not totally surprising, given the limitations of optical flow. First, the brightness constancy assumption is easily violated in crowded scenes, where stochastic motion and occlusions prevail. Second, optical flow measures instantaneous displacement, while the DT is a smooth motion representation with extended temporal support. Finally, while optical flow is a bandpass measure, which eliminates most of the appearance information, the DT models both appearance and dynamics. The last two properties are particularly important for crowded scenes, where objects occlude and interact in complicated manners.

Overall, although optical flow can signal fast-moving anomalous subjects, it leads to too many false positives in regions of complex motion, occlusion, and so on. More interesting is the lack of advantage for either spatial or temporal anomaly detection, both among MDT maps and prior techniques (no clear advantage to either force flow or MPPCA). In fact, as shown in Fig. 6, temporal and spatial anomalies tend to involve different objects. This suggests combining the two strategies.

TABLE 2
Descriptor Performance on UCSD Anomaly Data Set
Numbers outside/inside parentheses are results by full/partial annotation (same for the rest of the paper).

TABLE 3
Filter Performance on the UCSD Anomaly Data Set

Fig. 6. Anomaly predictions of temporal MDT, spatial MDT, MPPCA, force flow, and LMH (from left to right). Red regions are abnormal pixels. All predictions were generated with thresholds such that the different approaches have similar FPR under the frame-level protocol (these settings apply to all the subsequent figures unless otherwise stated).

6.5 Scale and Globally Consistent Prediction

We next investigated the benefits of information fusion across space and scale, with the proposed CRF. We started with a single-scale description (S-MDT), using only the anomaly maps at finest temporal and coarsest spatial scales, i.e., a 3D feature per site. We next considered a multiscale description, using the whole H-MDT. In both cases, inference was performed with logistic regression, i.e., with the interaction term of (16) turned off, and the Gaussian filter of the previous section. In each trial, the logistic classifier was trained by Newton's method [42]. Finally, we considered the full-blown CRF, denoted CRF filter. The dimensions of the spatial and temporal CRF neighborhoods were set to $|\mathcal{N}^{SS}| = 6$ and $|\mathcal{N}^{TT}| = 3$. ROC curves were generated by varying the threshold for prediction.

Table 3 presents a comparison of the three approaches. The corresponding ROC curves are shown in Appendix C.1 (Fig. 14), available in the online supplemental material. Under the pixel-level criterion, the multiscale maps have higher accuracy than their single-scale counterparts, demonstrating the benefits of modeling anomalies in scale space (improvement of RD by as much as 11 percent). The CRF
filter further improves performance (improvement of RD by as much as 3 percent), demonstrating the gains of globally consistent inference. As shown in Fig. 7, the visual improvements are even more substantial.³ Simple filtering does not take into account interactions between neighboring sites and smooths the anomaly maps uniformly. The CRF, on the other hand, adapts the degree of smoothing to the spatiotemporal structure of the anomalies, increasing the precision of anomaly localization. Note how, in Fig. 7, the CRF filter successfully excludes occluded but normally behaving pedestrians from anomaly regions. These improvements are not always captured by the frame-level criterion. In fact, there is little EER difference between S-MDT and H-MDT. The inconsistency between frame- and pixel-level results in Tables 2 and 3 shows that the former is not a good measure of anomaly detection performance. Henceforth, only the pixel-level criterion is used in the remaining experiments on this data set.

Fig. 7. Examples of anomaly localization with Gaussian smoothing (in blue) and CRF filter (in red). The latter predicts the spatiotemporal support of anomalies more accurately in crowded regions, where occlusion is prevalent.

TABLE 4
Performance of Various Methods (RD/Seconds per Frame) by Pixel-Level Criterion on UCSD Anomaly Data Set
Implementation: † C/2.8-GHz CPU/2-GB RAM; ‡ C++ and Matlab (feature extraction and model inference)/2.6-GHz CPU/2-GB RAM; § Matlab/dual-core 2.7-GHz CPU/8-GB RAM.

6.6 Anomaly Detection Performance

We next evaluated the performance of the complete anomaly detector. For this, we selected two detectors from the recent literature, with state-of-the-art performance for temporal [8] and combined spatial and temporal anomaly detection [9]. The RD of the various methods is summarized in Table 4, for both partial and full annotation. The corresponding ROC curves are shown in Fig. 8. Table 4 also presents the processing time per video frame of each method. Missing entries indicate unavailable results for the particular data set and/or annotation type. A discussion of the detection errors made by the detector is given in Appendix C.3, available in the online supplemental material.

On Ped1, the temporal component of the proposed detector substantially outperforms the temporal detector of [8]. A multiple-scale temporal anomaly map with CRF filtering increases the 46 percent RD⁴ of [8] to 52 percent. A similar implementation of the spatial anomaly detector (a multiple-scale map plus CRF filtering) achieves 58 percent. Combining both maps and multiple spatial scales further improves the RD to 65 percent. Computationally, the proposed detector is also much more efficient. For implementations on similar hardware (see footnotes of Table 4), it requires 1.11 s/frame, as compared to the 3.8 s/frame reported for [8].

Like the proposed detector, the Bayesian video parsing (BVP) of [9] combines spatial and temporal anomaly detection, but it uses a more complex video representation: parsing of the video to extract all the objects in the scene, a support vector machine classifier for detection of temporal anomalies, a graphical model with seven nodes per site (and multiple nonparametric models for location, scale, and velocity) for detection of spatial anomalies, and occlusion reasoning. This is an elegant solution, which achieves slightly better RD than the proposed detector (2 percent for full and 3 percent for partial annotation), but at substantially higher computational cost (5 to 10 times slower). We believe that, when both accuracy and computation are considered, the proposed detector is a more effective solution. However, these results suggest that gains could be achieved by expanding the proposed CRF, as [9] trades a much simpler representation of video dynamics (optical flow versus MDT) for more sophisticated inference. It would be interesting to consider CRF extensions with some of the properties of the graphical model of [9], namely, explicit occlusion reasoning. This is left for subsequent research.

Fig. 8. ROC curves of pixel-level criterion on Ped1.

3. More results at http://www.svcl.ucsd.edu/projects/anomaly/results.html.
4. These numbers refer to partial annotation, the only one available for [8].
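The RD and EER figures compared above all come from the two criteria of Section 6.2. A minimal NumPy sketch of that protocol follows, assuming per-pixel anomaly scores and binary ground-truth masks stored as (frames, height, width) arrays. The function names, the array layout, and the uniform threshold grid are illustrative assumptions, not the authors' evaluation code, and the EER is read at the nearest sampled threshold rather than interpolated along the ROC curve.

```python
import numpy as np


def frame_rates(score_maps, gt_masks, threshold, criterion="pixel", min_overlap=0.4):
    """Return (TPR, FPR) at one threshold under the frame- or pixel-level criterion.

    Assumed (hypothetical) layout:
      score_maps: (T, H, W) float array of per-pixel anomaly scores.
      gt_masks:   (T, H, W) binary ground-truth anomaly masks.
    """
    pred = score_maps >= threshold                  # predicted anomalous pixels
    gt = gt_masks.astype(bool)                      # ground-truth anomalous pixels
    n = gt.shape[0]
    pred_frame = pred.reshape(n, -1).any(axis=1)    # frame flagged as containing an anomaly
    gt_frame = gt.reshape(n, -1).any(axis=1)        # positive (anomalous) frames

    if criterion == "frame":
        # Frame-level: a positive frame is detected if any of its pixels fires.
        tp = gt_frame & pred_frame
    else:
        # Pixel-level: a positive frame is detected only if at least
        # min_overlap (40 percent) of its anomalous pixels are predicted.
        hit = (pred & gt).reshape(n, -1).sum(axis=1)
        area = gt.reshape(n, -1).sum(axis=1)
        tp = gt_frame & (hit >= min_overlap * area)

    # Both criteria: a negative frame is a false positive if any pixel fires.
    fp = ~gt_frame & pred_frame
    tpr = tp.sum() / max(gt_frame.sum(), 1)
    fpr = fp.sum() / max((~gt_frame).sum(), 1)
    return float(tpr), float(fpr)


def equal_error_rate(score_maps, gt_masks, criterion="pixel", num_thresholds=200):
    """Approximate EER: the FPR at the sampled threshold where FPR is closest to 1 - TPR."""
    thresholds = np.linspace(score_maps.min(), score_maps.max(), num_thresholds)
    rates = np.array([frame_rates(score_maps, gt_masks, t, criterion) for t in thresholds])
    tpr, fpr = rates[:, 0], rates[:, 1]
    k = np.argmin(np.abs(fpr - (1.0 - tpr)))
    return float(fpr[k])


# Toy example: 4 frames of 2x2 pixels; frames 0 and 1 contain anomalies.
gt = np.zeros((4, 2, 2), dtype=bool)
gt[0, 0, :] = True       # two anomalous pixels in frame 0
gt[1, 1, 1] = True       # one anomalous pixel in frame 1
scores = np.full((4, 2, 2), 0.1)
scores[0, 0, 0] = 0.9    # hits 1 of the 2 anomalous pixels of frame 0 (50% >= 40%)
scores[1, 0, 0] = 0.8    # fires in frame 1, but on a normal pixel
scores[1, 1, 1] = 0.2    # weak response on frame 1's anomalous pixel
scores[3, 0, 0] = 0.7    # false alarm in normal frame 3

print(frame_rates(scores, gt, 0.5, "frame"))   # (1.0, 0.5)
print(frame_rates(scores, gt, 0.5, "pixel"))   # (0.5, 0.5)
eer = equal_error_rate(scores, gt, "pixel")
print("pixel-level EER:", eer, "RD:", 1 - eer)  # pixel-level EER: 0.5 RD: 0.5
```

The toy frame 1 illustrates the “lucky co-occurrence” discussed in Section 6.2: the frame-level criterion counts it as detected because some pixel fires, while the pixel-level criterion rejects it because the response does not cover the actual anomaly. A finer threshold grid, or interpolation between the two ROC points that bracket $\mathrm{FPR} = 1 - \mathrm{TPR}$, would give a smoother EER estimate.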
Fig. 9. Impact of context on anomaly maps. First three columns: Temporal anomalies, cell coverage at different H-MDT layers shown in blue. Last two columns: Spatial anomalies, example center (surround) windows shown in blue (light yellow).

6.7 Role of Context in Anomaly Judgments

We next investigated the impact of normalcy context on anomaly judgments. For temporal anomalies, context is determined by the subregion size: as the latter increases, temporal models become more global. Fig. 9 shows that the scale of normalcy context significantly impacts anomaly scores. For example, the two cyclists in the left-most columns of the figure are missed at small scales but detected by the more global models. On the other hand, a leftward-heading pedestrian in the third column has a high anomaly score at the finest scale but is not anomalous in larger contexts. In summary, no single context is effective for all scenes. Due to the stochastic arrangements of people within crowds, two crowds of the same size can require different context sizes. In general, the optimal size depends on the crowd configuration and the anomalous event.

A similar observation holds for spatial anomalies, where context is set by the size of the surround window. For example, in the fourth column of Fig. 9, the subject walking on the grass is very salient when compared to her immediate neighbors, and anomaly detection benefits from a narrower context. For larger contexts, she becomes less unique than a man who walks in the direction opposite to his neighbors. On the other hand, the cart and bike of the last column only pop out when the surround window is large enough to cover some pedestrians. In summary, anomalies depend strongly on scene context, and this dependence can vary substantially from scene to scene. It is, thus, important to fuse anomaly information across spatial scales.

6.8 Performance on Other Benchmark Data Sets

The detection of anomalous events in crowded scenes can be evaluated on a few data sets other than UCSD. These have various limitations in terms of size, saliency of the anomalies, evaluation criteria, and so on. They are discussed in this section where, for completeness, we also present the results of the proposed anomaly detector.

UMN. The UMN data set⁵ contains three escape scenes. Normal events depict individuals wandering around or organized in groups. Abnormal events depict a crowd escaping in panic. Each scene contains several normal-abnormal events (e.g., seconds of normalcy followed by a short abnormal event). The main limitations of this data set are that

1. it is relatively small (scenes 1, 2, and 3 contain two, six, and three anomaly instances),
2. it has no pixel-level ground truth,
3. the anomalies are staged, and
4. it produces very salient changes in the average motion intensity of the scene.

As a result, several methods achieve near-perfect detection. The proposed detector was based on 3 × 3 subregions of size 180 × 180 at the finest spatial scale and a 3-scale anomaly map for both the temporal and spatial components. One normal-abnormal instance of each scene was used to train the temporal normalcy model and CRF filter, and the remaining instances were used for testing. A comparison to previous results in the literature, under the frame-level criterion, is presented in Table 5 and Fig. 11. Due to the salient motion discontinuities, the temporal component (99.2 percent AUC) substantially outperforms the spatial component (97.9 percent). Nevertheless, the complete detector achieves the best performance (99.5 percent). This is nearly perfect, and comparable to the previous best results in the literature.

TABLE 5
Anomaly Detection Performance in AUC/EER (Percent)

Subway. The Subway data set [19] consists of two sequences recorded from the entrance (1 h and 36 min, 144,249 frames) and exit (43 min, 64,900 frames) of a

5. http://mha.cs.umn.edu/Movies/Crowd-Activity-All.avi.
30 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 36, NO. 1, JANUARY 2014

For temporal anomaly detection, MDTs were learned

using 20 20 30 patches from 3 4 subregions covering
the intersection. This was the finest level of a 3-scale
hierarchical model. For spatial anomaly detection, segmenta-
tion was computed with a 5-component MDT learned from
15 15 30 patches extracted from 45 consecutive frames.
An observation lattice of step 15 15 10 was used to
evaluate anomaly scores, and the neighborhood size of the
CRF filter was 2. The performance of the detector is
summarized in Table 5 and Fig. 11. Due to the sparsity of
Fig. 10. Anomalies detected by H-MDT CRF on the UMN (left), Subway the scenes (not enough spatial context around cars making
(center), and U-turn (right) data sets. illegal turns to establish them as anomalous) the performance
of the spatial anomaly detector is quite weak. However, the
subway station. Normal behaviors include people entering combination of the spatial and temporal anomaly maps again
and exiting the station; abnormal consist of people moving outperforms the temporal channel, achieving the best
in the wrong direction (exiting the entrance or entering the performance. Overall, the proposed detector has the best
exit) or avoiding payment. The main limitations of this data AUC on this data set. Examples of detected anomalies, for
set are: 1) reduced number of anomalies, and 2) predictable this and the other two data sets, are shown in Fig. 10.
spatial localization (entrance and exit regions). The original
512 384 frames were down sampled to 320 240, and 2 7 CONCLUSION
3 subregions of size 90 90, covering either the entrance or
exit regions, were used at the finest spatial scale. A 3-scale In this work, we proposed an anomaly detector that spans
anomaly map was computed for both spatial and temporal anomalies. Video patches were of size 15 × 15 × 15, and 10 min of video from each sequence was used to train the temporal normalcy model and CRF filters, while the remaining video was used for testing. Table 5 and Fig. 11 present a comparison of the proposed detector against recently published results on this data set. Again, the temporal component outperforms its spatial counterpart, but the best performance is obtained by combining the temporal and spatial anomaly maps (H-MDT CRF). This combination achieves the best result among all methods, outperforming the sparse reconstruction of [8] and the local statistical aggregates of [23]. Note that, for this data set, the gains in both AUC and EER are substantial.

U-turn. The U-turn data set [5] consists of one video sequence (roughly 6,000 frames of size 360 × 240) recorded by a static camera overlooking the traffic at a road intersection. The video is split into two clips of equal length for cross validation, and anomalies consist of illegal vehicle motion at the intersection. The main limitations of this data set are: 1) its limited size, 2) the absence of pixel-level ground truth, and 3) the sparseness of the scenes. The latter enables the use of object-based operations, for example, tracking and analysis of object trajectories [5], which we do not exploit.

time, space, and spatial scale, using a joint representation of video appearance and dynamics and globally consistent inference. For this, we modeled crowded scenes with a hierarchy of MDT models, equated temporal anomalies to background subtraction and spatial anomalies to discriminant saliency, and integrated anomaly scores across time, space, and scale with a CRF. It was shown that the MDT representation substantially outperforms classical optical flow descriptors, that spatial and temporal anomaly detection are complementary processes, that there is a benefit to defining anomalies with respect to various normalcy contexts, i.e., in anomaly scale space, and that it is important to guarantee globally consistent inference across space, time, and scale. We have also introduced a challenging anomaly detection data set, composed of complex scenes of pedestrian crowds, involving stochastic motion, complex occlusions, and object interactions. This data set provides both frame-level and pixel-level ground truth, and a protocol for the evaluation of anomaly detection algorithms. The proposed anomaly detector was shown to be effective on both this and a number of previous data sets. When compared to previous methods, it outperformed various state-of-the-art approaches, either in absolute performance or in terms of the tradeoff between anomaly detection accuracy and complexity.
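The experimental protocols described above reduce to simple bookkeeping on the video: cutting it into 15 × 15 × 15 spatio-temporal patches, keeping the first 10 min of each sequence for training and the rest for testing, and, for U-turn, splitting the sequence into two equal clips whose training and testing roles are swapped. The following is a minimal NumPy sketch of that bookkeeping only. The function names, the non-overlapping stride, and the frame rate are hypothetical choices not taken from the paper, and the MDT normalcy model and CRF that would consume these patches are not shown.

```python
import numpy as np

def extract_patches(video, size=15, stride=15):
    """Cut a (T, H, W) grayscale video into size x size x size spatio-temporal
    patches. stride == size gives non-overlapping patches; the stride is an
    assumption, since the text only states the patch size."""
    T, H, W = video.shape
    patches, corners = [], []
    for t in range(0, T - size + 1, stride):
        for y in range(0, H - size + 1, stride):
            for x in range(0, W - size + 1, stride):
                patches.append(video[t:t + size, y:y + size, x:x + size])
                corners.append((t, y, x))  # first frame, top row, left column
    if not patches:  # video shorter or smaller than a single patch
        return np.empty((0, size, size, size), dtype=video.dtype), corners
    return np.stack(patches), corners


def time_split(num_frames, fps, train_minutes=10):
    """Train on the first `train_minutes` of a sequence and test on the rest.
    `fps` is a caller-supplied assumption (the frame rate is not given here)."""
    n_train = min(num_frames, int(round(train_minutes * 60 * fps)))
    frames = np.arange(num_frames)
    return frames[:n_train], frames[n_train:]


def two_fold_clips(num_frames):
    """Split one sequence into two clips of equal length and swap their roles,
    giving two (train, test) folds. An odd final frame is dropped."""
    half = num_frames // 2
    first, second = np.arange(half), np.arange(half, 2 * half)
    return [(first, second), (second, first)]


# Hypothetical usage on a random 30-frame, 45 x 60 clip.
video = np.random.rand(30, 45, 60)
patches, corners = extract_patches(video)
print(patches.shape)  # (24, 15, 15, 15)
train, test = time_split(num_frames=20000, fps=10)  # fps=10 is illustrative only
print(len(train), len(test))  # 6000 14000
```

Returning the patch corners alongside the patches keeps enough information to paste per-patch anomaly scores back into a frame-aligned anomaly map.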

Fig. 11. ROC curves of frame-level criterion on the UMN (left), Subway (center), and U-turn (right) data sets.
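Fig. 11 and the AUC and EER comparisons quoted in the text summarize frame-level detection performance. As a minimal sketch, assuming each frame has already been reduced to a single anomaly score and paired with a binary ground-truth label (1 = anomalous), the ROC curve, its area (AUC), and the equal error rate (EER, the operating point where the false-positive rate equals the miss rate) could be computed as below. The function names are illustrative and not taken from the authors' code; scikit-learn's `roc_curve` and `roc_auc_score` are an off-the-shelf alternative for the first two quantities.

```python
import numpy as np

def roc_curve(scores, labels):
    """Frame-level ROC: sweep a threshold over per-frame anomaly scores.
    labels: 1 = anomalous frame, 0 = normal. Returns (fpr, tpr) arrays
    starting at (0, 0) and ending at (1, 1)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both anomalous and normal frames")
    order = np.argsort(-scores, kind="mergesort")  # highest score first
    s, y = scores[order], labels[order]
    # Last index of each run of tied scores, so tied frames share one threshold.
    cut = np.r_[np.nonzero(np.diff(s))[0], s.size - 1]
    tpr = np.r_[0.0, np.cumsum(y)[cut] / n_pos]
    fpr = np.r_[0.0, np.cumsum(1 - y)[cut] / n_neg]
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def eer(fpr, tpr):
    """Equal error rate: the point where the false-positive rate equals the
    miss rate (1 - TPR), linearly interpolated along the ROC curve."""
    d = fpr + tpr - 1.0          # = FPR - miss rate; -1 at (0,0), +1 at (1,1)
    i = int(np.argmax(d >= 0))   # first ROC point with FPR >= miss rate (i >= 1)
    w = -d[i - 1] / (d[i] - d[i - 1])
    return float(fpr[i - 1] + w * (fpr[i] - fpr[i - 1]))


# Toy scores for four frames (hypothetical values).
fpr, tpr = roc_curve([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(auc(fpr, tpr), eer(fpr, tpr))  # 0.75 0.5
```

Higher AUC and lower EER are better; because the ROC curve is monotone, the EER crossing is unique, which is why a single linear interpolation suffices.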

REFERENCES
[1] S. Wu, B. Moore, and M. Shah, “Chaotic Invariants of Lagrangian Particle Trajectories for Anomaly Detection in Crowded Scenes,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[2] L. Kratz and K. Nishino, “Anomaly Detection in Extremely Crowded Scenes Using Spatio-Temporal Motion Pattern Models,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[3] J. Kim and K. Grauman, “Observe Locally, Infer Globally: A Space-Time MRF for Detecting Abnormal Activities with Incremental Updates,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[4] R. Mehran, A. Oyama, and M. Shah, “Abnormal Crowd Behavior Detection Using Social Force Model,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[5] Y. Benezeth, P. Jodoin, V. Saligrama, and C. Rosenberger, “Abnormal Events Detection Based on Spatio-Temporal Co-Occurences,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[6] A. Basharat, A. Gritai, and M. Shah, “Learning Object Motion Patterns for Anomaly Detection and Improved Object Detection,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[7] T. Xiang and S. Gong, “Video Behavior Profiling for Anomaly Detection,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 5, pp. 893-908, May 2008.
[8] Y. Cong, J. Yuan, and J. Liu, “Sparse Reconstruction Cost for Abnormal Event Detection,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011.
[9] B. Antic and B. Ommer, “Video Parsing for Abnormality Detection,” Proc. IEEE Int’l Conf. Computer Vision, 2011.
[10] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly Detection: A Survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.
[11] G. Doretto, A. Chiuso, Y. Wu, and S. Soatto, “Dynamic Textures,” Int’l J. Computer Vision, vol. 51, no. 2, pp. 91-109, 2003.
[12] A. Chan and N. Vasconcelos, “Modeling, Clustering, and Segmenting Video with Mixtures of Dynamic Textures,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 5, pp. 909-926, May 2008.
[13] D. Gao and N. Vasconcelos, “Decision-Theoretic Saliency: Computational Principles, Biological Plausibility, and Implications for Neurophysiology and Psychophysics,” Neural Computation, vol. 21, no. 1, pp. 239-271, 2009.
[14] C. Stauffer and W. Grimson, “Learning Patterns of Activity Using Real-Time Tracking,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 747-757, Aug. 2000.
[15] T. Zhang, H. Lu, and S. Li, “Learning Semantic Scene Models by Object Classification and Trajectory Clustering,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[16] N. Siebel and S. Maybank, “Fusion of Multiple Tracking Algorithms for Robust People Tracking,” Proc. European Conf. Computer Vision, 2006.
[17] X. Cui, Q. Liu, M. Gao, and D.N. Metaxas, “Abnormal Detection Using Interaction Energy Potentials,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011.
[18] F. Jiang, J. Yuan, S.A. Tsaftaris, and A.K. Katsaggelos, “Anomalous Video Event Detection Using Spatiotemporal Context,” Computer Vision and Image Understanding, vol. 115, no. 3, pp. 323-333, 2011.
[19] A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz, “Robust Real-Time Unusual Event Detection Using Multiple Fixed-Location Monitors,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 555-560, Mar. 2008.
[20] B. Zhao, L. Fei-Fei, and E. Xing, “Online Detection of Unusual Events in Videos via Dynamic Sparse Coding,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011.
[21] D. Helbing and P. Molnár, “Social Force Model for Pedestrian Dynamics,” Physical Rev. E, vol. 51, no. 5, pp. 4282-4286, 1995.
[22] O. Boiman and M. Irani, “Detecting Irregularities in Images and in Video,” Int’l J. Computer Vision, vol. 74, no. 1, pp. 17-31, 2007.
[23] V. Saligrama and Z. Chen, “Video Anomaly Detection Based on Local Statistical Aggregates,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2012.
[24] R. Hamid, A. Johnson, S. Batta, A. Bobick, C. Isbell, and G. Coleman, “Detection and Explanation of Anomalous Activities: Representing Activities as Bags of Event N-Grams,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[25] D. Zhang, D. Gatica-Perez, S. Bengio, and I. McCowan, “Semi-Supervised Adapted HMMs for Unusual Event Detection,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[26] C. Stauffer and W. Grimson, “Adaptive Background Mixture Models for Real-Time Tracking,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1999.
[27] L. Itti, C. Koch, and E. Niebur, “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254-1259, Nov. 1998.
[28] R. Shumway and D. Stoffer, “An Approach to Time Series Smoothing and Forecasting Using the EM Algorithm,” J. Time Series Analysis, vol. 3, no. 4, pp. 253-264, 1982.
[29] S. Roweis and Z. Ghahramani, “A Unifying Review of Linear Gaussian Models,” Neural Computation, vol. 11, no. 2, pp. 305-345, 1999.
[30] S. Kullback, Information Theory and Statistics. Dover Publications, 1968.
[31] D. Gao, V. Mahadevan, and N. Vasconcelos, “On the Plausibility of the Discriminant Center-Surround Hypothesis for Visual Saliency,” J. Vision, vol. 8, no. 7, pp. 1-18, 2008.
[32] V. Mahadevan and N. Vasconcelos, “Background Subtraction in Highly Dynamic Scenes,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[33] A. Chan and N. Vasconcelos, “Probabilistic Kernels for the Classification of Auto-Regressive Visual Processes,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[34] J.R. Hershey and P.A. Olsen, “Approximating the Kullback Leibler Divergence between Gaussian Mixture Models,” Proc. IEEE Int’l Conf. Acoustics, Speech, and Signal Processing, 2007.
[35] A.B. Chan and N. Vasconcelos, “Efficient Computation of the KL Divergence between Dynamic Textures,” Technical Report SVCL-TR-2004-02, Dept. of Electrical and Computer Eng., Univ. of California San Diego, 2004.
[36] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.
[37] A. Chan, E. Coviello, and G. Lanckriet, “Clustering Dynamic Textures with the Hierarchical EM Algorithm,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[38] J. Lafferty, A. McCallum, and F. Pereira, “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data,” Proc. 18th Int’l Conf. Machine Learning, 2001.
[39] S. Kumar and M. Hebert, “Discriminative Fields for Modeling Spatial Dependencies in Natural Images,” Proc. Advances in Neural Information Processing Systems, 2004.
[40] X. He, R. Zemel, and M. Carreira-Perpiñán, “Multiscale Conditional Random Fields for Image Labeling,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2004.
[41] G.E. Hinton, “Training Products of Experts by Minimizing Contrastive Divergence,” Neural Computation, vol. 14, pp. 1771-1800, 2002.
[42] T. Minka, “A Comparison of Numerical Optimizers for Logistic Regression,” technical report, Microsoft Research, 2003.
[43] V. Mahadevan, W. Li, V. Bhalodia, and N. Vasconcelos, “Anomaly Detection in Crowded Scenes,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[44] T.P. Kah-Kay Yung, “Example-Based Learning for View-Based Human Face Detection,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 39-51, Jan. 1998.

Weixin Li received the bachelor’s degree from Tsinghua University, Beijing, China, in 2008, the MSc degree in electrical engineering from the University of California, San Diego, in 2011, and is currently working toward the PhD degree. His research interests primarily include computational vision and machine learning, with specific focus on visual analysis of human behavior, activity, and event, and models with latent variables and their applications. He is a student member of the IEEE.

Vijay Mahadevan received the BTech degree from the Indian Institute of Technology, Madras, in 2002, the MS degree from Rensselaer Polytechnic Institute, Troy, New York, in 2003, and the PhD degree from the University of California, San Diego, in 2011, all in electrical engineering. From 2004 to 2006, he was with the Multimedia group at Qualcomm Inc., San Diego, California. He is currently with Yahoo! Labs, Bengaluru. His interests include computer vision and machine learning and their applications. He is a member of the IEEE.

Nuno Vasconcelos received the licenciatura degree in electrical engineering and computer science from the Universidade do Porto, Portugal, and the MS and PhD degrees from the Massachusetts Institute of Technology. He is a professor in the Electrical and Computer Engineering Department, University of California, San Diego, where he heads the Statistical Visual Computing Laboratory. He has received a US National Science Foundation (NSF) CAREER award, a Hellman Fellowship, and has authored more than 150 peer-reviewed publications. He is a senior member of the IEEE.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.
