A Survey On Video Detection and Tracking of Maritime Vessels

R. Da S. Moreira
Abstract: - Maritime surveillance systems can be employed to increase the security of ports, airports, merchant and war ships against pirates, terrorists or any hostile vessel attack, to avoid collisions, to control maritime traffic at ports and channels and to defend coasts and oil platforms. Cameras are one of the main sensors of these systems: they are cheap and complement other types of sensors. Compared with other kinds of video surveillance systems, few papers about video maritime surveillance systems are present in the literature. This survey was motivated by the importance of the subject, by the wish to motivate new research and by the fact that surveys about video detection and tracking of marine vehicles are either nonexistent or not widespread. The paper presents the state of the art algorithms.
Key-words: - Maritime surveillance systems, tracking, detection, features, image processing, radars
billion per year [4]. The attack against civilian and military marine vehicles is one way to hurt the economy and security of a country [5, 6]. The terrorist attack against the U.S. warship Cole DDG 67, which occurred at the port of Aden, Yemen, caused the death of 17 people [5]. The French tanker Limburg also suffered a terrorist attack at the Yemen coast. Pirate attacks are very common in Somalia, in the Strait of Malacca and in Indonesia [7].

Because the manual operation of surveillance systems is not efficient due to fatigue, stress and the limited ability of human beings to perform certain tasks, the development of automated systems for maritime surveillance is essential to reduce the occurrence of unwanted events [2, 3, 5-19].

The use of cameras in maritime surveillance systems has increased [19]. Cameras are essential to assist and supplement the radars and other sensors. They are cheap, flexible [6, 11, 17, 20] and can be installed on almost every platform type [2]. The magnetometer detects vehicles by the change in the magnetic field around the vehicle, but it is limited to detecting vehicles at short distances [10]. Low and high frequency radars are expensive, hampered by clutter [10], have blind zones close to the transmitting antenna [6, 11] and detect with low efficiency the vehicles built with non-conductive materials [4, 6, 7].

Efforts have been made worldwide for the development of maritime surveillance systems. The European project AMASS - Autonomous Maritime Surveillance System - was created to develop a surveillance system with FLIR cameras installed on advanced platforms [17]. The AVITRACK system [21] and MAAW - Maritime Activity Analysis Workbench - [5] are surveillance systems based on cameras. The ARGOS system [1] has been active since 2007 and is used to monitor the maritime traffic at the Grand Canal in Venice, Italy. The SELEX Sistemi Integrati system integrates the data obtained by cameras and by radars and is operating in Russia, Italy, Poland, China and Panama. Burkle et al. [13] proposed a surveillance system based on cameras installed on different platforms and land bases to increase the system coverage area.

New technologies have emerged allowing the fusion of data extracted from different systems and sensors. The cameras are one of the main system components [2, 5]. The AMFIS system [13], the AIS system - Automatic Identification System - [15], the ASV system - Automatic Sea Vision - [15], the VMS system - Vessel Monitoring System - [2] and the AIVS3 system - Automated Intelligent Video Surveillance system for Ships - [6] are examples of maritime surveillance systems that perform data fusion.

2 Components of a video surveillance system
A complete video surveillance system consists of five main components: the initial detector, the image processor, the classifier, the tracker and the behavior analyzer. Figure 1 shows a complete video surveillance system.

Fig 1. main components of a complete video surveillance system.
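To make the data flow of Figure 1 concrete, the sketch below wires the five components into a single per-frame loop. It is only a structural illustration: every class and method name (Pipeline, detect, refine, is_vessel, update, evaluate) is a placeholder invented for this example, not part of any surveyed system.

```python
# Minimal sketch of the five-component pipeline of Figure 1.
# Every stage below is a placeholder; real systems plug in the
# algorithms discussed in sections 4-6.

class Pipeline:
    def __init__(self, detector, processor, classifier, tracker, analyzer):
        self.detector = detector      # finds moving pixels or candidate objects
        self.processor = processor    # denoises, segments, labels connected components
        self.classifier = classifier  # keeps only regions that look like vessels
        self.tracker = tracker        # locates each tracked object OT inside its ROI
        self.analyzer = analyzer      # flags suspicious trajectories/speeds

    def step(self, frame):
        candidates = self.detector.detect(frame)
        regions = self.processor.refine(frame, candidates)
        vessels = [r for r in regions if self.classifier.is_vessel(frame, r)]
        tracks = self.tracker.update(frame, vessels)   # P(OT(t)) for each OT
        return self.analyzer.evaluate(tracks)          # alerts, if any
```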
Some surveillance systems may not contain all these components. The initial detector is a motion detector that detects all pixels in motion [1, 2, 4, 5, 7, 9, 15, 16] or an object detector based on a classifier set [22]. The information obtained by the initial detector is handled by the image processor to eliminate noise, to segment the most relevant regions and to detect the connected components. These regions are evaluated and classified into objects that are or are not of interest by the classifier. The objects of interest are modeled and are therefore called objects being tracked OT. The tracker attempts to locate the OT in a region of interest ROI at each frame I(t) and determines the OT position P(OT(t)). The ROI is the frame region where the probability of the OT being found is higher. The vehicle trajectory and speed are sent to the behavior analyzer. It generates an alert to a control center if it classifies the event as a suspicious activity [3, 5, 6, 16, 18]. The trajectory and speed analysis can also improve the efficiency of the detection and classification of marine vehicles [9].

Marine vehicles do not have particular characteristics that can be used for an efficient classification [12]. It is difficult to construct a representative database for vessel classification due to the variety of marine vehicle types [6, 15].
Although some surveillance systems perform the classification [5, 18, 19], these systems classify a limited number of marine vehicle types and the classification efficiency depends on the position and distance relative to the camera.

3 Difficulties
Conventional algorithms for detecting and tracking vessels in video, when applied to a maritime environment without proper adjustments, do not produce efficient results, as the background is quite dynamic. The maritime scenario presents challenges that may hinder the initial detector and the tracker. The dynamic and unpredictable ocean appearance makes its mathematical modeling difficult [7, 9]. The images captured by the cameras may not be clear due to the presence of noise and clutter caused by the electronic equipment or by adverse environmental conditions, such as storms, haze and low luminosity [4, 14, 20]. The white foam on the water surface caused by the vehicle propeller or by the waves, the sunlight reflection, the change in lighting conditions, the constant change of each pixel value caused by waves, the presence of objects that float over the ocean, the great variability of certain maritime vehicle features such as size, maneuverability, appearance and geometric shape, the low contrast of the image captured by the cameras or between the marine vehicle and the background, and the presence of birds, clouds, fog and aircraft that arise immediately above the horizon hinder the detector and the tracker [1, 2, 4, 7, 9, 10, 11, 14, 16, 19]. Figure 2 shows an image with low contrast and clutter. Figure 3 demonstrates the error caused by white foam.

Fig 2. image with low contrast and clutter [18].

Fig 3. white foam generated by the ship [16].

It is common to use FLIR cameras - Forward Looking Infrared - because they are more insensitive to changes in lighting conditions, they do not capture the sunlight reflection over the sea surface or over the vehicle and they decrease the influence of white foam [9, 10], but they limit the quantity of features that can be extracted [20] and have high energy consumption [2, 10].

Most of the surveillance systems use fixed cameras [15]. Systems based on cameras installed on buoys have to compensate their movement to lower the probability of tracking mistakes [2]. In these cases, the horizon line is used as a reference. Cameras installed on aircraft or small marine vessels can produce tracking failures caused by the vibratory camera movement, making it necessary to use a smoothing filter [20].
4 Horizon line detection
The initial detector usually detects a maritime vessel around the position of the horizon line PHL. After estimating the PHL, the surveillance system detects the maritime vehicles that arise next to and above the horizon line, limiting the search region and reducing the execution time of the initial detector [2, 10, 14]. The ROI can also be reduced to the ocean region, below the PHL [6, 15, 19].

Authors like Fefilatyev et al. [14], Todorovic [23] and Ettinger et al. [24] estimate the PHL by minimizing the intra-class variance of the sky and sea pixel values. To minimize the influence of the coast and of marine vehicles present near the horizon, Fefilatyev [10] proposed the Unsupervised Slice algorithm. The image is divided into N parts by N-1 vertical lines evenly distributed. The line segments that minimize the intra-class variance of each part are calculated and combined to estimate the PHL. Fefilatyev et al. [25] minimize the intra-class variance using features extracted from the pixel values. Cornall et al. [26] estimate the PHL by segmenting the pixels with a threshold. The centroids of the sky and sea pixels define a segment perpendicular to the PHL. McGee et al. [27] and Fefilatyev et al. [25] segment the sky and sea pixels with an SVM classifier. McGee et al. [27] estimate the PHL as the line that separates the sky and sea pixels with the least error among all candidate lines. Fefilatyev et al. [25] define a quantized pixel map {-1,1} according to the classification. The PHL is the line that minimizes the intra-class pixel variance on the map. The surveillance system proposed by Kruger et al. [17] has cameras with inertial units that determine the camera position in space, stabilize the image and reduce the total number of possible candidate positions for the PHL.
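A minimal sketch of the intra-class variance criterion used by [14, 23, 24], restricted for simplicity to horizontal candidate lines: each candidate row splits the grayscale image into a sky class and a sea class, and the row whose two classes have the smallest summed variance is taken as the PHL. The full algorithms also search over tilted lines; this simplification and the function name are illustrative assumptions.

```python
import numpy as np

def horizon_row(gray):
    """Return the row index that minimizes the intra-class variance
    of the pixels above (sky) and below (sea) the candidate line.
    `gray` is a 2-D array; tilted candidate lines are ignored here."""
    h = gray.shape[0]
    best_row, best_score = None, np.inf
    for r in range(1, h - 1):
        sky, sea = gray[:r], gray[r:]
        # weighted sum of the two class variances (Otsu-like criterion)
        score = sky.size * sky.var() + sea.size * sea.var()
        if score < best_score:
            best_row, best_score = r, score
    return best_row
```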
Fefilatyev et al. [2] discard the frames in which the PHL estimation is unreliable to increase the detector and tracker robustness. The reliability reduction can occur when the sky or the sea comes out of the camera field of view and when water droplets are deposited on the lens. Considering the hypothesis that the sea and sky pixel values have Normal distributions, Fefilatyev et al. [2] select a small set of candidate lines with a less robust algorithm based on the Hough transform applied to a gradient map and then select among the candidate lines the one that maximizes a function that indicates the variance between the two classes, in order to accelerate the PHL estimation. Wei et al. [6] apply the Hough transform on a gradient map calculated over the result of applying a smoothing filter to the first frame. If the line is not accurately detected, the search region becomes the entire image. Bloisi et al. [19] apply the Hough transform on a gradient image to determine a candidate PHL. The PHL estimation is validated if 90% of sampled pixels above and below the PHL have different values.

The approaches based on the Hough transform [2, 6, 19] and on optical flow [28] have higher computational complexity.
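The hybrid strategy of [2] can be sketched roughly as follows: a Hough transform over an edge map proposes a few near-horizontal candidates, and the winner is the candidate whose sky/sea split maximizes the between-class variance. The edge thresholds, the candidate cap and the tilt tolerance below are illustrative assumptions, and the score is evaluated on a horizontal cut through each candidate for brevity.

```python
import cv2
import numpy as np

def horizon_from_hough(gray, max_candidates=10):
    """Propose PHL candidates with a Hough transform over an edge map,
    then keep the near-horizontal candidate whose sky/sea split
    maximizes the between-class variance. `gray` is a uint8 image."""
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=100)
    if lines is None:
        return None
    h, w = gray.shape
    best_row, best_score = None, -np.inf
    for rho, theta in lines[:max_candidates, 0]:
        if abs(theta - np.pi / 2) > 0.2:      # keep near-horizontal lines only
            continue
        row = int((rho - (w / 2) * np.cos(theta)) / np.sin(theta))
        if not 1 <= row < h - 1:
            continue
        sky, sea = gray[:row], gray[row:]
        n1, n2 = sky.size, sea.size
        # between-class variance of the two pixel populations
        score = n1 * n2 * (sky.mean() - sea.mean()) ** 2 / (n1 + n2) ** 2
        if score > best_score:
            best_row, best_score = row, score
    return best_row
```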
5 Initial vessel detection
An efficient initial detection of the maritime vehicle is important because the performance of all other surveillance system components depends on it. Marine environments are very dynamic and difficult to process, which can generate a lot of false detections FP and missed vessels FN [10].

The initial detection based only on frame differences can fail in cases where the vehicle is docked or moves toward the camera, as little difference between the pixel values at consecutive frames is produced [19]. The ocean pixel values are constantly varying due to the waves, which generates many FP [10]. There are detection algorithms based on frequency information [29] and on histograms [30]; however, recent works use Gaussian functions to model the sea pixel values and detect vehicles by background subtraction [1, 2, 4, 5, 7, 9, 15, 16]. The optical flow analysis is not much used for the initial detection due to its higher computational complexity [7]. Some authors [9, 29] divide the image into N regions and extract features, such as entropy, energy, uniformity and contrast, from each area. Maritime regions where vehicles are present have different characteristics from the other regions.

The constant movement of the water is one of the factors that cause failures in algorithms based on background subtraction [4, 6]. The background subtraction statistically exploits the fact that each pixel value follows a normal distribution (equation (1)) or a mixture of normal distributions over time. The probability P of each pixel I(x,y) belonging to the ocean or to the vehicle is related to the difference between its value and the mean of each distribution, considering the distribution variances (equation (2)).

BM(x,y) = N(μ(x,y), σ(x,y)) (1)
P(I(x,y) ∈ BM(x,y)) ∝ ∣I(x,y) - μ∣ / σ (2)

Many authors have reported that using a background model BM represented by a mixture of Gaussians is less efficient. Szpak and Tapamo [7] conducted statistical tests based on DIP - Departure from Unimodality - and concluded that the pixel values in most cases have a Normal distribution; however, Bloisi et al. [1] reported that a mixture of Normal distributions can represent the ocean better. The right conclusion is that the best representation depends on the application.

Pires et al. [15], Grupta et al. [5] and Robert-Inácio et al. [9] represent BM(x,y,t) by an adaptive Normal distribution. A maritime vehicle is detected when a connected component with area larger than a threshold L is located on the region corresponding to the water surface in the map of relevant pixels MRP. The MRP is a map that contains only the pixels that have a low probability of belonging to the BM. Pires et al. [15] and Robert-Inácio et al. [9] put in the MRP only the pixels whose difference between I(x,y,t) and μ(x,y,t) is greater than a threshold L2. Pires et al. [15] calculate the difference pixel-by-pixel and Robert-Inácio et al. [9] split the image with a regular grid and define I(x,y,t) as the average of the pixel values at each region. Grupta et al. [5] put in the MRP only the pixels whose squared difference between I(x,y,t) and μ(x,y,t) divided by σ(x,y,t) is greater than a threshold L3.
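A compact sketch of this family of detectors, assuming grayscale frames: a per-pixel adaptive Normal background model (equations (1)-(2)), an MRP built by thresholding the normalized deviation, and a connected-component area test standing in for the threshold L. The class name, the learning rate alpha and all threshold values are illustrative assumptions, not taken from any of the cited papers.

```python
import cv2
import numpy as np

class GaussianBM:
    """Per-pixel adaptive Normal background model (cf. equations (1)-(2)).
    alpha, l2 and min_area are illustrative values."""
    def __init__(self, first_frame, alpha=0.05, l2=2.5, min_area=50):
        f = first_frame.astype(np.float32)
        self.mu, self.var = f, np.full_like(f, 15.0)
        self.alpha, self.l2, self.min_area = alpha, l2, min_area

    def detect(self, frame):
        f = frame.astype(np.float32)
        # MRP: pixels unlikely to belong to the background model
        mrp = np.abs(f - self.mu) > self.l2 * np.sqrt(self.var)
        # update mean and variance only where the pixel fits the model
        upd = ~mrp
        d = f - self.mu
        self.mu[upd] += self.alpha * d[upd]
        self.var[upd] += self.alpha * (d[upd] ** 2 - self.var[upd])
        # connected components larger than a minimum area are vessel candidates
        n, labels, stats, _ = cv2.connectedComponentsWithStats(
            mrp.astype(np.uint8), connectivity=8)
        boxes = [stats[i, :4] for i in range(1, n)
                 if stats[i, cv2.CC_STAT_AREA] >= self.min_area]
        return mrp, boxes   # boxes are (x, y, width, height)
```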
Hu et al. [16] detect marine vehicles with background subtraction. The initial frames are used to define the BM. BM(x,y) is the average of the last six I(x,y) values inserted into a buffer. I(x,y,t) is inserted into the buffer only if the difference between μ(x,y,t) and the average value of the pixel at (x,y) and its 3x3 neighborhood is greater than a threshold L at K consecutive frames. Szpak and Tapamo [7] define BM(x,y) as a Normal distribution initially estimated with the first N frames and adjusted every frame, giving higher weights to more recent frames. The probability of a pixel belonging to a marine vehicle is proportional to the deviation of its value and its neighbors from the interval [μ(x,y,t)-3.σ(x,y,t); μ(x,y,t)+3.σ(x,y,t)]. At every Z frames, an active contour starts at the image edges and evolves to the position where a new marine vehicle is. The BM proposed by Bloisi et al. [1] is a mixture of seven Normal distributions defined by clusterization of the RGB pixel values at (x,y) contained in the training images. Seven distributions were chosen to represent all possible sea appearances. The vehicle is detected when a connected component has a low probability of belonging to the 7 distributions. Wei et al. [6] define BM(x,y) = ax + by + c. The real values a, b and c are the ones that minimize a mean squared error function weighted by the pixel values that are below the horizon line. They are updated at each frame. The detection is performed by searching for connected components present at the residue image I(x,y,t)-BM(x,y,t) segmented by thresholding.
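The planar model BM(x,y) = ax + by + c of Wei et al. [6] can be fitted in closed form as a least-squares problem over the pixels below the horizon. The sketch below uses an unweighted ordinary least-squares fit for brevity, whereas [6] weights the error by the pixel values; the function name is an illustrative assumption.

```python
import numpy as np

def fit_planar_bm(gray, horizon_row):
    """Fit BM(x,y) = a*x + b*y + c to the sea pixels (below the PHL)
    by ordinary least squares; [6] uses a weighted variant."""
    h, w = gray.shape
    ys, xs = np.mgrid[horizon_row:h, 0:w]
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    z = gray[horizon_row:, :].astype(np.float64).ravel()
    (a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)
    # residue image: detections are thresholded from I - BM
    bm = a * xs + b * ys + c
    residue = gray[horizon_row:, :] - bm
    return (a, b, c), residue
```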
The validation of the initial detection by a classifier is present in the literature [18, 19]; however, due to the high variability of the appearance and geometric shape of marine vehicles, this approach is not much explored. Bloisi et al. [19] proposed an initial detector based on an ensemble classifier trained offline with Haar wavelet features. The ensemble was designed to increase the robustness of the initial detection in cases where a vessel is anchored and when sunlight reflections or white foam are present at the sea surface. Teutsch and Kruger [18] train an SVM classifier with invariant moments, statistical measures such as mean and variance, texture analysis, co-occurrence matrices and gradient analysis to classify vehicles in two steps. In the first step the detected candidates are classified into objects over the ocean or clutter. If it is classified as an object over the ocean, the object is classified as a marine vehicle or an irrelevant object in the second step. Sullivan and Shah [31] detect marine vehicles with the similarity value between the result of the FFT transform applied to vehicle images recorded in a database and the result of the FFT transform applied to candidate regions at each frame. Feineigle et al. [8] detect marine vehicles by the Euclidean distance between SIFT feature points detected at each frame and SIFT feature points present in an image dataset.

Detection algorithms based on connected components localization must consider the vehicle proximity to the camera [4]. Using a camera focused at infinity and installed on a buoy, Fefilatyev [10] detects marine vehicles by exploiting the gradient information of the pixels above the PHL. Morphological operations of erosion and dilation followed by the connected components localization are used to detect a pixel set with high gradient present above the PHL. Figure 4 shows a marine vehicle detection by exploiting the gradient information.

Fig 4. detection of marine vehicles by exploiting the gradient information of the pixels above the PHL [10].

Fefilatyev et al. [14] and Fefilatyev et al. [2] accelerated the algorithm proposed by Fefilatyev [10]. They removed the need for morphological operations. The threshold values for the pixel segmentation are obtained by applying the Otsu segmentation method on a gradient map. Frost and Tapamo [4] detect marine vehicles by locating connected components based on segmentation by thresholding applied to a probability map estimated by a Gaussian kernel function. Only connected components with geometric shape similar to pre-defined models are considered marine vehicles.
The use of different and independent features is important to increase the robustness of the initial detector and the tracker to the variability of vehicle and environment appearances. Kruger and Orlov [17] and Teutsch and Kruger [18] combine the results of 3 detectors based on the extraction of distinctive features to determine if a vehicle is present near the PHL. Westall et al. [32] exploit the information provided by different color spaces. At each frame point, a color histogram H and a histogram of gradient orientations HoG are extracted in 3 different resolutions through integral images to accelerate the extraction [11]. Connected points that have the histograms H and HoG different from their neighbors' histograms belong to a marine vehicle. Fefilatyev [10] compared the efficiency of texture measurements like entropy, average, standard deviation, and moments up to the fourth order calculated with the RGB value of each pixel and its 11x11 pixel neighborhood normalized to the interval [0,1]. Applying segmentation by thresholding, the pixels that belong to the sky, to the sea and to marine vehicles are separated into distinct groups. Islam et al. [12] proposed a detector whose initial image Q0 is blurred by a linear filter to generate the image I. A Gaussian filter with σ=1 and another one with σ=3 are applied to Q0 and I to form the filtered images Q1, Q3, I1 and I3. The differences Q1-Q3 and I1-I3 are applied to an anomaly detector to produce the A and B images. A(x,y)-B(x,y) is proportional to the probability of the pixel at (x,y) belonging to a marine vehicle.

5.1 Techniques Used To Lower The Quantity Of FP And FN Detections
The ocean is a dynamic environment that has waves, white foam and light reflections on the water surface, which can generate considerable FP and FN amounts. Some authors [6, 9] report that detection and tracking applied on IR images are more efficient because the water temperature is not influenced by these events.
Different methods are employed to decrease the FP and FN detection rates. Many authors [2, 7, 9-11, 14, 15, 18, 19] only validate the initial detection of a marine vehicle if the tracking result is consistent and reliable at N consecutive frames. Fefilatyev [10] and Fefilatyev et al. [14] only consider an initial detection if the detection is reliable and the centroid and bounding box trajectories of the OT are consistent at 10 of the first 20 frames. Fefilatyev et al. [2] added to these rules the need for the vehicle appearance to be almost constant at N consecutive frames and the need for an object to have a considerable size.

To decrease the FP rate caused by ocean waves, birds, aircraft or objects of negligible size, Fefilatyev [10] and Grupta et al. [5] consider as marine vehicles the connected components located at the MRP that are distanced from the PHL at least N pixels apart. The contour size of a FP caused by foam, shadows, reflections and waves decreases at every frame until it disappears when the active contour tracker based on level set functions proposed by Szpak and Tapamo [7] is applied. To reduce the FP rate, Bloisi et al. [19] use an ensemble classifier trained offline to validate the initial detection. To decrease the FP rate caused by white foam, Frost and Tapamo [4] analyze whether each pixel value remains different from the BM value for more than N consecutive frames. Foam pixels have lower persistence than the vehicle pixels. In a first step, Hu et al. [16] and Grupta et al. [5] eliminate the foam pixels by removing small connected components located at the MRP. Hu et al. [16] apply an algorithm that eliminates shadows to reduce their influence. Pixels with high brightness and chromaticity distortion are white foam candidates. The candidate pixels that have brightness variation greater than a threshold are considered white foam.

To decrease the FN and FP rates, some authors apply morphological operations [5, 6, 32]. Grupta et al. [5] apply the erosion, dilation and smoothing operations before looking for connected components. Westall et al. [32] apply the opening, closing, erosion and dilation operations to eliminate noise and decrease the FP rate. Beyond these operations, Wei et al. [6] applied the operations of opening and closing to the residue image I(x,y,t) - BM(x,y,t) to eliminate clutter.
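A short sketch of this morphological cleanup of the residue/MRP mask: opening removes isolated foam and noise pixels (lowering FP), while closing fills small holes inside vessel blobs (lowering FN). The kernel shape and size are illustrative assumptions.

```python
import cv2

def clean_mask(mrp_mask, kernel_size=3):
    """Morphological opening then closing of a binary MRP mask.
    The 3x3 elliptical kernel is an illustrative choice."""
    kernel = cv2.getStructuringElement(
        cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    mask = cv2.morphologyEx(mrp_mask, cv2.MORPH_OPEN, kernel)   # drop foam/noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)      # fill small holes
    return mask
```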
6 Maritime Vehicle Tracking
There are many object tracking methods in the literature. Mean-shift, successive clustering, active contour and template matching are the most used methods in marine environments. The use of the Kalman filter [33] as an estimator produces good results in tracking applications because the vehicle movement is not too complex [18].

6.1 Kalman Filter
The Kalman filter KF [33] is an optimal estimation method of the state of a stochastic, non-stationary, dynamic and linear process. Kalman [33] introduced the representation of linear dynamical systems by state equations. The process is governed by discrete and linear equations (equations (3) and (4)) [20].
x(t+1) = A.x(t) + B.u(t) + w(t) (3)
z(t) = C.x(t) + D.u(t) + v(t) (4)

Where x is the process state vector, which may contain variables related to the object translation, scale and orientation and their first and second order derivatives, u is the control vector, z is the measurement vector obtained by a tracking algorithm, A is the state transition matrix, B is the state control matrix, C is the observation matrix, D is the measurement control matrix, w is the noise associated with the state and v is the noise associated with the measurement. By hypothesis, the w and v noise vectors are independent and have multivariate Gaussian probability distribution functions of zero mean and diagonal covariance matrices Q and R respectively (w ~ N(0,Q) and v ~ N(0,R)).

KF is a recursive algorithm that consists of two phases: time update and measurement update. The time update phase (equations (5) and (6)) estimates the state vector x(t|t-1) value and the error matrix P(t|t-1) value considering the observations obtained at I(t-1).

x(t|t-1) = A.x(t-1|t-1) (5)
P(t|t-1) = E((x(t) - x(t|t-1)).(x(t) - x(t|t-1))^T) = A.P(t-1|t-1).A^T + Q (6)

Where x(t) is the state at frame t, x(t|t-1) and x(t-1|t-1) are the a priori and a posteriori estimates of the state vector, P(t|t-1) and P(t-1|t-1) are the a priori and a posteriori estimates of the error matrix and E is the expected value.

The measurement update phase (equations (7), (8) and (9)) corrects the x(t|t-1) and P(t|t-1) values by incorporating the z(t) measurement obtained by the tracker at each frame.

K(t) = P(t|t-1).C^T.(C.P(t|t-1).C^T + R)^-1 (7)
x(t|t) = x(t|t-1) + K(t).(z(t) - C.x(t|t-1)) (8)
P(t|t) = P(t|t-1) - K(t).C.P(t|t-1) (9)

Where K(t) is the Kalman gain at frame t.
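Equations (5)-(9) map directly onto a few lines of linear algebra. The sketch below implements them for a constant-velocity point target, a common choice for vessel centroids; the class name and the values chosen for A, C, Q and R are illustrative assumptions.

```python
import numpy as np

class Kalman2D:
    """Constant-velocity Kalman filter for a vessel centroid.
    State x = [px, py, vx, vy]; implements equations (5)-(9)."""
    def __init__(self, q=1e-2, r=1.0):
        self.A = np.eye(4)
        self.A[0, 2] = self.A[1, 3] = 1.0          # position += velocity
        self.C = np.eye(2, 4)                      # we measure position only
        self.Q = q * np.eye(4)
        self.R = r * np.eye(2)
        self.x = np.zeros(4)
        self.P = np.eye(4)

    def predict(self):                              # time update, eqs (5)-(6)
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x[:2]                           # predicted ROI center

    def update(self, z):                            # measurement update, eqs (7)-(9)
        S = self.C @ self.P @ self.C.T + self.R
        K = self.P @ self.C.T @ np.linalg.inv(S)    # Kalman gain, eq (7)
        self.x = self.x + K @ (z - self.C @ self.x)
        self.P = self.P - K @ self.C @ self.P
        return self.x[:2]
```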
A very common application of the KF is the prediction of each object position at frame t+1 to define the ROI position [1, 2, 5, 6, 10, 14, 15, 18].

6.2 Successive Clustering
The clustering applied to successive frames is one of the simplest tracking methods [9]. An image segmentation algorithm is applied at each frame to generate a probability map. Then, a clustering algorithm forms the connected components in the map. P(OT(t)) is usually considered the centroid position of the connected component that is nearest to the OT centroid position estimated by the KF.

The surveillance system ASV [15] determines the vehicle spatial position geometrically by considering the camera height, the vehicle pixel closest to the water surface and the PHL. The tracking is based on successive clustering. P(OT(t)) is determined by associating the bounding box positions and centroid velocities estimated by the KF for each vehicle with the ones calculated for each connected component. Fefilatyev [10], Fefilatyev et al. [14] and Fefilatyev et al. [2] track marine vehicles by applying the Kalman filter and successive clustering at each frame. When one OT is not detected within the ROI estimated by the KF, the OT is considered occluded and the OT model is not updated, but the KF continues estimating future states of the OT bounding box and centroid. Bloisi et al. [1] and Grupta et al. [5] group together the clusters being tracked that are close to each other and have similar movements into a single OT. Grupta et al. [5] segment the image by background subtraction and analyze only the proximity and movement of the centroids. Bloisi et al. [1] segment the image by analyzing the optical flow similarity of the pixels and cluster the neighbor segments with a K-means algorithm. The optical flow is a dense field of displacement vectors that define the translation of the pixels at successive frames. The OT movement can be estimated by analyzing the optical flow of the OT pixels. The optical flow calculation is performed considering the hypothesis that the brightness of the pixels at successive frames does not vary abruptly [34] (equation (10)).

I(x,y,t) - I(x+Δx, y+Δy, t+Δt) = 0 (10)

A high frame per second rate is required to secure this hypothesis. The equation that connects the optical flow vector V = (∂x/∂t, ∂y/∂t)^T and the first order intensity derivatives (equation (11)) is deduced by a Taylor series expansion of equation (10) up to the first order term [35].

∇I.V + ∂I/∂t = 0 (11)
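A dense flow field satisfying this constraint can be obtained, for instance, with OpenCV's Farneback algorithm. The sketch below estimates the mean displacement of an OT's pixels between two grayscale frames; the function name, the bounding box convention and the parameter values are illustrative assumptions.

```python
import cv2
import numpy as np

def ot_displacement(prev_gray, curr_gray, box):
    """Estimate the OT translation between two frames as the mean
    dense optical flow inside its bounding box (x, y, w, h)."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    x, y, w, h = box
    region = flow[y:y + h, x:x + w]            # per-pixel (dx, dy)
    return region.reshape(-1, 2).mean(axis=0)  # mean (dx, dy) of the OT
```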
6.3 Mean-Shift
The mean-shift algorithm was proposed by Fukunaga and Hostetler [36], and was then adapted by Cheng [37] for image analysis, by Comaniciu and Meer [38] for image segmentation and by Bradski [39] and Comaniciu et al. [40] for object tracking.

The mean-shift algorithm considers the data as points in a feature space FS associated with an empirical probability density function, where regions of dense data present in FS correspond to local maxima, or modes, of the data distribution. A local gradient ascent algorithm is applied to the empirical probability density function to determine the data region corresponding to the mode. Given n points p_i, i=1,...,n in R^d, the empirical probability density function EPDF(p) that has a radially symmetric kernel (equation (13)) centralized at p and a bandwidth h is defined by equation (12) [37, 38].

EPDF(p) = 1/(n.h^d).∑_{i=1..n} K((p - p_i)/h) (12)
K(a) = c_k.k(∥a∥²) (13)

Where c_k is a normalization constant. The EPDF modes are localized at the points where the gradient of the EPDF is null (∇EPDF(p) = 0).

The OT model is represented by the function EPDFR [40]. Equivalently, a pixel region R is represented by the function EPDFC. Both functions are estimated by the histograms Hepdfr and Hepdfc (equations (14) and (15)). At each iteration step, the mean-shift vector (equation (17)) shifts R toward a region of maximum similarity between the histograms, calculated by the Taylor series expansion of the Bhattacharyya coefficient (equation (16)). The final R position is the OT position.

Hepdfr_u = C.∑_{i=1..n} k(∥p_i∥²).d[b(p_i) - u] (14)
Hepdfc_u(p) = C_h.∑_{i=1..n_h} k(∥(p - p_i)/h∥²).d[b(p_i) - u] (15)

Where b returns the histogram bin of the pixel p_i and u is the bin index. The Bhattacharyya coefficient between the two histograms is given by equation (16).

ρ(p) = ∑_u √(Hepdfc_u(p).Hepdfr_u) (16)

m_h(p) = (∑_{i=1..n} p_i.w_i.g(∥(p - p_i)/h∥²)) / (∑_{i=1..n} w_i.g(∥(p - p_i)/h∥²)) - p (17)

Where g(a) = -k'(a) and w_i is calculated by equation (18).

w_i(p) = ∑_u √(r_u / c_u(p)).d[b(p_i) - u] (18)

Where r_u is the value of the bin u in the OT histogram and c_u is the value of the bin u in the R histogram.

Bibby and Reid [41] developed a tracker based on the mean-shift algorithm, but their approach fails in cases of total occlusions.

Liu et al. [11] modify the segmentation threshold by selecting online the most discriminative features with the algorithm proposed by Collins and Liu [42] and apply the mean-shift algorithm starting from the position estimated by the KF to determine P(OT(t)). The feature pool has three color components, three differences between color components and the results of eight transformations applied to the Hue component. The function that measures the discrimination degree is based on the similarity between the histograms of the OT and its neighboring pixels.
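In practice this scheme is often run on a histogram backprojection map. The sketch below uses OpenCV's built-in mean-shift on a Hue-histogram backprojection as a compact stand-in for equations (14)-(18) - not the exact weighted formulation above - and all names and parameter values are illustrative assumptions.

```python
import cv2
import numpy as np

def track_meanshift(frames, init_box, bins=16):
    """Mean-shift tracking on a Hue-histogram backprojection.
    frames: sequence of BGR images; init_box: (x, y, w, h) of the OT."""
    x, y, w, h = init_box
    first_hsv = cv2.cvtColor(frames[0], cv2.COLOR_BGR2HSV)
    roi = first_hsv[y:y + h, x:x + w]
    hist = cv2.calcHist([roi], [0], None, [bins], [0, 180])  # OT model histogram
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    box, positions = init_box, []
    for frame in frames[1:]:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], scale=1)
        _, box = cv2.meanShift(backproj, box, crit)  # iterations analogous to eq (17)
        positions.append(box)                        # P(OT(t)) per frame
    return positions
```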
6.4 Template Matching
Template matching in the context of object tracking is defined as the location of a small pixel set called template within the ROI [43]. The OT model is the template to be found within the ROI. Templates are constructed with the pixels inside a simple geometric shape region. The position of the candidate region C that maximizes the similarity between the OT model M and all candidates reveals P(OT(t)). The Hamming distance (equation (19)) [44], the Euclidean distance [45], the Cross Correlation [46], the NCC (equation (20)) - Normalized Cross Correlation - [47], the SSD (equation (21)) - Sum of Squared Differences - [46] and the SAD (equation (22)) - Sum of Absolute Differences - [46] are examples of similarity functions between templates. The weightless neural network WiSARD can generate different similarity functions and can be adapted to tracking [48]. The simplest similarity function is the sum of the differences between the pixel values of two templates (equation (19)). The values dx and dy that minimize the function determine P(OT(t)).

Dif(M,C_i) = (∑_{(x,y)∈M} (I(x+dx,y+dy) - M(x,y))) / (N_x.N_y) (19)
NCC(x,y) = (∑_{(x,y)∈M} I(x+dx,y+dy).M(x,y)) / √(∑_{(x,y)∈M} M²(x,y)) (20)
SSD(x,y) = ∑_{(x,y)∈M} (M(x,y) - I(x+dx,y+dy))² (21)
L1(x,y) = ∑_{(x,y)∈M} ∣M(x,y) - I(x+dx,y+dy)∣ (22)
The L1 norm distance (equation (22)) raises the robustness to noise because it generates a lower penalty than the quadratic SSD function penalty. To limit the effects caused by variations in the environment lighting conditions, the normalized SSD can be used in place of the SSD (equation (23)).

NSSD(x,y) = ∑_{(x,y)∈M} (A - B)² (23)
A = (M(x,y) - μ(M(x,y))) / σ(M(x,y))
B = (I(x+dx,y+dy) - μ(I(x+dx,y+dy))) / σ(I(x+dx,y+dy))

Where μ and σ are the average and the standard deviation.

The similarity does not necessarily have to be calculated with the pixel values. Any feature extracted from a pixel region can be used.

Fefilatyev et al. [14] and Fefilatyev et al. [2] stabilize the image obtained by the camera installed on a buoy by minimizing the NCC between two images IMG1 and IMG2. IMG1 is the difference between the OT template and the average of the template pixel values. IMG2 is the difference between the frame I(t) and the average value of the pixels inside the ROI at I(t). The tracking algorithms proposed by Fefilatyev et al. [14] and Fefilatyev et al. [2] define P(OT(t)) by the NCC template matching algorithm used to stabilize the camera when the result of the segmentation by thresholding applied to a gradient image is unreliable. If the template matching is also unreliable, I(t) is discarded. The threshold value is calculated by the Otsu method. Moreira and Ebecken [48] proposed a tracker based on the weightless neural network WiSARD. The OT model is stored at the network RAM nodes. Candidate regions of quantized pixels are put at the network input. P(OT(t)) is defined as the position of the region that maximizes the network response. The tracker proposed by Hu et al. [16] defines P(OT(t)) by a template matching algorithm that uses the MAD function (equation (24)) - Median of Absolute Differences - as the similarity function.

MAD = (1/(W.H)).∑_{i=0..W} ∑_{j=0..H} ∣OT(x,y,t) - I(x+i,y+j,t)∣ (24)

Where W and H are the length and height of the OT bounding box.
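OpenCV implements several of these similarity functions directly. The sketch below searches the ROI with the normalized squared-difference score, used here as a stand-in for equation (23); the function name, the ROI convention and the score choice are illustrative assumptions.

```python
import cv2

def match_template_in_roi(frame_gray, template, roi_box):
    """Locate the OT template inside the ROI (x, y, w, h) using the
    normalized squared-difference score (cf. equation (23))."""
    x, y, w, h = roi_box
    roi = frame_gray[y:y + h, x:x + w]
    scores = cv2.matchTemplate(roi, template, cv2.TM_SQDIFF_NORMED)
    min_val, _, min_loc, _ = cv2.minMaxLoc(scores)  # lower score = better match
    best_x, best_y = min_loc                        # top-left corner in the ROI
    return (x + best_x, y + best_y), min_val        # P(OT(t)) and its score
```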
6.5 Histogram Matching
Histogram matching is a technique frequently used for tracking objects because the histogram is invariant to rotation and scale transformations applied to the object and it is robust to partial occlusions [49]. The appearance model is defined by extracting a histogram from the OT pixels. P(OT(t)) is the frame position that provides the maximum similarity measure between the OT histogram HM and the histograms extracted from candidate regions HC (equation (25)).

S(HM,HC) = ∑_{j=1..n} (HM(j) - HC(j)) (25)

Where n is the total bin quantity and H(j) is the value of the bin j of the histogram H.

Puzicha et al. [50] present other ways of calculating the similarity between histograms, such as the weighted bin to bin difference, the histogram intersection (equation (26)) and χ². The log likelihood statistics and log likelihood ratio statistics functions of similarity between histograms have been simplified by Ojala et al. [51] (equations (27) and (28)).
I(H_OR,H_OC) = (∑_{j=1..n} min(H_OR(j), H_OC(j))) / (∑_{j=1..n} H_OR(j)) (26)
L(H_OR,H_OC) = ∑_{j=1..n} H_OC(j).log(H_OR(j)) (27)
L(H_OR,H_OC) = 2.∑_{j=1..n} H_OC(j).log(H_OC(j) / H_OR(j)) (28)

The linear approximation of the Bhattacharyya coefficient [40] is the most used similarity function for tracking objects, because it is easily calculated and because many authors have reported the success of its application [49].
representation, as is the case of snakes, or implicit
and because there are many authors who reported
representation, such as the level set function [4, 7].
the success of their application [49].
Snakes have not been applied for tracking marine
vehicles yet. For this reason, only the tracking based
The tracker proposed by Bloisi et al. [19] determine
on level set functions will be presented in this paper.
P(OT(t)) with histograms matching based on the
Bhattacharyya coefficient. The pixel values are in
A distance function that implicitly determines the
the HSV color space to minimize the influence of
curve C position is defined by equation (29) [4, 7].
shadows and lighting variations caused by sunlight
reflection over the sea surface. To decrease the
C={(x,y)| φ (x,y)=0} (29)
quantity of tracking and detection failures, Bloisi et
al. [19] proposed the radar and camera data fusion.
C is the set of image points whose level set function
Fusion occurs in a normalized plane where P(OT(t))
value is null. Many authors define the function as
is defined by the nearest neighbor rule. Westall et al.
the Euclidean distance between the point (x,y) and C
[32] detect the head of missing people at sea using
(equation (30)).
information in RGB, YCbCr, YIQ and HSV color
spaces considering by hypothesis that these color
{ }
spaces are independent. -d(x,y), if (x,y) is inside C
φ (x,y)= 0 , if (x,y) is over C (30)
d(x,y) , if (x,y) is outside C
6.6 Active Contour
The active contour tracking method represents the Where d(x,y) is the Euclidean distance between the
vehicle contour by one or more curves. The curves pixel at (x,y) and the curve C.
move dynamically at every frame toward the
position of the vehicle edges, which by hypothesis is The curve evolution is defined by the equation (31)
the place where the discontinuity of the pixel values [4]. The update of the level set function values at
are higher. Trackers generally use the final contour each point generates the implicit curve movement.
position at the previous frame as the initial position
at the current frame. The main advantage of the dφ
= V∣∇ φ∣ (31)
active contour is that it is relatively insensitive to dt
lighting variations. Figure 5 shows the active
contour evolution. Where V is a speed function that depends on the
pixel values and is independent of the
parametrization [52]. V can be defined as a gradient
function [4]. The update of the level set function
depends on the V value.
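A bare-bones discretization of equation (31): φ is stored as a signed-distance-like grid (negative inside the curve) and repeatedly updated by V·|∇φ|, with V chosen here as a simple decreasing function of the image gradient so the zero level set slows down at strong edges. The speed definition, sign convention, step size and iteration count are illustrative assumptions, not the energy formulations of [4, 7].

```python
import numpy as np

def evolve_level_set(phi, image, iters=100, dt=0.2):
    """Discrete update of equation (31): dphi/dt = V * |grad(phi)|.
    V is small near strong image edges, so the implicit curve
    (the zero level set of phi) slows down and stops there."""
    gy, gx = np.gradient(image.astype(np.float64))
    V = 1.0 / (1.0 + gx ** 2 + gy ** 2)      # edge-stopping speed (illustrative)
    for _ in range(iters):
        py, px = np.gradient(phi)
        grad_mag = np.sqrt(px ** 2 + py ** 2)
        # with phi negative inside the curve, this sign shrinks the enclosed region
        phi = phi + dt * V * grad_mag
    return phi  # vessel contour: the zero level set {phi == 0}
```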
The energy function proposed by Frost and Tapamo [4] is minimized by a gradient descent method. It is composed of a sum of three functions based on the color histogram, the FFT transform and statistical measures like entropy, contrast, homogeneity and energy. These functions indicate the difference between the pixel values of the OT model and the pixel values inside the active contour. Szpak and Tapamo [7] apply the active contour method directly on a probability map that estimates the probability of each pixel being a background pixel. The Chan-Vese energy function was chosen. This function measures the sum of the probability variances of the pixels inside and outside the curve.

6.7 Occlusion Handling
Partial and total occlusions may occur. The occlusion can cause a tracking failure. Figure 6 shows an occlusion case. Teutsch and Kruger [18] proposed a tracker that combines 3 different trackers to increase the robustness to partial occlusions. When the response of one or two trackers is unreliable, the P(OT(t)) obtained by them receives a lower weight. The T1 and T2 trackers are based on pixel regions and T3 is based on feature points extracted by the algorithm proposed by Shi and Tomasi [53]. The T1 tracker performs segmentation by adaptive thresholding at each frame I(t) and defines P(OT(t)) by the nearest neighbor rule applied to the centroids of connected pixel regions present at I(t) and I(t-1). T2 performs the association between blobs extracted at I(t) and I(t-1). T3 performs the association between feature points extracted from the ROI and the OT feature points and defines P(OT(t)) as the average position of the associated feature points. Teutsch and Kruger [18] associate an independent KF to each OT and only update their models when the OT is not occluded. A total occlusion occurs when none of the trackers determines P(OT(t)) with high confidence. In this case, the KF continues estimating P(OT(t)). If the OT is not detected at N consecutive frames with high confidence, the reference to the OT is erased. A particle-based tracker can detect the vehicle after the occlusion with greater efficiency: P(OT(t)) is determined by the particle that is more similar to the OT template.

Fig 6. occlusion example [18].

7 Conclusion
This paper presented the state of the art methods of video detection and tracking of marine vehicles. The maritime environment is very challenging and dynamic. The algorithms of object detection and tracking, when applied to a maritime environment without proper adjustments, do not produce efficient results. Many errors of detection and tracking may occur due to noise, clutter, waves, the dynamic and unpredictable ocean appearance, sunlight reflections, bad environmental conditions, low luminosity and image contrast, the presence of objects that float over the ocean, white foam, the great variability of certain maritime vehicle features such as size, maneuverability, appearance and geometric shape, and the presence of birds, clouds, fog and aircraft that arise immediately above the horizon.

Video maritime surveillance systems are very important. They can be used to increase the coastal and ship security against hostile vessel attacks, to avoid collisions, to control the maritime traffic at ports and channels and for oil platforms defense.

There is not much research about video detection and tracking of marine vehicles. The algorithms seem not to perform well in some real situations, when little vessels that have low contrast with the background arise in the camera field of view. Video maritime surveillance is still a not completely solved problem and needs to be more explored.
References
[1] D. Bloisi and L. Iocchi, ARGOS - A Video Surveillance System for Boat Traffic Monitoring in Venice, International Journal of Pattern Recognition and Artificial Intelligence, Vol.23, No.7, 2009, pp. 1477-1502.
[2] S. Fefilatyev, D. Goldgof, M. Shreve and C. Lembke, Detection and Tracking of Ships in Open Sea with Rapidly Moving Buoy-Mounted Camera System, Ocean Engineering, Vol.54, 2012, pp. 1-12.
[3] S. Kasemi, S. Abghari, N. Lavesson, H. Johnson and P. Ryman, Open Data for Anomaly Detection in Maritime Surveillance, Expert Systems with Applications, Vol.40, 2013, pp. 5719-5729.
[4] D. Frost and J.-R. Tapamo, Detection and Tracking of Moving Objects in a Maritime Environment with Level-Set with Shape Priors, EURASIP Journal on Image and Video Processing, Vol.1, No.42, 2013, pp. 1-16.
[5] K. M. Grupta, D. W. Aha, R. Hartley and P. G. Moore, Adaptive Maritime Video Surveillance, Proceedings of SPIE, Vol.7346, No.09, 2009, pp. 1-12.
[6] H. Wei, H. Nyguien, P. Ramu, C. Raju, X. Liu and J. Yadegar, Automated Intelligence Video Surveillance System for Ships, Proceedings of SPIE, Vol.7306, No.1N, 2009, pp. 1-12.
[7] Z. L. Szpak and J. R. Tapamo, Maritime Surveillance: Tracking Ships Inside a Dynamic Background Using a Fast Level-Set, Expert Systems with Applications, Vol.38, 2011, pp. 6669-6680.
[8] P. A. Feineigle, D. D. Morris and F. D. Snyder, Ship Recognition Using Optical Imagery for Harbor Surveillance, Proceedings of Association for Unmanned Vehicle Systems International, 2007.
[9] F. Robert-Inácio, A. Raybaud and É. Clément, Multispectral Target Detection and Tracking for Seaport Video Surveillance, Proceedings of Image and Vision Computing, 2007, pp. 169-174.
[10] S. Fefilatyev, Detection of Marine Vehicles in Images and Videos of Open Sea, Master thesis (Computational Science and Engineering), University of South Florida, Tampa, Florida, United States, 2008.
[11] H. Liu, O. Javed, G. Taylor, X. Cao and N. Haering, Omni-Directional Surveillance for Unmanned Water Vehicles, 8th International Workshop on Visual Surveillance, 2008.
[12] M. M. Islam, M. N. Islam, K. V. Asari and M. A. Karim, Anomaly Based Vessel Detection in Visible and Infrared Images, Proceedings of SPIE-IS&T Electronic Imaging, Vol.7251, No.0B, 2009, pp. 1-6.
[13] A. Burkle and B. Essendorfer, Maritime Surveillance with Integrated Systems, Proceedings of Waterside Security Conference, 2010.
[14] S. Fefilatyev, D. Goldgof and C. Lembke, Tracking Ships from Fast Moving Camera Through Image Registration, Proceedings of the 20th International Conference on Pattern Recognition, 2010, pp. 3500-3503.
[15] N. Pires, J. Guinet and E. Dusch, ASV: An Innovative Automatic System for Maritime Surveillance, NAVIGATION, Vol.58, No.232, 2010, pp. 47-66.
[16] W.-C. Hu, C.-Y. Yang and D.-Y. Huang, Robust Real-Time Ship Detection and Tracking for Visual Surveillance of Cage Aquaculture, Journal of Visual Communication & Image Representation, Vol.22, 2011, pp. 543-556.
[17] W. Kruger and Z. Orlov, Robust Layer-Based Boat Detection and Multi-Target-Tracking in Maritime Environments, Proceedings of Waterside Security Conference, 2010.
[18] M. Teutsch and W. Kruger, Classification of Small Boats in Infrared Images for Maritime Surveillance, Proceedings of Waterside Security Conference, 2010.
[19] D. Bloisi, L. Iocchi, M. Fiorini and G. Graziano, Automatic Maritime Surveillance with Visual Target Detection, CiteSeerX Scientific Literature Digital Library and Search Engine, 2011.
[20] A. K. Bacho, F. Roux and F. Nicolls, An Optical Tracker for the Maritime Environment, Proceedings of SPIE, Signal Processing, Sensor Fusion, and Target Recognition XX, Vol.8050, 2011.
[21] F. Fusier, V. Valentin, F. Brémond, M. Thonnat, M. Borg, D. Thirde and J. Ferryman, Video Understanding for Complex Activity Recognition, Machine Vision and Applications, Vol.18, 2007, pp. 167-188.
[22] Z. Kalal, J. Matas and K. Mikolajczyk, P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints, IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 49-56.
[23] S. Todorovic, Statistical Modeling and Segmentation of Sky-Ground Images, Master thesis, University of Florida, 2002.
[24] S. M. Ettinger, N. C. Nechyba, P. G. Ifju and M. Waszac, Vision-Guided Flight Stability and Control for Micro Air Vehicles, Advanced Robotics, Vol.17, No.7, 2003, pp. 617-640.
[25] S. Fefilatyev, V. Smarodzinava, L. O. Hall and D. B. Goldgof, Horizon Detection Using Machine Learning Techniques, 5th International Conference on Machine Learning and Applications, 2006, pp. 17-21.
[26] T. Cornall and G. Egan, Calculate Attitude from Horizon Vision, 11th Australian International Aerospace Congress, 2005.
[27] T. G. McGee, R. Sengupta and K. Hedrick, Obstacle Detection for Small Autonomous Aircraft Using Sky Segmentation, Proceedings of the 2005 IEEE International Conference on Robotics and Automation, 2005, pp. 4679-4684.
[28] D. Dusha, W. Poles and R. Walker, Fixed-Wing Attitude Estimation Using Computer Vision Based Horizon Detection, Proceedings of Australian International Aerospace Congress, 2007, pp. 1-19.
[29] J. Sanderson, M. Teal and T. Ellis, Characterization of a Complex Maritime Scene Using Fourier Space Analysis to Identify Small Craft, 7th International Conference on Image Processing and its Applications, Vol.2, 1999, pp. 803-807.
[30] A. A. Smith and M. Teal, Identification and Tracking of Maritime Objects in Near Infrared-Image Sequences for Collision Avoidance, 7th International Conference on Image Processing and its Applications, Vol.1, 1999, pp. 250-254.
[31] M. D. R. Sullivan and M. Shah, Visual Surveillance in Maritime Port Facilities, Proceedings of SPIE, Vol.6978, No.11, 2008, pp. 1-8.
[32] P. O. Westall, P. Shea, J. J. Ford and S. Hrabar, Improved Maritime Target Tracker Using Colour Fusion, International Conference on High Performance Computing & Simulation, 2009.
[33] R. E. Kalman, A New Approach to Linear Filtering and Prediction Problems, Transactions of the ASME - Journal of Basic Engineering, Vol.82, 1960, pp. 35-45.
[34] J. Li, Y. Wang and Y. Wang, Visual Tracking and Learning Using Speeded Up Robust Features, Pattern Recognition Letters, Vol.33, 2012, pp. 2094-2101.
[35] J. Barron, D. Fleet and S. Beauchemin, Performance of Optical Flow Techniques, International Journal of Computer Vision, Vol.12, No.1, 1994, pp. 42-77.
[36] K. Fukunaga and L. Hostetler, The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition, IEEE Transactions on Information Theory, Vol.21, No.1, 1975, pp. 32-40.
[37] Y. Cheng, Mean Shift, Mode Seeking and Clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.17, No.8, 1995, pp. 790-799.
[38] D. Comaniciu and P. Meer, Mean Shift: A Robust Approach Toward Feature Space Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.24, 2002, pp. 603-619.
[39] G. R. Bradski, Computer Vision Face Tracking for Use in a Perceptual User Interface, Intel Technology Journal Q2, 1998.
[40] D. Comaniciu, V. Ramesh and P. Meer, Kernel-Based Object Tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.25, No.5, 2003, pp. 564-577.
[41] C. Bibby and I. D. Reid, Visual Tracking at Sea, IEEE International Conference on Robotics and Automation, 2005, pp. 1841-1846.
[42] R. T. Collins and Y. Liu, On-Line Selection of Discriminative Tracking Features, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.27, No.10, 2005, pp. 1631-1643.
[43] H. Schweitzer, R. Deng and R. F. Anderson, A Dual-Bound Algorithm for Very Fast and Exact Template Matching, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.33, No.3, 2011.
[44] E. Rublee, V. Rabaud, K. Konolige and G. Bradski, ORB: An Efficient Alternative to SIFT or SURF, IEEE International Conference on Computer Vision, 2011.
[45] D. Sinha and G. Sanyal, Development of Human Tracking System for Video Surveillance, Computer Science & Information Technology, Vol.3, 2011, pp. 187-195.
[46] A. I. Kravchonok, Region Growing Detection of Moving Objects in Video Sequences Based on Optical Flow, Pattern Recognition and Image Analysis, Vol.22, No.1, 2012, pp. 224-255.
[47] S. X. Li, H.-C. Chang and C. F. Zhu, Adaptive Pyramid Mean Shift for Global Real-Time Visual Tracking, Image and Vision Computing, Vol.28, 2010, pp. 424-437.
[48] R. D. S. Moreira and N. F. F. Ebecken, Parallel WiSARD Object Tracker: A RAM-Based Tracking System, Computer Science & Engineering: An International Journal, Vol.4, No.1, 2014.
[49] J. Ning, L. Zhang, D. Zhang and W. Yu, Joint Registration and Active Contour Segmentation for Object Tracking, IEEE Transactions on Circuits and Systems for Video Technology, Vol.23, No.9, 2013.
[50] J. Puzicha, Y. Rubner, C. Tomasi and J. Buhmann, Empirical Evaluation of Dissimilarity Measures for Color and Texture, Proceedings of the Seventh International Conference on Computer Vision, 1999, pp. 1165-1173.
[51] T. Ojala, M. Pietikainen and T. Maenpaa, Multiresolution Grey-Scale and Rotation Invariant Texture Classification with Local Binary Patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.24, 2002, pp. 971-987.
[52] R. Goldenberg, R. Kimmel, E. Rivlin and M. Rudzsky, Fast Geodesic Active Contours, IEEE Transactions on Image Processing, Vol.10, No.10, 2001.
[53] J. Shi and C. Tomasi, Good Features to Track, IEEE Conference on Computer Vision and Pattern Recognition, 1994.