
Detection and Monitoring of Passengers on a Bus by Video Surveillance

Boon Chong Chee, Mihai Lazarescu and Tele Tan


Curtin University of Technology, Western Australia
[email protected]
{m.lazarescu, t.tan}@curtin.edu.au

Abstract

This paper presents a method to detect passengers on board public transport vehicles. The method comprises, first, an elliptical head detection algorithm that uses the curvature profile of the human head as a cue. This is followed by the application of geometric blur features, which are robust to affine distortion of the image, to keep track of the movement of each head within the vehicle. The profile of the moving heads with respect to each other over a period of time can then be used as an indicative feature to detect the onset of suspicious passenger behaviour.

1. Introduction

Vandalism on public transport is a perennial problem for transit authorities. Most public transport buses in countries such as the UK, Canada and Australia have CCTV installed on board. Repairing vandalised property and removing graffiti is costly, so measures to curb such unnecessary expenditure are imperative.

In response to increased vandalism on public transport systems, especially on buses and trains, a great deal of money and effort is being invested to heighten security in these areas. This can be realised by using strategically installed closed-circuit television (CCTV) cameras to monitor and track commuters' activities and interactions from the time of boarding to departure. While such technology is not new, the increased need and urgency for crime-fighting measures has undoubtedly emphasised the importance of such cameras on public transport.

From this perspective, the merits of video surveillance systems on public transport include their use for (1) vandalism deterrence and (2) evidential records of vandalism [6].

Vandalism is usually committed when opportunities present themselves. Although performed discreetly, there are several tell-tale behavioural signs prior to the act of vandalism. Generally, passengers tend to engage in active movements such as switching seats and large body motion gestures. These are exploitable cues that can be detected to raise awareness of the situation. Unfortunately, such psychologically motivated indications are not the focus of public transport-related surveillance studies.

There are major problems in the operation of video surveillance systems on buses. Owing to the limited concentration and attention span of human operators, monitoring multiple long-running video sequences is often expensive, tedious, error prone and unproductive. Furthermore, the video recordings are not processed until the buses have returned to the depot. As a result, no immediate or precautionary action can be taken after events of vandalism or abnormal human behaviour have occurred.

In the face of such challenges, the use of automated and intelligent agents is advantageous in public transport surveillance technology.

In this paper, we present an implementation of a video-based surveillance system that detects passenger movements on board, based on the psychological patterns of the passengers.

This paper is organised as follows. Section 2 presents a preliminary introduction to related contributions in human detection and tracking. This is followed by an explanation of the adopted approach in Section 3. The system evaluation and experimental results are discussed in Section 4. Finally, the conclusion and possible future developments are presented in Section 5.

2. Background

Specific to a bus scenario, the stereotypical activities that can occur are posture transitions such as moving from sitting to standing and vice versa, seat switching in particular. Several situational and environmental constraints are involved in implementing a bus surveillance system to monitor passenger activities: (1) the video surveillance operates on a constantly moving platform, as opposed to typical surveillance on static ground, which contributes to (2) non-deterministic lighting and shadow patterns; (3) passenger movements are usually short and restricted; and (4) passengers are often occluded by an overcrowded bus and by on-board furniture. These conditions bring forth the open research problems of background inconsistency, non-trivial object occlusion, and drastic lighting and shadow issues, which generally cannot be resolved by present methods. Considering these problems, the following paragraphs present the image processing methods that have been contemplated for the project objectives.

2.1. Head detection

From observation of typical bus footage, passengers are commonly occluded by seats and other passengers. While occlusion handling is the highlight of recent publications, full human body detection is difficult and not viable in such circumstances, especially when passengers' movements are both short and limited. This has motivated the project to focus on head detection techniques instead.

Methods such as template matching based on image correlation, active shape models and snakes are available. Template matching exhaustively scans the image for a given object template. Unlike template matching, active shape models [7, 13] generate a parametric model of a shape based on the principal components of the average shape of an object. Matching is then based on estimating legal parameters constrained by the model, which allows a more robust shape match. On the other hand, snakes perform localization by forming a contour around the edges of the object based on 'energy' models that control smoothness, elasticity and external sensitivities. Elliptical matching is another popular approach in head detection. The close similarity of a human head contour to an ellipse has inspired several works [3, 11]. To achieve better robustness and accuracy, several techniques have included colour models characterising the skin colours of diverse ethnicities as part of the detection process [3, 11].

2.2. Feature tracking

Human tracking and motion modelling is perhaps the most critical task in a visual surveillance system. Algorithms such as Condensation, particle filters and Kalman filters can be used to both track and predict human motion based on predefined prediction and sampling models; these algorithms can be seen in the works of [9, 5, 12]. Stable tracking features such as the scale invariant feature transform (SIFT) [14] have also been introduced. SIFT generates descriptors from oriented filter responses within a window patch, and the resultant descriptor is designed to be insensitive to scale, orientation and affine transformations. Several other tracking features and their variants include Fourier descriptors, shape signatures and robust edge features.

From a higher-level surveillance viewpoint, motion detection techniques such as optical flow and image differencing [8] are used in event detection and recognition. To discriminate between abnormal and normal events, classifiers that can be trained with machine learning algorithms such as neural networks and Support Vector Machines (SVM) are used [15].

Despite the growing demand for, and clear benefits of, automated video surveillance on public transport, little work addresses on-board bus scenarios in particular. Broadly, attempts at video surveillance on buses are limited to boarding passengers while buses are stationary at bus stops, and lack 'in-journey' surveillance [1]. Other transport-related surveillance tasks include estimating the geometrical position of passengers' heads [10] and monitoring crowds and individuals in public transportation areas [4, 15]. New methods must be developed to address the specific issues faced here.

3. Methodology

The proposed approach is illustrated in Figure 1. For a video sequence of f frames, each raw image frame at time T ∈ {0, . . . , f} is extracted and preprocessed, followed by a head detection process that highlights head candidate regions. Geometric blur descriptors [2] are then obtained for each head candidate region. These affine-distortion-tolerant descriptors are the key features for measuring and associating correspondences between head candidate regions that appear in other frames. Over time, the dynamic evolution of the passengers' motion trajectories can be described. The details of the modules are presented in the following paragraphs.

Figure 1. An overview of the proposed bus surveillance system.
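
As a rough, per-frame illustration of this pipeline, the sketch below chains the stages described in Sections 3.1-3.4. It is an assumed skeleton, not the authors' implementation: detect_head_candidates, describe_candidates, match_candidates and update_trajectories are hypothetical placeholders for the corresponding modules, and only the OpenCV video-reading calls are real API.

# Assumed per-frame pipeline skeleton; the helper functions are hypothetical
# stand-ins for the modules described in Sections 3.1-3.4.
import cv2

def run_surveillance(video_path, region_of_interest_mask):
    tracks = []                      # one trajectory per tracked head
    prev_candidates = []             # head candidates and descriptors from the previous frame
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # 1. Restrict the search to the segmented region of interest (Section 3.1).
        masked = cv2.bitwise_and(frame, frame, mask=region_of_interest_mask)
        # 2. Elliptical head detection (Section 3.2).
        heads = detect_head_candidates(masked)
        # 3. Geometric blur descriptors around each head candidate (Section 3.3).
        candidates = describe_candidates(masked, heads)
        # 4. Associate candidates across frames and grow the trajectories (Section 3.4).
        matches = match_candidates(prev_candidates, candidates)
        update_trajectories(tracks, matches, heads)
        prev_candidates = candidates
    cap.release()
    return tracks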

3.1. Background segmentation

Background segmentation is performed on an empty bus scene to demarcate an area of interest for head detection, reducing the local search space. We assume that a head is highly unlikely to be detected on the seat surfaces or on the pathway. Hence, the seat and pathway region is segmented out using a standard Expectation-Maximization (EM) algorithm, leaving the rest of the background as the region of interest.

An averaged sequence of background images is convolved with a Gaussian smoothing kernel to produce a normalized background image. The EM algorithm is then applied to the intensity histogram of the normalized background image. The uniformly textured regions suggest that the global grey-level distribution of the normalized background image adheres to a Gaussian mixture model (GMM), which makes EM suitable for estimating the hidden model parameters. For the purpose of this segmentation, the EM algorithm is constrained to converge on a bimodal density, with each Gaussian mixture component representing one region's intensity distribution. The converged GMM parameters obtained from the E-M steps are used to classify each pixel of the background image as belonging to either the 'interest' region or the 'seat and pathway' region. Figure 2a) shows the intensity mesh plot of the normalized background image and Figure 2b) shows the resulting segmented image.

Figure 2. (a) 3D mesh plot of the normalized background image. (b) Result of background segmentation using EM.
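
The bimodal EM segmentation described above can be prototyped directly; the following is a minimal sketch (not the authors' code) that fits a two-component Gaussian mixture to the smoothed background intensities and labels each pixel, assuming an 8-bit grey-level background image and the SciPy and scikit-learn libraries.

# Minimal sketch of the bimodal EM/GMM background segmentation
# (an assumed implementation, not the authors' code).
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.mixture import GaussianMixture

def segment_background(background_grey, sigma=2.0):
    """background_grey: 2-D uint8 array of the averaged empty-bus scene."""
    smoothed = gaussian_filter(background_grey.astype(np.float64), sigma=sigma)
    intensities = smoothed.reshape(-1, 1)
    # EM constrained to a bimodal density: one mixture component per region type.
    gmm = GaussianMixture(n_components=2, random_state=0)
    labels = gmm.fit_predict(intensities).reshape(background_grey.shape)
    # Which component corresponds to 'seat and pathway' depends on the scene;
    # here the brighter component is assumed to be the seats and pathway.
    seat_component = int(np.argmax(gmm.means_.ravel()))
    region_of_interest = (labels != seat_component).astype(np.uint8)
    return region_of_interest

A mask produced in this way could then be used to restrict the search window of the head detector described next.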

3.2. Elliptical head detection

As mentioned earlier, full body detection techniques are not practical for tracking in a bus scenario. For example, only the heads and shoulders are well within the camera's field of view when passengers are seated. A human head retains an elliptical contour under a variety of orientations, which makes it a suitably good feature for elliptical matching. The head detection technique implemented in the bus surveillance system is a variant of that in [3], using an ellipse as a two-dimensional matching model.

Algorithms based solely on grey-level pixel intensities are not robust enough against illumination variations. Hence, an edge detector is applied to the input image to obtain the object boundaries used for head detection. In [3], the measure of the goodness of a head match, based on the cosine law, takes both the intensity gradient orientation and its magnitude into consideration:

    φ = (1/N) Σ_{i=1}^{N} |g_i · n̂_i|                                  (1)

where the score φ is the average of the N absolute dot products between the unnormalized gradient g_i at edge pixel i and the corresponding unit normal vector n̂_i of the matching ellipse. N is the total number of edge pixels along the perimeter of the matching ellipse. The best match for an input image is found by iterating over a search window with various ellipse sizes for the maximally valued score.
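
To illustrate how the score in Equation (1) could be evaluated for one candidate ellipse, the sketch below samples points on the ellipse perimeter, computes the outward unit normals, and averages the absolute dot products with the image gradient. It is a minimal NumPy sketch under assumed conventions (axis-aligned ellipse, gradients precomputed by the caller), not the authors' implementation.

# Sketch of the head-match score of Equation (1) for a single candidate
# ellipse (assumed conventions; not the authors' implementation).
import numpy as np

def ellipse_match_score(grad_x, grad_y, cx, cy, a, b, n_points=64):
    """grad_x, grad_y: image gradients (e.g. from np.gradient or a Sobel filter).
    (cx, cy): ellipse centre in pixels; a, b: semi-axes in pixels."""
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    xs, ys = cx + a * np.cos(t), cy + b * np.sin(t)
    # The outward normal of an axis-aligned ellipse at parameter t is
    # proportional to (cos(t)/a, sin(t)/b).
    nx, ny = np.cos(t) / a, np.sin(t) / b
    norm = np.hypot(nx, ny)
    nx, ny = nx / norm, ny / norm
    xi = np.clip(np.round(xs).astype(int), 0, grad_x.shape[1] - 1)
    yi = np.clip(np.round(ys).astype(int), 0, grad_x.shape[0] - 1)
    # Average of the absolute dot products between gradient and unit normal.
    return float(np.mean(np.abs(grad_x[yi, xi] * nx + grad_y[yi, xi] * ny)))

In the system, this score would be maximised over a search window of candidate centres and a range of ellipse sizes, as described above.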

Modifications to this head detection method are necessary for its application in the bus surveillance system. Multiple, smaller heads have to be detected in a cluttered environment, as opposed to the single head detected in [3]. Furthermore, full elliptical matching is more difficult because the boundaries of a human head contour appearing in an edge image are generally short and drastically discontinuous, owing to partial occlusion and poor edge detection results.

Consequently, an initial simple correlation template matching is performed to locate potential heads, using a template of the typical size of a passenger's head. For each matched region, a vertically flipped duplicate is appended underneath to fabricate a pseudo-artificial ellipse. A least-squares ellipse fitting algorithm is subsequently applied to the fabricated ellipse. This approach allows a robust head fit, since no constraints on the elliptical parameters are imposed during fitting. Finally, only the part of the fitted ellipse contour residing in the original portion of the fabrication is retained. With this elliptical arc, a top-hemispherical head detection can be performed using Equation 1. In addition, fitted elliptical arcs with neighbouring local maxima are merged, on the assumption that adjacent arcs belong to the same head.
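
The fabricate-and-fit step can be sketched as follows: the arc points from the matched region are mirrored about their lowest row to close the contour, an unconstrained least-squares ellipse fit is applied, and only the original upper half of the fit is kept for scoring with Equation (1). The sketch below is an assumed illustration, not the authors' code, and uses OpenCV's cv2.fitEllipse as a stand-in for the least-squares fitting step.

# Assumed sketch of the pseudo-ellipse fabrication and least-squares fit;
# cv2.fitEllipse stands in for the ellipse fitting step.
import numpy as np
import cv2

def fit_head_arc(arc_points):
    """arc_points: (N, 2) array of (x, y) edge pixels on the candidate top arc
    (image y grows downwards), with N >= 5."""
    arc = np.asarray(arc_points, dtype=np.float32)
    base_y = arc[:, 1].max()                        # lowest row of the detected arc
    mirrored = arc.copy()
    mirrored[:, 1] = 2.0 * base_y - mirrored[:, 1]  # flip the arc below the base line
    fabricated = np.vstack([arc, mirrored])
    # Unconstrained least-squares ellipse fit on the fabricated contour.
    (cx, cy), (width, height), angle = cv2.fitEllipse(fabricated)
    # Only the top half of the fitted contour (above base_y) is retained
    # for the top-hemispherical matching with Equation (1).
    return {"centre": (cx, cy), "semi_axes": (width / 2.0, height / 2.0),
            "angle": angle, "keep_above_y": float(base_y)}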

Figure 3. Procedural steps of the head detection module.

The head detection process is illustrated in Figure 3, and sample detection results are shown in Figure 8. Spurious detections may be discarded using basic shape descriptors such as elliptical roundness, aspect ratio, area and perimeter.

3.3. Geometric blur features

[2] describes the geometric blur feature as a discriminative descriptor that averages geometric transformations of a signal in the spatial domain. It is suitable for matching signals that have an affine relationship. The geometric blur feature is formed on a sparse signal (e.g. an image's first derivative) by means of spatially varying Gaussian kernel convolutions, where the kernel width increases with the Euclidean distance of the sampling point from the point of interest. Geometric blur features can be constructed on images with impoverished interest operators and are hence more applicable and flexible than feature descriptors such as SIFT [14].

After the head detection process, geometric blur features are used to establish point correspondences for each head candidate within a search window in another frame. Given a search window, a template geometric blur feature is first constructed around the local maximum of a head contour. This template feature is then matched against geometric blur features constructed around each edge pixel within the target frame. The confidence of each correspondence is determined by matching the two geometric blur features with the L2-normalized correlation technique.

In constructing a geometric blur feature, an image window around a point of interest is convolved with vertical, horizontal and cross-oriented operators, each producing an oriented edge filter response. A total of six sparsely signalled, half-wave rectified channels (see Figure 4b)) are obtained by applying a difference of Gaussians to the individual oriented responses. Subsequently, each channel is geometrically blurred around the point of interest. Unlike [2], the geometric blur feature matrix follows the restricted sub-sampling pattern shown in Figure 4d). The set of sampling points, S_x0, covers the area of interest under the local maximum of a contour and disregards the rest, which may contain irrelevant background features. Furthermore, the matching is performed sequentially over each of the three colour channels instead of a single grey-level channel, and the best matching corresponding point is derived from the highest average score over the three channels (refer to Algorithm 1).

Figure 4. Snapshots of the geometric blur template matching process.

Following [2], the geometric blur descriptor around an interest point x0 of image I can be defined as

    G^I_{x0}(x) = ( I ∗ B_{α|x0−x|+β} )(x)                               (2)

where x ∈ S_x0 and B_{α|x0−x|+β} is a symmetric Gaussian kernel with smoothing parameters α and β.
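
A common way to realise Equation (2) in practice is to precompute a small stack of Gaussian-blurred versions of a channel and, for each sampling point, read the value from the level whose blur is closest to α|x0 − x| + β. The sketch below follows that approach for a single channel; it is a simplified illustration under assumed parameters, not the authors' implementation.

# Single-channel sketch of the geometric blur descriptor of Equation (2):
# the blur grows with the distance of each sample from the interest point x0.
# (Assumed illustration, not the authors' implementation.)
import numpy as np
from scipy.ndimage import gaussian_filter

def geometric_blur_descriptor(channel, x0, offsets, alpha=0.5, beta=1.0, n_levels=8):
    """channel: 2-D float array (one sparse edge channel);
    x0: (row, col) interest point; offsets: (K, 2) sampling offsets relative to x0."""
    offs = np.asarray(offsets, dtype=float)
    dists = np.hypot(offs[:, 0], offs[:, 1])
    sigmas = alpha * dists + beta                      # per-sample blur widths
    levels = np.linspace(sigmas.min(), sigmas.max(), n_levels)
    stack = [gaussian_filter(channel, sigma=s) for s in levels]
    h, w = channel.shape
    descriptor = np.empty(len(offs))
    for k, (dr, dc) in enumerate(np.round(offs).astype(int)):
        r = int(np.clip(x0[0] + dr, 0, h - 1))
        c = int(np.clip(x0[1] + dc, 0, w - 1))
        # Read each sample from the blur level closest to alpha*|x0 - x| + beta.
        level = int(np.argmin(np.abs(levels - sigmas[k])))
        descriptor[k] = stack[level][r, c]
    return descriptor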

Algorithm 1: Color template matching
    input : template image, target image, x0
    output: point correspondent
    foreach color channel c ∈ {H, S, V} do
        templ  ← c component of the template image
        target ← c component of the target image
        initialise zero matrix R^c
        compute G^templ_{x0}
        foreach sampled edge pixel at (i, j) in target do
            compute G^target_{(i,j)}
            R^c_{i,j} ← correlation( G^templ_{x0}, G^target_{(i,j)} )
        end
    end
    return arg max_{i,j} ( R^H_{i,j} + R^S_{i,j} + R^V_{i,j} )
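
A rough Python rendering of Algorithm 1 is sketched below: both images are converted to HSV, a geometric blur descriptor is built per channel around x0 in the template and around each sampled edge pixel in the target, the descriptors are scored with L2-normalised correlation, and the pixel with the highest summed score over the three channels is returned. It assumes the geometric_blur_descriptor sketch above and OpenCV for the colour conversion; it is not the authors' implementation.

# Assumed sketch of Algorithm 1 (colour template matching), reusing the
# geometric_blur_descriptor sketch above.
import numpy as np
import cv2

def l2_normalised_correlation(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def colour_template_match(template_bgr, target_bgr, x0, edge_pixels, sample_offsets):
    """x0: (row, col) interest point in the template;
    edge_pixels: list of candidate (row, col) tuples in the target frame."""
    template_hsv = cv2.cvtColor(template_bgr, cv2.COLOR_BGR2HSV).astype(float)
    target_hsv = cv2.cvtColor(target_bgr, cv2.COLOR_BGR2HSV).astype(float)
    total_score = {p: 0.0 for p in edge_pixels}
    for c in range(3):                                 # H, S and V channels in turn
        templ_desc = geometric_blur_descriptor(template_hsv[:, :, c], x0, sample_offsets)
        for p in edge_pixels:
            target_desc = geometric_blur_descriptor(target_hsv[:, :, c], p, sample_offsets)
            total_score[p] += l2_normalised_correlation(templ_desc, target_desc)
    # Best correspondence: highest summed (equivalently, average) score over channels.
    return max(total_score, key=total_score.get)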

3.4. Motion detection

When the correspondent of a point is located, a track profile of a passenger's motion trajectory can be established from that point to the head candidate nearest (in Euclidean distance) to the correspondent, as illustrated in Figure 5. Over time, the entire motion trajectory of each passenger can be observed. As part of the surveillance system, both trajectory lengths and displacements are monitored for activity detection.

Figure 5. Snapshot of a tracking process.

The bus surveillance system is designed to actively track motion, based on the displacement of an object, while the bus is moving. An object that has moved over a considerable distance is marked with a motion streak, as shown in Figure 7. The motivation for such a system lies in the psychological behaviour of passengers during a bus journey. During periodic stops, the system is set to a dormant mode while passengers are free to board and leave the bus. While the bus is travelling, passengers are typically well seated or standing still; during this period, detecting large motion would indicate an abnormality.
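
This displacement rule can be sketched as a simple per-track check: each trajectory is grown with its nearest head candidate and is flagged once its net displacement exceeds a threshold while the bus is moving. The code below is a hedged illustration with assumed data structures and an assumed threshold, not the authors' implementation.

# Sketch of the displacement-based motion flagging described above
# (assumed data structures and threshold; not the authors' implementation).
import numpy as np

def update_and_flag_tracks(tracks, head_candidates, bus_moving, threshold=40.0):
    """tracks: list of lists of (row, col) head positions, one list per passenger.
    head_candidates: (M, 2) array of head positions detected in the current frame."""
    flagged = []
    candidates = np.asarray(head_candidates, dtype=float)
    for track in tracks:
        if candidates.size == 0:
            continue
        last = np.asarray(track[-1], dtype=float)
        # Extend the track with the nearest (Euclidean) head candidate.
        nearest = candidates[np.argmin(np.linalg.norm(candidates - last, axis=1))]
        track.append((float(nearest[0]), float(nearest[1])))
        # Net displacement of the trajectory so far.
        displacement = np.linalg.norm(np.asarray(track[-1]) - np.asarray(track[0]))
        # Large motion while the bus is travelling is treated as abnormal;
        # during stops the system is dormant.
        if bus_moving and displacement > threshold:
            flagged.append(track)
    return flagged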

4. Experiments and Results

To demonstrate the robustness and accuracy of the bus surveillance system, the implementation was tested over a variety of bus scenarios and alternative system setups. Three scenario enactments, each with a display rate of 25 fps, were used in the experiments. Video 1 (790 frames) shows a cluttered scenario of six passengers and features passenger 1A switching seat locations. Video 2 (865 frames) shows another crowd of five passengers without major activity. Video 3 (835 frames) shows three passengers displaying abnormal behaviour (refer to Figure 6). Ground truths for the test videos, indicating the true positions of the passengers' heads and their appearance times, were manually extracted and compared with the experimental results to measure the system's detection performance. The experiments were measured against 'correct' and 'incorrect' evaluation metrics. Each correctly tracked object was awarded a 'correct' metric point for each correct track location. Upon the occurrence of an incorrect track, the penalty was a single one-off increment of the 'incorrect' metric until the object resumed its correct track.
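
Read literally (see the tracking accuracy formula in Table 1), this scheme gives each passenger a tracking accuracy of correct / (correct + incorrect) × 100%. The tiny sketch below shows that calculation for a set of passengers; the assumption that the per-video figure is the average of the per-passenger ratios is ours, not stated explicitly in the text.

# Hedged sketch of the tracking accuracy metric used in Table 1:
# accuracy_i = correct_i / (correct_i + incorrect_i) * 100 for passenger i,
# assumed here to be averaged over the K passengers of a video.
def tracking_accuracy(correct, incorrect):
    """correct, incorrect: per-passenger metric counts (two lists of length K)."""
    per_passenger = [100.0 * c / (c + i) for c, i in zip(correct, incorrect)]
    return per_passenger, sum(per_passenger) / len(per_passenger)

# Example with hypothetical counts for three passengers.
per_passenger, video_score = tracking_accuracy([180, 200, 150], [20, 10, 15])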

An accuracy test of the head detections was conducted for each video sequence, based on the average percentage of passenger heads correctly detected. The true positive detections were then carried forward to evaluate the tracking procedure. The surveillance system was first tested using the original implementation of the geometric blur feature computation, with a single grey-level channel and the original sampling pattern; this was compared against the proposed system employing the colour template matching process with the restricted sampling pattern. Finally, the sample tracking results in Figure 7 demonstrate the ability of the proposed surveillance system to detect motion in a bus. The empirical results of the experiments are tabulated in Table 1. Unless otherwise specified, the experiments assume a system with ellipse matches accepted at an aspect ratio of 1.5, HSV colour template matching and the restricted sampling pattern.

Figure 6. Snapshots of video enactments.

Table 1. Experimental results (K is the total number of passengers in a video)

    Results                     Video 1    Video 2    Video 3
                                K = 6      K = 5      K = 3
    Head detection acc.         88%        68%        86%
    Tracking acc.*
      Original (a)              91%        94%        90%
      Proposed-HSV (b)          92%        95%        91%
      Proposed-RGB (c)          91%        93%        90%

    * computed as correct_i / (correct_i + incorrect_i) × 100% for each passenger i ∈ {1, ..., K}
    (a) grey-level processing using the original sampling pattern
    (b) HSV colour processing using the restricted sampling pattern
    (c) RGB colour processing using the restricted sampling pattern

From the results in Table 1, the system shows a reasonable overall head detection accuracy. However, Video 2 has a lower detection accuracy than the other tests, apparently because of the weak edge features of passengers 2A and 2E, which result in lower detection frequencies. Furthermore, movements nearer the camera tend to exhibit motion blur trails, making edge detection difficult. Figure 8 shows snapshots of the head detection process. Although there are falsely detected heads, their locational persistence does not contribute to the eventual motion detection result. A comparison of the tracking methods shows that colour template matching with the HSV model is predominantly better than its counterparts, even though it is restricted by the sampling pattern. Regardless of the tracking method, the system demonstrated good motion detection capabilities, justified by fairly high counts of 'correct' tracks with occasional falsely detected motions. These false motions are often caused by moving shoulders and face regions that coincidentally have the same curvature as a head, as shown in Figure 7. Most motions were detected during the tests. Even though the motion tracks are disconnected, the system detected enough significant motion to raise an alert for Video 1 and Video 3, demonstrating its secondary capability as an alarm operator.

5. Discussion and future work

This project laid the basic foundation for a promising video surveillance system that can be operated within a bus. As a future extension, it can provide support for activity recognition that complements the current system in targeting acts of vandalism.

Furthermore, the motion detection functionality can be extended to a full tracking system deployable on other suitable public transport such as trains. In view of the unstable edge features under adverse lighting conditions, it is also our intention to explore other possible tracking techniques and features, such as the KLT feature tracker [16], that are suitable for our bus scenario.

Figure 7. Motion detection results.

Figure 8. Snapshots of head detection results.

References

[1] F. Bartolini, V. Cappellini, and A. Mecocci. Counting people getting in and out of a bus by real-time image-sequence processing. Image and Vision Computing, 12(1):36-41, Jan 1994.
[2] A. C. Berg and J. Malik. Geometric blur for template matching. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 607-614, Kauai, Hawaii, Dec 2001.
[3] S. Birchfield. Elliptical head tracking using intensity gradients and color histograms. In IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, California, Jun 1998.
[4] N. D. Bird, O. Masoud, N. P. Papanikolopoulos, and A. Isaacs. Detection of loitering individuals in public transportation areas. IEEE Transactions on Intelligent Transportation Systems, 6(2):167-177, Jun 2005.
[5] R. Bodor, B. Jackson, and N. Papanikolopoulos. Vision-based human tracking and activity recognition. In Proceedings of the 11th Mediterranean Conference on Control and Automation, Rhodes, Greece, Jun 2003.
[6] N. Brew. An overview of the effectiveness of closed circuit television (CCTV) surveillance. Research Note, 14, Oct 2005.
[7] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active shape models - their training and application. Computer Vision and Image Understanding, 61(1):38-59, Jan 1995.
[8] J. W. Davis. Hierarchical motion history images for recognizing human motion. In IEEE Workshop on Detection and Recognition of Events in Video, pages 39-46, Vancouver, Canada, Jul 2001.
[9] S. L. Dockstader and A. M. Tekalp. Multiple camera tracking of interacting and occluded human motion. Proceedings of the IEEE, 89(10):1441-1455, Oct 2001.
[10] P. Faber. Image-based passenger detection and localization inside vehicles. In Proceedings of the 19th International Society for Photogrammetry and Remote Sensing Congress, pages 230-238, Amsterdam, Jul 2000.
[11] J. Garcia, N. D. V. Lobo, M. Shah, and J. Feinstein. Automatic detection of heads in colored images. In Proceedings of the 2nd Canadian Conference on Computer and Robot Vision, pages 276-281, May 2005.
[12] M. Isard and A. Blake. Condensation - conditional density propagation for visual tracking. International Journal of Computer Vision, 29(1):5-28, 1998.
[13] S. Klim, S. Mortensen, B. Bodvarsson, L. Hyldstrup, and H. H. Thodberg. More active shape model. In Image and Vision Computing, pages 396-401, New Zealand, Nov 2003.
[14] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[15] C. Sacchi, C. Regazzoni, and G. Vernazza. A neural network-based image processing system for detection of vandal acts in unmanned railway environments. In Proceedings of the 11th International Conference on Image Analysis and Processing, pages 529-534, Palermo, Italy, Sep 2001.
[16] C. Tomasi and T. Kanade. Detection and tracking of point features. Technical Report CMU-CS-91-132, Carnegie Mellon University, Apr 1991.
