
VIRTUALOT - A FRAMEWORK ENABLING REAL-TIME COORDINATE TRANSFORMATION & OCCLUSION SENSITIVE TRACKING USING UAS PRODUCTS, DEEP LEARNING OBJECT DETECTION & TRADITIONAL OBJECT TRACKING TECHNIQUES

Bradley J. Koskowich, Maryam Rahnemoonfar, Michael Starek
Texas A&M University - Corpus Christi
College of Science and Engineering
6300 Ocean Drive, Corpus Christi, TX, USA 78412

ABSTRACT

In this work we explore a combination of methods that allow us to analyze and study hyper-local environmental phenomena. Developing a unique application of monoplotting enables visualization of the results of deep-learning object detection and traditional object tracking processes applied to a perspective view of a parking lot on aerial imagery in real-time. Additionally, we propose a general algorithm to extract some scene understanding by inverting the monoplotting process and applying it to digital elevation models. This allows us to derive estimations of the perspective image areas causing object occlusions. Connecting the real-world and perspective spaces, we can create a resilient object tracking environment that uses both coordinate spaces to adapt tracking methods when objects encounter occlusions. We submit that this novel composite of techniques opens avenues for more intelligent, robust object tracking and detailed environment analysis using GIS in complex spatial domains, given video footage and UAS products.

Index Terms— photogrammetry, homography, computer vision, object detection, object tracking

1. INTRODUCTION

Hyper-local environments are complex, spatially constrained areas. A single floor of a building, a building itself, or a university campus could each be considered a hyper-local environment. Within these areas, any number of spatially relevant phenomena, such as building evacuations, can occur. Our goal is to capture and visualize these phenomena on an aerial image for enhanced situational awareness. Typically, awareness of this kind is achieved using security cameras. However, raw video footage usually requires human interpretation to understand its content and any effects beyond its immediate scope.

Reducing video information complexity to a model-able form requires us to combine several established methods. We build upon the concepts of Bozzini et al.'s monoplotting work [1], used for visualizing land cover changes from perspective imagery as GIS polygon geometry. We extend that interface by replacing the perspective imagery with video. Deep-learning object detection applied to said video is combined with an evaluation of multiple traditional tracking methods, whose outputs are recorded as points and lines. The transformed outputs can be used to visualize movement behavior on aerial imagery in real-time. This can provide first responders and resource management personnel a simulated bird's-eye view of day-to-day operations or disasters as they unfold, increasing their situational awareness.

Object occlusion remains a barrier for our application. To that end, we compose a novel algorithm using monoplotting inputs to estimate occlusions explicitly in the accompanying perspective video. This gives us the option to extend our detection and tracking platform by defining additional behaviors to follow when an object encounters an occluded region.

Our ongoing tests occur at Texas A&M University - Corpus Christi (TAMUCC), USA, in a long-term effort to understand how the layout of road travel directions, temporary barriers, and crosswalks affects pedestrian and vehicular traffic. We believe this initial work is a novel combination of remote sensing, computer vision, and GIS principles which we can expand in the future to accomplish this ultimate goal.

2. RELATED WORKS

2.1. Monoplotting

The foundation for this project is the concept of monoplotting, the less well-known cousin of stereo-photogrammetry. This single-image photogrammetry method has typically been used for plotting historical images onto current orthometric maps or aerial orthophotos to visualize topographical changes over time, such as environmental phenomena like the growth of a forest or terrain elevation changes [1, 2].
Traditional monoplotting normally requires a recorded camera POSE, at least one calibrated camera image (though two are preferable), an aerial orthophoto, and an accompanying digital elevation model (DEM) [1].
There are several well-defined relationships that can be leveraged to reconstruct complete or partial POSE parameters [3] given a series of known points in a pair of images. Computing an image homography also provides us the parameters of relative camera POSE between aerial and perspective images. Homographies can be adjusted for imprecision error via iterative least-squares adjustment [1]. Incorporating the DEM with the aerial orthophoto, we can also derive transformation parameters from 3D-2D space, where the 2D perspective image maps back into 3D world-space coordinates [1].
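As a rough illustration of this registration step, the sketch below estimates a perspective-to-aerial homography from matched keypoint pairs and transfers a point between the two planes. This is a minimal sketch, not the authors' implementation: the coordinates, the RANSAC reprojection threshold, and the OpenCV-based approach are our assumptions.

```python
import cv2
import numpy as np

# Hypothetical keypoint pairs: pixel positions of the same ground features
# picked by hand in the perspective frame and in the aerial orthophoto.
# In practice many more pairs would be used (the paper collects 400).
perspective_pts = np.array([[412, 388], [1011, 402], [640, 655], [180, 610]],
                           dtype=np.float32)
aerial_pts = np.array([[1520, 840], [2105, 818], [1750, 1290], [1301, 1244]],
                      dtype=np.float32)

# Estimate the perspective-to-aerial homography. RANSAC rejection of outlier
# pairs stands in here for the iterative least-squares adjustment of [1].
H, inliers = cv2.findHomography(perspective_pts, aerial_pts, cv2.RANSAC, 5.0)

# The inverse mapping sends aerial (and, per Section 4.3, DEM) pixels back
# into the perspective image plane.
H_inv = np.linalg.inv(H)

# Transfer a point (e.g., a tracked vehicle centroid) onto the orthophoto.
pt = np.array([[[700.0, 500.0]]], dtype=np.float32)
print(cv2.perspectiveTransform(pt, H))
```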

2.2. Object Detection & Tracking

Often issues in object variation (lighting, scale, deformation, etc.) preclude perfectly accurate object detection and tracking. Incorporating the YOLO framework [4] handles most variable presentation aspects. YOLO's flexibility on input size and speed makes it a natural choice over the RCNN family, especially over an eclectic mix of smaller input images. Subsequently, we explore the efficacy of several traditional tracking algorithms on the outputs of the YOLO network: Tracking-Learning-Detection (TLD), Kernelized Correlation Filters (KCF), and Multiple Instance Learning (MIL) [5, 6, 7].
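To make the detector-to-tracker hand-off concrete, here is a minimal sketch under our own assumptions (the video path and the stand-in detector are invented; the paper does not publish code). OpenCV 3.x exposes the evaluated trackers as cv2.TrackerKCF_create() and friends (moved under cv2.legacy in OpenCV 4.5+).

```python
import cv2

def detect_vehicles(frame):
    # Stand-in for the trained YOLO network: returns (x, y, w, h) boxes for
    # the classes of interest. A fixed box is used purely for illustration.
    return [(300, 220, 120, 80)]

cap = cv2.VideoCapture("parking_lot.mp4")  # assumed input file
ok, frame = cap.read()

# Seed one tracker per detection; TLD and MIL are created analogously via
# cv2.TrackerTLD_create() / cv2.TrackerMIL_create().
trackers = []
for box in detect_vehicles(frame):
    tracker = cv2.TrackerKCF_create()
    tracker.init(frame, box)
    trackers.append(tracker)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    for tracker in trackers:
        found, box = tracker.update(frame)
        if found:
            x, y, w, h = (int(v) for v in box)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```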
3. DATASETS

Perspective video was provided by the TAMUCC University Police Department. For this initial work we analyzed a 1280x720 video file at 20 frames per second from an AXIS Q6044-E PTZ Dome Network Camera. The camera was held mostly in a static POSE. Aerial imagery products of campus, provided by the Measurement Analytics Lab at TAMUCC, were generated from a fixed-wing UAV (senseFly eBee) platform with an RGB camera. The orthophoto and DEM were derived from a point cloud generated using Structure-from-Motion over a masked area of the parking lot, with a ground sample distance of 2.79 cm.

The applied YOLO network was trained on a subset of PASCAL VOC 2007 & 2012 data, specifically on the classes of people, cars, and motorbikes. This allows us to reduce some model overhead for increased performance.

4. METHODS

We stretch the limits of prior monoplotting work by starting with the most complex case of input sources available to us: a perspective-view, wall-mounted video camera with no known world-space coordinates or calibration data. We were able to avoid using external measurements as inputs during the registration of perspective imagery with the aerial orthophoto, allowing us to validate our results using a total station and avoiding some measurement bias.

Fig. 1. Processing Methodology

4.1. Required Inputs

Several inputs are necessary in combination with the perspective video footage; the processing flow is visualized in Figure 1:

• Keypoints: 400 point pairs used to compute image homography, collected by hand as our results applying Shi-Tomasi corner detection [8] were surprisingly sparse.
• Registrations: homography parameters computed individually from perspective to aerial and vice versa (inverted) using the keypoints.
• Working Area Geometry: used to check the accuracy of a registration and to create a mask of the working area in the perspective image containing physical occlusions.
• Regions Of Interest (ROIs): areas around the perspective view periphery where track-able objects are likely to pass when entering or exiting the frame.

Fig. 2. Registration Overlay: The perspective image transformed onto the aerial image.

4.2. Image Transformations

The process of deriving image transformations applies image homography and is iterated upon with several collections of keypoints, increasing in volume starting with 8 keypoints and doubling until we encompass all 400 points. The efficacy of each transformation is evaluated by measuring the difference between known points in the image and where they should align on the aerial orthophoto, an example of which is shown in Figure 2.
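A sketch of this evaluation loop, under the same assumptions as the earlier homography snippet (array names, the held-out check points, and the RANSAC threshold are invented; the deviation is measured with the distance metric of Section 5.1):

```python
import cv2
import numpy as np

def max_deviation(H, src_pts, dst_pts):
    # Largest Euclidean distance between projected source points and their
    # known aerial positions -- the registration-accuracy metric of Sec. 5.1.
    proj = cv2.perspectiveTransform(src_pts.reshape(-1, 1, 2), H).reshape(-1, 2)
    return float(np.max(np.linalg.norm(proj - dst_pts, axis=1)))

def evaluate_registrations(persp_pts, aerial_pts, check_src, check_dst):
    # Keypoint volumes double from 8 until all 400 pairs are encompassed;
    # all arrays are assumed to be (N, 2) float32 pixel coordinates.
    for n in (8, 16, 32, 64, 128, 256, 400):
        H, _ = cv2.findHomography(persp_pts[:n], aerial_pts[:n],
                                  cv2.RANSAC, 5.0)
        dev = max_deviation(H, check_src, check_dst)
        print(f"{n:3d} keypoints -> max deviation {dev:.2f} px")
```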

4.3. Occlusion Masking

To account for occlusions during object tracking, we estimate occlusion locations in the perspective view by transforming the DEM with an inverted image homography. This allows us to overlay the elevation values present in the DEM onto the perspective plane accurately. Thresholding elevation post-transformation, shown in Figure 3, provides a lower limit of occluded areas. The upper limit can be determined by computing the offset from the lower limit based on the elevation value. In the widest bounding envelope including these limits, a Hough Transform generates lines, filtered to the best pair matched by angle relative to vertical and by proximity. Point intersections of all boundary lines form polygons, which are clipped by the defined working area to eliminate extraneous geometry, as shown in Figure 4a.

Fig. 3. The transformation of the DEM into the perspective image plane.

Fig. 4. Detection and Tracking Results with Occlusion Mask. (a): Perspective view of detection and tracking algorithm with detected occlusion areas. (b): Aerial view of detection and tracking algorithm.
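The first stages of this pipeline can be sketched as follows; the DEM array, the ground-elevation threshold, and the Hough parameters are illustrative assumptions rather than values from our implementation:

```python
import cv2
import numpy as np

def occlusion_candidates(dem, H_inv, frame_shape, ground_elev=2.0):
    # Warp the DEM into the perspective image plane with the inverted
    # homography (cf. Figure 3).
    h, w = frame_shape[:2]
    dem_persp = cv2.warpPerspective(dem.astype(np.float32), H_inv, (w, h))

    # Thresholding elevation gives the lower limit of occluding structures:
    # anything rising above the parking surface can hide a tracked object.
    lower = (dem_persp > ground_elev).astype(np.uint8) * 255

    # A Hough transform over the candidate envelope yields boundary lines;
    # downstream, their intersections are polygonised and clipped to the
    # working area.
    edges = cv2.Canny(lower, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                            minLineLength=40, maxLineGap=10)
    return lower, lines
```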
5. RESULTS

5.1. Registration Accuracy

We compute registration accuracy as the maximum value of deviation between identifiable points, using the standard distance equation:

$D = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$

Registration points nearest the origin are such that their deviation is negligible. As distance from the camera increases, registration accuracy decreases, visible in Figure 2 as crooked lines and disjoint connections. The highest recorded deviation across all registration iterations was ∼3.5 m at the ends of the 118 × 93 m parking lot farthest from the camera origin, ∼3% transformation error at worst. Notably, deviations are not linear. We theorize their cause is image imperfections due to the intentional lack of camera calibration and/or distortion caused by the weather dome.

5.2. Occlusion Extraction

In our test case, our occlusion extraction method could isolate 12 of 13 vertical occlusion areas present in the image, a 92% detection rate with an ∼88% accurate fill rate. That is, the polygons drawn contain ∼88% occlusion pixels and ∼12% non-occluding pixels. However, as this is the only case in which the algorithm has been tested, we defer judgement regarding its general efficacy.

5.3. Detection & Tracking Accuracy

In over six hours of reviewed video, YOLO detected every vehicle which passed through the designated ROIs around the perspective view periphery. An optimization was made to detection operations by only calling them as movement areas computed in ROIs began to decrease; this corresponded to the majority of instances where vehicles presented themselves best for detection. Tentatively, we would rate this application of YOLO as 99.99% accurate at object detection. Its outputs were passed to the tracking algorithms we evaluated.
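A rough sketch of that motion gate, assuming OpenCV background subtraction as the movement-area measure (the paper does not specify how movement areas were computed, so the mechanism and thresholds here are our assumptions):

```python
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2()
prev_area = 0

def should_run_detector(frame, roi):
    # Trigger the detector only once the moving area inside an ROI starts to
    # shrink, i.e. the vehicle has fully entered the view. `roi` is an
    # assumed (x, y, w, h) region near the frame periphery.
    global prev_area
    x, y, w, h = roi
    fg = subtractor.apply(frame[y:y + h, x:x + w])
    area = int(np.count_nonzero(fg))
    trigger = 0 < area < prev_area
    prev_area = area
    return trigger
```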
Table 1 outlines the performance of existing tracking methods available in OpenCV and of our own variation on Lucas-Kanade optical flow. Our variation performs k-means clustering on the detected features between frames and drops points which become stuck on similar features and exceed a distance limit. We define tracking accuracy as lock persistence on a set of 14 vehicles traveling radically different paths until they exit view. Global denotes tracking context in the entirety of the frame, while Patch denotes a moving window around tracked objects as a subset of the frame.
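The point-dropping step of our optical-flow variation might look like the following sketch; the cluster count and distance limit shown are illustrative assumptions:

```python
import cv2
import numpy as np

def flow_step(prev_gray, gray, pts, k=2, dist_limit=30.0):
    # Propagate feature points with pyramidal Lucas-Kanade optical flow.
    # `pts` is an (N, 1, 2) float32 array of tracked feature locations.
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good = new_pts[status.flatten() == 1].reshape(-1, 2).astype(np.float32)
    if len(good) < k:
        return good.reshape(-1, 1, 2)

    # Cluster the surviving features; a point far from its cluster centre has
    # typically latched onto a similar-looking background feature.
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(good, k, None, criteria, 3,
                                    cv2.KMEANS_PP_CENTERS)
    dists = np.linalg.norm(good - centers[labels.flatten()], axis=1)

    # Drop stuck points that exceed the distance limit.
    return good[dists < dist_limit].reshape(-1, 1, 2)
```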
Table 1. Tracking Algorithm Performance. *At the time of this writing, the TLD implementation in OpenCV was not stable under Patch tracking; the value is a conservative estimate from what data could be recorded.

Tracker       | Global & Patch FPS with Coord. Transforms | Global & Patch Tracking Accuracy
TLD           | 8, 16*                                    | 79%, 65%*
KCF           | 40, 70                                    | 58%, 58%
MIL           | 6, 12                                     | 65%, 65%
Optical-flow  | 70, 60                                    | 85%, 85%
6. CONCLUSIONS & FUTURE WORKS

We conclude that this is a valid approach for accomplishing our outlined goal of integrating video data with UAS products, based on the relatively low degree of registration error over a comparatively large area. Theoretically it is possible to near-fully automate the processing workflow; however, we are curious to investigate adapting alternative methods, such as Boerner's work on automatically computing camera POSE [9], to fully automate image registration. Similarly, regions of interest in perspective images combined with shapefiles of travel networks could isolate ROIs for vehicle detection algorithmically. We also plan to test the system with continually reduced image quality, in order to determine the minimum requirements at which this system could operate with a negligible degree of uncertainty. As an alternative to traditional tracking mechanics, we also look to incorporate other in-progress work based on Bertinetto et al.'s [10] study of generic object tracking using Siamese networks.

7. REFERENCES

[1] C. Bozzini, M. Conedera, and P. Krebs, "A new monoplotting tool to extract georeferenced vector data and orthorectified raster data from oblique non-metric photographs," International Journal of Heritage in the Digital Era, vol. 1, no. 3, 2012.

[2] T. Produit and D. Tuia, "An open tool to register landscape oblique images and generate their synthetic model," Remote Sensing & Spatial Analysis, pp. 170-176, 2012. [Online]. Available: http://2012.ogrs-community.org/2012_papers/d3_2_produit_abstract.pdf

[3] D. A. Strausz Jr., "An application of photogrammetric techniques to the measurement of historic photographs," 2001. [Online]. Available: https://ir.library.oregonstate.edu/downloads/js956g515

[4] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," in Computer Vision and Pattern Recognition, 2016. [Online]. Available: https://arxiv.org/pdf/1612.08242

[5] Z. Kalal, K. Mikolajczyk, and J. Matas, "Tracking-learning-detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 1, Jan. 2010.

[6] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596, Mar. 2015.

[7] B. Babenko, M.-H. Yang, and S. Belongie, "Visual tracking with online multiple instance learning," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2009.

[8] J. Shi and C. Tomasi, "Good features to track," Computer Vision and Pattern Recognition, 1994. [Online]. Available: http://www.ai.mit.edu/courses/6.891/handouts/shi94good.pdf

[9] R. Boerner and M. Kröhnert, "Brute force matching between camera shots and synthetic images from point clouds," ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLI-B5, pp. 771-777, Jun. 2016.

[10] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. S. Torr, "Fully-convolutional siamese networks for object tracking," Computer Vision and Pattern Recognition, 2016.

