


Deep Learning methods for Animal Recognition
and Tracking to Detect Intrusions

Ashwini V Sayagavi1, Sudarshan T S B2, and Prashanth C Ravoor3

Dept. of CSE, PES University, Bangalore
1 [email protected], 2 [email protected], 3 [email protected]

Abstract. Over the last few years, there has been a steady rise in the number of reported human-animal conflicts. While there are several reasons for the increase in such conflicts, foremost among them is the reduction in forest cover. Animals stray close to human settlements in search of food, and often end up raiding crops or preying on cattle; at times there are human casualties as well. Proficient, reliable and autonomous monitoring of human settlements bordering forest areas can help reduce such animal-human conflicts. A broad range of techniques in computer vision and deep learning have shown enormous potential to solve such problems. In this paper, a novel, efficient and reliable system is presented which automatically detects wild animals using computer vision. The proposed method uses the YOLO object detection model to ascertain the presence of wild animals in images. The model is fine-tuned to identify six different entities – humans, and five different types of animals (elephant, zebra, giraffe, lion and cheetah). Once detected, the animal is tracked using the CSRT tracker to determine its intentions, and based on the perceived information, notifications are sent to alert the concerned authorities. The design of a prototype for the proposed solution is also described, which uses Raspberry Pi devices equipped with cameras. The proposed method achieves an accuracy of 98.8% and 99.8% in detecting animals and humans respectively.

Keywords: computer vision, animal intrusion detection, human-animal conflict, object tracking, object detection, internet of things

This is a pre-copy-edit version of a paper presented at ICTIS 2020. The complete and updated work is published at https://doi.org/10.1007/978-981-15-7062-9_62

1 Introduction

There have been increasing reports of wild animals entering villages or towns, especially in settlements surrounding forest areas, endangering human lives. Intrusions by animals cause huge losses, be it in terms of crops destroyed or cattle attacked. A growing human population leading to shrinking forest cover is one of the leading causes of the rise in human-animal conflicts. Current methods to reduce such conflicts include installing electric fences or having sentries watch for animals through the night. Electric fences cause severe injury to animals; moreover, they require an enormous initial investment and have high maintenance costs. Recent developments in the field of computer science enable the use of technology to create low-cost solutions to such problems. Computer vision is one such technology, and it could potentially solve most of the associated problems.
The use of deep learning methods to classify images containing entities of interest is gaining popularity. Deep Convolutional Neural Networks (DCNNs) are known to be accurate, and outperform all other existing methods in the task of image classification. Krizhevsky et al. [5], who submitted the winning entry to the ImageNet classification challenge, introduced a deep neural network based solution for image classification. It is now considered a landmark achievement in computer vision, and has contributed to increased research in the field.
The main intent of this paper is to describe the design of a computer vision system capable of detecting wild animals and tracking their movement. DCNNs can be leveraged to detect the presence of animals in the captured images. In addition to detecting the presence of an animal, it is also necessary to localize the animal within the image in order to effectively track it and monitor its actions. This is the task of object detection: object detection systems predict regions of interest within images, and additionally classify the entities within those regions. Object detection is therefore the ideal choice for the system proposed in this paper.
This paper introduces a novel method of reducing human-animal conflicts through constant and automatic monitoring of vulnerable areas using a system of cameras. The proposed solution is accurate, cost-effective and, to an extent, customizable for a particular region. Section 2 presents notable existing research in the area. Section 3 describes the design of the proposed system and highlights the role of its various components. Section 4 summarizes the important results of the study, followed by a brief discussion and the scope for future enhancements in Section 5.

2 Literature Survey

This section is organized into two parts: systems for animal detection using computer vision are reviewed first, followed by methods for animal intrusion detection and prevention.

2.1 Animal Detection using Computer Vision

Zhang et al. [19] describe a system for segmenting animals out of images captured through camera traps. The procedure uses a multi-level iterative graph cut to generate object region proposals and accurately recognize regions of interest. This is especially useful when the animal blends into the background and is difficult to identify. These proposals are segmented into background and foreground in the second stage. Feature vectors are extracted from each image using the AlexNet [5] architecture, and combined with histograms of oriented gradients (HOG) to generate Fisher vectors. The system obtained an accuracy of 82.1% for animal and species detection.
Yousif et al. [18] combine deep learning classification with dynamic background modelling to produce a fast and precise method for human and animal detection in highly cluttered camera trap pictures. Background modelling helps generate region proposals for foreground objects, which are then classified using a DCNN, resulting in improved efficiency and increased accuracy. The proposed system achieves 82% accuracy in segmenting images into human, animal and background patches.
Kellenberger et al. [4] use Unmanned Aerial Vehicles (UAVs) to monitor animals and prevent poaching. A two-branch custom CNN is built using AlexNet [5] as the backbone. The authors report a 60% accuracy over a dataset of UAV images gathered from the Kuzikus Wildlife Reserve, Namibia.
Norouzzadeh et al. [10] use the Snapshot Serengeti dataset [17] and apply deep neural networks to detect and identify animals in camera trap images. The system consists of multiple stages: a) a detection stage (whether there is an animal in the image), b) a species identification stage, and c) an information stage, where the network reports additional data such as the count and attributes of the animals (standing, resting, etc.). An ensemble of nine models is used, obtaining a top-1 accuracy of 99.4% for the species identification task; the overall pipeline accuracy is around 93.8%.
Parham et al. [11] propose a multi-stage pipeline for animal detection and recognition. The fundamental steps include animal classification, animal localization and prediction of animal characteristics, such as orientation. Animal localization is based on the YOLO [12] object detection model. The proposed system achieves an overall detection accuracy of 76.58% over 6 species.
Matuska et al. [9] propose a novel system for monitoring animals, consisting of a computing unit for extracting animal features and a separate module to track movement. SIFT [7] and SURF [1] are used for feature extraction, and the extracted features are classified using an SVM classifier [2]. Using SIFT descriptors, the system achieved an accuracy of 94% for animal species classification.
Sharma et al. [15] describe a system for animal detection that uses cross-correlation filters for template matching. The training data is used as a baseline for classification, and new images are matched against images in the database to detect the presence of animals. This system obtains an overall accuracy of 86.25%.

2.2 Animal Intrusion Detection Systems


Suganthi et al. [16] present a system to detect intrusions by elephants in order to reduce agricultural losses. The proposed system uses multiple vibration sensors, and the number of triggered sensors is used to detect whether an elephant is close by. If the sensors are triggered, a photo is captured and the Google Vision API is used to detect the presence of an elephant in the image. On confirmation of detection, alerts are sent to local authorities for further action.
Pooja et al. [3] use multiple PIR sensors to detect animal movement. The sensors are arranged so that, when triggered, the number of sensors set off provides an indication of the species of animal. Based on the species, suitable actions are taken, such as playing an audio clip and alerting local sentries.
Several challenges remain with all the surveyed methods. Most notably, systems built for animal intrusion detection have to ensure correct species identification. Animals react differently to stimuli – playing loud sounds might scare away a wild boar, but might startle an elephant and cause it to go on a rampage. It is not sufficient to detect the animal; it is also necessary to ascertain its intentions before raising alerts, to ensure fewer false positives. All of this needs to be performed in real or near-real time. Since the solution also needs to be cost-effective, small compute devices such as embedded systems, which can be deployed on site, would need to be used. The solution proposed in this paper attempts to address all of these challenges, and an overview of its design is presented in the next section.

3 Proposed Solution

The system proposed in this paper uses a network of cameras connected to PIR motion sensors, so that image capture is triggered only when movement is detected; this conserves power. The images captured through these cameras are processed to detect the presence of wild animals and, if an animal is found, to identify the species. Once identified, the animals are tracked for a suitable time in order to determine their intent – for instance, whether they are moving across the village or into it. In the latter case, alerts are generated and local authorities are notified through the proper channels. Understanding intent goes a long way towards reducing false positives, whether caused by a false detection or by an animal whose presence poses no actual threat.
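
As an illustration, a minimal trigger loop for such a camera endpoint might look like the following sketch, assuming a PIR sensor wired to GPIO pin 4 and the standard Pi camera module; the pin number, file path and `process_frame` handler are placeholders for illustration, not details given in the paper.

```python
from datetime import datetime

from gpiozero import MotionSensor   # PIR sensor helper; GPIO 4 is an assumed wiring
from picamera import PiCamera

pir = MotionSensor(4)
camera = PiCamera(resolution=(1280, 720))

def process_frame(path):
    """Placeholder for the detection/tracking pipeline described in Section 3.1."""
    print("captured", path)

while True:
    pir.wait_for_motion()            # block until the PIR sensor fires
    path = datetime.now().strftime("/tmp/capture-%Y%m%d-%H%M%S.jpg")
    camera.capture(path)             # grab a still only when movement is seen
    process_frame(path)
    pir.wait_for_no_motion()         # avoid re-triggering on the same event
```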

3.1 Object Detection and Tracking

The YOLO object detection model [12] is used to detect the presence of animals in the captured images. YOLO is a DCNN-based object detection model with good performance in terms of both accuracy and inference speed. For the prototype version of the system proposed in this paper, five different species of animals – elephant, zebra, giraffe, lion and cheetah – are considered. Images of humans are also included, so there are a total of six different categories in the training data. The DCNN is fine-tuned for better accuracy over these six categories.
Training data is obtained from publicly available wild animal videos, including those from YouTube channels and National Geographic videos. Frames extracted from these videos are manually annotated in the format required for training. The model is trained on images of dimensions 448 × 448. The learning rate is initialized to 0.001 with a decay rate of 0.995, and momentum is set to 0.9. The model converged after 135 epochs. The average accuracy in detecting the five species of animals is 98.8%, and for human detection the accuracy obtained is around 99.8%.
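
For concreteness, the reported hyperparameters correspond to a schedule along the following lines, sketched here in PyTorch purely for illustration; the paper does not state its training framework, and the one-layer `model` is a stand-in rather than the actual detector.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 6, kernel_size=3)   # placeholder network, not the real YOLO

# Reported settings: initial LR 0.001, momentum 0.9, LR decayed by 0.995 per epoch.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.995)

for epoch in range(135):                  # the model converged after 135 epochs
    # A real run would iterate over the annotated 448x448 frames here, computing
    # the detection loss and calling loss.backward() before each optimizer step.
    optimizer.step()
    scheduler.step()
```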
Figure 1 illustrates a flow chart of the animal detection module. The YOLO object detection model's weights and configuration are loaded, and the image is fed into the model. The outputs of the object detection network are tested against a threshold value and undergo non-max suppression to remove low-confidence and overlapping predictions. If wild animals are detected in the frame, the object tracking phase is triggered.
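
A minimal version of this detection step can be sketched with OpenCV's DNN module, assuming Darknet-format weights; the file names and the 0.5/0.4 thresholds are illustrative assumptions, not values given in the paper.

```python
import cv2
import numpy as np

CLASSES = ["human", "elephant", "zebra", "giraffe", "lion", "cheetah"]

# Hypothetical file names for the fine-tuned six-class model.
net = cv2.dnn.readNetFromDarknet("yolo-animals.cfg", "yolo-animals.weights")

def detect(frame, conf_thresh=0.5, nms_thresh=0.4):
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (448, 448),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    boxes, confidences, class_ids = [], [], []
    for output in outputs:
        for row in output:                 # [cx, cy, bw, bh, objectness, scores...]
            scores = row[5:]
            class_id = int(np.argmax(scores))
            conf = float(scores[class_id])
            if conf > conf_thresh:
                cx, cy, bw, bh = row[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2),
                              int(bw), int(bh)])
                confidences.append(conf)
                class_ids.append(class_id)

    # Non-max suppression drops overlapping, low-confidence predictions.
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_thresh, nms_thresh)
    return [(CLASSES[class_ids[i]], confidences[i], boxes[i])
            for i in np.array(keep).flatten()]
```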
Once the animals are detected, identified and localized in the frame, object tracking is used to determine the intent or actions of the animal being monitored. The CSRT tracker [8] is used to track animals, since it is both fast and accurate. The input to the tracker is the centroid of the detected bounding box. It is not necessary to provide a bounding box in every frame, however; once a tracker is initialized from one detection, visual features from the marked area of the image are used to infer the location of the animal in each subsequent frame. The tracker maintains state from previous frames, thus allowing the direction of movement to be identified.
Each track can be monitored separately and used to obtain the direction of movement. Appropriate actions are taken if the intent of the animal matches a set of predefined rules, such as sending a user notification if the animal is found to be moving in a particular direction, or flashing lights and playing audio clips if the animal moves close to the village.
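
The tracking loop can be sketched with OpenCV's CSRT implementation as follows; the direction check at the end is a simplified stand-in for the predefined rules mentioned above. Note that OpenCV's tracker API is initialized with the full bounding box rather than only its centroid, and on newer builds the constructor lives under `cv2.legacy`.

```python
import cv2

def track_animal(video, initial_box):
    """Track one detection and report its horizontal direction of movement.

    `initial_box` is the (x, y, w, h) box returned by the detector.
    """
    tracker = cv2.TrackerCSRT_create()  # cv2.legacy.TrackerCSRT_create() on newer OpenCV
    ok, frame = video.read()
    tracker.init(frame, tuple(initial_box))

    centroids = []
    while True:
        ok, frame = video.read()
        if not ok:
            break                        # end of stream
        ok, (x, y, w, h) = tracker.update(frame)
        if not ok:
            break                        # target lost: time to signal a hand-off
        centroids.append((x + w / 2, y + h / 2))

    # Toy intent rule: compare first and last centroid to infer direction.
    if len(centroids) >= 2 and centroids[-1][0] > centroids[0][0]:
        return "moving right"
    return "moving left or stationary"
```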

3.2 Prototype Design

This section presents the design of the overall system, integrating object detection, tracking and notifications. The Raspberry Pi^1 is an embedded system capable of interfacing with a variety of peripheral devices through various protocols. It is also quite powerful for its size, housing a quad-core processor and 1 GB of RAM. It supports a camera module capable of recording video at 30 fps and 5 MP resolution^2. The two integrate out of the box, and are used as the endpoint devices.
Figure 2 depicts the operation of the system, using two Raspberry Pi devices for illustration. Inter-device communication is restricted to the hand-off procedure, in which a device indicates to its neighbours that it can no longer see an animal it was previously monitoring. Tracking is implemented on-device; however, object detection is slow given the processor's constrained resources. A lite version of the object detection model is run instead, which has lower accuracy but is capable of achieving up to 1 fps for detection.
If the animal moves out of the range of the camera, a notification about the tracked object is sent to the other Raspberry Pi devices. The transmitted message contains the class of the object detected and identifies the sender, through which a monitoring program finds the location of the camera; an alert notification is then sent containing the type of animal spotted and its last known location.
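
The paper does not specify the messaging transport; as one plausible sketch, the hand-off payload could be published as JSON over MQTT. The broker address, topic name and field names below are all assumptions made for illustration.

```python
import json
import time
import uuid

import paho.mqtt.client as mqtt   # assumed transport; the paper names none

client = mqtt.Client()
client.connect("broker.local")    # hypothetical on-site broker

def publish_handoff(sender_id, object_class, track_id, last_box):
    """Tell neighbouring devices that a tracked object left this camera's FoV."""
    message = {
        "sender": sender_id,      # lets the monitor resolve the camera location
        "class": object_class,    # e.g. "elephant"
        "track_id": track_id,     # identity assigned when tracking began
        "last_box": last_box,     # last known bounding box (x, y, w, h)
        "timestamp": time.time(),
    }
    client.publish("intrusion/handoff", json.dumps(message))

publish_handoff("camera-02", "elephant", str(uuid.uuid4()), [312, 140, 96, 88])
```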
^1 https://www.raspberrypi.org/products/raspberry-pi-3-model-b-plus/
^2 https://www.raspberrypi.org/products/camera-module-v2/
Fig. 1. Wild animal object detection framework

Fig. 2. System operation description using two Raspberry Pi devices, each equipped with a camera module

A unique identity is generated each time an object is detected, and a tracker is initialized over it. The detected object is tracked for as long as it is visible in the camera's field of vision (FoV). If the system is no longer able to detect the object, a notification is sent to the other devices to signal an object tracking hand-off. The identity assigned to an animal remains unique only while the object is tracked in the given sequence, and a different id may be assigned if it re-appears in the camera's FoV. In case multiple wild animals are detected, each group of animals is treated as a single entity; for example, a group of lions is assigned a single identity. This is necessary because it is difficult to differentiate one animal from another within the same species.
Since tracking wild animals in real time was not feasible, person tracking is used instead to illustrate the working of the prototype. It is to be noted that all other components remain identical, and the same system is adequate for tracking wild animals in a similar fashion.

4 Results

Table 1 compares the performance of the fine-tuned model described in this paper against some of the surveyed animal detection systems. The model presented in this paper achieves 98.8% accuracy for animal detection and 99.8% for humans; the table reports the average of these two values. Table 2 lists the processing speed of a few state-of-the-art object detectors when using a GPU. Figure 3 depicts a few qualitative results of running the YOLO object detection model for animal detection. The bounding boxes for animals in the frame are annotated with a confidence score and the type of animal as predicted by the model. The figure shows results of running inference on elephants, humans, giraffes and zebras.
Table 1. Summary of detection accuracy of surveyed papers

Model         # Species   Accuracy (%)
[10] (2018)   48          99.4
[11] (2018)   6           76.6
[18] (2017)   2           82.0
[4]  (2017)   -           60.0
[19] (2016)   23          82.1
[9]  (2014)   5           94.0
Ours          6           99.3

Table 2. Object detection performance compared to YOLO. mAP is reported over the COCO dataset, and FPS is measured on a GPU. Source^3

Model           mAP    FPS
SSD300 [6]      41.2   46
SSD500 [6]      46.5   19
YOLOv2 [13]     48.1   40
TinyYOLO [13]   23.7   244

5 Discussion and Future Work


The object detection module is highly accurate. DCNN models for image classification and object detection are in widespread use, and it is evident that, given sufficient training data, such models generalize well in most domains. Similarly, the CSRT tracker is robust and reduces the need for continuous object detection, which is costly and compute intensive. This is especially advantageous given the use of embedded devices like the Raspberry Pi. The notification system can be customized to dispatch messages using multiple protocols, such as SMS or e-mail. The action taken on animal detection can vary, and could include the use of deterrents such as flashing bright lights or playing loud sounds, depending on the animal species.
The YOLO object detection model is known for its accuracy and ease of use. However, running object detection on embedded devices remains a challenge, and a faster, more resource-efficient alternative could be explored. Recent developments in networks designed specifically for mobile devices, such as the MobileNet architecture [14], hold promise and are potential candidates for object detection. Another alternative is to use a GPU device, but this would reduce the cost-effectiveness of the solution.
One of the drawbacks of the approach presented here arises when multiple cameras detect the same individual animal: multiple notifications may be sent, making it appear as though more than one animal has been detected when in reality there is only one. To circumvent this, a centralized server could monitor detections from each unit and determine whether there is actually a single animal or several.
In addition, the CSRT tracker is a single-object tracker: it bears no semantic notion of the object being tracked, and relies on visual features to keep the tracklets continuous. It is thus prone to failure if the background closely resembles the appearance of the animal. A more robust tracking mechanism is required, one which considers not only visual features but also temporal and spatial features, and can effectively track the animal under various conditions.
^3 https://pjreddie.com/darknet/yolo/
Fig. 3. Qualitative results of object detection using YOLO

The use of infrared imagery is yet another area that offers room for improvement. In the proposed system, if the ambient light is not sufficient to capture a reliable image, object detection would fail. Since animal movement generally occurs during the night, using IR images to detect animals would make the intrusion detection system more potent, offering a round-the-clock monitoring mechanism.

6 Conclusion
The proposed system attempts to reduce human-animal conflicts through continuous and automatic monitoring of vulnerable areas, using computer vision to detect animal intrusions. The intrusion detection pipeline consists of three stages: animal detection, animal tracking, and user alerts and notifications. The proposed system is cost-effective and highly efficient, with an average accuracy of 98.8% in detecting and identifying animals in images. Although the prototype described in this paper is trained to recognize five different species of animals, it is easily extendable to detect and track other types of animals given sufficient training data. The choice of species can also be region specific, providing a unique edge over other existing solutions. Such a system, if implemented on a large scale, has the potential to greatly reduce casualties due to animal intrusions.

References
1. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) Computer Vision – ECCV 2006, pp. 404–417. Springer Berlin Heidelberg, Berlin, Heidelberg (2006)
2. Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning 20(3), 273–297 (1995)
3. Pooja, G., Bagal, M.U.: A smart farmland using Raspberry Pi crop vandalization prevention & intrusion detection system. International Journal of Advance Research and Innovative Ideas in Education 1(S), 62–68 (2016)
4. Kellenberger, B., Volpi, M., Tuia, D.: Fast animal detection in UAV images using convolutional neural networks. In: 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 866–869 (2017)
5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25, pp. 1097–1105. Curran Associates, Inc. (2012)
6. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: Single shot multibox detector. Lecture Notes in Computer Science, pp. 21–37 (2016)
7. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)
8. Lukežič, A., Vojíř, T., Čehovin Zajc, L., Matas, J., Kristan, M.: Discriminative correlation filter tracker with channel and spatial reliability. International Journal of Computer Vision 126(7), 671–688 (2018)
9. Matuska, S., Hudec, R., Benco, M., Kamencay, P., Zachariasova, M.: A novel system for automatic detection and classification of animal. In: 2014 ELEKTRO, pp. 76–80 (2014)
10. Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C., Clune, J.: Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences 115(25), E5716–E5725 (2018)
11. Parham, J., Stewart, C., Crall, J., Rubenstein, D., Holmberg, J., Berger-Wolf, T.: An animal detection pipeline for identification. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1075–1083 (2018)
12. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016)
13. Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517–6525 (2017)
14. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.: MobileNetV2: Inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
15. Sharma, S., Shah, D., Bhavsar, R., Jaiswal, B., Bamniya, K.: Automated detection of animals in context to Indian scenario. In: 2014 5th International Conference on Intelligent Systems, Modelling and Simulation, pp. 334–338 (2014)
16. Suganthi, N., Rajathi, N., M, F.I.: Elephant intrusion detection and repulsive system. International Journal of Recent Technology and Engineering 7(4S), 307–310 (2018)
17. Swanson, A., Kosmala, M., Lintott, C., Simpson, R., Smith, A., Packer, C.: Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna. Scientific Data 2, 150026 (2015)
18. Yousif, H., Yuan, J., Kays, R., He, Z.: Fast human-animal detection from highly cluttered camera-trap images using joint background modeling and deep learning classification. In: 2017 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–4 (2017)
19. Zhang, Z., He, Z., Cao, G., Cao, W.: Animal detection from highly cluttered natural scenes using spatiotemporal object region proposals and patch verification. IEEE Transactions on Multimedia 18(10), 2079–2092 (2016)
