
2017 4th International Conference on Information Science and Control Engineering

The Object Detection Based on Deep Learning


Cong Tang1,2,3, Yunsong Feng1,2,3, Xing Yang1,2,3, Chao Zheng1,2,3, Yuanpu Zhou1,2,3
1 Electronic Engineering Institute, Hefei, 230037, China
2 State Key Laboratory of Pulsed Power Laser Technology, Hefei, 230037, China
3 Key Laboratory of Infrared and Low Temperature Plasma of Anhui Province, Hefei, 230037, China
Email: [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract—The object detection based on deep learning is an important application of deep learning technology, characterized by its strong capability of feature learning and feature representation compared with traditional object detection methods. The paper first introduces the classical methods of object detection and expounds the relations and differences between the classical methods and the deep learning methods. Then it introduces the emergence of object detection methods based on deep learning and elaborates the most typical current methods. In the statement of the methods, the paper focuses on the framework design and the working principle of the models and analyzes model performance in terms of real-time capability and detection accuracy. Eventually, it discusses the challenges in object detection based on deep learning and offers some solutions for reference.

Keywords-object detection; deep learning; framework design; model analysis; performance analysis

I. INTRODUCTION

Object detection has become a significant research direction and focus in computer vision [1], with applications in driverless cars, robotics, video surveillance and pedestrian detection [2, 3]. The emergence of deep learning technology has changed the traditional modes of object recognition and object detection. In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012, a global contest in computer vision, AlexNet won the championship [4]; it was the first successful deep convolutional network in image recognition, and its Top-5 accuracy surpassed the runner-up by 10%. Moreover, deep learning methods topped the list in the succeeding ILSVRCs. In 2013, the ILSVRC added an object detection task, which facilitated the development of deep learning in object detection. The deep neural network has a strong feature representation capacity [5] in image processing and is usually used as the feature extraction module in object detection. Deep models don't need specially hand-engineered features and can be designed as the classifier and regressor. Therefore, deep learning technology holds great prospects for object detection.

II. REVIEW OF OBJECT DETECTION

Object detection is the task of detecting objects in specified scenes by a certain measure or method. Before the emergence of deep learning technology, object detection was primarily accomplished by establishing mathematical models based on some prior knowledge. At present, the common classical methods in object detection are as follows: the Hough transform method [6], the frame-difference method [7], the background subtraction method [8], the optical flow method [9, 10], the sliding window model method [11] and the deformable part model method [12, 13].

The Hough transform maps image space into parameter space. Every pixel in the image space corresponds to a curve in the parameter space, and the coordinates of the intersection of most curves, found by voting in the parameter space, are the parameters of the curve in image space. The common Hough transform applies only to objects whose contour can be expressed by an analytic function, such as circles and straight lines. The generalized Hough transform [14, 15] can detect objects of any shape by combining the graphic edge information with the edge point direction information, and is faster and more accurate than the common Hough transform. Furthermore, the generalized Hough transform can carry out not only shape detection but also category detection [16, 17, 18].

For the frame-difference method, the principle is that a difference image results from subtracting two adjacent frame images and is denoised by binarization and morphological filtering to get the object motion area. At present, the widely adopted variants are the two-frame-difference, three-frame-difference [19] and four-frame-difference [20] methods.

The background subtraction method has three processes: background modeling, object detection and background updating. Its process is similar to that of the frame-difference method; the difference is that it needs to define a background frame and update it in a timely manner. The background modeling technologies that define the background frame usually combine image features, in general luminance, texture and spatial information, such as Gaussian mixture models (GMM) [21] and local binary patterns (LBP) [22].

The optical flow method was proposed by Horn and Schunck [9]. The method assumes that the gray-level change is related only to the object motion and describes the motion of image pixels by establishing an optical flow equation, thereby describing the motion of the object.

The sliding window model sets a sliding window of fixed size that slides over the image according to some strategy, extracts features inside the window and then classifies them with some classifier. In this model, the features can be a color histogram, gradient histogram or SIFT, and the classifier can be an SVM or an AdaBoost classifier.
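As a concrete illustration of the frame-difference principle, the subtract-and-binarize step can be sketched in a few lines of NumPy (a toy sketch, not from the paper; the function name and threshold are illustrative, and a real pipeline would add the morphological filtering mentioned above):

```python
import numpy as np

def frame_difference(frame_prev, frame_curr, threshold=30):
    """Two-frame difference: subtract adjacent grayscale frames and
    binarize the result to obtain a rough object-motion mask."""
    diff = np.abs(frame_curr.astype(np.int16) - frame_prev.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

# Toy example: a bright 2x2 "object" moves one pixel to the right,
# so motion shows up both where the object was and where it now is.
prev = np.zeros((5, 5), dtype=np.uint8)
curr = np.zeros((5, 5), dtype=np.uint8)
prev[1:3, 1:3] = 200
curr[1:3, 2:4] = 200
mask = frame_difference(prev, curr)
```

Note that the overlap of the two object positions cancels out in the difference, which is exactly why three- and four-frame variants were developed.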

978-1-5386-3013-6/17 $31.00 © 2017 IEEE
DOI 10.1109/ICISCE.2017.156
The Deformable Part Model (DPM) method, extensively applied in object detection, won the detection championship of the Visual Object Classes (VOC) challenge in 2007, 2008 and 2009. It includes two models: a main model and sub-models. The main model performs global feature extraction and the sub-models perform local feature extraction: the sub-models divide the object into several parts and carry out feature extraction on every part. Meanwhile, a cost function of the position offset between the main model and the sub-models describes the confidence of the deformation. The object detection method based on the deformable part model also adopts the sliding window model to extract features and classify.

The aforementioned classical object detection methods can be generally divided into two categories. In the first category, the Hough transform, frame-difference, background subtraction and optical flow methods all adopt the mode of feature + mathematical model, which utilizes some features of the data to set up a mathematical model and gets the result by solving the model in object detection scenes. In the second category, the sliding window model method and the deformable part model method both take the mode of region selection + feature extraction + classification, which combines hand-engineered features with a classifier to get the object detection result; this belongs to the applications of machine learning. The object detection method based on deep learning is similar to the second category, but it doesn't need hand-engineered features because of its strong capability in feature expression and feature learning. In addition, the classifier in a deep learning framework can be realized by a neural network.

III. EMERGENCE OF OBJECT DETECTION BASED ON DEEP LEARNING

In the mode of region selection + feature extraction + classification adopted by object detection methods based on deep learning, the region selection can be done according to some strategy, the feature extraction can be achieved by a convolutional neural network, and the classification can be realized by a traditional SVM or a special neural network. The early typical modes of deep learning applied to object detection are DNN [23] and Overfeat [24], which drew up the curtain for deep learning in object detection.

The object detection by DNN designs two subnetworks: a classification subnetwork for recognition and a regression subnetwork for localization. Originally, DNN is a deep neural network for classification; if the softmax layer at the rear is replaced with a regression layer, DNN can work as the regression subnetwork and can accomplish the object detection task when combined with the classification subnetwork. The operation schematic diagram of the DNN regression networks is shown in Fig. 1.

Figure 1. The operation schematic diagram of DNN regression networks

Just as shown in Fig. 1, DNN makes a regression on the binarized grid combination covering the object region to locate the object. To distinguish two adjacent objects, five ground-truth marks are simultaneously adopted in the regression. These specific grids are designed to cover different parts of the target: the general target mark, left target mark, right target mark, top target mark and bottom target mark. Even so, the accuracy of DNN is not satisfactory, and its mean Average Precision (mAP) on the VOC2007 test dataset is slightly over 30%.

Overfeat was proposed by LeCun's team. It extracts features with the improved deep convolutional model AlexNet, realizes object classification with offset and sliding-window operations over images of various scales, and locates objects with a combined regression network, thus accomplishing object detection. The process diagram of Overfeat in object detection is shown in Fig. 2.

Figure 2. The process diagram of Overfeat in object detection: (a) Multi-scale; (b) Recognition; (c) Regression; (d) Merge.

Fig. 2(a) is the multi-scale recognition over various scales of the input image. Among the four scales of the input image, only the bear is recognized in the former two small-scale images; nevertheless, both the bear and the fish are recognized in the latter two large-scale images. Fig. 2(b) is the identification process, which gains more predicted results to increase the recognition precision by using the offset operation and the sliding-window operation. Fig. 2(c) is the regression process, which gains numerous object region proposals to enhance the localization accuracy in the same way. Fig. 2(d) is the detection result after scale integration. However, the offset and sliding-window operations in Fig. 2(b) and Fig. 2(c) require enormous computation. In terms of accuracy, the mAP of Overfeat on the ILSVRC13 test dataset is 19.4%.

Actually, most of the early object detection models based on deep learning employ the sliding-window operation adopted by Overfeat to obtain the object candidates, and this blind and exhaustive method results in the problem of data explosion. Later models try to solve the issue by improving the existing methods or proposing new ideas. Meanwhile, the early model designs were not perfect and the model accuracy was unsatisfactory. These disadvantages all chart a course for the succeeding model designs of object detection based on deep learning.

IV. DEVELOPMENT OF OBJECT DETECTION BASED ON DEEP LEARNING

In recent years, with the development of deep learning in object detection, massive deep detection models have been proposed. Here, seven current mainstream deep learning models in object detection will be introduced and deeply analyzed in order of emergence; they can be briefly divided into: (1) models based on region proposal; (2) models based on regression.
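The exhaustive sliding-window search that Overfeat and other early models rely on can be sketched as follows (an illustrative toy in NumPy, not the paper's code; the window size and stride are assumptions). It also makes the data-explosion problem tangible: even a small image and a coarse stride already yield dozens of windows per scale, each of which must pass through the network.

```python
import numpy as np

def sliding_windows(image, window=(64, 64), stride=32):
    """Yield (x, y, patch) for a fixed-size window slid over the image.
    Every patch would be fed to the classifier, and this repeats at
    every scale of the multi-scale pyramid."""
    h, w = image.shape[:2]
    win_h, win_w = window
    for y in range(0, h - win_h + 1, stride):
        for x in range(0, w - win_w + 1, stride):
            yield x, y, image[y:y + win_h, x:x + win_w]

image = np.zeros((256, 256), dtype=np.uint8)
patches = list(sliding_windows(image))  # 7 x 7 = 49 windows at one scale
```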

A. Models based on region proposal

The deep learning object detection based on region proposals includes two main works: one is the extraction of region candidates; the other is the building of deep neural networks.

1) R-CNN

R-CNN [25] is the convolutional neural network based on region proposals brought up in 2014 by Girshick, who came up with the concept of the region proposal for the first time. The principle of R-CNN is that it utilizes the region segmentation method of selective search [26] to extract the region proposals in the image, which include the possible object candidates, and loads them into a convolutional neural network to extract the feature vectors. Then an SVM classifier is used to classify the feature vectors to obtain the classification result for each region proposal. After merging by non-maximum suppression (NMS), the model outputs the precise object classifications and object bounding boxes to achieve object detection. The detailed process is shown in Fig. 3.

Figure 3. The description of the R-CNN framework

On the VOC2007 test dataset, the mAP of R-CNN object detection reaches 58.5%, considerably lifted up compared with the former methods. Nevertheless, all 2000 region proposals of R-CNN have to pass through the convolutional neural network in turn, so its real-time performance is poor: handling even one image on a GPU takes tens of seconds. Meanwhile, the amount of computed data is great; for example, the feature files produced by the convolution operation for 5000 images need to be stored on the hardware and occupy hundreds of gigabytes of storage space.

2) SPP-net

SPP-net [27] is a deep neural network based on spatial pyramid pooling proposed by He of MSRA in 2014. The spatial pyramid pooling layer can get rid of the crop/warp operation on the input image used in the former methods, and it enables input images of different sizes to connect with the fully connected layer through a feature vector of the same dimension after passing the convolutional layers. The crop/warp operation reshapes the input of the convolutional neural network to a fixed size, which leads to incompleteness of the object image and object deformation, just as shown in Fig. 4.

Figure 4. Crop/warp operation of images: (a) Crop operation; (b) Warp operation.

Although SPP-net solves the problems of object image incompleteness and object deformation, it still suffers from colossal computation and poor real-time performance because its image processing is similar to that of R-CNN.

3) Fast R-CNN

Fast R-CNN [28] is the upgrade of R-CNN proposed by Girshick and has the capability to solve the repetitive calculation problem of the 2000 region proposals passing through the convolutional neural network in turn. The improvement of Fast R-CNN compared to R-CNN lies in that it maps the region proposals extracted by the selective search algorithm in the input image to the feature layer of the convolutional neural network and conducts pooling on the mapped region proposals of the feature layer by RoI pooling. RoI pooling helps Fast R-CNN obtain feature vectors of fixed size, which is necessary to connect with the fully connected layers. The role of RoI pooling is just like the spatial pyramid pooling of SPP-net. The operation process of Fast R-CNN is shown in Fig. 5.

Figure 5. The operation process of Fast R-CNN

The method of mapping the region proposals of the input image to the feature layer in Fast R-CNN shares the convolution computation, which substantially reduces the calculation. In addition, in order to decrease the parameters of the fully connected layers, Fast R-CNN adopts truncated SVD so that the single fully connected layer's weight matrix is replaced by two small fully connected layers, which further lessens the network calculation. In the training stage, the speed of Fast R-CNN is 8.8 times that of R-CNN and 2.58 times that of SPP-net. In the test stage, the speed of Fast R-CNN is 146 times that of R-CNN without the truncated SVD and 213 times with it; compared with SPP-net, the test speed of Fast R-CNN is 7 times that of the former without the truncated SVD and 10 times with it.

4) Faster R-CNN

Faster R-CNN [29], proposed by Ren, He, Girshick, et al., is the upgraded version of Fast R-CNN. Faster R-CNN employs the region proposal network (RPN) to solve the issues of huge computation and poor real-time performance caused by the selective search method in R-CNN and Fast R-CNN, and it is an end-to-end framework in which the model is easier to train. The function of the RPN in Faster R-CNN is to replace the role of selective search in obtaining region proposals. The RPN divides the feature layer into n×n regions and obtains feature regions of various scales and aspect ratios centered on each region; this method is called the anchors mechanism. The anchors in the RPN are used to produce object proposals, and then the proposals are sent to the rear classification and regression networks for object recognition and localization. The operation principle is shown in Fig. 6.
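The non-maximum suppression step that R-CNN uses to merge overlapping proposals can be sketched as a generic greedy NMS over [x1, y1, x2, y2] boxes (an illustrative implementation, not R-CNN's actual code; the IoU threshold of 0.5 is an assumed default):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop boxes whose IoU with it exceeds iou_thresh, and repeat.
    Returns the indices of the kept boxes."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the top box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]  # survivors only
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the two overlapping boxes collapse into one
```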

Figure 6. The operation principle of Faster R-CNN

After Faster R-CNN adopts the RPN, the number of region proposals is reduced from 2000 (by selective search) to 300, which significantly decreases the computation of the whole neural network. Experiments indicate that the test speed of Faster R-CNN reaches 5 fps, 10 times that of Fast R-CNN. Furthermore, the accuracy is also improved: the mAPs of Faster R-CNN on the VOC2007 and VOC2012 test datasets are raised by 2% to 3% compared with those of Fast R-CNN, reaching 69.9% and 67.0% respectively.

5) R-FCN

R-FCN [30], proposed by Dai et al., is a region-based fully convolutional neural network that solves the problem that the RoI computation cannot be shared. The object detection framework of R-FCN also adopts an RPN to generate candidate RoIs. With the position-sensitive score maps (a k×k×(C+1)-dimensional convolutional layer), R-FCN can record the response of every object at different locations. R-FCN defines the feature vector (a (C+1)-dimensional column vector) by voting according to the RoIs and adopts softmax classification to classify the feature vectors in order to achieve object recognition. The operation principle is shown in Fig. 7.

Figure 7. The operation principle of R-FCN

Moreover, the object localization can be realized by appending a 4×k×k-dimensional convolutional layer to the above position-sensitive score maps and defining the feature vector (a 4-dimensional column vector that represents the coordinates and width-height (tx, ty, tw, th) of the RoI region) by voting according to the RoIs. Therefore, R-FCN can do recognition and localization simultaneously to achieve object detection. As to the performance, the accuracy of R-FCN is similar to that of Faster R-CNN, but the test speed of R-FCN is 2.5 times that of Faster R-CNN.

B. Models based on regression

At present, the object detection methods based on deep learning using region proposals achieve satisfactory results, but their object detection still has poor real-time performance that cannot satisfy application requirements.

1) YOLO

YOLO [31], proposed by Redmon, Divvala, Girshick, et al., is a convolutional neural network for real-time object detection that can accomplish end-to-end training. Because the RoI module is cancelled, YOLO no longer extracts object region proposals. The front end of YOLO is a convolutional neural network for feature extraction, and the rear end connects two fully connected layers for classification and regression over the grid regions. YOLO divides the input image into 7×7 grids, each of which produces two bounding boxes. Each bounding box outputs a 4-dimensional vector of coordinate information plus an object confidence. Meanwhile, each grid also outputs 20 category probabilities; thus each grid produces a 30-dimensional vector including recognition information and location information. During detection, YOLO filters out the object proposals with low confidence by setting a threshold and wipes off the redundant object proposals to gain the detection results. The operation process is shown in Fig. 8.

Figure 8. The operation principle diagram of YOLO

As to the object detection performance, YOLO's detection speed is 45 fps, and Fast YOLO's can reach 155 fps, making real-time detection possible; however, YOLO's detection accuracy decreases to a certain extent compared with other state-of-the-art deep learning object detection models. Under the same conditions, where VGG is used as the convolutional network for feature extraction, the accuracy of YOLO is 66.4%, while the accuracy of Faster R-CNN is 73.2% [31]. The primary reason for YOLO's accuracy decline is the cancellation of the region proposal.

2) SSD

SSD [32] is the single shot multi-box detector proposed by Liu Wei. The design of SSD integrates YOLO's regression idea and Faster R-CNN's anchors mechanism. With the regression idea of YOLO, SSD simplifies the computational complexity of the neural network to guarantee the real-time performance. With the anchors mechanism, SSD can extract features of different scales and aspect ratios to guarantee the detection accuracy, and the local feature extraction method of SSD is more reasonable and effective compared with the global feature extraction method of YOLO. What's more, because the feature representations at different scales are different, multi-scale feature extraction is applied in SSD, which contributes to the detection robustness over different-scale objects. The operation principle is shown in Fig. 9.
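The 30-dimensional per-grid vector of YOLO described above can be made concrete with a quick shape check (a sketch of the output encoding only, using the Pascal VOC setting of 20 classes):

```python
import numpy as np

S = 7    # the input image is divided into an S x S grid
B = 2    # bounding boxes predicted per grid cell
C = 20   # class probabilities per grid cell (Pascal VOC)

# Each box carries 4 coordinates plus 1 confidence, so a cell emits
# B * 5 + C = 30 numbers and the whole output tensor is 7 x 7 x 30.
per_cell = B * 5 + C
output = np.zeros((S, S, per_cell), dtype=np.float32)
```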

Figure 9. The operation principle diagram of SSD

In terms of object detection performance, the detection speed of SSD is 59 fps and its accuracy is 74.3% [32]; it is the first object detection architecture with an accuracy rate of more than 70% that also satisfies the real-time requirement. However, it still has some disadvantages, one of which is its weak detection capacity for small objects.

V. CHALLENGES AND SOLUTIONS

With the further advancement of deep learning technology, the object detection models based on deep learning have been continuously ameliorated. Currently, object recognition has a better and more active prospect than object detection in deep learning. However, object detection based on deep learning depends heavily on object recognition; thus the elevation of object recognition capability will promote the lift-up of object detection capability. Now various kinds of visual recognition and detection contests are held to further boost the advancement of deep learning in object recognition and object detection. Meanwhile, the ever-changing hardware updates also drive the applications of deep learning technology. From an overall analysis of current object detection methodologies based on deep learning, it can be discovered that there are two primary challenges: real-time performance and robustness. In the future, object detection will certainly develop towards better real-time performance and robustness.

A. The Real-time

Deep learning has been applied in object detection due to its powerful feature representation. However, because of the great amount of network parameters and huge computation, most deep learning models have poor real-time performance on existing computation capability. Therefore, real-time performance has become the bottleneck for the application of deep learning in object detection. To improve it, there are two approaches to reduce the computation: (1) design the network architecture; (2) ameliorate the model algorithm. For example, Overfeat and R-FCN both take the measure of shared parameters to decrease the number of times data passes through the convolutional neural network. Faster R-CNN applies the RPN structure to obtain region proposals by learning, instead of the selective search algorithm. PVANET [33] decreases the computation by improving the method of feature extraction. The paper [34] adopts a tiny network as the front end to preprocess the data before sending it to the object detection network, to decrease the unnecessary computation.

B. The Robustness

Object detection based on deep learning has good robustness compared with traditional detection. However, deep models are hard to train. In the design of deep learning models, there are various tricks to improve the detection ability and model robustness. To prevent the overfitting of deep models, Hinton put forward the concept of Dropout [35], which randomly controls the weight switches of the neural network to enhance its generalization ability. Revising the activation function of the neuron nodes, such as ReLU [36] and Maxout [37], can increase the fitting ability of the whole network. To adapt to the object detection of various scales, HyperNet [38] makes use of multi-level feature fusion to combine features of diverse resolutions, which is adopted by SSD as well. In terms of training strategies, hard negative mining [39] is conducive to the precision: by increasing the negative sample proportion, it reinforces the detection capability on hard samples. The consideration of context links when the networks are designed can also help to increase the model accuracy [40]. As the details are weak in the high-level feature layers, DSSD [41] adds a deconvolution module on the basis of SSD, lifting up its recognition ability for small objects. YOLO v2 [42], the upgraded version of YOLO, applies the anchor mechanism in extracting the object candidates to improve the accuracy of YOLO.

VI. CONCLUSION

This paper first introduces the classical methodologies of object detection and discusses the relations and differences between the classical methodologies and the deep learning methodologies in object detection. Then it clarifies the ideas of model design and the limitations of the deep learning method by overviewing the early object detection methods based on deep learning. Afterwards, it elaborates on the common object detection models based on deep learning, making a detailed interpretation of the framework design and operation principle of each model and pointing out its innovations and performance assessment. Finally, this paper makes a further analysis of the challenges in object detection based on deep learning and offers some solutions for reference. With the innovation of deep learning theories and computer hardware upgrading, the performance of object detection based on deep learning will be ceaselessly enhanced and its applications will be wide-ranging. Especially, the development and application of current embedded systems in deep learning will pave a promising prospect for object detection based on deep learning.

ACKNOWLEDGEMENT

This work is supported by the National Nature Science Foundation of China (61405248, 61503394), the Nature Science Foundation of Anhui Province in China (1508085QF121) and the Higher Education Institutes Nature Science Research Project of Anhui Province in China (KJ2015ZD14).

REFERENCES

[1] D. Erhan, C. Szegedy, A. Toshev, et al, "Scalable object detection using deep neural networks," 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2155-2162.

[2] A. Borji, M. M. Cheng, H. Jiang, et al, "Salient object detection: A benchmark," IEEE Transactions on Image Processing, vol. 24, Dec 2015, pp. 5706-5722.
[3] Y. Tian, P. Luo, X. Wang, et al, "Deep learning strong parts for pedestrian detection," 2015 IEEE International Conference on Computer Vision, 2015, pp. 1904-1912.
[4] P. Ahmadvand, R. Ebrahimpour and P. Ahmadvand, "How popular CNNs perform in real applications of face recognition," 2016 24th Telecommunications Forum (TELFOR), 2016, pp. 1-4.
[5] W. Ouyang, X. Wang, X. Zeng, et al, "Deepid-net: Deformable deep convolutional neural networks for object detection," 2015 IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2403-2412.
[6] P. M. Merlin, D. J. Farber, "A parallel mechanism for detecting curves in pictures," IEEE Transactions on Computers, vol. C-24, Jan 1975, pp. 96-98.
[7] N. Singla, "Motion detection based on frame difference method," International Journal of Information & Computation Technology, vol. 4, no. 15, 2014, pp. 1559-1565.
[8] D. S. Lee, "Effective Gaussian mixture learning for video background subtraction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, May 2005, pp. 827-832.
[9] B. K. P. Horn, B. G. Schunck, "Determining optical flow," Artificial Intelligence, vol. 17, 1981, pp. 185-203.
[10] J. L. Barron, D. J. Fleet, S. S. Beauchemin, et al, "Performance of optical flow techniques," 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1992, pp. 236-242.
[11] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001, pp. I-511-I-518.
[12] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, et al, "Object detection with discriminatively trained part-based models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, 2010, pp. 1627-1645.
[13] P. Felzenszwalb, D. McAllester, D. Ramanan, "A discriminatively trained, multiscale, deformable part model," 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1-8.
[14] M. Ulrich, C. Steger, A. Baumgartner, "Real-time object recognition using a modified generalized Hough transform," Pattern Recognition, vol. 36, Nov. 2003, pp. 2557-2570.
[15] J. Xu, X. Sun, D. Zhang, et al, "Automatic detection of inshore ships in high-resolution remote sensing images using robust invariant generalized Hough transform," IEEE Geoscience and Remote Sensing Letters, vol. 11, Dec. 2014, pp. 2070-2074.
[16] B. Leibe, A. Leonardis, B. Schiele, "Robust object detection with interleaved categorization and segmentation," International Journal of Computer Vision, vol. 77, 2008, pp. 259-289.
[17] S. Maji, J. Malik, "Object detection using a max-margin Hough transform," 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1038-1045.
[18] A. Lehmann, B. Leibe, L. Van Gool, "Fast PRISM: Branch and bound Hough transform for object class detection," International Journal of Computer Vision, vol. 94, Feb. 2011, pp. 175-197.
[19] K. Dai, G. Li, D. Tu, et al, "Prospects and current studies on background subtraction techniques for moving objects detection from surveillance video," Journal of Image and Graphics, vol. 11, July 2006, pp. 919-927.
[20] Q. Ji, S. Yu, "Object detection algorithm based on Surendra background subtraction and four-frame difference," Computer Applications and Software, vol. 31, Dec. 2014, pp. 242-244.
[21] C. Stauffer, W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1999, pp. 246-252.
[22] M. Heikkila, M. Pietikainen, "A texture-based method for modeling the background and detecting moving objects," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, April 2006, pp. 657-662.
[23] C. Szegedy, A. Toshev, D. Erhan, "Deep neural networks for object detection," Advances in Neural Information Processing Systems, 2013, pp. 2553-2561.
[24] P. Sermanet, D. Eigen, X. Zhang, et al, "OverFeat: Integrated recognition, localization and detection using convolutional networks," ICLR, 2014.
[25] R. Girshick, J. Donahue, T. Darrell, et al, "Rich feature hierarchies for accurate object detection and semantic segmentation," 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580-587.
[26] J. R. R. Uijlings, K. E. A. Van De Sande, T. Gevers, et al, "Selective search for object recognition," International Journal of Computer Vision, vol. 104, Feb. 2013, pp. 154-171.
[27] K. He, X. Zhang, S. Ren, et al, "Spatial pyramid pooling in deep convolutional networks for visual recognition," European Conference on Computer Vision, 2014, pp. 346-361.
[28] R. Girshick, "Fast R-CNN," 2015 IEEE International Conference on Computer Vision, 2015, pp. 1440-1448.
[29] S. Ren, K. He, R. Girshick, et al, "Faster R-CNN: Towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems, 2015, pp. 91-99.
[30] Y. Li, K. He, J. Sun, "R-FCN: Object detection via region-based fully convolutional networks," Advances in Neural Information Processing Systems, 2016, pp. 379-387.
[31] J. Redmon, S. Divvala, R. Girshick, et al, "You only look once: Unified, real-time object detection," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-788.
[32] W. Liu, D. Anguelov, D. Erhan, et al, "SSD: Single shot multibox detector," European Conference on Computer Vision, 2016, pp. 21-37.
[33] K. H. Kim, S. Hong, B. Roh, et al, "PVANET: Deep but lightweight neural networks for real-time object detection," arXiv preprint arXiv:1608.08021, 2016.
[34] A. Angelova, A. Krizhevsky, V. Vanhoucke, et al, "Real-time pedestrian detection with deep network cascades," BMVC, 2015, pp. 32.1-32.12.
[35] G. E. Hinton, N. Srivastava, A. Krizhevsky, et al, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv preprint arXiv:1207.0580, 2012.
[36] G. E. Dahl, T. N. Sainath, G. E. Hinton, "Improving deep neural networks for LVCSR using rectified linear units and dropout," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 8609-8613.
[37] I. J. Goodfellow, D. Warde-Farley, M. Mirza, et al, "Maxout networks," ICML, 2013, pp. 1319-1327.
[38] T. Kong, A. Yao, Y. Chen, et al, "HyperNet: Towards accurate region proposal generation and joint object detection," 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 845-853.
[39] A. Shrivastava, A. Gupta, R. Girshick, "Training region-based object detectors with online hard example mining," 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 761-769.
[40] S. Gidaris, N. Komodakis, "Object detection via a multi-region and semantic segmentation-aware CNN model," 2015 IEEE International Conference on Computer Vision, 2015, pp. 1134-1142.
[41] C. Y. Fu, W. Liu, A. Ranga, et al, "DSSD: Deconvolutional single shot detector," arXiv preprint arXiv:1701.06659, 2017.
[42] J. Redmon, A. Farhadi, "YOLO9000: Better, faster, stronger," arXiv preprint arXiv:1612.08242, 2016.

