IEEE Conference 29 July

Multi-View Face Detection using Deep Convolutional

Neural Network and Calibrated CNN Structure


Shivkaran Ravidas1, M.A. Ansari2, Member, IEEE, R.S. Anand3
1,2 Dept. of Electrical Engineering, School of Engineering, Gautam Buddha University, Greater Noida, India
3 Department of Electrical Engineering, IIT Roorkee, India
[email protected], [email protected]

Abstract—The aim of this paper is to detect multi-view faces using a deep convolutional neural network (DCNN) and calibrated CNN (convolutional neural network) structures. The multi-view face detection system produces rotated image windows and their integral windows for every classifier, which perform parallel classification operations to predict non-frontal and non-upright faces in images. Implementation, detection and retrieval of faces are obtained with the help of direct visual matching technology. Further, a probabilistic measure of the similarity of face images is computed using Bayesian analysis. The experiment detects faces with ±90 degree out-of-plane rotations. A fine-tuned AlexNet is used to detect faces. For this work, we extracted training examples from the AFLW (Annotated Facial Landmarks in the Wild) dataset, which involves 21K images with 24K face annotations.

Keywords: Face detection, multi-view face detection, deep learning, convolutional neural network (CNN), computer vision.

I. INTRODUCTION

Multi-view face detection is very challenging when faces are seen from a fixed view; therefore, it is important to handle multi-view faces. A multi-view face detection system is used to predict upright faces in images with 90 degree ROP (rotation out of plane) pose changes; rotation invariance means predicting faces with 360 degree RIP (rotation in plane) pose variations. The multi-view face detection system produces rotated windows of an image and their integral windows for every classifier, which perform parallel classification operations to predict non-frontal and non-upright faces in images. Multi-view face detection can also be performed by building a few detectors, each corresponding to a particular view. Face detection has been one of the main technologies enabling natural interaction between human and computer. The performance of face recognition systems relies heavily on a face representation that is physically coupled with most of the variations in the face, such as expression, view, and illumination. As images of a face are mostly observed in particular views, the main challenge is to disentangle the face identity from the view representation. Large efforts have been devoted to extracting identity features by hand, such as Gabor features proposed by Liu et al. [1], LBP (Local Binary Patterns) by Ahonen et al. [2], and SIFT (scale invariant feature transform) by D. Lowe [3]. The best practice for face detection obtains these features at face landmarks at various scales and concatenates them into high-dimensional feature vectors, as explained by Simonyan et al. [4, 5]. CNNs (convolutional neural networks) took the computer vision community by storm, significantly advancing the state of the art in most applications. The main elements in the success of CNN methods are the availability of large amounts of training data, as illustrated by Simonyan et al. [4] and Lin et al. [6]. CNNs are hierarchical neural networks whose convolutional layers alternate with subsampling layers, reminiscent of the simple and complex cells in the primary visual cortex. Even though neural networks can be adapted to computer vision tasks with good generalization performance, it is beneficial to add prior knowledge into the network architecture. CNNs aim to exploit spatial information between image pixels and are thus based on discrete convolution. According to Li et al. [7], face detection is formulated as a classification problem of separating face patterns from non-face patterns. From the perspective of statistics, such a problem has three difficulties: the pattern dimensionality is usually high; the probable quantity of non-face patterns is huge; and their distribution is irregular. Hence, it is complex to model the probability distribution of face patterns, particularly multi-view face patterns, with a unimodal density function. Problems involving in-depth rotation, which require identifying faces across various views, are not simple. Most investigators and researchers have addressed this issue by constructing multiple view-based face detectors (multi-view face detection), that is, by categorizing the sphere of views into several small segments and building one detector for each segment [1, 8].

II. MULTI-VIEW FACE DETECTION

We can define face detection as the process of extracting faces from given images; the system should positively identify a certain region as a face. According to Yang et al. [10] and Hjelmås et al. [11], face detection is a process of finding the regions of the input image where faces are present.
A lot of work has been done on detecting still and frontal faces, in plane as well as against complex backgrounds [12]. With the advancement of information technology and computational power, computers have become more interactive with humans. This human-computer interface (HCI) is mostly operated via traditional devices like the mouse, keyboard, and display; one of the most important media is the face and facial expression. Face detection is the first step in any face recognition system and is a well-studied problem in computer vision. Contemporary face detectors can effortlessly identify near-frontal faces. Complexities in face detection come from two aspects: the large search space of probable face sizes and positions, and the large visual differences of human faces in a chaotic environment. The former imposes a time-efficiency requirement, while the latter requires the face detector to accurately address a binary classification problem.

A. Deep Convolutional Neural Network (DCNN)

Convolutional neural networks (CNNs) are very popular in the field of computer vision; one of the reasons is the availability of large amounts of training data. Vaillant et al. [13] in 1994 applied neural networks for detecting faces in uncluttered images. They designed a convolutional neural network that can be trained to detect the presence or absence of a face in a given image, scanning the whole image at all possible locations. Rowley et al. [14] developed a neural network for upright frontal face detection; later, in 1998, the method was extended to pose-invariant face detection [15]. Neural networks are adopted in many applications, such as pattern recognition, character recognition, object recognition and autonomous robot driving. The major purpose of such a network in face recognition is the feasibility of training the system to capture the difficult class of face patterns. Deep CNNs are used not only for face detection but also for face alignment [16]. To obtain the best performance from such a method, the number of nodes and layers, the learning rates and so on have to be highly tuned [17]. The drawback of the neural network approach appears when the number of classes grows. In template matching, several face templates from various perspectives are exploited to characterize a single face; such algorithms are not cost effective and cannot be easily carried out, as stated in [18].

Farfade et al. [19] conducted research examining multi-view face detection using a deep CNN. The developed framework does not need landmark or pose annotation and can identify faces across a large range of orientations with a single model. DDFD (Deep Dense Face Detector) does not depend on common modules of deep-learning-based object detection methods, such as bounding-box regression and image segmentation. They compared the developed method with R-CNN and a few other face detection methods designed specifically for multi-view face detection, for example DPM-based and cascade-based methods.

According to Parkhi et al. [20], face recognition operates on either a set of faces or a single photograph tracked in a video. Two major contributions were made in that research. First, they developed a procedure that can assemble a large dataset with little label noise while reducing the amount of manual annotation involved; one of the main concepts was to adopt weaker classifiers to rank the data given to the annotators. It was noted that this procedure, although designed for faces, is appropriate for other object classes and fine-grained tasks. The second contribution was to demonstrate that a deep CNN, with appropriate training and without any additions, can produce results comparable with the state of the art, while reducing the amount of manual annotation.

Li et al. [21] analyzed a CNN cascade for face detection. The developed detector evaluates the input image at low resolution to reject non-face regions and carefully processes the difficult regions at higher resolution for exact detection. Calibration nets are introduced into the cascade to accelerate detection and to enhance the quality of the bounding boxes. Sharing the benefits of CNNs, the developed face detector is robust to large variations in the visual image. It was noted that on the public FDDB (Face Detection Data Set and Benchmark) the developed detector performs well compared to state-of-the-art methods. The detector is also fast, achieving 14 frames per second for typical video graphics array images on the central processing unit, and can be accelerated to 100 frames per second on the graphical processing unit.

Zhu et al. [21] analyzed multi-view perception (MVP) through a deep model for learning face identity and view representations. That work developed a generative deep network known as MVP to mimic the multi-view perception capability of the primate brain. MVP can disentangle the view and identity representations obtained from an input image and can also generate the full spectrum of views of the input image. The experiments demonstrated that the features of MVP achieve better performance on face recognition than state-of-the-art counterparts. In addition, modeling the view-representation factor as a continuous variable allows MVP to predict and interpolate images under viewpoints unobserved in the training data, imitating the human capacity for reasoning.
III. IMPLEMENTATION METHOD
5
In the implementation, detection of face and retrieval of the y ij ,k = max {x ij . s+ m ,k . s+n }
image will be attained with the help of direct visual matching 0 <m , n<s
technology. A probabilistic computation of resemblance
among the images of the face will be conducted on the basis of Where output map pools over s × s non-overlapping
the Bayesian analysis for achieving various detection of the
face. After this, a neural network will be developed and
region.
trained in order to enhance the outcome of Bayesian analysis.
Next, to that, training and verification will be adopted to test
y j=max 0 , ∑ x 1i . w 1i , j+ ∑ x2i . w ii , j +b j
other images which involve similar face features.
Deep learning can be performed by supervisory signals [22].
( i ) 6

n
Ident ( ,t , ∅ id )=−∫∑ log ^pi =−log ^pt  Where x 1 ,w 1, x 2 , w 2 represent the neurons and weights in
i=1
3rd and 4th convolutional layers. Output of ConvNet is n-way
software to predict the distribution of probability over n-
unique identities [24].
∅ id is
Where is the feature vector, t represents target class and

softmax layer parameter. pi is the target probability exp ⁡( yi )
distribution ( pi =0 for all i except pt =1). ^pi =1 is the yi n
7
predicted probability distribution. The verification signal = ∑ exp ⁡( y❑j ¿ ) ¿
regularize feature and reduces intra personal variations given j =1
by Hadsell et al.[23].
DCNN is mostly adopted for classification and also adopted
Verif ( f ¿ ¿i , f j , y ij , ∅ ve )=¿¿ for detection and recognizing the face. Most of them consider
the cascade strategy as well as consider batches with various
¿ 
locations and scales as inputs. At the same time, it was also
noted that to a vast amount, these operations maximize time
and space difficulty. Other than conventional methods, DCNN
does not need to initialize locations’ shape. Therefore we can
neglect getting jammed in local optima for avoiding the poor
1 2 3 initialization of shape.
Veri f ( f ¿ ¿ i , f j , y ij , ∅ ve )= ( y ij − wd +b ) ¿
( )
2
A. CNN Structure

The CNN structure which is adopted in the present study


consists of 12-net CNN, 12-calibration-net, 24-net, 24-
Where ∅ ve ={w ,b }; are denote shifting parameters and calibration-net, 48-net.
learning scaling, 𝛔 represented as sigmoid function and y ij is
denoted as binary target of two compared facial images relate
to same identity. 12-net CNN
It is the first CNN that scans or tests the image quickly in
Further operation of convolution is represented as: the test pipeline. An image having the dimensions of w∗h
having the pixel spacing of 4 with 12x12 detection windows
for such type of image 12-net CNN is suitable to apply. This
y j(r) =max ¿ ¿ 4 would result a map of:

i
Where x is input map and y is output j
map, k ij
is (( W −12
4
+1
H −12
4
+1 )( )) 8

convolution between input and output.


Maxpooling is given by:
Point on the image map defines detection window of 12x12
onto the testing image. The minimum size of the face
acceptable for testing an image is ‘T’. Firstly an image 24-net CNN
pyramid is built through the test image in order to cover the In order to further lower down the number of the detection
face from varied scales. At each level an image pyramid is windows used, a binary classification of CNN called 24-net
created, it is resized by 12/T which would serve as an input CNN is used. The detection window which remained under
image for 12-net CNN. Under this structure 2500 detection the 12-calibration net are taken and then resized to 24*24
windows are created as shown in Figure1. image and then this image is re-evaluated using 24-net. Also
under this CNN, multi-resolution structure is adopted, through
this the overall overhead of the 12-net CNN structure got
12-Calibration-net reduced and hence the structure becomes discriminative. As
For bounding box calibration, 12-calibration-net is used. shown in Figure 2.
Under this the dimension of the detection window is ( x , y , w ,
h ) where ' x ' and ' y ' are the axis, ' x ' and ' h ' are the width
and height respectively. The calibration pattern adjusts itself 24-Calibration-net CNN
according to the size of the window is: It is another calibration CNN similar to that of 12-
calibrationnet. Also under this number of calibration patterns

xn w ynh w h S ( I a , I b )=P ( Ω I ) P ( ΩI|∆ ) / {P ( Ω I ) P ¿¿ 

¿  y  


sn sn  s n  s n  are N. the process of calibration is similar to that of 12-
calibration-net

In the present study number of patterns i.e. N=45. Such that:


Convolution layer Pooling LayerConnected Layer
Input Image Labels

sn  { 0.87 , 0.95 ,1.2 , 1.13 ,1.25 }

x n {−0.19,0 , 0.19 }
Size 24*24
Re-size in 12-net CNN
y n {−0.19,0 , 0.19 }
Figure 2: 24-net CNN

The image is cropped according to the size of detection


window that is 12*12 which would serve as an input image to 48-net CNN
12-calibration-net. Under this CNN average result of the It is the most effective CNN used after 24-calibration-net but
patterns are taken because the patterns obtained as an output is quite slower. It follows the same procedure as in 24-net.
are not orthogonal. A threshold value is taken i.e. t in order to This procedure used in this CNN is very complicated as
remove the patterns which are not the confidence patterns. compared to rest of the CNN sub structures. It also adopts the
multi resolution technique as in case of 24-net.

48-calibration-net CNN
Input Image Convolution Pooling Connected Labels
layer Layer Layer It is the last stage or sub-structure of CNN. The number of
calibration patterns used is same as in case of 12-calibration-
net i.e. N=45. In order to have more accurate calibration,
pooling layer is used under this CNN sub-structure.
Size 12 ×12
B. Proposed Algorithm for DCNN

Figure 1: 12-net CNN
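As a concrete illustration of the building blocks described in this section — convolution followed by the rectification max(0, ·), non-overlapping s × s max-pooling, the n-way softmax, and the size of the 12-net detection map — the following NumPy sketch can be used. It is illustrative only: the function names and toy sizes are ours, not part of the implementation, and the convolution is the cross-correlation commonly implemented in CNN libraries.

```python
import numpy as np

def conv_relu(x, k, b):
    """One input map x cross-correlated ('valid') with kernel k plus bias b,
    followed by the rectification y = max(0, b + k * x)."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return np.maximum(0.0, out + b)

def max_pool(x, s):
    """Non-overlapping s x s max-pooling over the input map."""
    H2, W2 = x.shape[0] // s, x.shape[1] // s
    return x[:H2 * s, :W2 * s].reshape(H2, s, W2, s).max(axis=(1, 3))

def softmax(y):
    """n-way softmax: p_i = exp(y_i) / sum_j exp(y_j)."""
    e = np.exp(y - np.max(y))  # shift by the max for numerical stability
    return e / e.sum()

def num_12net_windows(W, H, stride=4, win=12):
    """Number of 12x12 detection windows in the 12-net detection map."""
    return ((W - win) // stride + 1) * ((H - win) // stride + 1)
```

For example, `num_12net_windows(212, 212)` gives 51 × 51 = 2601 windows, showing how quickly the window count grows with image size and why the cascade prunes windows at each stage.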


B. Proposed Algorithm for DCNN

This particular work develops an algorithm for multi-view face detection with the help of a deep convolutional neural network. The steps of implementation are described below:

Step 1: In the implementation, face detection and image retrieval are attained with the help of direct visual matching technology, which matches faces directly. This technology makes use of image similarity metrics, which can be either normalized correlation or Euclidean distance, corresponding to the approach called template matching. The similarity between the two images is measured through a similarity measure, denoted S(I_a, I_b), where I_a and I_b are the two images between which the similarity is being measured.

Step 2: The next step is measuring the probabilistic similarity using Δ, the measure of the intensity difference between the two images, Δ = I_a − I_b. This calculation of the resemblance among face images is conducted on the basis of Bayesian analysis for achieving multi-view face detection.

Step 3: The probabilistic calculation of resemblance also supports multiple face detection. Various types of image variations were used for the statistical analysis. The similarity measure S(I_a, I_b) between the pair of images I_a and I_b, given in terms of the a posteriori (intrapersonal) probability, is:

S(I_a, I_b) = P(Ω_I | Δ) = P(Δ | Ω_I) P(Ω_I) / {P(Δ | Ω_I) P(Ω_I) + P(Δ | Ω_E) P(Ω_E)}

where Ω_I and Ω_E denote the intrapersonal and extrapersonal variations. If the multi-view face detection is done for a single person then P(Ω_I | Δ) > P(Ω_E | Δ), or equivalently S(I_a, I_b) > ½.

Step 4: Further, a neural network is developed and trained in order to enhance the outcome of the Bayesian analysis.

Step 5: Next to that, training and verification are adopted to test other images which involve similar face features.

The training process is shown in Figure 3. Training is initialized with epoch = 1 and the weights and biases are initialized; then, in each epoch, the input image is presented and the output values are calculated, the RMSE is computed, and the weights and biases are updated. Training stops when RMSE ≤ RMSE_min or when epoch ≥ epoch_max.

Figure 3: Flow chart of the training process

Implementation of the code is done step by step as follows:

1) First, the DCNN object is created.
2) Second, the graphical user interface is initialized.
3) Then the MCR (misclassification rate) calculation is initialized and a plot of MCR is created, showing the current epoch, iteration, RMSE (root mean square error) and MCR value of the image data.
4) The training data is loaded.
5) The training data is preprocessed, errors are deleted, and then the image data is simulated.
6) After the simulation, the multiple faces detected in the image are shown in red rectangular boxes.

The CNN is trained to minimize the risk of the softmax loss function:

R = Σ_{x_i ∈ β} |log prob(x_i | y_i)|

where β represents the batch used in an iteration of stochastic gradient descent, with sample x_i and label y_i. The Hessian calculation progress is then started. The current epoch used for this research is 3.00, the iteration value is 759.00, the RMSE value is 0.18 and the MCR value is 0.90; 'theta' is 8.000e-05. The plot of RMSE during training shows zig-zag lines, and the plot of MCR during training shows curved lines; CNN training progress is shown in Figure 3 and a screenshot of the DCNN training is shown in Figure 4.
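Steps 1-3 above can be sketched as follows. The text specifies only Bayes' rule over the intensity difference Δ and the S > ½ decision; the Gaussian likelihood forms, their variances and the equal priors used here are illustrative assumptions of ours.

```python
import numpy as np

def log_gaussian(delta, var):
    """Log of an isotropic Gaussian density for the difference image.
    (The Gaussian form and the variance values are assumptions.)"""
    d = np.ravel(delta)
    return -0.5 * (d @ d) / var - 0.5 * d.size * np.log(2 * np.pi * var)

def bayesian_similarity(Ia, Ib, var_I=0.05, var_E=1.0, prior_I=0.5):
    """S(Ia, Ib) = P(Omega_I | Delta), computed with Bayes' rule on the
    intensity difference Delta = Ia - Ib."""
    delta = Ia - Ib
    log_pI = log_gaussian(delta, var_I) + np.log(prior_I)
    log_pE = log_gaussian(delta, var_E) + np.log(1.0 - prior_I)
    m = max(log_pI, log_pE)  # log-sum-exp shift for numerical stability
    return np.exp(log_pI - m) / (np.exp(log_pI - m) + np.exp(log_pE - m))

def same_identity(Ia, Ib):
    """Decision rule from Step 3: same person iff S(Ia, Ib) > 1/2."""
    return bayesian_similarity(Ia, Ib) > 0.5
```

Because the intrapersonal variance is taken smaller than the extrapersonal one, near-identical images yield S close to 1 and strongly differing images yield S close to 0.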
Figure 4: DCNN training process

Figure 5: Detection rate versus number of false detections with multi-resolution in the 24-net CNN

IV. RESULT AND DISCUSSION

Detecting faces across various views is very challenging when seen from a fixed view, as the face appearance differs between views. One method for detecting multi-view faces is to develop a single detector that covers all face views. Alternatively, multi-view face detection can be performed by building a few detectors, each corresponding to a particular view; at execution time, if one or more detectors give a positive result for a specific sample, a face is recognized. Multi-view face detection is a challenging issue due to wide changes in appearance under different pose, expression and illumination conditions, and the performance of modern face detection solutions on multi-view face datasets is unsatisfactory.

The result of the study is determined on the basis of the following parameters, both in the presence and in the absence of multi-resolution in the proposed CNN structure.

Detection Rate: the rate at which the face of a person in an image is detected.

Number of False Detections: the number of faces which are not detected at all, or are detected falsely.

It was observed that in the presence of multi-resolution in the CNN, shown in Figure 5, the number of false detections comes to a halt (at 10000 falsely detected faces) and the detection rate is achieved. However, without the use of multi-resolution in the CNN, more faces are detected falsely, as shown in Figure 6. This research develops an algorithm for multi-view face detection with the help of a deep convolutional neural network; the main concepts of the algorithm are to leverage the high ability of the DCNN to classify and extract features, learning a single classifier for detecting faces from different views, and to reduce the computational difficulty by simplifying the detector architecture.

Figure 6: Detection rate versus number of false detections without multi-resolution in the 24-net CNN

The FDDB (Face Detection Data Set and Benchmark) dataset [26] contains annotated faces and is a large-scale face detection benchmark. It uses ellipse face annotations and defines two types of evaluation: discontinuous score evaluation and continuous score evaluation. To augment the data, we randomly flip the training examples. In the fine-tuned deep network, it is possible to take either a sliding-window or a region-based approach for obtaining the final face detector. For this particular work, we have chosen the sliding-window approach since it is less complex and does not depend on additional modules such as selective search.
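The two evaluation parameters defined above can be made concrete with a small scoring routine. The greedy IoU-matching criterion below is our assumption for illustration; the paper defines the two quantities only informally.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def score_detections(detections, ground_truth, thr=0.5):
    """Greedy matching: a detection is correct if it overlaps an as-yet
    unmatched annotated face with IoU > thr, otherwise it counts as a
    false detection. Returns (detection rate, number of false detections)."""
    matched = set()
    false_detections = 0
    for det in detections:
        hit = next((i for i, gt in enumerate(ground_truth)
                    if i not in matched and iou(det, gt) > thr), None)
        if hit is None:
            false_detections += 1
        else:
            matched.add(hit)
    rate = len(matched) / len(ground_truth) if ground_truth else 0.0
    return rate, false_detections
```

Sweeping a confidence threshold over the detections and re-running this scoring yields the detection-rate versus false-detections curves of Figures 5 and 6.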
It is possible to efficiently execute the CNN on images of any size and use a heat map for classifying the face. Every point in the heat map indicates the response of the CNN, i.e. the possibility of containing a face, for its corresponding 227×227 region in the real image. To recognize faces of various sizes, the images are scaled up and down, and new heat maps are acquired. We attempted various scaling schemes and identified that rescaling the image three times per octave provides good results. In addition, face detections are enhanced by adopting a bounding-box regression module.

In this work the face classifier, like AlexNet, involves 8 layers, of which the first five are convolutional and the final three are fully connected. We first transformed the fully connected layers into convolutional ones by reshaping the layer parameters. In our cascaded CNNs, we used AlexNet [25] and applied the ReLU nonlinearity function after each pooling layer and fully connected layer.

Examples of input images for two different identities with the generated multi-view output results are illustrated in Figure 7. In this figure, the detected faces at various angles and poses, for left and right profile faces including the frontal face, are shown; our detector gives results for images with varying poses and resolution.

Figure 7: Pose-invariant face detection results. (a), (b), (c) and (d) are right profile faces; (e) is a frontal face; (f) is a left-up profile face; and (g), (h) are right profile faces.

The overall test sequence is shown in Figure 8; the detailed explanation of all the CNNs has already been given in the previous sections. First of all, the test image is applied to the system; the 12-net structure scans the whole image and quickly rejects about 90% of the detection windows. The remaining detection windows are processed by the 24-net and calibrated CNNs, and in the subsequent stages the highly overlapped windows are removed. Then the 48-net takes the detected windows, evaluates them with the calibrated bounding box, and produces the detected bounding box as output. Figure 9 shows the detection results at different stages.

Figure 9: Detection results: (a) original image given for detection, (b) image at the preprocessing stage, (c) detected face position with the CNN.
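The heat-map scanning scheme described above can be sketched as follows. The text fixes only the 227×227 region size and the three-scales-per-octave rescaling; the stride of 32 pixels between heat-map points and the helper names are assumptions of ours for illustration.

```python
def pyramid_scales(min_face, max_face, per_octave=3):
    """Scale factors needed to cover faces from min_face up to max_face
    pixels when rescaling three times per octave (factor 2**(1/3))."""
    step = 2.0 ** (1.0 / per_octave)
    scales, s = [], 1.0
    while min_face * s <= max_face:
        scales.append(round(s, 3))
        s *= step
    return scales

def heatmap_point_to_region(px, py, stride=32, win=227):
    """Map a heat-map response at (px, py) back to the win x win region
    of the original image it scores (stride is an assumed value)."""
    return (px * stride, py * stride, win, win)
```

For instance, covering faces from 227 up to about 910 pixels needs seven pyramid levels (two octaves at three scales per octave, plus the base scale), and each heat-map response is traced back to one 227×227 window for non-maximum suppression and bounding-box regression.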

V. CONCLUSION

In this work, we have presented a deep CNN cascade structure which produces fast detection by rejecting non-face regions at varying resolutions for accurate detection, fine-tuning AlexNet to detect the face. For this research we extracted training examples from the AFLW dataset, which involves 21K images with 24K face annotations. We randomly sampled image sub-windows and adopted them as positive examples if they had higher than a 50 per cent intersection over union with an annotated face. The effectiveness of the developed method was contrasted with existing methods and techniques, and it was noted that the proposed method performs well in terms of accuracy and detection rate; the developed method easily identifies the face and produces better results in the fastest time. Exploiting the power of CNNs, the given method works well on images with large variations. In future, this work can be extended with better sampling strategies, and more techniques can be adopted to enhance detection through data augmentation, to further improve the effectiveness of the developed method in detecting round, occluded and rotated faces.

REFERENCES

1. Liu, Chengjun, and Harry Wechsler: Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition. IEEE Transactions on Image Processing 11.4 (2002): 467-476.
2. Ahonen, Timo, Abdenour Hadid, and Matti Pietikainen: Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 28.12 (2006): 2037-2041.
3. Lowe, David G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60.2 (2004): 91-110.
4. Simonyan, K., Parkhi, O. M., Vedaldi, A., and Zisserman, A.: Fisher Vector Faces in the Wild. In BMVC (Vol. 2, No. 3, p. 4), September 2013.
5. Simonyan, Karen, Andrea Vedaldi, and Andrew Zisserman: Learning local feature descriptors using convex optimisation. IEEE Transactions on Pattern Analysis and Machine Intelligence 36.8 (2014): 1573-1585.
6. Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., and Zitnick, C. L.: Microsoft COCO: Common objects in context. In European Conference on Computer Vision (pp. 740-755). Springer, Cham, 2014.
7. Li, Haoxiang, Zhe Lin, Jonathan Brandt, Xiaohui Shen, and Gang Hua: Efficient boosted exemplar-based face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1843-1850, 2014.
8. Matsumoto, Yoshio, and Alexander Zelinsky: An algorithm for real-time stereo vision implementation of head pose and gaze direction measurement. In Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000.
9. Ansari, M. A., and Aishwarya Agnihotri: An Efficient Face Recognition System Based on PCA and Extended Biogeography-Based Optimization Technique. Indian Journal of Industrial and Applied Mathematics 7.2 (2016): 285-305.
10. Yang, Ming-Hsuan, David J. Kriegman, and Narendra Ahuja: Detecting faces in images: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 24.1 (2002): 34-58.
11. Hjelmås, Erik, and Boon Kee Low: Face detection: A survey. Computer Vision and Image Understanding 83.3 (2001): 236-274.
12. Sheikh Amanur Rahman, M. A. Ansari, and Santosh Kumar Upadhyay: An Efficient Architecture for Face Detection in Complex Images. International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 2, Issue 12, 2012.
13. R. Vaillant, C. Monrocq, and Y. Le Cun: Original approach for the localisation of objects in images. IEE Proceedings Vision, Image and Signal Processing, 1994.
14. Rowley, Henry A., Shumeet Baluja, and Takeo Kanade: Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, no. 1 (1998): 23-38.
15. Rowley, Henry A., Shumeet Baluja, and Takeo Kanade: Rotation invariant neural network-based face detection. In Computer Vision and Pattern Recognition, 1998. Proceedings. 1998 IEEE Computer Society Conference on, pp. 38-44. IEEE, 1998.
16. Sharma, Kartikeya, Shivkaran Ravidas, and M. A. Ansari: A Novel Technique for Face Alignment Using Deep Convolutional Neural Networks. Indian Journal of Industrial and Applied Mathematics 8.1 (2017): 107-117.
17. Yang, Bin, Junjie Yan, Zhen Lei, and Stan Z. Li: Aggregate channel features for multi-view face detection. In Biometrics (IJCB), 2014 IEEE International Joint Conference on, pp. 1-8. IEEE, 2014.
18. Jyoti S. Bedre and Shubhangi Sapkal: Comparative Study of Face Recognition Techniques: A Review. Emerging Trends in Computer Science and Information Technology 2012 (ETCSIT2012), proceedings published in International Journal of Computer Applications (IJCA), 2012.
19. Farfade, Sachin Sudhakar, Mohammad J. Saberian, and Li-Jia Li: Multi-view face detection using deep convolutional neural networks. In Proceedings of the 5th ACM International Conference on Multimedia Retrieval. ACM, 2015.
20. Parkhi, Omkar M., Andrea Vedaldi, and Andrew Zisserman: Deep face recognition. British Machine Vision Conference, Vol. 1, No. 3, 2015.
21. Li, Haoxiang, Zhe Lin, Xiaohui Shen, Jonathan Brandt, and Gang Hua: A convolutional neural network cascade for face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325-5334, 2015.
22. Sun, Yi, Yuheng Chen, Xiaogang Wang, and Xiaoou Tang: Deep learning face representation by joint identification-verification. In Advances in Neural Information Processing Systems, pp. 1988-1996, 2014.
23. Hadsell, Raia, Sumit Chopra, and Yann LeCun: Dimensionality reduction by learning an invariant mapping. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, Vol. 2, pp. 1735-1742. IEEE, 2006.
24. Sun, Yi, Xiaogang Wang, and Xiaoou Tang: Deep learning face representation from predicting 10,000 classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1891-1898, 2014.
25. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton: ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
