IEEE Conference 29 July
Abstract—The aim of this paper is to detect multi-view faces using a deep convolutional neural network (DCNN) and calibrated CNN (convolutional neural network) structures. The multi-view face detection system produces rotated image windows and their integral image windows for every classifier, and the classifiers operate in parallel to detect non-frontal and non-upright faces in images. Implementation, detection and retrieval of faces are carried out with direct visual matching technology. Further, a probabilistic measure of the similarity between face images is computed using Bayesian analysis. The experiments detect faces with ±90 degree out-of-plane rotations. A fine-tuned AlexNet is used to detect faces. For this work, we extracted training examples from the AFLW (Annotated Facial Landmarks in the Wild) dataset, which contains 21K images with 24K face annotations.

Keywords: Face detection, multi-view face detection, deep learning, convolutional neural network (CNN), computer vision.

I. INTRODUCTION

Face detection is very challenging when faces are seen from a fixed viewpoint; it is therefore important to handle multi-view faces. A multi-view face detection system is used to detect upright faces in images with 90 degrees of ROP (rotation out of plane) pose change. Rotation invariance means detecting faces with 360 degrees of RIP (rotation in plane) pose variation. The multi-view face detection system produces rotated image windows and their integral image windows for every classifier, and the classifiers operate in parallel to detect non-frontal and non-upright faces in images. Multi-view face detection can be achieved by building a few detectors, each corresponding to a particular view. Face detection has been one of the main technologies for enabling natural interaction between human and computer. The performance of face recognition systems relies heavily on the face representation, which is naturally coupled with many types of face variation such as expression, view, and illumination. As face images are mostly observed in different views, the main challenge is to untangle the face identity and view representations. Large efforts have been devoted to extracting identity features by hand, such as Gabor proposed by Liu et al. [1], LBP (local binary pattern) by Ahonen et al. [2], and SIFT (scale-invariant feature transform) by D. Lowe [3]. The best practice of detecting the face obtains the above features on images of face landmarks at various scales and concatenates them into high-dimensional feature vectors, as explained by Simonyan et al. [4,5].

CNNs (convolutional neural networks) have taken the computer vision community by storm, substantially advancing the state of the art in most applications. A main factor in the success of CNN methods is the availability of large amounts of training data, as illustrated by Simonyan et al. [4] and Lin et al. [6]. CNNs are hierarchical neural networks whose convolutional layers alternate with subsampling layers, reminiscent of the simple and complex cells in the primary visual cortex. Even though neural networks can be adapted to computer vision tasks, it is beneficial to build prior knowledge into the network architecture to obtain good generalization performance. CNNs aim to exploit the spatial information between image pixels and are therefore based on discrete convolution.

According to Li et al. [7], face detection is formulated as a classification problem: separating face patterns from non-face patterns. From the perspective of statistics, this formulation has three difficulties: the dimensionality of the patterns is usually high; the number of possible non-face patterns is huge; and their distribution is irregular. Hence, it is difficult to model the probability distribution of face patterns, particularly multi-view face patterns, with a unimodal density function. Handling rotation in depth, and thus detecting faces across various views, is not simple. Most researchers have addressed this issue with view-based (multi-view) face detection, that is, by dividing the view sphere into a number of small segments and building one detector for each segment [1,8].

II. MULTI-VIEW FACE DETECTION

Face detection can be defined as the process of extracting faces from given images. Hence, the system should positively identify a certain region as a face. According to Yang et al. [10] and Erik Hjelmas et al. [11], face detection is the process of finding the regions of the input image where faces are present.
A lot of work has been done on detecting faces in still images, for frontal faces against plain as well as complex backgrounds [12]. With the advancement of information technology and computational power, computers have become more interactive with humans. This human computer interface (HCI) works mostly through traditional devices such as mouse, keyboard, and display. One of the most important media is the face and facial expression. Face detection is the first step in any face recognition system. Face detection is a well-studied problem in computer vision. Contemporary face detectors can effortlessly detect near-frontal faces. The difficulties in face detection come from two aspects: the large search space of possible face sizes and positions, and the large visual variation of human faces in a cluttered environment. The former imposes a requirement on time efficiency, while the latter requires a face detector to solve a binary classification problem accurately.

A. Deep Convolutional Neural Network (DCNN)

Convolutional neural networks (CNNs) are very popular in the field of computer vision. One of the reasons is the availability of large amounts of training data. Vaillant et al. [13] applied neural networks in 1994 to detect faces in uncluttered images. They designed a convolutional neural network that can be trained to detect the presence or absence of a face in a given image. The network scans the whole image at all possible locations. Rowley et al. [14] developed a neural network for upright frontal face detection. Later, in 1998 [15], the method was extended to pose-invariant face detection. Neural networks are used in many applications such as pattern recognition, character recognition, object recognition and autonomous robot driving. The main advantage of such a network in face recognition is the feasibility of training the system to capture the complex class of face patterns. Deep CNNs are used not only for face detection but also for face alignment [16]. To obtain the best performance from such a method, the number of nodes, the number of layers, the learning rates and so on have to be tuned carefully [17]. The drawback of the neural network approach appears when the number of classes grows. In template matching, several templates of the face taken from various viewpoints are used to characterize a single face. Such algorithms are not cost effective and cannot be carried out easily, as stated in [18].

Farfade et al. [19] examined multi-view face detection using a deep CNN. The developed framework does not need landmark or pose annotation and can detect faces in a wide range of orientations with a single model. DDFD (Deep Dense Face Detector) does not depend on modules common in deep learning methods for object detection, such as bounding-box regression and image segmentation. The developed method was compared with R-CNN and a few other face detection methods designed specifically for multi-view face detection, for example DPM-based and cascade-based methods.

Parkhi et al. [20] studied face recognition from either a single photograph or a set of faces tracked in a video. Two major contributions were made in that research. First, a procedure was developed that can assemble a large-scale dataset with little label noise while reducing the amount of manual annotation involved. One of the main ideas was to use weaker classifiers to rank the data given to the annotators. The procedure was designed for faces but is also appropriate for other object classes and fine-grained tasks. The second contribution was to demonstrate that a deep CNN, with appropriate training and without any additions, can produce results comparable to the state of the art. It can thus be concluded that a deep CNN with appropriate training and no additions performs well against its counterparts, and that the amount of manual annotation can be reduced.

Li et al. [21] analyzed a CNN cascade for face detection. The developed detector evaluates the input image at low resolution to reject non-face regions and carefully processes the difficult regions at higher resolution for accurate detection. Calibration nets are introduced into the cascade to accelerate detection and improve the quality of the bounding boxes. Sharing the benefits of CNNs, the developed face detector is robust to large variations in the visual image. In addition, on the public FDDB (Face Detection Data Set and Benchmark) the developed detector performs well compared with state-of-the-art methods. The detector is also fast: it achieves 14 frames per second for typical VGA (video graphics array) images on the CPU (central processing unit) and can be accelerated to 100 frames per second on the GPU (graphics processing unit).

Zhu et al. [21] analyzed multi-view perception (MVP), a deep model for learning face identity and view representations. The work developed a generative deep network, known as MVP, to mimic the multi-view perception ability of the primate brain. MVP can disentangle the identity and view representations of an input image and can also generate a full spectrum of views of the input image. The experiments demonstrated that the features of MVP achieve better face recognition performance than state-of-the-art counterparts. In addition, modeling the view representation factor as a continuous variable allows MVP to interpolate and predict images under viewpoints that are unobserved in the training data, which imitates the reasoning capacity of humans.
III. IMPLEMENTATION METHOD

In the implementation, face detection and image retrieval are achieved with direct visual matching technology. A probabilistic measure of the similarity between face images is computed on the basis of Bayesian analysis to achieve face detection. After this, a neural network is developed and trained to improve the outcome of the Bayesian analysis. Next, training and verification are used to test other images that contain similar face features.

Deep learning can be guided by supervisory signals [22]. The identification signal is

$$\mathrm{Ident}(f, t, \theta_{id}) = -\sum_{i=1}^{n} p_i \log \hat{p}_i = -\log \hat{p}_t \qquad (1)$$

where $f$ is the feature vector, $t$ represents the target class and $\theta_{id}$ denotes the softmax layer parameters. $p_i$ is the target probability distribution ($p_i = 0$ for all $i$ except $p_t = 1$), and $\hat{p}_i$ is the predicted probability distribution. The verification signal regularizes the features and reduces intra-personal variations, as given by Hadsell et al. [23]:

$$\mathrm{Verif}(f_i, f_j, y_{ij}, \theta_{ve}) = \begin{cases} \frac{1}{2}\,\lVert f_i - f_j \rVert_2^2 & \text{if } y_{ij} = 1 \\ \frac{1}{2}\,\max\left(0,\; m - \lVert f_i - f_j \rVert_2\right)^2 & \text{if } y_{ij} = -1 \end{cases} \qquad (2)$$

$$\mathrm{Verif}(f_i, f_j, y_{ij}, \theta_{ve}) = \frac{1}{2}\left(y_{ij} - \sigma(w d + b)\right)^2 \qquad (3)$$

A. CNN Structure

In the convolutional layers, $x^i$ is the $i$-th input map, $y^j$ is the $j$-th output map and $k^{ij}$ is the convolution kernel between them. Max-pooling is formulated as

$$y^{i}_{j,k} = \max_{0 \le m,\, n < s} \left\{ x^{i}_{j \cdot s + m,\; k \cdot s + n} \right\} \qquad (5)$$

where the output map pools over $s \times s$ non-overlapping regions.

$$y_j = \max\left(0,\; \sum_i x^1_i \cdot w^1_{i,j} + \sum_i x^2_i \cdot w^2_{i,j} + b_j\right) \qquad (6)$$

where $x^1, w^1, x^2, w^2$ represent the neurons and weights in the 3rd and 4th convolutional layers. The output of the ConvNet is an n-way softmax that predicts the probability distribution over n unique identities [24]:

$$y_i = \frac{\exp(y'_i)}{\sum_{j=1}^{n} \exp(y'_j)} \qquad (7)$$

DCNNs are mostly used for classification, and they are also used for face detection and recognition. Most such methods adopt a cascade strategy and take patches at various locations and scales as inputs. These operations, however, greatly increase the time and space complexity. Unlike conventional methods, a DCNN does not need shape initialization, so it avoids getting stuck in local optima caused by poor shape initialization.

For a $W \times H$ input image, the 12-net yields an output map of size

$$\left(\frac{W - 12}{4} + 1\right) \times \left(\frac{H - 12}{4} + 1\right) \qquad (8)$$

The calibration patterns use the offsets $x_n \in \{-0.19, 0, 0.19\}$ and $y_n \in \{-0.19, 0, 0.19\}$.

[Figure 2 diagram: input image (size 24×24), re-size to 12×12 in 12-net CNN, convolution layer, pooling layer, connected layer, labels]
Figure 2: 24-net CNN

48-calibration-net CNN

This is the last stage or sub-structure of the CNN. The number of calibration patterns is the same as for the 12-calibration-net, i.e. N = 45. For more accurate calibration, a pooling layer is used in this CNN sub-structure.
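The max-pooling, softmax, identification-signal and 12-net map-size formulas of this section can be illustrated with a small plain-Python sketch. It is not the implementation used in this work: the function names are hypothetical, maps are represented as nested lists, and integer (floor) division in the map-size formula is an assumption for image sizes that are not multiples of the stride.

```python
import math


def max_pool(x, s):
    """Non-overlapping s x s max-pooling of a 2-D map given as a list of rows."""
    rows, cols = len(x) // s, len(x[0]) // s
    return [
        [max(x[j * s + m][k * s + n] for m in range(s) for n in range(s))
         for k in range(cols)]
        for j in range(rows)
    ]


def softmax(scores):
    """n-way softmax; subtracting the maximum only improves numerical stability."""
    top = max(scores)
    exps = [math.exp(v - top) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ident_loss(scores, target):
    """Identification signal: minus the log of the predicted target-class probability."""
    return -math.log(softmax(scores)[target])


def net12_map_size(width, height, window=12, stride=4):
    """Output map size of a 12x12 window scanned with a 4-pixel stride.

    Floor division is an assumption of this sketch.
    """
    return ((width - window) // stride + 1, (height - window) // stride + 1)


if __name__ == "__main__":
    feature_map = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(max_pool(feature_map, 2))
    print(softmax([2.0, 1.0, 0.1]))
    print(ident_loss([2.0, 1.0, 0.1], target=0))
    print(net12_map_size(24, 24))
```

Keeping each formula in its own small function makes it easy to check one term at a time against the corresponding equation.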
B. Proposed Algorithm for DCNN
[Plot: vertical axis Detection Rate; horizontal axis from 0 to 12000]
Figure 4: DCNN training process

IV. RESULT AND DISCUSSION

Detecting faces across various views is very challenging from a fixed viewpoint, because the appearance of a face differs from view to view. One method for multi-view face detection is to develop a single detector that covers all face views. Alternatively, multi-view face detection can be achieved by building a few detectors, each corresponding to a particular view; at execution time, if one or more detectors give a positive result for a specific sample, a face is reported. Multi-view face detection is a challenging issue owing to variations in expression and illumination conditions. The performance of modern face detection solutions on multi-view face data sets is unsatisfactory.

The results of the study are assessed on the basis of the following parameters, both with and without multi-resolution in the proposed CNN structure.

Detection Rate: the rate at which the face of a person in an image is detected.

Number of False Detections: the number of faces that are not detected at all or are detected falsely.

With multi-resolution in the CNN, shown in Figure 5, the number of false detections stops growing (at 10000 falsely detected faces) and the face is detected, that is, the detection rate is achieved. However, without multi-resolution in the CNN, more faces are detected falsely than with multi-resolution, as shown in Figure 6. This research develops an algorithm for multi-view face detection using a deep convolutional neural network. The main ideas of the algorithm are to leverage the high capacity of the DCNN for classification and feature extraction to learn a single classifier for detecting faces from different views, and to reduce the computational complexity by simplifying the detector architecture.

[Plot: Detection Rate without multi resolution; vertical axis Detection Rate, 0 to 1.2; horizontal axis Number of false Detections, 0 to 14000]
Figure 6: Detection rate without multi-resolution in 24-net CNN

The FDDB (Face Detection Data Set and Benchmark) dataset [26] contains annotated faces. It is a large-scale face detection benchmark. It uses elliptical face annotations and defines two types of evaluation: discontinuous score evaluation and continuous score evaluation. To augment the data, we randomly flip the training examples. With the fine-tuned deep network, the final face detector can be obtained with either a sliding-window or a region-based approach. For this work, we chose the sliding-window approach since it is less complex and does not depend on additional modules such as selective search.
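The random-flip augmentation and the sliding-window scan described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the detector used in this work: the 12-pixel window, the 4-pixel stride, the 0.5 threshold and the `score_fn` callback (a stand-in for the fine-tuned network) are all assumptions introduced for the example.

```python
import random


def horizontal_flip(image):
    """Mirror a 2-D image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def augment(images, flip_prob=0.5, rng=None):
    """Randomly flip training examples; flip_prob is an assumed setting."""
    rng = rng or random.Random(0)
    return [horizontal_flip(img) if rng.random() < flip_prob else img
            for img in images]


def sliding_windows(width, height, window=12, stride=4):
    """Yield (x, y, w, h) boxes covering the image on a regular grid."""
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield (x, y, window, window)


def detect(image, score_fn, threshold=0.5, window=12, stride=4):
    """Score every window with score_fn and keep those at or above threshold."""
    height, width = len(image), len(image[0])
    hits = []
    for (x, y, w, h) in sliding_windows(width, height, window, stride):
        patch = [row[x:x + w] for row in image[y:y + h]]
        score = score_fn(patch)
        if score >= threshold:
            hits.append((x, y, w, h, score))
    return hits


def mean_brightness(patch):
    """Toy scorer used only to exercise the loop; not a face classifier."""
    return sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))


if __name__ == "__main__":
    image = [[1.0] * 24 for _ in range(24)]
    print(len(detect(image, mean_brightness)))
```

To detect faces of different sizes, such a scan would normally be repeated over rescaled copies of the image; that loop is omitted here to keep the sketch short.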
[Image panels (a) to (h)]
Figure 7. Detected faces at various angles and poses, covering left and right profile faces and frontal faces. Our detector gives results for images with varying poses and resolutions.
(a), (b), (c) and (d) are right profile faces; (e) is a frontal face; (f) is a left up profile face; (g) and (h) are right profile faces.
Figure 9: Detection results: (a) original image given for detection, (b) image at the preprocessing stage, (c) detected face position with CNN.
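The two evaluation quantities of Section IV, detection rate and number of false detections, could be computed along the following lines. This is a simplified sketch under stated assumptions: faces are axis-aligned boxes `(x1, y1, x2, y2)` rather than the ellipses FDDB uses, a detection counts as correct when its intersection-over-union with an unmatched annotated face exceeds 0.5, and only detections that match no annotated face are counted as false detections. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def evaluate(detections, ground_truth, iou_threshold=0.5):
    """Return (detection_rate, number_of_false_detections).

    Each annotated face can be matched by at most one detection; the
    0.5 overlap threshold is an assumption of this sketch.
    """
    matched = set()
    false_detections = 0
    for det in detections:
        best_idx, best_overlap = None, iou_threshold
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(det, gt)
            if overlap > best_overlap:
                best_idx, best_overlap = idx, overlap
        if best_idx is None:
            false_detections += 1
        else:
            matched.add(best_idx)
    rate = len(matched) / len(ground_truth) if ground_truth else 0.0
    return rate, false_detections


if __name__ == "__main__":
    faces = [(0, 0, 10, 10), (20, 20, 30, 30)]
    found = [(0, 0, 10, 10), (50, 50, 60, 60)]
    print(evaluate(found, faces))
```

Sweeping the detector's score threshold and calling `evaluate` at each setting would trace out a curve of detection rate against number of false detections.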