08635054
Pournami S. Chandran1, Byju N B2, Deepak R U3, Nishakumari K N4, Devanand P5, Sasi P M6
The “Tuanyuan” (“reunion” in Chinese) app developed by Alibaba Group Holding Ltd. helped Chinese authorities recover hundreds of missing children [6]. The app has allowed police officers to share information and work together with the public.

III. WORK FLOW OF FACE RECOGNITION
Here we propose a methodology for missing child identification which combines facial feature extraction based on deep learning with matching based on a support vector machine. The proposed system utilizes face recognition for missing child identification, to help authorities and parents in missing child investigations. The architecture of the proposed framework is given below.

[Figure: architecture of the proposed framework. Blocks: Portal Login; Upload Photo; Find Matching (Automatic); Get Child Details; Alert Message (Matching Found); Statistical Reports; Mobile App/Portal (Public); Upload photo of suspicious children with details (Place, Date and Time, Landmarks, Remarks)]

... for any matching with the database at any time using the proposed system.

In the following sections the paper details the workflow of the child matching methodology. The flow chart of the automatic child face identification methodology is shown in Fig. 2.

[Fig. 2: flow chart. Input Child Images -> Face Images -> Preprocessing (rescaling into 224x224) -> VGG-Face feature extraction (face descriptors) -> Multi class SVM]
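The last two stages of the workflow in Fig. 2 (face descriptors, then a multi class SVM) can be illustrated with a minimal Python sketch. It is not the authors' MATLAB/MatConvNet implementation. The three-dimensional vectors are toy stand-ins for the 4096-dimensional VGG-Face descriptors, and the labels, weights, and biases are hypothetical values standing in for a trained one-versus-rest linear SVM. The sketch shows only the L2 normalization of a descriptor (Section VI) and the decision rule of a one-versus-rest linear classifier (Section VII), which picks the class whose linear score is largest.

```python
import math


def l2_normalize(descriptor):
    """Divide each component by the vector's L2 norm.

    A zero vector is returned unchanged to avoid division by zero.
    """
    norm = math.sqrt(sum(x * x for x in descriptor))
    if norm == 0.0:
        return list(descriptor)
    return [x / norm for x in descriptor]


def predict_one_vs_rest(descriptor, classifiers):
    """Return (best_label, scores) for a one-versus-rest linear classifier.

    `classifiers` maps a label to a (weights, bias) pair. In a real system
    each pair would come from training one linear SVM per child; here the
    values are supplied by hand (assumption, for illustration only).
    """
    x = l2_normalize(descriptor)
    scores = {
        label: sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias
        for label, (weights, bias) in classifiers.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical per-child linear scorers (toy 3-D stand-ins for 4096-D).
toy_classifiers = {
    "child_A": ([1.0, 0.0, 0.0], 0.0),
    "child_B": ([0.0, 1.0, 0.0], 0.0),
    "child_C": ([0.0, 0.0, 1.0], 0.0),
}

# Toy descriptor for an uploaded photo; its largest component is the second.
query = [0.3, 4.0, 0.5]
label, scores = predict_one_vs_rest(query, toy_classifiers)
print(label)  # prints child_B
```

Normalizing before scoring keeps every descriptor on the unit sphere, so the linear scores depend on the direction of the descriptor rather than on its magnitude.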
2018 IEEE Recent Advances in Intelligent Computational Systems (RAICS) | December 06 - 08, 2018 | Trivandrum
basically defines a set of filter weights which are updated during network training.

A ReLU layer following each convolutional layer introduces nonlinearity into the system. This layer applies the function f(x) = max(0, x) to its input data.

The pooling layers merge similar features into one by downsampling with a suitable window size. The basic idea behind the pooling layer is that the position of a feature relative to other features is more important than its exact location. Pooling reduces the dimensions of the feature maps and the number of network parameters.

The final layer, a fully connected layer, produces one output per class. There are several fully connected layers, which convert the 2D feature maps into a 1D feature vector for further feature representation.

A. VGG-Face CNN descriptor
A very deep CNN called the VGG-Face network [8] is used for face recognition; its architecture is given in full detail in Fig. 3. The CNN architecture comprises 11 blocks, each containing a linear operator followed by one or more non-linearities such as ReLU and max pooling. The first eight blocks are said to be convolutional, as the linear operator is a bank of linear filters (linear convolution). The network uses filters of size 3x3 with a stride and pad of 1 throughout. All the convolution layers are followed by a rectification layer (ReLU). The max pooling layers use only a 2x2 window with stride 2. The last three blocks are fully connected (FC) layers. They are the same as a convolutional layer, except that the size of the filters matches the size of the input data, so that each filter provides representative data from the entire image. The outputs of the first two FC layers are 4096-dimensional, and the last FC layer has 2622 dimensions followed by an L-dimensional metric embedding. Optimization is done by stochastic gradient descent using mini-batches of 64 samples and a momentum coefficient of 0.9.

Fig. 3. VGG-Face network architecture

V. PREPROCESSING
Preprocessing a raw input image for face recognition involves acquiring the face region and standardizing the images in a format compatible with the CNN architecture employed. Each CNN has a different input size requirement. The photographs of missing children, acquired by a digital camera or mobile phone, are categorized into separate cases to create the database of the face recognition system. The face region in each image is identified and cropped to obtain the input face images. The cropped face images are resized to 224x224 because the VGG-Face network can process only RGB images of this particular size. The input to the deep network is a fixed-size image from which the mean face image, computed from all the training set images, has been subtracted.

VI. EXTRACTION OF FACIAL FEATURES
VGG-Face is trained to recognize 2622 identities, and other classes cannot be identified with it directly. However, the activation vectors extracted from the VGG-Face architecture can be used as feature representations to classify each child category. The last classification layer is removed, and the 4096-dimensional features are extracted from the first fully connected layer. The resulting feature vector is normalized by dividing each component by the L2 norm of this 4096-dimensional vector. Thus the pre-trained VGG-Face CNN is made to perform as an automatic facial feature extractor for training the classifier.

VII. MULTI CLASS SVM CLASSIFIER
Each face image corresponds to a child, and child face recognition is treated as an image category classification problem. The task is to classify an input image uploaded by the public into one of the given categories based on the image representation. A CNN architecture basically consists of computational layers for feature extraction and a classifier layer at the final stage. The VGG-Face CNN model employs the softmax activation function for labeled class prediction, suggesting the class each image belongs to. The softmax in the CNN is replaced with a multi class SVM trained on the feature vectors extracted from each image. A one-versus-rest linear SVM classifier is used and is trained on the dataset.

VIII. RESULTS AND DISCUSSIONS
The face identification algorithm is implemented on the MATLAB 2018a platform. The experiments are carried out on the Microsoft Windows 7 64-bit operating system with an Intel Core i7 3.60 GHz processor and 32 GB of RAM. Dealing with CNN architectures requires additional processing capability. Use of a GPU is recommended for training the models, and an Nvidia GeForce TitanX 12 GB graphics card is used.

The user-defined database includes 846 child face images covering 43 unique child cases. The training and test sets are prepared by splitting the database images: 80% of the images from each child category are selected for training and 20% for testing, resulting in 677 training set images and 169 test set images. The training and validation sets consist of images of each child at an earlier age, and testing is done with images of the children after an age gap, to evaluate the system under all conditions.

The CNN implementation is based on the MatConvNet package [9], which provides deep integration of CNN building blocks in the MATLAB environment. The pre-trained VGG-Face CNN is also provided by MatConvNet. For the experiments here, MatConvNet version 1.0-beta25 is downloaded and used.

The training set images are preprocessed to the size specified by the CNN architecture before being passed to the CNN model. The face region is cropped within a rectangular region from every image of the acquired input database. The images fed to VGG-Face are brought to a fixed size by rescaling to 224x224. The activations produced for the input image by the first fully connected layer of the VGG-Face network architecture are taken as the CNN feature descriptor. The
normalized feature vectors, each of length 4096, are used to train the SVM classifier, which classifies the face image and recognizes the child.

Fig. 4. GUI for child identification showing an input image and matched output image in the database

To assess the robustness of the deep face recognition architecture against variations in image quality, artificially degraded images are created. Images obtained by changing noise level, brightness, contrast, lighting conditions, obstructions, blur, aspect ratio and face positions are used for testing the child identification system.

Face identification accuracy is computed as the ratio of correctly identified face images to the total number of child face images in the test set.

\[ \text{Accuracy} = \frac{\text{number of correctly identified face images}}{\text{total number of child face images in the test set}} \tag{1} \]

IX. CONCLUSION
A missing child identification system is proposed, which combines a CNN based deep learning approach for feature extraction with a support vector machine classifier for classification of the different child categories. The system is evaluated with the deep learning model trained on feature representations of children's faces. By discarding the softmax of the VGG-Face model and extracting CNN image features to train a multi class SVM, it was possible to achieve superior performance. The performance of the proposed system is tested using photographs of children under different lighting conditions and noise levels, and also images of children at different ages. The classification achieved an accuracy of 99.41%, which shows that the proposed face recognition methodology could be used for reliable missing children identification.

REFERENCES
[1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning", Nature, 521(7553):436–444, 2015.
[2] O. Deniz, G. Bueno, J. Salido, and F. D. la Torre, "Face recognition using histograms of oriented gradients", Pattern Recognition Letters, 32(12):1598–1603, 2011.
[3] C. Geng and X. Jiang, "Face recognition using sift features", IEEE International Conference on Image Processing (ICIP), 2009.
[4] R. Satle, V. Poojary, J. Abraham, and S. Wakode, "Missing child identification using face recognition system", International Journal of Advanced Engineering and Innovative Technology (IJAEIT), Volume 3 Issue 1, July - August 2016.
[5] https://en.wikipedia.org/wiki/FindFace
[6] https://www.reuters.com/article/us-china-trafficking-apps/mobile-app-helps-china-recover-hundreds-of-missing-children-idUSKBN15J0GU
[7] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition", International Conference on Learning Representations (ICLR), April 2015.
[8] O. M. Parkhi, A. Vedaldi, and A. Zisserman, "Deep Face Recognition", British Machine Vision Conference, vol. 1, no. 3, pp. 1-12, 2015.
[9] A. Vedaldi and K. Lenc, "MatConvNet: Convolutional Neural Networks for MATLAB", ACM International Conference on Multimedia, Brisbane, October 2015.