Age and Gender Detection Using CNN
ABSTRACT
In recent years, much effort has been devoted to estimating age and gender from
face images. It has been reported that age can be estimated accurately under
controlled conditions such as frontal faces, no facial expression, and static
lighting. However, it is not straightforward to achieve the same level of
accuracy in real-world environments because of the wide variation in camera
settings, pose, and lighting conditions. In this paper, we use a recently
proposed machine learning technique called covariate shift adaptation to reduce
the effect of the change in lighting conditions between the laboratory and the
working environment. Through experiments on real-world age estimation, we
demonstrate the usefulness of our proposed approach.
Keywords : Face Detection, Skin Colour Segmentation, Face Feature Extraction,
Feature Recognition, Fuzzy Rules.
Article Info
Volume 8, Issue 3
Page Number : 29-33
Publication Issue
May-June-2021
Article History
Accepted : 01 May 2021
Published : 05 May 2021
Copyright: © the author(s), publisher and licensee Technoscience Academy. This is an open-access article distributed under the
terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use,
distribution, and reproduction in any medium, provided the original work is properly cited
Prof. Jaydeep Patil et al Int J Sci Res Sci & Technol. May-June-2021, 8 (3) : 29-33
International Journal of Scientific Research in Science and Technology (www.ijsrst.com) | Volume 8 | Issue 3 30
II. METHODS AND MATERIAL

1. A CNN for age and gender estimation:
Gathering a large, labelled image training set for age and gender estimation
from social image repositories requires either access to personal information
on the subjects appearing in the images (their birth date and gender), which is
often private, or tedious and time-consuming manual labelling. Data-sets for
age and gender estimation from real-world social images are therefore
relatively limited in size and are presently no match in size for the much
larger image classification data-sets (e.g., the ImageNet dataset).
Overfitting is a common problem when machine learning-based methods are used on
such small image collections. This problem is exacerbated when considering deep
convolutional neural networks due to their huge numbers of model parameters.
Care must therefore be taken in order to avoid overfitting under such
circumstances.

Network architecture
Our proposed network architecture is used throughout our experiments for both
age and gender classification. It is illustrated in Figure 2. A more detailed,
schematic diagram of the entire network design is additionally provided in
Figure 1. The network comprises only three convolutional layers and two
fully-connected layers with a small number of neurons. This is small by
comparison with much larger architectures. Our choice of a smaller network
design is motivated both by our desire to reduce the risk of overfitting and by
the nature of the problems we are attempting to solve: age classification on
the Adience set requires distinguishing between eight classes; gender only two.
Compare this with, e.g., the ten thousand identity classes used to train a
network for face recognition.

Figure 1. Full schematic diagram of our network architecture. Please see the
text for more details.

All three color channels are processed directly by the network. Images are
first rescaled to 256 x 256 and a crop of 227 x 227 is fed to the network. The
three subsequent convolutional layers are then defined as follows.

1. 96 filters of size 3 x 7 x 7 pixels are applied to the input in the first
   convolutional layer, followed by a rectified linear operator (ReLU), a
   max-pooling layer taking the maximal value of 3 x 3 regions with two-pixel
   strides, and a local response normalization layer.
2. The 96 x 28 x 28 output of the previous layer is then processed by the
   second convolutional layer, containing 256 filters of size 96 x 5 x 5
   pixels. Again, this is followed by ReLU, a max-pooling layer, and a local
   response normalization layer with the same hyperparameters as before.
3. Finally, the third and last convolutional layer operates on the
   256 x 14 x 14 blobs by applying a set of 384 filters of size 256 x 3 x 3
   pixels, followed by ReLU and a max-pooling layer.

The following fully connected layers are then defined by:

4. A first fully connected layer that receives the output of the third
   convolutional layer and contains 512 neurons, followed by a ReLU and a
   dropout layer.
5. A second fully connected layer that receives the 512-dimensional output of
   the first fully connected layer and again contains 512 neurons, followed by
   a ReLU and a dropout layer.
6. A third, fully connected layer that maps to the final classes for age or
   gender.

Finally, the output of the last fully connected layer is fed to a soft-max
layer that assigns a probability for each class. The prediction itself is made
by taking the class with the maximal probability for the given test image.
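The layer sizes quoted for the network, and its final soft-max decision, can be checked with a little arithmetic. The sketch below is illustrative only and is not the authors' code. The text does not state the convolution strides or padding, nor how pooling rounds fractional sizes, so the values used here (stride 4 and no padding for the first convolution, padding 2 and 1 for the second and third, ceil rounding in pooling) are assumptions chosen because they reproduce the 96 x 28 x 28 and 256 x 14 x 14 sizes given in the text. The function names are hypothetical.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel=3, stride=2):
    """Spatial output size of 3 x 3, stride-2 max pooling (ceil rounding is an assumption)."""
    return math.ceil((size - kernel) / stride) + 1


def trace_shapes(crop=227):
    """Follow a crop x crop input through the three convolution + pooling blocks."""
    # (filters, kernel, stride, pad); stride and pad are assumptions, not stated in the text.
    blocks = [(96, 7, 4, 0), (256, 5, 1, 2), (384, 3, 1, 1)]
    size, shapes = crop, []
    for filters, kernel, stride, pad in blocks:
        # ReLU and local response normalization do not change the shape.
        size = pool_out(conv_out(size, kernel, stride, pad))
        shapes.append((filters, size, size))
    return shapes


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return (index, probability) of the most probable class."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


if __name__ == "__main__":
    print(trace_shapes())
    print(predict([0.1, 2.0, -1.0]))
```

Under these assumptions the three blocks produce 96 x 28 x 28, 256 x 14 x 14 and 384 x 7 x 7 outputs, so the first fully connected layer would receive 384 * 7 * 7 = 18816 values.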
Survey for this Paper

Sr. No | Title | Authors | Description
1 | Face Recognition using Haar Cascade Classifier | Varun Garg and Kritika Garg | Face detection with high efficiency using the OpenCV library's Haar cascade classifier.
2 | Real time Face detection and tracking using Haar classifier on SoC | RN Daschoudhry, Rajshree Tripathi | Real time face detection from high definition video through Raspberry Pi.
3 | Gender and Age recognition for video analytics Solutions (IEEE 2014 paper) | IEEE 2014 Conference | The research on CNN and ML ensures data extraction and complete mapping of nodes, by identifying, analysing and interpreting the suitable image of a human.
4 | Crowd counting with respect to age and gender by using faster CNN detection | | It makes use of the "Model Replication" framework, in which two sequences are matched at the mapping image pixel level and the matching results are grouped for a final decision. It further makes use of ranking of those groups based on pattern.

1. SSR-Net expects the input to be a tensor of size N x 64 x 64 x 3, where N is
   the number of faces, 64 x 64 is the height and width, and 3 stands for the
   RGB channels. The values of each channel in the tensor should be normalized
   to [0 ... 1]. Note the call to the function cv.normalize(blob[i,:,:,:],
   None, alpha=0, beta=255, norm_type=cv.NORM_MINMAX), which performs the
   required normalization.
2. Gil Levi and Tal Hassner's ConvNet expects the input to be a tensor of size
   N x 3 x 227 x 227, where N is the number of faces, 3 stands for the RGB
   channels, and 227 x 227 is the height and width. The values of each channel
   in the tensor should be zero-centred but should not be normalized. Note the
   scale factor 1.0 and the mean (78.4263377603, 87.7689143744, 114.895847746)
   passed to the function cv.dnn.blobFromImages.

IV. RESULTS AND DISCUSSION

While implementing this project we analysed different articles and models for
estimating human gender and age from an image.

We found that there are many good models with high accuracy that are
nevertheless too big and slow to compute.

On the other hand, there are some small models with lower accuracy that can be
used for real-time video processing.

We have successfully used two such models for real-time estimation of age and
gender using only an average CPU:

• SSR-Net by Tsun-Yi Yang, Yi-Hsuan Huang, Yen-Yu Lin, Pi-Cheng Hsiu, Yung-Yu Chuang.
• ConvNet by Gil Levi and Tal Hassner.
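The input conventions stated for SSR-Net and for the Levi and Hassner ConvNet can be illustrated with a small sketch. This is a NumPy-only approximation written for illustration, not the paper's code, which calls cv.normalize and cv.dnn.blobFromImages. It assumes the face crops have already been resized to 64 x 64 or 227 x 227, and that the mean values are listed in the same channel order as the image. The function names are hypothetical. The target range is a parameter because the text asks for values in [0 ... 1] while the quoted cv.normalize call passes beta=255.

```python
import numpy as np

# Per-channel mean quoted in the text for the Levi and Hassner ConvNet.
CONVNET_MEAN = (78.4263377603, 87.7689143744, 114.895847746)


def min_max_scale(face, lo=0.0, hi=1.0):
    """Linearly rescale an array so its minimum maps to lo and its maximum to hi."""
    face = face.astype(np.float32)
    span = float(face.max() - face.min())
    if span == 0.0:  # constant image: avoid division by zero
        return np.full_like(face, lo)
    return (face - face.min()) / span * (hi - lo) + lo


def ssrnet_blob(faces):
    """Stack 64 x 64 x 3 faces into an N x 64 x 64 x 3 tensor with values in [0, 1]."""
    return np.stack([min_max_scale(face) for face in faces])


def convnet_blob(faces, mean=CONVNET_MEAN):
    """Stack 227 x 227 x 3 faces into a zero-centred N x 3 x 227 x 227 tensor."""
    batch = np.stack(faces).astype(np.float32)    # N x H x W x 3
    batch -= np.asarray(mean, dtype=np.float32)   # subtract the mean, scale factor 1.0
    return batch.transpose(0, 3, 1, 2)            # channels-last to channels-first


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    small = [rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(2)]
    large = [rng.integers(0, 256, size=(227, 227, 3), dtype=np.uint8) for _ in range(2)]
    print(ssrnet_blob(small).shape, convnet_blob(large).shape)
```

The two functions differ in exactly the ways the text describes: SSR-Net keeps the channels last and rescales the values, while the ConvNet input moves the channels first and only subtracts the mean.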
[...] do not adequately reflect appearance variations common to the real-world
images on social websites and online repositories. Internet images, however,
are not simply more challenging: they are also abundant. The easy availability
of huge image collections provides modern machine learning-based systems with
effectively endless training data, though this data is not always suitably
labelled for supervised learning.

Taking an example from the related problem of face recognition, we explore how
well deep CNNs perform on these tasks using Internet data. We provide results
with a lean deep-learning architecture designed to avoid overfitting due to the
limited amount of labelled data. Our network is "shallow" compared to some of
the recent network architectures, thereby reducing the number of its parameters
and the chance of overfitting. We further inflate the size of the training data
by artificially adding cropped versions of the images in our training set. The
resulting system was tested on the Adience benchmark of unfiltered images and
shown to significantly outperform the recent state of the art. Two important
conclusions can be made from our results.

First, CNNs can be used to provide improved age and gender classification
results, even considering the much smaller size of contemporary unconstrained
image sets labelled for age and gender. Second, the simplicity of our model
implies that more elaborate systems using more training data may well be
capable of substantially improving results beyond those reported here.
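The augmentation mentioned in the conclusion, adding cropped versions of the training images, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the 227 x 227 crop size is taken from the network description, while the random crop positions, the number of crops per image, the fixed seed, and the function name are all hypothetical choices.

```python
import numpy as np


def random_crops(image, crop=227, count=5, seed=0):
    """Return `count` random crop x crop windows cut from an H x W x C image."""
    rng = np.random.default_rng(seed)
    height, width = image.shape[:2]
    if height < crop or width < crop:
        raise ValueError("image is smaller than the crop size")
    crops = []
    for _ in range(count):
        # Pick the top-left corner so the window stays inside the image.
        top = int(rng.integers(0, height - crop + 1))
        left = int(rng.integers(0, width - crop + 1))
        crops.append(image[top:top + crop, left:left + crop].copy())
    return crops


if __name__ == "__main__":
    rescaled = np.zeros((256, 256, 3), dtype=np.uint8)  # stand-in for a rescaled face image
    print([c.shape for c in random_crops(rescaled)])
```

Each 256 x 256 training image then contributes several slightly shifted 227 x 227 views, which enlarges the training set without any new labelling.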