
Crowd Detection and Analysis for Surveillance Videos using Deep Learning


Aman Ahmed, Student, Department of Computer Science and Engineering, G H Raisoni College of Engineering, Nagpur, India
Prateek Bansal, Student, Department of Electronics and Telecommunication Engineering, G H Raisoni College of Engineering, Nagpur, India
Atiya Khan, Faculty, Department of Computer Science and Engineering, G H Raisoni College of Engineering, Nagpur, India
Neha Purohit, Faculty, Department of Computer Science and Engineering, G H Raisoni College of Engineering, Nagpur, India

2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC) | 978-1-6654-2867-5/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICESC51422.2021.9532683

Abstract—Crowd identification and analysis has drawn a lot of attention recently, owing to a wide variety of video surveillance applications. We present a detailed review of crowd analysis and management, focusing on state-of-the-art methods for both controlled and unconstrained conditions, and illustrate both the advantages and the disadvantages of these methods. Mass or crowd gatherings can be seen at many places, such as airports, sports stadiums, and various religious, educational, and entertainment-related events. When tens of thousands of people gather in a limited space, a tragedy is bound to happen sooner or later. Automated video surveillance has become the need of the day and supports the analysis and management of data on a massive scale. It is very important to identify the presence of a crowd and detect the number of people in the gathering. This can prove very useful for detecting a sudden crowd build-up and thus avoiding riots. Moreover, it can also be very useful in the COVID-19 pandemic situation to avoid people gathering at a place. This paper presents a system that detects the presence of a crowd by counting unique people and then performs crowd analysis, in which the gender and age of the people in the crowd are detected.

Keywords—Deep Learning, Crowd Density Estimation, CNN, MobileNets, Neural Networks

I. INTRODUCTION

With the expanding population and the problems arising from crowded scenarios in cities, there is a necessity for crowd detection. Automatic crowd detection involves assessing the number of individuals in a video or an image. Further, the crowd density can be estimated from images of the crowded scene extracted from the surveillance video. Crowd Density Estimation (CDE) is a challenging problem that can assist in solving various real-life problems. CDE is essential for disaster management and for enforcing a maximum people count during the COVID-19 pandemic.

Moreover, in the recent past, analysis of facial attributes has attained much credit in the field of computer vision. Various features of the human face, such as emotions, age, gender, and ethnicity, can be used for categorization. Security, video monitoring, electronic customer relationship management, biometrics, cosmetology, and forensic art are only a few of the real-world applications where age and gender classification is especially helpful. However, the results obtained so far are not up to the mark, and several age and gender classification problems remain persistent complications. Even with the advances made in the computer vision community, where modern techniques continuously push the state of the art, age and gender predictions on raw, real-life faces still fail to meet the demands of commercial and real-world applications. Over many years, a lot of effort has gone into solving this classification problem. Many of the custom methods perform poorly when it comes to determining the age and gender of unconstrained, in-the-wild pictures. These traditional, tailor-made methods rely on differences in facial-feature attributes and face descriptors, which cannot cope with the unpredictable variations encountered in such difficult unconstrained imaging conditions: noise, posture, and lighting can all affect the ability of manually engineered computer vision methods to accurately classify the age and gender in an image.

Deep learning-based approaches have recently shown promising results in the age and gender classification of unfiltered face images. Building on existing work
in age and gender prediction, and on the advances in deep learning and CNNs, we propose a deep learning model based on the VGG-16 architecture that predicts the age group and gender of unfiltered, in-the-wild facial images produced by a crowd detection model built on an object tracking algorithm. The tracker works by calculating the Euclidean distance between existing object centroids and new object centroids in subsequent frames of a video. We build an object tracker for each detected object and follow it as it moves around the frame. We keep tracking until we hit the Nth frame, then rerun the object detector, and the complete process repeats.
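The detect-and-track cycle described above can be summarized in a short skeleton. The code below is only an illustration, not the implementation used in this work: `detect_objects` and `match_by_centroid_distance` are hypothetical placeholders for the MobileNet-SSD detector and the Euclidean-distance association detailed in Section IV, and the re-detection interval N = 30 is an assumed value.

```python
# Illustrative skeleton of the detect/track cycle. The helpers are assumed:
# detect_objects(frame) returns bounding boxes, and
# match_by_centroid_distance() associates new centroids with existing IDs.
N = 30  # assumed: rerun the detector every Nth frame


def track_unique_people(frames, detect_objects, match_by_centroid_distance):
    tracked = {}   # object ID -> last known centroid
    next_id = 0
    for i, frame in enumerate(frames):
        if i % N == 0:
            # Every Nth frame: rerun the object detector to refresh the boxes.
            detections = detect_objects(frame)
            tracked, next_id = match_by_centroid_distance(tracked, detections, next_id)
        # Between detections, the per-object trackers keep following their targets.
    return next_id  # number of unique IDs issued, i.e. the people count
```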
Face images are often used in gender and age detection
The remaining part of the paper is organized as follows. We present related work on crowd identification and age and gender classification in Section 2. The background of the models used in the method is presented in Section 3. The proposed method is described in Section 4. Section 5 presents the datasets used for training and experimentation. In Section 6 the results are reported, followed by conclusions in Section 7.

II. LITERATURE REVIEW

In this section, we present a review of various crowd detection approaches, followed by a review of gender and age prediction methods.

A. Crowd Detection

There are various approaches to person detection. Most crowd detection methods proceed by finding individual persons and counting them, which involves detecting, recognizing, and tracking people that are clearly visible. The methods fall into three types: clustering-based, regression-based, and detection-based. Clustering-based methods [1-2] detect different objects and cluster their trajectories to count them. Regression-based methods [3-4] first extract low-level information such as foreground, edge, and texture features; scene-level information is then extracted from local and global properties such as the Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP), etc., and finally a regression function is used for counting. Detection-based methods involve person detection, target localization, tracking, and trajectory classification. A comprehensive study on person detection can be found in [5]. In this section we briefly review detection-based methods, as our approach is detection-based.

The early methods are based on low-level features. In [6], both appearance and motion features are utilized for pedestrian detection: a Haar filter, an absolute-difference Haar filter, and a shifted-difference filter are used to detect objects with motion, and eight pedestrian detectors are trained using the AdaBoost algorithm. Salim et al. [7] presented a method that can detect all the people passing through the field of view of the camera, with an average efficiency of 83.14%; it uses the Kalman filter [8] to predict the tracked person's location in each frame, and the PETS2009 dataset was utilized for experimentation. Frame differencing is used in [9] to segment the crowd into people and then count them: features are described for individual patterns, and counting is performed. Various methods [10, 11] have also been proposed that utilize a Kinect camera to capture depth information along with the low-level features. The most recent methods rely on deep learning. A crowd counting model is presented in [12] that uses a compact convolutional neural network to save computational resources while achieving great real-time speed, superior to existing lightweight models. [13] proposes a deep model based on Convolutional Neural Networks (CNN) and Spatio-Temporal Context (STC): the CNN model is used to detect people, and STC is used to track moving people's heads. [14] describes another supervised approach that uses spatio-temporal features and their fusion.

B. Gender and Age Prediction

Face images are often used in gender and age detection methods. After extracting facial features, the images are classified into age and gender groups using classification and regression methods. The classification methods in [15-16] used Support Vector Machines. Various classification methods, such as Support Vector Machines, Radial Basis Function networks, and classical discriminant methods, were compared in [17], in which SVMs achieved an acceptable error rate while storing only 20 per cent of the training set. In [18], two competing HyperBF networks, one for male and one for female, were trained on geometrical features. Standard regression methods for age and gender classification include linear regression [19], Support Vector Regression (SVR) [20], and Partial Least Squares (PLS) [21].

In [22], a gender and age prediction system based on face images is proposed. First, the quality of the face image is improved using a histogram equalization method called Brightness Preserving Dynamic Fuzzy Histogram Equalization (BPDFHE). For detection of a face in the given image, image segmentation and image filling are applied, and eigenfaces are used for age estimation. More recently, the Weber Local Descriptor was used in [23] for gender recognition and demonstrated outstanding performance on the FERET benchmark [24]; the best results in [23] were obtained with a block size of 12×12 and T, M, and S values of 8, 4, and 4, respectively. In [25], features like intensity, shape, and texture were combined using mutual information, which again resulted in almost perfect results on the FERET benchmark.

In recent years, numerous methods have been introduced to solve classification problems leveraging deep learning techniques. However, the early neural network methods used small datasets. In [26], a neural network was trained on a minimal collection of 90 images (45 male and 45 female), with an error rate of 8.1%. Ranjan et al. [27] presented a model utilizing a CNN for gender recognition and age estimation; it is an end-to-end network whose lower CNN layers share parameters. In [28], a robust estimation solution (CNN2ELM) combining a CNN and an Extreme Learning Machine (ELM) is proposed.

For age classification, the authors of [29] highlighted the importance of deep neural networks by showing how adding or removing a layer could change the output of the model. In [30], measurements of the face were utilized for age detection. It was also shown that using readily available dense building blocks to approximate the expected optimal sparse structure can be a viable method for improving neural networks [31].
III. BACKGROUND

A. Single Shot Detectors and MobileNet

Faster R-CNN (Girshick et al.) [32], You Only Look Once (YOLO) (Redmon and Farhadi) [33], and Single Shot Detectors (SSDs) (Liu et al.) [34] are the most common methods for object detection. The R-CNN technique is based on region proposals and is difficult to train. YOLO is the fastest algorithm, capable of processing between 40 and 90 frames per second. SSDs, on the other hand, were developed by Google as a balance between the two: depending on which network version is used, a higher FPS throughput can be achieved, and SSDs are more precise than YOLO.

For image classification and mobile vision, MobileNet is the CNN architecture model used for classification. Figure 1 depicts the structure of MobileNet, which was trained on the Common Objects in Context (COCO) dataset. MobileNets differ from conventional CNNs in that they construct lightweight deep neural networks using depthwise separable convolutions.

A depthwise separable convolution, introduced in [33], is a combination of a depthwise convolution and a pointwise convolution. It splits a standard convolution into two stages:

1. a 3×3 depthwise convolution,
2. followed by a 1×1 pointwise convolution.

This reduces the number of parameters in the network.
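To make the parameter saving concrete, the sketch below builds one standard 3×3 convolution and one depthwise separable block with the same input and output channels and prints their parameter counts. The use of tf.keras and the channel sizes are assumptions for illustration only; the paper does not specify an implementation framework.

```python
import tensorflow as tf
from tensorflow.keras import layers

CH_IN, CH_OUT = 32, 64                       # assumed channel counts
INPUT_SHAPE = (None, 224, 224, CH_IN)

# Standard convolution: one 3x3 kernel per (input channel, output channel) pair.
standard = layers.Conv2D(CH_OUT, kernel_size=3, padding="same", use_bias=False)
standard.build(INPUT_SHAPE)

# Depthwise separable convolution, as used in MobileNet:
#   stage 1 - 3x3 depthwise convolution (one filter per input channel)
#   stage 2 - 1x1 pointwise convolution (mixes the channels)
depthwise = layers.DepthwiseConv2D(kernel_size=3, padding="same", use_bias=False)
pointwise = layers.Conv2D(CH_OUT, kernel_size=1, use_bias=False)
depthwise.build(INPUT_SHAPE)
pointwise.build(INPUT_SHAPE)

print("standard conv parameters: ", standard.count_params())    # 3*3*32*64 = 18432
print("separable conv parameters:",
      depthwise.count_params() + pointwise.count_params())      # 288 + 2048 = 2336
```

For these channel counts the separable version uses roughly one eighth of the parameters, which is the saving MobileNet exploits.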
In comparison to other existing models, the MobileNet architecture needs very little computing power to run or to apply transfer learning, which makes it a good fit for resource-constrained devices.

Figure 1. Architecture of MobileNet

B. VGG-16 Model

VGG-16 [35] is a pre-trained CNN proposed by Simonyan and Zisserman. VGG Net is trained on the ImageNet dataset and is capable of performing the classification task with good accuracy. ImageNet contains images from 1000 classes divided into three sets: 1.3 million training images, 100,000 test images, and 50,000 validation images. There are 16 weight layers in VGG-16.

Figure 2 shows the VGG-16 architecture. The convolution layers, each with a non-linear ReLU (rectified linear unit) activation, are represented by the blue rectangles; 13 convolution layers and 5 max-pooling layers are included. The network's fully connected layers are represented by three green rectangles.

Figure 2. VGG-16 Architecture (from [35])

RGB images with an input size of 224 x 224 x 3 are accepted by the input layer. The images are passed through the convolution layers, each of which has a small receptive field of 3x3 and a stride of 1, so the resolution remains unchanged after convolution. Max-pooling uses a 2x2 window with a stride of 2, so the pooling windows are non-overlapping. Moreover, a max-pooling layer does not follow every convolution layer: in several places a few convolution layers follow one another directly, without a max-pooling layer in between. The output layer has 1000 channels, one for each image class in the dataset, and the first two fully connected layers have 4096 channels each. The activation function of the hidden layers is ReLU.

IV. PROPOSED APPROACH

A crowd detection approach is presented in this section. The people detected in the crowd are also analyzed by predicting their gender and age.

The overview of the proposed approach is presented in Figure 3. A video is accepted as input, and frames are extracted from it by uniform sampling: every 30th frame is considered. These frames are passed as input to the person detection and tracking module, which detects the presence of persons in a frame and extracts the image regions containing them. It also counts the unique people across the given frames, which yields the number of people in the presented video. Thereby, one or more sub-images are created by extracting the regions of interest from a single frame. All the sub-images containing unique people are passed on to the recognition module, where they are analyzed to predict the gender and age of each person. The proposed approach is thus divided into two parts, crowd detection and crowd analysis, which we discuss next.

Figure 3. Overview of the Proposed Approach
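The uniform sampling step can be sketched as follows. The snippet assumes OpenCV and a local video file path, neither of which is prescribed by the paper; it simply keeps every 30th frame for the downstream modules.

```python
import cv2


def sample_frames(video_path, step=30):
    """Return every `step`-th frame of the video (uniform sampling)."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:              # end of stream or read failure
            break
        if idx % step == 0:     # keep only every 30th frame
            frames.append(frame)
        idx += 1
    cap.release()
    return frames


# Hypothetical usage: frames = sample_frames("surveillance_clip.mp4")
```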

A. Crowd Detection and Crowd Density Estimation (CDE)

For detection of the crowd, person detection is performed and the detected persons are tracked. The presence of humans is found using MobileNet SSDs, and tracking is performed using the Unique Person Detection algorithm, which also displays the count of detected people.

For efficient detection of humans, we combine MobileNets and Single Shot Detectors. Specifically, MobileNet + SSD is used along with OpenCV 3.3's DNN (Deep Neural Network) module to detect humans in images.
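The detection step can be sketched with OpenCV's DNN module as shown below. Only the use of MobileNet + SSD with OpenCV's DNN module is stated above; the model file names, the input preprocessing constants, and the confidence threshold are assumptions based on the widely distributed MobileNet-SSD Caffe model trained on PASCAL VOC, in which class index 15 corresponds to 'person'.

```python
import cv2
import numpy as np

# Assumed model files: the publicly available MobileNet-SSD Caffe release.
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")
PERSON_CLASS_ID = 15      # 'person' in this model's PASCAL VOC label set
CONF_THRESHOLD = 0.4      # assumed confidence threshold


def detect_people(frame):
    """Return a list of (startX, startY, endX, endY) person bounding boxes."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()            # shape (1, 1, N, 7)
    boxes = []
    for i in range(detections.shape[2]):
        class_id = int(detections[0, 0, i, 1])
        confidence = detections[0, 0, i, 2]
        if class_id == PERSON_CLASS_ID and confidence > CONF_THRESHOLD:
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            boxes.append(tuple(box.astype(int)))
    return boxes
```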
The Crowd Density Estimation (CDE) algorithm used in this paper is inspired by the centroid tracking algorithm [13] and works as follows. Figure 4 gives a diagrammatic explanation, and the procedure is listed as Algorithm 1. First, the bounding box with coordinates (x, y) of each detected human in each frame is found. The bounding boxes can be generated using common object detectors such as Haar cascades, R-CNNs, color thresholding, etc. After detection of the bounding boxes, the centroid of each bounding box is computed from its (x, y) coordinates, and each bounding box is allotted a unique ID.

Figure 4. Unique Person Detection Procedure: (a) compute centroids and find bounding box coordinates; (b) calculate the Euclidean distance between new bounding boxes and existing objects; (c) update the existing object coordinates; (d) allot unique IDs to objects detected in the video frames.

However, if a new unique ID were allotted to the objects in every new incoming frame, it would defeat the purpose of object tracking. To alleviate this, we relate the centroid of each new object to that of an already existing human proposal and calculate the distance between them.

A list of tracked humans (TH) is maintained to detect the presence of unique people. To detect whether any new humans are present in the current frame compared to the previous one, the number of objects is counted. If the count of human proposals in the new frame is higher than in the previous frame, the newly detected objects are added to the list TH and allotted unique IDs. The bounding box and centroid of each new proposal are computed, and the path of the human proposal is then determined by finding the minimum Euclidean distance to the existing centroids.

Furthermore, for any given video it is important to consider that a person will move out of the field of view after some time. To address this, we deregister the object by removing its unique ID: when a human proposal has no match with an existing object for a certain number of frames, say N frames, we consider the proposal lost and assume it has moved out of view, so it is deregistered. This assists in counting unique people and in building the crowd counter.

Algorithm 1. Crowd Density Estimation (CDE) Algorithm
Input: Video frames
Output: Crowd count
Step 1: Detect bounding boxes and compute their centroids.
Step 2: Compute the Euclidean distance between two human proposals:
d = √((x2 − x1)² + (y2 − y1)²),
where (x1, y1) and (x2, y2) are the centroids of the two bounding boxes for which the distance is computed.
Step 3: Register new human proposals.
(a) Create a list of tracked humans called TH.
(b) Consider consecutive frames and check whether any new human proposal is detected. If yes, add it to TH and assign it a new unique ID.
Step 4: Deregister the human proposals that have moved out of the field of view.
Step 5: Count all the unique human proposals and output the crowd count.
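A compact sketch of the registration, matching, and deregistration logic of Algorithm 1 is given below. It is an illustrative implementation of the steps described above rather than the authors' code; the greedy nearest-centroid matching, the use of scipy for the distance matrix, and the value of N (maximum missed frames) are assumptions.

```python
import numpy as np
from scipy.spatial import distance


class CentroidTracker:
    """Sketch of the CDE logic: register, match by Euclidean distance,
    deregister after N missed frames, and count unique IDs."""

    def __init__(self, max_missed=50):           # N, assumed value
        self.next_id = 0
        self.objects = {}                         # ID -> centroid (x, y)
        self.missed = {}                          # ID -> consecutive missed frames
        self.max_missed = max_missed

    def _register(self, centroid):                # Step 3: new human proposal
        self.objects[self.next_id] = centroid
        self.missed[self.next_id] = 0
        self.next_id += 1

    def _deregister(self, object_id):             # Step 4: left the field of view
        del self.objects[object_id]
        del self.missed[object_id]

    def update(self, boxes):
        # Step 1: bounding boxes -> centroids.
        centroids = [((x1 + x2) // 2, (y1 + y2) // 2) for (x1, y1, x2, y2) in boxes]
        if not centroids:
            for object_id in list(self.missed):
                self.missed[object_id] += 1
                if self.missed[object_id] > self.max_missed:
                    self._deregister(object_id)
            return self.objects
        if not self.objects:
            for c in centroids:
                self._register(c)
            return self.objects

        ids = list(self.objects)
        # Step 2: Euclidean distances between existing and new centroids.
        d = distance.cdist(np.array([self.objects[i] for i in ids]),
                           np.array(centroids))
        used_rows, used_cols = set(), set()
        for row in d.min(axis=1).argsort():       # greedily match closest pairs first
            col = int(d[row].argmin())
            if row in used_rows or col in used_cols:
                continue
            self.objects[ids[row]] = centroids[col]
            self.missed[ids[row]] = 0
            used_rows.add(row)
            used_cols.add(col)
        for row in set(range(len(ids))) - used_rows:
            self.missed[ids[row]] += 1
            if self.missed[ids[row]] > self.max_missed:
                self._deregister(ids[row])
        for col in set(range(len(centroids))) - used_cols:
            self._register(centroids[col])
        return self.objects

    def crowd_count(self):                        # Step 5: unique proposals seen
        return self.next_id
```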

B. Crowd Analysis

For analysis of the unique people detected by the previous module, their age and gender are predicted. We treat age and gender prediction as classification tasks. For both tasks we use a pre-trained CNN model for feature extraction, as done in [23]. A two-level CNN architecture is leveraged to perform age and gender prediction. First, the features are extracted from the

pre-trained VGG-16 [35], and then classification is performed. We classify the humans into three age groups: 1-30, 30-60, and 60+. The age prediction model has four dense layers after feature extraction from the VGG-16, and the final Softmax layer has three nodes, one for each of the three classes. The model summary is shown in Figure 5. The test accuracy of the age prediction model is 69%.

Figure 5. Age prediction model summary
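A minimal Keras sketch of the described classifier is shown below: VGG-16 is used as a frozen feature extractor and is followed by four dense layers ending in a Softmax. Only the number of dense layers and the output sizes (three age groups, two gender classes) are stated in the paper; the intermediate layer widths, global average pooling, optimizer, and loss are assumptions made for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_head(num_classes):
    """VGG-16 feature extractor followed by four dense layers (widths assumed)."""
    base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
    base.trainable = False                # use VGG-16 purely for feature extraction
    return models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),   # fourth dense layer: output
    ])


age_model = build_head(3)      # age groups: 1-30, 30-60, 60+
gender_model = build_head(2)   # classes: "Male" and "Female"

for model in (age_model, gender_model):
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
age_model.summary()
```

The gender network differs only in its two-node output layer, matching the description in the next paragraph.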


The same person images that are passed to the age prediction module are also passed through the gender prediction module. A test accuracy of 90% was achieved by the model shown in Figure 6. The gender prediction network's output layer is a Softmax with two nodes representing the two classes, "Male" and "Female".

Figure 6. Gender prediction model summary

V. DATASETS

The dataset for age prediction, shown in Table 1, was taken from Kaggle and split into 80% training and 20% validation; testing was done manually. It contained JPG and PNG images for three age categories, equally divided into directories, and the data was cleaned manually. The total number of images in the dataset and their distribution over labels can be seen in Table 1. The dataset was divided into three broad age categories, 1-30, 30-60, and 60+, to obtain better accuracy.

Table 1. Dataset for Age prediction

The dataset for gender prediction, shown in Table 2, was likewise taken from Kaggle and split into 80% training and 20% validation, with testing done manually. It contained JPG images of men and women, equally divided into directories, and the data was cleaned manually. The total number of images in the dataset and their distribution over labels can be seen in Table 2.

Table 2. Dataset for Gender prediction

VI. RESULTS AND DISCUSSION

Experiments were performed on various CCTV videos available over the internet. Figure 7 shows the home screen of our system. The following functionalities are available. Run Script: runs the MobileNet SSD model to find the unique-people count in the given video. Predict age and gender: passes the frames obtained from the video to the models created for age and gender prediction; a crowd summary is then presented by displaying the extracted person images along with their age and gender. Complete the model: reruns the model for another input and erases the captured frames from the specified folder.

Figure 7. Home page of the UI

Figure 8 shows the result obtained for a sample video in which the people are clearly visible and the video quality is good, whereas Figure 9 shows the results obtained on a night-time video. Although the human proposals were detected correctly, the analysis was not completely correct: the age and gender were not classified correctly. Based on the experiments performed, it was observed that frames taken from high-resolution videos gave accurate results for both age and gender, whereas frames taken from low-resolution videos, or videos in which a person's face was not clearly visible, did not give accurate results.
Figure 8. Result obtained on a sample input video

Figure 9. Results obtained on a sample input video captured during nighttime

VII. CONCLUSION AND FUTURE WORK

The proposed system is capable of detecting the presence of a huge number of people in surveillance videos. It successfully detects the gender and age of people and gives a summary of the people belonging to different age groups, along with their gender, for further analysis by the authorities. The system accepts a video as input, which is divided into frames; persons are detected in each frame using MobileNet SSD, the detected persons are tracked, and their images are sent to the age and gender prediction module. From the analysis of experiments performed on different types of surveillance videos, it can be concluded that the system can be very effective for solving many security and surveillance problems and can be used to generate data for analysis and further use. The system can also be plugged easily into many other systems. Further, we intend to handle night-vision videos for crowd summarization.

By far the most difficult portion of this project was setting up the training infrastructure to properly divide the data into folds, train each classifier, cross-validate, and combine the resulting classifiers into a test-ready classifier. We foresee future directions building on this work, such as using gender and age classification to aid face recognition and to improve experiences with photos on social media. As part of future work, a survey of other classification methods for age estimation can also be carried out. Moreover, gender and ethnicity estimation and various other demographic attributes can be evaluated with neural network classifiers. We hope that additional training data will become available over time for the task of age and gender classification, which will allow successful techniques from other classification problems with huge datasets to be applied to this area as well.

References
[1] G. Antonini and J. P. Thiran, "Counting pedestrians in video sequences using trajectory clustering," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 8, pp. 1008–1020, 2006.
[2] I. S. Topkaya, H. Erdogan, and F. Porikli, "Counting people by clustering person detector outputs," in 11th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2014, 2014, pp. 313–318.
[3] Gould, Stephen, Tianshi Gao, and Daphne Koller. "Region-based segmentation and object detection." Advances in Neural Information Processing Systems 22 (2009): 655-663.
[4] Idrees, H., Saleemi, I., Seibert, C., Shah, M., 2013. "Multi-source
multiscale counting in extremely dense crowd images", in: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pp. 2547 -
2554.
[5] Raghavachari, Chakravartula, V. Aparna, S. Chithira, and Vidhya
Balasubramanian. "A comparative study of vision based human detection
techniques in people counting applications." Procedia Computer Science 58
(2015): 461-469.
[6] Jones, M. J., Snow, D., 2008. "Pedestrian detection using boosted features over many frames." In: 19th International Conference on Pattern Recognition (ICPR 2008), IEEE, pp. 1–4, http://dx.doi.org/10.1109/ICPR.2008.4761703.
[7] Salim, Sohail, et al. "Crowd Detection and Tracking in Surveillance Video Sequences." 2019 IEEE International Conference on Smart Instrumentation, Measurement and Application (ICSIMA). IEEE, 2019.
[8] Q. Wan and Y. Wang, "Multiple moving objects tracking under complex scenes," in The Sixth World Congress on Intelligent Control and Automation, Proc. IEEE 2, pp. 9871–9875, 2006.
[9] C. Chen, T. Chen, D. Wang, and T. Chen, "A cost-effective people-counter for a crowd of moving people based on two-stage segmentation," J Inform Hiding Multimedia ..., vol. 3, no. 1, pp. 12–23, 2012.
[10] L. Del Pizzo, P. Foggia, A. Greco, G. Percannella, and M. Vento, "Counting people by RGB or depth overhead cameras," Pattern Recognition Letters, vol. 81, pp. 41–50, 2016.
[11] L. Del Pizzo, P. Foggia, A. Greco, G. Percannella, and M. Vento, "A versatile and effective method for counting people on either RGB or depth overhead cameras," in 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2015, pp. 1–6.
[12] Nascimento, Jacinto C., Arnaldo J. Abrantes, and Jorge S. Marques. "An algorithm for centroid-based tracking of moving objects." In 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing.

Proceedings. ICASSP99, vol. 6, pp. 3305–3308. IEEE Computer Society, 1999.
[13] G. Liu, Z. Yin, Y. Jia, and Y. Xie, "Passenger flow estimation based on convolutional neural network in public transportation system," Knowledge-Based Systems, vol. 123, pp. 102–115, 2017.
[14] X. Wei, J. Du, M. Liang, and L. Ye, "Boosting Deep Attribute Learning via Support Vector Regression for Fast Moving Crowd Counting," Pattern Recognition Letters, vol. 47, pp. 178–193, 2017.
[15] E. Eidinger, R. Enbar, and T. Hassner, "Age and gender estimation of unfiltered faces," IEEE Transactions on Information Forensics and Security, vol. 9, no. 12, pp. 2170–2179, 2014.
[16] M. A. Beheshti-nia and Z. Mousavi, "A new classification method based
on pairwise support vector machine (SVM) for facial age estimation," Journal
of Industrial and Systems Engineering, vol. 10, no. 1, pp. 91–107, 2017.
[17] Moghaddam, Baback, and Ming-Hsuan Yang. "Learning gender with
support faces." IEEE Transactions on Pattern Analysis and Machine
Intelligence 24, no. 5 (2002): 707-711.
[18] Brunelli, R. and T. Poggio. "HyberBF networks for gender classification." 1992.
[19] A. Demontis, B. Biggio, G. Fumera, and F. Roli, "Super-sparse
regression for fast age estimation from faces at test time," Image Analysis and
Processing—ICIAP, Springer, Berlin, Germany, 2015.
[20] G. Guo, Y. Fu, C. R. Dyer, and T. S. Huang, "Image-based human age
estimation by manifold learning and locally adjusted robust regression," IEEE
Transactions on Image Processing, vol. 17, no. 7, pp. 1178–1188, 2008.
[21] G. Guo and G. Mu, "Simultaneous dimensionality reduction and human
age estimation via kernel partial least squares regression," in Proceedings of
the 24th IEEE Conference on Computer Vision and Pattern Recognition , pp.
657–664, Colorado Springs, CO, USA, June 2011.
[22] Kumar, S., Singh, S. and Kumar, J., 2019, January. Gender classification
using machine learning with multi-feature method. In 2019 IEEE 9th Annual
Computing and Communication Workshop and Conference (CCWC) (pp.
0648-0653). IEEE.
[23] Ullah, Ihsan, Muhammad Hussain, Ghulam Muhammad, and Anwar
Mirza. "Gender Recognition From Face Images With Spatial WLD
Descriptor."
[24] Phillips, P. Jonathon, Harry Wechsler, Jeffery Huang, and Patrick J.
Rauss. "The FERET database and evaluation procedure for face-recognition
algorithms." Image and vision computing 16, no. 5 (1998): 295-306.
[25] Perez, Claudio, Juan Tapia, Pablo Estévez, and Claudio Held. "Gender
classification from face images using mutual information and feature fusion."
International Journal of Optomechatronics 6, no. 1 (2012): 92-119
[26] Golomb, Beatrice A., David T. Lawrence, and Terrence J.
Sejnowski. "Sexnet: A neural network identifies sex from human faces." In
NIPS, vol. 1, p. 2. 1990.
[27] R. Ranjan, S. Sankaranarayanan, C. D. Castillo, and R. Chellappa, "An
all-in-one convolutional neural network for face analysis," in Proceedings of
the 12th IEEE International Conference on Automatic Face & Gesture
Recognition (FG 2017), pp. 17–24, Biometrics Wild, Bwild, Washington, DC,
USA, June 2017.
[28] M. Duan, K. Li, and K. Li, "An ensemble CNN2ELM for age
estimation," IEEE Transactions on Information Forensics and Security, vol.
13, no. 3, pp. 758–772, 2018.
[29] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification
with deep convolutional neural networks," in Advances in neural information
processing systems, 2012, pp. 1097-1105.
[30] X. Geng, Z.-H. Zhou, and K. Smith-Miles, "Automatic age estimation
based on facial aging patterns," IEEE Transactions on pattern analysis and
machine intelligence, vol. 29, pp. 2234-2240, 2007.
[31] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, et al.,
"Going deeper with convolutions," in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, 2015, pp. 1-9.
[32] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster r-cnn:
Towards real-time object detection with region proposal networks." IEEE
transactions on pattern analysis and machine intelligence 39, no. 6 (2016):
1137-1149.
[33] Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. "You only look once: Unified, real-time object detection." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788, 2016.
[34] Fu, Cheng-Yang, Wei Liu, Ananth Ranga, Ambrish Tyagi, and Alexander C. Berg. "DSSD: Deconvolutional single shot detector." arXiv preprint arXiv:1701.06659 (2017).
[35] Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[36] Shakya, Subarna. "Collaboration of Smart City Services with Appropriate Resource Management and Privacy Protection." Journal of Ubiquitous Computing and Communication Technologies (UCCT) 3, no. 01 (2021).
[37] Ranganathan, G. "Real Life Human Movement Realization in Multimodal Group Communication Using Depth Map Information and Machine Learning." Journal of Innovative Image Processing (JIIP) 2, no. 02 (2020): 93-101.

